UCLA STAT 100A Introduction to Probability


Instructor: Ivo Dinov, Asst. Prof. in Statistics and Neurology
Teaching Assistants: Romeo Maciuca, UCLA Statistics
University of California, Los Angeles, Fall 2002
http://www.stat.ucla.edu/~dinov/

Statistics Online Compute Resources: http://www.stat.ucla.edu/~dinov/courses-students.dir/Applets.dir/OnlineResources.html
Interactive Normal Curve; online calculators for the Binomial, Normal, Chi-Square, F, T, Poisson, Exponential and other distributions; Galton's Board (Quincunx)

Chapter 8: Limit Theorems
Parameters and estimates
Sampling distributions of the sample mean
Central Limit Theorem (CLT)
Markov's inequality
Chebyshev's inequality
Weak & Strong Laws of Large Numbers (LLN)


Basic Laws

The first two inequalities give loose bounds on probabilities when only µ is known (Markov) or only µ and σ are known (Chebyshev) and the distribution itself is not. They are also used to prove other limit results, such as the LLN.

The weak LLN provides a convenient way to evaluate the convergence properties of estimators such as the sample mean: for any specific n, (X1 + X2 + ... + Xn)/n is likely to be near µ. However, it may still happen that for some k > n, (X1 + X2 + ... + Xk)/k is far away from µ.

The strong LLN assures convergence for individual realizations: for any ε > 0, with probability 1, |(X1 + X2 + ... + Xn)/n − µ| is larger than ε only a finite number of times.
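These bounds can be checked numerically. A minimal sketch, assuming (purely for illustration) an Exponential(1) population, for which µ = 1, σ² = 1, and X ≥ 0:

```python
import random

random.seed(0)

# Illustrative population: Exponential(1), so mu = 1, sigma^2 = 1, and X >= 0.
mu, var = 1.0, 1.0
data = [random.expovariate(1.0) for _ in range(100_000)]

a = 3.0  # threshold for Markov's inequality
k = 3.0  # number of standard deviations for Chebyshev's inequality

# Markov (requires X >= 0): P(X >= a) <= mu / a.
p_markov = sum(x >= a for x in data) / len(data)
markov_bound = mu / a

# Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2.
sigma = var ** 0.5
p_cheb = sum(abs(x - mu) >= k * sigma for x in data) / len(data)
cheb_bound = 1 / k ** 2
```

Both empirical probabilities come out well below their bounds (about 0.05 vs 0.33 for Markov, about 0.02 vs 0.11 for Chebyshev), illustrating how loose the bounds are once the distribution is actually known.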

Basic Laws – Examples

The weak LLN: based on past experience, the mean test score is µ = 70 and the variance in the test scores is σ² = 10. Twenty-five students (n = 25) take the present final. Determine the probability that the average score of the twenty-five students will be between 50 and 90.

The strong LLN: same setup, but n = 1,000 students take the present final. Determine the probability that the average score of the 1,000 students will be between 50 and 90.

Parameters and estimates

A parameter is a numerical characteristic of a population or distribution. An estimate is a quantity calculated from the data to approximate an unknown parameter.

Notation: capital letters refer to random variables; small letters refer to observed values.
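For these examples, Chebyshev's inequality gives P(50 < X̄ < 90) = P(|X̄ − 70| < 20) ≥ 1 − (σ²/n)/20². A short sketch of the calculation:

```python
def chebyshev_lower_bound(sigma_sq, n, eps):
    # Chebyshev: P(|Xbar - mu| >= eps) <= Var(Xbar) / eps^2 = (sigma^2 / n) / eps^2,
    # so P(|Xbar - mu| < eps) >= 1 - (sigma^2 / n) / eps^2.
    return 1 - (sigma_sq / n) / eps ** 2

# Test-score example: mu = 70, sigma^2 = 10, interval (50, 90) means eps = 20.
bound_25 = chebyshev_lower_bound(10, 25, 20)      # n = 25   -> 0.999
bound_1000 = chebyshev_lower_bound(10, 1000, 20)  # n = 1000 -> 0.999975
```

So with n = 25 the average is within 20 points of 70 with probability at least 0.999, and with n = 1,000 at least 0.999975.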

Questions

What are two ways in which random observations arise? Give examples. (Random sampling from a finite population; a random process producing data, e.g. a randomized scientific experiment.)

What is a parameter? Give two examples of parameters. (A numerical characteristic of the population or distribution, e.g. the mean, the 1st quartile, the standard deviation.)

What is an estimate? How would you estimate the parameters you described in the previous question?

What is the distinction between an estimate (p̂, a value calculated from observed data to approximate a parameter) and an estimator (P̂, an abstraction describing the properties of the random process and the sample that produced the estimate)? Why is this distinction necessary? (It captures the effects of sampling variation in P̂.)

The sample mean has a sampling distribution

Sampling batches of Scottish soldiers and taking chest measurements. Population µ = 39.8 in and σ = 2.05 in.

[Figures: chest measurements for twelve samples of size 6 and twelve samples of size 24, plotted by sample number.]
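The sampling experiment above can be sketched in a few lines (assuming, for illustration only, a Normal population with the stated µ and σ; the actual chest data need not be Normal):

```python
import random
import statistics

random.seed(1)
mu, sigma = 39.8, 2.05  # population mean and SD of chest measurements (inches)

def sample_means(n, num_samples=12):
    # Draw `num_samples` independent samples of size n; return their means.
    return [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
            for _ in range(num_samples)]

means_6 = sample_means(6)    # twelve samples of size 6
means_24 = sample_means(24)  # twelve samples of size 24
# SD(Xbar) = sigma / sqrt(n), so the size-24 means cluster about twice as
# tightly around 39.8 as the size-6 means.
```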

Histograms from 100,000 samples, n = 6, 24, 100

What do we see?
1. The random nature of the means: individual sample means vary significantly.
2. Increasing the sample size decreases the variability of the sample means.

Mean and SD of the sampling distribution

E(sample mean) = population mean

Review

We use both X̄ and x̄ to refer to a sample mean. For what purposes do we use the former, and for what purposes do we use the latter?

What is meant by the sampling distribution of X̄? (Sampling variation is the observed variability in the process of taking random samples; the sampling distribution is the true probability distribution of the random sampling process.)

How is the population mean of the sample average related to the population mean of individual observations? (E(X̄) = population mean.)

Review

How is the population standard deviation of X̄ related to the population standard deviation of individual observations? (SD(X̄) = (population SD)/sqrt(sample size).)

What happens to the sampling distribution of X̄ if the sample size is increased? (Its variability decreases.)

What does it mean when X̄ is said to be an unbiased estimate of µ? (E(X̄) = µ. Are Ŷ = ¼ · Sum, or Ẑ = ¾ · Sum, unbiased?)

If you sample from a Normal distribution, what can you say about the distribution of X̄? (It is also Normal.)

Review

Increasing the precision of X̄ as an estimator of µ is equivalent to doing what to SD(X̄)? (Decreasing it.)

For the sample mean calculated from a random sample, SD(X̄) = σ/sqrt(n). This implies that the variability from sample to sample in the sample means is given by the variability of the individual observations divided by the square root of the sample size. In a way, averaging decreases variability.

Central Limit Effect – Histograms of sample means

Triangular distribution: sample means from samples of size n = 1 and n = 2; 500 samples. [Figure: triangular density y = 2x, area = 1.]
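The triangular density f(x) = 2x on [0, 1] has CDF x², so it can be sampled by inverse transform as X = √U with U ~ Uniform(0, 1); its mean is 2/3. A sketch of the central limit effect on its sample means:

```python
import random
import statistics

random.seed(2)

def triangular_draw():
    # Inverse transform for f(x) = 2x on [0, 1]: the CDF is x^2, so X = sqrt(U).
    return random.random() ** 0.5

def mean_of_sample(n):
    return statistics.mean(triangular_draw() for _ in range(n))

# 500 sample means for each sample size, matching the histograms above.
means = {n: [mean_of_sample(n) for _ in range(500)] for n in (1, 2, 4, 10)}

# Population mean is 2/3; the spread of the sample means shrinks as n grows.
spreads = {n: statistics.stdev(means[n]) for n in (1, 2, 4, 10)}
```

The spreads shrink roughly like 1/sqrt(n), and the histograms of the means look more and more Normal.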

Central Limit Effect – Histograms of sample means

Triangular distribution: sample sizes n = 4, n = 10.

Uniform distribution: sample means from samples of size n = 1 and n = 2; 500 samples. [Figure: uniform density, area = 1.]

Uniform distribution: sample sizes n = 4, n = 10.

Central Limit Effect – Histograms of sample means

Exponential distribution: sample means from samples of size n = 1 and n = 2; 500 samples. [Figure: exponential density, area = 1.]

Exponential distribution: sample sizes n = 4, n = 10.

Quadratic U distribution: sample means from samples of size n = 1 and n = 2; 500 samples. [Figure: quadratic U density, area = 1.]

Central Limit Effect – Histograms of sample means

Quadratic U distribution: sample sizes n = 4, n = 10.

Central Limit Theorem – heuristic formulation

When sampling from almost any distribution, X̄ is approximately Normally distributed in large samples. (CLT applet demo.)

Central Limit Theorem – theoretical formulation

Let X1, X2, ..., Xn be a sequence of independent observations from one specific random process with mean E(X) = µ and standard deviation SD(X) = σ, both finite (σ < ∞). If X̄ = (X1 + X2 + ... + Xn)/n is the sample average, then X̄ has a distribution which approaches N(µ, σ²/n) as n → ∞.
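The theoretical formulation can be illustrated on a skewed population. A sketch using Exponential(1), where µ = σ = 1: standardizing each sample mean as Z = (X̄ − µ)/(σ/√n), roughly 95% of the Z values should fall within ±1.96 if the Normal approximation holds.

```python
import random
import statistics

random.seed(3)
mu, sigma, n = 1.0, 1.0, 100  # Exponential(1) population, samples of size 100

def standardized_mean():
    xbar = statistics.mean(random.expovariate(1.0) for _ in range(n))
    return (xbar - mu) / (sigma / n ** 0.5)

z = [standardized_mean() for _ in range(2000)]
# If Z is approximately N(0, 1), about 95% of values land in (-1.96, 1.96).
coverage = sum(-1.96 < v < 1.96 for v in z) / len(z)
```

Even though Exponential(1) is strongly skewed, with n = 100 the coverage comes out close to 95%, as the CLT predicts.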

Review

What does the Central Limit Theorem say? Why is it useful? (If the sample size is large, the sample mean, as a random variable, is approximately Normally distributed.)

In what way might you expect the central limit effect to differ between samples from a symmetric distribution and samples from a very skewed distribution? (Larger samples are needed for non-symmetric distributions to see CLT effects.)

What other important factor, apart from skewness, slows down the action of the central limit effect? (Heaviness in the tails of the original distribution.)

Review

When you have data from a moderate to small sample and want to use a Normal approximation to the distribution of X̄ in a calculation, what would you want to do before having any faith in the results? (Use a sample size of 30 or more, depending on the skewness of the distribution of X, and plot the data: non-symmetry and heaviness in the tails slow down the CLT effects.)

Take-home message: the CLT is an application of statistics of paramount importance. Often we are not sure of the distribution of an observable process, but the CLT gives us a theoretical description of the distribution of the sample means as the sample size increases: N(µ, σ²/n).

For the sample mean calculated from a random sample, SD(X̄) = σ/sqrt(n). This implies that the variability from sample to sample in the sample means is given by the variability of the individual observations divided by the square root of the sample size. In a way, averaging decreases variability.

The standard error of the mean

Recall that for known SD(X) = σ, we can express SD(X̄) = σ/sqrt(n). What if SD(X) is unknown? Then we estimate σ by the sample standard deviation s, giving the standard error SE(X̄) = s/sqrt(n).
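Estimating σ by the sample standard deviation s gives the standard error of the mean. A minimal sketch (the score data below are hypothetical, chosen only to illustrate the formula):

```python
import math
import statistics

def standard_error(sample):
    # SE(xbar) = s / sqrt(n), with s the sample standard deviation.
    return statistics.stdev(sample) / math.sqrt(len(sample))

scores = [62, 75, 71, 68, 80, 66, 73, 77, 69, 70]  # hypothetical test scores
se = standard_error(scores)  # roughly 1.69 for these data
```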

Students t-distribution cont. For r in addition to om samples from a Normal distribution, is exactly distributed as Student(df = n – 1), but methods we shall base upon this distribution as long as T work well even as long as small samples sampled from distributions which are quite non-Normal. By (prob), we mean the number t such that when T ~ Student(df), pr(T t) = prob; that is, the tail area above t (that is to the right of t on the graph) is prob. CLT Example  CI shrinks by half by quadrupling the sample size! If I ask 30 of you the question Is 5 credit hour a reasonable load as long as Stat13, in addition to say, 15 (50%) said no. Should we change the as long as mat of the class Not really  the 2SE interval is about [0.32 ; 0.68]. So, we have little concrete evidence of the proportion of students who think we need a change in Stat 13 as long as mat, If I ask all 300 Stat 13 students in addition to 150 say no (still 50%), then 2SE interval around 50% is: [0.44 ; 0.56]. So, large sample is much more useful in addition to this is due to CLT effects, without which, we have no clue how useful our estimate actually is