Stat Review - Keller
Statistics is a way to get information from data.
Data -> Statistics -> Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.
(Definitions: Oxford English Dictionary)
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Key Statistical Concepts
Population
A population is the group of all items of interest to a statistics practitioner.
Frequently very large; sometimes infinite.
E.g. all 5 million Florida voters, per Example 12.5.
Sample
A sample is a set of data drawn from the population.
Potentially very large, but less than the population.
E.g. a sample of 765 voters exit polled on election day.
Parameter
A descriptive measure of a population.
Statistic
A descriptive measure of a sample.
[Diagram: the Sample (Statistic) is a subset of the Population (Parameter).]
Populations have Parameters,
Samples have Statistics.
Descriptive Statistics helps to answer these questions.
[Diagram: Sample (Statistic) -> Inference -> Population (Parameter).]
What can we infer about a Population's Parameters based on a Sample's Statistics?
Definitions
A variable is some characteristic of a population or sample.
E.g. student grades.
Typically denoted with a capital letter: X, Y, Z, ...
The values of the variable are the range of possible values for a variable.
E.g. student marks (0..100)
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Arithmetic operations can be performed on interval data, thus it's meaningful to talk about 2*Height, or Price + $1, and so on.
For nominal data, because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!)
Nominal data are also called qualitative or categorical.
E.g. (ordinal data) College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Graphical & Tabular Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of the variable.
We can summarize the data in a table that presents the categories and their counts, called a frequency distribution.
A relative frequency distribution lists the categories and the proportion with which each occurs.
Refer to Example 2.1.
Bar charts are often used to display frequencies.
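The counting described above can be sketched in a few lines of Python. The category labels in `responses` are hypothetical and are not the data of Example 2.1; the sketch only illustrates how a frequency and a relative frequency distribution are built.

```python
from collections import Counter

# Hypothetical nominal data (not from the textbook example): labels only,
# so counting is the only meaningful calculation.
responses = ["Accounting", "Finance", "Marketing", "Finance", "Accounting",
             "Accounting", "Other", "Marketing", "Accounting", "Finance"]

frequency = Counter(responses)                  # frequency distribution
total = sum(frequency.values())
relative_frequency = {category: count / total   # relative frequency distribution
                      for category, count in frequency.items()}

# Print one row per category: label, count, proportion.
for category, count in frequency.most_common():
    print(f"{category:<12}{count:>3}{relative_frequency[category]:>8.2f}")
```

The same two dictionaries are what a bar chart (counts) or pie chart (proportions) would be drawn from.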
Nominal Data
It's all the same information (based on the same data), just a different presentation.
The most important of these graphical methods is the histogram.
The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding the current class's relative frequency to the previous class's cumulative relative frequency.
(For the first class, its cumulative relative frequency is just its relative frequency.)
first class: .355
next class: .355 + .185 = .540
...
last class: .930 + .070 = 1.00
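A minimal sketch of the running sum behind an ogive follows. The class counts are an assumption: they are chosen only so that the first, second, and last cumulative values agree with the figures quoted above (.355, .540, 1.00); the intermediate classes are not given in the text.

```python
from itertools import accumulate

# Assumed class counts for 200 observations (illustrative, not from the text).
class_counts = [71, 37, 13, 9, 10, 18, 28, 14]
n = sum(class_counts)

# Step 1: relative frequency of each class.
relative = [count / n for count in class_counts]

# Step 2: cumulative relative frequency = running total of counts divided by n.
cumulative = [running / n for running in accumulate(class_counts)]

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.3f}  {cum:.3f}")
```

Accumulating the integer counts first and dividing once avoids small floating-point drift in the running total.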
around $35
(Refer also to Fig. 2.13 in your textbook.)
Scatter Diagram
Example 2.9: A real estate agent wanted to know to what extent the selling price of a home is related to its size.
1) Collect the data
2) Determine the independent variable (X: house size) and the dependent variable (Y: selling price)
3) Use Excel to create a scatter diagram
Observations measured at successive points in time are called time series data.
Time series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing
Percentiles, Quartiles
Measures of Linear Relationship
Covariance, Correlation, Least Squares Line
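It is computed by simply adding up all the observations and dividing by the total number of observations:
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
        Population   Sample
Size    N            n
Mean    $\mu$        $\bar{x}$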
The mean is seriously affected by extreme values called "outliers."
E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously!
Range = Largest observation − Smallest observation
E.g.
Data: {4, 4, 4, 4, 50}        Range = 46
Data: {4, 8, 15, 24, 39, 50}  Range = 46
The range is the same in both cases, but the data sets have very different distributions.
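           Population    Sample
Size       N             n
Mean       $\mu$         $\bar{x}$
Variance   $\sigma^2$    $s^2$
The variance of a population is:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$   (N is the population size)
The variance of a sample is:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$   ($\bar{x}$ is the sample mean)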
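The following sample consists of the number of jobs six randomly selected students applied for: 17, 15, 23, 7, 9, 13.
Find its mean and variance.
What are we looking to calculate? The sample statistics $\bar{x}$ and $s^2$, as opposed to the population parameters $\mu$ or $\sigma^2$.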
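Sample Mean & Variance
Sample Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14$
Sample Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(17-14)^2 + (15-14)^2 + \cdots + (13-14)^2}{6 - 1} = \frac{166}{5} = 33.2$
Sample Variance (shortcut method)
$s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{5}\left[1342 - \frac{84^2}{6}\right] = 33.2$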
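As a quick check of the jobs example (17, 15, 23, 7, 9, 13), the following sketch uses Python's standard `statistics` module and also evaluates the shortcut formula by hand. The variable names are illustrative.

```python
import statistics

jobs = [17, 15, 23, 7, 9, 13]   # number of jobs applied for by six students
n = len(jobs)

sample_mean = statistics.mean(jobs)          # sum of observations / n
sample_variance = statistics.variance(jobs)  # divides by n - 1 (sample variance)
sample_std_dev = statistics.stdev(jobs)      # square root of the sample variance

# Shortcut method: (sum of squares - (sum)^2 / n) / (n - 1)
shortcut_variance = (sum(x * x for x in jobs) - sum(jobs) ** 2 / n) / (n - 1)

print(sample_mean, sample_variance, shortcut_variance, round(sample_std_dev, 2))
```

Note that `statistics.pvariance` (dividing by N) would be the choice if the six values were the whole population rather than a sample.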
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
You get more consistent distance with the new club.
The proportion of observations in any sample that lie within k standard deviations of the mean is at least:
$1 - \frac{1}{k^2}$ for $k > 1$
For k = 2 (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a "lower bound" compared to the Empirical Rule's approximation (95%).
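The bound quoted above is simple enough to evaluate directly. The function name below is illustrative; the formula is the standard Chebyshev bound $1 - 1/k^2$.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

# Compare the guaranteed minimum for a few values of k.
for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 4))
```

Because the bound holds for any distribution, it is much looser than the Empirical Rule, which assumes a bell-shaped histogram.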
Wendy's service time is shortest and least variable.
Hardee's has the greatest variability, while Jack in the Box has the longest service times.
Sampling (i.e. selecting a subset of a whole population) is often done for reasons of cost (it's less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical).
In any case, the sampled population and the target population should be similar to one another.
We will focus our attention on these three methods:
Simple Random Sampling,
Stratified Random Sampling, and
Cluster Sampling.
Drawing three names from a hat containing all the names of the students in the class is an example of a simple random sample: any group of three names is as likely to be picked as any other group of three names.
This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.
Cluster sampling may increase sampling error due to similarities among cluster members.
Another way to look at this is: the differences in results for different samples (of the same size) are due to sampling error:
E.g. two samples of size 10 of 1,000 households. If we happened to get the highest income level data points in our first sample and all the lowest income levels in the second, this delta is due to sampling error.
Errors in data acquisition,
Nonresponse errors, and
Selection bias.
Note: increasing the sample size will not reduce this type of error.
Approaches to Assigning Probabilities
There are three ways to assign a probability, $P(O_i)$, to an outcome, $O_i$, namely:
Classical approach: make certain assumptions (such as equally likely, independence) about the situation.
Relative frequency: assigning probabilities based on experimentation or historical data.
Subjective approach: assigning probabilities based on the assignor's judgment.
If a random experiment is repeated an infinite number of times, the relative frequency for any given outcome is the probability of this outcome.
For example, the probability of heads in a flip of a balanced coin is .5, determined using the classical approach. The probability is interpreted as being the long-term relative frequency of heads if the coin is flipped an infinite number of times.
Conditional probabilities are written as P(A | B) and read as "the probability of A given B," and are calculated as:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event.
Two events A and B are said to be independent if
P(A | B) = P(A)
or
P(B | A) = P(B)
The complement rule gives us the probability of an event NOT occurring. That is:
$P(A^C) = 1 - P(A)$
For example, in the simple roll of a die, the probability of the number "1" being rolled is 1/6. The probability that some number other than "1" will be rolled is 1 − 1/6 = 5/6.
Multiplication Rule
The multiplication rule is used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
If we multiply both sides of the equation by P(B) we have:
P(A and B) = P(A | B) • P(B)
Likewise, P(A and B) = P(B | A) • P(A)
If A and B are independent events, then P(A and B) = P(A) • P(B)
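P(A or B) = P(A) + P(B) − P(A and B)
Why do we subtract the joint probability P(A and B) from the sum of the probabilities of A and B? Because outcomes that belong to both events would otherwise be counted twice.
If A and B are mutually exclusive, then P(A and B) = 0, and
P(A or B) = P(A) + P(B) − P(A and B)
reduces to the addition rule for mutually exclusive events:
P(A or B) = P(A) + P(B)
We often use this form when we add some joint probabilities calculated from a probability tree.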
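The complement, conditional, multiplication, and addition rules can be checked on a single die roll. The events A (even number) and B (greater than 3) are chosen for illustration only and do not come from the text; exact fractions are used so the identities hold without rounding.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a fair die

def prob(event):
    """Classical probability: favourable outcomes / equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}                          # illustrative event: roll is even
B = {4, 5, 6}                          # illustrative event: roll is greater than 3

p_a_and_b = prob(A & B)                        # joint probability
p_a_given_b = p_a_and_b / prob(B)              # conditional probability P(A | B)
p_a_or_b = prob(A) + prob(B) - p_a_and_b       # addition rule
p_not_one = 1 - prob({1})                      # complement rule
independent = (p_a_given_b == prob(A))         # independence check

print(p_a_and_b, p_a_given_b, p_a_or_b, p_not_one, independent)
```

Here P(A | B) = 2/3 differs from P(A) = 1/2, so these two events are not independent.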
Two Types of Random Variables
Discrete Random Variable
one that takes on a countable number of values
E.g. values on the roll of dice: 2, 3, 4, …, 12
Continuous Random Variable
one whose values are not discrete, not countable
E.g. time (30.1 minutes? 30.10000001 minutes?)
Analogy:
Integers are Discrete, while Real Numbers are Continuous
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
We can "pull" a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of random variable X).
2. V(X + c) = V(X)
The variance of a random variable and a constant is just the variance of the random variable (per 1 above).
3. V(cX) = $c^2$V(X)
The variance of a random variable and a constant coefficient is the coefficient squared times the variance of the random variable.
1. Fixed number of trials, represented as n.
2. Each trial has two possible outcomes, a "success" and a "failure".
3. P(success) = p (and thus: P(failure) = 1 − p), for all trials.
4. The trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials.
Binomial Random Variable
The binomial random variable counts the number of successes in n trials of the binomial experiment. It can take on values from 0, 1, 2, …, n. Thus, it's a discrete random variable.
To calculate the probability associated with each value we use combinatorics:
$P(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$ for x = 0, 1, 2, …, n
$P(X \le 4) = .967$
Binomial Table
What is the probability that Pat gets two answers correct?
i.e. what is P(X = 2), given P(success) = .20 and n = 10?
$P(X = 2) = P(X \le 2) - P(X \le 1) = .678 - .376 = .302$
Remember, the table shows cumulative probabilities.
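The table lookups above can be reproduced from the binomial formula. This is a minimal sketch using only `math.comb`; the function names `binomial_pmf` and `binomial_cdf` are illustrative, and n = 10, p = .20 are the values of the quiz example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): cumulative probability, as tabulated in a binomial table."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.20

p_exactly_two = binomial_pmf(2, n, p)
p_two_from_table = binomial_cdf(2, n, p) - binomial_cdf(1, n, p)  # table method
p_at_most_four = binomial_cdf(4, n, p)

print(round(p_exactly_two, 4), round(p_two_from_table, 4), round(p_at_most_four, 4))
```

Subtracting two cumulative values, as the table method does, gives the same answer as evaluating the formula directly at x = 2.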
=BINOMDIST() Excel Function
There is a binomial distribution function in Excel that can also be used to calculate these probabilities. For example:
What is the probability that Pat gets two answers correct?
=BINOMDIST(# successes, # trials, P(success), cumulative)
where "cumulative" indicates whether $P(X \le x)$ is wanted.
P(X = 2) = .3020
What is the probability that Pat fails the quiz?
=BINOMDIST(# successes, # trials, P(success), cumulative)
where "cumulative" indicates whether $P(X \le x)$ is wanted.
$P(X \le 4) = .9672$
Binomial Distribution
As you might expect, statisticians have developed general formulas for the mean, variance, and standard deviation of a binomial random variable. They are:
$\mu = np$
$\sigma^2 = np(1-p)$
$\sigma = \sqrt{np(1-p)}$
E.g. On average, 96 trucks arrive at a border crossing every hour. (The interval is a time period.)
E.g. The number of typographic errors in a new textbook edition averages 1.5 per 100 pages. (The errors are the "successes" (?!) and 100 pages is the interval.)
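Poisson Probability Distribution
The probability that a Poisson random variable assumes a value of x is given by:
$P(x) = \frac{e^{-\mu}\mu^x}{x!}$ for x = 0, 1, 2, …
where $\mu$ is the mean number of successes in the interval and e is the natural logarithm base.
FYI: $e \approx 2.71828$
That is, what is P(X = 0) given that $\mu = 1.5$?
$P(0) = \frac{e^{-1.5}(1.5)^0}{0!} = .2231$
There is about a 22% chance of finding zero errors.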
Poisson Distribution
As mentioned on the Poisson experiment slide:
The probability of a success is proportional to the size of the interval.
Thus, knowing an error rate of 1.5 typos per 100 pages, we can determine a mean value for a 400-page book as:
$\mu = 1.5(4) = 6$ typos / 400 pages.
$P(X = 0) = \frac{e^{-6}(6)^0}{0!} = .002479$
There is a very small chance there are no typos.
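Both typo calculations follow from the Poisson formula with the mean scaled to the interval. A minimal sketch, with an illustrative function name:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu per interval."""
    return exp(-mu) * mu ** x / factorial(x)

rate_per_100_pages = 1.5

# Zero typos in 100 pages: the mean is 1.5.
p_zero_in_100 = poisson_pmf(0, rate_per_100_pages)

# Zero typos in 400 pages: the mean scales with the interval, 1.5 * 4 = 6.
p_zero_in_400 = poisson_pmf(0, rate_per_100_pages * 4)

print(round(p_zero_in_100, 4), round(p_zero_in_400, 6))
```

Scaling the mean rather than the probability is the key step: the chance of no typos in 400 pages is not one quarter of the 100-page figure.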
Thus, we can determine the probability of a range of values only.
E.g. with a discrete random variable like tossing a die, it is meaningful to talk about P(X = 5), say.
In a continuous setting (e.g. with time as a random variable), the probability the random variable of interest, say task length, takes exactly 5 minutes is infinitesimally small, hence P(X = 5) = 0.
It is meaningful to talk about $P(X \le 5)$.
Probability Density Function
A function f(x) is called a probability density function (over the range $a \le x \le b$) if it meets the following requirements:
1) $f(x) \ge 0$ for all x between a and b, and
2) The total area under the curve between a and b is 1.0
[Figure: a density curve f(x) over the range a to b, with area = 1 under the curve.]
It looks like this:
Bell shaped,
Symmetrical around the mean
The Normal Distribution
Important things to note:
The normal distribution is fully defined by two parameters: its standard deviation and mean.
The normal distribution is bell shaped and symmetrical about the mean.
Unlike the range of the uniform distribution ($a \le x \le b$), normal distributions range from minus infinity to plus infinity.
Standard Normal Distribution
A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution.
[Figure: standard normal curve with mean 0 and standard deviation 1.]
As we shall see shortly, any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.
Calculating Normal Probabilities
We can use the following function to convert any normal random variable to a standard normal random variable:
$Z = \frac{X - \mu}{\sigma}$
Some advice: always draw a picture!
What is the probability that a computer is assembled in a time between 45 and 60 minutes?
Algebraically speaking, what is P(45 < X < 60)?
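P(45 < X < 60)?
With a mean of 50 minutes and a standard deviation of 10 minutes:
$P(45 < X < 60) = P\left(\frac{45 - 50}{10} < Z < \frac{60 - 50}{10}\right) = P(-.5 < Z < 1)$
We can break up P(−.5 < Z < 1) into:
P(−.5 < Z < 0) + P(0 < Z < 1)
The distribution is symmetric around zero, so we have:
P(−.5 < Z < 0) = P(0 < Z < .5)
Hence: P(−.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)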
How to use Table 3
This table gives probabilities P(0 < Z < z)
First column = integer + first decimal
Top row = second decimal place
P(0 < Z < 0.5) = .1915
P(0 < Z < 1) = .3413
P(−.5 < Z < 1) = .1915 + .3413 = .5328
[Figure: standard normal curve with the area to the right of 1.6 shaded.]
P(Z > 1.6) = .5 − P(0 < Z < 1.6)
= .5 − .4452
= .0548
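The table lookups can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a sketch for verification, not a replacement for learning the table; the variable names are illustrative.

```python
from statistics import NormalDist

# Assembly time example: normal with mean 50 minutes, standard deviation 10 minutes.
assembly_time = NormalDist(mu=50, sigma=10)
p_between_45_and_60 = assembly_time.cdf(60) - assembly_time.cdf(45)

# The same probability after standardizing: P(-0.5 < Z < 1).
z = NormalDist()  # standard normal: mean 0, standard deviation 1
p_standardized = z.cdf(1.0) - z.cdf(-0.5)

# Upper-tail probability P(Z > 1.6).
p_upper_tail = 1 - z.cdf(1.6)

print(round(p_between_45_and_60, 4), round(p_standardized, 4), round(p_upper_tail, 4))
```

`cdf` returns the area to the left of a value, so areas measured from zero (as in Table 3) correspond to `z.cdf(value) - 0.5`.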
Using the Normal Table (Table 3)
What is P(Z < −2.23)?
[Figure: standard normal curve marked at −2.23, 0, and 2.23, showing P(0 < Z < 2.23).]
P(Z < −2.23) = P(Z > 2.23)
= .5 − P(0 < Z < 2.23)
= .5 − .4871
= .0129
What is P(Z < 1.52)?
[Figure: standard normal curve with the area to the left of 1.52 shaded.]
P(Z < 1.52) = .5 + P(0 < Z < 1.52)
= .5 + .4357
= .9357
What is P(0.9 < Z < 1.9)?
[Figure: standard normal curve marked at 0, 0.9, and 1.9, showing P(0 < Z < 0.9).]
P(0.9 < Z < 1.9) = P(0 < Z < 1.9) − P(0 < Z < 0.9)
= .4713 − .3159
= .1554
Finding Values of Z
Other Z values are
$Z_{.05} = 1.645$
$Z_{.01} = 2.33$
P(−1.96 < Z < 1.96) = .95
Similarly
P(−1.645 < Z < 1.645) = .90
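Critical values such as $Z_{.05}$ are the inverse of the table lookup: given an upper-tail area, find the z value. A small sketch using `NormalDist.inv_cdf` from the standard library; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(upper_tail_area):
    """Value z_A such that the area to its right under the standard normal is A."""
    return NormalDist().inv_cdf(1 - upper_tail_area)

for area in (0.05, 0.025, 0.01):
    print(area, round(z_critical(area), 3))

# Central probability between -z and +z for the two common two-tailed cases.
standard_normal = NormalDist()
central_95 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
central_90 = standard_normal.cdf(1.645) - standard_normal.cdf(-1.645)
```

The exact quantiles carry more decimals than the table (for example 2.326 rather than 2.33), which is why hand and software answers can differ slightly in the last digit.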
Student t Distribution,
Chi-Squared Distribution, and
F Distribution.
$\nu$ (nu) is called the degrees of freedom, and
$\Gamma$ (Gamma function) is $\Gamma(k) = (k-1)(k-2)\cdots(2)(1)$
Figure 8.24
As the number of degrees of freedom increases, the t distribution approaches the standard normal distribution.
Determining Student t Values
The Student t distribution is used extensively in statistical inference. Table 4 in Appendix B lists values of $t_{A,\nu}$.
That is, values of a Student t random variable with $\nu$ degrees of freedom such that:
$P(t > t_{A,\nu}) = A$
The values for A are pre-determined "critical" values, typically in the 10%, 5%, 2.5%, 1% and 1/2% range.
Using the t table (Table 4) for values
For example, if we want the value of t with 10 degrees of freedom such that the area under the Student t curve is .05:
Area under the curve value ($t_A$): COLUMN
$t_{.05,10} = 1.812$
F > 0. Two parameters define this distribution, and like we've already seen these are again degrees of freedom.
$\nu_1$ is the "numerator" degrees of freedom and
$\nu_2$ is the "denominator" degrees of freedom.
Pay close attention to the order of the terms!
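Sampling Distributions
The probability distribution of X (the roll of one fair die) is:
x      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6
and the mean and variance are calculated as well:
$\mu = \sum x P(x) = 3.5$ and $\sigma^2 = \sum (x - \mu)^2 P(x) = 2.92$
While there are 36 possible samples of size 2, there are only 11 values for $\bar{x}$, and some (e.g. $\bar{x} = 3.5$) occur more frequently than others (e.g. $\bar{x} = 1$).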
Sampling Distribution of Two Dice
The sampling distribution of $\bar{x}$ is shown below:
$\bar{x}$    P($\bar{x}$)
1.0    1/36
1.5    2/36
2.0    3/36
2.5    4/36
3.0    5/36
3.5    6/36
4.0    5/36
4.5    4/36
5.0    3/36
5.5    2/36
6.0    1/36
[Figure: bar charts of P(x) for x = 1, …, 6 and of P($\bar{x}$) for $\bar{x}$ = 1.0, 1.5, …, 6.0.]
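The table above can be rebuilt by enumerating all 36 equally likely samples of size 2. The sketch below uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)
samples = list(product(faces, repeat=2))   # all 36 ordered samples of size 2

# Count how often each sample mean occurs, then convert counts to probabilities.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
distribution = {x_bar: Fraction(count, len(samples))
                for x_bar, count in sorted(counts.items())}

mean_of_x_bar = sum(x_bar * p for x_bar, p in distribution.items())
variance_of_x_bar = sum((x_bar - mean_of_x_bar) ** 2 * p
                        for x_bar, p in distribution.items())

for x_bar, p in distribution.items():
    print(float(x_bar), p)   # fractions print in lowest terms, e.g. 6/36 as 1/6
```

The mean of the sample means equals the die's mean (7/2), while their variance (35/24) is half the die's variance (35/12), matching the n = 2 sample size.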
Compare the distribution of X with the sampling distribution of $\bar{X}$.
As well, note that:
The larger the sample size, the more closely the sampling distribution of $\bar{X}$ will resemble a normal distribution.
If the population is nonnormal, then $\bar{X}$ is approximately normal only for larger values of n.
In many practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of $\bar{X}$.
2.
3. If X is normal, $\bar{X}$ is normal. If X is nonnormal, $\bar{X}$ is approximately normal for sufficiently large sample sizes.
Note: the definition of "sufficiently large" depends on the extent of nonnormality of X (e.g. heavily skewed; multimodal).
If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?
There is about a 75% chance that a single bottle of soda contains more than 32 oz.
If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?
Things we know:
1) X is normally distributed, therefore so will $\bar{X}$.
2) $\mu_{\bar{x}} = \mu = 32.2$ oz.
3)
There is about a 91% chance the mean of the four bottles will exceed 32 oz.
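The two bottle probabilities can be reproduced as follows. The population standard deviation is not stated in the surviving text; $\sigma = 0.3$ oz is an assumption here, chosen because it is consistent with both quoted answers (about 75% and about 91%).

```python
from math import sqrt
from statistics import NormalDist

mu = 32.2      # mean fill in ounces (from the text)
sigma = 0.3    # ASSUMED standard deviation in ounces (not stated in the text)
n = 4          # bottles in a carton

# One bottle: X ~ Normal(mu, sigma).
single_bottle = NormalDist(mu, sigma)
p_single_over_32 = 1 - single_bottle.cdf(32)

# Mean of four bottles: same mean, standard error sigma / sqrt(n).
carton_mean = NormalDist(mu, sigma / sqrt(n))
p_mean_over_32 = 1 - carton_mean.cdf(32)

print(round(p_single_over_32, 2), round(p_mean_over_32, 2))
```

The sample mean is less variable than a single observation, so more of its distribution lies above 32 oz.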
[Figure: two normal curves side by side. Left: what is the probability that one bottle will contain more than 32 ounces? Right: what is the probability that the mean of four bottles will exceed 32 oz?]
independent random samples be drawn from each of two normal populations.
If this condition is met, then the sampling distribution of the difference between the two sample means, i.e. $\bar{X}_1 - \bar{X}_2$, will be normally distributed.
(Note: if the two populations are not both normally distributed, but the sample sizes are "large" (> 30), the distribution of $\bar{X}_1 - \bar{X}_2$ is approximately normal.)
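Sampling Distribution: Difference of two means
The expected value and standard deviation of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ are given by:
mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
(also called the standard error of the difference between two means)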
Estimation
There are two types of inference: estimation and hypothesis testing; estimation is introduced first.
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
E.g., the sample mean ($\bar{x}$) is employed to estimate the population mean ($\mu$).
There are two types of estimators:
Point Estimator
Interval Estimator
[Diagram: point estimate vs. interval estimate.]
An alternative statement is:
The mean income is between 380 and 420 $/week.
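The probability that the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ contains the population mean is $1 - \alpha$. This is a confidence interval estimator for $\mu$.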
Table 10.1
Example 10.1
A computer company samples demand during lead time over 25 time periods:
235 374 309 499 253
421 361 514 462 369
394 439 348 344 330
261 374 302 466 535
386 316 296 332 334
It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels.
In order to use our confidence interval estimator, we need the following pieces of data:
$\bar{x} = 370.16$ (calculated from the data)
$z_{\alpha/2} = 1.96$
$\sigma = 75$ (given)
$n = 25$ (given)
therefore:
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\,\frac{75}{\sqrt{25}} = 370.16 \pm 29.40$
The lower and upper confidence limits are 340.76 and 399.56.
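The interval of Example 10.1 can be recomputed from the 25 observations. This sketch uses the standard library only and assumes, as the example does, that $\sigma = 75$ is known; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

demand = [235, 374, 309, 499, 253,
          421, 361, 514, 462, 369,
          394, 439, 348, 344, 330,
          261, 374, 302, 466, 535,
          386, 316, 296, 332, 334]

sigma = 75          # known population standard deviation (given in the example)
confidence = 0.95

x_bar = mean(demand)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}, about 1.96
margin = z * sigma / sqrt(len(demand))

lower, upper = x_bar - margin, x_bar + margin
print(round(x_bar, 2), round(lower, 2), round(upper, 2))
```

Raising `confidence` widens the interval, while a larger sample narrows it, which is the trade-off explored when selecting a sample size.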
The estimate for the mean demand during lead time lies between 340.76 and 399.56; we can use this as input in developing an inventory policy.
That is, we estimated that the mean demand during lead time falls between 340.76 and 399.56, and this type of estimator is correct 95% of the time. That also means that 5% of the time the estimator will be incorrect.
Incidentally, the media often refer to the 95% figure as "19 times out of 20," which emphasizes the long-run aspect of the confidence level.
Contrast this with: a 95% confidence interval estimate of starting salaries between $42,000 and $45,000.
The second estimate is much narrower, providing accounting students more precise information about starting salaries.
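Suppose we want to estimate the mean demand "to within 5 units"; i.e. we want the interval estimate to be:
$\bar{x} \pm 5$
Since the interval estimator is:
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
It follows that
$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 5$
Solve for n to get the requisite sample size!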
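Selecting the Sample Size
Solving the equation
$n = \left(\frac{z_{\alpha/2}\,\sigma}{5}\right)^2 = \left(\frac{(1.96)(75)}{5}\right)^2 = 864.36$
that is, to produce a 95% confidence interval estimate of the mean (±5 units), we need to sample 865 lead time periods (vs. the 25 data points we have currently).
In general, estimating the mean to within $\pm W$ requires a sample size of at least this large:
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$
How many trees need to be sampled?
Confidence level = 99%, therefore $\alpha = .01$
We want $\bar{x} \pm 1$, hence W = 1.
We are given that $\sigma = 6$.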
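Example 10.2
We compute
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{(2.575)(6)}{1}\right)^2 = 238.7 \approx 239$
That is, we will need to sample at least 239 trees to have a 99% confidence interval of $\bar{x} \pm 1$.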
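Both sample-size calculations use the same formula, rounded up to the next whole observation. A minimal sketch, with an illustrative function name and the inputs of the two examples:

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, half_width, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) <= half_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / half_width) ** 2)

# Lead-time demand: sigma = 75, estimate the mean to within 5 units at 95%.
n_demand = required_sample_size(sigma=75, half_width=5, confidence=0.95)

# Trees: sigma = 6, estimate the mean to within 1 unit at 99%.
n_trees = required_sample_size(sigma=6, half_width=1, confidence=0.99)

print(n_demand, n_trees)
```

Always round up: rounding 864.36 down to 864 would leave the interval slightly wider than the target.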
The alternative hypothesis or research hypothesis is
H1: The defendant is guilty
The jury does not know which hypothesis is true. They must make a decision on the basis of evidence presented.
Nonstatistical Hypothesis Testing
There are two possible errors.
A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person.
A Type II error occurs when we don't reject a false null hypothesis. That occurs when a guilty defendant is acquitted.
The two probabilities are inversely related. Decreasing one increases the other.
H0: the "null" hypothesis
H1: the "alternative" or "research" hypothesis
The null hypothesis (H0) will always state that the parameter equals the value specified in the alternative hypothesis (H1).
Concepts of Hypothesis Testing
Consider Example 10.1 (mean demand for computers during assembly lead time) again. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. We can rephrase this request into a test of the hypothesis:
H0: $\mu = 350$
Thus, our research hypothesis becomes:
H1: $\mu \ne 350$ (this is what we are interested in determining)
Conclude that there is enough evidence to support the alternative hypothesis
(also stated as: rejecting the null hypothesis in favor of the alternative)
Conclude that there is not enough evidence to support the alternative hypothesis
(also stated as: not rejecting the null hypothesis in favor of the alternative)
NOTE: we do not say that we accept the null hypothesis.
Once the null and alternative hypotheses are stated, the next step is to randomly sample the population and calculate a test statistic (in this example, the sample mean).
If the test statistic's value is inconsistent with the null hypothesis we reject the null hypothesis and infer that the alternative hypothesis is true.
For example, if we're trying to decide whether the mean is not equal to 350, a large value of $\bar{x}$ (say, 600) would provide enough evidence. If $\bar{x}$ is close to 350 (say, 355) we could not say that this provides a great deal of evidence to infer that the population mean is different than 350.
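Types of Errors
A Type I error occurs when we reject a true null hypothesis (i.e. Reject H0 when it is TRUE)
A Type II error occurs when we don't reject a false null hypothesis (i.e. Do NOT reject H0 when it is FALSE)
                   H0 is TRUE      H0 is FALSE
Reject H0          Type I error    (correct)
Do not reject H0   (correct)       Type II error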
Recap I
1) Two hypotheses: H0 & H1
2) ASSUME H0 is TRUE
3) GOAL: determine if there is enough evidence to infer that H1 is TRUE
4) Two possible decisions:
Reject H0 in favor of H1
NOT Reject H0 in favor of H1
5) Two possible types of errors:
Type I: reject a true H0 [P(Type I) = $\alpha$]
Type II: not reject a false H0 [P(Type II) = $\beta$]
A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. The accounts are approximately normally distributed with a standard deviation of $65.
Can we conclude that the new system will be cost effective?
We express this belief as our research hypothesis, that is:
H1: $\mu > 170$ (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: $\mu = 170$ (this specifies a single value for the parameter of interest)
Example 11.1
What we want to show:
H1: $\mu > 170$
H0: $\mu = 170$ (we'll assume this is true)
We know:
n = 400,
$\bar{x} = 178$, and
$\sigma = 65$
Hmm. What to do next?!
The rejection region approach (typically used when computing statistics manually), and
The p-value approach (which is generally used with a computer and statistical software).
We will explore both in turn.
$\bar{x}_L$ is the critical value of $\bar{x}$ to reject H0.
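All that's left to do is calculate $\bar{x}_L$ and compare it to the sample mean.
$\frac{\bar{x}_L - 170}{65/\sqrt{400}} = z_\alpha$
We can calculate this based on any level of significance ($\alpha$) we want; with $\alpha = .05$, $z_{.05} = 1.645$.
Solving, we compute $\bar{x}_L = 175.34$.
Since our sample mean (178) is greater than the critical value we calculated (175.34), we reject the null hypothesis in favor of H1, i.e. that $\mu > 170$ and that it is cost effective to install the new billing system.
[Figure: sampling distribution of $\bar{x}$ under H0: $\mu = 170$, with the critical value $\bar{x}_L = 175.34$ and the observed $\bar{x} = 178$ in the rejection region. Reject H0 in favor of H1: $\mu > 170$.]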
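Standardized Test Statistic
An easier method is to use the standardized test statistic:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46$
and compare its result to $z_\alpha$ (rejection region: $z > z_\alpha$).
Since z = 2.46 > 1.645 ($z_{.05}$), we reject H0 in favor of H1.
In the case of our department store example, what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. $\bar{x} = 178$), given that the null hypothesis (H0: $\mu = 170$) is true?
p-value $= P(\bar{X} > 178) = P(Z > 2.46) = .0069$
If the p-value is less than $\alpha$, we judge the p-value to be small enough to reject the null hypothesis.
If the p-value is greater than $\alpha$, we do not reject the null hypothesis.
Since p-value = .0069 < $\alpha$ = .05, we reject H0 in favor of H1.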
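The rejection region and p-value approaches for Example 11.1 can be run side by side. This is a sketch with illustrative variable names, using the figures of the example (n = 400, $\bar{x} = 178$, $\sigma = 65$, $\alpha = .05$).

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 170        # value of the mean under H0
x_bar = 178       # observed sample mean
sigma = 65        # known population standard deviation
n = 400
alpha = 0.05

standard_normal = NormalDist()
standard_error = sigma / sqrt(n)

# Rejection region approach: critical sample mean for a one-tail (upper) test.
z_alpha = standard_normal.inv_cdf(1 - alpha)
critical_mean = mu_0 + z_alpha * standard_error

# Standardized test statistic and p-value approach.
z = (x_bar - mu_0) / standard_error
p_value = 1 - standard_normal.cdf(z)

reject_h0 = p_value < alpha   # equivalent to z > z_alpha and x_bar > critical_mean
print(round(critical_mean, 2), round(z, 2), round(p_value, 4), reject_h0)
```

All three comparisons are the same decision expressed on different scales, so they can never disagree for a given $\alpha$.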
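H1: $\mu < 22$
The null hypothesis is
H0: $\mu = 22$
$\bar{x} = \frac{\sum x_i}{220} = \frac{4{,}759}{220} = 21.63$
and
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.63 - 22}{6/\sqrt{220}} = -.91$
p-value = P(Z < −.91) = .5 − .3186 = .1814
There is not enough evidence to infer that the plan will be profitable.
Since $z = -.91 > -z_{.10} = -1.28$, we fail to reject H0 ($\mu \ge 22$) at a 10% level of significance.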
They then sample 100 customers at random and recalculate a monthly phone bill based on competitors' rates.
What we want to show is whether or not:
H1: $\mu \ne 17.09$. We do this by assuming that:
H0: $\mu = 17.09$
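Example 11.2
The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.
[Figure: two-tail rejection region; reject when the statistic is "small" or when it is "large".]
That is, we set up a two-tail rejection region. The total area in the rejection region must sum to $\alpha$, so we divide this probability by 2.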
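At a 5% significance level (i.e. $\alpha = .05$), we have $\alpha/2 = .025$. Thus, $z_{.025} = 1.96$ and our rejection region is:
z < −1.96 or z > 1.96
[Figure: standard normal curve centered at 0 with rejection regions below $-z_{.025}$ and above $+z_{.025}$.]
Using our standardized test statistic:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
We find that:
z = 1.19
Since z = 1.19 is not greater than 1.96, nor less than −1.96, we cannot reject the null hypothesis in favor of H1. That is, "there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor."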
PLOT POWER CURVE
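[Diagram: Sample (Statistic) -> Inference -> Population (Parameter).]
We will develop techniques to estimate and test three population parameters:
Population Mean $\mu$
Population Variance $\sigma^2$
Population Proportion p
But how often do we know the actual population variance?
Instead, we use the Student t-statistic, given by:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
which is Student t distributed with $\nu = n - 1$ degrees of freedom. The confidence interval estimator of $\mu$ is given by:
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$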
Experienced workers can process 500 packages/hour, thus if our conjecture is correct, we expect new workers to be able to process .90(500) = 450 packages per hour.
Given the data, is this the case?
Our objective is to describe the population of the numbers of packages processed in 1 hour by new workers, that is, we want to know whether the new workers' productivity is more than 90% of that of experienced workers. Thus we have:
H1: $\mu > 450$
Therefore we set our usual null hypothesis to:
H0: $\mu = 450$
Our test statistic is:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
With n = 50 data points, we have n − 1 = 49 degrees of freedom.
Our hypothesis under question is:
H1: $\mu > 450$
Our rejection region becomes:
$t > t_{\alpha,\nu} = t_{\alpha,49}$
Thus we will reject the null hypothesis in favor of the alternative if our calculated test statistic falls in this region.
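Example 12.1 COMPUTE
From the data, we calculate $\bar{x} = 460.38$, s = 38.83 and thus:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89$
Since $t = 1.89 > t_{\alpha,49}$, we reject H0 in favor of H1, that is, there is sufficient evidence to conclude that the new workers are producing at more than 90% of the average of experienced workers.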
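The t statistic of Example 12.1 can be recomputed from the summary figures. The critical value below is an assumption: the significance level is not stated in the surviving text, so the sketch takes $\alpha = .05$ and uses the t-table entry for 50 degrees of freedom (1.676), the closest row to 49 in a typical table. Python's standard library has no t quantile function, so the value is entered by hand.

```python
from math import sqrt

mu_0 = 450        # packages per hour under H0 (90% of 500)
x_bar = 460.38    # sample mean from the data
s = 38.83         # sample standard deviation from the data
n = 50

degrees_of_freedom = n - 1
t_statistic = (x_bar - mu_0) / (s / sqrt(n))

# ASSUMPTION: alpha = .05 and a table value for about 50 degrees of freedom.
t_critical = 1.676

reject_h0 = t_statistic > t_critical   # one-tail (upper) rejection region
print(degrees_of_freedom, round(t_statistic, 2), reject_h0)
```

With statistical software available, the exact quantile for 49 degrees of freedom and a p-value would normally replace the table lookup.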
Can we estimate the return on investment for companies that won quality awards?
We are given a random sample of n = 83 such companies. We want to construct a 95% confidence interval for the mean return, i.e. what is:
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ ??
From the data, we calculate:
For this term
and so:
To check this requirement, draw a histogram of the data and see how "bell shaped" the resulting figure is. If a histogram is extremely skewed (say in the case of an exponential distribution), that could be considered "extremely nonnormal" and hence t-statistics would not be valid in this case.
The sample variance ($s^2$) is an unbiased, consistent and efficient point estimator for $\sigma^2$. Moreover, the statistic $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ has a chi-squared distribution, with n − 1 degrees of freedom.
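Testing & Estimating Population Variance
Combining this statistic:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
With the probability statement:
$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$
Yields the confidence interval estimator for $\sigma^2$:
LCL $= \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$, UCL $= \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$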
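Consider a container filling machine. Management wants a machine to fill 1 liter (1,000 cc's) so that the variance of the fills is less than 1 cc². A random sample of n = 25 1-liter fills was taken. Does the machine perform as it should at the 5% significance level?
Since our alternative hypothesis is phrased as:
H1: $\sigma^2 < 1$
We will reject H0 in favor of H1 if our test statistic falls into this rejection region:
$\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,24}$
We compute the sample variance to be: $s^2 = .8088$
And thus our test statistic takes on this value:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(25-1)(.8088)}{1} = 19.41$
Compare the test statistic with the critical value.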
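The chi-squared test statistic for the filling machine can be computed directly. The critical value is an assumption in the sense that it does not survive in the text: 13.8484 is the usual chi-squared table entry for a lower-tail area of .05 with 24 degrees of freedom, entered by hand because the Python standard library has no chi-squared quantile function.

```python
n = 25              # number of 1-liter fills sampled
s_squared = 0.8088  # sample variance computed from the data
sigma_squared_0 = 1.0   # variance under H0 (cc squared)

degrees_of_freedom = n - 1
chi_squared = degrees_of_freedom * s_squared / sigma_squared_0

# ASSUMED table lookup: chi-squared value with area .95 to its right, 24 d.f.
chi_squared_critical = 13.8484

# Lower-tail test: reject H0 only if the statistic falls BELOW the critical value.
reject_h0 = chi_squared < chi_squared_critical
print(round(chi_squared, 2), reject_h0)
```

The statistic (19.41) is not below the critical value, so H0 is not rejected, which agrees with the conclusion drawn for this example.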
Example 12.4
As we saw, we cannot reject the null hypothesis in favor of the alternative. That is, there is not enough evidence to infer that the claim is true.
Note: the result does not say that the variance is greater than 1, rather it merely states that we are unable to show that the variance is less than 1.
We could estimate (at 99% confidence say) the variance of the fills.
In order to create a confidence interval estimate of the variance, we need these formulae:
LCL $= \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$, UCL $= \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
we know $(n-1)s^2 = 19.41$ from our previous calculation, and we have from Table 5 in Appendix B:
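Because we are comparing two population means, we use the statistic:
$\bar{x}_1 - \bar{x}_2$
2. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$
3. The variance of $\bar{x}_1 - \bar{x}_2$ is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
and the standard error is: $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for $\mu_1 - \mu_2$
??
Instead we use a t-statistic. We consider two cases for the unknown population variances: when we believe they are equal and conversely when they are not equal.
Since the population variances are unknown, we can't know for certain whether they're equal, but we can examine the sample variances and informally judge their relative values to determine whether we can assume that the population variances are equal or not.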
2) and use it here:
degrees of freedom
degrees of freedom
Likewise, the confidence interval estimator is:
Two methods are being tested for assembling office chairs. Assembly times are recorded (25 times for each method). At a 5% significance level, do the assembly times for the two methods differ?
That is, H1: $(\mu_1 - \mu_2) \ne 0$
Hence, our null hypothesis becomes: H0: $(\mu_1 - \mu_2) = 0$
Reminder: This is a two-tailed test.
The assembly times for each of the two methods are recorded and preliminary data is prepared.
Recall, we are doing a two-tailed test, hence the rejection region will be:
$t < -t_{\alpha/2,\nu}$ or $t > t_{\alpha/2,\nu}$
The number of degrees of freedom is:
$\nu = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$
Hence our critical values of t (and our rejection region) become:
$t < -t_{.025,48}$ or $t > t_{.025,48}$
In order to calculate our t-statistic, we need to first calculate the "pooled variance estimator," followed by the t-statistic.
Since our calculated t-statistic does not fall into the rejection region, we cannot reject H0 in favor of H1, that is, there is not sufficient evidence to infer that the mean assembly times differ.
Excel, of course, also provides us with the information: compare the t statistic with the critical value, or look at the p-value.
That is, we estimate the mean difference between the two assembly methods to be between −.36 and .96 minutes. Note: zero is included in this confidence interval.
Matched Pairs Experiment
Previously when comparing two populations, we examined independent samples.
If, however, an observation in one sample is matched with an observation in a second sample, this is called a matched pairs experiment.
To help understand this concept, let's consider Example 13.4.
When looking at two population variances, we consider the ratio of the variances, i.e. the parameter of interest to us is:
$\frac{\sigma_1^2}{\sigma_2^2}$
The sampling statistic $F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ is F distributed with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
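Inference about the ratio of two variances
Our null hypothesis is always:
H0: $\sigma_1^2/\sigma_2^2 = 1$
(i.e. the variances of the two populations will be equal, hence their ratio will be one)
Therefore, our statistic simplifies to:
$F = \frac{s_1^2}{s_2^2}$
df1 = $n_1 - 1$
df2 = $n_2 - 1$
In Example 13.1, we looked at the variances of the samples of people who consumed high fiber cereal and those who did not and assumed they were not equal. We can use the ideas just developed to test if this is in fact the case.
We want to show: H1: $\sigma_1^2/\sigma_2^2 \ne 1$
(the variances are not equal to each other)
Hence we have our null hypothesis: H0: $\sigma_1^2/\sigma_2^2 = 1$
Since our research hypothesis is: H1: $\sigma_1^2/\sigma_2^2 \ne 1$
We are doing a two-tailed test, and our rejection region is:
$F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < F_{1-\alpha/2,\nu_1,\nu_2}$
Our test statistic is:
$F = \frac{s_1^2}{s_2^2}$
[Figure: F distribution with critical values .58 and 1.61 marking the two-tailed rejection region.]
Hence there is sufficient evidence to reject the null hypothesis in favor of the alternative; that is, there is a difference in the variance between the two populations.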
Example 13.6 INTERPRET
We may need to work with the Excel output before drawing conclusions.