Linear Discriminant Analysis Summary
Linear discriminant analysis - Wikipedia, the free encyclopedia
Linear discriminant analysis
From Wikipedia, the free encyclopedia
Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements.[1][2] However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable (i.e. the class label).[3] Logistic regression and probit regression are more similar to LDA, as they also explain a categorical variable by the values of continuous independent variables. These other methods are preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.
LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data.[4] LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made.
LDA works when the measurements made on independent variables for each observation are continuous quantities. When dealing with categorical independent variables, the equivalent technique is discriminant correspondence analysis.[5][6]
Contents
1 LDA for two classes
2 Canonical discriminant analysis for k classes
3 Fisher's linear discriminant
4 Multiclass LDA
5 Practical use
6 Applications
6.1 Bankruptcy prediction
6.2 Face recognition
6.3 Marketing
6.4 Biomedical studies
7 See also
8 References
9 Further reading
10 External links
http://en.wikipedia.org/wiki/Linear_discriminant_analysis
28/01/2015
LDA for two classes
Consider a set of observations $\vec{x}$ (also called features, attributes, variables or measurements) for each sample of an object or event with known class $y$. This set of samples is called the training set. The classification problem is then to find a good predictor for the class $y$ of any sample of the same distribution (not necessarily from the training set) given only an observation $\vec{x}$.[7]:338
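LDA approaches the problem by assuming that the conditional probability density functions $p(\vec{x} \mid y=0)$ and $p(\vec{x} \mid y=1)$ are both normally distributed with mean and covariance parameters $(\vec{\mu}_0, \Sigma_0)$ and $(\vec{\mu}_1, \Sigma_1)$, respectively. Under this assumption, the Bayes optimal solution is to predict points as being from the second class if the log of the likelihood ratio $p(\vec{x} \mid y=0) / p(\vec{x} \mid y=1)$ is below some threshold. Writing out the two Gaussian densities and multiplying by $-2$, which reverses the inequality, gives the equivalent condition for some threshold $T$:
$$(\vec{x}-\vec{\mu}_0)^\top \Sigma_0^{-1} (\vec{x}-\vec{\mu}_0) + \ln|\Sigma_0| - (\vec{x}-\vec{\mu}_1)^\top \Sigma_1^{-1} (\vec{x}-\vec{\mu}_1) - \ln|\Sigma_1| > T$$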
Without any further assumptions, the resulting classifier is referred to as QDA (quadratic discriminant analysis).
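As a rough illustration of the quadratic rule, the sketch below evaluates twice the log of $p(\vec{x} \mid y=1) / p(\vec{x} \mid y=0)$ for two Gaussian classes, so that a large score points to the second class. The function name `qda_score`, the example means and covariances, and the threshold `T = 0` are assumptions chosen for illustration; they do not come from the article.

```python
import numpy as np

def qda_score(x, mu0, cov0, mu1, cov1):
    """Twice the log of p(x|y=1)/p(x|y=0) for Gaussian class densities."""
    d0 = x - mu0
    d1 = x - mu1
    # Squared Mahalanobis distances to each class mean.
    m0 = d0 @ np.linalg.solve(cov0, d0)
    m1 = d1 @ np.linalg.solve(cov1, d1)
    # slogdet returns (sign, log|det|); only the log-determinant is needed.
    logdet0 = np.linalg.slogdet(cov0)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]
    return m0 + logdet0 - m1 - logdet1

# Hypothetical class parameters with unequal covariances.
mu0, cov0 = np.array([0.0, 0.0]), np.eye(2)
mu1, cov1 = np.array([3.0, 0.0]), np.array([[2.0, 0.0], [0.0, 0.5]])
T = 0.0  # illustrative threshold

near_class0 = qda_score(np.array([0.1, 0.0]), mu0, cov0, mu1, cov1)
near_class1 = qda_score(np.array([2.9, 0.0]), mu0, cov0, mu1, cov1)
```

A point is assigned to the second class when its score exceeds `T`. Because the covariances differ, the boundary `score == T` is a quadratic surface rather than a hyperplane.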
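LDA instead makes the additional simplifying homoscedasticity assumption (i.e. that the class covariances are identical, so $\Sigma_0 = \Sigma_1 = \Sigma$) and that the covariances have full rank. In this case, several terms cancel:
$$\vec{x}^\top \Sigma_0^{-1} \vec{x} = \vec{x}^\top \Sigma_1^{-1} \vec{x}$$
$$\vec{x}^\top \Sigma_i^{-1} \vec{\mu}_i = \vec{\mu}_i^\top \Sigma_i^{-1} \vec{x} \quad \text{because } \Sigma_i \text{ is Hermitian}$$
and the above decision criterion becomes a threshold on the dot product
$$\vec{w} \cdot \vec{x} > c$$
for some threshold constant $c$, where
$$\vec{w} = \Sigma^{-1} (\vec{\mu}_1 - \vec{\mu}_0)$$
$$c = \frac{1}{2} \left( T - \vec{\mu}_0^\top \Sigma^{-1} \vec{\mu}_0 + \vec{\mu}_1^\top \Sigma^{-1} \vec{\mu}_1 \right)$$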
This means that the criterion of an input $\vec{x}$ being in a class $y$ is purely a function of this linear combination of the known observations.
It is often useful to see this conclusion in geometrical terms: the criterion of an input $\vec{x}$ being in a class $y$ is purely a function of the projection of the multidimensional-space point $\vec{x}$ onto the vector $\vec{w}$ (thus, we only consider its direction). In other words, the observation belongs to $y$ if the corresponding $\vec{x}$ is located on a certain side of a hyperplane perpendicular to $\vec{w}$. The location of the plane is defined by the threshold $c$.
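The linear rule can be sketched in a few lines, assuming the standard closed forms $\vec{w} = \Sigma^{-1}(\vec{\mu}_1 - \vec{\mu}_0)$ and $c = \frac{1}{2}(T - \vec{\mu}_0^\top \Sigma^{-1} \vec{\mu}_0 + \vec{\mu}_1^\top \Sigma^{-1} \vec{\mu}_1)$ for a shared covariance $\Sigma$. The function name, the example parameters and the default `T = 0` are hypothetical.

```python
import numpy as np

def lda_direction_and_threshold(mu0, mu1, cov, T=0.0):
    """Return (w, c) such that 'w @ x > c' predicts the second class."""
    w = np.linalg.solve(cov, mu1 - mu0)
    c = 0.5 * (T
               - mu0 @ np.linalg.solve(cov, mu0)
               + mu1 @ np.linalg.solve(cov, mu1))
    return w, c

# Hypothetical class means and shared covariance.
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])

w, c = lda_direction_and_threshold(mu0, mu1, cov)
midpoint = 0.5 * (mu0 + mu1)
```

With `T = 0` the hyperplane `w @ x == c` passes through the midpoint of the two class means, which matches the geometric picture: only the projection of a point onto `w` decides its class.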
Canonical discriminant analysis for k classes
Canonical discriminant analysis (CDA) finds axes (k − 1 canonical coordinates, k being the number of classes) that best separate the categories. These linear functions are uncorrelated and define, in effect, an optimal (k − 1)-dimensional space through the n-dimensional cloud of data that best separates (the projections in that space of) the k groups. See "Multiclass LDA" below for details.
Fisher's linear discriminant
The terms Fisher's linear discriminant and LDA are often used interchangeably, although Fisher's original article[1] actually describes a slightly different discriminant, which does not make some of the assumptions of LDA such as normally distributed classes or equal class covariances.
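Suppose two classes of observations have means $\vec{\mu}_0, \vec{\mu}_1$ and covariances $\Sigma_0, \Sigma_1$. Then the linear combination of features $\vec{w} \cdot \vec{x}$ will have means $\vec{w} \cdot \vec{\mu}_i$ and variances $\vec{w}^\top \Sigma_i \vec{w}$ for $i = 0, 1$. Fisher defined the separation between these two distributions to be the ratio of the variance between the classes to the variance within the classes:
$$S = \frac{\sigma_{\text{between}}^2}{\sigma_{\text{within}}^2} = \frac{(\vec{w} \cdot \vec{\mu}_1 - \vec{w} \cdot \vec{\mu}_0)^2}{\vec{w}^\top \Sigma_1 \vec{w} + \vec{w}^\top \Sigma_0 \vec{w}} = \frac{(\vec{w} \cdot (\vec{\mu}_1 - \vec{\mu}_0))^2}{\vec{w}^\top (\Sigma_0 + \Sigma_1) \vec{w}}$$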
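This measure is, in some sense, a measure of the signal-to-noise ratio for the class labelling. It can be shown that the maximum separation occurs when
$$\vec{w} \propto (\Sigma_0 + \Sigma_1)^{-1} (\vec{\mu}_1 - \vec{\mu}_0)$$
When the assumptions of LDA are satisfied, the above equation is equivalent to LDA.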
Be sure to note that the vector $\vec{w}$ is the normal to the discriminant hyperplane. As an example, in a two-dimensional problem, the line that best divides the two groups is perpendicular to $\vec{w}$.
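A small numerical check can make the maximisation claim concrete. The sketch below computes Fisher's separation for the direction $(\Sigma_0 + \Sigma_1)^{-1}(\vec{\mu}_1 - \vec{\mu}_0)$ and compares it with randomly drawn directions. The function name and the example means and covariances are illustrative assumptions.

```python
import numpy as np

def fisher_separation(w, mu0, mu1, cov0, cov1):
    """Ratio of between-class to within-class variance along direction w."""
    between = (w @ (mu1 - mu0)) ** 2
    within = w @ (cov0 + cov1) @ w
    return between / within

# Hypothetical class statistics with unequal covariances.
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
cov0 = np.array([[1.0, 0.2], [0.2, 0.5]])
cov1 = np.array([[0.8, -0.1], [-0.1, 1.5]])

# Direction claimed to maximise the separation.
w_fisher = np.linalg.solve(cov0 + cov1, mu1 - mu0)
best = fisher_separation(w_fisher, mu0, mu1, cov0, cov1)

# Separation achieved by 1000 random directions, for comparison.
rng = np.random.default_rng(0)
others = [fisher_separation(rng.normal(size=2), mu0, mu1, cov0, cov1)
          for _ in range(1000)]
```

None of the random directions should beat `best`, and rescaling `w_fisher` leaves the separation unchanged, which is why only the direction of the vector matters.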
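Generally, the data points to be discriminated are projected onto $\vec{w}$; then the threshold that best separates the data is chosen from analysis of the one-dimensional distribution. There is no general rule for the threshold. However, if projections of points from both classes exhibit approximately the same distributions, a good choice would be the hyperplane between projections of the two means, $\vec{w} \cdot \vec{\mu}_0$ and $\vec{w} \cdot \vec{\mu}_1$. In this case the parameter $c$ in threshold condition $\vec{w} \cdot \vec{x} > c$ can be found explicitly:
$$c = \vec{w} \cdot \frac{1}{2} (\vec{\mu}_0 + \vec{\mu}_1)$$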
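The projection-and-threshold procedure can be sketched on synthetic data. The two Gaussian clouds, the sample sizes and the random seed below are assumptions made purely for illustration; the threshold is placed halfway between the projected class means, as the text suggests for similarly distributed projections.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic classes with similar spread (hypothetical data).
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X1 = rng.normal(loc=[4.0, 0.0], scale=1.0, size=(200, 2))

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
cov0 = np.cov(X0, rowvar=False)
cov1 = np.cov(X1, rowvar=False)

# Fisher direction and the midpoint threshold between projected means.
w = np.linalg.solve(cov0 + cov1, mu1 - mu0)
c = w @ (0.5 * (mu0 + mu1))

# Project every point onto w and compare with c.
pred0 = X0 @ w > c   # True means "assigned to class 1"
pred1 = X1 @ w > c
accuracy = (np.sum(~pred0) + np.sum(pred1)) / (len(X0) + len(X1))
```

When the projected distributions differ noticeably in spread, the midpoint is no longer a natural choice and the threshold would have to be picked from the one-dimensional projections themselves.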
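Multiclass LDA
In the case where there are more than two classes, the analysis used in the derivation of the Fisher discriminant can be extended to find a subspace which appears to contain all of the class variability. This generalization is due to C. R. Rao.[8] Suppose that each of $C$ classes has a mean $\vec{\mu}_i$ and the same covariance $\Sigma$. Then the scatter between class variability may be defined by the sample covariance of the class means
$$\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\vec{\mu}_i - \vec{\mu})(\vec{\mu}_i - \vec{\mu})^\top$$
where $\vec{\mu}$ is the mean of the class means. The class separation in a direction $\vec{w}$ in this case will be given by
$$S = \frac{\vec{w}^\top \Sigma_b \vec{w}}{\vec{w}^\top \Sigma \vec{w}}$$
This means that when $\vec{w}$ is an eigenvector of $\Sigma^{-1} \Sigma_b$ the separation will be equal to the corresponding eigenvalue.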
If $\Sigma^{-1} \Sigma_b$ is diagonalizable, the variability between features will be contained in the subspace spanned by the eigenvectors corresponding to the C − 1 largest eigenvalues (since $\Sigma_b$ is of rank C − 1 at most). These eigenvectors are primarily used in feature reduction, as in PCA. The eigenvectors corresponding to the smaller eigenvalues will tend to be very sensitive to the exact choice of training data, and it is often necessary to use regularisation as described in the next section.
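The eigenvector construction can be sketched as follows, assuming the between-class scatter is taken as the covariance of the class means and that the shared covariance is invertible. The function name and the three example class means are hypothetical.

```python
import numpy as np

def multiclass_lda_directions(class_means, cov):
    """Eigenpairs of inv(cov) @ Sigma_b, sorted by decreasing eigenvalue."""
    means = np.asarray(class_means, dtype=float)
    centered = means - means.mean(axis=0)
    sigma_b = centered.T @ centered / len(means)  # covariance of class means
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(cov, sigma_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order], sigma_b

# Three hypothetical classes in three dimensions with a shared covariance.
class_means = [[0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]
cov = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.0, 0.1],
                [0.0, 0.1, 1.0]])

eigvals, eigvecs, sigma_b = multiclass_lda_directions(class_means, cov)
w = eigvecs[:, 0]                                # leading direction
separation = (w @ sigma_b @ w) / (w @ cov @ w)   # equals eigvals[0]
```

With three classes at most two eigenvalues are non-zero, so projecting onto the first two columns of `eigvecs` keeps all of the between-class variability.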
If classification is required, instead of dimension reduction, there are a number of alternative techniques available. For instance, the classes may be partitioned, and a standard Fisher discriminant or LDA used to classify each partition. A common example of this is "one against the rest" where the points from one class are put in one group, and everything else in the other, and then LDA applied. This will result in C classifiers, whose results are combined. Another common method is pairwise classification, where a new classifier is created for each pair of classes (giving C(C − 1)/2 classifiers in total), with the individual classifiers combined to produce a final classification.
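A minimal "one against the rest" sketch is shown below. The article does not say how the C results are combined; picking the class with the largest discriminant score is one simple choice assumed here. The helper names, the synthetic three-class data and the seed are also illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def fit_fisher(X_pos, X_neg):
    """Fisher direction and midpoint threshold separating X_pos from X_neg."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    s = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
    w = np.linalg.solve(s, mu_p - mu_n)
    return w, w @ (0.5 * (mu_p + mu_n))

def one_vs_rest_predict(X, classifiers):
    """Assumed combination rule: the class with the largest score w.x - c wins."""
    scores = np.column_stack([X @ w - c for w, c in classifiers])
    return scores.argmax(axis=1)

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(100, 2)) for m in centers])
y = np.repeat(np.arange(3), 100)

# One classifier per class: that class against everything else.
classifiers = [fit_fisher(X[y == k], X[y != k]) for k in range(3)]
accuracy = np.mean(one_vs_rest_predict(X, classifiers) == y)

# Pairwise classification would need one classifier per pair of classes.
pairs = list(combinations(range(3), 2))
```

For C = 3 both schemes happen to need three classifiers; for larger C the pairwise count C(C − 1)/2 grows much faster than C.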
Practical use
In practice, the class means and covariances are not known. They can, however, be estimated from the training set. Either the maximum likelihood estimate or the maximum a posteriori estimate may be used in place of the exact value in the above equations. Although the estimates of the covariance may be considered optimal in some sense, this does not mean that the resulting discriminant obtained by substituting these values is optimal in any sense, even if the assumption of normally distributed classes is correct.
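As a sketch of the estimation step, the code below computes maximum likelihood estimates of the class means and of a single pooled covariance (the shared-covariance setting of LDA) from a labelled training set. The function name and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def estimate_lda_parameters(X, y):
    """Maximum likelihood class means and pooled (shared) covariance."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Subtract each sample's own class mean, then average the outer products.
    residuals = X - means[np.searchsorted(classes, y)]
    pooled_cov = residuals.T @ residuals / len(X)  # ML estimate divides by N
    return classes, means, pooled_cov

# Hypothetical training set: two classes with identity covariance.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=[0.0, 0.0], size=(500, 2)),
               rng.normal(loc=[3.0, 1.0], size=(500, 2))])
y = np.repeat([0, 1], 500)

classes, means, pooled_cov = estimate_lda_parameters(X, y)
```

Dividing by N gives the maximum likelihood estimate; dividing by N minus the number of classes would give the usual unbiased pooled estimate instead.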
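Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample exceeds the number of samples in each class.[4] In this case, the covariance estimates do not have full rank, and so cannot be inverted. There are a number of ways to deal with this. One is to use a pseudo-inverse instead of the usual matrix inverse in the above formulae. However, better numeric stability may be achieved by first projecting the problem onto the subspace spanned by $\Sigma_b$.[9] Another strategy to deal with small sample size is to use a shrinkage estimator of the covariance matrix, which can be expressed mathematically as
$$\Sigma_\lambda = (1 - \lambda) \Sigma + \lambda I$$
where $\Sigma$ is the estimated covariance, $I$ is the identity matrix, and $\lambda$ is the shrinkage intensity or regularisation parameter. This leads to the framework of regularized discriminant analysis[10] or shrinkage discriminant analysis.[11]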
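Both remedies can be sketched with NumPy, assuming the common shrinkage form that blends the covariance estimate with the identity matrix, $(1 - \lambda)\Sigma + \lambda I$. The data shape (5 samples of 10 measurements), the function name `shrink` and the value `lam = 0.1` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 10))        # 5 samples, 10 measurements each
cov = np.cov(X, rowvar=False)       # 10 x 10 estimate, rank at most 4

def shrink(cov, lam):
    """Blend the covariance estimate with the identity matrix."""
    return (1.0 - lam) * cov + lam * np.eye(cov.shape[0])

rank = np.linalg.matrix_rank(cov)

# Option 1: the Moore-Penrose pseudo-inverse exists despite rank deficiency.
cov_pinv = np.linalg.pinv(cov)

# Option 2: shrinkage lifts every eigenvalue to at least lam, so the
# ordinary inverse exists.
cov_shrunk = shrink(cov, lam=0.1)
shrunk_inv = np.linalg.inv(cov_shrunk)
```

In practice the shrinkage intensity is usually tuned, for example by cross-validation, rather than fixed in advance.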
Applications
In addition to the examples given below, LDA is applied in positioning and product management.
Bankruptcy prediction
In bankruptcy prediction based on accounting ratios and other financial variables, linear discriminant analysis was the first statistical method applied to systematically explain which firms entered bankruptcy vs. survived. Despite limitations including known nonconformance of accounting ratios to the normal distribution assumptions of LDA, Edward Altman's 1968 model is still a leading model in practical applications.
Face recognition
In computerised face recognition, each face is represented by a large number of pixel values. Linear discriminant analysis is primarily used here to reduce the number of features to a more manageable number before classification. Each of the new dimensions is a linear combination of pixel values, which form a template. The linear combinations obtained using Fisher's linear discriminant are called Fisher faces, while those obtained using the related principal component analysis are called eigenfaces.
Marketing
In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. Logistic regression or other methods are now more commonly used. The use of discriminant analysis in marketing can be described by the following steps:
1. Formulate the problem and gather data. Identify the salient attributes consumers use to evaluate products in this category. Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes. The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product from one to five (or 1 to 7, or 1 to 10) on a range of attributes chosen by the researcher. Anywhere from five to twenty attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is codified and input into a statistical program such as R, SPSS or SAS. (This step is the same as in factor analysis.)
2. Estimate the discriminant function coefficients and determine the statistical significance and validity. Choose the appropriate discriminant analysis method. The direct method involves estimating the discriminant function so that all the predictors are assessed simultaneously. The stepwise method enters the predictors sequentially. The two-group method should be used when the dependent variable has two categories or states. The multiple discriminant method is used when the dependent variable has three or more categorical states. Use Wilks's Lambda to test for significance in SPSS or F stat in SAS. The most common method used to test validity is to split the sample into an estimation or analysis sample, and a validation or holdout sample. The estimation sample is used in constructing the discriminant function. The validation sample is used to construct a classification matrix which contains the number of correctly classified and incorrectly classified cases. The percentage of correctly classified cases is called the hit ratio.
3. Plot the results on a two-dimensional map, define the dimensions, and interpret the results. The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The distance between products indicates how different they are. The dimensions must be labelled by the researcher. This requires subjective judgement and is often very challenging. See perceptual mapping.
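The validation procedure in step 2 can be sketched as follows: split the sample into an estimation part and a holdout part, fit a two-group discriminant on the first, and tabulate a classification matrix and hit ratio on the second. The synthetic ratings data, the 200/100 split, the midpoint threshold and the seed are assumptions made for illustration, not survey data from the article.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical two-group data set (300 cases, 2 attributes).
X = np.vstack([rng.normal(loc=[0.0, 0.0], size=(150, 2)),
               rng.normal(loc=[3.0, 3.0], size=(150, 2))])
y = np.repeat([0, 1], 150)

# Split into an estimation sample and a holdout (validation) sample.
order = rng.permutation(len(X))
est, hold = order[:200], order[200:]

# Estimate the two-group discriminant on the estimation sample only.
X_est, y_est = X[est], y[est]
mu0 = X_est[y_est == 0].mean(axis=0)
mu1 = X_est[y_est == 1].mean(axis=0)
resid = np.vstack([X_est[y_est == 0] - mu0, X_est[y_est == 1] - mu1])
cov = resid.T @ resid / len(resid)
w = np.linalg.solve(cov, mu1 - mu0)
c = w @ (0.5 * (mu0 + mu1))

# Classification matrix on the holdout sample: rows actual, columns predicted.
pred = (X[hold] @ w > c).astype(int)
matrix = np.zeros((2, 2), dtype=int)
np.add.at(matrix, (y[hold], pred), 1)

# Hit ratio: share of holdout cases on the diagonal.
hit_ratio = np.trace(matrix) / matrix.sum()
```

Reporting the hit ratio on the holdout sample rather than the estimation sample avoids the optimistic bias of scoring the discriminant on the data used to build it.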
Biomedical studies
The main application of discriminant analysis in medicine is the assessment of the severity state of a patient and the prognosis of disease outcome. For example, during retrospective analysis, patients are divided into groups according to severity of disease: mild, moderate and severe form. Then results of clinical and laboratory analyses are studied in order to reveal variables which are statistically different in the studied groups. Using these variables, discriminant functions are built which help to objectively classify disease in a future patient into mild, moderate or severe form.
In biology, similar principles are used in order to classify and define groups of different biological objects, for example, to define phage types of Salmonella enteritidis based on Fourier transform infrared spectra,[12] or to detect the animal source of Escherichia coli by studying its virulence factors.[13]
See also
Data mining
Decision tree learning
Factor analysis
Kernel Fisher discriminant analysis
Logit (for logistic regression)
Multidimensional scaling
Multilinear subspace learning
Pattern recognition
Perceptron
Preference regression
Quadratic classifier
References
1. ^ a b Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic Problems". Annals of Eugenics 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x (https://dx.doi.org/10.1111%2Fj.14691809.1936.tb02137.x). hdl:2440/15227 (http://hdl.handle.net/2440%2F15227).
2. ^ McLachlan, G. J. (2004). Discriminant Analysis and Statistical Pattern Recognition. Wiley-Interscience. ISBN 0471691151. MR 1190469 (https://www.ams.org/mathscinetgetitem?mr=1190469).
3. ^ Wetcher-Hendricks, Debra. Analyzing Quantitative Data: An Introduction for Social Researchers. p. 288.
4. ^ a b Martinez, A. M.; Kak, A. C. (2001). "PCA versus LDA" (http://www.ece.osu.edu/~aleix/pami01.pdf). IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2): 228–233. doi:10.1109/34.908974 (https://dx.doi.org/10.1109%2F34.908974).
5. ^ Abdi, H. (2007). "Discriminant correspondence analysis" (http://www.utdallas.edu/~herve/AbdiDCA2007pretty.pdf). In: N. J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 270–275.
6. ^ Perriere, G.; Thioulouse, J. (2003). "Use of Correspondence Discriminant Analysis to predict the subcellular location of bacterial proteins". Computer Methods and Programs in Biomedicine 70: 99–105.
7. ^ Venables, W. N.; Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.). Springer-Verlag. ISBN 0387954570.
8. ^ Rao, C. R. (1948). "The utilization of multiple measurements in problems of biological classification" (http://www.jstor.org/stable/2983775). Journal of the Royal Statistical Society, Series B 10 (2): 159–203.
9. ^ Yu, H.; Yang, J. (2001). "A direct LDA algorithm for high-dimensional data with application to face recognition". Pattern Recognition 34 (10): 2067–2069.
10. ^ Friedman, J. H. (1989). "Regularized Discriminant Analysis" (http://www.slac.stanford.edu/cgiwrap/getdoc/slacpub4389.pdf). Journal of the American Statistical Association (American Statistical Association) 84 (405): 165–175. doi:10.2307/2289860 (https://dx.doi.org/10.2307%2F2289860). JSTOR 2289860 (https://www.jstor.org/stable/2289860). MR 0999675 (https://www.ams.org/mathscinetgetitem?mr=0999675).
11. ^ Ahdesmäki, M.; Strimmer, K. (2010). "Feature selection in omics prediction problems using cat scores and false non-discovery rate control" (http://projecteuclid.org/euclid.aoas/1273584465). Annals of Applied Statistics 4 (1): 503–519.
12. ^ Preisner O, Guiomar R, Machado J, Menezes JC, Lopes JA. Application of Fourier transform infrared spectroscopy and chemometrics for differentiation of Salmonella enterica serovar Enteritidis phage types. Appl Environ Microbiol. 2010; 76 (11): 3538–3544.
13. ^ David DE, Lynne AM, Han J, Foley SL. Evaluation of virulence factor profiling in the characterization of veterinary Escherichia coli isolates. Appl Environ Microbiol. 2010; 76 (22): 7509–7513.
Further reading
Duda, R. O.; Hart, P. E.; Stork, D. H. (2000). Pattern Classification (2nd ed.). Wiley-Interscience. ISBN 0471056693. MR 1802993 (https://www.ams.org/mathscinetgetitem?mr=1802993).
Hilbe, J. M. (2009). Logistic Regression Models. Chapman & Hall/CRC Press. ISBN 9781420075755.
Mika, S.; et al. (1999). "Fisher Discriminant Analysis with Kernels" (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.9904). IEEE Conference on Neural Networks for Signal Processing IX: 41–48. doi:10.1109/NNSP.1999.788121 (https://dx.doi.org/10.1109%2FNNSP.1999.788121).
External links
ALGLIB (http://www.alglib.net/dataanalysis/lineardiscriminantanalysis.php) contains an open source LDA implementation in C# / C++ / Pascal / VBA.
Psychometrica.de (http://www.psychometrica.de/lds.html): open source LDA implementation in Java.
LDA tutorial using MS Excel (http://people.revoledu.com/kardi/tutorial/LDA/index.html)
Biomedical statistics. Discriminant analysis (http://www.biomedicalstatistics.info/en/prognosis/discriminantanalysis.html)
Retrieved from "http://en.wikipedia.org/w/index.php?title=Linear_discriminant_analysis&oldid=644189195"
Categories: Multivariate statistics, Statistical classification, Classification algorithms
This page was last modified on 26 January 2015, at 02:00.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.