
Machine Learning – VI Semester (New Edition, January 2020)

UNIT I – Introduction to machine learning, scope and limitations, regression, probability, statistics and linear algebra for machine learning, convex optimization, data visualization, hypothesis function and testing, data distributions, data preprocessing, data augmentation, normalizing data sets, machine learning models, supervised and unsupervised learning.

UNIT II – Linearity vs non-linearity, activation functions like sigmoid, ReLU etc., weights and bias, loss function, gradient descent, multilayer network, backpropagation, weight initialization, training, testing, unstable gradient problem, auto encoders, batch normalization, dropout, L1 and L2 regularization, momentum, tuning hyper parameters.

UNIT III – Convolutional neural network, flattening, subsampling, padding, stride, convolution layer, pooling layer, loss layer, dense layer, 1×1 convolution, inception network, input channels, transfer learning, one shot learning, dimension reductions, implementation of CNN like TensorFlow, Keras etc.

UNIT IV – Recurrent neural network, long short-term memory, gated recurrent unit, translation, beam search and width, Bleu score, attention model, reinforcement learning, RL framework, MDP, Bellman equations, value iteration and policy iteration, actor-critic model, Q-learning, SARSA.

UNIT V – Support vector machines, Bayesian learning, application of machine learning in computer vision, speech processing, natural language processing etc., case study – ImageNet competition.
UNIT – I

INTRODUCTION TO MACHINE LEARNING, SCOPE AND LIMITATIONS, REGRESSION, PROBABILITY, STATISTICS AND LINEAR ALGEBRA FOR MACHINE LEARNING, CONVEX OPTIMIZATION, DATA VISUALIZATION

Q.1. What is machine learning?

Ans. Machine learning is a branch of science that deals with programming systems in such a way that they automatically learn and improve with experience. Here, learning means recognizing and understanding the input data and making wise decisions based on the supplied data. It is very difficult to cater to all the decisions based on all possible inputs. To tackle this problem, algorithms are developed that build knowledge from specific data and past experience with the principles of statistics, probability theory, logic, combinatorial optimization, search, reinforcement learning and control theory.

The developed algorithms form the basis of various applications such as –
(i) Vision processing
(ii) Language processing
(iii) Forecasting (e.g., stock market trends)
(iv) Pattern recognition
(v) Games
(vi) Data mining
(vii) Expert systems
(viii) Robotics

Q.2. Write the applications of machine learning.

Ans. The applications of machine learning include the following –
(i) Machine perception
(ii) Information retrieval
(iii) Affective computing
(iv) Natural language processing
(v) Recommender systems
(vi) Sequence mining

Q.3. What are machine learning tools? Explain.

Ans. Machine learning gives a set of tools that use computers to transform data into actionable information. Tools are a big part of machine learning: choosing the right tool can be as important as working with the best algorithm. Machine learning tools make applied machine learning faster and easier, and such tools can automate each step in the applied machine learning process, thereby shortening the time. The machine learning tools are as follows –

(i) Platforms – Platforms are used to complete a machine learning project from beginning to end.
(a) Provide capabilities required at each step in a machine learning project.
(b) The interface may be graphical or command line.
(c) They provide a loose coupling of features.
(d) They are provided for general purpose use and exploration rather than speed, scalability or accuracy.

(ii) Library – A library gives capabilities for completing part of a machine learning project.
(a) Provides a specific capability for one or more steps in a machine learning project.
(b) The interface is typically an application programming interface requiring programming.
(c) They are tailored for a specific use case or environment.

(iii) Graphical User Interfaces –
(a) Allow less-technical users to work through machine learning problems.
(b) Focus on process and how to get the most from machine learning techniques.
(c) Stronger focus on graphical presentations of information, such as visualization.
(d) Structured process imposed on the user by the interface.

(iv) Command Line Interface –
(a) Allows technical users who are not programmers to work through machine learning projects.
(b) Frames machine learning tasks in terms of the input required and the output to be generated.
(c) Promotes reproducible results by recording or scripting commands and command line arguments.

(v) Application Programming Interfaces –
(a) To incorporate machine learning into our own software projects.
(b) To create our own machine learning tools.
(c) Give the flexibility to use our own processes and automations on machine learning projects.
(d) Allow combining our own methods with those provided by the library, as well as extending provided methods.

(vi) Local Tools – Local tools can be downloaded, installed and run on the local environment.
(a) Customized for in-memory data and algorithms.
(b) Control over run configuration and parameterization.
(c) Integrate into our own systems to meet our needs.

(vii) Remote Tools – Remote tools can be hosted on a server and called from the local environment. These tools are often referred to as machine learning as a service (MLaaS).
(a) Tailored for scale, to be run on larger datasets.
(b) Run across multiple systems, multiple cores and shared memory.

Q.4. Write short note on artificial intelligence vs machine learning.

Ans. Artificial intelligence may be broadly defined as machines having the ability to solve a given problem on their own without any human intervention. The solutions are not programmed directly into the system; instead, the AI interprets the necessary data and produces a solution by itself. Machine learning at its core is nothing but a data mining algorithm. It takes the approach to an advanced level by providing the data essential for a machine to train and modify suitably when exposed to new information; this is known as "training". It focuses on extracting underlying patterns from considerably large sets of data, and then detects and identifies various statistical measures to improve its ability to interpret new data and produce more effective results. Evidently, parameters should be "tuned" at the incipient level for better productivity, rather than providing all the data at once – a task that would have been impossible to solve. Moreover, a system cannot be considered intelligent if it lacks the ability to learn and improve from its previous experiences.

Q.5. Write and explain the scope of machine learning.

Ans. The scope of machine learning is as follows –
(i) Explaining Human Learning – As mentioned earlier, machine learning theories have been perceived as fitting to comprehend features of learning in humans and animals. Reinforcement learning algorithms model the activity induced in dopaminergic neurones in animals during reward-based learning with surprising accuracy.
ML algorithms for uncovering sparse representations of naturally occurring images predict the visual features detected in animals' initial visual cortex. Nevertheless, important elements of human or animal learning, like stimulation, fear, urgency, hunger, instinctive actions and learning by trial and error over numerous time scales, are not taken into account in ML algorithms. This is a potential opportunity to discover a more generalised concept of learning that entails both animals and machines.

(ii) Programming Languages Containing Machine Learning Primitives – In the majority of applications, ML algorithms are incorporated with manually coded programs as part of an application software. The need is for a new programming language that is self-sufficient to support manually written subroutines as well as those defined as "to be learned". Programming languages like Python (scikit-learn), R etc. are already making use of this concept and scope. But a fascinating new question is raised as to how to develop the best relevant learning experience for each subroutine so tagged, based on different algorithms and data.

(iii) Machine Perception – How to link ML algorithms, which are used in numerous forms of perception today including vision and speech recognition, is another potential research area. A related problem is the integration of different senses to prepare a system which employs self-supervised learning to predict knowledge from one sense using the others.

Q.6. Write the advantages of machine learning.

Ans. The five advantages of machine learning are as follows –
(i) Accurate – Machine learning uses data to discover the optimal decision-making engine for your problem. As you collect more data, the accuracy can increase automatically.
(ii) Automated – As answers are validated or discarded, the machine learning model can learn new patterns automatically. This allows users to embed machine learning directly into an automated workflow.
(iii) Fast – Machine learning can generate answers in a matter of milliseconds as new data streams in, allowing systems to react in real time.
(iv) Customizable – Many data-driven problems can be addressed with machine learning. Machine learning models are custom built from your own data, and can be configured to optimize whatever metric drives your business.
(v) Scalable – As your business grows, machine learning easily scales to handle increased data rates. Some machine learning algorithms scale to handle large amounts of data on many machines in the cloud.

Q.7. Write the disadvantages of machine learning.

Ans. The disadvantages of machine learning are as follows –
(i) Machine learning has the major challenge called data acquisition. Also, data need to be processed before being provided as input to the respective algorithms, and this has a significant impact on the results to be achieved or obtained.
(ii) Another major challenge is the interpretation of results: we need to determine the effectiveness of machine learning algorithms, as well as proper timing and security in case of any unforeseen modification to an algorithm's function.
(iii) In some cases machine learning fails; thus, it requires some understanding of the problem at hand to apply the right algorithm.
(iv) Like deep learning algorithms, machine learning also needs a lot of training data. For example, it might be cumbersome to work with a large amount of data; fortunately, there are a lot of training data available for image recognition.
(v) One notable limitation of machine learning is its susceptibility to errors, Brynjolfsson said. When models make errors, diagnosing and correcting them can be difficult, because it requires going through the underlying complexities.

Q.9. Discuss straight-line regression.

Ans. Straight-line regression analysis involves a response variable, y, and a single predictor variable, x. It is the simplest form of regression and models y as a linear function of x.
That is,

y = b + wx

where the variance of y is assumed to be constant, and b and w are regression coefficients specifying the Y-intercept and the slope of the line, respectively. The regression coefficients, w and b, can also be thought of as weights, so that we can equivalently write

y = w0 + w1x

These coefficients can be solved for by the method of least squares, which estimates the best-fitting straight line as the one that minimizes the error between the actual data and the estimate of the line. Let D be a training set consisting of values of predictor variable, x, for some population and their associated values for response variable, y. The training set contains |D| data points of the form (x1, y1), (x2, y2), …, (x|D|, y|D|). The regression coefficients can be estimated using this method with the following equations –

w1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
w0 = ȳ − w1x̄

where x̄ is the mean value of x1, x2, …, x|D|, and ȳ is the mean value of y1, y2, …, y|D|. The coefficients w0 and w1 often provide good approximations to otherwise complicated regression equations.

Multiple linear regression is an extension of straight-line regression so as to involve more than one predictor variable. It allows response variable y to be modeled as a linear function of, say, n predictor variables or attributes, A1, A2, …, An, describing a tuple, X. Our training dataset, D, contains data of the form (X1, y1), (X2, y2), …, (X|D|, y|D|), where the Xi are the n-dimensional training tuples with associated class labels, yi. An example of a multiple linear regression model based on two predictor attributes or variables, A1 and A2, is

y = w0 + w1x1 + w2x2

where x1 and x2 are the values of attributes A1 and A2, respectively, in X.

Q.10. Explain regression and log-linear models.

Ans. Regression and log-linear models can be used to approximate the given data. In (simple) linear regression, the data are modeled to fit a straight line.
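The closed-form least-squares estimates for w0 and w1 described above can be computed directly. A minimal sketch (assuming NumPy is available; `fit_line` is an illustrative helper, not a name from the text):

```python
import numpy as np

# Closed-form least squares for straight-line regression:
#   w1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2)
#   w0 = y_mean - w1 * x_mean
def fit_line(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    w1 = ((x - xm) * (y - ym)).sum() / ((x - xm) ** 2).sum()
    w0 = ym - w1 * xm
    return w0, w1

# Points lying exactly on y = 1 + 2x recover the intercept and slope.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
w0, w1 = fit_line(x, y)
print(w0, w1)  # 1.0 2.0
```

With noisy data the same formulas give the line minimizing the sum of squared errors rather than an exact fit.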
For example, a random variable, y (called a response variable), can be modeled as a linear function of another random variable, x (called a predictor variable), with the equation

y = wx + b

where the variance of y is assumed to be constant. In data mining, x and y are numerical database attributes. The coefficients, w and b (called regression coefficients), specify the slope of the line and the Y-intercept, respectively. These coefficients can be solved for by the method of least squares, which minimizes the error between the actual line separating the data and the estimate of the line. Multiple linear regression is an extension of (simple) linear regression, which allows a response variable, y, to be modeled as a linear function of two or more predictor variables.

Log-linear models approximate discrete multidimensional probability distributions. Given a set of tuples in n dimensions (e.g., described by n attributes), we can consider each tuple as a point in an n-dimensional space. Log-linear models can be used to estimate the probability of each point in a multidimensional space for a set of discretized attributes, based on a smaller subset of dimensional combinations. This allows a higher-dimensional data space to be constructed from lower-dimensional spaces. Log-linear models are therefore also useful for dimensionality reduction and data smoothing.

Regression and log-linear models can both be used on sparse data, although their application may be limited. While both methods can handle skewed data, regression does exceptionally well. Regression can be computationally intensive when applied to high-dimensional data, whereas log-linear models show good scalability for up to 10 or so dimensions.

Q.11. Illustrate the simple linear regression derivation.

Ans. In linear regression, the model specification is that the dependent variable yi is a linear combination of the parameters (but need not be linear in the independent variables). For example, in simple linear regression for modeling n data points there is one independent variable, xi, and two parameters, β0 and β1 –

straight line: yi = β0 + β1xi + εi, i = 1, …, n

In multiple linear regression, there are several independent variables or functions of independent variables. Adding a term in xi²
to the preceding regression gives –

parabola: yi = β0 + β1xi + β2xi² + εi, i = 1, …, n

This is still linear regression; although the expression on the right hand side is quadratic in the independent variable xi, it is linear in the parameters β0, β1 and β2. In both cases, εi is an error term and the subscript i indexes a particular observation.

Returning our attention to the straight line case – given a random sample from the population, we estimate the population parameters and obtain the sample linear regression model

ŷi = β̂0 + β̂1xi

The residual, ei = yi − ŷi, is the difference between the value of the dependent variable predicted by the model, ŷi, and the true value of the dependent variable, yi. One method of estimation is ordinary least squares. This method obtains parameter estimates that minimize the sum of squared residuals, SSE, also sometimes denoted RSS –

SSE = Σ ei²

Minimization of this function results in a set of normal equations, a set of simultaneous linear equations in the parameters, which are solved to yield the parameter estimators, β̂0 and β̂1. In the case of simple regression, the formulas for the least squares estimates are

β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,  β̂0 = ȳ − β̂1x̄

where x̄ is the mean (average) of the x values and ȳ is the mean of the y values. Under the assumption that the population error term has a constant variance, the estimate of that variance is given by

σ̂ε² = SSE / (n − 2)

This is called the mean square error (MSE) of the regression. The denominator is the sample size reduced by the number of model parameters estimated from the same data: (n − p) for p regressors, or (n − p − 1) if an intercept is used. In this case, p = 1, so the denominator is n − 2. The standard errors of the parameter estimates are given by

σ̂β0 = σ̂ε √(1/n + x̄² / Σ(xi − x̄)²),  σ̂β1 = σ̂ε √(1 / Σ(xi − x̄)²)

Under the further assumption that the population error term is normally distributed, the researcher can use these estimated standard errors to create confidence intervals and conduct hypothesis tests about the population parameters.

Fig. 1.1 Illustration of Linear Regression on a Data Set

Q.12.
Explain the analysis of variance for simple linear regression.

Ans. When there is no association between Y and X (β1 = 0), the best predictor of each observation is Ȳ (in terms of minimizing the sum of squares of prediction errors). In this case, the total variation can be denoted as TSS = Σ(Yi − Ȳ)², the total sum of squares.

When there is an association between Y and X (β1 ≠ 0), the best predictor of each observation is Ŷi = β̂0 + β̂1Xi (in terms of minimizing the sum of squares of prediction errors). In this case, the error variation can be denoted as SSE = Σ(Yi − Ŷi)², the error sum of squares.

The difference between TSS and SSE is the variation "explained" by the regression of Y on X (as opposed to having ignored X). It represents the difference between the fitted values and the mean –

SSR = Σ(Ŷi − Ȳ)², the regression sum of squares

TSS = SSE + SSR
Σ(Yi − Ȳ)² = Σ(Yi − Ŷi)² + Σ(Ŷi − Ȳ)²

Each sum of squares has a degrees of freedom associated with it. The total degrees of freedom is dfTotal = n − 1. The error degrees of freedom is dfError = n − 2 (for simple regression). The regression degrees of freedom is dfRegression = 1 (for simple regression).

dfTotal = dfError + dfRegression
n − 1 = (n − 2) + 1

Table 1.1 Analysis of Variance for Simple Linear Regression

Source | df | SS | MS | F | p-value
Regression (Model) | 1 | SSR = Σ(Ŷi − Ȳ)² | MSR = SSR/1 | F = MSR/MSE | P(F(1, n−2) ≥ Fobs)
Error | n − 2 | SSE = Σ(Yi − Ŷi)² | MSE = SSE/(n − 2) | |
Total (Corrected) | n − 1 | TSS = Σ(Yi − Ȳ)² | | |

Error and regression sums of squares each have a mean square, which is the sum of squares divided by its corresponding degrees of freedom: MSE = SSE/(n − 2) and MSR = SSR/1. It can be shown that these mean squares have the following expected values, i.e., average values in repeated sampling at the same observed X levels –

E{MSE} = σ²,  E{MSR} = σ² + β1² Σ(Xi − X̄)²

Q.13. Explain the analysis of variance for multiple linear regression.

Ans. When there is no association between Y and X1, …, Xp (all βj = 0), the best predictor of each observation is Ȳ (in terms of minimizing the sum of squares of prediction errors). In this case, the total variation can be denoted as TSS = Σ(Yi − Ȳ)², the total sum of squares, just as with simple regression.

When there is an association between Y and at least one of X1, …, Xp (not all βj = 0), the best predictor of each observation is Ŷi = β̂0 + β̂1Xi1 + ⋯
+ β̂pXip (in terms of minimizing the sum of squares of prediction errors). In this case, the error variation can be denoted as SSE = Σ(Yi − Ŷi)², the error sum of squares.

The difference between TSS and SSE is the variation "explained" by the regression of Y on X1, …, Xp (as opposed to having ignored X). It represents the difference between the fitted values and the mean –

SSR = Σ(Ŷi − Ȳ)², the regression sum of squares

TSS = SSE + SSR
Σ(Yi − Ȳ)² = Σ(Yi − Ŷi)² + Σ(Ŷi − Ȳ)²

Each sum of squares has a degrees of freedom associated with it. The total degrees of freedom is dfTotal = n − 1. The error degrees of freedom is dfError = n − p − 1. The regression degrees of freedom is dfRegression = p. Note that when we have p = 1 predictor, this reduces to simple regression.

dfTotal = dfError + dfRegression
n − 1 = (n − p − 1) + p

Error and regression sums of squares each have a mean square, which is the sum of squares divided by its corresponding degrees of freedom: MSE = SSE/(n − p − 1) and MSR = SSR/p. It can be shown that E{MSE} = σ², while E{MSR} exceeds σ² unless all βj = 0.

Table 1.2 Analysis of Variance for Multiple Linear Regression

Source | df | SS | MS | F | p-value
Regression (Model) | p | SSR = Σ(Ŷi − Ȳ)² | MSR = SSR/p | F = MSR/MSE | P(F(p, n−p−1) ≥ Fobs)
Error | n − p − 1 | SSE = Σ(Yi − Ŷi)² | MSE = SSE/(n − p − 1) | |
Total (Corrected) | n − 1 | TSS = Σ(Yi − Ȳ)² | | |

Q. What is the concept of probability? Explain with theorem.

Ans. The concept of probability is extremely important and has a very extensive application in statistics and all physical sciences.

Theorem of Total Probability or Addition Law of Probability – If the probability of an event A happening as a result of a trial is P(A) and the probability of a mutually exclusive event B happening is P(B), then the probability of either of the events happening as a result of the trial is

P(A + B) or P(A ∪ B) = P(A) + P(B)

Proof – Let n be the total number of equally likely cases, and let m1 be favourable to the event A and m2 be favourable to the event B. Then the number of cases favourable to A or B is m1 + m2. Hence the probability of A or B happening as a result of the trial
= (m1 + m2)/n = m1/n + m2/n = P(A) + P(B)

If the events A and B are not mutually exclusive, there are some outcomes which favour both A and B; let m3 be their number. Then the total number of outcomes favouring either A or B or both is m1 + m2 − m3. Thus the probability P(A + B) or P(A ∪ B) of occurrence of A or B or both

= (m1 + m2 − m3)/n = m1/n + m2/n − m3/n

P(A + B) = P(A) + P(B) − P(AB)
i.e., P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

When A and B are mutually exclusive, P(A ∩ B) or P(AB) = 0 and we have P(A + B) or P(A ∪ B) = P(A) + P(B). Proved.

Particular Cases –
(i) If mutually exclusive events A and B are defined on the sample space S, then n(A ∪ B) = n(A) + n(B), so that P(A ∪ B) = P(A) + P(B).
(ii) Since S and φ are mutually exclusive events and S ∪ φ = S,
P(S ∪ φ) = P(S)
P(S) + P(φ) = P(S), so that P(φ) = 0.
(iii) Let A and Ā be complementary events. Then A and Ā are mutually exclusive by definition, and A ∪ Ā = S. Hence
P(A ∪ Ā) = P(S)
P(A) + P(Ā) = 1, or P(Ā) = 1 − P(A).
(iv) We know that A = (A ∩ B) ∪ (A ∩ B̄), that is, A is the union of two mutually exclusive events. Therefore
P(A) = P(A ∩ B) + P(A ∩ B̄), or P(A ∩ B̄) = P(A) − P(A ∩ B)
Similarly, P(Ā ∩ B) = P(B) − P(A ∩ B).
(v) If B ⊂ A, then (a) P(A ∩ B̄) = P(A) − P(B), and (b) P(B) ≤ P(A).

Q.22. Explain the term statistical analysis of data. (R.G.P.V., May 2019)

Ans. Statistical data analysis is a procedure of performing various statistical operations. It is a kind of quantitative research, which seeks to quantify the data, and typically applies some form of statistical analysis. Quantitative data basically involves descriptive data, such as survey data and observational data.

Statistical data analysis generally involves some form of statistical tools, which a layman cannot use without having statistical knowledge. There are various software packages to perform statistical data analysis. This software includes Statistical Analysis System (SAS), Statistical Package for the Social Sciences (SPSS), StatSoft, etc.

Data in statistical data analysis consists of variable(s). Sometimes the data is univariate or multivariate.
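The addition law proved above can be checked by direct enumeration over a finite sample space. A toy sketch (one roll of a fair die; the events chosen here are illustrative, not from the text):

```python
from fractions import Fraction

# Verify P(A ∪ B) = P(A) + P(B) − P(A ∩ B) by counting
# equally likely outcomes of one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"

def P(E):
    # probability = favourable cases / total equally likely cases
    return Fraction(len(E), len(S))

lhs = P(A | B)                      # P(A ∪ B)
rhs = P(A) + P(B) - P(A & B)        # P(A) + P(B) − P(A ∩ B)
print(lhs, rhs)  # 2/3 2/3
```

Because A and B here are not mutually exclusive (both contain 4 and 6), the correction term P(A ∩ B) = 1/3 is essential; dropping it would overcount.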
Depending upon the number of variables, the researcher performs different statistical techniques. If the data in statistical data analysis is multiple in number, then several multivariate techniques can be performed. These are factor statistical data analysis, discriminant statistical data analysis, etc. Similarly, if the data is singular in number, then univariate statistical data analysis is performed. This includes the t-test for significance, z-test, F-test, ANOVA one-way, etc.

The data in statistical data analysis is basically of two types – continuous data and discreet data. Continuous data is the kind that cannot be counted, but can be measured. Continuous data in statistical data analysis is distributed under a continuous distribution function, which can also be called the probability density function, or simply pdf. Discreet data is distributed under a discreet distribution function, which can also be called the probability mass function, or simply pmf. We use the word "density" for continuous data in statistical data analysis because density cannot be counted, but can be measured; we use the word "mass" for discreet data because mass can be counted.

There are various pdfs and pmfs in statistical data analysis. For example, the Poisson distribution is a commonly known pmf, and the normal distribution is a commonly known pdf. These distributions in statistical data analysis help us to understand which data falls under which distribution. If the data is about the intensity of a bulb, then the data would be falling in the Poisson distribution.

There is a major task in statistical data analysis, which comprises statistical inference. Statistical inference is mainly comprised of two parts – estimation and tests of hypothesis. Estimation in statistical data analysis mainly involves parametric data, the data that consists of parameters. On the other hand, tests of hypothesis in statistical data analysis mainly involve non-parametric data, the data that consists of no parameters.

Traditional methods for statistical analysis – from sampling data to interpreting results – have been used by scientists for thousands of years. But today's data volumes make statistics ever more valuable and powerful. Affordable storage, powerful computers and advanced algorithms have all led to an increased use of computational statistics. Whether working with large data volumes or running multiple permutations of calculations, statistical computing has become essential for today's statistician. Popular statistical computing practices include –

(i) Statistical Programming – From traditional analysis of variance and linear regression to exact methods and statistical visualization techniques, statistical programming is essential for making data-based decisions in every field.

(ii) Econometrics – Modeling, forecasting and simulating business processes for improved strategic and tactical planning. This method applies statistics to economics to forecast future trends.

(iii) Operations Research – Identifying the actions that will produce the best results, based on many possible options and outcomes. Scheduling, simulation and related modeling processes are used to optimize business processes and management challenges.

(iv) Matrix Programming – Powerful computer techniques for implementing your own statistical methods and exploratory data analysis using row operation algorithms.

(v) Statistical Visualization – Fast, interactive statistical analysis and exploratory capabilities in a visual interface can be used to understand data and build models.

(vi) Statistical Quality Improvement – A mathematical approach to reviewing the quality and safety characteristics for all aspects of production.

Q.23. What is the function of the F test? Explain.

Ans. An F test checks whether the simple linear regression model "explains" (really, predicts) a "significant" amount of the variance in the response. What this really does is compare two versions of the simple linear regression model. The null hypothesis is that all of the assumptions of that model hold, and the slope, β1, is exactly 0. This is sometimes called the "intercept-only" model, for obvious reasons. The alternative is that all of the simple linear regression assumptions hold, with β1 ∈ R. The alternative, non-zero-slope model will always fit the data better than the null, intercept-only model; the F test checks whether the improvement in fit is larger than we would expect under the null.

There are situations where it is useful to know about this precise quantity, and so to run an F test on the regression. It is hardly ever, however, a good way to check whether the simple linear regression model is correctly specified, because neither retaining nor rejecting the null gives us the information we really want to know.

Suppose first that we retain the null hypothesis, i.e., we do not find any significant share of variance associated with the regression. This could be because (i) the intercept-only model is right, or (ii) the intercept-only model is wrong, but we do not have enough power to detect departures from the null. There is also the possibility that the real relationship is nonlinear, but the best linear approximation to it has slope (nearly) zero, in which case the F test will have no power to detect the nonlinearity.

Suppose instead that we reject the null, intercept-only hypothesis. This does not mean that the simple linear model is right. It means that the latter model predicts better than the intercept-only model – too much better to be explained by chance. The simple linear regression model can be absurdly wrong, with every single one of its assumptions flagrantly violated, and yet still predict much better than the model which makes all those assumptions and also thinks the optimal slope is zero.

Neither the F test of β1 = 0 vs β1 ≠ 0 nor the Wald test of the same hypothesis tells us anything about the correctness of the simple linear regression model. These tests presume that the simple linear regression model with Gaussian noise is true, and check a special case (flat line) against the general one (tilted line); they do not test linearity, constant variance, lack of correlation, or Gaussianity.

Q.24. Write short note on t-test.

Ans. The t-test is used to compare the mean scores obtained by two groups on a given variable. The critical ratio test, or t-test, is used for a two-sample difference of means. Here it is applied to determine the difference between the means of two sets of scores obtained from one group based on two variables. It is very useful when the population variance is not known and when the sample size is small. The formula for estimating the t-ratio is

t = (M1 − M2) / √(σ1²/N1 + σ2²/N2)

where
M1 = mean of the first sample
M2 = mean of the second sample
σ1 = standard deviation of the first sample
σ2 = standard deviation of the second sample
N1 = sample size of the first sample
N2 = sample size of the second sample

Interpretation of t-ratio – If the calculated t is less than the tabulated t at the 0.05 level of significance, then the null hypothesis is accepted. If the calculated t is greater than the tabulated t at the 0.05 level of significance, then the null hypothesis is rejected and the difference between the means is taken to be significant.

Q.25. Describe the scope of statistical methods.

Ans. (i) In Government Planning and Public Sectors – These are the days of planning, and most plans, to be successful, must be based on statistics. Statistical methods help in solving problems such as those of food shortage, and in estimating the revenue and the expenditure for the successful running of the Government.

(ii) In Business and Commerce – A manufacturer, in order to be successful, should make a study of the seasonal changes in the demand of his goods and the rate of interest for borrowing. A manufacturer, such as of shoes or cloth, must know the sizes and designs which are most in demand. A railway company ought to know when to run special trains and when to run curtailed services. Insurance companies, in deciding upon the premium to be charged or the annuities to be granted, have to consider the mortality or sickness etc. likely to be experienced and the rate of interest likely to be earned.

(iii) In Medical Science – Statistical methods are necessary in finding the effectiveness of medicines and drugs for the prevention and cure of disease.

(iv) In Agricultural Research – Much ingenuity and statistical knowledge is required in the design and analysis of experiments to test the effect of different types of manures, levels of irrigation and varieties of crops.

(v) In Meteorology – Weather forecasting depends on statistical methods.

(vi) It is advantageous in Education, Anthropometry and the higher sciences.
Q.26. Describe the statistical methods and specify their limitations.

Ans. Statistical methods are devices by which complex and numerical data are systematically treated so as to present a comprehensible and intelligible view of them. In other words, the statistical method is a technique used to obtain, analyse and present numerical data. The different steps that are included in the statistical methods are – collection of data, classification, tabulation, presentation, analysis, interpretation and forecasting.

Limitations of statistical methods are as follows –
(i) Statistical laws are not exact laws like mathematical or chemical laws. They are derived by taking a majority of cases and are not true for every individual. Thus the statistical inferences are uncertain.
(ii) Statistical methods deal with populations or aggregates of individuals rather than with individuals. When we say that the average height of an Indian is 1 metre and 80 centimetres, it shows the height not of individuals but as found by the study of an aggregate of individuals.
(iii) Statistical technique applies only to data which are reducible to quantitative forms. Consequently, the characteristics which cannot be measured quantitatively are excluded; such characteristics are beauty, etc.
(iv) Statistical results might lead to fallacious conclusions if they are quoted without their context. The argument that "in a country some vaccinated persons died of small-pox, therefore vaccination is useless" is fallacious, since
we are not told what percentage of the persons who were not vaccinated died.
(v) Statistical technique is the same for the social as for the physical sciences, while the two are different in nature.
(vi) Only one who has an expert knowledge of statistical methods can handle statistical data properly. The data placed in the hands of an inexpert may lead to fallacious results.

Q.27. Explain the term probabilistic analysis of data.

Ans. A widespread application of such an analysis is weather forecasting. For more than a century, hundreds of weather stations around the world have recorded various important parameters such as the air temperature, wind speed, precipitation, snowfall etc. Based on these data, scientists build models reflecting seasonal weather changes (depending on the time of the year) as well as the global trends – for example, temperature change during the last 50 years. These models, thought of by themselves, do not necessarily generate good assessments. The very fact that there was correspondence about the gambles – and occasionally some disputes about them – indicated that people do not automatically assess probabilities in the same way, or accurately (e.g., corresponding to relative frequencies, or making good gambling choices).

The Von-Neumann and Morgenstern work, however, does involve some psychological assumptions that people can engage in "good" probabilistic thinking. First, they must do so implicitly, so that the choice follows certain "choice axioms" that allow the construction of an expected utility model – i.e., a model that represents choice as a maximization of implicit expected utility; that in turn requires that probabilities at the very least follow the standard axioms of probability theory.

It also implies considering conditional probabilities in a rational manner, which is done only when implicit or explicit conditional probabilities are consistent with Bayes' theorem. Thus, the Von-Neumann and Morgenstern work required that people be "Bayesian" in a consistency sense, although that term is sometimes used to imply that probabilities should at base be interpreted as degrees of belief.

Another way in which probability assessment must be "good"
is that there should be at least some reasonable approximation between probabilities and long-term relative frequencies; in fact, under particular circumstances (of interchangeability and indefinitely repeated observations) the probabilities of someone whose belief is constrained by Bayes' theorem must approximate relative frequencies.

Consider an analogy with swimming. People do swim well, but occasionally we drown. What happens is that there is a particular systematic bias in attempting to swim that makes it difficult. We want to hold our heads above water; when, however, we raise our heads to do so, we tend to assume a vertical position in the water, which is one of the few ways of drowning (aside from freezing or exhaustion or being swept away in rough waters). Just as people drown occasionally by trying to hold their heads above water, people systematically deviate from the rules of probabilistic thinking. Again, however, the emphasis is on "systematic".

For example, there is now evidence that people's probabilistic judgements are "subadditive", in that when a general class is broken into components, the judgementally estimated probabilities assigned to disjoint components that comprise the class sum to a larger number than the probability assigned to the class. That is particularly true in memory, where, for example, people may recall the frequency with which they were angry at a close friend or relative in the last month and the frequency with which they were angry at a total stranger, and the sum of the estimates is greater than an independent estimate of being angry, period (even though it is possible to be angry at someone who is neither a close friend or relative nor a total stranger). The clever opponent will then bet against the occurrence of each component but on the occurrence of the basic event, thereby creating a Dutch Book.

Q.28. Write short note on statistics and linear algebra for ML.
Ans. Linear algebra is a valuable tool in other branches of mathematics, especially statistics. The impact of linear algebra is important to consider, given the foundational relationship both fields have with the field of applied machine learning. Some points of impact of linear algebra on statistics and statistical methods are as follows –
(i) Use of vector and matrix notation, especially with multivariate statistics.
(ii) Solutions to least squares and weighted least squares, such as for linear regression.
(iii) Estimates of mean and variance of data matrices.
(iv) The covariance matrix that plays a key role in multivariate Gaussian distributions.
(v) Principal component analysis for data reduction, which draws many of these elements together.

As we can see, modern statistics and data analysis, at least as far as the interests of a machine learning practitioner are concerned, depend on the understanding and tools of linear algebra.

Q.29. Define some examples of linear algebra in machine learning.

Ans. Some examples of linear algebra in machine learning are as follows –
(i) Linear regression
(ii) Regularization
(iii) Principal component analysis (PCA)
(iv) Singular-value decomposition (SVD)
(v) Deep learning

(ii) Regularization – In applied machine learning, we often seek the simplest possible models that achieve the best skill on our problem. Simpler models are often better at generalizing from specific examples to unseen data. In many methods that involve coefficients, such as regression methods and artificial neural networks, simpler models are often characterized by models that have smaller coefficient values. A technique that is often used to encourage a model to minimize the size of coefficients while it is being fit on data is called regularization. Common implementations include the L2 and L1 forms of regularization.
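The L2 and L1 penalties are simply vector norms of the coefficient vector. A minimal numpy sketch (the coefficient values, the lambda weight and the placeholder data loss are illustrative assumptions, not from the text) –

```python
import numpy as np

# Hypothetical coefficient vector of a fitted model.
w = np.array([3.0, -4.0, 0.0])

# L2 penalty: squared Euclidean norm of the coefficients.
l2_penalty = float(np.sum(w ** 2))
# L1 penalty: sum of absolute values, which encourages sparse coefficients.
l1_penalty = float(np.sum(np.abs(w)))

# A regularized loss adds the penalty, weighted by lambda, to the data loss.
lam = 0.1
data_loss = 1.0  # placeholder for e.g. a mean squared error
ridge_style_loss = data_loss + lam * l2_penalty
lasso_style_loss = data_loss + lam * l1_penalty
```

Shrinking lam recovers the unregularized fit, while growing it pushes the optimizer toward smaller coefficient values.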
Both of these forms of regularization are in fact a measure of the magnitude or length of the coefficients as a vector, and are methods lifted directly from linear algebra called the vector norm.

(iii) Principal Component Analysis (PCA) – Often a dataset has many columns, perhaps tens, hundreds, thousands or more. Modeling data with many features is challenging, and models built from data that include irrelevant features are often less skillful than models trained from the most relevant data. It is hard to know which features of the data are relevant and which are not. Methods for automatically reducing the number of columns of a dataset are called dimensionality reduction, and perhaps the most popular method is principal component analysis, or PCA for short. This method is used in machine learning to create projections of high-dimensional data, both for visualization and for training models. The core of the PCA method is a matrix factorization method from linear algebra – the eigendecomposition can be used, and more robust implementations may use the singular-value decomposition, or SVD.

(iv) Singular-value Decomposition (SVD) – Another popular dimensionality reduction method is the singular-value decomposition method, or SVD for short.
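A minimal numpy sketch of PCA computed through the SVD, as described above (the data matrix is an illustrative assumption, not from the text) –

```python
import numpy as np

# Illustrative data matrix: 5 samples, 3 features.
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [3.0, 1.0, 2.0],
              [1.0, 2.0, 0.0],
              [4.0, 1.0, 2.0]])

# Centre the columns, then factorize the centred matrix with the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; projecting onto the first k
# of them gives the reduced-dimension representation of the data.
k = 2
X_reduced = Xc @ Vt[:k].T
```

The singular values in s come out sorted in decreasing order, so the first k directions capture the most variance.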
As mentioned, and as the name of the method suggests, it is a matrix factorization method from the field of linear algebra. It has wide use in linear algebra and can be used directly in applications such as feature selection, visualization, noise reduction and more.

(v) Deep Learning – Artificial neural networks are nonlinear machine learning algorithms that are inspired by elements of the information processing in the brain, and they have proven effective at a range of problems, not least predictive modeling. Deep learning is the recently resurged use of artificial neural networks with newer methods and faster hardware that allow for the development and training of larger and deeper (more layers) networks on very large datasets. Deep learning methods routinely achieve state-of-the-art results on a range of challenging problems such as machine translation, photo captioning and speech recognition.

At their core, the execution of neural networks involves linear algebra data structures multiplied and added together. Scaled up to multiple dimensions, deep learning methods work with vectors, matrices and even tensors of inputs and coefficients, where a tensor is a matrix with more than two dimensions. Linear algebra is central to the description of deep learning methods via matrix notation, and to the implementation of deep learning methods such as Google's TensorFlow Python library, which has the word "tensor" in its name.

Q.30. Define some convex optimization problems for machine learning.

Ans. Some convex optimization problems for machine learning take the form

min over x ∈ Rⁿ of (1/n) Σᵢ fᵢ(x) + λR(x)    ...(i)

where f₁, ..., fₙ, R are convex functions and λ ≥ 0 is a fixed parameter. Here fᵢ(x) represents the cost of using x on the i-th data point, while R(x) is a regularization term which enforces some "simplicity" in x. We discuss now major instances of equation (i). In all cases one has a data set of the form (wᵢ, yᵢ) ∈ Rⁿ × Y, i = 1, 2, 3, ..., n, and the function fᵢ depends only on (wᵢ, yᵢ).

In classification one has Y = {−1, 1}. Taking fᵢ(x) = max(0, 1 − yᵢxᵀwᵢ) (the so-called hinge loss) and R(x) = ||x||₂² one obtains the SVM problem.
On the other hand, taking fᵢ(x) = log(1 + exp(−yᵢxᵀwᵢ)) (the logistic loss) and again R(x) = ||x||₂² one obtains the logistic regression problem.

In regression one has Y = R. Taking fᵢ(x) = (xᵀwᵢ − yᵢ)² and R(x) = 0 one obtains the vanilla least-squares problem, which can be rewritten in vector notation as

min over x ∈ Rⁿ of ||Wx − Y||₂²

where W is the matrix with wᵢᵀ on the i-th row and Y = (y₁, ..., yₙ)ᵀ. With R(x) = ||x||₂² one obtains the ridge regression problem, while with R(x) = ||x||₁ one obtains the LASSO problem.

In our last example the design variable x is best viewed as a matrix, and thus we denote it by a capital letter X. Here our data set consists of observations of some of the entries of an unknown matrix Y, and we want to "complete" the unobserved entries of Y in such a way that the resulting matrix is "simple" (in the sense that it has low rank). After some massaging, the matrix completion problem can be formulated as follows –

min Tr(X)  s.t.  X ∈ Rⁿˣⁿ, Xᵀ = X, X ⪰ 0, Xᵢⱼ = Yᵢⱼ for (i, j) ∈ Ω

where Ω ⊂ [n]² and the entries Yᵢⱼ, (i, j) ∈ Ω, are given.

Q.31. What do you understand by data visualization? Discuss some of Python's data visualization tools such as box plots, pie charts and bar charts in brief.

Ans. Data visualization helps top management, who are the decision makers, to view analytics represented visually, making it easy for them to understand complex ideas and identify new structures or patterns. When visualization becomes interactive, we are able to push the concept a little further by using technological tools to grasp more details from graphs and charts, thereby making changes to the data that is being seen and to how such data is being processed.

Visualization means putting data forward and representing it in a particular systematic layout which contains some variables and attributes for conveying information.

Visualization-based data discovery techniques give room for business owners to combine data from completely different sources.
Visualization-based data discovery techniques gives room 1 business owner inven: Nits Owners to make up sources of completely differ wee 32 Machine Leaming (VI-Sem ) one Maina “pe wt | 4 utter sk — wb 1si0R | ser Fome sain EL hs08 - } Third Quart rox 4 sininwn a “First Quart Fig. 1.4 Simple Box Plot Fig. 1.5 Complex Box Plot ‘We can sce that it’s super easy to create this plot with Matplotib. Al ** need is the function pit boxplot ) The first argument isthe data points The simple box plot in using python’s [1.2,5.6.6,7,7,8, 8.8.9, 10, 20) plt.boxplot values) pli yticks(ranget1, 21)) pltylabelt" plt.show( » values » units 33 Charts — its. as well-known WP Mh A pe chartshows information dana ay tats not dificult salle "pie-slice” form and the various a rslice shows howe much ofan elements WPS ence, When the slice sigs thenithows ine daa was gathered. Iti also used to STmpare values of data and the moment some Salute are represented on pie cart, then we will beable o sew which ofthe items i the least popular or which ismore popula. Thebestand Fie 1-64 Simple Standard ‘effective way to make use of apie chart is when ’ie Chart they contin a few components and when the percentages and texts are also ttvolved inorder to define the content. By providing additional information report consumers do not have to guess the meaning and value ofeach slice. If You choose to we a pie cha, the slices should be a percentage of the whole asacicle onut Chart Demonstration Fig. 1.7 A Simple Doughnut Pie Chart ive tery a Pamphin we recas bis is: LA Simple Exploding Pie Chact A wedge is used to represent a data pats that iat has the same and the pie chart control usually decides the data wedge size wir eis compared with the other data wedges. Pie charts consist of twa eet variations called Doughnut chart and Exploding pie chart. 
The Doughnut charts are almost the same as the standard pie chart, except that they have a hollow centre, while in the Exploding charts the wedges are pulled away from the rest of the pie.

To create a pie chart with Matplotlib, we can use the plt.pie( ) function. The autopct parameter allows us to display the percentage value using Python string formatting –

import matplotlib.pyplot as plt

sizes = [25, 20, 45, 10]
labels = ["Cats", "Dogs", "Tigers", "Goats"]
plt.pie(sizes, labels=labels, autopct="%.2f")
plt.axes().set_aspect("equal")
plt.show()

Bar Chart – A bar chart is also referred to as a column chart; it is used for comparison of items of different groups. The bars represent the various values of a group, and a bar chart can use both horizontal bars and vertical bars. When the values to be represented are clearly different, and such differences between the bars can be seen by the human eye, one can decide to make use of a bar chart; but when there is a very large number of values to be displayed, it might be harder to make comparisons between the bars. Most times, a bar chart is used to represent discrete data; it is also used to present a single data series, while related data points are often grouped in a series.

Fig. 1.9 Displaying a Simple Bar Chart (The Cookie Shop, 2003–2005 Income)

To create a bar chart with Matplotlib, we will need the plt.bar( ) function –

import matplotlib.pyplot as plt

# Our data (illustrative values)
labels = ["2003", "2004", "2005"]
usage = [10000, 50000, 90000]

# Generating the y positions. Later, we will use them to replace them with labels.
y_positions = range(len(labels))

# Creating our bar plot
plt.bar(y_positions, usage)
plt.xticks(y_positions, labels)
plt.ylabel("Usage (%)")
plt.title("The cookie shop")
plt.show()

Q.32. Explain data visualization techniques.

Ans. Visualization is the use of computer-supported, visual representation of data. Unlike static data visualization, interactive data visualization allows users to specify the format used in displaying data. Common visualization techniques are as shown in fig.
1.10.

Fig. 1.10 Commonly used Data Visualization Techniques

(i) Line Graph – This shows the relationship between items and can be used to compare changes over a period of time.
(ii) Bar Chart – This is used to compare the quantities of different categories.
(iii) Scatter Plot – This is a two-dimensional plot showing the variation of two items.
(iv) Pie Chart – This is used to compare the parts of a whole.

Thus, graphs and charts can take various forms, and it is important to understand which chart or graph suits the data at hand.

Data visualization uses computer graphics to show patterns, trends and relationships among elements of the data. It can generate scatter plots and other types of data graphs with simple pull-down menus and mouse clicks. Colors are carefully selected for certain types of visualization; when color is used to represent data, we must choose effective colors to differentiate between data elements.

In data visualization, data is abstracted and summarized. Spatial variables such as position, size and shape represent key elements in the data. A visualization system should perform data reduction, and transform and project the original dataset on a screen. It should visualize results in the form of charts and graphs and present the results in a user-friendly way.

Q.33. Discuss the applications of data visualization.

Ans. Most visualization designs are to aid decision making and serve as tools that augment cognition. In designing and building a data visualization prototype, one must be guided by how the visualization will be applied. Data visualization is more than just representing numbers; it involves selecting and rethinking the numbers on which the visualization is based.

Visualization of data is an important branch of computer science and has a wide range of application areas.
Several application-specific tools have been developed to analyze individual datasets in many fields of medicine and science.

(i) Public Health – The ability to analyze and present data in an understandable manner is critical to the success of public health surveillance. Health researchers need useful and intelligent tools to aid their work. Security is important in cloud-based medical data visualizations. Open any medical health magazine today, and we will see all kinds of graphical representations.

(ii) Renewable Energy – Calculation of energy consumption compared to production is important for an optimum solution.

(iii) Environmental Science – As environmental managers are required to make decisions based on highly complex data, they need visualization. Visualization applications within applied environmental science are beginning to emerge. It is desirable to have at one's disposal programs for displaying results.

(iv) Library-decision Making – Data visualization software allows librarians the flexibility to better manage and present information collected from different sources, and gives them the skill to present information in a creative, compelling way. Visualization of library data highlights purchasing decisions, future library needs and goals. Librarians, as de facto experts of data visualization, can assist students, faculty and researchers in visualizing their data. Several information visualization algorithms and associated software have been developed; these enable users to interpret data more rapidly than ever before.

Q.34. Write short notes on the following –
(i) Histogram
(ii) Quantile plots
(iii) q-q plots
(iv) Scatter plot
(v) Loess curve.

Ans. (i) Histogram – Plotting histograms, or frequency histograms, is a graphical method for summarizing the distribution of a given attribute. A histogram for an attribute A partitions the data distribution of A into disjoint subsets, or buckets. Typically, the width of each bucket is uniform.
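Equal-width bucketing can be sketched with numpy (the prices and the $20 bucket width are illustrative assumptions, not the book's data) –

```python
import numpy as np

# Hypothetical unit prices of items sold.
prices = [45, 52, 60, 63, 70, 74, 85, 91, 100, 118, 120, 139]

# Partition the value range into disjoint equal-width buckets of $20 and
# count how many observations fall into each bucket.
counts, edges = np.histogram(prices, bins=range(40, 161, 20))
# plt.bar(edges[:-1], counts, width=20) would then render the histogram.
```
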
Each bucket is represented by a rectangle whose height is equal to the count or relative frequency of the values at the bucket. If A is categoric, such as automobile_model or item_type, then one rectangle is drawn for each known value of A, and the resulting graph is more commonly referred to as a bar chart. If A is numeric, the term histogram is preferred. In an equal-width histogram, each bucket represents an equal-width range of numerical attribute A.

Fig. 1.11 shows a histogram for the data set of table 1.6, where the buckets are defined by equal-width ranges representing $20 increments and the frequency is the count of items sold.

Fig. 1.11 A Histogram for the Data Set of Table 1.6

Table 1.6 A Set of Unit Price Data for Items Sold at a Branch of AllElectronics (pairs of unit price in $ and count of items sold)

(ii) Quantile Plots – A quantile plot is a simple and effective way to have a first look at a univariate data distribution. First, it displays all of the data for the given attribute. Second, it plots quantile information. The mechanism used in this step is slightly different from the percentile computation. Let xᵢ, for i = 1 to N, be the data sorted in increasing order, so that x₁ is the smallest observation and x_N is the largest. Each observation, xᵢ, is paired with a percentage, fᵢ, which indicates that approximately 100 fᵢ % of the data are below or equal to the value xᵢ. We say "approximately" because there may not be a value with exactly a fraction, fᵢ, of the data below or equal to xᵢ.

Fig. 1.12 shows a quantile plot for the unit price data of table 1.6.

Fig. 1.12 A Quantile Plot for the Unit Price Data of Table 1.6

(iii) q-q Plot – A quantile-quantile plot, or q-q plot, graphs the quantiles of one univariate distribution against the corresponding quantiles of another. It is a powerful visualization tool in that it allows the user to view whether there is a shift in going from one distribution to another.
Fig. 1.13 shows a quantile-quantile plot for unit price data of items sold at two different branches during a given time period. Each point corresponds to the same quantile for each data set.

Fig. 1.13 A Quantile-quantile Plot for Unit Price Data from Two Different Branches (branch 1 unit price against branch 2 unit price)

(iv) Scatter Plot – A scatter plot is one of the most effective graphical methods for determining if there appears to be a relationship, pattern, or trend between two numerical attributes. To construct a scatter plot, each pair of values is treated as a pair of coordinates in an algebraic sense and plotted as points in the plane. Fig. 1.14 shows a scatter plot for the set of data in table 1.6. The scatter plot is a useful method for providing a first look at bivariate data, to see clusters of points and outliers, or to explore the possibility of correlation relationships. Given n attributes, a scatter-plot matrix provides a visualization of each attribute with every other attribute.

(v) Loess Curve – A loess curve is another important exploratory graphic aid that adds a smooth curve to a scatter plot in order to provide better perception of the pattern of dependence. The word loess is short for local regression. Fig. 1.15 shows a loess curve for the set of data in table 1.6.

Fig. 1.15 A Loess Curve for the Data Set of Table 1.6

Two parameters are needed to fit a loess curve – α, a smoothing parameter, and λ, the degree of the polynomials. While α can be any positive number, λ can be 1 or 2. The goal in choosing α is to produce a fit that is as smooth as possible without unduly distorting the underlying pattern in the data. The curve becomes smoother as α increases. There may be some lack of fit, however, indicating possible "missing" data patterns. If α is very small, the underlying pattern is tracked, yet overfitting of the data may occur, where local "wiggles" in the curve may not be supported by the data.
If the underlying pattern of the data has a "gentle" curvature with no local maxima and minima, then local linear fitting is usually sufficient (λ = 1). However, if there are local maxima or minima, then local quadratic fitting (λ = 2) typically does a better job of following the pattern of the data and maintaining local smoothness.

Q.35. Write short notes on the following –
(i) Table chart
(ii) Bubble chart
(iii) Tree map
(iv) Parallel coordinates
(v) Line chart
(vi) Area chart.

Ans. (i) Table Chart – A table is simply an arrangement of rows and columns. In conducting research and analysis of data, tables are very important. Tables are simple to understand and analyze, and easy to interpret as a method of data representation. A row is a representation of records and a column is likewise a representation of records that have been sorted; at times this order of arrangement can be changed, so that rows hold the records and columns represent the variables. Average annual expenditures for selected categories of consumer spending, by housing tenure, are shown in table 1.7.

Table 1.7 Average Annual Expenditures for Selected Categories of Consumer Spending, by Housing Tenure (homeowners and renters; categories include housing, entertainment and health insurance)

(ii) Bubble Chart – A bubble plot is a variation of a scatter plot in which the markers are substituted with bubbles; this is possible only when we have a set of data points with three values contained in each data item. It shows the relationship that exists between a minimum of three variables. Two of them are represented by the plot axes, i.e. the x-axis and y-axis, while the third one is represented by the bubble size, and each bubble is a representation of an observation.
A bubble plot is used with a lot of values, say hundreds of them, or is also used if the values differ by numerous orders of magnitude. Colors can be used to represent an additional measure, and the bubbles can be subjected to animation in order to show data changes over a period of time.

Fig. 1.16 A Simple Bubble Plot (Annual Sales Chart – price in EUR)
The shaded portion inthe figure corresponds to S% level of significance, Hence the probability ofthe value of the satiate fllingin the critical region isthe level of significance. Dependingon the nature of the problem, we use a sngle-tl est or Joubl tail test to estimate the significance of a result. In a double-til test, the arca of both the tails of the curve representing the sampling distribution are taken into account whereas in the single til est only the areaon the right ofan ordinate fare taken into consideration. For instance, to test whether a coi i biased oF ‘not, double-tail test should be used, since a biased coin gives ether more number of heads than tails (which corresponds to righ al), of more qumber of tails than heads (which corresponds to lef tail only) shane Testo Stniseance Te proses wich enables us to decide “eter to ase or ret the hypothesis a he test of significance. re we test whether the differences between the sample values and the Population values (rhe values given by wo samples) a large that they lent evidence agains the hypothesis oF there ferences ae so sal 51 ecount for fluctuations of sampling ili) C i Se mal ila Lini- ong sn seaemal wih mea and standard itn oA 1.3 the sample expected to lie in the interval (1 1.968, 41+ 968) 0498" reject Ceti Region oe ¥ Fig, 1.23 limes ie, we time we canbe coin of ening i ts mal S| 94 2) 98 cases Besause of hs, we call 1.968, § + 1.968) the lence interval tor estimation of The ends of ths tena (¢- S41 hired arate 9% confidence itso fide imu foe S Similarly secon 24 comtdence iis, The mabe 136,258 sido naa elses. The valu ofeonfience oe ans coresponding significance can be obtained fom the normal une a 1 (VI-Sem) rriable (Small Samples) ~ \n practical probley, rat nt totam core hana See aerate ae omen 4. 
(iv) Sampling of Variables (Small Samples) – In practical problems we cannot always have large samples, and we often have to depend on small samples. Firstly, we cannot assume, as in the case of large samples, that the sampling distribution of a parameter is approximately normal. Secondly, the estimates of the parameters of the population made from a small sample are not reliable.

Q. Write short note on composite hypothesis.   (R.G.P.V., June 2007)

Ans. As the order of the integration method is increased, the order of the derivative in the error term associated with the method also increases. For a method to produce meaningful results, these higher order derivatives must remain continuous in the interval of interest. Also, Newton-Cotes type methods of higher order sometimes produce diverging results. An alternative for obtaining accurate results while using lower order methods is the use of composite integration methods. We subdivide the given interval [a, b] or [−1, 1] into a number of subintervals and evaluate the integral in each subinterval by a particular method. This is known as composite or multisegment integration.

Q.43. Write short note on critical region.   (R.G.P.V., June 2007)

Ans. The region of possible values of a statistic t is called the sample space. The part of the sample space which amounts to rejection of the null hypothesis H₀ is called the critical region or region of rejection.

If X = (x₁, x₂, ..., xₙ) is the random vector observed and W_c is the critical region (which corresponds to the rejection of the hypothesis according to a prescribed test procedure) of the sample space W, then

W_a = W − W_c

is called the acceptance region.

Q.44. Write short note on most powerful critical region.   (R.G.P.V., Dec. 2006)

Ans.
In testing the hypothesis H₀ : θ = θ₀ against the alternative H₁ : θ = θ₁, a critical region is best if the type II error is minimum, or the power is maximum, when compared to every other possible critical region of size α. A test defined by this critical region is called the most powerful test.

Q.45. What do you mean by probability distribution and discrete probability distribution?

Ans. Probability Distribution – We have explored the idea of probability; now we can consider the concept of a probability distribution. In situations where the variable being studied is a random variable, it can often be modeled by a probability distribution. Simply put, a probability distribution is the collection of possible values that the variable can take, together with the probability with which each value is taken.

Discrete Probability Distributions – A discrete random variable assumes each of its values with a certain probability, i.e. each possible value of the random variable has an associated probability. Let X be a discrete random variable and let each value of the random variable have an associated probability, denoted p(x) = P(X = x), such that

x :     x₁   x₂   ...
p(x) :  p₁   p₂   ...

The function p(x) is known as the probability distribution of the random variable X if the following conditions are satisfied –
(i) p(x) ≥ 0 for all values x of X
(ii) Σ p(x) = 1.
p(x) is also referred to as the probability function or probability mass function.

Q.46. What do you understand by the term Binomial distribution?

Ans. Binomial distribution is a very simple discrete probability distribution because it models situations in which a single trial of some process or experiment can result in only one of two mutually exclusive outcomes (such trials are called Bernoulli trials, after the mathematician Bernoulli). We have already met examples of this distribution in the earlier discussion on probability.

To obtain the probability of an event happening exactly once, twice, thrice, ..., r times in n trials, suppose the probability of the happening of the event in one trial is p and the probability of its not happening is 1 − p = q. We suppose that in n trials the happening of the event A occurs r times and its failure n − r times. This may be shown as follows –
A A A ... A (r times)   Ā Ā ... Ā (n − r times)    ...(i)

If A denotes the happening of the event and Ā its failure, then P(A) = p and P(Ā) = q.

We see that relation (i) has the probability

p·p·p ... p (r times) · q·q ... q (n − r times) = pʳqⁿ⁻ʳ    ...(ii)

Clearly, relation (i) is merely one order of arranging r A's.

∴ The probability of the event happening in some order = pʳqⁿ⁻ʳ × number of different arrangements of r A's and (n − r) Ā's.

The number of different arrangements of r A's and (n − r) Ā's is ⁿCᵣ.

∴ The probability of the happening of the event r times is

P(r) = ⁿCᵣ pʳqⁿ⁻ʳ    (r = 0, 1, 2, ..., n)

which is the (r + 1)th term in the expansion of (q + p)ⁿ.

If r = 0, the probability of the event happening 0 times = qⁿ.
If r = 1, the probability of the event happening 1 time = ⁿC₁ qⁿ⁻¹p.
If r = 2, the probability of the event happening 2 times = ⁿC₂ qⁿ⁻²p², and so on.

These terms are clearly the successive terms in the expansion of (q + p)ⁿ. Hence it is called the Binomial distribution.

Q.47. What is Binomial frequency distribution? Give the applications of Binomial distribution.

Ans. Binomial Frequency Distribution – If n independent trials constitute one experiment and this experiment is repeated N times, then the frequency of r successes is N·ⁿCᵣpʳqⁿ⁻ʳ. The possible numbers of successes, together with these expected frequencies, constitute the Binomial frequency distribution.

Applications of Binomial Distribution – Binomial distribution is applied to problems concerning –
(i) Number of defectives in a sample from a production line.
(ii) Estimation of the reliability of systems.
(iii) Number of rounds fired from a gun hitting a target.
(iv) Radar detection.

Q.48. Explain the term Poisson distribution. Also give the applications of it.

Ans. Poisson distribution is a distribution related to the probabilities of events which are extremely rare but which have a large number of opportunities to occur. It is a particular limiting form of the Binomial distribution, obtained by making n very large and p very small while keeping np fixed (= m, say).
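The limiting relationship just stated can be checked numerically; a sketch using only the standard library (the values m = 2, n = 10000 and r = 3 are illustrative assumptions) –

```python
import math

def binom_pmf(r, n, p):
    # Binomial probability of exactly r successes in n trials.
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)

def poisson_pmf(r, m):
    # Poisson probability e^(-m) m^r / r! with mean m = np.
    return math.exp(-m) * m ** r / math.factorial(r)

# With n large and p small while np = m stays fixed, the binomial
# probabilities approach the corresponding Poisson probabilities.
m, n, r = 2.0, 10000, 3
gap = abs(binom_pmf(r, n, m / n) - poisson_pmf(r, m))
```
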
The probability of r successes in a Binomial distribution is

P(r) = ⁿCᵣ pʳqⁿ⁻ʳ = [n(n − 1)(n − 2) ... (n − r + 1) / r!] pʳ(1 − p)ⁿ⁻ʳ
     = [np(np − p)(np − 2p) ... (np − (r − 1)p) / r!] (1 − p)ⁿ⁻ʳ

As n → ∞ and p → 0 with np = m fixed, each factor np − kp tends to m and (1 − p)ⁿ⁻ʳ tends to e⁻ᵐ, so that

P(r) = e⁻ᵐ mʳ / r!,    r = 0, 1, 2, ...

So the probabilities of r = 0, 1, 2, ... are given by e⁻ᵐ, m e⁻ᵐ, (m²/2!) e⁻ᵐ, ... The sum of these probabilities is unity, as it should be.

Applications of Poisson Distribution – Poisson distribution is applied to problems concerning –
(i) Arrival pattern of defective vehicles in a workshop, of patients in hospitals etc.
(ii) Demand pattern for certain spare parts.
(iii) Number of fragments from a shell hitting a target.
(iv) Spatial distribution of bomb hits.

Q.49. Define continuous distribution.

Ans. So far we have only dealt with discrete distributions, where the variate takes only discrete values. But variates like temperatures, heights and weights can take all numerical values in given intervals. Such variables are said to be continuous variables. Suppose f(x) is a continuous probability density function; then

Mean = ∫ x f(x) dx

the integral being taken over the range of the variate.

Q.50. What do you understand by the term normal distribution?   (R.G.P.V., May 2019)

Ans. Normal distribution is a continuous distribution. It is derived as the limiting form of the Binomial distribution for large values of n, with p and q not very small.

The normal distribution is given by the equation

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / 2σ²)    ...(i)

where μ = mean and σ = standard deviation. The probability that x lies between x₁ and x₂ is

P(x₁ < x < x₂) = ∫ from x₁ to x₂ of f(x) dx

On putting z = (x − μ)/σ in relation (i), we get

f(z) = (1/√(2π)) e^(−z²/2)    ...(ii)

Here mean = 0 and standard deviation = 1. Relation (ii) is known as the standard form of the normal distribution.

Note – The moment generating function of a continuous probability distribution about x = a is given by

M_a(t) = ∫ e^(t(x − a)) f(x) dx

Applications – Normal distribution is applied to problems concerning –
(i) Calculation of errors made by chance in experimental measurements.
(ii) Computation of the hit probability of a shot.
(iii) Statistical inference in almost every branch of science.

Q.51. Write short note on lognormal distribution.

Ans. The distribution of a random variable whose natural logarithm follows a normal distribution is called the lognormal distribution.
The lognormal density function is given by

f(x) = (1/(x σ √(2π))) e^(−(log x − μ)²/2σ²)

where the range of the random variable is x > 0, and the parameters are defined as μ = E(log X) and σ² = V(log X); the mean of X itself is E(X) = e^(μ + σ²/2). The lognormal distribution describes quantities such as the weights of species of animals, the incubation period of infectious diseases, the concentrations of chemical elements in geological materials, and many other random phenomena occurring in both the social and natural sciences.

Q.52. Explain the term frequency distribution.

Ans. A frequency distribution is an arrangement of data according to the number of observations (called the frequency) possessing the individual or grouped values of the variable.

Grouped Frequency Distribution – A grouped frequency distribution divides the range of the given observations on a discrete or a continuous variable into intervals and distributes the frequencies over these intervals.

Q.53. Explain the rectangular (uniform) distribution.

Ans.
Fig. 1.24 Graph of Rectangular Distribution

This distribution is so called since the curve y = f(x) describes a rectangle over the x-axis between the ordinates at x = a and x = b. This implies that X is a continuous variable. Hence, if X is a rectangular variate in the range [a, b], we have

∫ from a to b of f(x) dx = 1,  i.e.,  f(x) ∫ from a to b of dx = 1, as f(x) is constant

∴ f(x) = 1/(b − a)

Thus a rectangular distribution is given by the probability function

f(x) = 1/(b − a),  a ≤ x ≤ b;  f(x) = 0 otherwise

Fig. 1.25 Density Function

Cor. 1. The probability that an observation falls in any interval within a ≤ x ≤ b is 1/(b − a) times the length of the interval. Suppose (c, d) is the new interval, so that

P(c ≤ x ≤ d) = ∫ from c to d of dx/(b − a) = (d − c)/(b − a)

Q.54. Explain in detail about the term data preprocessing.

Ans. Data preprocessing is an important step in many machine learning problems that aims to transform the raw input features into a form that is easily interpretable by a machine. The most common techniques are standardization and whitening.
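The standardization step just mentioned (subtract the mean, then divide by the standard deviation) can be sketched on a toy feature column; the numbers are illustrative:

```python
def standardize(xs):
    """Zero-centre a feature column, then scale it to unit standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    centred = [x - mu for x in xs]                    # subtract the mean
    sigma = (sum(c * c for c in centred) / n) ** 0.5  # population std deviation
    return [c / sigma for c in centred]               # divide by the std deviation

feature = [2.0, 4.0, 6.0, 8.0]
scaled = standardize(feature)
```

After the transform the column has zero mean and unit variance, so every feature ends up on a comparable scale.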
Standardization – Standardization is the most common technique, and consists of mean subtraction followed by scaling by the standard deviation. The reason for mean subtraction is that non-zero-mean input data creates a loss surface that is steep in some directions and shallow in others, which slows down convergence. Likewise, a difference in spread along different directions negatively affects the convergence rate. Mean subtraction can be formalized as

X^(c) = X − μ

where X denotes the data, μ the population mean, and X^(c) the zero-centred data. Mean subtraction has the geometric interpretation of centring the cloud of data around the origin along every dimension, as shown in fig. 1.26 (b).

(a) Original Data  (b) Zero Centred Data  (c) Standardized Data
Fig. 1.26 Visualization of the Standardization Transform

Normalization refers to altering the data dimensions such that they are of the same scale. This is commonly achieved by dividing each dimension by its standard deviation once it has been zero centred, as in

X^(s) = X^(c) / σ,   where σ = √((1/n) Σ_i (x_i − μ)²)

denotes the population standard deviation and X^(s) the standardized data. Dividing by the standard deviation has the geometric interpretation of altering the spread of the data such that the data dimensions are proportional to each other.

Whitening – It is sometimes not enough to centre and scale the features independently using the standardization process, since a downstream model can further make assumptions on the linear independence of the features. To address this issue, we can make use of the whitening transformation to further remove the linear correlation across features. There are many possible ways to obtain a whitening transformation, such as zero component analysis (ZCA) or principal component analysis (PCA). PCA is also widely used in machine learning for the purpose of dimensionality reduction. The whitening transformation is a two-step process that involves decorrelation using the computed eigenvectors, and subsequent
scaling of the decorrelated data with the eigenvalues. The first step of PCA whitening is to perform the singular value decomposition (SVD) of the covariance matrix, as in

U, S, Uᵀ = SVD(Σ)

where Σ denotes the covariance matrix, U its eigenvectors, and S a diagonal matrix containing the eigenvalues along its diagonal.

For classification algorithms, normalizing the input values for each attribute measured in the training tuples helps speed up the learning phase. For distance-based methods, normalization helps prevent attributes with initially large ranges from outweighing attributes with initially smaller ranges. Some normalization methods are as follows –

(i) Min-max Normalization – It performs a linear transformation on the original data. Suppose that min_A and max_A are the minimum and maximum values of an attribute, A. Min-max normalization maps a value, v, of A to v′ in the range [new_min_A, new_max_A] by computing

v′ = ((v − min_A) / (max_A − min_A)) (new_max_A − new_min_A) + new_min_A

Min-max normalization preserves the relationships among the original data values. It will encounter an "out-of-bounds" error if a future input case for normalization falls outside of the original data range for A.

(ii) Z-score Normalization – It is also called zero-mean normalization. In this normalization the values for an attribute, A, are normalized based on the mean and standard deviation of A. A value, v, of A is normalized to v′ by computing

v′ = (v − Ā) / σ_A

where Ā and σ_A are the mean and standard deviation, respectively, of attribute A. This method of normalization is useful when the actual minimum and maximum of attribute A are unknown, or when there are outliers that dominate the min-max normalization.

(iii) Decimal Scaling Normalization – This method normalizes by moving the decimal point of the values of attribute A. The number of decimal points moved depends on the maximum absolute value of A. A value, v, of A is normalized to v′ by computing

v′ = v / 10^j

where j is the smallest integer such that max(|v′|) < 1.

Q.57. Write and explain different types of machine learning.

Ans. The different types of machine learning are shown in fig. 1.28.
Fig. 1.28 Types of Machine Learning

(i) Supervised Learning – In this type of learning, the machine is given a set of inputs together with their desired outputs. The machine needs to study those given sets of inputs and outputs and find a general function that maps inputs to desired outputs.

(ii) Unsupervised Learning – This type of learning is termed 'learning on its own', by discovering and adapting based on the input pattern. In this learning the data are divided into different clusters, and hence the learning is called a clustering algorithm.

(iii) Semi-supervised Learning – This learning is used for the same applications as supervised learning, but it uses both labeled and unlabeled data for training. This type of learning can be used with methods such as classification, regression and prediction. Semi-supervised learning is useful when the cost associated with labeling is too high to allow for a fully labeled training process. Early examples of this include identifying a person's face on a web cam.

(iv) Reinforcement Learning (RL) – In this type of learning, the machine is trained to take specific decisions based on the business requirement, with the objective to maximize efficiency (performance). This continual learning requires less participation of human expertise and saves more time. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. Reinforcement learning is often used for robotics, gaming and navigation.

Q.58. What do you mean by supervised learning ?

Ans. Supervised learning means learning from examples, where a training set is given which acts as examples for the classes. The system finds a description of each class. Once the description has been formulated, it is used to predict the class of previously unseen objects. This is similar to discriminant analysis in statistics. Supervised learning is the machine learning task of inferring a function from available training data. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
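As a toy illustration of inferring a mapping from labeled examples, here is a minimal nearest-neighbour classifier in plain Python; the points and labels are made up for the sketch:

```python
def nearest_neighbour(train, query):
    """Predict the label of `query` as the label of the closest training point."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(train, key=lambda pair: sq_dist(pair[0], query))
    return closest[1]

# Labeled training set: (feature vector, class label)
train = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"), ((5.0, 5.0), "dog")]
label = nearest_neighbour(train, (0.9, 1.1))   # a previously unseen object
```

The 'description of the class' here is implicit, namely the stored examples themselves, but the effect is the same: previously unseen inputs are mapped to one of the known classes.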
Common examples of supervised learning include –
(i) Classifying e-mails as spam
(ii) Labeling webpages based on their content
(iii) Voice recognition.

There are many supervised learning algorithms, such as SVMs (support vector machines), naive Bayes classifiers, neural networks and decision trees.

Fig. 1.29 Supervised Learning

Q.59. Describe the problems and issues in supervised learning.

Ans. There are six issues to be taken into account while dealing with supervised learning, as follows –

(i) Heterogeneity of Data – Many algorithms, like neural networks and support vector machines, require their feature vectors to be homogeneous, numeric and normalized. Algorithms that employ distance metrics are very sensitive to this, and hence if the data is heterogeneous, these methods should be an afterthought. Decision trees can handle heterogeneous data very easily.

(ii) Redundancy of Data – If the data contains redundant information, i.e., contains highly correlated values, then it is useless to use distance-based methods because of numerical instability. In this case, some sort of regularization can be employed on the data to prevent this situation.

(iii) Dependent Features – If there is some dependence between the feature vectors, then algorithms that monitor complex interactions, like neural networks and decision trees, fare better than other algorithms.

(iv) Bias-variance Tradeoff – A learning algorithm is biased for a particular input x if, when trained on different but equally good data sets, it is systematically incorrect while predicting the correct output for x. A learning algorithm has a high variance for a particular input x if it provides different outputs when trained on different training sets. Generally, there is a tradeoff between bias and variance, and a good learning approach is able to adjust this tradeoff.

(v) Amount of Training Data and Function Complexity – The amount of data required depends on the complexity of the function to be learned. A simple function with low complexity can be learned by the learning algorithm from a small amount of data. When, on the other hand, the function is of high complexity, the learning algorithm requires a large amount of data.
(vi) Dimensionality of the Input Space – If the input feature vectors have high dimension, then the learning problem can be difficult even if the true function only depends on a small number of features. This is because the many "extra" dimensions can confuse the learning algorithm and cause it to have high variance. Hence, high input dimensionality typically requires tuning the classifier to have low variance and high bias.

Q.60. What do you mean by unsupervised learning ?

Ans. Unsupervised learning is learning from observation and discovery. In this mode of learning, there is no training set or prior knowledge of the classes. The system analyzes the given set of data to observe similarities emerging out of the subsets of the data. The outcome is a set of class descriptions, one for each class, discovered in the environment. This is similar to cluster analysis in statistics.

Fig. 1.30 Unsupervised Learning

Unsupervised learning makes sense of unlabeled data without having any predefined data set for its training. It is an extremely powerful tool for analyzing the available data and looking for patterns and trends. It groups similar input into logical groups. Common
This is because it can be expensive or}ime-consuming to label data as it may require access to domain experts, 'eteas unlabeled data is cheap and easy to collect and store, We can use unsupervised learning techniques to discover and learn the Structure in the input variables. We can also use supervised learning techniques ‘o make best guess predictions for the unlabeled data, feed that data back into the supervised learning algorithm as training data and use the model to make predictions on new unseen data. Y°2. Differences between Supervised and unsupervised learning. [R.GPY., May 2019 (VIII-Sem.)] and unsupervised learning are as Ans, Differences follows — between supervised Supervised Learning Unsupervised Learning Knowledge of out put learnin, with presence of . No knowledge of output class Data is lab, elles an expert, or value. re celled with a class Data is unlabelled or value Its goal is to pr unknown, Predict class . . Value label, cessor Its goal is to determine data Examples patterns, Neural netw. SVM decision tre ote Baveci Examples — k-means, genet! | Tees ee, Ba Classifiers, etc, sealed : hes algorithms, clustering appro ete,
