
Descriptive Statistics

London School of Commerce
Quantitative Methods for Business Decisions (Lecture Notes 01)
Instructor: Mohammad Moniruzzaman Bhuiya
E-mail: [email protected]

Q. What are Statistics?
• Procedures for organizing, summarizing, and interpreting information
• Standardized techniques used by scientists
• Vocabulary and symbols for communicating about data
• Two main branches: descriptive statistics and inferential statistics

Descriptive statistics
• Tools for summarising, organising, and simplifying data
  • Tables and graphs
  • Measures of central tendency
  • Measures of variability
• Examples:
  • Average rainfall in Manchester last year
  • Number of car thefts last year
  • Your test results
  • Percentage of males in our class

Inferential statistics
• Inference is the process of drawing conclusions or making decisions about a population based on sample results
• Data from a sample are used to draw inferences about the population
• Generalising beyond the actual observations, from a sample to a population

Statistical terms
• Population: the complete set of individuals, objects, or measurements
• Sample: a subset of a population
• Variable: a characteristic which may take on different values
• Data: the numbers or measurements collected
• Parameter: a characteristic of a population, e.g., the average height of all Britons
• Statistic: a characteristic of a sample, e.g., the average height of a sample of Britons
• Σ: this symbol (called sigma) means ‘add everything up’. So, if you see something like $\sum x_i$, it just means ‘add up all of the scores you’ve collected’.
• Π: this symbol (capital pi) means ‘multiply everything’. So, if you see something like $\prod x_i$, it just means ‘multiply all of the scores you’ve collected’.

Data
• There are two general types of data.
• Quantitative data is information about quantities; that is, information that can be measured and written down with numbers.
Some examples of quantitative data are your height, your shoe size, and the length of your fingernails.
• Qualitative data is information about qualities; that is, information that cannot be measured with numbers. Some examples of qualitative data are the softness of your skin, the grace with which you run, and the color of your eyes.

Qualitative Data
Overview:
• Deals with descriptions.
• Data can be observed but not measured.
• Colors, textures, smells, tastes, appearance, beauty, etc.
• Qualitative → Quality

Quantitative Data
Overview:
• Deals with numbers.
• Data which can be measured.
• Length, height, area, volume, weight, speed, time, temperature, humidity, sound levels, cost, members, ages, etc.
• Quantitative → Quantity

Example: Oil Painting
Qualitative data:
• blue/green color, gold frame
• smells old and musty
• texture shows brush strokes of oil paint
• peaceful scene of the country
• masterful brush strokes
Quantitative data:
• picture is 10" by 14"
• with frame 14" by 18"
• weighs 8.5 pounds
• surface area of painting is 140 sq. in.
• cost $300

Nominal Data
Nominal data are discrete categories with no inherent order, such as the name of your school, the type of car you drive, or the name of a book. This one is easy to remember because nominal sounds like name (they have the same Latin root).

Ordinal Data
Ordinal data are values that have a natural ordering. Examples include the ranking of favorite sports, the order of people's places in a line, the order of runners finishing a race or, more often, the choice on a rating scale from 1 to 5.

Interval Data
Interval data is like ordinal data, except that the intervals between adjacent values are equal. The most common example is temperature in degrees Fahrenheit. The difference between 29 and 30 degrees is the same magnitude as the difference between 78 and 79.

Ratio Data
Ratio data is interval data with a natural zero point. For example, time is ratio data, since a time of 0 is meaningful.
The Kelvin temperature scale also has a true zero point (absolute zero), and the steps on both of these scales are equal in size.

Central value
• Unfortunately, no single measure of central tendency works best in all circumstances
• Nor will the different measures necessarily give you the same answer

Mean
• Gives information about the average or typical score in a set of scores
• The mean is a measure of central value
• What most people mean by “average”
• Sum of a set of numbers divided by the number of numbers in the set:
  $$\frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10} = 5.5$$
• Arithmetic average:
  $$\bar{X} = \frac{\sum X}{n}$$
• If $X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]$, then
  $$\bar{X} = \frac{\sum X}{n} = \frac{1 + 2 + 3 + \dots + 10}{10} = \frac{55}{10} = 5.5$$

Median
• Middlemost or most central item in the set of ordered numbers; it separates the distribution into two equal halves
• If n is odd, the median is the middle value of the ordered sequence
  • if X = [1, 2, 4, 6, 9, 10, 12, 14, 17] then 9 is the median
• If n is even, the median is the average of the two middle values
  • if X = [1, 2, 4, 6, 9, 10, 11, 12, 14, 17] then 9.5 is the median; i.e., (9 + 10)/2
• The median is not affected by extreme values

Quartiles
• Split the ordered data into 4 quarters, separated by Q1, Q2, and Q3
• Q1 = first quartile
• Q2 = second quartile = median
• Q3 = third quartile

Mode
• The mode is the most frequently occurring number in a distribution
  • if X = [1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17] then 7 is the mode
• Easy to see in a simple frequency distribution
• Possible to have no mode or more than one mode (bimodal and multimodal)
• The modes do not have to have exactly equal frequency (major mode, minor mode)
• The mode is not affected by extreme values

When to Use What
The mean is a great measure, but there are times when its use is inappropriate or impossible.
• Nominal data: mode
• The distribution is bimodal: mode
• You have ordinal data: median or mode
• There are a few extreme scores: median

[Figure: Mean, Median, Mode]

Dispersion
• How tightly clustered or how variable the values are in a data set.
• Example
  • Data set 1: [0, 25, 50, 75, 100]
  • Data set 2: [48, 49, 50, 51, 52]
  • Both have a mean of 50, but data set 1 clearly has greater variability than data set 2.
• The range is one measure of dispersion
  • The range is the difference between the maximum and minimum values in a set. These notes use the inclusive form, which adds 1 to that difference:
    $$\text{Range} = (X_{\text{largest}} - X_{\text{smallest}}) + 1$$
  • Example
    • Data set 1: [1, 25, 50, 75, 100]; R = 100 - 1 + 1 = 100
    • Data set 2: [48, 49, 50, 51, 52]; R = 52 - 48 + 1 = 5
  • The range ignores how the data are distributed and takes only the extreme scores into account
• The inter-quartile range is the difference between the third and first quartiles
  $$\text{Inter-quartile range} = Q_3 - Q_1$$
  • Spread in the middle 50%
  • Not affected by extreme values

Variance and standard deviation
• The standard deviation (SD) and the variance are measures of how the data are distributed about the mean.
• The larger the SD and the variance, the more spread out the data are.
• Neither the SD nor the variance shrinks systematically as the sample size increases; both estimate the spread of the population. What decreases as the sample grows is the standard error of the mean, $s/\sqrt{n}$.
• The SD is also used when deciding how large a sample needs to be for a test to give reliable results.

Variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
• A measure of the spread of the recorded values on a variable; a measure of dispersion.
• The larger the variance, the further the individual cases are from the mean.
• The smaller the variance, the closer the individual scores are to the mean.

Standard deviation of a sample:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
• Let $X = [3, 4, 5, 6, 7]$
• Mean: $\bar{X} = 5$
• $(X - \bar{X}) = [-2, -1, 0, 1, 2]$: subtract the mean, $\bar{X}$, from each number in $X$
• $(X - \bar{X})^2 = [4, 1, 0, 1, 4]$: square each of the deviations
• $\sum (X - \bar{X})^2 = 10$: sum all of the squared deviations
• $\sum (X - \bar{X})^2 / (n - 1) = 10 / (5 - 1) = 2.5$: divide the sum by $n - 1$, where $n$ is the number of observations (this is called the “variance”)
• $s = \sqrt{2.5} = 1.58$: take the square root of the variance to get the standard deviation.
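As a rough check, the measures of central value and dispersion covered so far can be reproduced in a few lines of Python. This is a minimal sketch and not part of the original notes: the helper names (`simple_range`, `inclusive_range`, `sample_variance`, `sample_sd`) are hypothetical, and the data are the example sets used in the notes.

```python
import math
import statistics

# Central value, using the example data from the notes.
scores = list(range(1, 11))                  # [1, 2, ..., 10]
mean_scores = sum(scores) / len(scores)      # sum of values / number of values

odd_set = [1, 2, 4, 6, 9, 10, 12, 14, 17]
even_set = [1, 2, 4, 6, 9, 10, 11, 12, 14, 17]
median_odd = statistics.median(odd_set)      # middle value of the ordered data
median_even = statistics.median(even_set)    # average of the two middle values

mode_set = [1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17]
modes = statistics.multimode(mode_set)       # a list, so tied modes would all appear


def simple_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def inclusive_range(values):
    """Range as computed in the notes: (largest - smallest) + 1."""
    return simple_range(values) + 1


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data_set_1 = [1, 25, 50, 75, 100]
data_set_2 = [48, 49, 50, 51, 52]
worked = [3, 4, 5, 6, 7]

print(mean_scores)                    # 5.5
print(median_odd, median_even)        # 9 9.5
print(modes)                          # [7]
print(inclusive_range(data_set_1))    # 100
print(inclusive_range(data_set_2))    # 5
print(sample_variance(worked))        # 2.5
print(round(sample_sd(worked), 2))    # 1.58
```

The variance is written out step by step so that each line mirrors one step of the worked example. Python's standard library also provides `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor.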
Symmetry
• Skew: asymmetry
• Kurtosis: peakedness or flatness

Symmetrical vs. Skewed Frequency Distributions
• Symmetrical distribution
  • Approximately equal numbers of observations above and below the middle
• Skewed distribution
  • One side is more spread out than the other, like a tail
  • Direction of the skew
    • Positive or negative (right or left)
    • Side with the fewer scores
    • Side that looks like a tail

[Figure: Symmetrical vs. Skewed; panels labelled "Positively skewed" and "Negatively skewed"]

Statistical graphs of data
• A picture is worth a thousand words!
• Graphs for numerical data:
  • Histograms
  • Frequency polygons
  • Pie charts
• Graphs for categorical data:
  • Bar graphs
  • Pie charts
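The direction of skew described above can also be seen without plotting software, by counting values into equal-width bins, which is what a histogram draws. The following is a minimal sketch under stated assumptions: `frequency_table` is a hypothetical helper, and `sample` is a made-up data set that does not come from the notes.

```python
def frequency_table(values, bin_width, start=0):
    """Count values into equal-width bins [lower, upper), as a histogram does."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for value in values:
        counts[int((value - start) // bin_width)] += 1
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical scores, invented for illustration: most are low, a few are high.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 14]
table = frequency_table(sample, bin_width=5)

for lower, upper, count in table:
    # One '#' per observation gives a sideways text histogram.
    print(f"{lower:>2} to {upper:<2} | {'#' * count}")
```

With this sample the three bins hold 8, 3, and 1 observations. The side with the fewer scores is the high end, so the tail points right and the distribution is positively skewed.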