Introduction To Statistics
What is Information?
Arithmetic Mean
In mathematics and statistics, the arithmetic mean, or simply the
mean or average when the context is clear, is the sum of a
collection of numbers divided by the count of numbers in the
collection. The arithmetic mean is the simplest and most widely
used measure of an average. It simply involves taking the
sum of a group of numbers, then dividing that sum by the count of
the numbers used in the series.
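The definition above can be sketched in a few lines of Python; the data values here are illustrative.

```python
# Arithmetic mean: the sum of the values divided by their count.
def arithmetic_mean(values):
    return sum(values) / len(values)

data = [4, 8, 15, 16, 23, 42]
print(arithmetic_mean(data))  # 18.0  (sum 108 over 6 values)
```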
Median
The median is the middle number in a sorted (ascending or
descending) list of numbers and can be more descriptive of
that data set than the average. The median is sometimes used
instead of the mean when there are outliers in the sequence that
might skew the average of the values.
In statistics and probability theory, a median is a value separating
the higher half from the lower half of a data sample, a population
or a probability distribution. For a data set, it may be thought of as
"the middle" value.
Mode
The mode is the number that appears most frequently in a data
set. A set of numbers may have one mode, more than one mode,
or no mode at all. The mode of a set of data values is the value
that appears most often. If X is a discrete random variable, the
mode is the value x at which the probability mass function takes
its maximum value. In other words, it is the value that is most
likely to be sampled.
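Since a data set may have one mode or several, a small helper can return every value that attains the maximum frequency; this is an illustrative sketch, not a standard library routine.

```python
from collections import Counter

def modes(values):
    # Count occurrences, then keep every value tied for the top count,
    # because a data set may be multimodal.
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]  (two modes)
print(modes([5, 5, 7]))           # [5]
```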
Combined mean
A combined mean is simply a weighted mean, where
the weights are the size of each group. For two or more
groups: add the means of each group, each weighted by the
number of individuals or data points; then divide the sum from Step 1
by the total number of individuals (or data points).
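The two steps above translate directly into code; the group means and sizes below are illustrative numbers.

```python
def combined_mean(means, sizes):
    # Step 1: weight each group's mean by its group size and add.
    # Step 2: divide by the total number of individuals.
    total = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total

# Group A: mean 70 over 20 students; Group B: mean 80 over 30 students.
print(combined_mean([70, 80], [20, 30]))  # 76.0
```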
Measures of dispersion
The measures of central tendency are not adequate to describe
data. Two data sets can have the same mean but they can be
entirely different. Thus to describe data, one needs to know the
extent of variability. This is given by the measures of dispersion.
Range, interquartile range, mean deviation and standard
deviation are the commonly used measures of dispersion.
Range - The range is the difference between the largest and the
smallest observation in the data.
Inter Quartile Range - The interquartile range is defined as the
difference between the 25th and 75th percentiles (also called the
first and third quartiles). Hence the interquartile range describes
the middle 50% of observations.
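Both measures can be computed with the standard library. Note that several conventions exist for locating quartiles; the `inclusive` method below interpolates between data points, and other methods give slightly different values.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: largest minus smallest observation.
data_range = max(data) - min(data)

# Quartiles via the 'inclusive' interpolation convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method='inclusive')
iqr = q3 - q1

print(data_range)  # 8
print(iqr)         # 4.0  (Q3 = 7, Q1 = 3)
```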
Standard Deviation
In statistics, the standard deviation is a measure of the amount of
variation or dispersion of a set of values. A low standard deviation
indicates that the values tend to be close to the mean of the set,
while a high standard deviation indicates that the values are
spread out over a wider range.
The main purpose of standard deviation is
to understand how spread out a data set is. A high standard
deviation implies that, on average, data points
are all pretty far from the mean (the data look spread out). A
low standard deviation means most points are very close to the
mean.
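A minimal sketch of the population standard deviation (dividing by n; the sample version divides by n - 1), with two illustrative data sets showing low versus high spread.

```python
import math

def population_std(values):
    # Square root of the mean squared deviation from the mean.
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

tight = [10, 11, 9, 10, 10]   # values clustered near the mean
spread = [0, 5, 10, 15, 20]   # values spread over a wider range

print(population_std(tight))   # small: points sit close to the mean
print(population_std(spread))  # large: points are far from the mean
```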
Mean Deviation
Also referred to as average deviation, it is defined as the sum of the
deviations (ignoring signs) from an average, divided by the number
of items in a distribution. The average can be the mean, median or
mode. Theoretically, the median is the best average of choice because the
sum of deviations from the median is minimum, provided signs are
ignored. However, practically speaking, the arithmetic mean is the
most commonly used average for calculating mean deviation.
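The definition can be sketched as follows, taking deviations from the arithmetic mean by default but allowing a median or mode to be passed in; the data values are illustrative.

```python
def mean_deviation(values, center=None):
    # Average of the absolute deviations from a chosen average
    # (the arithmetic mean by default).
    if center is None:
        center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

data = [2, 4, 6, 8, 10]
print(mean_deviation(data))  # 2.4  (absolute deviations from the mean, 6)
```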
What Is Skewness?
Correlation
Correlation is a statistical technique that can show
whether and how strongly pairs of
variables are related. Correlation is a statistic that measures
the degree to which two variables move in relation to each
other.
In finance, the correlation can measure the movement of a
stock with that of a benchmark index, such as the S&P 500.
Correlation measures association, but doesn't show whether x
causes y or vice versa, or whether the association is caused by a
third, perhaps unseen, factor.
Positive correlation is a relationship between two variables in
which both variables move in the same direction: when one
variable increases, the other increases, and
vice versa. For example, a positive correlation may be that the
more you exercise, the more calories you will burn.
Negative correlation is a relationship where one
variable increases as the other decreases, and vice versa.
The correlation coefficient ranges from -1 to +1: +1 indicates a
perfect positive correlation, -1 a perfect negative correlation,
and 0 no linear relationship.
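Pearson's correlation coefficient can be computed from first principles; the exercise/calorie numbers below are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    # Covariance of x and y divided by the product of their
    # standard deviations; the result always lies in [-1, +1].
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

exercise = [1, 2, 3, 4, 5]            # hours exercised (illustrative)
calories = [100, 210, 290, 405, 500]  # calories burned (illustrative)

print(pearson_r(exercise, calories))        # close to +1: move together
print(pearson_r(exercise, calories[::-1]))  # close to -1: move oppositely
```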
Regression analysis
Time Series
Index numbers.
Index numbers are meant to study the changes in the effects
of such factors which cannot be measured directly. For example,
the general price level is an imaginary concept and is not capable
of direct measurement. But, through the technique of index
numbers, it is possible to have an idea of relative changes in the
general level of prices by measuring relative changes in the price
level of different commodities.
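One simple way to express such relative changes is an aggregate price index: total current prices as a percentage of total base-year prices. The commodities and prices below are illustrative, and real index numbers often weight commodities by quantity as well.

```python
# Simple (unweighted) aggregate price index.
base_prices = {'wheat': 20, 'rice': 30, 'oil': 50}     # base year
current_prices = {'wheat': 25, 'rice': 33, 'oil': 62}  # current year

index = 100 * sum(current_prices.values()) / sum(base_prices.values())
print(round(index, 1))  # 120.0 -> the general price level rose 20%
```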
Probability
Probability is the branch of mathematics concerning numerical
descriptions of how likely an event is to occur or how likely it is
that a proposition is true. The probability of an event is a number
between 0 and 1, where, roughly speaking, 0 indicates
impossibility of the event and 1 indicates certainty.
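For equally likely outcomes, the classical probability of an event is the number of favorable outcomes over the total number of outcomes, which always lands between 0 and 1; a die-rolling example:

```python
from fractions import Fraction

def probability(favorable, total):
    # Classical probability: favorable outcomes over total outcomes.
    return Fraction(favorable, total)

# Rolling an even number on a fair six-sided die: 3 of the 6 outcomes.
p = probability(3, 6)
print(p)         # 1/2
print(float(p))  # 0.5  (between 0 and 1)
```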
Probability Tree
The tree diagram helps to organize and visualize the different
possible outcomes. Branches and ends are the tree's two main
positions: the probability of each branch is written on the branch,
while the ends contain the final outcomes.
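The key rule of a probability tree is that the probability of an end outcome is the product of the branch probabilities along its path. A two-coin-toss tree makes this concrete:

```python
from itertools import product

branch = {'H': 0.5, 'T': 0.5}  # each branch of the tree carries 0.5

ends = {}
for path in product(branch, repeat=2):  # HH, HT, TH, TT
    p = 1.0
    for step in path:
        p *= branch[step]  # multiply probabilities along the branches
    ends[''.join(path)] = p

print(ends)                # {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}
print(sum(ends.values()))  # 1.0  (the ends cover all possible outcomes)
```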
Mutually exclusive events
Two events are mutually exclusive if they cannot both occur at the
same time.
Examples:
Turning left and turning right are Mutually Exclusive (you
can't do both at the same time)
Tossing a coin: Heads and Tails are Mutually Exclusive
Cards: Kings and Aces are Mutually Exclusive
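Because mutually exclusive events cannot occur together, their probabilities simply add: P(A or B) = P(A) + P(B). The card example above in code:

```python
# A standard deck has 4 Kings and 4 Aces among 52 cards, and no card
# is both, so the events are mutually exclusive.
p_king = 4 / 52
p_ace = 4 / 52

p_king_or_ace = p_king + p_ace  # addition rule for exclusive events
print(round(p_king_or_ace, 4))  # 0.1538  (i.e. 8/52 = 2/13)
```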
Independent events
Dependent events
Data classification
Pie Diagram
A pie chart is a circular statistical graphic, which is divided into
slices to illustrate numerical proportion. In a pie chart, the arc
length of each slice is proportional to the quantity it represents.
Uses of Pie diagram:
1. Pie diagram is useful when one wants to picture proportions of
the total in a striking way.
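The proportionality behind a pie diagram reduces to a one-line computation: each category's slice spans its share of 360 degrees. The sales figures below are illustrative.

```python
# Each slice's angle is its value's share of the total, times 360 degrees.
sales = {'North': 50, 'South': 30, 'East': 20}
total = sum(sales.values())

angles = {region: 360 * value / total for region, value in sales.items()}
print(angles)  # {'North': 180.0, 'South': 108.0, 'East': 72.0}
```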
Merits of a Graph
Sampling
Limitations of sampling
The first is called a Type I error. This occurs when the researcher
concludes that a relationship exists when in fact it does not; a
true null hypothesis is rejected. In a Type I error, the researcher
should accept the null hypothesis and reject the research
hypothesis, but the opposite occurs. The probability of committing
a Type I error is called alpha.
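Alpha can be estimated by simulation: generate data where the null hypothesis really is true (a fair coin) and count how often a rejection rule fires anyway. The rejection cutoff below (|heads - 50| >= 11 in 100 flips) is an illustrative choice, not a standard test procedure.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def reject_null(flips=100):
    # Flip a fair coin; reject the (true) null "the coin is fair"
    # when the head count strays far from 50.
    heads = sum(random.random() < 0.5 for _ in range(flips))
    return abs(heads - 50) >= 11

trials = 2000
false_rejections = sum(reject_null() for _ in range(trials))
print(false_rejections / trials)  # the estimated alpha: a small fraction
```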