Definition of Statistical Terms

www.edmodo.com

GLOSSARY OF STATISTICAL TERMS


Statistics

Plural sense: a set of numerical data

Singular sense: branch of science which deals with the collection, presentation,
analysis, and interpretation of data.

Population – a collection of all the elements under consideration in any statistical
study.

Sample – a part (or subset) of the population from which information is collected.

Parameter - a numerical characteristic of the population.

Statistic – a numerical characteristic of the sample.

Example of parameter and statistic:

In order to estimate the true proportion of employees at a certain office who smoke
cigarettes, the personnel department polled a sample of 200 employees and determined
that the proportion of employees from the sample who smoke cigarettes is 12%. (The
parameter is the true, unknown proportion of all employees at the office who smoke;
12% is the statistic. The number 200 is the sample size, not a parameter.)
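
The distinction can be illustrated with a short simulation. This is a minimal sketch under stated assumptions. The office size (1,000 employees) and the true smoking rate (13%) are invented purely so that a parameter exists to compare against. In a real study the population value is unknown, which is the reason for sampling in the first place.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population (assumed values): 1,000 employees, 1 = smokes, 0 = does not.
# A real personnel department would not know this full list.
population = [1] * 130 + [0] * 870

# Parameter: a numerical characteristic computed from every element of the population.
p = sum(population) / len(population)

# Statistic: the same characteristic computed from a sample of 200 employees.
sample = random.sample(population, 200)
p_hat = sum(sample) / len(sample)

print(f"parameter p     = {p:.2%}")      # fixed for this population: 13.00%
print(f"statistic p_hat = {p_hat:.2%}")  # depends on which 200 employees were drawn
```

Re-running with a different seed changes p_hat but never p. The statistic varies from sample to sample, while the parameter is a fixed property of the population. The value 200 enters only as the sample size.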

AREAS OF STATISTICS

Descriptive Statistics – comprise those methods concerned with collecting, describing,
and analyzing a set of data without drawing conclusions or inferences about a large
group.

Inferential Statistics – comprise those methods concerned with the analysis of sample
data leading to predictions or inferences about the population.

DATA – refers to the information collected, organized, analyzed, and interpreted by
researchers.

CLASSIFICATION OF DATA:

Qualitative – data that have labels or names assigned to their respective categories.

Examples:

Color – red, blue, yellow, green
Sex – male, female

Quantitative – any attribute that we measure in numbers.

Examples:

Weight, height, age, prices of commodities, etc.


Raw Data – data in original form

Array – data arranged either from highest to lowest or from lowest to highest.

Example:

Raw data: 7, 15, 4, 20, 12, 10, 15, 12, ...
Array data: 4, 7, 10, 12, 12, 15, 15, 20, ...
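
Turning raw data into an array is simply a sort. Here is a minimal Python sketch that uses only the eight raw scores actually listed. Whatever values the "..." stands for are left out.

```python
# The raw scores listed above, in their original (unsorted) order.
raw = [7, 15, 4, 20, 12, 10, 15, 12]

ascending = sorted(raw)                 # array from lowest to highest
descending = sorted(raw, reverse=True)  # array from highest to lowest

print(ascending)   # [4, 7, 10, 12, 12, 15, 15, 20]
print(descending)  # [20, 15, 15, 12, 12, 10, 7, 4]
```

`sorted()` returns a new list and leaves `raw` untouched, so the data in original form are kept alongside the array. Repeated scores (12 and 15) appear in the array as often as they occur in the raw data.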

Variable – a characteristic or attribute of persons or objects which can assume different
values for different persons or objects.

- property of an object or event that can take on different values.  For example,
college major is a variable that takes on values like mathematics, computer
science, English, psychology, etc.

o Discrete Variable - a variable with a limited number of values (e.g.,
gender (male/female), college class (freshman/sophomore/junior/senior)).
- values can be counted: 1, 2, 3 …
o Continuous Variable - a variable that can take on many different values,
in theory, any value between the lowest and highest points on the
measurement scale.
- weight, height, prices of commodities, etc.
o Independent Variable - a variable that is manipulated, measured, or
selected by the researcher as an antecedent condition to an observed
behavior.  In a hypothesized cause-and-effect relationship, the
independent variable is the cause and the dependent variable is the
outcome or effect.
o Dependent Variable - a variable that is not under the experimenter's
control -- the data.  It is the variable that is observed and measured in
response to the independent variable.

o Qualitative Variable - a variable based on categorical data.
o Quantitative Variable - a variable based on quantitative data.

Measurement – refers to the process of determining the value or label, either
qualitative or quantitative, of a particular variable for a particular unit of analysis.

LEVELS OF MEASUREMENT
1. Nominal level

• Numbers or symbols are used simply to classify an object, person, or
characteristic into categories
• The categories must be distinct, non-overlapping, and exhaustive
• Weakest level of measurement
Examples: Political affiliation (Liberal Party, Nacionalista Party, etc.)
Sex, Religion, Civil Status, etc.
2. Ordinal level – contains the properties of the nominal level, but the numbers
assigned to the categories of any variable may also be ranked or ordered in some
low-to-high manner.
Examples: Socio-Economic Status (High, Above Average, Average, Below Average,
Low)
Size of T-shirt (double extra large, extra large, large, medium, small)
3. Interval level

• Contains the properties of the ordinal level, but the distances between any two
numbers on the scale are of known sizes.
• Characterized by a common and constant unit of measurement.
• Units of measurement are arbitrary.
• The number zero does not imply the absence of the characteristic under
consideration (thus, the zero point is arbitrary).
Examples: Temperature in °C and °F, Intelligence Quotient (75, 100, 125, ...)
4. Ratio Level

• contains the properties of the interval level but it has a true zero point, that is, the
number zero indicates the absence of the characteristic under consideration
• strongest level of measurement
Examples: height, weight, tuition fees, price of commodities, etc.
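
The practical difference between the interval and ratio levels shows up when values are compared by division. The sketch below is illustrative. The conversion formulas are standard (the pound factor is rounded), and the particular temperatures and weights are arbitrary choices. It shows that a ratio of temperatures changes when the arbitrary zero moves (°C to °F), while a ratio of weights does not change with the unit, because weight has a true zero.

```python
def c_to_f(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit (shifts the zero point)."""
    return c * 9 / 5 + 32

def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds using the rounded factor 1 kg ≈ 2.20462 lb."""
    return kg * 2.20462

# Interval level: 20 °C is not "twice as hot" as 10 °C.
t1, t2 = 10.0, 20.0
print(t2 / t1)                  # 2.0 on the Celsius scale
print(c_to_f(t2) / c_to_f(t1))  # 1.36 on the Fahrenheit scale (68 / 50)

# Differences keep their meaning: a 10-degree gap in °C is an 18-degree gap in °F.
print(c_to_f(t2) - c_to_f(t1))  # 18.0

# Ratio level: 20 kg really is twice 10 kg, whatever the unit.
w1, w2 = 10.0, 20.0
print(w2 / w1)                      # 2.0 in kilograms
print(kg_to_lb(w2) / kg_to_lb(w1))  # 2.0 in pounds
```

The temperature ratio depends on where each scale happens to put its zero, so statements like "twice as much" are only meaningful at the ratio level.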

Graphs - visual display of data used to present frequency distributions so that the
shape of the distribution can easily be seen.

o Bar graph - a form of graph that uses bars separated by an arbitrary
amount of space to represent how often elements within a category
occur.  The higher the bar, the higher the frequency of occurrence.  The
underlying measurement scale is discrete (nominal or ordinal-scale data),
not continuous.

o Histogram - a form of bar graph used with interval or ratio-scaled data.
Unlike the bars of a bar graph, the bars in a histogram touch, with the width
of each bar defined by the upper and lower limits of its interval.  The
measurement scale is continuous, so the lower limit of any one interval is
also the upper limit of the previous interval.
o Boxplot - a graphical representation of dispersion and extreme scores.
Represented in this graphic are the minimum, maximum, and quartile scores
in the form of a box with "whiskers."  The box includes the range of scores
falling into the middle 50% of the distribution (Interquartile Range =
75th percentile - 25th percentile), and the whiskers are lines extended
either to the minimum and maximum scores in the distribution or to the most
extreme scores lying within mathematically defined lower and upper fences
(25th percentile - 1.5*IQR and 75th percentile + 1.5*IQR).
o Scatterplot - a form of graph that presents information from a bivariate
distribution.  In a scatterplot, each subject in an experimental study is
represented by a single point in two-dimensional space.  The underlying
scale of measurement for both variables is continuous (measurement
data).  This is one of the most useful techniques for gaining insight into the
relationship between two variables.
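
The two computations behind a histogram and a boxplot can be sketched in a few lines of Python. This is a minimal illustration, not a plotting routine. The helper names, the class width of 10, and the extra score of 45 are assumptions made for the example. `statistics.quantiles(..., method="inclusive")` (Python 3.8+) is only one of several quartile conventions, so hand calculations or other software may place the quartiles, and hence the fences, slightly differently.

```python
import statistics

# Hypothetical data: the eight raw scores from the array example plus one extreme score.
scores = [4, 7, 10, 12, 12, 15, 15, 20, 45]

def histogram_counts(data, start, width, n_bins):
    """Frequencies for contiguous class intervals [lower, upper).

    The upper limit of each interval is the lower limit of the next, which is
    why histogram bars touch. Scores outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)  # index of the interval containing x
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

def boxplot_summary(data):
    """Quartiles, IQR, 1.5*IQR fences, whisker ends and outliers for a boxplot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),  # most extreme scores within the fences
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

print(histogram_counts(scores, start=0, width=10, n_bins=5))  # [2, 5, 1, 0, 1]
print(boxplot_summary(scores))
```

With these data the quartiles are 10 and 15, so the IQR is 5 and the fences fall at 2.5 and 22.5. The whiskers therefore stop at 4 and 20, and the score 45 is flagged as an outlier rather than stretching the upper whisker.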
• Measures of Center - Plotting data in a frequency distribution shows the general
shape of the distribution and gives a general sense of how the numbers are
bunched.  Several statistics can be used to represent the "center" of the
distribution.  These statistics are commonly referred to as measures of central
tendency.
o Mode - The mode of a distribution is simply defined as the most frequent
or common score in the distribution.  The mode is the point or value
of X that corresponds to the highest point on the distribution.  If the highest
frequency is shared by more than one value, the distribution is said to
be multimodal.  It is not uncommon to see distributions that are bimodal,
reflecting peaks in scoring at two different points in the distribution.
o Median - The median is the score that divides the distribution into halves;
half of the scores are above the median and half are below it when the
data are arranged in numerical order.  The median is also referred to as
the score at the 50th percentile in the distribution.  The median
location of N numbers can be found by the formula (N + 1) / 2.  When N is
an odd number, the formula yields an integer that gives the position of the
median in the numerically ordered distribution.
(For example, in the distribution of numbers (3 1 5 4 9 9 8), the median
location is (7 + 1) / 2 = 4.  When applied to the ordered distribution (1 3 4 5
8 9 9), the 4th score, 5, is the median; three scores are above 5 and three are
below 5.)  When N is even, the location falls half-way between two scores.  If
there were only 6 values (1 3 4 5 8 9), the median location would be
(6 + 1) / 2 = 3.5.  In this case the median is half-way between the 3rd and
4th scores (4 and 5), or 4.5.
o Mean - The mean is the most common measure of central tendency and
the one that can be mathematically manipulated.  It is defined as the
average of a distribution and is equal to ΣX / N.  Simply, the mean is
computed by summing all the scores in the distribution (ΣX) and dividing
that sum by the total number of scores (N).  The mean is the balance point
in a distribution such that if you subtract the mean from each value in the
distribution and sum all of these deviation scores, the result will be zero.
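
The three measures can be computed directly from their definitions. Below is a minimal Python sketch. The function names are illustrative rather than a standard API, and the median follows the (N + 1) / 2 location rule described above rather than any particular library's convention. The example data are the ones used in the median discussion, plus a small made-up bimodal list.

```python
from collections import Counter
from fractions import Fraction

def modes(scores):
    """All values sharing the highest frequency; more than one means multimodal."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

def median(scores):
    """Median found with the (N + 1) / 2 location rule on the ordered scores."""
    ordered = sorted(scores)
    location = (len(ordered) + 1) / 2   # 1-based position; ends in .5 when N is even
    below = ordered[int(location) - 1]  # score at the whole-number part of the location
    if location.is_integer():
        return below
    above = ordered[int(location)]      # the next score up
    return (below + above) / 2          # half-way between the two middle scores

def mean(scores):
    """ΣX / N, kept as an exact fraction so the deviation check has no rounding error."""
    return Fraction(sum(scores), len(scores))

scores = [3, 1, 5, 4, 9, 9, 8]
print(modes(scores))               # [9]
print(modes([1, 2, 2, 3, 3]))      # [2, 3]  (bimodal)
print(median(scores))              # 5
print(median([1, 3, 4, 5, 8, 9]))  # 4.5
m = mean(scores)
print(m)                           # 39/7
print(sum(x - m for x in scores))  # 0  (deviations from the mean cancel out)
```

In practice, Python's standard `statistics` module already provides `median`, `mean`, and `multimode`. The hand-written versions are shown only to mirror the definitions in the text.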

• Measures of Spread - Although the average value in a distribution is informative
about how scores are centered in the distribution, the mean, median, and mode
lack context for interpreting those statistics.  Measures of variability provide
information about the degree to which individual scores are clustered about or
deviate from the average value in a distribution.
o Range - The simplest measure of variability to compute and understand is
the range.  The range is the difference between the highest and lowest
score in a distribution.  Although it is easy to compute, it is not often used
as the sole measure of variability due to its instability.  Because it is based
solely on the most extreme scores in the distribution and does not fully
reflect the pattern of variation within a distribution, the range is a very
limited measure of variability.
o Interquartile Range (IQR) - Provides a measure of the spread of the
middle 50% of the scores.  The IQR is defined as the 75th percentile - the
25th percentile.  The interquartile range plays an important role in the
graphical method known as the boxplot.  The advantage of using the IQR
is that it is easy to compute and is much less affected by extreme scores
in the distribution.  That strength is also a weakness: as a measure of
variability, the IQR discards too much data, since it ignores the lowest and
highest 25% of the scores.  Researchers want to study variability while
eliminating scores that are likely to be accidents, and the boxplot allows
for this distinction, which makes it an important tool for exploring data.
o Variance - The variance is a measure based on the deviations of
individual scores from the mean.  As noted in the definition of the mean,
however, simply summing the deviations will result in a value of 0.  To get
around this problem, the variance is based on squared deviations of scores
about the mean.  Squaring eliminates the negative values while still giving
larger values to scores that lie farther from the mean.  Then, to control for
the number of subjects in the distribution, the sum of the squared
deviations, Σ(X - μ)² for a population or Σ(X - X̄)² for a sample, is divided
by N (population) or by N - 1 (sample).  The result, essentially the average
squared deviation, is called the variance.
o Standard deviation - The standard deviation (s or σ) is defined as
the positive square root of the variance.  The variance is a measure in
squared units and has little meaning with respect to the data.  Taking the
square root gives a measure of variability expressed in the same units
as the data.  The standard deviation is very much like a mean or an
"average" of these deviations.  In a normal (symmetric and mound-shaped)
distribution, about two-thirds of the scores fall between +1 and -1
standard deviations from the mean.  As a rough rule of thumb, the standard
deviation is approximately 1/4 of the range in small samples (N < 30) and
1/5 to 1/6 of the range in large samples (N > 100).
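
The four measures of spread can be computed side by side. This is a minimal Python sketch with hypothetical scores, chosen so that the mean is a whole number. The `spread` helper is illustrative. `statistics.quantiles(..., method="inclusive")` (Python 3.8+) is one of several quartile conventions, so another method may give a slightly different IQR.

```python
import math
import statistics

def spread(scores, sample=True):
    """Range, IQR, variance and standard deviation of a list of scores."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)      # sum of squared deviations
    variance = ss / (n - 1) if sample else ss / n  # N - 1 for a sample, N for a population
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        "variance": variance,
        "sd": math.sqrt(variance),                 # back in the original units
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical scores, mean = 5
print(spread(scores, sample=False))
# {'range': 7, 'iqr': 1.5, 'variance': 4.0, 'sd': 2.0}

# Cross-check the sample (N - 1) versions against the standard library.
print(spread(scores)["variance"], statistics.variance(scores))
print(spread(scores)["sd"], statistics.stdev(scores))
```

Here the squared deviations sum to 32. Dividing by N = 8 gives a population variance of 4 and a standard deviation of 2. Dividing by N - 1 = 7 gives the slightly larger sample variance.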

• Measures of Shape - For distributions summarizing data from continuous
measurement scales, statistics can be used to describe how the distribution rises
and drops.
o Symmetric - Distributions that have the same shape on both sides of the
center are called symmetric.  The normal distribution is the best-known
symmetric distribution with only one peak, although not every symmetric,
single-peaked distribution is normal.
o Skewness - Refers to the degree of asymmetry in a distribution. 
Asymmetry often reflects extreme scores in a distribution.
• Positively skewed - A distribution is positively skewed when it has
a tail extending out to the right (larger numbers).  When a
distribution is positively skewed, the mean is usually greater than the
median, reflecting the fact that the mean is sensitive to each score
in the distribution and is subject to large shifts when the sample is
small and contains extreme scores.
• Negatively skewed - A negatively skewed distribution has an
extended tail pointing to the left (smaller numbers) and reflects
bunching of numbers in the upper part of the distribution with fewer
scores at the lower end of the measurement scale.
o Kurtosis - Like skewness, kurtosis has a specific mathematical definition,
but generally it refers to how scores are concentrated in the center of the
distribution, the upper and lower tails (ends), and the shoulders (between
the center and tails) of a distribution.
• Mesokurtic - A normal distribution is called mesokurtic.  The tails
of a mesokurtic distribution are neither too thin nor too thick, and
there are neither too many nor too few scores in the center of the
distribution.
• Platykurtic - Starting with a mesokurtic distribution and moving
scores from both the center and tails into the shoulders, the
distribution flattens out and is referred to as platykurtic.
• Leptokurtic - If you move scores from the shoulders of a mesokurtic
distribution into the center and tails of a distribution, the result is a
peaked distribution with thick tails.  This shape is referred to as
leptokurtic.
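
The text notes that skewness and kurtosis have specific mathematical definitions but does not state them. One common choice is the moment-based coefficients below. The first is g1 = m3 / m2^1.5. The second is the "excess" kurtosis g2 = m4 / m2² - 3, which subtracts 3 so that a normal distribution scores 0. This is a sketch assuming those definitions. Statistical packages often apply small-sample corrections, so their values can differ, and the data sets here are made up for illustration.

```python
import statistics

def central_moment(scores, k):
    """k-th central moment: the average of (X - mean) ** k over all N scores."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** k for x in scores) / n

def skewness(scores):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5 (0 for a symmetric set)."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """g2 = m4 / m2 ** 2 - 3: 0 for a normal distribution,
    negative for platykurtic shapes, positive for leptokurtic ones."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]              # evenly spread, no peak
right_tail = [1, 2, 2, 3, 3, 3, 4, 15]   # one large score stretches the right tail

print(skewness(symmetric))         # 0.0
print(excess_kurtosis(symmetric))  # negative (about -1.3): flat, platykurtic
print(skewness(right_tail) > 0)    # True: positively skewed
print(statistics.mean(right_tail), statistics.median(right_tail))  # 4.125 3.0
```

The right-tailed set also shows the pattern described under positive skew. Its single large score pulls the mean (4.125) above the median (3.0).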
