SS 104 - Lecture Notes Part 1 EDITED


SS 104 – TEST & MEASUREMENT IN HUMAN MOVEMENT (PART 1)

Sports and physical education professionals can effectively use tests and measurements to assess students and
athletes and help them achieve their goals and maximize their potential. Tests and measurements form the objective
core of the evaluation process.

Reasons for Test, Measurement and Evaluation in Sports and PE


1. Motivation – To encourage people to become better
2. Diagnosis – To assess strengths and weaknesses. Ex: determining areas for improvement in fitness
3. Classification – To classify groups according to an attribute. Ex: age, skill level, or fitness level
4. Evaluation of Instruction & Programs – To determine if exercise, health, or sports programs were successful
5. Prediction – To predict future success based on skill level, athleticism, etc. Ex: NFL and NBA combines
6. Research – To study and answer questions; to add new knowledge or support existing knowledge

While some physical abilities are innate and not amenable to change, other physical abilities can be
improved through physical training. Tests can be used by teachers and coaches to determine which deficits
can be addressed by participating in prescribed group or individual programs.

Testing Terminologies
Variable – a trait or characteristic that can assume any given value. Ex: name, age, height

Test – An instrument, tool, or process used to make a particular measurement

Measurement – The collection of numerical data

Evaluation – The interpretation or judgment about a particular measurement

Field Test – a test used to assess ability that is performed away from the laboratory and does not require extensive
training or expensive equipment

Statistics – The collection, organization, analysis and presentation of data

Pretest – A test administered at the beginning to determine the initial characteristic or ability level

Posttest – A test administered after a period of time, usually after an intervention like a training program, to
determine changes from the pretest

Test Battery – A series of tests that are designed to take specific measurements of performance or capacity

More Statistical Terms


Data – numerical result of measurement

Population – Refers to all members in a defined group

Sample – A subgroup of the population

Parameter – A value or characteristic of a population

Statistic – A value or characteristic of a sample

Descriptive Statistics – statistics that describe or summarize a given data set

Inferential Statistics – statistics that aim to draw conclusions beyond the immediate data

Levels of Measurement of Variables
Nominal Scale – Describes the identity of a variable but has no numerical value. Used simply for labeling.
Ex: name, nationality, marital status, gender

Ordinal Scale – Describes the order of the values of a variable in relation to each other.
Ex: 1st-2nd-3rd, Gold-Silver-Bronze, Excellent-Good-Average-Poor

Interval Scale – Compares the values but has no “true zero” point. May have negative values.
Ex: Temperature in F or C, Score in Golf, Likert scales in surveys

Ratio Scale – Variables that have specific values and have a “true zero”. Cannot have negative values.
Ex: Height, Weight, Population, Correct answers in an exam
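
One way to see why the missing "true zero" matters is to check whether ratios survive a change of units. The short Python sketch below is only an illustration: the temperatures and heights are made-up values, and the helper name celsius_to_fahrenheit is introduced here for the example.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval scale: the ratio changes with the unit, so "twice as hot" has no meaning.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: the ratio is the same in any unit, so "twice as tall" is meaningful.
ratio_meters = 1.8 / 0.9
ratio_centimeters = 180 / 90

print(ratio_celsius, ratio_fahrenheit)     # the two ratios differ
print(ratio_meters, ratio_centimeters)     # the two ratios agree
```

Differences, on the other hand, are meaningful on both scales, which is why interval data can still be added, subtracted and averaged.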

DESCRIPTIVE AND INFERENTIAL STATISTICS

When analyzing data, such as the exam scores of 100 students, it is possible to use both descriptive and inferential
statistics in your analysis. Typically, in most research conducted on groups of people, you will use both descriptive
and inferential statistics to analyze your results and draw conclusions.

Descriptive Statistics
Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a
meaningful way such that patterns might emerge from the data. Descriptive statistics do not, however, allow us to
make conclusions beyond the data we have analyzed or reach conclusions regarding any hypotheses we might have
made. Descriptive statistics are very important because if we simply presented our raw data it would be hard to
visualize what the data was showing, especially if there was a lot of it. Descriptive statistics therefore enable us to
present the data in a more meaningful way, which allows simpler interpretation of the data. Typically, there are two
general types of statistics that are used to describe data – Measures of Central Tendency and Measures of Variability:

 Measures of central tendency - describe the central position of a frequency distribution for a group of data.
We can describe this central position using the mean, median and mode.

Mean
The mean (or average) is the most popular and well-known measure of central tendency. It can be used with
both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the
sum of all the values in the data set divided by the number of values in the data set. There can only be one
mean. The formula for the mean is:

Mean = ΣX / N, where ΣX is the sum of all the values and N is the number of values

Median
The median is the middle score for a set of data that has been arranged in order of magnitude. In order to
calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark - in this case, 56. It is the middle mark because there are 5 scores
before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens
when you have an even number of scores? What if you had only 10 scores? Well, you simply take the
average of the middle two scores.
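
The procedure described above (sort, then take the middle score or the average of the two middle scores) can be sketched in Python as follows. The function name median is chosen here for illustration, and the 10-score case is a hypothetical one made by leaving out the last score of the example data.

```python
def median(scores):
    """Return the middle score, or the average of the two middle scores."""
    ordered = sorted(scores)      # arrange in order of magnitude
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd count: a single middle score
        return ordered[middle]
    # even count: average of the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(median(scores))        # 11 scores -> 56

# Hypothetical even-sized case: the same data without the last score (92).
print(median(scores[:-1]))   # 10 scores -> (55 + 56) / 2 = 55.5
```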

Mode
The mode is simply the most frequent score in our data set. There can be more than one mode, or no mode at all. In
the above set of scores, the modes are 55 and 56.
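
A minimal Python sketch of finding the mode is shown below. The function name modes is hypothetical, and treating a data set in which every score occurs exactly once as having "no mode" is one common convention assumed here.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs most often, or [] if no score repeats."""
    if not scores:
        return []
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:              # every score occurs once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(modes(scores))         # [55, 56]
print(modes([10, 20, 30]))   # []
```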

When to use mean, median or mode to describe data
There is often a "best" measure of central tendency for the particular data you are analyzing, but
there is no single "best" measure for all data. Whether you use the mean, median or
mode depends on the type of data you have.

The mean is usually the best measure of central tendency to use when your data distribution is continuous
and symmetrical, such as when your data is normally distributed.

The median is usually preferred to other measures of central tendency when your data set is skewed. The
median is also preferred when the data has outliers because the value of the mean can be distorted by the
outliers.

The mode is the least used of the measures of central tendency, but it is the only one that can be used
with nominal data. For this reason, the mode will be the best measure of central tendency (as it is the only one
appropriate to use) when dealing with nominal data.
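
To illustrate how an outlier distorts the mean but not the median, the sketch below uses the 11 scores from the median example and then adds one extreme value. The value 1000 is a hypothetical outlier chosen only to make the effect obvious.

```python
import statistics

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
print(mean_before, median_before)   # 59 and 56

# Hypothetical outlier: one extra value far above the rest.
with_outlier = scores + [1000]
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)
print(round(mean_after, 1), median_after)   # the mean jumps, the median stays at 56
```

The single extreme score more than doubles the mean, while the median is unchanged. This is why the median is preferred for skewed data.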

 Measures of Variability or Spread - these are ways of summarizing a group of data by describing how spread
out the scores are. For example, the mean score of 10 students may be 81 out of 100. However, not
all students will have scored 81. Rather, their scores will be spread out. Some will be lower and others will be
higher. To describe this spread, a number of statistics are available to us, most commonly the Range and
Standard Deviation:

Range – Difference between the highest score and the lowest score.
R = X_highest – X_lowest

Standard Deviation – Describes the scatter of scores around the mean. The most useful and sophisticated
measure of variability.

Example:

For the example above, the data can be described as:

“The mean exam score of the 10 students is 81.3 ± 25.4 points, with a range of 88 points.”
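
As a rough sketch, the range and standard deviation can be computed in Python as shown below. It uses the 11 scores from the median discussion rather than the 10-student example. The notes do not say whether the sample (n - 1) or population (n) formula is intended, so both are shown and the choice of which to report is an assumption.

```python
import statistics

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

score_range = max(scores) - min(scores)      # R = highest score - lowest score
mean = statistics.mean(scores)
sd_sample = statistics.stdev(scores)         # sample SD: divides by n - 1
sd_population = statistics.pstdev(scores)    # population SD: divides by n

print(f"range = {score_range}")                       # range = 78
print(f"mean ± SD = {mean:.1f} ± {sd_sample:.1f}")    # mean ± SD = 59.0 ± 23.8
```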

Inferential Statistics
We have seen that descriptive statistics provide information about our immediate group of data. For example, we
could calculate the mean and standard deviation of the exam marks for the 100 students and this could provide
valuable information about this group of 100 students.

Often, however, you are interested in investigating a whole population but only a limited amount of data exists.
For example, you might be interested in the GWA of all UP Diliman students for a particular semester. It is not
feasible to measure the GWA of every student in UP Diliman, so you have to measure a smaller sample of students
(e.g., 1,000 students) that is used to represent the larger population. Inferential statistics are techniques that
allow us to use these samples to make generalizations about the populations from which the samples were drawn. It
is, therefore, important that the sample accurately represents the population. Here, an appropriate sample size and
sampling method allow us to make more accurate conclusions beyond the available data.
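
The idea of estimating a population value from a sample can be sketched as follows. Everything in this example is an assumption made for illustration: the population is synthetic (randomly generated, not real GWA data), the population size of 20,000 and the 1.0 to 5.0 grading scale are placeholders, and the approximate 95% interval based on the standard error is just one common way of expressing the uncertainty of a sample mean.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Synthetic stand-in for a population of GWA values (not real data),
# kept within an assumed 1.0 to 5.0 grading scale.
population = [min(5.0, max(1.0, rng.gauss(2.0, 0.5))) for _ in range(20_000)]

# Simple random sample of 1,000 students, drawn without replacement.
sample = rng.sample(population, 1000)

parameter = statistics.mean(population)   # population value (usually unknown)
statistic = statistics.mean(sample)       # sample value (what we can measure)

# Standard error of the mean and an approximate 95% interval for the estimate.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5
low = statistic - 1.96 * standard_error
high = statistic + 1.96 * standard_error

print(f"population mean (parameter) = {parameter:.3f}")
print(f"sample mean (statistic)     = {statistic:.3f}")
print(f"approximate 95% interval    = {low:.3f} to {high:.3f}")
```

In practice the parameter is not available for comparison. It is computed here only to show how close a reasonably sized random sample tends to come.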

EVALUATION OF TEST QUALITY

Test results are useful only if the test actually measures what it is supposed to measure (validity) and if the
measurement is repeatable (reliability). These two characteristics are the key factors in evaluating test quality.

Validity
Validity refers to the degree to which a test or test item measures what it is supposed to measure, and is the most
important characteristic of testing.

For tests of physical properties such as height and weight, validity is easy to establish. The validity of tests of some
abilities and characteristics is more difficult to establish. There are several types of validity, including construct
validity, face validity, content validity, and criterion-referenced validity.

 Construct validity is the ability of a test to represent the underlying construct. The construct represents the
theory developed to organize and explain some aspects of existing knowledge and observations.

 Face validity is the appearance to the test subject and other casual observers that the test measures what it
is purported to measure.

 Content validity is the assessment by experts that the testing covers all relevant subtopics or component
abilities in appropriate proportions. Sometimes referred to as expert validity.

While the terms face validity and content validity are sometimes used interchangeably, content validity
relates to actual validity as approved by experts while face validity relates to the appearance of validity to
non-experts.

 Criterion-Referenced Validity is the extent to which test scores are associated with some other measure of
the same ability. There are four types of criterion-referenced validity: concurrent, convergent, predictive,
and discriminant.
o Concurrent validity is the extent to which test scores are associated with those of other
accepted tests that measure the same ability.

o Convergent validity is evidenced by high positive correlation between results of the test
being assessed and those of the “gold standard”. A test may be preferable over the gold
standard if it exhibits convergent validity but is less demanding in terms of time, equipment,
expense, or expertise.

o Predictive Validity is the extent to which the test score corresponds with future behavior or
performance. This can be measured through comparison of a test score with some measure
of success in sport. For example, one could calculate the correlation between the overall
score on a battery of tests used to assess potential for basketball and a measurement of
actual basketball performance as indicated by such quantities as points scored, rebounds,
assists, blocked shots, forced turnovers and steals (much like the NBA combine).
o Discriminant Validity is the ability of a test to distinguish between two different constructs.
Discriminant validity of tests in a battery avoids unnecessary expenditures of time, energy,
and resources in administering tests that may be measuring the same component.
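
The correlations mentioned under convergent and predictive validity can be computed as in the sketch below. Pearson's r is used here as one common choice, since the notes do not name a specific coefficient. The function name pearson_r and all scores are hypothetical, invented for illustration.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5

# Hypothetical scores for six athletes (invented for illustration).
gold_standard = [40, 45, 50, 55, 60, 65]   # e.g., a laboratory measurement
field_test = [38, 47, 49, 57, 58, 66]      # e.g., a quicker field estimate

r = pearson_r(field_test, gold_standard)
print(f"r = {r:.2f}")   # a value close to +1 suggests convergent validity
```

The same calculation applies to predictive validity if the second list is replaced with a later measure of actual performance.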

Reliability
Reliability is a measure of the degree of consistency or repeatability of a test. If an individual whose ability does not
change is measured twice, very similar scores should be obtained both times.

On an unreliable test, an individual could obtain a high score on one occasion and a low score on another. A test
must be reliable to be valid because highly variable results have little meaning.

There are several ways to determine the reliability of a test; the most obvious one is to administer the same test
twice to the same group of individuals. Statistical correlation of the scores from the two administrations provides a
measure of test-retest reliability. A significant difference between the two sets of scores indicates variability, which
may be due to any of the following:

 Intrasubject Variability is a lack of consistent performance by the person tested.

 Lack of Interrater Reliability is a lack of consistency in scoring between different testers conducting the
same test on the same individual on separate instances.

 Intrarater Variability is the lack of consistent scoring by a given tester.

In intrarater variability, for example, a coach eager to see improvement may unintentionally be more lenient
on a posttest than on a pretest. Other causes of intrarater variability include inadequate training,
inattentiveness, and failure to follow standardized procedures.

 Failure of the test itself to provide consistent results. Sometimes the test itself is the problem due to
various reasons including being in the trial stages, lack of calibration of equipment or dysfunctional
equipment.

TEST ADMINISTRATION

To achieve accurate test results, tests must be administered safely, correctly, and in an organized manner. Staff
should ensure the health and safety of participants, testers should be carefully selected and trained, tests should be
well organized and administered efficiently, and participants should be properly prepared and instructed.

Health and Safety Considerations


Testers must be aware of testing conditions that can threaten the health of athletes and be observant of signs and
symptoms of health problems that warrant exclusion from testing.

Strenuous exercise, such as maximal runs or 1-repetition maximum (1RM) tests, can uncover or worsen existing
heart problems, such as impaired blood flow to the heart muscle and irregular heartbeats.

When aerobic endurance exercise tests are being administered in a hot environment, caution must be observed to
protect both the health and safety of the participant and the validity of the test.

Selection and Training of Testers


Test administrators should be well trained and should have a thorough understanding of all testing procedures and
protocols. The testing supervisor should make sure that all novice personnel perform and score all tests correctly. It
is essential that all testers have sufficient practice so that the scores they obtain correlate closely with those
produced by experienced and reliable personnel. The testers should be trained to explain and administer the tests as
consistently as possible.

Recording Forms (Score sheets)
Scoring forms should be developed before the testing session and should have space for all test results and
comments. This allows test time to be used more efficiently and reduces the incidence of recording errors. At least 2
sets of recording forms should be provided – one copy for the test administrators and one copy that the participant
can keep.

Test Format
A testing session wherein the participants are aware of the testing purpose and procedures usually enhances the
reliability of test measures. Test planning must address such issues as whether athletes will be tested all at once or in
groups and whether the same person will administer a given test to all participants. Having the same tester assigned
to a specific test eliminates the possibility of interrater variability. As a rule, each tester should administer only one
test at a time, especially when the test requires complex judgments.

Sequence of Tests
Testers must carefully design the order of tests and duration of rest periods between tests to ensure test reliability.
Tests requiring high-skill, non-fatiguing movements should be administered before tests that are likely to produce
fatigue and confound the results of subsequent tests. A logical sequence, although there are some variations, is to
administer tests in this order:

 Anthropometric tests (e.g., height, weight, skinfold and girth measurements)

 Non-fatiguing tests (e.g., ruler drop, flexibility, vertical jump, broad jump)

 Agility tests (e.g., T-test, pro agility test)

 Maximum power and strength tests (e.g., 1RM power clean, 1RM bench press)

 Sprint tests (e.g., 40-yard sprint, 100-m sprint)

 Local muscular endurance tests (e.g., partial curl-up test, 1-minute pushup test)

 Anaerobic capacity tests (e.g., 400 m run, 300-yard shuttle)

 Aerobic capacity tests (e.g., 1.5-mile run, 12-minute run, 3-minute step test)

An effort should be made to administer aerobic tests on a different day than the other tests if possible. If performed
on the same day, aerobic tests should be performed last, after an adequate rest period.
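
As a small illustration, the sequence above can be applied to a planned test battery by sorting tests according to their category. The category order is taken from the list above. The function name order_tests and the sample battery are hypothetical.

```python
# Category order taken from the list above, first to last.
CATEGORY_ORDER = [
    "anthropometric",
    "non-fatiguing",
    "agility",
    "maximum power and strength",
    "sprint",
    "local muscular endurance",
    "anaerobic capacity",
    "aerobic capacity",
]

def order_tests(tests):
    """Sort (test name, category) pairs into the recommended sequence."""
    rank = {category: index for index, category in enumerate(CATEGORY_ORDER)}
    return sorted(tests, key=lambda test: rank[test[1]])

# Hypothetical battery, listed in no particular order.
battery = [
    ("1.5-mile run", "aerobic capacity"),
    ("T-test", "agility"),
    ("height", "anthropometric"),
    ("1RM bench press", "maximum power and strength"),
    ("vertical jump", "non-fatiguing"),
    ("40-yard sprint", "sprint"),
]

for name, category in order_tests(battery):
    print(f"{category}: {name}")
```

Rest periods and the option of moving aerobic tests to a separate day still have to be planned by the tester. The sketch only handles the ordering.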

Preparing Participants for Testing


The date, time, and purpose of a test battery should be announced in advance to allow athletes to prepare physically
and mentally. Instructions should cover the purpose of the test, how it is to be performed, the number of practice
attempts allowed, the number of trials, test scoring, criteria for disallowing attempts, and recommendations for
maximizing performance. The participants should be given opportunities to ask questions before and after the
demonstration.

After the anthropometric tests, an adequate warm-up should be given before the administration of the other tests as
this improves reliability. An appropriately organized warm-up consists of a general warm-up followed by a specific
warm-up.

Both types of warm-ups include body movements similar to those involved in the test. An organized, instructor-led
general warm-up ensures uniformity. It is acceptable to allow two to three activity-specific warm-up trials,
depending on the test protocol. Depending on the test protocol, the score can be the best or the average of the
trials.
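
Taking the best or the average of several trials can be sketched as follows. The function name score_trials, its parameters, and the trial results are all hypothetical. Note that "best" means the highest value for tests such as jumps but the lowest value for timed tests such as sprints.

```python
def score_trials(trials, method="best", higher_is_better=True):
    """Reduce several trial results to one score, as a protocol might specify."""
    if method == "average":
        return sum(trials) / len(trials)
    if method == "best":
        return max(trials) if higher_is_better else min(trials)
    raise ValueError(f"unknown method: {method}")

# Hypothetical results: three vertical jumps (cm) and three sprint times (s).
jumps = [48.0, 51.5, 50.0]
sprints = [5.12, 5.05, 5.20]

print(score_trials(jumps, "best"))                              # 51.5
print(round(score_trials(jumps, "average"), 1))                 # 49.8
print(score_trials(sprints, "best", higher_is_better=False))    # 5.05
```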

ANTHROPOMETRY AND THE COMPONENTS OF PHYSICAL FITNESS

In test and measurement for sports and PE, a test will typically measure anthropometric scores and any of the
following components of fitness, which are categorized as either health-related or skill-related.

Anthropometry
Anthropometry is the science of measurement applied to the human body and generally includes measurements of
height, weight, selected body girths as well as skinfolds and bone breadths to determine somatotypes.

Health-Related components - The most important factors related to one’s health.

o Cardiovascular Endurance
The ability of the circulatory system (heart and blood vessels) to supply oxygen to working muscles during
prolonged exercise. Also known as cardiorespiratory endurance, aerobic capacity, or aerobic power.

o Body Composition
The relative percentage of fat and lean tissues (muscle, bone) to overall body weight. Examples are body fat
percentage, bone mass, muscle mass.

o Flexibility
The maximum range of motion possible at various joints.

o Muscular strength
The maximum amount of force that can be produced by a single contraction of a muscle. Tests involve relatively
low movement speeds (lasting up to about 4 seconds) against a maximum resistance. Also known as low-speed
muscular strength or maximum muscular strength.

o Muscular endurance
The ability of a muscle group to continue muscle movement over a length of time against a submaximal
resistance. Also known as local muscular endurance. Examples are push-up tests and sit-up tests.

Skill-Related components - Aspects of fitness which form the basis for successful sport or activity participation.

o Speed
The ability to quickly cover a fixed distance in a straight line. Tests of speed are not usually conducted over
distances greater than 200 m because longer distances reflect anaerobic or aerobic capacity more than
absolute ability to move the body at maximal speed.

o Agility
The ability of the body to stop, start and change direction. More recent definitions of agility add the
need for a response to a stimulus rather than simply a change of direction.

o Balance
The ability to maintain a desired posture while still or in motion.

o Coordination
The integration of hand and/or foot movements with the input of the senses to perform a desired task.

o Reaction Time
The time it takes to respond to a stimulus, such as pressing a button that lights up or catching a dropped ruler.

o Power
The ability to exert maximal muscular force at a high velocity. Tests involve maximal movement speeds lasting
1 second or less. Examples are vertical jump, standing long jump, power clean. Also known as high-speed
muscular strength, explosive strength, and anaerobic power.

o Anaerobic Capacity
The maximal rate of energy production for moderate-duration activities. It is typically quantified as the
maximal power output during muscular activity between 30 and 90 seconds using a variety of tests for the
upper and lower body. It is characterized by the combined phosphagen and lactic acid energy systems.

Stamina, although commonly (and mistakenly) used to refer to cardiovascular endurance, is endurance specific to a
task, sport or activity which may include one or more combinations of the fitness components. As such, there is no
universal definition for stamina.
