Lesson 1 (Obtaining Data)
Lesson 1 (Obtaining Data)
Lesson 1 (Obtaining Data)
COLLEGE OF ENGINEERING
Data engineering is the aspect of data science that focuses on practical applications of data collection
and analysis. For all the work that data scientists do to answer questions using large sets of information,
there have to be mechanisms for collecting and validating that information. In order for that work to
ultimately have any value, there also have to be mechanisms for applying it to real-world operations in
some way. Those are both engineering tasks: the application of science to practical, functioning
systems.
1. OBTAINING DATA
COLLEGE OF ENGINEERING
Variables
are any characteristics that can take on different values, such as height, age, species, or exam score,
etc.
Independent Variable
Is the cause. Its value is independent of other variables in your study.
Dependent Variable
Is the effect. Its value depends on changes in the independent variable.
Experimental Units
The experimental unit is "the smallest division of experimental material such that any two units
may receive different treatments in the actual experiment". For some experiments, the experimental
unit may be larger than the unit of observation or unit of randomization and often implies the appropriate
unit of analysis
The experimental unit for a study refers to an object or entity expected to produce a change in
some outcome about which a researcher wishes to generalize. The experimental unit must account for
important assumptions, such as independence of errors required by analysis methods. The
experimental unit plays a large role in the design of a research study
Example, if individual students were randomly assigned to receive a small-group intervention, the small
group would become the experimental unit because it is the smallest division in which students can
receive a different intervention
Random Assignment
Random assignment is a procedure used in experiments to create multiple study groups that
include participants with similar characteristics so that the groups are equivalent at the beginning of the
study.
EN MATH 4 – Engineering Data Analysis Page 2 of 7
Republic of the Philippines
CAMARINES NORTE STATE COLLEGE
F. Pimentel Avenue, Brgy. 2, Daet, Camarines Norte – 4600, Philippines
COLLEGE OF ENGINEERING
MEAN
The mean, often called the average, of a numerical set of data, is simply the sum of the data
values divided by the number of values. This is also referred to as the arithmetic mean. The mean is
the balance point of a distribution.
COLLEGE OF ENGINEERING
COLLEGE OF ENGINEERING
MEDIAN
The median is the number that falls in the middle position once the data has been organized.
Organized data means the numbers are arranged from smallest to largest or from largest to smallest.
The median for an odd number of data values is the value that divides the data into two halves. If n
represents the number of data values and n is an odd number, then the median will be found in the
𝑛+1
position.
2
COLLEGE OF ENGINEERING
MODE
The mode of a set of data is simply the value that appears most frequently in the set.
COLLEGE OF ENGINEERING