AST 110 - Data Analytics (Reviewer)


2 Types of Data
1. Qualitative – verbal/narrative
2. Quantitative – numerical

Sources of Data
1. Documentary – published or unpublished written materials (anything that is read)
2. Field – living persons who have sufficient knowledge

Methods of Collecting Data
1. Direct – face-to-face encounter of interviewee and interviewer
2. Indirect – questionnaire method, relevant
3. Registration – utilizing existing data
4. Observation – collects data pertaining to attitude, behaviour, etc.
5. Experiment – cause-and-effect relationship

Questionnaire – list of questions

Data Analytics – the process of analyzing raw data to find trends and answers.

Descriptive Analytics – the science of analyzing raw data to make conclusions about the information.

(Describing historical trends) > What happened?

– the pursuit of extracting meaning from raw data using a specialized computer system.

Advanced Analytics – makes predictions and discovers trends. (What if?)

Types of Data Analytics
1. Descriptive – what happened?
2. Diagnostic – why did it happen?
3. Predictive – what will happen in the future?
4. Prescriptive – what should be done?

Role – increase efficiency and improve performance by discovering patterns in data.

Steps in Data Analytics Process
1. Data Mining – extracting data from unstructured data sources.
2. Data Warehousing – involves designing and implementing databases that allow easy access to data mining results.
3. Statistical Analysis – creates insights from data.
4. Data Presentation – allows insights to be shared with stakeholders.

Importance of Data Analytics – analyzing data can optimize efficiency in many different industries.

The V’s of Big Data

Big Data – unstructured, semi-structured, structured; machine learning; to increase customer engagement and conversion rates.

V – Volume – the amount of data that exists.
V – Velocity – how quickly data is generated and moves.
V – Variety – diversity of data types.
V – Veracity – quality and accuracy of data.
V – Value – value that big data can provide.

Charts and Graphs – help readers understand data at a glance.

Graph – mathematical diagram
Chart – picture, diagram, table

Types of Graphs
1. Bar Graph – consists of the x and y-axis.
2. Histogram – displays the distribution of numerical data.
3. Pictograph – visually appealing graph.
4. Line Graph – consists of the x and y-axis, connecting the dots.
5. Area Graph – visualizes the data better, similar to the line graph.
6. Venn Diagram – represented by circles.
7. Heat Map – represents data with colours.
8. Cartesian Graph – has 4 quadrants.

Types of Charts
1. Pie Chart – displays parts of a whole.
2. Flow Chart – shows how to complete a task by conveying each step.
3. Organizational Chart – depicts the relationships and ranks of the positions in a company.

Basic Statistical Concepts

Variable – characteristics or attributes.

Classification of Variables according to Functional Relationship
1. Independent – experimental, manipulated, treatment, grouping
2. Dependent – depends on other factors

Classification of Variables according to Continuity of Values
1. Continuous Variable – can take decimal values
2. Discrete Variable – whole values only, no decimals

Classification of Variables according to Levels of Measurement
1. Nominal – distinguishes responses into attributes
2. Ordinal – arranged from low to high
3. Interval – no true value of 0.
4. Ratio – has a true value of 0, sameness/differences

Population – complete set of individuals, groups, or aggregate of people.

Infinite – cannot be determined or counted.
Finite – can be counted immediately.

Sample – a cross-section of elements drawn from a population.

Randomization – process of getting a sample.

Sampling – the process of getting the number of individuals from a population.

Mean: $\bar{x} = \frac{\sum fx}{n}$

Median: $\tilde{x} = LL + \left(\frac{\frac{n}{2} - cf}{f}\right) i$

Mode: $\hat{x} = LL + \left(\frac{d_1}{d_1 + d_2}\right) i$
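The grouped-data formulas for the mean, median, and mode can be sketched in Python as below. This is only an illustration under stated assumptions: the frequency table is made up, and the symbols are read in the usual way (LL is the lower class boundary of the median or modal class, cf is the cumulative frequency before the median class, f is the frequency of that class, i is the class size, d1 and d2 are the differences between the modal frequency and the frequencies of the class before and after it). The function names are hypothetical.

```python
# Hypothetical frequency table: classes 10-19, 20-29, 30-39, 40-49, 50-59.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]  # LL of each class
frequencies = [3, 7, 12, 6, 2]                     # f of each class
class_size = 10                                    # i


def grouped_mean(lower_boundaries, frequencies, class_size):
    """Mean = sum(f * x) / n, where x is the class mark (midpoint)."""
    n = sum(frequencies)
    class_marks = [ll + class_size / 2 for ll in lower_boundaries]
    return sum(f * x for f, x in zip(frequencies, class_marks)) / n


def grouped_median(lower_boundaries, frequencies, class_size):
    """Median = LL + ((n/2 - cf) / f) * i, using the class that contains n/2."""
    n = sum(frequencies)
    cf = 0  # cumulative frequency before the current class
    for ll, f in zip(lower_boundaries, frequencies):
        if cf + f >= n / 2:
            return ll + ((n / 2 - cf) / f) * class_size
        cf += f
    raise ValueError("empty frequency table")


def grouped_mode(lower_boundaries, frequencies, class_size):
    """Mode = LL + (d1 / (d1 + d2)) * i, using the class with the highest f.

    Assumes a single modal class; ties are resolved by taking the first one.
    """
    k = max(range(len(frequencies)), key=lambda idx: frequencies[idx])
    before = frequencies[k - 1] if k > 0 else 0
    after = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    d1 = frequencies[k] - before
    d2 = frequencies[k] - after
    return lower_boundaries[k] + (d1 / (d1 + d2)) * class_size


mean = grouped_mean(lower_boundaries, frequencies, class_size)
median = grouped_median(lower_boundaries, frequencies, class_size)
mode = grouped_mode(lower_boundaries, frequencies, class_size)
print(mean, round(median, 2), round(mode, 2))
```

For this made-up table the three values fall close together, inside the 30-39 class, which is what one would expect for a roughly symmetric distribution.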

Basic Sampling Techniques

Probability Sampling
1. Simple Random Sampling – lottery method
2. Stratified Sampling – drawn in proportion to the total population
3. Clustered Sampling – groups, not individuals, are randomly selected (Cluster and Respondent)
4. Systematic Sampling – taking every kth element of the population

Non-Probability Sampling
1. Accidental or Incidental Sampling – conducted at a convenient time, preferred place, or venue
2. Quota Sampling – uses a quota system
3. Purposive Sampling – depends on the purpose of the study

Presentation of Data

Organization of Data
1. Textual Presentation – text and figures
2. Tabular Presentation – tables, rows, and columns

Measures of Dispersion

Standard Deviation – square root of the variance.

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$

Variance – square of the standard deviation.

Ungrouped Data Formula

Population: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$

Sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$

Grouped Data Formula

Population: $\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{N}}$

Sample: $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{N - 1}}$
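A minimal Python sketch of the standard deviation formulas, for ungrouped and grouped data, may help when checking hand computations. The scores and the function names are hypothetical; the only difference between the population and sample versions is the divisor (N versus N - 1).

```python
import math


def std_dev(values, sample=True):
    """Ungrouped data: square root of sum((x - mean)^2) / (N - 1 or N)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))


def grouped_std_dev(class_marks, frequencies, sample=True):
    """Grouped data: square root of sum(f * (x - mean)^2) / (N - 1 or N)."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n
    squared_deviations = sum(
        f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks)
    )
    return math.sqrt(squared_deviations / (n - 1 if sample else n))


# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
population_sd = std_dev(scores, sample=False)
sample_sd = std_dev(scores, sample=True)
sample_variance = sample_sd ** 2  # variance is the square of the standard deviation

# The same scores written as a frequency table (value, how often it occurs).
grouped_population_sd = grouped_std_dev(
    [2, 4, 5, 7, 9], [1, 3, 2, 1, 1], sample=False
)

print(population_sd, round(sample_sd, 4), round(sample_variance, 4))
```

Because the frequency table here lists the exact scores rather than class marks of wider intervals, the grouped and ungrouped results agree. With real class intervals the grouped value is an approximation.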

3 and above – Extremely Above Normal
Below 3 – Extremely Below Normal
3 – Normal

Tabulation – the process of condensing data and arranging them in a table.

Classification – the process of putting together similar items.
Frequency Distribution – Mean (Average), Median (Middle), and Mode (Repeated)

Range – the difference between the highest and lowest scores. $R = HS - LS$

Class Interval – a category defined by a lower limit and an upper limit.

Class Boundaries – the true limits between the upper limit of one class and the lower limit of the next.

Class Mark (x) – the middle value of a class interval.

Class Size (i) – the difference between the upper and the lower class boundary. $CS = R/k$, where k is the number of classes.

Relative Frequency – percentage distribution in every class interval.

Class – Counting tally

Hypothesis Testing – not to question the computed value; to make a judgement; to generalize to a population from relatively small samples.

Hypothesis – explanation for a certain event; measurable and testable

Null Hypothesis (H0) – empty; statement of equality, indicating that no difference or relationship exists.

Alternative Hypothesis (HA) – statement of the expectation; derived from the theory under study.
1. Non-Directional Alternative Hypothesis – states the existence of a difference
2. Directional Alternative Hypothesis – specifies that one group performs better than the other

Critical Value – depends on the nature of the null hypothesis; depends on the level of significance

Types of Error
1. Type I – chance of rejecting the null hypothesis when it is true
2. Type II – chance of failing to reject the null hypothesis when it is false

Confidence level = level of significance: 90% = 10% ; 95% = 5% ; 99% = 1%
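The pairing of confidence level and level of significance is simple arithmetic (significance = 100% minus confidence), and it can be sketched as below together with the critical values it leads to. As an assumption made only for this illustration, the critical values come from the standard normal distribution in Python's standard library; for a t-test the critical value would be read from a t table instead. The function names are hypothetical.

```python
from statistics import NormalDist


def significance_level(confidence_percent):
    """Level of significance (alpha) as a proportion, e.g. 95 -> 0.05."""
    return round(1 - confidence_percent / 100, 10)


def z_critical(alpha, two_tailed=True):
    """Critical z value; a two-tailed test splits alpha across both tails."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for confidence in (90, 95, 99):
    alpha = significance_level(confidence)
    print(
        confidence,
        alpha,
        round(z_critical(alpha), 3),                    # non-directional test
        round(z_critical(alpha, two_tailed=False), 3),  # directional test
    )
```

The same level of significance gives a larger critical value for a two-tailed test than for a one-tailed test, which is why the direction of the test has to be fixed before the critical value is looked up.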


Steps in Performing Hypothesis Testing
1. State the null and alternative hypotheses.
2. Determine the level of significance and direction of the test.
3. Determine the appropriate statistical test based on the level of measurement of the gathered data.
4. Write the decision rule stating how we accept or reject the null hypothesis.
5. Compute the test statistic and compare it with the critical value.
6. State the decision based on the resulting computed value.
7. Write the conclusion for the given problem.
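Steps 4 to 6 above can be sketched as a small decision function. This is a hedged illustration, not a prescribed procedure: the function name, the direction labels, and the example numbers are hypothetical, and it assumes the common rule of rejecting the null hypothesis when the computed value reaches or passes the critical value.

```python
def decide(test_statistic, critical_value, direction="two-tailed"):
    """Apply a decision rule: compare the computed value with the critical value.

    critical_value is given as a positive number taken from a statistical table.
    """
    if direction == "two-tailed":
        reject = abs(test_statistic) >= critical_value
    elif direction == "right-tailed":
        reject = test_statistic >= critical_value
    elif direction == "left-tailed":
        reject = test_statistic <= -critical_value
    else:
        raise ValueError("direction must be two-tailed, right-tailed, or left-tailed")
    return "reject H0" if reject else "fail to reject H0"


# Hypothetical computed values compared with 2.776, the two-tailed t critical
# value at the 5% level of significance with 4 degrees of freedom.
print(decide(3.2, 2.776))                  # computed value beyond the critical value
print(decide(1.5, 2.776))                  # computed value inside the critical value
print(decide(-3.2, 2.776, "right-tailed"))  # wrong direction for a right-tailed test
```

Writing the rule down before computing the statistic, as step 4 asks, keeps the decision from being adjusted after the result is known.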

$t = \frac{\sum d}{\sqrt{\dfrac{n \sum d^2 - (\sum d)^2}{n - 1}}}$
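A short Python sketch of this t statistic for paired scores follows. It assumes d is the difference between each pair of scores and n is the number of pairs, and it computes t as Σd divided by the square root of (nΣd² - (Σd)²)/(n - 1), which is algebraically the mean difference divided by its standard error. The before and after scores and the function name are hypothetical.

```python
import math


def paired_t(before, after):
    """t statistic for paired scores, computed from the sums of d and d squared."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    sum_d = sum(differences)
    sum_d_squared = sum(d * d for d in differences)
    return sum_d / math.sqrt((n * sum_d_squared - sum_d ** 2) / (n - 1))


# Hypothetical scores of five respondents before and after a treatment.
before = [10, 12, 9, 14, 11]
after = [13, 14, 12, 15, 15]

t_value = paired_t(before, after)
degrees_of_freedom = len(before) - 1
print(round(t_value, 3), degrees_of_freedom)
```

The computed value is then compared with the critical value for n - 1 degrees of freedom at the chosen level of significance.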
