AGR3701 PORTFOLIO (UPDATED 2 Feb) Latest


2. PAIRED TWO SAMPLE T-TEST

A food scientist wants to study whether a difference in firmness exists between
yogurt made from skim milk with and without the preculture of PC bacteria. The
milk is sampled from 7 dairy farms. One half of the milk sampled from each farm is
inoculated with PC, and the other half is not. Below are the firmness data. Run a
statistical analysis for paired data to determine whether adding PC bacteria resulted in
a difference in the firmness of the yogurt. Show a manual calculation and an analysis using R
software.

Dairy farm A B C D E F G
With PC 68 75 62 86 52 46 72
Without PC 61 69 64 76 52 38 68

Definition of paired two sample T-Test

• The paired sample t-test, sometimes called the dependent sample t-test, is a
statistical procedure used to determine whether the mean difference between two
sets of observations is zero.

Usage (the objective of experiment that requires the analysis)

• Common applications of the paired sample t-test include case-control studies and
repeated-measures designs.
• Experiments using two treatments where each observation on one treatment is naturally
paired with an observation on the other treatment.

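Statistical model with explanations

Paired two sample t-test
The test is appropriate because there are two treatments (with and without PC) applied to
milk from the same farm, so each observation under one treatment is paired with an
observation under the other. The analysis is carried out on the within-farm differences in
the firmness of the yogurt.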

Manual Calculation
Dairy farm A B C D E F G
With PC 68 75 62 86 52 46 72
Without PC 61 69 64 76 52 38 68
Differences 7 6 -2 10 0 8 4
Response variable : firmness of yogurt
Hypothesis : Ho : μd = 0
: Ha : μd ≠ 0

Mean of differences = 4.71


n = 7
df = 7 - 1 = 6

\[ s = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}} = \sqrt{\frac{(7-4.7143)^2 + (6-4.7143)^2 + (-2-4.7143)^2 + \dots + (4-4.7143)^2}{7-1}} = 4.3480 \]

T statistic:

\[ T = \frac{\bar{d} - \mu_d}{s/\sqrt{n}} = \frac{4.7143 - 0}{4.3480/\sqrt{7}} = 2.8686 \]

T table :

\[ T_{(0.025,\ df=6)} = 2.447 \]

α/2 = 0.025 in each tail (two-sided test at α = 0.05)

|Tcalc| = 2.8686 > Ttable = 2.447
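
The manual calculation can be checked step by step in R. This is a minimal sketch: the vector names with_pc and without_pc are chosen here for illustration and are not taken from the original data file.

```r
# Firmness data from the table above (vector names are illustrative)
with_pc    <- c(68, 75, 62, 86, 52, 46, 72)
without_pc <- c(61, 69, 64, 76, 52, 38, 68)

d      <- with_pc - without_pc            # paired differences
n      <- length(d)                       # number of pairs
d_bar  <- mean(d)                         # mean of the differences
s      <- sd(d)                           # standard deviation of the differences (n - 1 divisor)
t_calc <- (d_bar - 0) / (s / sqrt(n))     # test statistic under Ho: mu_d = 0
t_tab  <- qt(1 - 0.025, df = n - 1)       # two-sided critical value at alpha = 0.05

# Collect the quantities used in the manual calculation
round(c(d_bar = d_bar, s = s, t_calc = t_calc, t_tab = t_tab), 4)
```

Comparing t_calc with t_tab reproduces the decision rule used in the manual calculation.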


Conclusion :

Tcalc = 2.87
Ttable = 2.447

|Tcalc| > |Ttable| (equivalently, the p-value is < α = 0.05).

Thus we reject Ho.
There is a significant difference in the firmness of yogurt made with and without PC.

Run appropriate statistical analysis (T-TEST, ANOVA, or regression analysis) in
order to test the effect of treatment/factors or the interaction between factors.

Analyzed using T-test

Analyzed using R studio

Data visualisation (Box Plot)
Present the R codes that you used for all analyses and explain the functions for
the codes.

Explanation
> View(ttest)
> attach(ttest)
> names(ttest)

These commands open the data set in the viewer, attach it so that its columns can be
referred to by name, and list the column names. This step is important because it tells R
which data set the later commands operate on.

Output

[1] "With pc" "Without pc"

> `With pc`
[1] 68 75 62 86 52 46 72
> `Without pc`
[1] 61 69 64 76 52 38 68

> help(t.test)

This step opens the documentation for the t.test function.

> t.test(`With pc`, `Without pc`, mu = 0, alternative = "two.sided", paired = TRUE,
var.equal = FALSE, conf.level = 0.95)

This step produces the output of the paired t-test.
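
The same call can be run without the spreadsheet file by typing the data in directly. This is a sketch only: the vectors with_pc and without_pc stand in for the columns of the original ttest data set, whose file is not shown here.

```r
# Firmness data entered by hand (stand-ins for the columns of the ttest data set)
with_pc    <- c(68, 75, 62, 86, 52, 46, 72)
without_pc <- c(61, 69, 64, 76, 52, 38, 68)

# Paired, two-sided t-test of Ho: mean difference = 0
res <- t.test(with_pc, without_pc, mu = 0, alternative = "two.sided",
              paired = TRUE, conf.level = 0.95)

res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # 95% confidence interval for the mean difference
```

The t value and degrees of freedom returned here can be compared with the manual calculation.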


4. COMPLETELY RANDOMIZED DESIGN

An experiment was set up to compare the effect of calcium additives on the increase in trunk diameter of
orange trees. Three levels of a calcium supplement (100, 200 and 300 kg/ha) were applied to soil with a pH
of 7. At the end of a 2-year period, three trunk diameters (cm) were measured for each treatment. Below are the data.

Rep   Calcium (kg/ha)
      100     200     300
1     18      18.5    17
2     18.75   17.5    16.5
3     18      17.25   16

i. Definition
An experimental design is a plan for assigning the treatments to the plots in
the experiment. Designs differ primarily in the way the plots are grouped before
the treatments are applied, that is, in how much restriction is imposed on the
random assignment of treatments to the plots. In a completely randomized design
(CRD) no such restriction is imposed: the treatments are assigned to the plots
entirely at random.

ii. Usage (the objective of experiment that requires the analysis)


Objective:
To study the effect of calcium additives on the increase in trunk diameters for orange
trees.

Hypothesis:
H0: The trunk diameter of orange trees is not affected by the level of calcium
additive.
H1: The trunk diameter of orange trees is affected by the level of calcium additive.

Treatment:
Treatment 1: Calcium additives 100kg/ha
Treatment 2: Calcium additives 200kg/ha
Treatment 3: Calcium additives 300kg/ha

Level:
3 levels (100, 200 and 300 kg/ha)

Number of replications:
3 replications

Measured variables:
Trunk diameters

Experimental unit:
Orange trees
iii. Layout of experiment

T1R1   T2R2   T3R2

T2R3   T3R1   T1R2

T1R3   T2R1   T3R3

iv. Statistical model with explanations

\[ Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} \]

Yij = observation for the ith treatment and jth replicate
μ = overall mean
αi = ith treatment effect
εij = random error for the ith treatment and jth replicate

The linear model underlying the completely randomized design expresses the dependent variable
as a constant plus a treatment effect plus individual variation, for i ranging from
1 to r and j ranging from 1 to ni. The value of the jth observation of the dependent variable at
the ith treatment level is yij. There are r levels of the treatment variable and ni observations
of Y at the ith treatment level. The constant is represented by μ, and the effect of the ith
treatment level is represented by αi. In the equivalent regression form of the model, xij is an
indicator variable that takes the value 1 if the subject has received the ith level of the
treatment and 0 otherwise.
In this experiment we use one-way ANOVA because it determines whether there is any
statistically significant difference between the mean trunk diameters at the three
levels of calcium additive. One-way ANOVA is appropriate because, as stated in the
objective of the experiment, there is only one factor (calcium level). One-way ANOVA
also provides the information needed to perform tests of significance and construct
interval estimates.

One-way ANOVA can be conducted using R studio or Excel. We first study the objective
of the experiment and then state the null and alternative hypotheses. After that, we
create the plot layout for the experiment, a Completely Randomized Design (CRD), in
which the treatments are randomly assigned (scattered) over all the plots. Finally, the
results are presented in figures and tables so that the analysis can be interpreted and
the null hypothesis either rejected or not rejected.

v. You can also highlight any prominent features of the analysis as compared to
other designs/topics, if applicable.

The main advantage of the CRD is its flexibility: it can accommodate any number of
treatments and any number of replications, and the number of replications need not be
the same for every treatment. It is the simplest and least restrictive design. Compared
with the RCBD, where missing data can cause some difficulty in the analysis, missing
data are easily handled in a CRD.
i. Run appropriate statistical analysis (T-TEST, ANOVA, or regression analysis) in
order to test the effect of treatment/factors or the interaction between factors.

Results:

Rep     T1 (100kg/ha)   T2 (200kg/ha)   T3 (300kg/ha)
1       18              18.5            17
2       18.75           17.5            16.5
3       18              17.25           16
Total   54.75           53.25           49.5
Mean    18.25           17.75           16.5

ANOVA Analysis using Excel


Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
T1 (100kg/ha) 3 54.75 18.25 0.1875
T2(200kg/ha) 3 53.25 17.75 0.4375
T3(300kg/ha) 3 49.5 16.5 0.25

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 4.875 2 2.4375 8.357143 0.018431 5.143253
Within Groups 1.75 6 0.291666667

Total 6.625 8
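
The sums of squares in the Excel table can be reproduced from their definitions. The following is a small sketch in base R; the object names (diam, ss_between and so on) are illustrative and are not part of the original analysis files.

```r
# Trunk diameters (cm) for the three calcium levels, taken from the results table
diam <- list(T1 = c(18, 18.75, 18),
             T2 = c(18.5, 17.5, 17.25),
             T3 = c(17, 16.5, 16))

grand_mean <- mean(unlist(diam))

# Between-groups SS: replicates times squared deviation of each treatment mean
ss_between <- sum(sapply(diam, function(y) length(y) * (mean(y) - grand_mean)^2))
# Within-groups SS: squared deviations of observations from their own treatment mean
ss_within  <- sum(sapply(diam, function(y) sum((y - mean(y))^2)))

df_between <- length(diam) - 1
df_within  <- length(unlist(diam)) - length(diam)

f_calc  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_calc, df_between, df_within, lower.tail = FALSE)
f_crit  <- qf(0.95, df_between, df_within)

c(SS_between = ss_between, SS_within = ss_within,
  F = f_calc, P = p_value, F_crit = f_crit)
```

The values returned correspond to the SS, F, P-value and F crit columns of the single-factor ANOVA table.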
vi. Present the results in a table or figure with labels and footnotes as well as the
interpretations.

ANOVA Analysis using R

• Since the calculated F value (8.357) is greater than the critical F value (5.143), we reject
the null hypothesis of this experiment. There is a significant difference between treatments.
• P-value (0.018) < 0.05. Thus, we reject the null hypothesis.
Bar graph of mean comparison

INTERPRETATION

The mean trunk diameter of the orange trees differs significantly among the calcium additive
treatments. Treatment 1 (100 kg/ha) gave the highest mean diameter, 18.25 cm. With
Treatment 2 (200 kg/ha), the mean diameter shown in the bar graph fell to 17.75 cm.
Treatment 3 (300 kg/ha) gave the lowest mean, 16.5 cm, possibly because excessive use of
calcium additives may disturb the ion balance to the disadvantage of other nutrients, so that
the orange trees may grow unevenly and yield poorly.
i. Present the R codes that you used for all analyses and explain the functions for
the codes.

STEP 1
# attach the data set (named datacrd) and list its variable names
>attach(datacrd)
>names(datacrd)

STEP 2
# checking data structure
>str(datacrd)

STEP 3
# need to change your variable to factor to fit to anova model
# change Trt to factor
> datacrd$Trt=as.factor(datacrd$Trt)

STEP 4
# setting data into data frame
> datacrd=as.data.frame(datacrd)

STEP 5
# checking data structure
> str(datacrd)

STEP 6
# fit anova model
> fitcrd=lm(Diameter~Trt,data = datacrd)

STEP 7
# displaying output
> anova(fitcrd)

STEP 8
# displaying 4 plots in 2x2 dimension
> par(mfrow = c(2, 2))

STEP 9
# checking for assumptions (residual diagnostic plots)
> plot(fitcrd)

STEP 10
# load the agricolae package, which is needed for mean comparison
> library(agricolae)

STEP 11
# using LSD
> lsd=LSD.test(fitcrd,"Trt")
> lsd

• This CRD code was used to determine whether there is any significant difference in
this experiment by using one-way ANOVA in R studio.
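
The steps above can be sketched as one self-contained script. This is a hedged example: the data frame is typed in by hand under the assumption that datacrd has the columns Trt and Diameter used in the code above, and the LSD is computed from its formula in base R rather than with agricolae, so the script runs without extra packages.

```r
# Rebuild the data set by hand (assumed structure: columns Trt and Diameter)
datacrd <- data.frame(
  Trt      = factor(rep(c(100, 200, 300), each = 3)),
  Diameter = c(18, 18.75, 18,      # 100 kg/ha
               18.5, 17.5, 17.25,  # 200 kg/ha
               17, 16.5, 16)       # 300 kg/ha
)

# Fit the one-way ANOVA model and display the table
fitcrd <- lm(Diameter ~ Trt, data = datacrd)
tab <- anova(fitcrd)
tab

# Fisher's LSD at alpha = 0.05 with r = 3 replicates per treatment
mse       <- tab["Residuals", "Mean Sq"]
df_error  <- tab["Residuals", "Df"]
lsd_value <- qt(0.975, df_error) * sqrt(2 * mse / 3)

# Treatment means, to be compared pairwise against lsd_value
trt_means <- tapply(datacrd$Diameter, datacrd$Trt, mean)
trt_means
lsd_value
```

Two treatment means are declared different when they differ by more than lsd_value.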

vii. Show the selected output for the R code.


5. RANDOMIZED COMPLETE BLOCK DESIGN

Light-emitting diodes (LEDs) are a potential irradiation source for intensive plant culture
systems and photobiological research. In this study, the leaf area of ‘Hungarian Wax’
pepper (Capsicum annuum L.) plants grown under red LEDs was compared with that of similar
plants grown under red LEDs with supplemental blue or far-red radiation or under broad
spectrum metal halide (MH) lamps. Below are the data for leaf area (cm²) of pepper plants
grown in a randomized complete block design with five replications.

Types of light          R1       R2       R3       R4       R5
Metal halide (MH)       864.89   651.46   644.27   608.88   768.81
Red and Blue (RB)       840.58   792.18   765.30   738.97   894.35
Red (R)                 609.09   649.50   499.16   505.75   644.80
Red and Far red (RFR)   424.37   582.20   386.92   428.03   674.13

i. Definition
In a Randomized Complete Block Design (RCBD), the experimental units are
divided into smaller, relatively uniform groups (blocks) to minimize the effect of
environmental variability, and the treatments are assigned randomly to the experimental
units within each block. An RCBD is a two-way classification in which the observations
are grouped by treatment and by block.

ii. Usage
For the RCBD design, the purpose of blocking is to reduce the experimental error by
eliminating the contribution of known sources of variation among the experimental units.
In summary, it provides higher precision and makes the treatment comparisons more
uniform.
• Objectives
To study how the leaf area (cm²) of pepper plants is affected by the different types of
light-emitting diodes (LEDs).
• Hypothesis
H0: There is no significant difference in the leaf area (cm²) of pepper plants
between the different types of LEDs.
Ha: There is a significant difference in the leaf area (cm²) of pepper plants between
the different types of LEDs.
• Treatment
Metal halide (MH)
Red and Blue (RB)
Red (R)
Red and Far red (RFR)
• Level
4
• Number of replications
5 replications
• Measured variables
Leaf area (cm²)
• Experimental unit
Pepper plant

iii. Layout of experiment

iv. Statistical model with explanation

\[ Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij} \]

Yij = observation on the ith treatment in the jth block
μ = general mean
τi = effect of the ith treatment
βj = effect of the jth block
εij = experimental random error

v. You can also highlight any prominent features of the analysis as compared to
other design / topic if applicable.

• Generally, if the blocking is effective, RCBD is more precise than the
Completely Randomized Design (CRD).
• There is no restriction on the number of treatments or replicates when blocking is
used to control the experimental error.
• The analysis is relatively easy, as missing plots are easily estimated.

vi. Run appropriate statistical analysis (T-test, ANOVA or Regression analysis) in
order to test the effect of treatment / factor or the interaction between factors.

vii. Present the results in a table or figure with labels and footnotes as well as the
interpretations.

Analysis of variance table.

Source              df   SS       MS      Fcalc     Pr(>Fcalc)      Ftable
Light (Treatment)   3    275861   91954   21.1378   4.424 × 10^-5   3.49
Block               4    91205    22801   5.2414    0.0112          3.26
Error               12   52202    4350
Total               19   419268

In conclusion, Fcalc for both treatment and block is greater than Ftable. In addition,
the p-values for treatment and block are smaller than α = 0.05; thus we reject
the null hypotheses. There are significant differences between treatments and between
blocks at the 0.05 significance level.
Report of analysis
Types of light   Red and Blue (RB)   Metal halide (MH)   Red (R)   Red and Far red (RFR)
Means            806.276             707.662             581.660   499.130

Bar graph

Figure 1. The effect of different types of light-emitting diodes (LEDs) on the leaf
area (cm²) of pepper plants. Means with different letters are significantly different
using LSD at P<0.05.
Interpretation
• Red and Blue (RB) light gives the largest leaf area (cm²) of pepper plants.
• Red (R) and Red and Far red (RFR) light give the smallest leaf areas (cm²), and these
two treatments do not differ significantly from each other.
• The type of light has a significant effect on the leaf area (cm²) of pepper plants.
viii. Present the R codes that you used for all analyses and explain the functions for
the codes.

 Step 1
# Check the data structure.
> str(rcbd)

 Step 2
# The variables (light and block) need to be changed into factors to fit the ANOVA
# model.
> rcbd$light = as.factor(rcbd$light)
> rcbd$block = as.factor(rcbd$block)

 Step 3
# Data needs to be set into the data frame.
> rcbd = as.data.frame(rcbd)

 Step 4
# ANOVA model for RCBD.
> fitrcbd = lm(yield ~ light + block, data=rcbd)

 Step 5
# The output is displayed for the RCBD analysis.
> anova(fitrcbd)

 Step 6
# Load the agricolae package for mean comparison (install it first if it is missing).
> library(agricolae)

 Step 7
# LSD is used for the mean comparison.
> lsd=LSD.test(fitrcbd, "light")
# The output is displayed for the LSD analysis
> lsd

 Step 8
# The output is displayed for the bar graph.
> bar.group(lsd$groups, ylim = c(1,900), ylab = "Leaf area",
col=c("red", "blue", "green", "yellow"), xlab = "Types of light")
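
The ANOVA and LSD steps can also be run as one self-contained script. This is a sketch under stated assumptions: the data frame is entered by hand with the same column names (light, block, yield) and coding (1 = MH, 2 = RB, 3 = R, 4 = RFR) that appear in the str() output, and the LSD value is computed from its formula in base R instead of with agricolae.

```r
# Leaf area data entered by hand; light is coded 1 = MH, 2 = RB, 3 = R, 4 = RFR
rcbd <- data.frame(
  light = factor(rep(1:4, each = 5)),
  block = factor(rep(1:5, times = 4)),
  yield = c(864.89, 651.46, 644.27, 608.88, 768.81,
            840.58, 792.18, 765.30, 738.97, 894.35,
            609.09, 649.50, 499.16, 505.75, 644.80,
            424.37, 582.20, 386.92, 428.03, 674.13)
)

# RCBD model: treatment (light) plus block
fitrcbd <- lm(yield ~ light + block, data = rcbd)
tab <- anova(fitrcbd)
tab

# Fisher's LSD at alpha = 0.05 with r = 5 blocks per treatment
mse       <- tab["Residuals", "Mean Sq"]
df_error  <- tab["Residuals", "Df"]
lsd_value <- qt(0.975, df_error) * sqrt(2 * mse / 5)

# Treatment means in decreasing order, for pairwise comparison with lsd_value
light_means <- sort(tapply(rcbd$yield, rcbd$light, mean), decreasing = TRUE)
light_means
lsd_value
```

Means that differ by less than lsd_value share a letter in the LSD grouping.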

ix. Show the selected output for the R code.

> str(rcbd)
'data.frame': 20 obs. of 3 variables:
$ light: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 2 2 2 2 2 ...
$ block: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
$ yield: num 865 651 644 609 769 ...

> fitrcbd = lm(yield ~ light + block, data=rcbd)


> anova(fitrcbd)

Analysis of Variance Table

Response: yield
Df Sum Sq Mean Sq F value Pr(>F)
light 3 275861 91954 21.1378 4.424e-05 ***
block 4 91205 22801 5.2414 0.0112 *
Residuals 12 52202 4350
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> lsd=LSD.test(fitrcbd, "light")


> lsd

$statistics
MSerror Df Mean CV t.value LSD
4350.197 12 648.682 10.1677 2.178813 90.88755

$parameters
test p.ajusted name.t ntr alpha
Fisher-LSD none light 4 0.05

$means
yield std r LCL UCL Min Max Q25 Q50 Q75
1 707.662 106.53030 5 643.3948 771.9292 608.88 864.89 644.27 651.46 768.81
2 806.276 61.93065 5 742.0088 870.5432 738.97 894.35 765.30 792.18 840.58
3 581.660 74.00998 5 517.3928 645.9272 499.16 649.50 505.75 609.09 644.80
4 499.130 123.24871 5 434.8628 563.3972 386.92 674.13 424.37 428.03 582.20

$comparison
NULL

$groups
yield groups
2 806.276 a
1 707.662 b
3 581.660 c
4 499.130 c

attr(,"class")
[1] "group"

> bar.group(lsd$groups, ylim = c(1,900), ylab = "Leaf area",
col=c("red", "blue", "green", "yellow"), xlab = "Types of light")

6. LATIN SQUARE DESIGN
Below is the yield (kg/ha) of four maize varieties (A, B, C, D) grown in a Latin Square
design experiment. The row blocking was due to the gradient in water supply through
the irrigation system. In addition, a line of trees on one side of the field shaded the
experimental area in the morning. The line of trees is perpendicular to the gradient of water
supply in this experimental plot.

Irrigation   Shade
             1        2        3        4
1            C 13.5   D 10.7   B 15     A 16.2
2            B 14.1   A 15     C 13.3   D 10.5
3            D 10.8   C 17.2   A 16.2   B 18.7
4            A 14.6   B 15.3   D 8.9    C 13.2

i. Definition
The Latin square design (LSD) is an experimental design that uses two perpendicular sources
of variation as blocking factors. PennState (2021) gives the example of a plot of land whose
fertility may change in two directions because of soil or moisture gradients. The rows and
columns of the square are therefore used as the two blocking factors.

ii. Usage (the objective of experiment that requires the analysis)

Usage:
This design blocks on two factors, both of which must be controlled at the same time.
Each treatment is replicated the same number of times, and each treatment must appear
exactly once in each row and once in each column (Springer, 2008).
Objective of this question in this design:
To determine the effect of four different varieties (A, B, C, D) on the yield of maize
(kg/ha).

Hypothesis:
Ho: There is no significant difference among the treatments
Ha: There is a significant difference among the treatments

Ho: There is no significant difference among the rows
Ha: There is a significant difference among the rows

Ho: There is no significant difference among the columns
Ha: There is a significant difference among the columns
iii. Layout of experiment

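(a) Original square (rows = irrigation, columns = shade)

C 13.5   D 10.7   B 15     A 16.2
B 14.1   A 15     C 13.3   D 10.5
D 10.8   C 17.2   A 16.2   B 18.7
A 14.6   B 15.3   D 8.9    C 13.2

(b) Random numbers 3, 4, 2, 1 assigned to the rows

3   C 13.5   D 10.7   B 15     A 16.2
4   B 14.1   A 15     C 13.3   D 10.5
2   D 10.8   C 17.2   A 16.2   B 18.7
1   A 14.6   B 15.3   D 8.9    C 13.2

(c) Rows rearranged in that order, and random numbers 4, 1, 2, 3 assigned to the columns

4        1        2        3
A 14.6   B 15.3   D 8.9    C 13.2
D 10.8   C 17.2   A 16.2   B 18.7
C 13.5   D 10.7   B 15     A 16.2
B 14.1   A 15     C 13.3   D 10.5

(d) Columns rearranged in that order (final layout)

B 15.3   D 8.9    C 13.2   A 14.6
C 17.2   A 16.2   B 18.7   D 10.8
D 10.7   B 15     A 16.2   C 13.5
A 15     C 13.3   D 10.5   B 14.1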
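
The row and column randomisation of a Latin square can be sketched in R as follows. This is an illustrative example only: the seed and the helper function is_latin are introduced here and are not part of the original analysis. The first reordering uses the row order (4, 3, 1, 2) and column order (2, 3, 4, 1) that lead to the final layout in this section.

```r
# Starting square of varieties (rows = irrigation, columns = shade)
square <- matrix(c("C", "D", "B", "A",
                   "B", "A", "C", "D",
                   "D", "C", "A", "B",
                   "A", "B", "D", "C"),
                 nrow = 4, byrow = TRUE)

# Reorder rows, then columns, with the orders behind the final layout
layout_doc <- square[c(4, 3, 1, 2), c(2, 3, 4, 1)]
layout_doc

# Alternatively, draw fresh random orders (the seed value is arbitrary)
set.seed(3701)
layout_new <- square[sample(4), sample(4)]

# Hypothetical helper: TRUE when every variety occurs once per row and per column
is_latin <- function(m) {
  all(apply(m, 1, function(v) length(unique(v)) == length(v))) &&
    all(apply(m, 2, function(v) length(unique(v)) == length(v)))
}

is_latin(layout_doc)
is_latin(layout_new)
```

Permuting whole rows and whole columns keeps the Latin property, which is why the check holds for any random order.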
iv. Statistical model with explanations

\[ Y_{ij} = \mu + \beta_i + \gamma_j + \tau_k + \varepsilon_{ij} \]

Yij = observation
μ = overall mean effect
βi = ith block (row) effect
γj = jth column effect
τk = kth treatment effect
εij = random error

v. You can also highlight any prominent features of the analysis as compared to
other designs/topics, if applicable.
Because this design blocks on two factors (rows and columns), it can be compared with the
randomized complete block design (RCBD), which blocks on only one. Compared with the RCBD,
the LSD can reduce the experimental error further because a second source of variation is
removed. However, the LSD is more restrictive than the RCBD, since the numbers of rows,
columns and treatments must all be equal.

vi. Run appropriate statistical analysis (T-TEST, ANOVA, or regression analysis) in
order to test the effect of treatment/factors or the interaction between factors.

ANOVA analysis (Using R)


> fitq6group3 = lm(yield ~ rowirrigation + columnshade + variety,
data = q6group3)
> anova(fitq6group3)

Analysis of Variance Table

Response: yield
Df Sum Sq Mean Sq F value Pr(>F)
rowirrigation 3 18.355 6.1183 13.496 0.004469 **
columnshade 3 6.800 2.2667 5.000 0.045197 *
variety 3 78.925 26.3083 58.033 7.987e-05 ***
Residuals 6 2.720 0.4533
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
vii. Present the results in a table or figure with labels and footnotes as well as the
interpretations.

ANOVA (Using Excel)

Source of Variation    df   SS       MS        F value   F crit    P value
Rows (Irrigation)      3    18.355   6.11833   13.4963   4.75706   0.00446893
Columns (Shade)        3    6.8      2.26667   5         4.75706   0.045197453
Treatments (Variety)   3    78.925   26.3083   58.0331   4.75706   7.98673E-05
Error                  6    2.72     0.45333
Total                  15   106.8

ANOVA (Using R)

Interpretation:
From both analyses, there is a significant difference among the treatment (variety) means:
the F value (58.033) is higher than F crit (4.757) and the P value is less
than 0.05; thus, we reject the null hypothesis.
Among the rows (irrigation blocks), the F value (13.496) is higher than F crit (4.757) and
the P value is less than 0.05; thus, we reject the null hypothesis.
Among the columns (shade blocks), the F value (5.000) is higher than F crit (4.757) and the
P value is less than 0.05; thus, we reject the null hypothesis.
Mean comparison

LSD: 1.165

Maize variety   A          B           C          D
Mean yield      15.5 (a)   15.78 (a)   14.3 (b)   10.2 (c)
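
The LSD value and the variety means can be recomputed from the raw data. This is a hedged sketch: the data frame q6 is typed in by hand with the column names shown in the R output of this section, and the LSD is obtained from its formula in base R rather than from agricolae.

```r
# Latin square data entered row by row (row = irrigation, column = shade)
q6 <- data.frame(
  rowirrigation = factor(rep(1:4, each = 4)),
  columnshade   = factor(rep(1:4, times = 4)),
  variety       = factor(c("C", "D", "B", "A",
                           "B", "A", "C", "D",
                           "D", "C", "A", "B",
                           "A", "B", "D", "C")),
  yield         = c(13.5, 10.7, 15.0, 16.2,
                    14.1, 15.0, 13.3, 10.5,
                    10.8, 17.2, 16.2, 18.7,
                    14.6, 15.3,  8.9, 13.2)
)

# Latin square model: two blocking factors plus the treatment
fit <- lm(yield ~ rowirrigation + columnshade + variety, data = q6)
tab <- anova(fit)
tab

# Fisher's LSD at alpha = 0.05 with r = 4 replicates per variety
mse       <- tab["Residuals", "Mean Sq"]
df_error  <- tab["Residuals", "Df"]
lsd_value <- qt(0.975, df_error) * sqrt(2 * mse / 4)

variety_means <- tapply(q6$yield, q6$variety, mean)
variety_means
lsd_value
```

Varieties whose means differ by less than lsd_value receive the same letter in the mean comparison table.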

Interpretation:
There is a highly significant difference among the treatment means. Varieties A and B
produced the highest yields, and their means do not differ significantly from each other.
Variety C yielded slightly but significantly less, and variety D yielded the least.
Blocking on irrigation (rows) was very useful in reducing the experimental error, since the
row effect is highly significant (P = 0.004). The shade (column) effect is also significant,
but only marginally (P = 0.045).
viii. Present the R codes that you used for all analyses and explain the functions
for the codes.

Step 1
# assigning the new name for data set (renaming the data set) – not necessary to do
> q6group3 = latinsquareportfolioq6

Step 2
# checking structure of data
> str(q6group3)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 16 obs. of 4 variables:
$ rowirrigation: num 1 1 1 1 2 2 2 2 3 3 ...
$ columnshade : num 1 2 3 4 1 2 3 4 1 2 ...
$ variety : chr "C" "D" "B" "A" ...
$ yield : num 13.5 10.7 15 16.2 14.1 15 13.3 10.5 10.8 17.2 ...

Step 3
# changing the variables to factor to fit in ANOVA model
> q6group3$rowirrigation = as.factor(q6group3$rowirrigation)
> q6group3$columnshade = as.factor(q6group3$columnshade)
> q6group3$variety = as.factor(q6group3$variety)

Step 4
# setting the data into frame
> q6group3 = as.data.frame(q6group3)
> str(q6group3)

'data.frame': 16 obs. of 4 variables:
$ rowirrigation: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
$ columnshade : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
$ variety : Factor w/ 4 levels "A","B","C","D": 3 4 2 1 2 1 3 4 4 3 ...
$ yield : num 13.5 10.7 15 16.2 14.1 15 13.3 10.5 10.8 17.2 ...

Step 5
# fit ANOVA model for Latin Square design
> fitq6group3 = lm(yield ~ rowirrigation + columnshade + variety,
data = q6group3)
Step 6
# displaying output
> anova(fitq6group3)

Analysis of Variance Table

Response: yield
Df Sum Sq Mean Sq F value Pr(>F)
rowirrigation 3 18.355 6.1183 13.496 0.004469 **
columnshade 3 6.800 2.2667 5.000 0.045197 *
variety 3 78.925 26.3083 58.033 7.987e-05 ***
Residuals 6 2.720 0.4533
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Step 7
# checking for assumptions – does this design follow the rules of the distribution
> par(mfrow = c(2,2)) # displaying 4 plots in 2x2 dimension
> plot(fitq6group3)

Step 8
# to call the statistical procedure (Statistical Procedures for Agricultural Research)
> library(agricolae)

Step 9
# to compare the means between the treatments (using LSD method)
> mclsd = LSD.test(fitq6group3, "variety")
> mclsd
$statistics
MSerror Df Mean CV t.value LSD
0.4533333 6 13.95 4.826526 2.446912 1.164963

$parameters
test p.ajusted name.t ntr alpha
Fisher-LSD none variety 4 0.05

$means
yield std r LCL UCL Min Max Q25 Q50 Q75
A 15.500 0.8246211 4 14.676247 16.32375 14.6 16.2 14.900 15.60 16.200
B 15.775 2.0155644 4 14.951247 16.59875 14.1 18.7 14.775 15.15 16.150
C 14.300 1.9373521 4 13.476247 15.12375 13.2 17.2 13.275 13.40 14.425
D 10.225 0.8920949 4 9.401247 11.04875 8.9 10.8 10.100 10.60 10.725

$comparison
NULL
$groups
yield groups
B 15.775 a
A 15.500 a
C 14.300 b
D 10.225 c

attr(,"class")
[1] "group"
10. REGRESSION AND CORRELATION
Normal hatchery processes in aquaculture inevitably produce stress in fish, which may
negatively impact growth, reproduction, flesh quality, and susceptibility to disease. Such
stress manifests itself in elevated and sustained corticosteroid levels. In an experiment,
fish were subjected to a stress protocol and then removed and tested at various times after
the protocol had been applied. The accompanying data on x, the time for the stress
protocol (min), and y, the blood glucose level (mmol/L), were read from a plot.

Time (x) Blood glucose (y) Time (x) Blood glucose (y)
2 4 29 5.8
2 3.6 30 4.3
5 3.7 34 5.5
7 4 36 5.6
12 3.8 40 5.1
13 4 41 5.7
17 5.1 44 6.1
18 3.9 56 5.1
23 4.4 56 5.9
24 4.3 57 6.8
26 4.3 60 4.9
28 4.4 60 5.7

i. Definition
Regression is a method used to model the relationship between a
response variable and a set of explanatory variables. The response variable is also known
as the dependent variable, while the explanatory variables are also known as independent
variables. With regression, we can predict the value of one numerical variable from the
value of another.
Correlation is a statistical method used to measure
the strength of the relationship between the response variable and the explanatory
variables. The closer the correlation coefficient is to +1 or -1, the stronger the
relationship; the closer it is to 0, the weaker the relationship.

Usage (the objective of experiment that requires the analysis)


Usage:
GraphPad (2019) notes that regression and correlation are similar but
differ in a few respects. Regression is used to predict the response variable (Y) from
the equation of a straight line, which is the key element of the model. Correlation is
used to summarise the direction and strength of the relationship between the two variables.

Objective:
To study how the time for the stress protocol affects the blood glucose level of the
fish.

Hypothesis
Ho: There is no significant linear relationship between time and the blood glucose
level of the fish (β1 = 0)
Ha: There is a significant linear relationship between time and the blood glucose
level of the fish (β1 ≠ 0)
ii. Layout of experiment
No experimental layout applies to this topic, since the data are analysed by regression
and correlation rather than arranged in a designed layout.

iii. Statistical model with explanations

Y = β0 + β1X + ε
Y = blood glucose level of the fish (dependent variable)
X = time for the stress protocol (independent variable)
β0 = y-intercept (value of Y when X=0)
β1 = gradient/slope (change in Y for every unit change in X)
ε = random error

There are two types of regression model, Model I and Model II. Before conducting a
regression analysis, we need to identify the independent and dependent variables in order
to decide which model to use for the analysis of the experiment.

Model I
The values of the independent variable (X) are controlled by the experimenter and are
assumed to be measured without error. The dependent variable (Y) is then measured in
response to the changes in X.

Model II
Both X and Y are measured, and both are subject to error. Which variable is treated as
independent depends on the experiment. This model usually corresponds to correlation
analysis, in which the relationship between the two variables is observed.

iv. You can also highlight any prominent features of the analysis as compared to
other designs/topics, if applicable.
Not applicable when comparing to other topics.

v. Run appropriate statistical analysis (T-TEST, ANOVA, or regression analysis) in
order to test the effect of treatment/factors or the interaction between factors.

> fit_BG <- lm(bloodglucose~time, data=regressionporfolio10)


> summary(fit_BG)

Call:
lm(formula = bloodglucose ~ time, data = regressionporfolio10)

Residuals:
Min 1Q Median 3Q Max
-1.0702 -0.3528 -0.1702 0.4661 1.0046

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.696487 0.215919 17.120 3.34e-14 ***
time 0.037895 0.006137 6.174 3.25e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5525 on 22 degrees of freedom


Multiple R-squared: 0.6341, Adjusted R-squared: 0.6174
F-statistic: 38.12 on 1 and 22 DF, p-value: 3.251e-06

vi. Present the results in a table or figure with labels and footnotes as well as the
interpretations.

Scatter plot (Using Excel)

[Figure: Blood Glucose Level vs. Time. x-axis: Time (min); y-axis: Blood glucose level (mmol/L); fitted line f(x) = 0.04 x + 3.7, R² = 0.63]

Regression Analysis and Correlation Output (Using Excel)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.796292
R Square            0.634082
Adjusted R Square   0.617449
Standard Error      0.552507
Observations        24

ANOVA
             df   SS         MS         F          Significance F
Regression   1    11.63751   11.63751   38.12272   3.25E-06
Residual     22   6.71582    0.305265
Total        23   18.35333

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept      3.696487       0.215919         17.11981   3.34E-14   3.248699    4.144275
X Variable 1   0.037895       0.006137         6.17436    3.25E-06   0.025167    0.050623

Regression Analysis and Correlation Output (Using R)
Call:
lm(formula = bloodglucose ~ time, data = regressionporfolio10)

Residuals:
Min 1Q Median 3Q Max
-1.0702 -0.3528 -0.1702 0.4661 1.0046

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.696487 0.215919 17.120 3.34e-14 ***
time 0.037895 0.006137 6.174 3.25e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5525 on 22 degrees of freedom


Multiple R-squared: 0.6341, Adjusted R-squared: 0.6174
F-statistic: 38.12 on 1 and 22 DF, p-value: 3.251e-06

Pearson's product-moment correlation

data: regressionporfolio10$time and regressionporfolio10$bloodglucose
t = 6.1744, df = 22, p-value = 3.251e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5788279 0.9080150
sample estimates:
cor
0.7962925

Interpretation:

On the x-axis, time is the independent variable, while on the y-axis, blood glucose
level is the dependent variable.
After plotting the data, the equation of the fitted straight line is:
y = 0.0379x + 3.6965
From that we obtained:
b (slope) = 0.0379
a (y-intercept) = 3.6965
Thus, the equation has the form of the linear model (y = a + bx).
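
The fitted line can be reproduced and used for prediction with a short script. This is a sketch with assumptions stated: the data are typed in from the table at the start of this section, the data frame name fish is made up here in place of the original regressionporfolio10 file, and the prediction at 30 minutes is only an illustration.

```r
# Time (min) and blood glucose (mmol/L) typed in from the data table
time <- c(2, 2, 5, 7, 12, 13, 17, 18, 23, 24, 26, 28,
          29, 30, 34, 36, 40, 41, 44, 56, 56, 57, 60, 60)
bloodglucose <- c(4, 3.6, 3.7, 4, 3.8, 4, 5.1, 3.9, 4.4, 4.3, 4.3, 4.4,
                  5.8, 4.3, 5.5, 5.6, 5.1, 5.7, 6.1, 5.1, 5.9, 6.8, 4.9, 5.7)
fish <- data.frame(time, bloodglucose)   # illustrative name for the data set

# Fit the simple linear regression y = a + b x
fit_BG <- lm(bloodglucose ~ time, data = fish)
a <- unname(coef(fit_BG)[1])   # y-intercept
b <- unname(coef(fit_BG)[2])   # slope

# Predicted blood glucose at an example time of 30 minutes
pred_30 <- unname(predict(fit_BG, newdata = data.frame(time = 30)))

c(intercept = a, slope = b, predicted_at_30 = pred_30)
```

The prediction is simply a + b × 30, the fitted equation evaluated at x = 30.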

From the ANOVA output, since the Significance F (p-value = 3.25E-06) for F calc (38.12) is
less than 0.05, we reject the null hypothesis. There is a significant linear relationship
between time and the blood glucose level.
From the output above,

R square, R2 = 0.63
The slope is positive, so an increase in time is associated with an increase in the blood
glucose level: there is a positive relationship between time and blood glucose level. About
63% of the variation in Y (blood glucose level) is explained by the regression line.

Multiple R/ Correlation = 0.796

The multiple R (correlation) of 0.796 indicates a strong, positive relationship. Since the
sign of r is positive, the relationship between the two variables is direct.
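
The link between the correlation coefficient, R square and the t statistic can be checked numerically. This is a minimal sketch with the data typed in from the table at the start of this section; the object names are illustrative.

```r
# Time (min) and blood glucose (mmol/L) typed in from the data table
time <- c(2, 2, 5, 7, 12, 13, 17, 18, 23, 24, 26, 28,
          29, 30, 34, 36, 40, 41, 44, 56, 56, 57, 60, 60)
bloodglucose <- c(4, 3.6, 3.7, 4, 3.8, 4, 5.1, 3.9, 4.4, 4.3, 4.3, 4.4,
                  5.8, 4.3, 5.5, 5.6, 5.1, 5.7, 6.1, 5.1, 5.9, 6.8, 4.9, 5.7)

r         <- cor(time, bloodglucose)   # Pearson correlation (Multiple R)
r_squared <- r^2                       # proportion of variation explained
n         <- length(time)

# t statistic for Ho: correlation = 0, from its formula
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# The same test through cor.test
ct <- cor.test(time, bloodglucose, method = "pearson")

c(r = r, r_squared = r_squared, t_from_r = t_from_r,
  t_cor_test = unname(ct$statistic))
```

In simple linear regression this t statistic is the same as the t value of the slope, which is why the regression and correlation outputs report the same p-value.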

vii. Present the R codes that you used for all analyses and explain the functions for
the codes.

Step 1
# checking the data structure
> str(regressionporfolio10)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 24 obs. of 2 variables:
$ time : num 2 2 5 7 12 13 17 18 23 24 ...
$ bloodglucose: num 4 3.6 3.7 4 3.8 4 5.1 3.9 4.4 4.3 ...

Step 2
# fit the data into regression model
> fit_BG <- lm(bloodglucose~time, data=regressionporfolio10)

Step 3
# displaying the output of the regression analysis
> summary(fit_BG)

Call:
lm(formula = bloodglucose ~ time, data = regressionporfolio10)

Residuals:
Min 1Q Median 3Q Max
-1.0702 -0.3528 -0.1702 0.4661 1.0046

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.696487 0.215919 17.120 3.34e-14 ***
time 0.037895 0.006137 6.174 3.25e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5525 on 22 degrees of freedom


Multiple R-squared: 0.6341, Adjusted R-squared: 0.6174
F-statistic: 38.12 on 1 and 22 DF, p-value: 3.251e-06

Step 4
# used Pearson’s product-moment correlation as a method to test the correlation
> cor.test(x=regressionporfolio10$time,
y=regressionporfolio10$bloodglucose, method = "pearson")

Pearson's product-moment correlation

data: regressionporfolio10$time and regressionporfolio10$bloodglucose
t = 6.1744, df = 22, p-value = 3.251e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.5788279 0.9080150
sample estimates:
cor
0.7962925
REFERENCES

GraphPad (2019). What is the difference between correlation and linear regression.
https://www.graphpad.com/support/faq/what-is-the-difference-between-correlation-and-linear-regression/#:~:text=Regression%20is%20primarily%20used%20to,2%20or%20more%20numeric%20variables.

PennState (2021). 4.3 - The Latin Square Design. https://online.stat.psu.edu/stat503/lesson/4/4.3

Springer (2008). Latin Square Designs. In: The Concise Encyclopedia of Statistics.
https://doi.org/10.1007/978-0-387-32833-1

Completely randomized design. (n.d.). ScienceDirect.com | Science, health and medical journals, full text articles and books.
https://www.sciencedirect.com/topics/mathematics/completely-randomized-design
