STA1007S Lab 6: Custom Functions: "Sample"
SUBMISSION INSTRUCTIONS:
Go into the Submissions section and click on Lab Session 6 to access the submission form. Please note that
the answers get automatically marked and so have to be in the correct format:
ENTER YOUR ANSWERS TO 2 DECIMAL PLACES UNLESS THE ANSWER IS A ZERO OR AN INTEGER
(for example, if the answer is 0 you just enter 0 and not 0.00, or if the answer is 2 you enter 2 and not 2.00).
Introduction
So far, we’ve seen quite a few built-in functions in R; now, we will see how to create our own. Once custom
functions are defined by the user, they are executed in the same way as built-in functions. Creating our own
functions will not only help us write much more efficient code, but it will also help us understand what R is
doing “under the hood” when we execute commands. In the second part of the lab, we will see how to plot
an input to a function vs. the output of the function. For this, we will use the function plot(), which is one
of the most used commands in R.
You will find that most of the R code necessary to execute the R commands is provided. This lab is meant to
be practice for you, so even if the code and the output of the code is provided, you are expected to create
your own script, run the pieces of code yourself and check whether the output is what you would expect it to
be. Every now and then, you will be asked to fill in blank pieces of code marked as ---. In addition to “fill
in the code”, you will need to answer other questions for which you must produce plots, run your own code
or explore your data. The questions you need to submit through Vula will appear in the submission boxes.
At any time you can call the function help() to obtain information about any function you want. For example, if
you wanted a description of how the function sample() works, you could at any time type in the
console (bottom left panel in RStudio):
help("sample")
You should make this a habit and check the help files of the functions you use for the first time.
Remember to add a line to clean your working environment and one to double check that your working
directory is correct.
Remember to save your script frequently!
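As a reminder, the two housekeeping lines could look like the sketch below. The folder in the commented-out setwd() call is a made-up placeholder, not a real path; replace it with your own lab folder if you need to change directory.

```r
# Remove all objects from the working environment
rm(list = ls())

# Print the current working directory to check that it is the one you expect
getwd()

# If it is not, change it (hypothetical path, replace with your own folder)
# setwd("path/to/your/lab/folder")
```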
Writing functions in R
One of the most powerful features of R is that it allows us to make our own functions quite easily, using
the command function(). We place the code that we want the function to carry out into curly brackets {}
following the function() command. This will tell the function what we expect it to do. Inside the round
brackets () we tell R what arguments (inputs) the function will need to perform its task. The function is
usually assigned to an object that we name. For example, here is a simple function that calculates the mean
of a set of values:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
This function is called calcMean and takes one argument called my_vector. These two names are arbitrary
and you can pick whatever you want. It is usually recommended to give functions “action” names, like
“calculate_mean” or abbreviated “calcMean”. We should also avoid using names that R already uses for
other functions, like “mean”. The function would still work, but it might cause you problems down the line.
The function works like this:
1. You give the function calcMean() an input: my_vector.
2. Then, the function calcMean() starts executing the code inside the curly brackets and calculates the
number of elements inside the object my_vector, using the function length() and creates an object n
with the result.
3. Next, it creates the object my_mean by summing the elements in my_vector and dividing this
sum by the number of elements in my_vector, which was stored in the object n.
4. Finally, the function returns the value stored in the object my_mean. We specify what the output
of our function is with the function return(). Actually, the function will output the last value it
calculated if we don’t specify a return() statement, but it is good to do it while we are learning.
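The four steps above can be written out as the following sketch. It is reconstructed from the description and from the later two-argument version of calcMean(), so treat it as one plausible form of the function rather than the exact original listing.

```r
# Sketch of calcMean(), following steps 1 to 4 above
calcMean <- function(my_vector){
  # Step 2: count the elements of the input vector
  n <- length(my_vector)
  # Step 3: add up the elements and divide by the number of elements
  my_mean <- sum(my_vector) / n
  # Step 4: hand the result back to the user
  return(my_mean)
}
```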
Now, let’s see this in action. Create an arbitrary vector of five values (remember that a vector is just a series
of numbers stored in an object):
# Create a vector of 5 arbitrary numbers
x <- c(3.215, 0.561, 0.714, 1.643, 1.227)
And now, we will use the function calcMean() to calculate the mean of these five values:
# Use the function I've created to calculate the mean of the vector x
calcMean(my_vector = x)
## [1] 1.472
And that is your answer! We can check that it is indeed the correct mean by using the built-in R function
mean() and compare the results.
# Confirm that the function calcMean gives the correct answer
mean(x)
## [1] 1.472
These results should agree; otherwise, you will need to review your calcMean() function. Notice that
all the objects that the function creates and uses, such as n or my_mean, don’t show up in your working
environment. This is because R creates those objects, uses them internally and then discards them. This is
one of the advantages of using functions to run series of commands; your working environment doesn’t get
too cluttered with intermediate objects.
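You can check this yourself with the built-in function exists(), which reports whether an object with a given name can be found. The sketch below repeats a one-argument calcMean() so that it runs on its own, and it assumes you have not created an object called my_mean yourself; the name result is just an illustrative choice.

```r
# A one-argument version of calcMean(), repeated so this example is self-contained
calcMean <- function(my_vector){
  n <- length(my_vector)
  my_mean <- sum(my_vector) / n
  return(my_mean)
}

# Run the function and keep only its output
result <- calcMean(c(1, 2, 3))
result

# my_mean only existed while the function was running,
# so this is FALSE unless you created my_mean yourself
exists("my_mean")
```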
Now, we are going to add a second argument to the function calcMean() that specifies how many decimal
places should be printed out:
# Modify the function calcMean to be able to specify the number of decimal places in the output
calcMean <- function(my_vector, decimals = 2){
  n <- length(my_vector)
  my_mean <- sum(my_vector)/n
  round_mean <- round(my_mean, digits = decimals)
  return(round_mean)
}
Note the use of the round() command that rounds off the calculated number to the specified number of
decimal places. Also note that the first argument (called my_vector) does not have any default values
specified, whereas the 2nd argument (called decimals) has a default value of 2 specified. This means that if
we run the function and do not specify anything for the 2nd argument, the default of 2 decimal places will be
used.
Now, we will use this new version of the function calcMean() to calculate the mean of the vector x, first
with 2 and then with 1 decimal places:
# Calculate the mean of x with 2 decimal places
calcMean(my_vector = x, decimals = 2)
## [1] 1.47
# Calculate the mean of x with 1 decimal
calcMean(my_vector = x, decimals = 1)
## [1] 1.5
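Because decimals has a default value, it can also be left out entirely. The short sketch below repeats the function definition so that it runs on its own; the call with 3 decimal places is an extra illustration and shows that arguments can also be matched by position rather than by name.

```r
# Two-argument version of calcMean(), repeated so this example is self-contained
calcMean <- function(my_vector, decimals = 2){
  n <- length(my_vector)
  my_mean <- sum(my_vector) / n
  round_mean <- round(my_mean, digits = decimals)
  return(round_mean)
}

x <- c(3.215, 0.561, 0.714, 1.643, 1.227)

# No value given for decimals, so the default of 2 is used
calcMean(x)

# Arguments matched by position: x is my_vector and 3 is decimals
calcMean(x, 3)
```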
We have created a function to calculate the mean of a vector of numbers; let’s now create a function to
calculate the equally important variance. We’ll let you do this on your own, but we’ll give you some hints.
Most importantly, R “vectorizes” operations. Don’t panic, this is really good news and it means that if you
tell R to sum, subtract, multiply or perform any other arithmetic operation with vectors (those series of
numbers created with the c() function), it will perform them element by element: first with first, second
with second, etc.
If you sum two vectors, R will give you another vector corresponding to the element-wise sum of the two
vectors. So
# vector arithmetic examples
c(1,2,3) + c(2,2,1)
## [1] 3 4 4
Or let’s try a regular multiplication:
c(1,2,3) * c(2,2,1)
## [1] 2 4 3
R just performs these operations element by element.
If you ask R to perform one of these operations with a vector and a single number, it will perform the same
operation to all elements of the vector. For example:
c(1,2,3) * 2
## [1] 2 4 6
Or
c(1,2,3) - 2
## [1] -1 0 1
Same thing with powers:
c(1,2,3)^2
## [1] 1 4 9
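These operations can also be chained, with the result of one step stored in an object and used in the next. The sketch below uses made-up object names (v, shifted, squared) purely for illustration.

```r
# An arbitrary vector
v <- c(1, 2, 3)

# Subtract a single number from every element: -1 0 1
shifted <- v - 2

# Square every element of the result: 1 0 1
squared <- shifted^2

# Add up the squared values: 2
sum(squared)
```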
Believe it or not, that is all you need to know to build your variance-calculating function. Let’s remind
ourselves of the variance formula:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
You have the calcMean() function that you can use as a template and it could actually help you calculate
the mean of the vector my_vector, which you will need. Then, you are only a few arithmetic operations away
from the variance. It is up to you if you want to add an argument to round the result, but remember that
you will need to round to 2 decimal places to answer questions in Vula.
SUBMISSION:
Vula Question 1. Use the function you just created to calculate the variance of the vector c(4.35, 0.48,
10.78, 7.34, 8.74).
Vula Question 2. Use the function you just created, together with the built-in function sqrt(), to calculate
the standard deviation of the vector c(5.53, 6.00, 3.60, 2.64, 6.30) .
$$f(x) = x^2$$
# Create the square function
calcSq <- function(x){
  square <- x^2
  return(square)
}
Then, we are going to generate a sequence of values to input to this function. To generate the sequence of
x values we will use the seq() function. This function works similarly to the : operator, which generates a
sequence of integers between the number to the left of the : sign and the number to the right. However, the
seq() function will allow us to define the interval between the numbers to be generated, among other things.
# Generate a sequence of numbers between 0 and 5, increasing by 0.5
x <- seq(from = 0, to = 5, by = 0.5)
x
## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
The use of the seq() function is quite self-explanatory. Refer to the help file for further details.
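To make the comparison with the : operator concrete, here is a small sketch. The object names (a, b, d) are arbitrary, and length.out is one of the other arguments of seq() described in its help file: it sets how many values to generate instead of the step size.

```r
# The : operator always moves in steps of 1
a <- 0:5
a

# seq() with by = 1 produces the same values
b <- seq(from = 0, to = 5, by = 1)
b

# seq() can instead be told how many evenly spaced values to produce
d <- seq(from = 0, to = 1, length.out = 5)
d
```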
Once we have our x values, we can plug them into our square function calcSq(), to obtain the square of
each number in the sequence. As we mentioned, R “vectorizes” operations and if we pass a vector of values
to a function, it will typically apply the function to each value in the vector.
# Square the values in vector x and store them in vector y
y <- calcSq(x)
Finally, we will plot the x values against the corresponding y values. The function plot() takes a vector of x
values and a vector of y values and treats them as coordinates. The first element of x and the first element of
y define a point, the second element of x and the second element of y another point, and so on. Let’s
see what the output of the function calcSq() looks like.
# Plot the x values against their squares (y)
plot(x = x, y = y)
[Figure: scatter plot of y against x, with the x axis running from 0 to 5 and the y axis from 0 to 25]
On the x axis we have the values in the vector x and on the y axis we have the values resulting from applying
the function calcSq() to the values of x. We could have run the following code and we would have obtained
the same result (you are welcome to try):
# Plot the x values against their squares calcSq(x)
plot(x = x, y = calcSq(x))
The function plot() is extremely flexible and can take MANY different arguments, too many to cover
in a single lab. For now, we’ll stick to the basic use of this function, which is already very useful.
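For the curious, the sketch below shows a few of those optional arguments: type chooses how the points are drawn, main sets the title, and xlab and ylab set the axis labels. The title and label text are arbitrary choices, and none of this is needed for the questions in this lab.

```r
# Recreate the x values and their squares so this example is self-contained
x <- seq(from = 0, to = 5, by = 0.5)
y <- x^2

# type = "b" draws both points and connecting lines
plot(x = x, y = y,
     type = "b",
     main = "Squares of x",
     xlab = "x",
     ylab = "x squared")
```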
SUBMISSION:
Create a function that takes an input and divides it by its square:
$$f(x) = \frac{x}{x^2}$$
Apply this function to a vector of values between 1 and 5, increasing by 0.25. Plot the values of the vector
against the function output.
Vula Question 3. Is the function decreasing or increasing?
Vula Question 4. Does the function increase/decrease at a constant rate?
Vula Question 5. What is the last value of the output generated by the function?
Vula Question 6. What is the sum of the values of the output generated by the function?