R Unit5
data of utmost importance to analysts. The primary styles are: dot plots, density plots (histograms
and kernel density plots), line graphs (3D, simple and expounded), bar graphs (stacked, grouped and
simple), pie charts (3D, simple and exploded), box plots (simple, notched and violin plots),
bag plots and scatter plots (simple with fit lines, scatter-plot matrices, high-density plots and 3D plots).
The foundational function for creating graphs is plot(). A graph is built up step by step, from adding
lines and points to attaching a legend.
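For example, the call plot(c(-3,3), c(-1,5), type = "n", xlab = "x", ylab = "y") draws axes labeled
x and y. The horizontal (x) axis ranges from −3 to 3 and the vertical (y) axis ranges from −1 to 5.
The argument type="n" means that nothing is drawn in the graph itself, leaving an empty frame to
which points, lines and a legend can be added. The following example overlays two curves and
attaches a legend: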
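# x values from -pi to pi in steps of 0.1
x <- seq(-pi, pi, 0.1)
# draw sin(x) as a blue line; this call creates the graph
plot(x, sin(x), main = "Overlaying Graphs", type = "l", col = "blue")
# add cos(x) to the existing graph as a red line
lines(x, cos(x), col = "red")
# attach a legend in the top-left corner
legend('topleft', c("sin(x)", "cos(x)"), fill = c("blue", "red"))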
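The step-by-step approach described above can also be sketched starting from an empty frame. This is only an illustration: the data values, colors and legend labels below are invented for the example, not taken from the notes.

```r
# Hypothetical data for illustration only
x <- c(-2, -1, 0, 1, 2)
y <- x^2

# 1. Draw empty axes: type = "n" plots nothing inside the frame
plot(c(-3, 3), c(-1, 5), type = "n", xlab = "x", ylab = "y",
     main = "Building a graph step by step")

# 2. Add points, then a line through them, to the existing graph
points(x, y, pch = 19)
lines(x, y, col = "blue")

# 3. Add a dashed horizontal reference line at y = 0
abline(h = 0, lty = 2)

# 4. Attach a legend describing what was drawn
legend("top", legend = c("data points", "connecting line"),
       pch = c(19, NA), lty = c(NA, 1), col = c("black", "blue"))
```

Each call after plot() draws on top of the current graph, so the order of the calls decides what ends up in front.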
Bar plot:- A bar chart represents data as rectangular bars, with the length of each bar proportional to
the value of the variable. R uses the function barplot() to create bar charts. R can draw both vertical
and horizontal bars, and each bar in the chart can be given a different color.
A bar graph is a chart that uses bars to show comparisons between
categories of data. A bar graph has two axes: one describes the categories being
compared, and the other carries the numerical values of the data. Either axis can play
either role, but the choice determines the orientation of the bars. If the categories
are on the horizontal axis, the bars are oriented vertically; if the values are along
the horizontal axis, the bars are oriented horizontally.
Syntax:- barplot(H, xlab, ylab, main, names.arg, col)
Following is the description of the parameters used −
H is a vector or matrix containing numeric values used in bar chart.
xlab is the label for x axis.
ylab is the label for y axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.
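As a minimal sketch of these parameters in use, the following draws the same data first as vertical and then as horizontal bars. The sales figures and month labels are hypothetical; horiz is a further barplot() argument that is not in the list above.

```r
# Hypothetical data: units sold per month (invented for illustration)
sales  <- c(14, 22, 9, 31, 18)
months <- c("Jan", "Feb", "Mar", "Apr", "May")

# Vertical bars: categories on the x axis, values on the y axis
mid <- barplot(sales, names.arg = months,
               xlab = "Month", ylab = "Units sold",
               main = "Monthly sales", col = "steelblue")

# Horizontal bars: horiz = TRUE swaps the roles of the two axes
barplot(sales, names.arg = months, horiz = TRUE,
        xlab = "Units sold", ylab = "Month",
        main = "Monthly sales (horizontal)",
        col = c("red", "orange", "yellow", "green", "blue"))
```

barplot() returns the midpoints of the bars (stored in mid here), which can be handy for placing text labels above each bar.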
Advantages
Show each data category in a frequency distribution
Display relative numbers/proportions of multiple categories
Summarize a large amount of data in a visual, easily interpretable form
Make trends easier to highlight than tables do
Allow estimates to be made quickly and accurately
Permit visual guidance on accuracy and reasonableness of calculations
Accessible to a wide audience
Disadvantages
Often require additional explanation
Fail to expose key assumptions, causes, impacts and patterns
Can be easily manipulated to give false impressions
Pie Chart :- A pie chart represents values as slices of a circle, each in a different color. The slices
are labeled, and the number corresponding to each slice is also shown in the chart.
In a pie chart, the circle is drawn with radius proportional to the square root of
the quantity to be represented, because the area of a circle is given by πr². The sectors
are coloured and shaded differently. To construct a pie chart, we draw a circle with some
suitable radius (proportional to the square root of the total). The angle for each sector is
calculated as follows:
Angle for each sector = (Component part / Total) × 360°
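The angle formula can be checked with a few lines of arithmetic. The expense figures here are hypothetical and chosen so the result is easy to verify by hand.

```r
# Hypothetical component parts (invented for illustration)
parts <- c(Rent = 300, Food = 200, Travel = 100)

# Angle for each sector = (component part / total) * 360
angles <- parts / sum(parts) * 360
print(angles)

# The sector angles always add up to the full circle
print(sum(angles))
```

With these values, Rent takes half of the total, so its sector spans half of the circle.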
Syntax:- pie(x, labels, radius, main, col, clockwise)
Following is the description of the parameters used
x is a vector containing the numeric values used in the pie chart.
labels is used to give description to the slices.
radius indicates the radius of the circle of the pie chart (a value between −1 and +1).
main indicates the title of the chart.
col indicates the color palette.
clockwise is a logical value indicating whether the slices are drawn clockwise or anticlockwise.
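A short sketch of a pie() call using all of the parameters listed above. The city names and values are hypothetical, and adding percentages to the labels is just one possible way to show the number for each slice.

```r
# Hypothetical data (invented for illustration)
x      <- c(20, 30, 10, 40)
cities <- c("City A", "City B", "City C", "City D")

# Percentage share of each slice, shown next to its label
pct  <- round(100 * x / sum(x), 1)
lbls <- paste0(cities, " (", pct, "%)")

pie(x, labels = lbls, radius = 0.8,
    main = "Share by city",
    col = rainbow(length(x)),
    clockwise = TRUE)
```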
Advantages
Display relative proportions of multiple classes of data.
Size of the circle can be made proportional to the total quantity it represents.
Summarize a large data set in visual form.
Are visually simpler than other types of graphs.
Permit a visual check of the reasonableness or accuracy of calculations.
Disadvantages
Do not easily reveal exact values
Many pie charts may be needed to show changes over time
Fail to reveal key assumptions, causes, effects, or patterns
Can be easily manipulated to yield false impressions
Histogram:- A histogram represents the frequencies of the values of a variable bucketed into ranges.
A histogram is similar to a bar chart, but it groups the values into continuous ranges. The height of
each bar represents the number of values that fall in that range.
R creates histograms using the hist() function. This function takes a vector as input and accepts
further parameters that control how the histogram is drawn.
Syntax:- hist(v,main,xlab,xlim,ylim,breaks,col,border)
Following is the description of the parameters used −
v is a vector containing numeric values used in histogram.
main indicates title of the chart.
col is used to set color of the bars.
border is used to set border color of each bar.
xlab is used to give description of x-axis.
xlim is used to specify the range of values on the x-axis.
ylim is used to specify the range of values on the y-axis.
breaks is used to specify the breakpoints between histogram cells.
hist() also returns an object whose components include:
counts: the number of values in each cell.
mids: the midpoints of the cells.
density: the estimated density for each cell.
Examples of histogram
#Simple histogram
v <- c(9,13,21,8,36,22,12,41,31,33,19)
h <- hist(v,xlab = "Weight",col = "pink",border = "blue")
> h
$breaks
[1] 5 10 15 20 25 30 35 40 45
$counts
[1] 2 2 1 2 0 2 1 1
$density
[1] 0.03636364 0.03636364 0.01818182 0.03636364 0.00000000 0.03636364 0.01818182
[8] 0.01818182
$mids
[1] 7.5 12.5 17.5 22.5 27.5 32.5 37.5 42.5
$xname
[1] "v"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
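To specify the range of values shown on the x axis and y axis, we can use the xlim and ylim
parameters. The number of bars, and therefore their width, can be controlled with breaks. When
breaks is a single number, R treats it only as a suggestion for the number of cells.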
v <- c(9,13,31,8,31,22,12,31,35)
hist(v,xlab = "Weight",col = "light green",
border = "red", xlim = c(0,40), ylim = c(0,5),breaks = 5)
x <- c(5,3,5,7,3,6,5)
hist(x, breaks = 4, col = "violetred3", main = "breaks=4")
hist(x, breaks = 10, col = "slateblue3", main = "breaks=10")
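Since a single number for breaks is only a suggestion, exact cell boundaries can instead be requested by passing a vector of breakpoints. The sketch below reuses the weight vector from the first example; the choice of 10-unit cells is an assumption made for illustration.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# Explicit breakpoints: cells (0,10], (10,20], ..., (40,50]
brk <- seq(0, 50, by = 10)

h <- hist(v, breaks = brk, xlab = "Weight",
          main = "Explicit breakpoints",
          col = "pink", border = "blue")

# Components of the returned object
print(h$counts)  # number of values in each cell
print(h$mids)    # midpoint of each cell
```

The breakpoint vector must cover the whole range of the data, otherwise hist() stops with an error.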
Kernel Density Plots:- Kernel density plots are usually a much more effective way to view the
distribution of a variable than a histogram. Create the plot using plot(density(x)), where x is a
numeric vector.
Examples
# Kernel Density Plot
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# returns the density data
d <- density(v)
# plots the results
plot(d)
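To compare the two views of the same data, a density curve can be drawn on top of a histogram. This is a sketch under one assumption worth noting: the histogram must be drawn with freq = FALSE so that its vertical axis is on the density scale rather than showing counts.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# Histogram on the density scale (bar areas sum to 1)
hist(v, freq = FALSE, xlab = "Weight",
     main = "Histogram with kernel density", col = "pink")

# Kernel density estimate drawn over the histogram
d <- density(v)
lines(d, col = "blue", lwd = 2)

# Tick marks along the x axis showing the individual observations
rug(v)
```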
Advantages of histograms
Visually strong.
Can compare to normal curve.
Usually the vertical axis is a frequency count of the items falling into each category.
Disadvantages of histograms
Cannot read exact values because data is grouped in categories.
More difficult to compare two data sets.
Use only with continuous data
Box plot:- A boxplot shows how the values in a data set are distributed. It is based on the three
quartiles, which divide the data set into four parts. The graph shows the minimum, maximum, median,
first quartile and third quartile of the data set. It is also useful for comparing the distribution of
data across data sets by drawing a boxplot for each of them. Boxplots are created in R using the
boxplot() function.
Syntax:- boxplot(x, data, notch, varwidth, names, main)
Following is the description of the parameters used −
x is a vector or a formula.
data is the data frame.
notch is a logical value. Set it to TRUE to draw a notch.
varwidth is a logical value. Set it to TRUE to draw the width of each box
proportional to the square root of the sample size.
names are the group labels which will be printed under each boxplot.
main is used to give a title to the graph.
Examples
x <- c(7,3,2,4,8)
boxplot(x,col="pink")
> input <- mtcars[,c('mpg','cyl')]
> print(head(input))
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
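With the mpg and cyl columns selected, one boxplot per cylinder group can be drawn using the formula form of boxplot(). The call below is a sketch: the title, axis labels and group names are chosen for illustration and are not taken from the notes.

```r
# mtcars ships with R; keep only the two columns of interest
input <- mtcars[, c("mpg", "cyl")]

# mpg ~ cyl means: one box of mpg values for each value of cyl
b <- boxplot(mpg ~ cyl, data = input,
             varwidth = TRUE,
             names = c("4 cyl", "6 cyl", "8 cyl"),
             xlab = "Number of cylinders",
             ylab = "Miles per gallon",
             main = "Mileage data",
             col = "pink")

# Number of cars in each group
print(b$n)
```

boxplot() returns its summary statistics invisibly, so assigning the result to b gives access to the group sizes and the five-number summaries in b$stats.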
Advantages:
⚫ A box plot is a good way to summarize large amounts of data.
⚫ It displays the range and distribution of data along a number line.
⚫ Box plots provide some indication of the data’s symmetry and skewness.
⚫ Box plots show outliers.
Disadvantages
⚫ Original data is not clearly shown in the box plot; also, mean and mode cannot be
identified in a box plot.
⚫ Exact values not retained.
Customizing Graphs:-
a) Changing Character Sizes: (The cex Option) The cex (for character expand) option allows you to expand
or shrink characters within a graph, which can be very useful. You can use it as a named parameter in
various graphing functions. For instance, you may wish to draw the text “abc” at some point, say (2.5,4),
in your graph but with a larger font, in order to call attention to this particular text.
Example:- text(2.5,4,"abc",cex = 1.5)
This prints the text “abc” at the point (2.5,4) with characters 1.5 times the normal size.
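Because text() needs an existing graph to draw on, a runnable version of this example has to create one first. The empty frame and the second, default-size label below are assumptions added so the effect of cex can be seen side by side.

```r
# Empty frame to draw on (axis ranges chosen for illustration)
plot(c(0, 5), c(0, 5), type = "n", xlab = "x", ylab = "y",
     main = "Effect of cex")

# Default character size, for comparison
text(2.5, 2, "abc")

# Same text, 1.5 times the normal size
text(2.5, 4, "abc", cex = 1.5)
```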
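b) Adding a Polygon: (The polygon() Function) The polygon() function draws arbitrary polygonal objects.
In a call such as polygon(c(1.2,1.4,1.4,1.2),c(0,0,f(1.3),f(1.3)),col="gray"), where f is a previously
defined function, the first argument is the set of x-coordinates for the rectangle, and the second
argument specifies the y-coordinates. The third argument specifies that the rectangle in this case
should be shaded in solid gray.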
As another example, we could use the density argument to fill the rectangle with striping. This call
specifies 10 lines per inch:
polygon(c(1.2,1.4,1.4,1.2),c(0,0,f(1.3),f(1.3)),density=10)
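The polygon() calls depend on a function f and an existing graph, neither of which appears in these notes. The sketch below assumes f(x) = 1 - exp(-x) and a curve drawn on the interval from 0 to 2; both are stand-ins chosen for illustration. The second rectangle is shifted to the right so the two fill styles can be compared.

```r
# Assumed function: any function of x would do here
f <- function(x) 1 - exp(-x)

# Draw the curve first so there is a graph to add polygons to
curve(f, 0, 2, main = "Rectangles drawn with polygon()")

# Solid gray rectangle between x = 1.2 and x = 1.4, height f(1.3)
polygon(c(1.2, 1.4, 1.4, 1.2), c(0, 0, f(1.3), f(1.3)), col = "gray")

# Striped rectangle (10 lines per inch) between x = 1.6 and x = 1.8
polygon(c(1.6, 1.8, 1.8, 1.6), c(0, 0, f(1.7), f(1.7)), density = 10)
```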
Saving Graphs:- The R graphics display can consist of various graphics devices. The default device is the
screen. In order to save a graph to a file, you must set up another device.
The graph can be saved in a variety of formats from the menu File -> Save As.
The graph can also be saved using one of the following functions.
Function                       Output to
pdf("mygraph.pdf")             pdf file
win.metafile("mygraph.wmf")    windows metafile
png("mygraph.png")             png file
jpeg("mygraph.jpg")            jpeg file
bmp("mygraph.bmp")             bmp file
postscript("mygraph.ps")       postscript file
Let’s first go through the basics of R graphics devices, and then discuss a second approach that is
much more direct and convenient.
> pdf("d12.pdf")
This opens the file d12.pdf. We now have two devices open, as we can confirm:
> dev.list()
X11 pdf
2 3
The screen is named X11 when R runs on Linux. (It’s named windows on Windows systems.) It is device
number 2 here. Our PDF file is device number 3. Our active device is the PDF file:
> dev.cur()
pdf
3
All graphics output will now go to this file instead of to the screen. But what if we wish to save what’s
already on the screen?
Saving the Displayed Graph:-One way to save the graph currently displayed on the screen is to
reestablish the screen as the current device and then copy it to the PDF device, which is 3 in our
example, as follows:
> dev.set(2)
X11
2
> dev.copy(which=3)
pdf
3
But actually, it is best to set up a PDF device as shown earlier and then rerun whatever analyses led to
the current screen. This is because the copy operation can result in distortions due to mismatches
between screen devices and file devices.
Closing an R Graphics Device:-Note that the PDF file we create is not usable until we close it, which we
do as follows:
> dev.set(3)
pdf
3
> dev.off()
X11
2
You can also close the device by exiting R, if you’re finished working with it. But in future versions of R,
this behavior may not exist, so it’s probably better to proactively close.
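Putting the open, draw and close steps together, a complete save-to-file sequence might look like the sketch below. The file name and the plotted data are hypothetical, and the file is written to R's temporary directory only so that the example leaves nothing behind.

```r
# Hypothetical output path inside the session's temporary directory
out_file <- file.path(tempdir(), "mygraph.pdf")

# 1. Open a PDF device; it becomes the current device
pdf(out_file)
print(dev.cur())

# 2. All graphics output now goes to the file
x <- seq(-pi, pi, 0.1)
plot(x, sin(x), type = "l", col = "blue", main = "Saved to PDF")

# 3. Close the device; the file is not usable until this is done
invisible(dev.off())

# The finished file now exists on disk
print(file.exists(out_file))
```

The same pattern applies to the other device functions listed earlier, such as png() or jpeg(): open the device, draw, then call dev.off().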
Example:
# Create the data for the chart.
H <- c(7,12,28,3,41)