Explain the difference between categorical and continuous data.
Interpret a data table using a meta-data sheet.
Use ggplot2 to create a scatterplot that shows the relationship between two variables.
Use colours, point size, point shape, and facets to include more than two variables in a ggplot.
Here is a brief description of the basic building blocks of a ggplot.
argument | description of component |
---|---|
data | as a data.frame (long format!) |
aesthetic (aes) | mapping variables to visual properties: position, colour, line type, size |
geom | actual visualisation of the data |
scale | maps data values to aesthetics such as colour, size, and shape (scales show up as legends and axes) |
stat | statistical transformations and summaries of the data (e.g., line fits) |
facet | splitting data across panels based on different subsets of the data |
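To make the components in the table concrete, here is a minimal sketch. It uses the mpg data set that ships with ggplot2, not the plankton data, and the choice of variables (displ, hwy, class, drv) is only an illustrative assumption.

```r
library(ggplot2)

# data: the built-in mpg data.frame (illustrative stand-in, not the plankton data)
# aes:  engine size on x, highway mileage on y, vehicle class as colour
p <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  # geom: draw one point per row
  geom_point(aes(colour = class)) +
  # stat: add a straight-line fit as a statistical summary
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # scale: control how class values map to colours and name the legend
  scale_colour_brewer(palette = "Dark2", name = "Vehicle class") +
  # facet: one panel per drive type
  facet_wrap(~ drv)

print(p)
```

Each line after ggplot() adds one component with +, so a plot can be built up and checked one layer at a time.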
1. Using the meta-data on worksheet one of the data file EST-PR-PlanktonChemTax.xls, give a brief description (2 - 3 sentences), in your own words, of the plankton data set.
2. Also in the meta-data worksheet, you will find a ‘Variable Descriptions’ section, much like you created for your first lab write-up of the semester. Looking through that and the other information on the meta-data sheet if you need to, identify at least two examples of categorical data and at least two examples of numerical data.
library(readxl)
Read the plankton data into R and take a look at the structure of the data.
Confirm that the data matches what you expect it to given the meta-data.
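One possible way to do the read step is sketched here. It assumes the workbook is in your working directory and that the data are on the second worksheet (the meta-data being on worksheet one); check the sheet names and adjust if your file differs.

```r
library(readxl)

# Assumed location: the workbook sits in the current working directory
plankton_file <- "EST-PR-PlanktonChemTax.xls"

if (file.exists(plankton_file)) {
  # List the worksheet names so the right one can be chosen
  print(excel_sheets(plankton_file))

  # Assumption: worksheet 2 holds the data table
  plankton <- read_excel(plankton_file, sheet = 2)
} else {
  message("File not found; check getwd() and the file name.")
}
```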
str(plankton)
It is important that we get our data sorted out and checked before plotting. While visualisation is a great way to detect problems in your data, it helps to have a clean data.frame to start with. Looking at the structure also reminds us what the data columns are called.
3. Use the function unique() to look at the SampleType. What do you notice? How do you think this could cause problems in analysis? Describe some possible solutions to prevent this problem before you get to the step of reading the data into R.
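As a small illustration of what unique() reports, here is a sketch on a made-up vector. The site labels are hypothetical and are not taken from the plankton data.

```r
# Hypothetical site labels: several spellings of one site plus one other site
site <- c("Marsh", "marsh", "Marsh ", "Creek", "Marsh")

# unique() lists each distinct value once, in order of first appearance
unique(site)

# table() counts how often each distinct value occurs
table(site)
```

Running unique() or table() on each categorical column is a quick way to check that R sees the groups you expect.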
# Install the package dplyr if you have not already done so for this class
#install.packages('dplyr')
library(dplyr)
# Recode the variant spellings of SampleType to a single label each
plankton <-
  plankton %>%
  mutate(SampleType = replace(SampleType, SampleType == 'Whole water',
                              'Wholewater')) %>%
  mutate(SampleType = replace(SampleType, SampleType == '<20' |
                                SampleType == '< 20 um', '<20um'))
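To see what this recoding does without touching the real data, here is a sketch on a toy data.frame that contains only the label variants handled above. It uses dplyr's case_when() as an alternative to chained replace() calls; that swap is a choice made for this illustration, not part of the lab code.

```r
library(dplyr)

# Toy data with the label variants from the recoding step (not the real data set)
toy <- data.frame(
  SampleType = c("Whole water", "Wholewater", "<20", "< 20 um", "<20um"),
  stringsAsFactors = FALSE
)

toy_clean <- toy %>%
  mutate(SampleType = case_when(
    SampleType == "Whole water"         ~ "Wholewater",
    SampleType %in% c("<20", "< 20 um") ~ "<20um",
    TRUE                                ~ SampleType  # leave other labels as they are
  ))

# Only the two cleaned labels should remain
unique(toy_clean$SampleType)
```

Running unique() again after cleaning is a simple check that the recoding did what you intended.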
One last step. Let’s create a subset of the data to work with today to keep things more manageable. Create a new object called plankton_sub that contains the first 164 rows of the plankton dataset.
plankton_sub <- plankton[1:164, ]
# Let's check what dates this covers in the data.
range(plankton_sub$Date)
## [1] "2003-04-16 UTC" "2006-12-05 UTC"
library(ggplot2)
4. Given the examples we have seen, make this plot.
5. Create the following plots and briefly describe what you see.
6. What is the difference between colour and fill?