Week 1. Data and Metadata
Readings: G&W Introductory chapter.
Objective(s): Introduce the students to the course; understand what data is; discuss how we preserve information about data; view examples of datasets from different disciplines.
Lab: Introduction to Excel, How would you describe different biological observations as data?
Lectures: Intro and Course Overview, The Data Collection Process
Week 2. Data Creation
Objective(s): Compare poor versus good practice in creating data; Differentiate between data recording and data entry; Develop a practical familiarity with data quality control.
Lab: Planning and collecting data. Meet on the 3rd floor of the ISC.
Lectures: Data Entry, Introduction to R
Code: R Intro
Week 3 & 4. Visualization & Introduction to R
Readings: G&W Chapter Import, G&W Chapter on Data Viz, Unwin 2008, Choosing a Good Chart Cheat Sheet
Objective(s): Begin to learn the R computing language; Develop an understanding of best practices in graphical presentation; Identify the syntax of an R function (name and arguments); Create an R project in RStudio; Read data into R using read.csv(); Use R as a basic calculator; Describe and create variables in R; Interpret the output of the str() function; Install packages in R; Create a scatterplot using ggplot().
Labs: Bringing data into R, visualization of Plankton data via ggplot2
Lectures: Lists, Matrices, and Data Frames in R, Reading Data and Using Libraries, Extending ggplot2
Code: Lists, Matrices, and Data Frames, Loading Data and Libraries
Files: Sample Data File
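The Week 3 & 4 objectives (reading a file with read.csv(), inspecting it with str(), and drawing a scatterplot with ggplot()) can be sketched as follows. This is a minimal illustration, not the lab's actual code: the file contents, the column names depth_m and count, and the object name plankton are all made up for the example, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical data written to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("depth_m,count",
             "5,120",
             "10,95",
             "20,60",
             "40,22"), csv_path)

# Read the file into a data frame and inspect its structure
plankton <- read.csv(csv_path)
str(plankton)

# Scatterplot: map columns to the x and y aesthetics, then add a point layer
p <- ggplot(plankton, aes(x = depth_m, y = count)) +
  geom_point()
print(p)
```

In the labs the same pattern would apply to the course data file, with read.csv() pointed at the downloaded file instead of a temporary one.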
Week 5 & 6. Data Reduction
Readings: G&W Chapter on Data Transformation, G&W Chapter on Variation, Anderson 2014
Objective(s): Describe the meaning and identify applications of the following summary/descriptive statistics: mean, mode, median, standard deviation; Describe the split-apply-combine strategy of data reduction and summarization; Use group_by() and summarise() to calculate summary statistics for groupings within a dataset; Subset data using filter()
Lab: Descriptive statistics in R, Introduce vectors, dplyr for Microbial data aggregation
Lectures: Intro to statistics – sampling and distributions (slides); Sampling distributions (slides)
Files: Sockeye data file; Human gene lengths data file
Etherpad: https://etherpad.wikimedia.org/p/BNxKbc1zwL
Thursday etherpad: https://etherpad.wikimedia.org/p/YeyDNw0D6i
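The split-apply-combine strategy named in the Week 5 & 6 objectives might look like the sketch below. The data frame fish and its values are invented for illustration (they are not the course's sockeye file), and the example assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical measurements: fish lengths (mm) from two sites
fish <- data.frame(
  site   = c("A", "A", "A", "B", "B", "B"),
  length = c(510, 530, 550, 480, 500, 490)
)

# Split by site, apply mean/median/sd to each group, combine into one table
site_summary <- fish %>%
  group_by(site) %>%
  summarise(
    mean_length   = mean(length),
    median_length = median(length),
    sd_length     = sd(length)
  )
print(site_summary)

# Subset rows with filter(): keep only fish longer than 500 mm
long_fish <- fish %>% filter(length > 500)
print(long_fish)
```

Here group_by() does the split, summarise() does the apply and combine, and filter() subsets rows without changing the columns.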
Week 7. Tidy Data
Readings: G&W Chapter on Tidy Data
Objective(s): Understand how to reshape and manipulate data; Describe the difference between the two fundamental forms of data (long versus wide); Use the tidyr package in R to convert between long and wide data; Use unite() and separate() to create tidy data (where each column is a variable); Learn how to complete ragged data.
Lecture: Regular Expressions, Tidy data
Files: Axolotl data, Mammal data
Lab: Tidyr and data reshaping with Axolotl Limb Regeneration data
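A rough sketch of converting between wide and long data, and of separate() and unite(), is given below. It assumes a recent tidyr that provides pivot_longer() and pivot_wider(); the course materials may instead use the older gather() and spread(). The data frame and its column names are hypothetical, not the lab's data.

```r
library(tidyr)

# Hypothetical wide data: one row per animal, one column per day
wide <- data.frame(
  animal = c("ax1", "ax2"),
  day_0  = c(2.0, 2.5),
  day_7  = c(3.1, 3.4)
)

# Wide -> long: one row per animal-day observation
long <- pivot_longer(wide, cols = c(day_0, day_7),
                     names_to = "day", values_to = "limb_mm")

# separate() splits "day_0" into two columns; unite() glues them back together
long_sep   <- separate(long, day, into = c("unit", "day_num"), sep = "_")
long_unite <- unite(long_sep, "day", unit, day_num, sep = "_")

# Long -> wide: recover the original layout
wide_again <- pivot_wider(long_unite, names_from = day, values_from = limb_mm)
print(wide_again)
```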
Week 8. Functions
Readings: G&W Functions Chapter
Objective(s): Learn the benefits of reusable code, Understand the structure of a function, Discover debugging and making functions fail usefully, Apply conditional logic to build flexible code, Derive principles to make functions that are easy to understand and apply to multiple data sets.
Lab: Use functions to read, clean, and visualize environmental data from local MA area oceanographic sensing suites
Lecture: Functions Intro, Modular Functions and Loops
Data: Buoy Data from 44013
Etherpad: https://etherpad.wikimedia.org/p/buoy_function
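To illustrate the Week 8 ideas of function structure, conditional logic, and failing usefully, here is a small base R sketch. The function convert_temp() and the sample temperatures are hypothetical and are not taken from the buoy lab.

```r
# Hypothetical helper: convert temperatures, failing early on bad input
convert_temp <- function(temp, to = "F") {
  # Fail usefully: stop with a clear message instead of returning nonsense
  if (!is.numeric(temp)) {
    stop("temp must be numeric, got ", class(temp)[1])
  }
  # Conditional logic lets one function cover both directions
  if (to == "F") {
    temp * 9 / 5 + 32
  } else if (to == "C") {
    (temp - 32) * 5 / 9
  } else {
    stop("to must be 'F' or 'C'")
  }
}

water_temp_c <- c(4.5, 10, 18.2)
water_temp_f <- convert_temp(water_temp_c, to = "F")
print(water_temp_f)

# tryCatch() captures the error so the message can be inspected
msg <- tryCatch(convert_temp("cold"), error = function(e) conditionMessage(e))
print(msg)
```

Checking inputs at the top of the function keeps the error close to its cause, which is usually easier to debug than a confusing failure several steps later.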
Week 9. Data “Mashups”
Readings: G&W Relational Data Chapter
Objective(s): Know when and where to use different types of joins, Understand how to merge survey data with geospatial information to get a geographic understanding of epidemiological patterns
Lecture: Joining Data, Joins and Maps, Dynamic Maps
Lab: Joins and merging different mammal physiology data sets
Data: Hemlock and Woolly Adelgids, Maps and Joins Data
In Class Code: Maps and Joins
Etherpad: https://etherpad.wikimedia.org/p/join
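The choice between join types in the Week 9 objectives can be sketched with dplyr. The two tables, their values, and the key column site are invented for the example; they only loosely echo the idea of merging survey counts with location data.

```r
library(dplyr)

# Hypothetical survey counts and site coordinates sharing a "site" key
counts <- data.frame(site = c("s1", "s2", "s3"), adelgids = c(12, 0, 7))
coords <- data.frame(site = c("s1", "s2", "s4"),
                     lat  = c(42.3, 42.5, 42.9),
                     lon  = c(-71.1, -72.0, -70.9))

# left_join keeps every survey row; unmatched sites get NA coordinates
left <- left_join(counts, coords, by = "site")
print(left)

# inner_join keeps only sites present in both tables
inner <- inner_join(counts, coords, by = "site")
print(inner)

# anti_join lists survey sites that have no coordinates
missing_coords <- anti_join(counts, coords, by = "site")
print(missing_coords)
```

A left join suits cases where every survey record must be kept, an inner join suits mapping where rows without coordinates cannot be drawn, and an anti join is a quick check for keys that failed to match.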
Note: To install gdal on a Mac, there are two steps:
1) Install Homebrew from http://brew.sh/ (this is an awesome thing to have anyway)
2) In Terminal, type
brew install gdal
To install gdal on a Windows PC:
1) Install OSGeo4W from https://trac.osgeo.org/osgeo4w/wiki
2) Use it to install gdal
Week 10. Accessing Online Data
Readings: Intro to R for Bioinformatics, Sequence Analysis
Lecture: Using R for Wet and Dry Scientists, Sequence Alignment
Code: Chapter 1 code, Chapter 9 code, Exercise answers
Objective(s): List the major sources of online data for bioinformatics and ecoinformatics; Acquire the basic tools for scraping data from the web.
Lab: Querying GenBank, Gene alignment, RCurl and web scraping
Week 11 & 12. T-Tests and P-Values
Readings: Cortina and Dunlop 1997, ASA Statement on P-Values
Objective(s): Describe the basics of probability and p-values; Compare groups of data using t-tests and their extensions.
Lecture: T-tests, ANOVA
Code: Example T-Test in R
Lab: Implementing statistical tests in R for Microbial Abundance Data, Data Simulation and P-Values, Evaluation of assumptions
Data: Blackbird Testosterone from W&S, data sets for ANOVA
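A short base R sketch of the Week 11 & 12 lab themes (a two-sample t-test, an assumption check, and simulating p-values) follows. All data are simulated with made-up means and standard deviations; none of it comes from the course data sets.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical abundances for two groups whose true means differ
control   <- rnorm(20, mean = 10, sd = 2)
treatment <- rnorm(20, mean = 12, sd = 2)

# Welch two-sample t-test (R's default; does not assume equal variances)
result <- t.test(control, treatment)
print(result$p.value)

# Evaluate the normality assumption for one group
print(shapiro.test(control)$p.value)

# Simulate p-values when there is no true difference between groups
p_null <- replicate(1000, t.test(rnorm(20), rnorm(20))$p.value)

# Proportion of false positives at the 0.05 level; expected to be near 0.05
print(mean(p_null < 0.05))
```

The simulation shows what a p-value threshold means in practice: when the null hypothesis is true, roughly 5% of tests still fall below 0.05.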
Week 13 & 14. Linear Regression
Readings: Handouts on linear regression
Objective(s): Fit a linear regression through a bivariate scatterplot using lm() in R; Describe when to use nonlinear models/curves.
Lectures: Linear Models, Multiple Linear Regression
Data: Seals, Nonlinear Regression Data, Naked mole rats, Keeley Fire Data
Lab: Fitting linear models in R, Testing Assumptions, Evaluating model outputs and generating predictions, using lizard evo-devo data
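The Week 13 & 14 lab steps of fitting a model, checking assumptions, and generating predictions could be sketched in base R as below. The data are simulated with an arbitrary intercept, slope, and noise level chosen for the example; they are not any of the course data sets.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical bivariate data with a linear trend plus noise
x <- 1:30
y <- 3 + 2 * x + rnorm(30, sd = 4)
dat <- data.frame(x = x, y = y)

# Fit the linear model and look at the estimated intercept and slope
fit <- lm(y ~ x, data = dat)
print(coef(fit))

# Check assumptions: residuals should scatter evenly around zero
plot(fitted(fit), resid(fit))
abline(h = 0, lty = 2)

# Predictions for new x values, with a confidence interval
new_x <- data.frame(x = c(35, 40))
pred <- predict(fit, newdata = new_x, interval = "confidence")
print(pred)
```

summary(fit) gives the standard errors, t-statistics, and R-squared for the same model when a fuller evaluation of the output is needed.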