**Week 1. Data and Metadata.**

Readings: G&W Introductory chapter.

Objective(s): Introduce students to the course; understand what data are; discuss how we preserve information about data; view examples of datasets from different disciplines.

Lab: Introduction to Excel, How would you describe different biological observations as data?

Lectures: Intro and Course Overview, The Data Collection Process

**Week 2. Data Creation**

Objective(s): Compare poor versus good practice in creating data; differentiate between data recording and data entry; develop a practical familiarity with data quality control.

Lab: Planning and collecting data. Meet on the 3rd floor of the ISC.

Lectures: Data Entry, Introduction to R

Code: R Intro

**Week 3 & 4. Visualization & Introduction to R**

Readings: G&W Chapter Import, G&W Chapter on Data Viz, Unwin 2008, Choosing a Good Chart Cheat Sheet

Objective(s): Begin to learn the R computing language; develop an understanding of graphical presentation best practices; identify the syntax of an R function (name and arguments); create an R project in RStudio; read data into R using read.csv(); use R as a basic calculator; describe and create variables in R; interpret the output of the str() function; install packages in R; create a scatterplot using ggplot().
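Several of these objectives fit in one short script. The following is only a sketch: the plankton-style values and column names are made up for illustration, and read.csv() is given inline text so the example runs without a file on disk. It assumes the ggplot2 package is already installed.

```r
# Use R as a basic calculator and store the results in variables
n_tows <- 3 + 4            # function-free arithmetic saved to a variable
mean_depth <- 25 / 2

# A function call has a name (read.csv) and arguments (text = ...).
# read.csv() normally takes a file path, e.g. read.csv("my_file.csv");
# the `text` argument is used here so the sketch needs no file.
# The values below are hypothetical, not course data.
csv_text <- "station,depth_m,count
A,5,120
A,20,85
B,5,200
B,20,150"
plankton <- read.csv(text = csv_text)

# str() reports the object type, its dimensions, and the type of each column
str(plankton)

# Scatterplot with ggplot2 (install once with install.packages("ggplot2"))
library(ggplot2)
p <- ggplot(plankton, aes(x = depth_m, y = count)) +
  geom_point()
print(p)
```

Swapping the `text` argument for a file path is the only change needed to read a real data file.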

Labs: Bringing data into R, visualization of Plankton data via ggplot2

Lectures: Lists, Matrices, and Data Frames in R, Reading Data and Using Libraries, Extending ggplot2

Code: Lists, Matrices, and Data Frames, Loading Data and Libraries

Files: Sample Data File

**Week 5 & 6. Data Reduction**

Readings: G&W Chapter on Data Transformation, G&W Chapter on Variation, Anderson 2014

Objective(s): Describe the meaning and identify applications of the following summary/descriptive statistics: mean, mode, median, standard deviation; describe the split-apply-combine strategy of data reduction and summarization; use group_by() and summarise() to calculate summary statistics for groupings within a dataset; subset data using filter().
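As a rough illustration of split-apply-combine, the sketch below groups a small data frame, summarises each group, and then subsets rows. The fish lengths and river names are hypothetical and are not taken from the course data files; the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical data: two rivers, three fish lengths (cm) each
fish <- data.frame(
  river  = c("Alpha", "Alpha", "Alpha", "Beta", "Beta", "Beta"),
  length = c(50, 54, 58, 60, 62, 70)
)

# Split by river, apply the summary functions, combine into one row per river
river_summary <- fish %>%
  group_by(river) %>%
  summarise(
    mean_length   = mean(length),
    median_length = median(length),
    sd_length     = sd(length)
  )

# filter() keeps only the rows that satisfy a condition
big_fish <- fish %>% filter(length > 55)
```

Here `river_summary` has one row per river, while `big_fish` keeps individual fish, which is the difference between summarising and subsetting.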

Lab: Descriptive statistics in R, Introduce vectors, dplyr for Microbial data aggregation

Lectures: Intro to statistics – sampling and distributions (slides); Sampling distributions (slides)

Files: Sockeye data file; Human gene lengths data file

Etherpad: https://etherpad.wikimedia.org/p/BNxKbc1zwL

Thursday etherpad: https://etherpad.wikimedia.org/p/YeyDNw0D6i

**Week 7. Tidy Data**

Readings: G&W Chapter on Tidy Data

Objective(s): Understand how to reshape and manipulate data; describe the difference between the two fundamental forms of data, long versus wide; use the tidyr package in R to convert between long and wide data; use unite() and separate() to create tidy data (where each column is a variable); learn how to complete ragged data.
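The reshaping steps named above might look like the following sketch. The measurements and column names are hypothetical. It uses pivot_longer() and pivot_wider(), the current tidyr functions for converting between wide and long; the course materials may use other tidyr functions for the same job. The tidyr package is assumed to be installed.

```r
library(tidyr)

# Hypothetical wide data: one row per animal, one column per sampling day
wide <- data.frame(
  animal = c("a1", "a2"),
  day_1  = c(1.0, 1.2),
  day_7  = c(2.5, 2.9)
)

# Wide -> long: one row per animal-day measurement
long <- pivot_longer(wide, cols = c(day_1, day_7),
                     names_to = "day", values_to = "length_mm")

# separate() splits one column into several: "day_1" -> "unit" and "day_num"
long_sep <- separate(long, day, into = c("unit", "day_num"), sep = "_")

# unite() is the inverse: paste columns back together into one
long_united <- unite(long_sep, "day", unit, day_num, sep = "_")

# Long -> wide again
wide_again <- pivot_wider(long_united, names_from = day, values_from = length_mm)

# complete() restores missing combinations of variables, filling values with NA
ragged <- long[-4, ]                      # drop the a2 / day_7 row
filled <- complete(ragged, animal, day)   # that row returns with length_mm = NA
```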

Lecture: Regular Expressions, Tidy data

Files: Axolotl data, Mammal data

Lab: Tidyr and data reshaping with Axolotl Limb Regeneration data

**Week 8. Functions**

Readings: G&W Functions Chapter

Objective(s): Learn the benefits of reusable code; understand the structure of a function; practice debugging and making functions fail usefully; apply conditional logic to build flexible code; derive principles for writing functions that are easy to understand and apply to multiple data sets.
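A small function can show structure, conditional logic, and useful failure at once. The helper below is hypothetical (`convert_temp` is not part of the course code) and is meant only as a sketch of the ideas, using base R.

```r
# Hypothetical helper: convert temperatures between Celsius and Fahrenheit
convert_temp <- function(temp, to = "F") {
  # Fail usefully: stop early with a clear message on bad input
  if (!is.numeric(temp)) {
    stop("`temp` must be numeric, not ", class(temp)[1])
  }
  # Conditional logic lets one function cover several cases
  if (to == "F") {
    temp * 9 / 5 + 32
  } else if (to == "C") {
    (temp - 32) * 5 / 9
  } else {
    stop("`to` must be \"F\" or \"C\"")
  }
}

convert_temp(c(0, 100))        # Celsius to Fahrenheit
convert_temp(212, to = "C")    # Fahrenheit to Celsius
```

Checking inputs at the top of the function means an error points at the real problem instead of surfacing later as a confusing message from arithmetic on the wrong type.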

Lab: Use functions to read, clean, and visualize environmental data from local MA area oceanographic sensing suites

Lecture: Functions Intro, Modular Functions and Loops

Data: Buoy Data from 44013

Etherpad: https://etherpad.wikimedia.org/p/buoy_function

**Week 9. Data “Mashups”**

Readings: G&W Relational Data Chapter

Objective(s): Know when and where to use different types of joins; understand how to merge survey data with geospatial information to get a geographic understanding of epidemiological patterns.
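The join types differ mainly in which unmatched rows they keep. The sketch below uses two hypothetical tables (survey counts and site coordinates, both invented for illustration) and assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical survey counts and site coordinates
survey <- data.frame(site = c("s1", "s2", "s3"), cases = c(4, 0, 9))
sites  <- data.frame(site = c("s1", "s2", "s4"),
                     lat  = c(42.3, 42.4, 42.6),
                     lon  = c(-71.0, -71.1, -70.9))

left  <- left_join(survey, sites, by = "site")    # all survey rows; s3 gets NA coordinates
inner <- inner_join(survey, sites, by = "site")   # only sites present in both tables
full  <- full_join(survey, sites, by = "site")    # every site from either table

# anti_join() lists survey rows with no match, a quick check before mapping
unmatched <- anti_join(survey, sites, by = "site")
```

Running an anti_join() first is one way to catch sites that would silently drop out of a map after an inner join.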

Lecture: Joining Data, Joins and Maps, Dynamic Maps

Lab: Joins and merging different mammal physiology data sets

Data: Hemlock and Woolly Adelgids, Maps and Joins Data

In Class Code: Maps and Joins

Etherpad: https://etherpad.wikimedia.org/p/join

Note: Installing GDAL on a Mac takes two steps:

1) Install Homebrew from http://brew.sh/ (this is an awesome thing to have anyway)

2) In Terminal, type

brew install gdal

To install on a Windows PC:

1) Install OSGeo4W from https://trac.osgeo.org/osgeo4w/wiki

2) Use it to install GDAL

**Week 10. Accessing Online Data**

Readings: Intro to R for Bioinformatics, Sequence Analysis

Lecture: Using R for Wet and Dry Scientists, Sequence Alignment

Code: Chapter 1 code, Chapter 9 code, Exercise answers

Objective(s): List the major sources of online data for bioinformatics and ecoinformatics; acquire the basic tools for scraping data from the web.

Lab: Querying GenBank, Gene alignment, RCurl and web scraping

**Week 11 & 12. T-Tests and P-Values**

Readings: Cortina and Dunlop 1997, ASA Statement on P-Values

Objective(s): Describe the basics of probability and p-values; compare groups of data using t-tests and their extensions.
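One way to build intuition for p-values is to run a t-test on simulated data where the truth is known. The sketch below uses only base R; the group means, standard deviations, and sample sizes are arbitrary choices for illustration, not values from the course data.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated (not real) measurements for two groups with different true means
control   <- rnorm(30, mean = 10, sd = 2)
treatment <- rnorm(30, mean = 12, sd = 2)

# Two-sample t-test; R's default is the Welch test, which does not
# assume equal variances in the two groups
result <- t.test(treatment, control)
result$p.value

# Simulate the null hypothesis: both groups share the same true mean.
# Across many repeats, p < 0.05 should occur in roughly 5% of tests.
p_null <- replicate(1000, t.test(rnorm(30), rnorm(30))$p.value)
mean(p_null < 0.05)
```

The second half shows what a 0.05 threshold means in practice: a small share of false positives even when there is no true difference.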

Lecture: T-tests, ANOVA

Code: Example T-Test in R

Lab: Implementing statistical tests in R for Microbial Abundance Data, Data Simulation and P-Values, Evaluation of assumptions

Data: Blackbird Testosterone from W&S, data sets for ANOVA

**Week 13 & 14. Linear Regression**

Readings: Handouts on linear regression

Objective(s): Fit a linear regression line through a bivariate scatterplot using lm() in R; describe when to use nonlinear models/curves.
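A minimal sketch of fitting and using a linear model with lm() follows. The data are simulated with a known slope so the estimate can be sanity-checked; the variable names and values are assumptions for illustration, not course data.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated data: the true relationship is y = 2 + 3x plus random noise
dat <- data.frame(x = 1:20)
dat$y <- 2 + 3 * dat$x + rnorm(20, sd = 1)

# Fit the linear regression
fit <- lm(y ~ x, data = dat)
coef(fit)       # estimated intercept and slope
summary(fit)    # standard errors, R-squared, and p-values

# Generate predictions for new x values
new_points <- data.frame(x = c(21, 22))
predictions <- predict(fit, newdata = new_points)

# Bivariate scatterplot with the fitted line drawn through it (base graphics)
plot(y ~ x, data = dat)
abline(fit)
```

Plotting the residuals, for example with plot(fit), is a common next step when checking whether a straight line is adequate or a nonlinear curve is needed.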

Lectures: Linear Models, Multiple Linear Regression

Data: Seals, Nonlinear Regression Data, Naked mole rats, Keeley Fire Data

Lab: Fitting linear models in R, Testing Assumptions, Evaluating model outputs and generating predictions, using lizard evo-devo data
