Intro to Data Science


1) DOA Lizards

Horned lizards have a problem. They get eaten by birds. BUT they are horned for a reason. Those horns might well protect them from being eaten. We have a data set of lizards with their squamosal horn length. Lizards were either sampled live from the wild (Survive = 1), or from the corpses of those killed by birds (Survive = 0). Load up the data (noting that you’ll have to deal with a non-standard na.strings) and, using `ggplot2`, plot a histogram of the horn length of living and dead lizards.
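As a rough sketch of the loading step, the snippet below writes a tiny made-up file and reads it back. The file contents, the column name horn_length, and the "." missing-value marker are all assumptions for illustration; open the real file to see which marker it actually uses and pass that to na.strings instead.

```r
library(ggplot2)

# Hypothetical stand-in for the real data file. The "." missing-value
# marker and the horn_length column name are assumptions.
tmp <- tempfile(fileext = ".csv")
writeLines(c("horn_length,Survive",
             "25.2,1", "26.9,1", ".,1", "24.6,1",
             "21.1,0", "22.4,0", "23.0,0"), tmp)

# na.strings tells read.csv which text means "missing"; without it the
# horn_length column would be read as character rather than numeric.
lizards <- read.csv(tmp, na.strings = ".")

# Drop rows with no horn measurement before plotting
lizards <- lizards[!is.na(lizards$horn_length), ]

# One histogram panel per survival class
horn_hist <- ggplot(lizards, aes(x = horn_length)) +
  geom_histogram(bins = 10) +
  facet_wrap(~Survive, ncol = 1)
print(horn_hist)
```

Stacking the panels with ncol = 1 puts both groups on a shared x-axis, which makes a shift in horn length between living and dead lizards easier to see.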

2) T-Test with Equal Sample Sizes

Let’s try the simplest two-sample unpaired t-test with this data. But to satisfy its requirements, we’ll have to make one change first…

2a) Making a balanced data frame

A basic two-sample unpaired t-test needs sample sizes that are the same for both treatments. Using dplyr, make a data set that has 30 entries for both survival classes. You’ll need to use both group_by and either slice or sample_n depending on your approach.
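One possible approach with group_by and sample_n is sketched below on simulated data. The numbers, the group sizes, and the horn_length column name are made up for illustration and are not the real measurements.

```r
library(dplyr)

set.seed(42)  # make the random draw repeatable

# Simulated stand-in data with deliberately unequal group sizes
lizards <- data.frame(
  horn_length = c(rnorm(150, mean = 24, sd = 2.5),
                  rnorm(40, mean = 22, sd = 2.5)),
  Survive = rep(c(1, 0), times = c(150, 40))
)

# sample_n on a grouped data frame draws n rows from each group
balanced <- lizards %>%
  group_by(Survive) %>%
  sample_n(30) %>%
  ungroup()

# Check that both survival classes now have 30 rows
print(count(balanced, Survive))
```

Swapping sample_n(30) for slice(1:30) would instead take the first 30 rows of each group, which is deterministic but only sensible if row order carries no information.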

2b) Put it to the test

With this new data set, use t.test to run a t-test on the data. Note, the default values for the t.test function do not do a simple unpaired t-test with equal variances. You’ll have to look at the help file to make sure you set the arguments properly. What are the results?
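A minimal sketch of the call, again on simulated data with an assumed horn_length column: the argument that matters is var.equal, which defaults to FALSE (giving Welch's test), while an unpaired comparison is already the default.

```r
set.seed(1)

# Simulated balanced data: 30 lizards per survival class (not real values)
balanced <- data.frame(
  horn_length = c(rnorm(30, mean = 24, sd = 2.5),
                  rnorm(30, mean = 22, sd = 2.5)),
  Survive = rep(c(1, 0), each = 30)
)

# var.equal = TRUE requests the classic pooled-variance two-sample t-test
tt <- t.test(horn_length ~ Survive, data = balanced, var.equal = TRUE)
print(tt)

# With a pooled variance the degrees of freedom are n1 + n2 - 2
print(tt$parameter)
```

Checking that the printed method reads "Two Sample t-test" rather than "Welch Two Sample t-test" is a quick way to confirm the arguments were set as intended.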

2c) Assume nothing

OK, now that you have an answer, let’s make sure it’s the right one. Evaluate the normality of the residuals of this t-test, and make sure the residuals for each group are normally distributed and centered on 0. Do this by visualizing histograms of residuals overall and by treatment. dplyr might help you here.

Is this test OK? Do you believe it? If not, what do you need to do to the data to meet the assumptions of the t-test?
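For a two-sample t-test, the residual of each observation is its value minus its own group mean. The sketch below computes and plots those residuals on simulated data (the values and the horn_length column name are assumptions); it shows the mechanics only and says nothing about what the real data will look like.

```r
library(dplyr)
library(ggplot2)

set.seed(2)

# Simulated balanced data (not the real measurements)
balanced <- data.frame(
  horn_length = c(rnorm(30, mean = 24, sd = 2.5),
                  rnorm(30, mean = 22, sd = 2.5)),
  Survive = rep(c(1, 0), each = 30)
)

# Residual = observation minus the mean of its own group
resid_df <- balanced %>%
  group_by(Survive) %>%
  mutate(resid = horn_length - mean(horn_length)) %>%
  ungroup()

# Histogram of all residuals together
p_all <- ggplot(resid_df, aes(x = resid)) +
  geom_histogram(bins = 15)

# The same histogram split by survival class
p_by_group <- p_all + facet_wrap(~Survive)

print(p_all)
print(p_by_group)
```

Because each group mean is subtracted from its own group, the residuals are centered on 0 by construction; the shape of each histogram (skew, long tails) is what needs judging.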

3) T-Test with Unequal Sample Sizes

OK, we actually have a lot more data for surviving than dead horned lizards. How do the results of your t-test differ if you use all of the data, again assuming that each population has the same variance? Apply any transformation to the data you feel appropriate given your tests of assumptions.
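The call itself does not change when the groups are unbalanced. Here is a sketch on simulated data with made-up group sizes of 150 and 40 and an assumed horn_length column:

```r
set.seed(3)

# Simulated unbalanced data: many more survivors than dead lizards
lizards <- data.frame(
  horn_length = c(rnorm(150, mean = 24, sd = 2.5),
                  rnorm(40, mean = 22, sd = 2.5)),
  Survive = rep(c(1, 0), times = c(150, 40))
)

# Same pooled-variance test as before, now on every row
pooled <- t.test(horn_length ~ Survive, data = lizards, var.equal = TRUE)
print(pooled)

# Degrees of freedom are still n1 + n2 - 2, here 150 + 40 - 2
print(pooled$parameter)
```

If a transformation turns out to be appropriate, it can be applied directly in the formula, for example log(horn_length) ~ Survive; whether one is needed depends on the residual checks.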

4) T-Test with Unequal Sample Sizes and Variances

Now, how do the results differ if you DON’T assume equal variances in addition to using unequal sample sizes? Apply any transformation to the data you feel appropriate given your tests of assumptions.
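Dropping the equal-variance assumption means leaving var.equal at its default of FALSE, which gives Welch's t-test. The sketch below runs both versions side by side on simulated data in which the two groups were deliberately given different spreads; the values and the horn_length column name are assumptions.

```r
set.seed(4)

# Simulated unbalanced data with different spreads in the two groups
lizards <- data.frame(
  horn_length = c(rnorm(150, mean = 24, sd = 2),
                  rnorm(40, mean = 22, sd = 3)),
  Survive = rep(c(1, 0), times = c(150, 40))
)

# Pooled-variance test (assumes equal variances)
pooled <- t.test(horn_length ~ Survive, data = lizards, var.equal = TRUE)

# Welch's test: var.equal = FALSE is the default, so no extra argument
welch <- t.test(horn_length ~ Survive, data = lizards)

# Collect the key numbers from both tests in one table
comparison <- data.frame(
  test = c("pooled", "Welch"),
  t = unname(c(pooled$statistic, welch$statistic)),
  df = unname(c(pooled$parameter, welch$parameter)),
  p_value = c(pooled$p.value, welch$p.value)
)
print(comparison)
```

Welch's test uses an approximate, usually fractional, degrees of freedom that never exceeds n1 + n2 - 2, so the df column is the quickest place to see the two tests diverge.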