Analysing a data set, start to finish

For this assignment we’re going to look at the birthweights of babies born in California from 2000 to 2013. The data include the year, birthweight groups, and the number of babies in each birthweight group. There’s also information on county, ZIP code, latitude/longitude, and so on. These may or may not be useful, but they are interesting grouping factors.

1) Ask a question

With this in hand, what question do you want to ask? It can be about how things change over time, how they change within a group, or how they differ by city (or ZIP code). Write down your question. Keep it simple and direct!

2) What are the steps you will need to undertake to answer the question?

OK, based on that question, what steps will you need to undertake to manipulate the data into a usable form, and then make plots? Remember, no code! Just steps.

3) Data manipulation

Based on your steps above, write the code for the data manipulation and bend the dataset to your will.
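As one illustration of what these steps might look like, here is a minimal Python sketch that uses only the standard library. It assumes the data are a CSV file in long format (one row per year, birthweight group, and location) and uses hypothetical column names `Year`, `Birthweight_group`, and `Count`; rename them to match the real file. It totals the counts for each year and group, then converts those totals into each group's share of that year's births, so years with different numbers of births stay comparable.

```python
import csv
from collections import defaultdict


def totals_by_year_and_group(path, year_col="Year",
                             group_col="Birthweight_group", count_col="Count"):
    """Sum the baby counts for each (year, birthweight group) pair.

    The column names are hypothetical defaults; pass the real ones.
    """
    totals = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Skip rows whose count is blank
            if not row[count_col].strip():
                continue
            key = (int(row[year_col]), row[group_col])
            # float() first so counts stored as "12.0" are also accepted
            totals[key] += int(float(row[count_col]))
    return dict(totals)


def shares_by_year(totals):
    """Convert (year, group) totals into each group's fraction of that year's births."""
    year_totals = defaultdict(int)
    for (year, _group), n in totals.items():
        year_totals[year] += n
    return {
        (year, group): n / year_totals[year]
        for (year, group), n in totals.items()
        if year_totals[year] > 0  # avoid dividing by zero for empty years
    }
```

Working with shares rather than raw counts is a design choice: it answers "what fraction of babies" questions directly, whereas raw totals mix changes in birthweight with changes in the number of births.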

4) Plot and an answer

With this reshaped data, make a plot that answers your question! What does it say? What did you learn from this plot with respect to your question?
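A minimal sketch of one possible plot, assuming the reshaped data are a dictionary mapping (year, birthweight group) to that group's fraction of the year's births. The set of groups to combine (for example, whichever bins you treat as "low birthweight") is up to you; the labels are hypothetical here. The helper that builds the series uses only the standard library, and matplotlib is imported only inside the plotting function.

```python
def share_series(shares, groups):
    """Sum the shares of the chosen birthweight groups for each year.

    `shares` maps (year, group) -> fraction of that year's births;
    `groups` is a set of group labels to combine (hypothetical labels).
    Returns parallel lists of years and combined shares, sorted by year.
    Years with no rows in the chosen groups are left out.
    """
    combined = {}
    for (year, group), frac in shares.items():
        if group in groups:
            combined[year] = combined.get(year, 0.0) + frac
    years = sorted(combined)
    return years, [combined[y] for y in years]


def plot_share_over_time(years, values, path="share_over_time.png"):
    """Draw a line plot of the combined share by year and save it to `path`."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(years, values, marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Share of births")
    ax.set_title("Share of births in selected birthweight groups")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Whether the resulting line rises, falls, or stays flat is then the answer to a question such as "Has the share of babies in the lowest birthweight groups changed since 2000?"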

5) Final project: data and question

Now is the time to start thinking more about your final project. Provide a link to the dataset you are going to use (if it is on your laptop, you can share it with me via Dropbox or another service), along with a brief description of it.

For those who have not yet found the data they want to work with, consider the following links to searchable data archives.

https://www.dataone.org/
http://wonder.cdc.gov/
http://www.data.gov/ecosystems/
http://www.data.gov/ocean/
http://www.data.gov/health/
http://ebird.org/content/ebird/
http://www.healthdata.gov/
https://crcns.org/NWB
https://phpartners.org/health_stats.html

Now, with the data at hand, what questions might you ask of this data set?

6) Extra credit (variable, depending on awesomeness of data viz)

Note that this data set includes latitude and longitude. Can you use them to plot something interesting about the geographic distribution of the data? Note that log(x+1) transformations may be your friend for some things. Have fun with this!
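One way to start, sketched under assumptions: the rows come from `csv.DictReader`, and the column names `Latitude`, `Longitude`, and `Count` are hypothetical (filter the rows to one year or one birthweight group first if your question needs that). The sketch totals the counts at each coordinate pair and applies log(x+1) with `math.log1p`, which keeps zero counts at 0 instead of failing as log(0) would and compresses large differences so a few very large locations do not wash out the color scale.

```python
import math
from collections import defaultdict


def log_counts_by_location(rows, lat_col="Latitude", lon_col="Longitude",
                           count_col="Count"):
    """Total the counts at each (lat, lon) point and apply log(x + 1).

    `rows` is an iterable of dicts (e.g. from csv.DictReader);
    the column names are hypothetical defaults.
    """
    totals = defaultdict(float)
    for row in rows:
        if not (row[lat_col].strip() and row[lon_col].strip()
                and row[count_col].strip()):
            continue  # skip rows missing a location or a count
        key = (float(row[lat_col]), float(row[lon_col]))
        totals[key] += float(row[count_col])
    # log1p(x) computes log(x + 1); zero counts map to 0.0
    return {key: math.log1p(n) for key, n in totals.items()}


def plot_map(log_counts, path="births_map.png"):
    """Scatter longitude (x) against latitude (y), colored by log(count + 1)."""
    import matplotlib.pyplot as plt

    lats = [lat for lat, _ in log_counts]
    lons = [lon for _, lon in log_counts]
    colors = list(log_counts.values())  # same order as the keys above
    fig, ax = plt.subplots(figsize=(5, 6))
    points = ax.scatter(lons, lats, c=colors, s=10, cmap="viridis")
    fig.colorbar(points, ax=ax, label="log(count + 1)")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Plotting longitude on the x-axis and latitude on the y-axis gives a rough outline of the state without needing a mapping library; a dedicated mapping package could be swapped in later if you want real basemaps.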