For this assignment, we’re going to look at the birth weights of babies in California from 2000 to 2013. The data include the year, birth weight groups, and the number of babies in each group. There’s also information on county, zip code, latitude/longitude, etc. These may or may not be useful, but they are interesting grouping factors.
With this in hand, what is a question that you want to ask? This can be about how things change over time, how they change within a group, how things change by city (or zip). Write down a question. Keep it simple and direct!
OK, based on that question, what steps will you need to undertake to manipulate the data into a usable form, and then make plots? Remember, no code! Just steps.
Based on your steps above, code the data manipulation needed to bend the dataset to your will.
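As a rough illustration of what one such manipulation step might look like, here is a minimal sketch that sums counts within each year and computes the share of births falling in one weight group. Python is used purely for illustration; the column names (`year`, `weight_group`, `count`), the group labels, and the numbers are all made-up placeholders, not the real contents of the California file.

```python
from collections import defaultdict

# Toy rows with made-up values. The column names and group labels are
# assumptions, not the real headers of the birth weight dataset.
rows = [
    {"year": 2000, "weight_group": "low", "count": 10},
    {"year": 2000, "weight_group": "normal", "count": 90},
    {"year": 2001, "weight_group": "low", "count": 15},
    {"year": 2001, "weight_group": "normal", "count": 85},
]


def share_by_year(rows, group):
    """Return {year: fraction of that year's births that fall in `group`}."""
    totals = defaultdict(int)    # all births per year
    in_group = defaultdict(int)  # births per year in the chosen group
    for row in rows:
        totals[row["year"]] += row["count"]
        if row["weight_group"] == group:
            in_group[row["year"]] += row["count"]
    return {year: in_group[year] / totals[year] for year in sorted(totals)}


low_share = share_by_year(rows, "low")
print(low_share)
```

The same group-then-summarize pattern applies if you group by county or zip code instead of year.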
With this reshaped data, make a plot that answers your question! What does it say? What did you learn from this plot with respect to your question?
Now is the time to start thinking more about your final project. Provide a link to the dataset you are going to use for your final project (if you have it on your laptop, you can share it with me via Dropbox or another service). Also provide a brief description of the data.
For those who have not yet found the data they want to work with, consider the following links to searchable data archives.
https://www.dataone.org/
http://wonder.cdc.gov/
http://www.data.gov/ecosystems/
http://www.data.gov/ocean/
http://www.data.gov/health/
http://ebird.org/content/ebird/
http://www.healthdata.gov/
https://crcns.org/NWB
https://phpartners.org/health_stats.html
Now, with the data at hand, what questions might you ask of this data set?
Note that there is latitude and longitude information in this data. Can you use that in some way to plot anything interesting about the geographic distribution of the data? Note that log(x+1) transformations may be your friend for some things. Have fun with this!
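To see why log(x+1) can help, here is a small sketch, again in Python purely for illustration. The counts are hypothetical values chosen to span several orders of magnitude, as counts per location often do. Unlike a plain log, log(x+1) is defined when a count is zero, and it compresses the large values so they do not swamp the small ones on a color or size scale.

```python
import math

# Hypothetical counts per location (illustrative values only).
counts = [0, 9, 99, 999]

# math.log1p(x) computes log(x + 1), so a count of zero maps to zero
# instead of raising an error as math.log(0) would.
log_counts = [math.log1p(c) for c in counts]

for raw, transformed in zip(counts, log_counts):
    print(f"{raw:>4} -> {transformed:.3f}")
```

Each tenfold jump in x+1 adds the same amount on the transformed scale, which is what makes very large and very small counts comparable in one plot.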