Posts

Extinct Plants

For this week’s Tidy Tuesday, I’m exploring data from the International Union for Conservation of Nature on extinct plants, including both species that are totally extinct and those that survive only under cultivation. As I explored the data on these 500 species (see my notes), a handful of stories leaped out at me that I thought could best be communicated through simple, bold graphics. The first is how much plant extinction has taken place in Africa, and in Madagascar in particular.

Energy

I created this map of modern renewable (wind and solar) electricity generation in Europe as part of this week’s Tidy Tuesday. The source code is available here.

Systemic Racism in the NYPD

ProPublica recently released a partial database of New York Police Department (NYPD) disciplinary records. An analysis of substantiated complaints of police misconduct reveals clear systemic racism: Black people face a wildly disproportionate share of police misconduct, regardless of the race or gender of the individual officers involved. Following a change to the New York law that kept police officers' disciplinary records secret, and amid an ongoing lawsuit, ProPublica has released a searchable database of complaints to the Civilian Complaint Review Board (CCRB).

Twitter

I’ve been exploring my personal Twitter data using the Twitter API (with the rtweet package) and the tidytext text-mining package. I haven’t come up with any mind-blowing conclusions, but it’s been fun to see who my favorite tweeters are, who their favorite tweeters are, what we tweet about, and how the sentiment of my tweets has changed over time. I did not like it when Bernie Sanders dropped out of the presidential race or when Brian Kemp reopened Georgia’s economy!

Tidy Astronauts

As part of the TidyTuesday project, I created this visualization of who has gone into space, broken down by gender and nationality. This is my first attempt at mapping data geographically! I’m pretty pleased with how it turned out, but I would welcome any feedback. The code is available here.

COVID-19 in DeKalb County, Georgia

So this post may be something of a cautionary tale about getting ahead of yourself when it comes to analyzing data. The DeKalb County Board of Health releases numbers on the spread of COVID-19 in the county, most recently on July 6. Included with these data is a breakdown of the county’s 7,043 cases by ZIP Code. It is immediately apparent that while this disease has affected the entire county, its effects have not been felt evenly.

Running Predictor

I have been recording my runs in Strava for about five years. I wanted to see if I could use these data to make predictions about my racing pace. I downloaded my data from the website, including a spreadsheet collecting all my activity data (I’ve deleted some data from this file for privacy reasons). I spent some time using visualizations and linear models to determine which variables would provide the most predictive power.

COVID-19 in Metro Atlanta

Early in the pandemic, I became frustrated by the lack of high-quality visualizations of local COVID-19 data, particularly for Metro Atlanta, where I live, so I set out to create my own. Since then, the situation has improved greatly, but these plots still provide some details and comparisons that I have not seen elsewhere. The first item I set out to create was an R Markdown document focused on Metro Atlanta that visualizes the distribution of cases and deaths as they change over time.