So I’ve been learning about neural networks and how to use TensorFlow. I wanted to get in on the trend of using recurrent neural networks to generate text (inspired by Jacqueline Nolis' banned license plate generator), so I trained one to create text in the style of James Joyce’s Ulysses. Here’s a sample, which I think is a pretty good approximation of the original: Stephen, he had murdered to.
As part of my ongoing effort to learn Python for data analysis, I created this chart summarizing the relative performance of Atlanta’s major league sports teams since the Hawks moved to Atlanta in 1968. Atlanta has not consistently had an NHL team during this time period, and Atlanta’s MLS team is a much more recent addition. In order to facilitate comparisons across sports, ties have been disregarded. The data is presented here as a 10-year rolling average in order to smooth out spikes from one season to the next.
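To make the smoothing concrete, here is a minimal sketch of a winning percentage that ignores ties, followed by a rolling average. It is not the code behind the chart: the function names and the sample records are hypothetical, and it assumes a trailing window and an average of per-season percentages, neither of which the post specifies. A window of 2 stands in for the 10-year window to keep the example short.

```python
def win_pct(wins, losses):
    """Winning percentage with ties disregarded (assumes at least one decided game)."""
    return wins / (wins + losses)


def rolling_mean(values, window=10):
    """Trailing rolling mean; seasons without a full window behind them yield None."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical (wins, losses, ties) records, not real team data.
seasons = [(8, 4, 2), (6, 6, 0), (3, 9, 0), (9, 3, 0)]
pcts = [win_pct(w, l) for w, l, _ties in seasons]
smoothed = rolling_mean(pcts, window=2)
```

Dropping ties from the denominator is what lets sports with and without ties share one scale.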
I’ve started learning Python, so I decided to apply some of my newly developing skills to this Tidy Tuesday from a few weeks ago. The data come from the Ask a Manager Survey, which includes earnings information from more than 24,000 self-selecting survey respondents. The respondents are non-random and skew heavily toward white women in professional jobs in the United States. While exploring the data, I found, unsurprisingly, that formal education and years of experience in a field seem to have a profound effect on compensation.
This week’s Tidy Tuesday deals with commercial fishing on the Great Lakes. While exploring the data, I was struck by a rapid increase followed by a rapid decline in commercial fish hauls. I was further struck by how much of this rise and fall occurred entirely due to one species (alewife) in one lake (Michigan). It turns out that alewife are an invasive species that were first found in Lake Michigan in 1949.
This week’s Tidy Tuesday deals with Mario Kart 64 world records. In my exploration of the data, I found that newly discovered shortcuts can lead to massive improvements in world record times. While the records without shortcuts tend to improve very gradually, records with shortcuts can show large, sudden improvements. Here’s a plot showing the biggest jumps: My source code and data exploration are available on GitHub.
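One way to find the biggest jumps is to take the difference between each record and the one it replaced, then sort by that difference. The sketch below is an illustration only, not the analysis from the post: the function name, the tuple layout, and the sample times are all hypothetical, and it assumes the records for a single track and category are already in date order.

```python
def biggest_jumps(records, top=3):
    """Return the largest improvements between consecutive records.

    `records` is a date-ordered list of (date, time_in_seconds) pairs for one
    track and category. Each result is (seconds_saved, date_of_new_record).
    """
    jumps = []
    for (_prev_date, prev_time), (date, time) in zip(records, records[1:]):
        jumps.append((prev_time - time, date))
    return sorted(jumps, reverse=True)[:top]


# Hypothetical record history, not real Mario Kart 64 data.
records = [
    ("1997-02-01", 100.0),
    ("1997-06-01", 99.5),
    ("1998-01-01", 80.0),   # a newly found shortcut would look like this
    ("1999-01-01", 79.75),
]
top_jumps = biggest_jumps(records, top=2)
```

In this made-up history, the 1998 record stands out because it saves far more time than the small improvements on either side of it.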
It’s important to add alt text to images in order to make them accessible to users of screen readers. I compose this blog using the excellent blogdown package, which enables me to easily include code-generated plots. Yesterday, I decided to finally figure out how to add alt text to these plots. I’m sharing what I learned in order to help others in the R community make their visualizations more accessible and as a reminder to myself.
This week’s Tidy Tuesday includes data on broadband usage in the United States. I started out with some exploratory analysis of this data set. Using linear regression, I modeled the relationship of broadband usage in a county to broadband availability (per the FCC), the poverty rate, median household income, the percentage of Black residents, and the county’s rural or urban character. Taken one at a time, each of these predictors had a statistically significant association with broadband usage, but in a multiple regression model that included all of them, only broadband availability and median household income explained a significant amount of variance:
This week’s Tidy Tuesday uses data from Water Point Data Exchange, an organization which gathers water point data from various sources with the goal of improving water access for millions of people. This week I decided to take a different approach than usual and not use ggplot2. Instead, I used Leaflet to create an interactive map of water sources in Madagascar with information about each source visible as a popup.
For this Tidy Tuesday, I decided to try my hand at interactive visualization. This week’s data comes from the Urban Institute and includes all sorts of interesting and important demographic data on wealth and income distribution. For the purposes of this visualization, I focused purely on income distribution.
Inspired by Cameron Blevins' visualization (and using his data), I created this Tidy Tuesday entry, my first animated plot. The code is available in my GitHub repository.