This week’s Tidy Tuesday includes data on broadband usage in the United States. I started out with some exploratory analysis of this data set. Using linear regression, I modeled the relationship of broadband usage in a county to broadband availability (per the FCC), the poverty rate, median household income, the percentage of Black residents, and the county’s rural or urban character.

Taken individually, each of these predictors was significantly associated with broadband usage. In a multivariate model, however, only broadband availability and median household income explained a significant share of the variance:

##
## Call:
## lm(formula = `BROADBAND USAGE` ~ `BROADBAND AVAILABILITY PER FCC` +
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.46386 -0.07715 -0.00304  0.07454  0.73250
##
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      -3.538e-01  9.753e-03  -36.27   <2e-16 ***
## `BROADBAND AVAILABILITY PER FCC`  3.467e-01  9.739e-03   35.60   <2e-16 ***
## Median_Household_Income_2019      6.631e-06  1.607e-07   41.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1216 on 3100 degrees of freedom
##   (40 observations deleted due to missingness)
## Multiple R-squared:  0.5916, Adjusted R-squared:  0.5913
## F-statistic:  2245 on 2 and 3100 DF,  p-value: < 2.2e-16
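
For readers who want to try the same kind of comparison, here is a minimal sketch of fitting a full model and a reduced model and testing whether the dropped predictor matters. It runs on synthetic data. The data frame `counties`, its column names, and every simulated value are assumptions made up for illustration, not the Tidy Tuesday data or the code behind the output above.

```r
# Synthetic stand-in data. All names and values here are illustrative
# assumptions, not the real county-level broadband data.
set.seed(42)
n <- 500
counties <- data.frame(
  availability  = runif(n, 0, 1),                      # share of county with access
  median_income = rnorm(n, mean = 55000, sd = 12000),  # dollars
  poverty_rate  = runif(n, 5, 30)                      # percent
)

# Simulated outcome: depends on availability and income only, plus noise
counties$usage <- 0.3 * counties$availability +
  6e-06 * counties$median_income +
  rnorm(n, sd = 0.1)

# Full model with all candidate predictors
fit_full <- lm(usage ~ availability + median_income + poverty_rate,
               data = counties)

# Reduced model keeping only availability and income
fit_reduced <- lm(usage ~ availability + median_income, data = counties)

# Coefficient table, R-squared, and F-statistic for the reduced model
print(summary(fit_reduced))

# F-test: does adding poverty_rate explain significantly more variance?
print(anova(fit_reduced, fit_full))
```

The `anova()` call compares the two nested models directly, which is one common way to check whether a predictor still contributes once the others are in the model.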


To visualize this finding, I constructed two choropleths, one showing broadband usage by county and the other showing median household income. The similarities between the maps illustrate the correlation between a county’s median household income and its broadband usage.
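
One possible way to draw a county-level choropleth like these is sketched below with `ggplot2` and the county outlines from the `maps` package. The fill values are random placeholders and the object names (`county_shapes`, `county_keys`, `usage_map`) are hypothetical. The code in the linked repository may take a different approach.

```r
library(ggplot2)  # map_data() also requires the 'maps' package to be installed

# County outlines for the contiguous United States, one row per polygon vertex
county_shapes <- map_data("county")

# Placeholder values, one per county. In a real analysis these would be
# joined from the broadband data by state and county name.
set.seed(1)
county_keys <- unique(county_shapes[, c("region", "subregion")])
county_keys$usage <- runif(nrow(county_keys))

# Attach the values to every vertex, then restore the vertex order,
# because merge() does not preserve the original row order
plot_data <- merge(county_shapes, county_keys, by = c("region", "subregion"))
plot_data <- plot_data[order(plot_data$group, plot_data$order), ]

usage_map <- ggplot(plot_data,
                    aes(long, lat, group = group, fill = usage)) +
  geom_polygon(colour = NA) +
  coord_quickmap() +
  scale_fill_viridis_c(labels = scales::percent) +
  labs(fill = "Broadband usage") +
  theme_void()

# print(usage_map)  # uncomment to render the map
```

Building the median income map would follow the same pattern with a different fill column and legend title, so the two maps can be compared side by side.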

Source code available on GitHub.