class: center, middle, inverse, title-slide

# Group 8 Presentation
### Alex Vand, Clarissa Boyajian, Scout Leonard
### 2021/10/27

---

# Our Question

**How do population demographic factors impact lead exposure in Philadelphia?**

- Developed from an interest in an environmental justice (EJ) question
- DataONE hosts environmental health data on this exact question, from a study published in 2021

<img src="images/philly_map.png" width="864" />

---

# Data Management Plan

- **Data managers:** Alex Vand, Clarissa Boyajian, and Scout Leonard
- **Repository owner:** Scout Leonard (Clarissa and Alex are collaborators)
- This project uses two datasets: one with **lead risk factors for Philadelphia communities** and one with **human demographic data**
- Estimated effort: 6 hours for the 2 datasets
- Code to retrieve, clean, and combine the data is in our repo
- **Preservation plan:** We plan to keep the data through the end of MEDS!
- **Legal constraints:** None

<img src="images/Octocat.png" width="384" />

---

# Obtaining and Merging Data

- Data retrieval
  - Public health/lead data: `metajam` package
    - Uses the DataONE API to download the data
    - Creates a `metajam` log, which we list in our `.gitignore`
  - Census/demographic data: `censusapi` package
    - Requires an API key, which the Census Bureau uses to control server traffic
    - `censusapi` would not let us pull data for a single county, so we downloaded all data for state 42 (Pennsylvania) and then filtered
- Data combination
  - Left-joined the datasets on census tract
  - The census tract format in the lead dataset required cleaning first

<img src="images/gitignore_image.png" width="924" />

---

# Analysis and Results

<img src="images/ebll_vs_income_plot.png" width="2361" />

---

# Analysis and Results

- Winsorized median income at the 95th percentile to limit the influence of outliers

```r
library(dplyr)

# 95th percentile of tract median income, ignoring missing values
median_income_95 <- quantile(
  lead_census_joined$acs_median_income_2019,
  probs = 0.95, na.rm = TRUE)

# Winsorize the upper tail: cap incomes at or above the 95th percentile
lead_census_joined <- lead_census_joined %>%
  mutate(income_winsor = case_when(
    acs_median_income_2019 >= median_income_95 ~ median_income_95,
    acs_median_income_2019 < median_income_95 ~ acs_median_income_2019))

# OLS: children with elevated blood lead levels (EBLL) vs. winsorized income
model <- lm(number_of_children_with_ebll_2015 ~ income_winsor,
            data = lead_census_joined)

model %>% summary()
```

---

# Analysis and Results

- OLS results: median income explains about 5% of the variation in EBLL (elevated blood lead levels)

<img src="images/ols_results_image.png" width="659" />

---

# Future Analysis

- Compare **multiple socioeconomic factors** to individual lead risk factors or to the resulting health impacts
- Visualize these variables spatially
- The real world is more complicated!

<img src="images/variables_image.png" width="795" />

---

# Data Preservation: KNB and GitHub

- GitHub
  - README with our Data Management Plan
  - Code to retrieve the original datasets
  - Code for combining the datasets
  - Analyses
- KNB
  - Our final combined dataset in .csv format
  - Metadata for our combined dataset
  - Our contact info and ORCID iDs

<img src="images/knb.png" width="3824" />
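---

# Backup: Cleaning Tract IDs and Joining

A minimal sketch of the tract cleaning and left join from the data-merging step. It assumes, hypothetically, that the lead dataset stores tracts as full 11-digit numeric GEOIDs and that the statewide census pull returns character `state`, `county`, and `tract` columns. The toy rows and the `census_tract` and `geoid` column names are illustrative, not our actual data. Base R `merge(..., all.x = TRUE)` stands in for a dplyr left join.

```r
# Hypothetical toy inputs; values and column names are illustrative only
lead <- data.frame(
  census_tract = c(42101000100, 42101000200, 42101000300),  # numeric tract IDs
  number_of_children_with_ebll_2015 = c(3, 0, 5)
)
census <- data.frame(
  state  = c("42", "42", "42", "42"),
  county = c("101", "101", "101", "003"),
  tract  = c("000100", "000200", "000400", "000100"),
  acs_median_income_2019 = c(41000, 52000, 38000, 60000),
  stringsAsFactors = FALSE
)

# Statewide pull, then keep Philadelphia County only (county FIPS 101)
census_philly <- census[census$county == "101", ]

# 11-digit GEOID = 2-digit state + 3-digit county + 6-digit tract
census_philly$geoid <- paste0(census_philly$state,
                              census_philly$county,
                              census_philly$tract)

# Clean lead tract IDs: print without scientific notation, zero-pad to 11 chars
lead$geoid <- sprintf("%011.0f", lead$census_tract)

# Left join: keep every lead tract, attach census income where GEOIDs match
lead_census_joined <- merge(
  lead,
  census_philly[, c("geoid", "acs_median_income_2019")],
  by = "geoid", all.x = TRUE
)
```

Building a character GEOID on both sides avoids mismatches from numeric formatting (e.g. `4.21e+10`). Lead tracts with no census match keep `NA` income rather than being dropped.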