AE03-02 Exploring data on plstic waste

Published

June 29, 2022

This exercise is adopted from Data Science in a Box.

Learning goals

  • Visualizing numerical and categorical data and interpreting visualizations
  • Recreating visualizations
  • Getting more practice using with R, RStudio, Git, and GitHub

Setup

library(tidyverse)

Data

The data set for this assignment can be found as a csv file in the data folder of your repository. You can read it in using the following.

plastic_waste <- read_csv("data/plastic-waste.csv")

The variable descriptions are as follows:

  • code: 3 Letter country code
  • entity: Country name
  • continent: Continent name
  • year: Year
  • gdp_per_cap: GDP per capita constant 2011 international $, rate
  • plastic_waste_per_cap: Amount of plastic waste per capita in kg/day
  • mismanaged_plastic_waste_per_cap: Amount of mismanaged plastic waste per capita in kg/day
  • mismanaged_plastic_waste: Tonnes of mismanaged plastic waste
  • coastal_pop: Number of individuals living on/near coast
  • total_pop: Total population according to Gapminder

Exercises 1.

Glimpse at the data

plastic_waste %>% glimpse()
Rows: 240
Columns: 10
$ code                             <chr> "AFG", "ALB", "DZA", "ASM", "AND", "A…
$ entity                           <chr> "Afghanistan", "Albania", "Algeria", …
$ continent                        <chr> "Asia", "Europe", "Africa", "Oceania"…
$ year                             <dbl> 2010, 2010, 2010, 2010, 2010, 2010, 2…
$ gdp_per_cap                      <dbl> 1614.255, 9927.182, 12870.603, NA, NA…
$ plastic_waste_per_cap            <dbl> NA, 0.069, 0.144, NA, NA, 0.062, 0.25…
$ mismanaged_plastic_waste_per_cap <dbl> NA, 0.032, 0.086, NA, NA, 0.045, 0.01…
$ mismanaged_plastic_waste         <dbl> NA, 29705, 520555, NA, NA, 62528, 52,…
$ coastal_pop                      <dbl> NA, 2530533, 16556580, NA, NA, 379004…
$ total_pop                        <dbl> 31411743, 3204284, 35468208, 68420, 8…
plastic_waste %>% head()
# A tibble: 6 × 10
  code  entity     continent  year gdp_per_cap plastic_waste_p… mismanaged_plas…
  <chr> <chr>      <chr>     <dbl>       <dbl>            <dbl>            <dbl>
1 AFG   Afghanist… Asia       2010       1614.           NA               NA    
2 ALB   Albania    Europe     2010       9927.            0.069            0.032
3 DZA   Algeria    Africa     2010      12871.            0.144            0.086
4 ASM   American … Oceania    2010         NA            NA               NA    
5 AND   Andorra    Europe     2010         NA            NA               NA    
6 AGO   Angola     Africa     2010       5898.            0.062            0.045
# … with 3 more variables: mismanaged_plastic_waste <dbl>, coastal_pop <dbl>,
#   total_pop <dbl>

Exercises 2.

Compute summary statistics using skimr::skim() . Load the package first or even install it if needed.

library(skimr)
plastic_waste %>% skim()
Data summary
Name Piped data
Number of rows 240
Number of columns 10
_______________________
Column type frequency:
character 3
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
code 0 1 3 3 0 233 0
entity 0 1 4 32 0 233 0
continent 0 1 4 13 0 6 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 2010.00 0.00 2010.00 2010.00 2010.00 2010.00 2010.0 ▁▁▇▁▁
gdp_per_cap 45 0.81 17682.60 19908.70 660.21 3816.04 10436.37 23107.79 125140.8 ▇▂▁▁▁
plastic_waste_per_cap 51 0.79 0.20 0.28 0.01 0.10 0.14 0.25 3.6 ▇▁▁▁▁
mismanaged_plastic_waste_per_cap 51 0.79 0.05 0.05 0.00 0.01 0.03 0.07 0.3 ▇▂▁▁▁
mismanaged_plastic_waste 51 0.79 171569.84 726640.23 1.00 2245.00 16259.00 80759.00 8819717.0 ▇▁▁▁▁
coastal_pop 51 0.79 10944246.95 30930553.85 596.00 257904.00 1986723.00 7573074.00 262892387.0 ▇▁▁▁▁
total_pop 10 0.96 30936344.00 124266269.69 50.00 532412.00 5413332.50 19703072.25 1341335152.0 ▇▁▁▁▁

Summary statistics by continent.

plastic_waste %>% 
  group_by(continent) %>% 
  skim()
Data summary
Name Piped data
Number of rows 240
Number of columns 10
_______________________
Column type frequency:
character 2
numeric 7
________________________
Group variables continent

Variable type: character

skim_variable continent n_missing complete_rate min max empty n_unique whitespace
code Africa 0 1 3 3 0 58 0
code Asia 0 1 3 3 0 53 0
code Europe 0 1 3 3 0 53 0
code North America 0 1 3 3 0 38 0
code Oceania 0 1 3 3 0 24 0
code South America 0 1 3 3 0 14 0
entity Africa 0 1 4 28 0 58 0
entity Asia 0 1 4 20 0 53 0
entity Europe 0 1 5 22 0 53 0
entity North America 0 1 4 32 0 38 0
entity Oceania 0 1 4 24 0 24 0
entity South America 0 1 4 16 0 14 0

Variable type: numeric

skim_variable continent n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year Africa 0 1.00 2010.00 0.00 2010.00 2010.00 2010.00 2010.00 2.010000e+03 ▁▁▇▁▁
year Asia 0 1.00 2010.00 0.00 2010.00 2010.00 2010.00 2010.00 2.010000e+03 ▁▁▇▁▁
year Europe 0 1.00 2010.00 0.00 2010.00 2010.00 2010.00 2010.00 2.010000e+03 ▁▁▇▁▁
year North America 0 1.00 2010.00 0.00 2010.00 2010.00 2010.00 2010.00 2.010000e+03 ▁▁▇▁▁
year Oceania 0 1.00 2010.00 0.00 2010.00 2010.00 2010.00 2010.00 2.010000e+03 ▁▁▇▁▁
year South America 0 1.00 2010.00 0.00 2010.00 2010.00 2010.00 2010.00 2.010000e+03 ▁▁▇▁▁
gdp_per_cap Africa 7 0.88 5442.00 7093.30 660.21 1406.43 2365.68 6170.47 3.372337e+04 ▇▂▁▁▁
gdp_per_cap Asia 5 0.91 23227.78 27752.65 1614.26 4407.30 10973.97 31249.36 1.251408e+05 ▇▂▁▁▁
gdp_per_cap Europe 7 0.87 29171.33 17957.04 3910.86 16564.70 26198.77 39630.60 9.174329e+04 ▇▇▃▁▁
gdp_per_cap North America 14 0.63 18133.37 14693.41 1502.03 7966.22 12561.91 23364.65 5.639501e+04 ▇▃▂▁▁
gdp_per_cap Oceania 10 0.58 9233.16 12110.15 1731.54 3037.73 4231.67 7162.18 4.146404e+04 ▇▁▁▁▁
gdp_per_cap South America 2 0.86 12440.43 4971.02 5407.47 8836.64 12556.19 16679.33 1.944205e+04 ▇▇▁▇▇
plastic_waste_per_cap Africa 18 0.69 0.10 0.07 0.01 0.04 0.07 0.14 3.600000e-01 ▇▅▁▁▁
plastic_waste_per_cap Asia 12 0.77 0.17 0.13 0.01 0.08 0.14 0.21 6.900000e-01 ▇▅▂▁▁
plastic_waste_per_cap Europe 17 0.68 0.20 0.11 0.04 0.12 0.20 0.25 4.800000e-01 ▇▇▇▁▂
plastic_waste_per_cap North America 0 1.00 0.34 0.56 0.03 0.15 0.25 0.26 3.600000e+00 ▇▁▁▁▁
plastic_waste_per_cap Oceania 2 0.92 0.18 0.07 0.10 0.10 0.14 0.25 3.300000e-01 ▇▁▁▃▁
plastic_waste_per_cap South America 2 0.86 0.21 0.13 0.12 0.14 0.16 0.25 5.900000e-01 ▇▃▁▁▁
mismanaged_plastic_waste_per_cap Africa 18 0.69 0.06 0.04 0.00 0.04 0.05 0.09 1.700000e-01 ▇▇▆▂▁
mismanaged_plastic_waste_per_cap Asia 12 0.77 0.05 0.06 0.00 0.01 0.03 0.07 3.000000e-01 ▇▃▁▁▁
mismanaged_plastic_waste_per_cap Europe 17 0.68 0.02 0.02 0.00 0.00 0.01 0.02 6.000000e-02 ▇▂▁▁▁
mismanaged_plastic_waste_per_cap North America 0 1.00 0.03 0.04 0.00 0.01 0.02 0.05 1.900000e-01 ▇▃▁▁▁
mismanaged_plastic_waste_per_cap Oceania 2 0.92 0.08 0.06 0.00 0.01 0.09 0.10 2.400000e-01 ▆▇▁▂▁
mismanaged_plastic_waste_per_cap South America 2 0.86 0.04 0.06 0.00 0.02 0.03 0.04 2.200000e-01 ▇▁▁▁▁
mismanaged_plastic_waste Africa 18 0.69 123364.73 227741.77 96.00 15996.25 34961.50 69788.75 9.670120e+05 ▇▁▁▁▁
mismanaged_plastic_waste Asia 12 0.77 574429.02 1482265.25 1.00 5599.00 33747.00 480493.00 8.819717e+06 ▇▁▁▁▁
mismanaged_plastic_waste Europe 17 0.68 31355.72 82280.10 62.00 2502.50 10997.00 25006.75 4.859370e+05 ▇▁▁▁▁
mismanaged_plastic_waste North America 0 1.00 35790.18 59840.20 11.00 294.00 2865.50 73833.75 2.754240e+05 ▇▂▁▁▁
mismanaged_plastic_waste Oceania 2 0.92 10569.95 21106.05 4.00 426.25 2165.50 8620.50 8.983500e+04 ▇▁▁▁▁
mismanaged_plastic_waste South America 2 0.86 101595.67 133049.24 5.00 17157.00 67087.50 121481.50 4.714040e+05 ▇▃▁▁▁
coastal_pop Africa 18 0.69 4817735.85 6335427.38 6839.00 858697.75 1994069.00 6782124.50 2.747711e+07 ▇▂▁▁▁
coastal_pop Asia 12 0.77 30428085.05 58277023.87 596.00 1124249.00 6048920.00 22890252.00 2.628924e+08 ▇▁▁▁▁
coastal_pop Europe 17 0.68 6892015.69 10594720.27 33483.00 550160.00 2729103.50 8590222.00 4.325889e+07 ▇▂▁▁▁
coastal_pop North America 0 1.00 5498696.42 18491616.02 5173.00 66873.75 251974.50 3305490.75 1.129250e+08 ▇▁▁▁▁
coastal_pop Oceania 2 0.92 1231487.00 3701294.94 1379.00 21562.00 130575.50 267439.50 1.723595e+07 ▇▁▁▁▁
coastal_pop South America 2 0.86 12003834.17 20681096.17 2602.00 485492.00 6010799.00 14347930.25 7.469677e+07 ▇▂▁▁▁
total_pop Africa 0 1.00 17624731.03 26749851.97 4118.00 1798031.75 9373146.50 19703072.25 1.584232e+08 ▇▁▁▁▁
total_pop Asia 2 0.96 83878532.55 250240369.56 596.00 3659834.50 14138255.00 48073298.00 1.341335e+09 ▇▁▁▁▁
total_pop Europe 1 0.98 16241523.63 27387302.63 29244.00 1857545.00 5413332.50 10873886.00 1.429582e+08 ▇▁▁▁▁
total_pop North America 4 0.89 15933532.15 55717986.49 5934.00 72995.25 374345.50 6091785.50 3.103839e+08 ▇▁▁▁▁
total_pop Oceania 2 0.92 1657649.68 4893477.78 50.00 20334.00 101802.00 265790.50 2.226838e+07 ▇▁▁▁▁
total_pop South America 1 0.93 30196303.54 51854582.96 231151.00 3368786.00 14464739.00 29076512.00 1.949465e+08 ▇▂▁▁▁

Exercises 3.

Let’s start by taking a look at the distribution of plastic waste per capita in 2010.

plastic_waste %>% 
  ggplot() +
  aes(x = plastic_waste_per_cap) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 51 rows containing non-finite values (stat_bin).

One country stands out as an unusual observation at the top of the distribution.

One way of identifying this country is to filter the data for countries where plastic waste per capita is greater than 3.5 kg/person.

plastic_waste %>%
  filter(plastic_waste_per_cap > 3.5)
# A tibble: 1 × 10
  code  entity     continent  year gdp_per_cap plastic_waste_p… mismanaged_plas…
  <chr> <chr>      <chr>     <dbl>       <dbl>            <dbl>            <dbl>
1 TTO   Trinidad … North Am…  2010      31261.              3.6             0.19
# … with 3 more variables: mismanaged_plastic_waste <dbl>, coastal_pop <dbl>,
#   total_pop <dbl>

Did you expect this result? You might consider doing some research on Trinidad and Tobago to see why plastic waste per capita is so high there, or whether this is a data error.

Another way of visualizing numerical data is using density plots.

Built a density plot below.

plastic_waste %>% 
  ggplot() +
  aes(x = plastic_waste_per_cap) +
  geom_density()
Warning: Removed 51 rows containing non-finite values (stat_density).

And compare distributions across continents by colouring density curves by continent.

plastic_waste %>% 
  ggplot() +
  aes(x = plastic_waste_per_cap, color = continent) +
  geom_density()
Warning: Removed 51 rows containing non-finite values (stat_density).

Transform the variable of plastic waste per capita to observe more informative density plot.

plastic_waste %>% 
  mutate(plastic_waste_per_1000_pple = 
           plastic_waste_per_cap * 1000) %>% 
  ggplot() +
  aes(x = plastic_waste_per_1000_pple, 
      color = continent) +
  geom_density() + 
  scale_x_log10() + 
  labs(x = "Amount of plastic waste per 1000 people, kg / day")
Warning: Removed 51 rows containing non-finite values (stat_density).

Exercises 4.

Built a box plot of plastic sate by continent

plastic_waste %>% 
  mutate(plastic_waste_per_100_pple = 
           plastic_waste_per_cap * 100 * 365 / 1000) %>% 
  ggplot() +
  aes(x = plastic_waste_per_100_pple,
      y = continent, 
      color = continent) +
  geom_boxplot() + 
  scale_x_log10() + 
  labs(x = "Amount of plastic waste per 100 people, MT / year", 
       y = NULL)
Warning: Removed 51 rows containing non-finite values (stat_boxplot).

Exercises 5.

Build a scatter plot of plastic waste per capital and GDP per capital.

plastic_waste %>% 
  mutate(plastic_waste_per_1000_pple = 
           plastic_waste_per_cap * 100 * 365 / 1000) %>% 
  ggplot() +
  aes(x = plastic_waste_per_1000_pple,
      y = gdp_per_cap, 
      color = continent,
      size = total_pop) +
  geom_point() + 
  scale_x_log10() + 
  scale_y_log10() + 
  labs(x = "Amount of plastic waste per 100 people, MT / year")
Warning: Removed 91 rows containing missing values (geom_point).