There are two main options of data download in COMTRADE (CT):
Regular - manually insert a query in the browser and get a .csv file response
Bulk - manually insert short query on a web page, or use the browser command line (API) to pass arguments to the server. In return, consumer downloads a zip archive with data that covers trade of at least one country in one time period in a specified coding system. Bulk download has an option to download data of all reporters for one period/coding system in file.
Regular downloads are available at https://comtrade.un.org/data/. This is a query based approach to load CT data selecting a specific range of parameters (API calls are covered for bulk downloads). The downloaded CSV file can then be inserted into the R code.
On https://comtrade.un.org/data/, submit a query and download the following data:
Before starting practicing in R, make sure that you have setup a proper working directly. Best option would be to create an R project see and use the project folder as your working directory. To check what is your working directory use getwd().
After downloading the CSV file, it’s time to load the data in R, using the tools available in the base package. It’s good to always use the stringsAsFactors = FALSE argument to avoid having too many factors variables, which may complicate your work.
file <- "01-data-import/ct_regular_extraction.csv"
ctData <- read.csv(file, stringsAsFactors = FALSE)
names(ctData) # See the names of all variables
## [1] "Classification" "Year"
## [3] "Period" "Period.Desc."
## [5] "Aggregate.Level" "Is.Leaf.Code"
## [7] "Trade.Flow.Code" "Trade.Flow"
## [9] "Reporter.Code" "Reporter"
## [11] "Reporter.ISO" "Partner.Code"
## [13] "Partner" "Partner.ISO"
## [15] "X2nd.Partner.Code" "X2nd.Partner"
## [17] "X2nd.Partner.ISO" "Customs.Proc..Code"
## [19] "Customs" "Mode.of.Transport.Code"
## [21] "Mode.of.Transport" "Commodity.Code"
## [23] "Commodity" "Qty.Unit.Code"
## [25] "Qty.Unit" "Qty"
## [27] "Alt.Qty.Unit.Code" "Alt.Qty.Unit"
## [29] "Alt.Qty" "Netweight..kg."
## [31] "Gross.weight..kg." "Trade.Value..US.."
## [33] "CIF.Trade.Value..US.." "FOB.Trade.Value..US.."
## [35] "Flag"
head(ctData) # See the first 5 rows of the data frame
tail(ctData) # See the last 4 rows of the data frame
str(ctData) # See the structure of the data object
class(ctData) # See the class of the data object
Attention: when assigning the .csv file to file specify the path leading to the .csv file from the working directory. To find out what the working directory is run getwd(). In the example above the .csv file is situated in the 01-data-directory, which itself is in the working directory.
There are more efficient packages than the base package to load data in R. We can, for example, do the same using the more user friendly tools from the tidyverse package:
library(tidyverse) # Setups
library(readr) # Setups
ctData <- read_csv(file)
Using read_csv() returns a tbl_df(), or tibble, object that allows you to explore all variables of the data frame without printing it all in the console. There is no need to use head(), tail(), str().
ctData
## # A tibble: 24 x 35
## Classification Year Period `Period Desc.` `Aggregate Level`
## <chr> <int> <int> <int> <int>
## 1 H4 2014 2014 2014 2
## 2 H4 2014 2014 2014 2
## 3 H4 2014 2014 2014 2
## 4 H4 2014 2014 2014 2
## 5 H4 2015 2015 2015 2
## 6 H4 2015 2015 2015 2
## 7 H4 2015 2015 2015 2
## 8 H4 2015 2015 2015 2
## 9 H4 2014 2014 2014 4
## 10 H4 2014 2014 2014 4
## # ... with 14 more rows, and 30 more variables: `Is Leaf Code` <int>,
## # `Trade Flow Code` <int>, `Trade Flow` <chr>, `Reporter Code` <int>,
## # Reporter <chr>, `Reporter ISO` <chr>, `Partner Code` <int>,
## # Partner <chr>, `Partner ISO` <chr>, `2nd Partner Code` <chr>, `2nd
## # Partner` <chr>, `2nd Partner ISO` <chr>, `Customs Proc. Code` <chr>,
## # Customs <chr>, `Mode of Transport Code` <chr>, `Mode of
## # Transport` <chr>, `Commodity Code` <chr>, Commodity <chr>, `Qty Unit
## # Code` <int>, `Qty Unit` <chr>, Qty <chr>, `Alt Qty Unit Code` <chr>,
## # `Alt Qty Unit` <int>, `Alt Qty` <chr>, `Netweight (kg)` <int>, `Gross
## # weight (kg)` <chr>, `Trade Value (US$)` <dbl>, `CIF Trade Value
## # (US$)` <chr>, `FOB Trade Value (US$)` <chr>, Flag <int>
Bulk downloads are available using the web page https://comtrade.un.org/data/bulk.
There are two options how to access data here:
Web based query in the browser https://comtrade.un.org/data/bulk as we have just seen.
API call based approach to load CT data. In this case the user constructs a query such as “http://comtrade.un.org/api/get/bulk/C/A/1994/251/HS” following the guidelines outlined on https://comtrade.un.org/data/doc/api/bulk/. To request a bulk data file we need to use the specific format explained on COMTRADE, as follows: http://comtrade.un.org/api/get/bulk/{type}/{freq}/{ps}/{r}/{px}?{token=}, with type (trade type), freq (frequency), ps (period), r (reporter), px (classification), and optionally token (see authentication).
Using the web query approach let’s download data for France’s annual trade of commodities in 1994.
The zip archive returned by COMTRADE is named “type-C_r-251_ps-1994_freq-A_px-HS_pub-20020901_fmt-csv_ex-20150310.zip”, and contains the file “type-C_r-251_ps-1994_freq-A_px-HS_pub-20020901_fmt-csv_ex-20150310.csv” with data. Attention! Data files can be VERY HEAVY (up to 9-10 GB), which may lead to difficulties while loading them.
The same file can be obtained through the API call “http://comtrade.un.org/api/get/bulk/C/A/1994/251/H”.
Manually unzipping the bulk COMTRADE data file into the same folder and reading it is done in the same way as before.
file <- "01-data-import/type-C_r-251_ps-1994_freq-A_px-HS_pub-20020901_fmt-csv_ex-20150310.csv"
ctFranceData <- read_csv(file, guess_max = 1000000) # guess_max = 1000000 is used to avoid useless warnings.
## Parsed with column specification:
## cols(
## .default = col_integer(),
## Classification = col_character(),
## `Trade Flow` = col_character(),
## Reporter = col_character(),
## `Reporter ISO` = col_character(),
## Partner = col_character(),
## `Partner ISO` = col_character(),
## `Commodity Code` = col_character(),
## Commodity = col_character(),
## `Qty Unit` = col_character(),
## Qty = col_double(),
## `Netweight (kg)` = col_double(),
## `Trade Value (US$)` = col_double()
## )
## See spec(...) for full column specifications.
ctFranceData
## # A tibble: 465,137 x 22
## Classification Year Period `Period Desc.` `Aggregate Level`
## <chr> <int> <int> <int> <int>
## 1 H0 1994 1994 1994 2
## 2 H0 1994 1994 1994 2
## 3 H0 1994 1994 1994 2
## 4 H0 1994 1994 1994 2
## 5 H0 1994 1994 1994 2
## 6 H0 1994 1994 1994 2
## 7 H0 1994 1994 1994 2
## 8 H0 1994 1994 1994 2
## 9 H0 1994 1994 1994 2
## 10 H0 1994 1994 1994 2
## # ... with 465,127 more rows, and 17 more variables: `Is Leaf Code` <int>,
## # `Trade Flow Code` <int>, `Trade Flow` <chr>, `Reporter Code` <int>,
## # Reporter <chr>, `Reporter ISO` <chr>, `Partner Code` <int>,
## # Partner <chr>, `Partner ISO` <chr>, `Commodity Code` <chr>,
## # Commodity <chr>, `Qty Unit Code` <int>, `Qty Unit` <chr>, Qty <dbl>,
## # `Netweight (kg)` <dbl>, `Trade Value (US$)` <dbl>, Flag <int>
To speed up the reading process, in the package tradeAnalysis there is a function readCTCSV() which reads the bulk downloaded COMTRADE data file and converts all columns into a readable format, as well as removing all irrelevant columns and formats.
To install the tradeAnalysis package:
install.packages("devtools") # Run if needed
devtools::install_github("EBukin/tradeAnalysis-pack")
library(tradeAnalysis)
And to read the file using the new readCTCSV() function:
ctData_2 <- readCTCSV(file)
Using tradeAnalysis the file is slightly lighter and loads faster. This is especially important when loaded files are 4 Gb or more in volume.
ctData_2
## # A tibble: 465,137 x 12
## Classification Year Period Trade.Flow.Code Reporter.Code Partner.Code
## <chr> <int> <chr> <int> <int> <int>
## 1 H0 1994 1994 1 251 0
## 2 H0 1994 1994 2 251 0
## 3 H0 1994 1994 1 251 8
## 4 H0 1994 1994 1 251 12
## 5 H0 1994 1994 2 251 12
## 6 H0 1994 1994 1 251 20
## 7 H0 1994 1994 2 251 20
## 8 H0 1994 1994 1 251 32
## 9 H0 1994 1994 2 251 32
## 10 H0 1994 1994 1 251 36
## # ... with 465,127 more rows, and 6 more variables: Commodity.Code <chr>,
## # Qty.Unit.Code <int>, Qty <dbl>, Netweight..kg. <dbl>,
## # Trade.Value..US.. <dbl>, Flag <int>
To load bulk CT data files into R directly from the .zip archive we need to use the tradeAnalysis package. In the tradeAnalysis library we use the function readCTZIP(). To learn more about the syntax of the readCTZIP()function run ?readCTZIP, to see the source code run readCTZIP, or select readCTZIP with the mouse and press F2.
ctData_3 <-
readCTZIP(folder = "01-data-import/",
file = "type-C_r-804_ps-2014_freq-A_px-HS_pub-20150317_fmt-csv_ex-20150609.zip")
ctData_3
## # A tibble: 193,015 x 12
## Classification Year Period Trade.Flow.Code Reporter.Code Partner.Code
## <chr> <int> <chr> <int> <int> <int>
## 1 H4 2014 2014 1 804 0
## 2 H4 2014 2014 2 804 0
## 3 H4 2014 2014 2 804 31
## 4 H4 2014 2014 1 804 40
## 5 H4 2014 2014 2 804 51
## 6 H4 2014 2014 1 804 84
## 7 H4 2014 2014 1 804 90
## 8 H4 2014 2014 2 804 112
## 9 H4 2014 2014 1 804 124
## 10 H4 2014 2014 2 804 124
## # ... with 193,005 more rows, and 6 more variables: Commodity.Code <chr>,
## # Qty.Unit.Code <int>, Qty <dbl>, Netweight..kg. <dbl>,
## # Trade.Value..US.. <dbl>, Flag <int>
Attention: if the .zip file we want to open is placed in the home directory, specify folder = "." in the function.
In R it is possible to see all the data available in the COMTRADE bulk download data files. This is achieved by using the COMTRADE API interface (https://comtrade.un.org/data/doc/api/bulk/#DataRequests).
Firstly, we run the command “http://comtrade.un.org/api//refs/da/bulk?parameters” in the browser address bar. After a certain loading time due to the large volume of the data, we will be returned the JSON object with thousands of rows of text in the browser. This Object can be parsed in R into a data frame showing all data available in COMTRADE and all query commands needed to download each data file.
To load this object into R we can do the following, using the package jsonlite:
install.packages("jsonlite")
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
availableCtData_1 <- jsonlite::fromJSON("http://comtrade.un.org/api/refs/da/bulk?parameters") %>% tbl_df()
availableCtData_1
## # A tibble: 51,288 x 10
## name
## * <chr>
## 1 type-C_r-ALL_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170825.zip
## 2 type-C_r-ALL_ps-201705_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170825.zip
## 3 type-C_r-ALL_ps-201704_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170825.zip
## 4 type-C_r-124_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170824.zip
## 5 type-C_r-152_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170824.zip
## 6 type-C_r-203_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170824.zip
## 7 type-C_r-208_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170824.zip
## 8 type-C_r-233_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170824.zip
## 9 type-C_r-31_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170824.zip
## 10 type-C_r-32_ps-201706_freq-M_px-HS_pub-20170818_fmt-csv_ex-20170824.zip
## # ... with 51,278 more rows, and 9 more variables: r <chr>, px <chr>,
## # ps <chr>, type <chr>, freq <chr>, publicationDate <chr>,
## # extractDate <chr>, filesize <int>, downloadUri <chr>
This table illustrates the reporters (‘r’ variable), classification (‘px’), period (‘ps’) of each data file, and what data files exist.
glimpse(availableCtData_1)
## Observations: 51,288
## Variables: 10
## $ name <chr> "type-C_r-ALL_ps-201706_freq-M_px-HS_pub-20170...
## $ r <chr> "ALL", "ALL", "ALL", "124", "152", "203", "208...
## $ px <chr> "HS", "HS", "HS", "HS", "HS", "HS", "HS", "HS"...
## $ ps <chr> "201706", "201705", "201704", "201706", "20170...
## $ type <chr> "COMMODITIES", "COMMODITIES", "COMMODITIES", "...
## $ freq <chr> "MONTHLY", "MONTHLY", "MONTHLY", "MONTHLY", "M...
## $ publicationDate <chr> "2017-08-18T00:00:00", "2017-08-18T00:00:00", ...
## $ extractDate <chr> "2017-08-25T00:00:00", "2017-08-25T00:00:00", ...
## $ filesize <int> 45392464, 58406789, 109458248, 2837034, 113915...
## $ downloadUri <chr> "/api/get/bulk/C/M/201706/ALL/HS", "/api/get/b...
The variable downloadUri provides a COMTRADE query (such as “/api/get/bulk/C/A/2016/428/H0”) that can be combined into one command with http://comtrade.un.org/.
The result would look something like “http://comtrade.un.org/api/get/bulk/C/A/2016/428/H0” and will return a zipped data file if run in the browser address bar.
The same list of available bulk data in COMTRADE can be accessed using the getCTParameters() function of the tradeAnalysis package, making some of the variables look more user friendly.
library(tradeAnalysis)
availableCtData_2 <- getCTParameters()
glimpse(availableCtData_2)
## Observations: 51,288
## Variables: 12
## $ name <chr> "type-C_r-ALL_ps-201706_freq-M_px-HS_pub-20170...
## $ r <chr> "ALL", "ALL", "ALL", "124", "152", "203", "208...
## $ px <chr> "HS", "HS", "HS", "HS", "HS", "HS", "HS", "HS"...
## $ ps <chr> "201706", "201705", "201704", "201706", "20170...
## $ type <chr> "COMMODITIES", "COMMODITIES", "COMMODITIES", "...
## $ freq <chr> "MONTHLY", "MONTHLY", "MONTHLY", "MONTHLY", "M...
## $ publicationDate <date> 2017-08-18, 2017-08-18, 2017-08-18, 2017-08-1...
## $ extractDate <date> 2017-08-25, 2017-08-25, 2017-08-25, 2017-08-2...
## $ filesize <int> 45392464, 58406789, 109458248, 2837034, 113915...
## $ downloadUri <chr> "/api/get/bulk/C/M/201706/ALL/HS", "/api/get/b...
## $ year <chr> "2017", "2017", "2017", "2017", "2017", "2017"...
## $ month <chr> "06", "05", "04", "06", "06", "06", "06", "06"...
After checking for data availability we may want to download some of this data.
To do the download, after loading the table of available data with tradeAnalysis::getCTParameters(), we need to select the row containing the name of the country that we want to download and the destination folder where we want to save it.
See the following example.
We want to download data about all annual trade of commodities by Ukraine (reporter code 804), in 2014, in the HS ‘As reported’ classification. To see an extensive list of reporters and their codes, run tradeAnalysis::getCTReporters()
library(tidyverse)
Let’s first create oneLine, containing all the specifications for our query (the usage of the ‘filter()’ function will be discussed with more detail in the following lesson):
oneLine <-
filter(availableCtData_2,
r == 804,
type == "COMMODITIES",
freq == "ANNUAL",
year == 2014,
px == "HS")
To load the data file:
downloadCTZIP(df = oneLine, toFolder = "01-data-import/", token = NA)
Explore more about the token parameter with ?downloadCTZIP.
token is an important parameter, necessary to access COMTRADE data from outside FAO headquarters. To access the data from the outside, it is first necessary to register our FAO e-mail address in the COMTRADE system and then save the token provided on the computer. Inserting the token in the function downloadCTZIP() will ensure CT data can be downloaded from outside headquarters.
To generate an authentication token from the registered IP range (i.e. from within headquarters), we first need to go to https://comtrade.un.org/data/auth/login?ReturnUrl=%2Fdata%2Fbulk and insert our email into the Sign in using: IP Authentication: field. Pressing Authenticate my IP will ensure we are registered in the system. Subsequently we need to go to https://comtrade.un.org/api/swagger/ui/index#/ and select /getSubUserToken in the Auth section. After inserting our email into the corresponding field we press “try it out”. In “Response body” the token will appear as a long string of symbols and characters. This token is to be copied to a secure place and used to access data as follows:
downloadCTZIP(df = oneLine, toFolder = "01-data-import/", token = "token_content_as_it_is_in_comtrade")
Copyright © 2017 Eduard Bukin. All rights reserved.