SIPRI Study: About the Project

This project was built using data from the most recent update (March 11, 2019) of the heavy arms trade registries of the Stockholm International Peace Research Institute (SIPRI). The full database comprises more than 26,000 records of weapons transfers from the 1940s to 2018. According to SIPRI’s “Sources and methods” page, the institute draws only on public disclosures of weapons transfers. Its sources include, among many others:

  * Newspapers and periodicals
  * Monographs and annual reference materials
  * Press releases
  * The UN Register of Conventional Arms
  * Defense budget documents

The section “High Profits, High Costs” uses SIPRI’s records of the top 100 arms-producing and military services companies (ranked by annual sales), covering 2002–2017. For reasons that are not clearly explained, this dataset contains no information on Chinese companies. The data are available for download as an Excel spreadsheet. Note: the axes of the visualizations in this section were mistakenly labeled “estimated profits”; they should read “estimated sales”.

The online interface for requesting trade registries can only export data as a Rich Text Format (.rtf) file, which conventional spreadsheet software cannot easily open. Fortunately, a script on GitHub shows how to pull data from the SIPRI database into a .csv file. The script had to be run in a Linux terminal. On a computer that does not already run Linux, this can be done through an emulator or a virtual machine. For this project, the script was run on a virtual machine built with VirtualBox.

However, pulling the data this way lost certain fields that were present in the original 1,000-page .rtf file. The most important loss was the “comments” field, which sometimes gave context on a transaction. For example, it might note whether the transaction was part of an aid program or whether the weapon being transferred was second-hand. In the .csv file, some of this information appears in a column titled “term”, but the value of this field is “N/A” for 80% of the entries. Some of the data in the remaining 20% is briefly discussed in the section “High Profits, High Costs”.
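The 80% figure can be checked directly against the exported .csv. The following is a minimal sketch. `na_share` is a hypothetical helper and is not part of the GitHub script. The sketch assumes the export has a header row containing a column named `term`. Treating empty cells as missing alongside the literal “N/A” is also an assumption about how blanks are encoded.

```python
import csv
import io


def na_share(csv_text: str, column: str = "term") -> float:
    """Return the fraction of rows whose `column` value is 'N/A' or empty.

    Hypothetical helper: assumes a header row, and that missing values
    appear either as the literal string 'N/A' or as an empty cell.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    total = 0
    missing = 0
    for row in reader:
        total += 1
        value = (row[column] or "").strip()
        if value in ("", "N/A"):
            missing += 1
    # A header with no data rows yields 0.0 instead of dividing by zero.
    return missing / total if total else 0.0


if __name__ == "__main__":
    # Made-up rows for illustration: 4 of the 5 "term" values are N/A.
    sample = "supplier,term\nA,N/A\nB,N/A\nC,aid\nD,N/A\nE,N/A\n"
    print(na_share(sample))  # 0.8
```

To run it on the real export, pass the file’s text instead of the inline sample.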

Data cleaning and modifications that should be noted:
  * Some of the fields in the .csv use acronyms that are not explicitly defined on SIPRI’s “Sources and methods” page. This applies to the field “Weapons_Category”, for which the following spell-outs were made:
    * AC -> aircraft
    * MI -> missiles
    * AV -> armored vehicles
    * GR -> sensors / radar
    * AR -> artillery
    * SH -> ships
    * EN -> engines
    * AD -> air defense systems
    * OT -> other
    * NW -> naval weapons
  * Some of the weapons had the same or similar names. These were identified using OpenRefine’s clustering tools and disambiguated through further online research.
  * A (thankfully) small number of weapon names included diacritics, which had to be edited manually in OpenRefine. For example, the Swedish Västergötland-class submarines had been rendered as “V�Ã�¤stergotland”.
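The acronym expansion above amounts to a lookup table. Here is a minimal Python sketch. It assumes the .csv has a header row with a `Weapons_Category` column holding the two-letter codes. `expand_categories` is a hypothetical helper. Unrecognized codes are left unchanged rather than guessed, so they remain visible for manual review.

```python
import csv
import io
from typing import Dict, List

# Spell-outs for the Weapons_Category codes, as listed above.
WEAPONS_CATEGORY_NAMES = {
    "AC": "aircraft",
    "MI": "missiles",
    "AV": "armored vehicles",
    "GR": "sensors / radar",
    "AR": "artillery",
    "SH": "ships",
    "EN": "engines",
    "AD": "air defense systems",
    "OT": "other",
    "NW": "naval weapons",
}


def expand_categories(csv_text: str) -> List[Dict[str, str]]:
    """Return the rows of `csv_text` with Weapons_Category codes spelled out.

    Hypothetical helper: codes are matched case-insensitively; codes not in
    the table keep their original value.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["Weapons_Category"].strip().upper()
        row["Weapons_Category"] = WEAPONS_CATEGORY_NAMES.get(code, row["Weapons_Category"])
        rows.append(row)
    return rows
```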
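OpenRefine’s default clustering method is key collision with a “fingerprint” keying function. Names that reduce to the same normalized key are grouped as candidates. The sketch below approximates that idea in Python. It is not OpenRefine’s own code, and the helper names are hypothetical. It only flags candidate groups, because deciding whether two similar names really refer to the same weapon still takes research.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Iterable, List


def fingerprint(name: str) -> str:
    """Approximate OpenRefine's fingerprint key for a weapon name."""
    # Trim surrounding whitespace and lowercase.
    text = name.strip().lower()
    # Fold accented Latin letters to ASCII (e.g. "ä" -> "a").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop punctuation, then sort and de-duplicate whitespace-separated tokens.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def candidate_clusters(names: Iterable[str]) -> List[List[str]]:
    """Group names whose fingerprints collide; singletons are dropped."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Type 209", "type 209", "209 Type ", "Type 212"]
    print(candidate_clusters(names))
    # [['209 Type ', 'Type 209', 'type 209']]
```

Sorting the tokens makes word order irrelevant. Removing punctuation, however, turns “F-16A” into the single token “f16a”, so “F-16A” and “F 16A” would not collide. OpenRefine’s other keying functions, such as n-gram fingerprints, catch some of those cases.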
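Garbled names like the Västergötland example above commonly arise when UTF-8 bytes are decoded as a single-byte encoding such as Windows-1252, sometimes more than once. When no bytes have been lost, re-encoding and decoding can often reverse the damage. Once a character has been replaced by “�” (U+FFFD), the original bytes are gone, and manual editing, as was done here, is the only option. The following is a minimal sketch. It assumes Windows-1252 was the wrong decoding, and `repair_mojibake` is a hypothetical helper, not an OpenRefine feature. It can also misfire on text that legitimately contains character pairs that happen to form valid UTF-8.

```python
def repair_mojibake(text: str, max_rounds: int = 3) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252.

    Each round reverses one layer of mis-decoding. The loop stops when the
    text no longer round-trips (e.g. it contains U+FFFD, which has no
    Windows-1252 byte) or when a round changes nothing.
    """
    for _ in range(max_rounds):
        try:
            fixed = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break
        if fixed == text:
            break
        text = fixed
    return text


if __name__ == "__main__":
    # Simulate a correctly spelled name being mis-decoded twice.
    original = "Västergötland"
    garbled = original.encode("utf-8").decode("cp1252")
    garbled = garbled.encode("utf-8").decode("cp1252")
    print(garbled)                   # VÃƒÂ¤stergÃƒÂ¶tland
    print(repair_mojibake(garbled))  # Västergötland
    # Characters already replaced by U+FFFD cannot be recovered:
    # this prints the input unchanged.
    print(repair_mojibake("V\ufffdÃ\ufffd¤stergotland"))
```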

Regarding the Tableau visualizations:
  * Graphs make heavy use of the TIV (trend-indicator value), a measure developed by SIPRI to estimate the gain in military capacity from each transfer, as opposed to its monetary worth or unit count. SIPRI documents its methods of calculation on its website.

  * Tableau maps are built on current geopolitical boundaries. As mentioned earlier, the dataset contains records from as early as the 1940s. As a result, records for states that no longer exist, such as the Soviet Union, Yugoslavia, Czechoslovakia, and North Vietnam, do not appear on the maps.

  * In the original .csv file pulled by the script, some records had “date of purchase” values enclosed in parentheses to indicate that they were only estimates. The parentheses were lost while the data was processed in OpenRefine and Tableau. However, it is likely safe to assume that such estimates make up only a small share of the values in this field. The estimates were noticed only when some of the spreadsheet data was cross-referenced with the .rtf file, and by then substantial work had already gone into the Tableau visualizations. As a result, the visualizations in this project do not distinguish between estimated and definite dates.
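One way to keep estimates distinguishable is to record the parentheses as a separate flag before the data reaches OpenRefine or Tableau. A true/false column is not affected by later steps that strip punctuation. The following is a minimal sketch. It assumes the field holds a four-digit year, written as, for example, “(1995)” when estimated. `parse_order_year` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Either "(1995)" (an estimate) or "1995" (a definite value), with optional
# surrounding whitespace. Assumes four-digit years.
_YEAR_PATTERN = re.compile(r"^\s*(?:\((\d{4})\)|(\d{4}))\s*$")


def parse_order_year(raw: str) -> Tuple[Optional[int], bool]:
    """Return (year, is_estimate); year is None if the value is unparseable."""
    match = _YEAR_PATTERN.match(raw)
    if match is None:
        return None, False
    estimated, definite = match.groups()
    if estimated is not None:
        return int(estimated), True
    return int(definite), False


if __name__ == "__main__":
    for value in ["(1995)", "2003", " (1987) ", "n/a"]:
        print(repr(value), parse_order_year(value))
```

Unbalanced values such as “(1995” are treated as unparseable rather than silently accepted, so they surface for manual checking against the .rtf file.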
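Records for the former states mentioned above drop off Tableau’s maps without any warning, so it can help to count how many are affected before building a map. This sketch rests on stated assumptions. `split_mappable` is hypothetical. The country column name is passed in as a parameter, because the export’s exact header is not given here. The set of historical states contains only the examples named above, so it is illustrative rather than complete.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative only: the states named above. A complete list for data
# spanning the 1940s to 2018 would be longer.
HISTORICAL_STATES = {"Soviet Union", "Yugoslavia", "Czechoslovakia", "North Vietnam"}

Row = Dict[str, str]


def split_mappable(rows: Iterable[Row], country_field: str) -> Tuple[List[Row], List[Row]]:
    """Split rows into those a current-boundaries map can place and those it cannot."""
    mappable: List[Row] = []
    unmappable: List[Row] = []
    for row in rows:
        if row[country_field].strip() in HISTORICAL_STATES:
            unmappable.append(row)
        else:
            mappable.append(row)
    return mappable, unmappable
```

Running it once on the supplier column and once on the recipient column shows how much of the data each map leaves out.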

According to SIPRI’s terms and conditions, users of its data may not reproduce more than 10% of its published datasets. For this reason, the “View and download data” option has been disabled in Tableau.