Country Concordances

From Pardee Wiki
Revision as of 00:30, 28 September 2025 by Norah.Shamin (talk | contribs)
Jump to navigation Jump to search

Summary

Country concordance is an important aspect of a data technician's job to maintain the International Futures database. Country concordance refers to the differences between the IFs Country list and another organization's country list and merging them to ensure the IFs country list in IFs system. For example, in some organization "Türkiye" is the name displayed, but for IFs we use the name "Turkey". Therefore IFs data technicians need to change the name to "Turkey" for the system to process it.

GitHub Repositories

There are two repositories essential to this process; a mini app that can handle concordance for files less than 15mb and a centralized GitHub repository for all country concordance tables.

The app allows users to upload datasets from other organizations and utilizes fuzzy matching against standardized International Futures territory names. This is to make it easier for data technicians to ensure all country data points are being pulled correctly. You can clone the repository but a link to the application can be found here.

The centralized GitHub is a growing project thank contains excel sheets for mapping tables between an organization's country list and International Futures country names.

The format for these excels are formatted as:

IFs Country ex: ITU Country
Afghanistan Afghanistan
Albania Albania
and so on...

Using pandas and Python users can use the following snippet of code to merge the two before inputting into a db file:

country = pd.read_excel('ITU Country.xlsx') #get this excel from the repo listed above

dt = country.merge(df2, on='entityName', how='left') #entityName is what the list is called in ITU

dt = dt.drop(columns='entityName') #after merging you must drop this column before importing

dt = dt.rename(columns={'IFs Country':'Country'}) #this must be changed before importing as the column name in IFs is Country

Proxies

Proxies are used when a certain country does not have data points and estimates are needed. Proxies should be used on a case by case basis and for certain series. Proxies essentially use a similar country to the country being approximated (in terms of population, GDP, etc.) and calculates the scale of what the approximated countries value should be.

An example from IHME's series is Kosovo and Albania. Albania is similar and geographically close to Kosovo making it an ideal for a proxy. For a series that deals with death such as SocietalViolenceDeathsTotal use the following equation:

  • Kosovo’s number of deaths = Albania’s number of deaths * (Kosovo’s Population / Albania’s Population)

Disaggregation

A lot of organizations have values for dissolved states such as USSR, Yugoslavia, Czechoslovakia, and more. This then leads to gaps in values for the newly formed such as Serbia, Slovakia, Czechia, etc. Therefore, disaggregation is a great tool to fill in these gaps. Below is the disaggregation steps for common groups:

  1. Czechoslovakia = Slovakia and Czech Republic
  2. Yugoslav SFR = Slovenia, Croatia, Bosnia and Herzegovina, North Macedonia, Serbia, and Montenegro
  3. Serbia and Montenegro = Serbia and Montenegro
  4. Sudan (former) = Sudan and South Sudan
  5. USSR = Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, and Uzbekistan

For disaggregation you are adding up the values for the year after the state disbanded, using the data when the state existed, have the values for the new states after the entity disbanded, and then extrapolate (find the percentage of each state in the data from the OG state and then multiply it to get the right amount) data for the new states in the previous years.

  1. USSR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 15 states for 1992; extrapolate data for 15 states 1961-1991
  2. Czechoslovakia add up the values for for 1993 (year it disbanded); have totals for 1961-1992; have the values for 2 states for 1993; extrapolate data for 2 states 1961-1992
  3. Sudan (former) add up the values for 2012 (year it disbanded); have totals for 1961-2011; have the values for 2 states for 2012; extrapolate data for 2 states 1961-2011
  4. Yugoslav SFR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 6 states for 1992; extrapolate for 6 states 1961-1991
  5. Serbia and Montenegro add up the values for 2006 (year it disbanded); have totals for 1992-2005; have the values for 2 states for 2006; extrapolate data for 2 states 1992-2005

An example of this disaggregation can be found in FAOSTAT Land Use.