Country Concordances: Difference between revisions

From Pardee Wiki
Jump to navigation Jump to search
No edit summary
 
(17 intermediate revisions by the same user not shown)
Line 1: Line 1:
=== Summary ===
== Summary ==
[https://github.com/n1shamin/Country-Concordance Country concordance] is an important aspect of a data technician's job to maintain the International Futures database. Country concordance refers to the differences between the IFs Country list and another organization's country list and merging them to ensure the IFs country list in IFs system. For example, in some organization "Türkiye" is the name displayed, but for IFs we use the name "Turkey". Therefore IFs data technicians need to change the name to "Turkey" for the system to process it. '''Below lists the organizations IFs draw their data from and the common changes needed as well as a link to a GitHub repository that contains the corresponding country list.''' 
[https://github.com/n1shamin/Country-Concordance Country concordance] is an important aspect of a data technician's job to maintain the International Futures database. Country concordance refers to the differences between the IFs Country list and another organization's country list and merging them to ensure the IFs country list in IFs system. For example, in some organization "Türkiye" is the name displayed, but for IFs we use the name "Turkey". Therefore IFs data technicians need to change the name to "Turkey" for the system to process it. To do this there are multiple steps that can be taken and they are shown below: 


==== Proxies ====
== GitHub Repositories ==
Proxies are used when a certain country does not have data points and estimates are needed. Proxies should be used on a case by case basis and for certain series. Proxies essentially use a similar country to the country being approximated (in terms of population, GDP, etc.) and calculates the scale of what the approximated countries value should be.
There are two repositories essential to this process; [https://github.com/quciet/datagator_flask_mongo a mini app that can handle concordance for files less than 15mb] and [https://github.com/n1shamin/Country-Concordance a centralized GitHub repository for all country concordance tables].  
 
An example from IHME's series is Kosovo and Albania. Albania is similar and geographically close to Kosovo making it an ideal for a proxy. For a series that deals with death such as SocietalViolenceDeathsTotal use the following equation: 
 
* Kosovo’s number of deaths = Albania’s number of deaths * (Kosovo’s Population / Albania’s Population)


==== Disaggregation ====
The app allows users to upload datasets from other organizations and utilizes fuzzy matching against standardized International Futures territory names. This is to make it easier for data technicians to ensure all country data points are being pulled correctly. You can clone the repository but a link to the application can be found [https://datagator-lite.onrender.com/ here].  
A lot of organizations have values for dissolved states such as USSR, Yugoslavia, Czechoslovakia, and more. This then leads to gaps in values for the newly formed such as Serbia, Slovakia, Czechia, etc. Therefore, disaggregation is a great tool to fill in these gaps. Below is the disaggregation steps for common groups:
 
# Czechoslovakia = Slovakia and Czech Republic
# Yugoslav SFR = Slovenia, Croatia, Bosnia and Herzegovina, North Macedonia, Serbia, and Montenegro
# Serbia and Montenegro = Serbia and Montenegro
# Sudan (former) = Sudan and South Sudan
# USSR = Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, and Uzbekistan
 
For disaggregation you are adding up the values for the year after the state disbanded, using the data when the state existed, have the values for the new states after the entity disbanded, and then extrapolate (find the percentage of each state in the data from the OG state and then multiply it to get the right amount) data for the new states in the previous years.
 
# USSR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 15 states for 1992; extrapolate data for 15 states 1961-1991
# Czechoslovakia add up the values for for 1993 (year it disbanded); have totals for 1961-1992; have the values for 2 states for 1993; extrapolate data for 2 states 1961-1992
# Sudan (former) add up the values for 2012 (year it disbanded); have totals for 1961-2011; have the values for 2 states for 2012; extrapolate data for 2 states 1961-2011
# Yugoslav SFR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 6 states for 1992; extrapolate for 6 states 1961-1991
# Serbia and Montenegro add up the values for 2006 (year it disbanded); have totals for 1992-2005; have the values for 2 states for 2006; extrapolate data for 2 states 1992-2005
An example of this disaggregation can be found in [[FAOSTAT Land Use]].


== Organizations ==
The centralized GitHub is a growing project thank contains excel sheets for mapping tables between an organization's country list and International Futures country names.


=== BGR ===
The format for these excels are formatted as:  
=== CIA ===
=== [https://github.com/n1shamin/Country-Concordance/blob/main/FAO%20Country.xlsx Food Agricultural Organization (FAO)] ===
{| class="wikitable"
{| class="wikitable"
|+
|+
!IFs Countries
!IFs Country
!FAO Countries
!ex: ITU Country
|-
|-
|Bolivia
|Afghanistan
|Bolivia (Plurinational State of)
|Afghanistan
|-
|China
|China, mainland
|-
|-
|Congo, Dem. Republic of the
|Albania
|Democratic Republic of the Congo
|Albania
|-
|-
|Cote D'Ivoire
| colspan="2" |and so on...
|Côte d'Ivoire
|-
|Czech Republic
|Czechia
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Hong Kong
|China, Hong Kong SAR
|-
|Iran
|Iran (Islamic Republic of)
|-
|Korea, Dem. People's Republic
|Democratic People's Republic of Korea
|-
|Korea, Republic of
|Republic of Korea
|-
|Kosovo
|
|-
|Lao People's Dem. Republic
|Lao People's Democratic Republic
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|Micronesia (Federated States of)
|-
|Moldova, Republic of
|Republic of Moldova
|-
|Netherlands
|Netherlands (Kingdom of the)
|-
|Sahrawi Arab Democratic Republic
|Western Sahara
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|South Sudan
|-
|Taiwan
|China, Taiwan Province of
|-
|Tanzania
|United Republic of Tanzania
|-
|Turkey
|Türkiye
|-
|United Kingdom
|United Kingdom of Great Britain and Northern Ireland
|-
|Venezuela, Bolivarian Republic
|Venezuela (Bolivarian Republic of)
|}
|}
FAO does not have values for Kosovo.


=== [https://github.com/n1shamin/Country-Concordance/blob/main/GCP%20Country.xlsx Global Carbon Project GCP] ===
=== Process to Create Country Concordance ===
{| class="wikitable"
Use the app to import your dataset. Once you choose a file, and click upload and select the worksheet if needed: 
!IFs Countries
[[File:Step 1- Upload Excel-CSV File - sheet.png|none|thumb|600x600px]]The page will go to Data Preview. Select the row or column that contains the country names. For this example it is the column "country_name"[[File:Step 2- Data Preview.png|none|thumb|600x600px]]Select Fuzzy Match and at least 80%. You can go higher but high percentages mean that names can be missed if they fall short. [[File:Step 3- Name-matching.png|none|thumb|600x600px]]Once you click match names you will be led to this page: [[File:Review Name Matches (Page 1 of 1).png|none|thumb|600x600px]]In the bottom of the page there are two important options; "Save Mapping" and "Submit & Download File.
!GCP Countries
|-
|Cabo Verde
|Cape Verde
|-
|Congo, Dem. Republic of the
|Democratic Republic of the Congo
|-
|Cote D'Ivoire
|Côte d'Ivoire
|-
|Czech Republic
|Czechia
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Korea, Dem. People's Republic
|North Korea
|-
|Korea, Republic of
|South Korea
|-
|Lao People's Dem. Republic
|Laos
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|Micronesia (Federated States of)
|-
|Moldova, Republic of
|Moldova
|-
|Palestine
|State of Palestine
|-
|Puerto Rico
|
|-
|Russian Federation
|Russia
|-
|Sahrawi Arab Democratic Republic
|
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|Sudan
|-
|Syrian Arab Republic
|Syria
|-
|Turkey
|Türkiye
|-
|United States of America
|USA
|-
|Venezuela, Bolivarian Republic
|Venezuela
|}
Puerto Rico and Sahrawi Arab Democratic Republic are not present in dataset.  


=== [https://github.com/n1shamin/Country-Concordance/blob/main/GWNO%20Country.xlsx GNWO Codes]: Used in [[Historical GDP/GDPPC/Population|Historical GDP]] ===
"Submit & Download File" changes the name in the original file to the IFs country names so the data technician can use the IFs interface to import into a db file. '''Before doing this make sure that all the country names have been changed correctly'''. Sometimes Turkey can be missed due to a less percentage in FuzzyMatch.
Numerical values are used instead of IDs and or names. Click the links to head to the GitHub repo or wiki.  


=== [https://github.com/n1shamin/Country-Concordance/blob/main/CountryIHME.xlsx Institute for Health Metrics and Evaluation (IHME)] ===
"Save Mapping" will download a file that shows the organization's country list compared to IFs country list.
{| class="wikitable"
[[File:Mapping Table.png|none|thumb|1157x1157px]]
|+
Before inputting the excel file into this [https://github.com/n1shamin/Country-Concordance GitHub] change "original_name" to Country or IFs Country and "matched_name" to the organization's name. For example this is from World Bank's Poverty and Inequality Platform so the name would PIP Country.
!IFs Countries
!IHME Countries
|-
|
|
|-
|
|
|-
|
|
|}


=== [https://github.com/n1shamin/Country-Concordance/blob/main/ILO%20Country.xlsx International Labor Organization] ===
If you are using a code to import directly into a db file, here is an example using Python and panadas:
{| class="wikitable"
!IFs Countries
!ILO Countries
|-
|Bolivia
|Bolivia  (Plurinational State of)
|-
|Congo, Dem. Republic of the
|Congo, Democratic Republic of the
|-
|Cote D'Ivoire
|Côte d'Ivoire
|-
|Czech Republic
|Czechia
|-
|Grenada
|
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Hong Kong
|Hong Kong, China
|-
|Iran
|Iran (Islamic Republic of)
|-
|Kiribati
|
|-
|Korea, Dem. People's Republic
|Korea (the Democratic People's Republic  of)
|-
|Korea, Republic of
|Republic of Korea
|-
|Kosovo
|
|-
|Lao People's Dem. Republic
|Lao People's Democratic Republic
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|
|-
|Moldova, Republic of
|Republic of Moldova
|-
|Palestine
|Occupied Palestinian Territory
|-
|Sahrawi Arab Democratic Republic
|Western Sahara
|-
|Seychelles
|
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|South Sudan
|-
|Taiwan
|Taiwan, China
|-
|Tanzania
|Tanzania, United Republic of
|-
|Turkey
|Türkiye
|-
|United Kingdom
|United Kingdom of Great Britain and  Northern Ireland
|-
|Venezuela, Bolivarian Republic
|Venezuela (Bolivarian Republic of)
|}
Grenada, Kiribati, Kosovo, Micronesia, and Seychelles are not present in ILO.


=== International Monetary Funds (IMF) ===
country = pd.read_excel('ITU Country.xlsx') #get this excel from the repo listed above
{| class="wikitable"
|+
!IFs Countries
!IMF Countries
|-
|
|
|-
|
|
|-
|
|
|}


====== [https://github.com/n1shamin/Country-Concordance/blob/main/IMFGFS%20Country.xlsx IMF Government Financial Statistics] ======
dt = country.merge(df2, on='entityName', how='left') #entityName is what the list is called in ITU
{| class="wikitable"
|+
!IF Countries
!IMFGFS Countries
|-
|Afghanistan
|Afghanistan, Islamic  Rep. of
|-
|Armenia
|Armenia, Rep. of
|-
|Azerbaijan
|Azerbaijan, Rep. of
|-
|Belarus
|Belarus, Rep. of
|-
|Hong Kong
|China, P.R.: Hong Kong
|-
|China
|China, P.R.: Mainland
|-
|Croatia
|Croatia, Rep. of
|-
|Czech Republic
|Czech Rep.
|-
|Dominican Republic
|Dominican Rep.
|-
|Egypt
|Egypt, Arab Rep. of
|-
|Estonia
|Estonia, Rep. of
|-
|Iran
|Iran, Islamic Rep. of
|-
|Kazakhstan
|Kazakhstan, Rep. of
|-
|Korea, Dem. People's Republic
|Korea, Rep. of
|-
|Kosovo
|Kosovo, Rep. of
|-
|Kyrgyzstan
|Kyrgyz Rep.
|-
|Moldova, Republic of
|Moldova, Rep. of
|-
|Netherlands
|Netherlands, The
|-
|Poland
|Poland, Rep. of
|-
|Serbia
|Serbia, Rep. of
|-
|Slovenia
|Slovenia, Rep. of
|-
|Tajikistan
|Tajikistan, Rep. of
|-
|Timor-Leste
|Timor-Leste, Dem. Rep. of
|-
|Turkey
|Türkiye, Rep of
|-
|United States of America
|United States
|-
|Uzbekistan
|Uzbekistan, Rep. of
|-
|Yemen
|Yemen, Rep. of
|}
GFS is limited and many countries are missing. View the GitHub repo.


=== [https://github.com/n1shamin/Country-Concordance/blob/main/ITU%20Country.xlsx International Telecommunication Union] ===
dt = dt.drop(columns='entityName') #after merging you must drop this column before importing
{| class="wikitable"
!IFs Countries
!ITU Countries
|-
|Bolivia
|Bolivia  (Plurinational State of)
|-
|Central African Republic
|Central African Rep.
|-
|Congo
|Congo (Rep. of the)
|-
|Congo, Dem. Republic of the
|Dem. Rep. of the Congo
|-
|Cote D'Ivoire
|Côte d'Ivoire
|-
|Dominican Republic
|Dominican Rep.
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Hong Kong
|Hong Kong, China
|-
|Iran
|Iran (Islamic Republic of)
|-
|Korea, Dem. People's Republic
|
|-
|Korea, Republic of
|Korea (Rep. of)
|-
|Kosovo
|
|-
|Lao People's Dem. Republic
|Lao P.D.R.
|-
|Macedonia, North
|North Macedonia
|-
|Moldova, Republic of
|Moldova
|-
|Nepal
|Nepal (Republic of)
|-
|Palestine
|State of Palestine
|-
|Sahrawi Arab Democratic Republic
|
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|South Sudan
|-
|Taiwan
|Taiwan, Province of China
|-
|Turkey
|Türkiye
|-
|United States of America
|United States
|-
|Venezuela, Bolivarian Republic
|Venezuela
|}
Korea, Dem. People's Republic, Kosovo, and Sahrawi Arab Democratic Republic are not present in the database.


=== [https://github.com/n1shamin/Country-Concordance/blob/main/JMP%20WHO%20UNICEF%20Country.xlsx Joint Monitoring Programme WHO UNICEF] ===
dt = dt.rename(columns={'IFs Country':'Country'}) #this must be changed before importing as the column name in IFs is Country
{| class="wikitable"
!IFs Countries
!JMP Countries
|-
|Bolivia
|Bolivia (Plurinational State of)
|-
|Congo, Dem. Republic of the
|Democratic Republic of the Congo
|-
|Cote D'Ivoire
|Côte d’Ivoire
|-
|Czech Republic
|Czechia
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Hong Kong
|China, Hong Kong SAR
|-
|Iran
|Iran (Islamic Republic of)
|-
|Korea, Dem. People's Republic
|Democratic People's Republic of Korea
|-
|Korea, Republic of
|Republic of Korea
|-
|Kosovo
|
|-
|Lao People's Dem. Republic
|Lao People's Democratic Republic
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|Micronesia (Federated States of)
|-
|Moldova, Republic of
|Republic of Moldova
|-
|Netherlands
|Netherlands (Kingdom of the)
|-
|Palestine
|State of Palestine
|-
|Sahrawi Arab Democratic Republic
|
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|South Sudan
|-
|Taiwan
|
|-
|Tanzania
|United Republic of Tanzania
|-
|Turkey
|Türkiye
|-
|United Kingdom
|United Kingdom of Great Britain and Northern Ireland
|-
|Venezuela, Bolivarian Republic
|Venezuela (Bolivarian Republic of)
|}
Kosovo, Sahrawi Arab Democratic Republic, and Taiwan are not present in dataset.


=== OECD ===
== Special Cases ==
=== [https://github.com/n1shamin/Country-Concordance/blob/main/Transparency%20International%20Country.xlsx Transparency International] ===
Each organization might have special cases due to the nature of their country list. Therefore in the "Notes"/"Data Notes" section of each data source page (example: [[FAOSTAT Land Use#Notes]]) contains information necessary to complete the data pulling tasks (example: FAOSTAT Land Use needs disaggregation, notes for Ethiopia, and notes for Kosovo and Serbia).  
{| class="wikitable"
!IFs Countries
!TI Countries
|-
|Belize
|
|-
|Congo, Dem. Republic of the
|Democratic Republic of the Congo
|-
|Cote D'Ivoire
|Côte d'Ivoire
|-
|Czech Republic
|Czechia
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Kiribati
|
|-
|Korea, Dem. People's Republic
|North Korea
|-
|Korea, Republic of
|South Korea
|-
|Lao People's Dem. Republic
|Laos
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|
|-
|Moldova, Republic of
|Moldova
|-
|Palestine
|
|-
|Russian Federation
|Russia
|-
|Sahrawi Arab Democratic Republic
|
|-
|Samoa
|
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|South Sudan
|-
|Syrian Arab Republic
|Syria
|-
|Tonga
|
|-
|Venezuela, Bolivarian Republic
|Venezuela
|-
|Viet Nam
|Vietnam
|}
Belize, Kiribati, Micronesia, Palestine, Sahrawi Arab Democratic Republic, Samoa, and Tonga are not present in the dataset.  


=== [https://github.com/n1shamin/Country-Concordance/blob/main/UIS%20UNESCO%20Country.xlsx UIS UNESCO] ===
=== Proxies ===
{| class="wikitable"
Proxies are used when a certain country does not have data points and estimates are needed. Proxies should be used on a case by case basis and for certain series. Proxies essentially use a similar country to the country being approximated (in terms of population, GDP, etc.) and calculates the scale of what the approximated countries value should be.
|+
!FIPS_CODE
!IFs Country
!COUNTRY_ID UIS
|-
|ZAR
|Congo, Dem. Republic of the
|COD
|-
|KSV
|Kosovo
|
|-
|MON
|Mongolia
|MNG
|-
|WBG
|Palestine
|PSE
|-
|ROM
|Romania
|ROU
|-
|YUG
|Serbia
|SRB
|-
|TWN
|Taiwan
|
|-
|TMP
|Timor-Leste
|TLS
|}
Kosovo and Taiwan are not present in this dataset. This dataset is also present in IDs and not names.


=== [https://github.com/n1shamin/Country-Concordance/blob/main/UNDP%20Country.xlsx United Nation Development Programme (UNDP)] ===
An example from IHME's series is Kosovo and Albania. Albania is similar and geographically close to Kosovo making it an ideal for a proxy. For a series that deals with death such as SocietalViolenceDeathsTotal use the following equation: 
{| class="wikitable"
|+
!IFs Countries
!UNDP Countries
|-
|Bolivia
|Bolivia (Plurinational State of)
|-
|Congo, Dem. Republic of the
|Democratic Republic of the Congo
|-
|Cote D'Ivoire
|Côte d'Ivoire
|-
|Czech Republic
|Czechia
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Hong Kong
|China, Hong Kong SAR
|-
|Iran
|Iran (Islamic Republic of)
|-
|Korea, Dem. People's Republic
|Dem. People's Republic of Korea
|-
|Korea, Republic of
|Republic of Korea
|-
|Kosovo
|Kosovo (under UNSC res. 1244)
|-
|Lao People's Dem. Republic
|Lao People's Democratic Republic
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|Micronesia (Fed. States of)
|-
|Moldova, Republic of
|Republic of Moldova
|-
|Palestine
|State of Palestine
|-
|Sahrawi Arab Democratic Republic
|Western Sahara
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|South Sudan
|-
|Taiwan
|China, Taiwan Province of China
|-
|Tanzania
|United Republic of Tanzania
|-
|Turkey
|Türkiye
|-
|Venezuela, Bolivarian Republic
|Venezuela (Bolivarian Republic of)
|}


=== World Bank ===
* Kosovo’s number of deaths = Albania’s number of deaths * (Kosovo’s Population / Albania’s Population)


====== [https://github.com/shangkexin/World_Development_Indicators_WDI/blob/main/WDI_Country.xlsx World Development Indicators] ======
=== Disaggregation ===
{| class="wikitable"
A lot of organizations have values for dissolved states such as USSR, Yugoslavia, Czechoslovakia, and more. This then leads to gaps in values for the newly formed such as Serbia, Slovakia, Czechia, etc. Therefore, disaggregation is a great tool to fill in these gaps. Below is the disaggregation steps for common groups:
|+
!IFs Countries
!World Bank Countries
|-
|Bahamas
|Bahamas, The
|-
|Congo
|Congo, Rep.
|-
|Congo, Dem. Republic of the
|Congo, Dem. Rep.
|-
|Czech Republic
|Czechia
|-
|Egypt
|Egypt, Arab Rep.
|-
|Gambia
|Gambia, The
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Hong Kong
|Hong Kong SAR, China
|-
|Iran
|Iran, Islamic Rep.
|-
|Korea, Dem. People's Republic
|Korea, Dem. People's Rep.
|-
|Korea, Republic of
|Korea, Rep.
|-
|Kyrgyzstan
|Kyrgyz Republic
|-
|Lao People's Dem. Republic
|Lao PDR
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|Micronesia, Fed. Sts.
|-
|Moldova, Republic of
|Moldova
|-
|Palestine
|West Bank and Gaza
|-
|Sahrawi Arab Democratic Republic
|
|-
|Slovakia
|Slovak Republic
|-
|Sudan South
|South Sudan
|-
|Taiwan
|
|-
|Turkey
|Turkiye
|-
|United States of America
|United States
|-
|Venezuela, Bolivarian Republic
|Venezuela, RB
|-
|Yemen
|Yemen, Rep.
|}
Sahrawi Arab Democratic Republic is not present in this dataset.


For some World Bank indicators, Taiwan, China is present but in WDI is not.
# Czechoslovakia = Slovakia and Czech Republic
# Yugoslav SFR = Slovenia, Croatia, Bosnia and Herzegovina, North Macedonia, Serbia, and Montenegro
# Serbia and Montenegro = Serbia and Montenegro
# Sudan (former) = Sudan and South Sudan
# USSR = Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, and Uzbekistan


=== World Health Organization (WHO) ===
For disaggregation you are adding up the values for the year after the state disbanded, using the data when the state existed, have the values for the new states after the entity disbanded, and then extrapolate (find the percentage of each state in the data from the OG state and then multiply it to get the right amount) data for the new states in the previous years.


====== [https://github.com/n1shamin/Country-Concordance/blob/main/Global%20Health%20Observatory%20(GHO)%2C%20WHO.csv Global Health Observatory (GHO)] ======
# USSR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 15 states for 1992; extrapolate data for 15 states 1961-1991
{| class="wikitable"
# Czechoslovakia add up the values for for 1993 (year it disbanded); have totals for 1961-1992; have the values for 2 states for 1993; extrapolate data for 2 states 1961-1992
|+
# Sudan (former) add up the values for 2012 (year it disbanded); have totals for 1961-2011; have the values for 2 states for 2012; extrapolate data for 2 states 1961-2011
!IFs Countries
# Yugoslav SFR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 6 states for 1992; extrapolate for 6 states 1961-1991
!GHO Countries
# Serbia and Montenegro add up the values for 2006 (year it disbanded); have totals for 1992-2005; have the values for 2 states for 2006; extrapolate data for 2 states 1992-2005
|-
An example of this disaggregation can be found in [[FAOSTAT Land Use]].
|Bolivia
|Bolivia  (Plurinational State of)
|-
|Congo, Dem. Republic of the
|Democratic Republic of the Congo
|-
|Czech Republic
|Czechia
|-
|Guinea Bissau
|Guinea-Bissau
|-
|Hong Kong
|
|-
|Iran
|Iran (Islamic Republic of)
|-
|Korea, Dem. People's Republic
|Democratic People's Republic of Korea
|-
|Korea, Republic of
|Republic of Korea
|-
|Kosovo
|
|-
|Lao People's Dem. Republic
|Lao People's Democratic Republic
|-
|Macedonia, North
|North Macedonia
|-
|Micronesia
|Micronesia (Federated States of)
|-
|Moldova, Republic of
|Republic of Moldova
|-
|Netherlands
|Netherlands (Kingdom of the)
|-
|Palestine
|
|-
|Sahrawi Arab Democratic Republic
|
|-
|St. Lucia
|Saint Lucia
|-
|St. Vincent and the Grenadines
|Saint Vincent and the Grenadines
|-
|Sudan South
|South Sudan
|-
|Taiwan
|
|-
|Tanzania
|United Republic of Tanzania
|-
|Turkey
|Türkiye
|-
|United Kingdom
|United Kingdom of Great Britain and  Northern Ireland
|-
|Venezuela, Bolivarian Republic
|Venezuela (Bolivarian Republic of)
|}
Hong Kong, Kosovo, Palestine, Sahrawi Arab Democratic Republic, and Taiwan are not present in database.

Latest revision as of 18:01, 3 October 2025

Summary

Country concordance is an important aspect of a data technician's job to maintain the International Futures database. Country concordance refers to the differences between the IFs Country list and another organization's country list and merging them to ensure the IFs country list in IFs system. For example, in some organization "Türkiye" is the name displayed, but for IFs we use the name "Turkey". Therefore IFs data technicians need to change the name to "Turkey" for the system to process it. To do this there are multiple steps that can be taken and they are shown below:

GitHub Repositories

There are two repositories essential to this process; a mini app that can handle concordance for files less than 15mb and a centralized GitHub repository for all country concordance tables.

The app allows users to upload datasets from other organizations and utilizes fuzzy matching against standardized International Futures territory names. This is to make it easier for data technicians to ensure all country data points are being pulled correctly. You can clone the repository but a link to the application can be found here.

The centralized GitHub is a growing project thank contains excel sheets for mapping tables between an organization's country list and International Futures country names.

The format for these excels are formatted as:

IFs Country ex: ITU Country
Afghanistan Afghanistan
Albania Albania
and so on...

Process to Create Country Concordance

Use the app to import your dataset. Once you choose a file, and click upload and select the worksheet if needed:

Step 1- Upload Excel-CSV File - sheet.png

The page will go to Data Preview. Select the row or column that contains the country names. For this example it is the column "country_name"

Step 2- Data Preview.png

Select Fuzzy Match and at least 80%. You can go higher but high percentages mean that names can be missed if they fall short.

Step 3- Name-matching.png

Once you click match names you will be led to this page:

Review Name Matches (Page 1 of 1).png

In the bottom of the page there are two important options; "Save Mapping" and "Submit & Download File."

"Submit & Download File" changes the name in the original file to the IFs country names so the data technician can use the IFs interface to import into a db file. Before doing this make sure that all the country names have been changed correctly. Sometimes Turkey can be missed due to a less percentage in FuzzyMatch.

"Save Mapping" will download a file that shows the organization's country list compared to IFs country list.

Mapping Table.png

Before inputting the excel file into this GitHub change "original_name" to Country or IFs Country and "matched_name" to the organization's name. For example this is from World Bank's Poverty and Inequality Platform so the name would PIP Country.

If you are using a code to import directly into a db file, here is an example using Python and panadas:

country = pd.read_excel('ITU Country.xlsx') #get this excel from the repo listed above

dt = country.merge(df2, on='entityName', how='left') #entityName is what the list is called in ITU

dt = dt.drop(columns='entityName') #after merging you must drop this column before importing

dt = dt.rename(columns={'IFs Country':'Country'}) #this must be changed before importing as the column name in IFs is Country

Special Cases

Each organization might have special cases due to the nature of their country list. Therefore in the "Notes"/"Data Notes" section of each data source page (example: FAOSTAT Land Use#Notes) contains information necessary to complete the data pulling tasks (example: FAOSTAT Land Use needs disaggregation, notes for Ethiopia, and notes for Kosovo and Serbia).

Proxies

Proxies are used when a certain country does not have data points and estimates are needed. Proxies should be used on a case by case basis and for certain series. Proxies essentially use a similar country to the country being approximated (in terms of population, GDP, etc.) and calculates the scale of what the approximated countries value should be.

An example from IHME's series is Kosovo and Albania. Albania is similar and geographically close to Kosovo making it an ideal for a proxy. For a series that deals with death such as SocietalViolenceDeathsTotal use the following equation:

  • Kosovo’s number of deaths = Albania’s number of deaths * (Kosovo’s Population / Albania’s Population)

Disaggregation

A lot of organizations have values for dissolved states such as USSR, Yugoslavia, Czechoslovakia, and more. This then leads to gaps in values for the newly formed such as Serbia, Slovakia, Czechia, etc. Therefore, disaggregation is a great tool to fill in these gaps. Below is the disaggregation steps for common groups:

  1. Czechoslovakia = Slovakia and Czech Republic
  2. Yugoslav SFR = Slovenia, Croatia, Bosnia and Herzegovina, North Macedonia, Serbia, and Montenegro
  3. Serbia and Montenegro = Serbia and Montenegro
  4. Sudan (former) = Sudan and South Sudan
  5. USSR = Armenia, Azerbaijan, Belarus, Estonia, Georgia, Kazakhstan, Kyrgyzstan, Latvia, Lithuania, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, and Uzbekistan

For disaggregation you are adding up the values for the year after the state disbanded, using the data when the state existed, have the values for the new states after the entity disbanded, and then extrapolate (find the percentage of each state in the data from the OG state and then multiply it to get the right amount) data for the new states in the previous years.

  1. USSR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 15 states for 1992; extrapolate data for 15 states 1961-1991
  2. Czechoslovakia add up the values for for 1993 (year it disbanded); have totals for 1961-1992; have the values for 2 states for 1993; extrapolate data for 2 states 1961-1992
  3. Sudan (former) add up the values for 2012 (year it disbanded); have totals for 1961-2011; have the values for 2 states for 2012; extrapolate data for 2 states 1961-2011
  4. Yugoslav SFR add up the values for 1992 (year it disbanded); have totals for 1961-1991; have the values for 6 states for 1992; extrapolate for 6 states 1961-1991
  5. Serbia and Montenegro add up the values for 2006 (year it disbanded); have totals for 1992-2005; have the values for 2 states for 2006; extrapolate data for 2 states 1992-2005

An example of this disaggregation can be found in FAOSTAT Land Use.