SubRegionalization Handbook

From Pardee Wiki

Revision as of 20:08, 28 April 2017

Foundational Elements

A Note on Terminology

Sub-regional units go by many names internationally. When we refer to a sub-region in this document, we may use a variety of names to refer to the same entity. In IFs, there is a bias toward the use of “province” to refer to a sub-regional unit, but it is functionally equivalent to states, departments, territories, or any other name for a sub-regional unit.

Much of the following discussion involves variables included in the model, such as GDP or POPULATION. These may be referred to as variables or series; the two terms are used interchangeably here.

Database Modification

IFsHistSeriesXXX.mdb

Before a country in IFs can be divided into states/provinces/departments, a new data file called IFsHistSeriesXXX.mdb — for example IFsHistSeriesChina.mdb — must be created. That file is a variation of the file called IFsHistSeries.mdb, which contains data tables for all countries across time, for all variables.

For each data table in IFsHistSeriesXXX.mdb, rows correspond to all states/provinces/departments in the specified country. The file includes a table called DataDict (analogous to the file DataDict.mdb described below) that provides information on names, sources, and specialized procedures followed to prepare the state/province/department data.  

To subregionalize a country that does not already have an IFsHistSeriesXXX.mdb file, users will need to create one. While this can be done manually, it is faster and simpler to add the file with a feature in IFs. Before creating the file, however, make sure that a file called _ifsHistSeriesxxx.mdb exists in C:\Users\Public\IFs\DATA. Then follow these steps:

  • Click the path: Extended Features > Change Country Subregionalization > Add New Historical Series File
  • Select the country that is intended to be subregionalized and click OK

This does two things. First, it creates a new IFsHistSeriesXXX.mdb file for the selected country in C:\Users\Public\IFs\DATA. The file includes tables for the four essential series and a DataDict table. Second, it adds the country to the Provinces tables in Provinces.mdb.

The entire process of sub-regionalization affects five files: IFs.mdb, IFsHistSeries.mdb, IFsCoVatra.mdb, IFsCoVatraSeries.mdb, and IFsWVSCohort.mdb. For each of these files, the process looks in the IFs\Data folder for an equivalent file with the name of the country at the end (IFsHistSeriesChina.mdb, for example), then creates a copy in the IFs\Runfiles directory. Modifications are made to the copies in the \Runfiles directory as well as to the original files in the \Data directory. Data for individual sub-regional units and for the country as a whole are preserved and can be copied back and forth between the \Data and \Runfiles directories. This allows the user to switch between model runs with different sub-regionalizations and to reset everything to the starting point, with no states/provinces/departments sub-regionalized.
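The naming pattern described above can be sketched in Python. Only IFsHistSeriesChina.mdb is confirmed by the text; the other four names are an assumption that the same "country name at the end" pattern applies, and the helper name `country_file_names` is hypothetical.

```python
from pathlib import PureWindowsPath

# The five files the sub-regionalization process is said to affect.
BASE_FILES = [
    "IFs.mdb",
    "IFsHistSeries.mdb",
    "IFsCoVatra.mdb",
    "IFsCoVatraSeries.mdb",
    "IFsWVSCohort.mdb",
]


def country_file_names(country: str) -> list[str]:
    """Append the country name to each base file stem (assumed pattern)."""
    names = []
    for base in BASE_FILES:
        path = PureWindowsPath(base)
        names.append(f"{path.stem}{country}{path.suffix}")
    return names


china_files = country_file_names("China")
print(china_files)
```

A list like this can serve as a checklist of the files to look for in the \Data folder before starting.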

DataDict.mdb

Path: C:/ > Users > Public > IFs > DATA > DataDict.mdb

DataDict.mdb documents the variable names, sources, units of measurement, years of data coverage, and aggregation rules for all historical data series in the model.

The IFsHistSeriesXXX.mdb file has a copy of the DataDict table, and all series in IFsHistSeriesXXX.mdb must have a corresponding DataDict row in both places; otherwise, the model will not recognize them. If the fields Years and Aggregation are not filled in correctly, the series may not be read correctly.

Provinces.mdb

Path: C:/ > Users > Public > IFs > DATA > Provinces.mdb

In addition to the creation of IFsHistSeriesXXX.mdb, initiating the sub-regionalization process requires modifications to tables that exist in the file Provinces.mdb. The alterations to specific tables in Provinces.mdb are outlined below:

AllProvinces. The name and FIPS code of the target country should be added as a line in this table. The names should be identical to those used for the country in the IFs.mdb and IFsHistSeries.mdb files, and province break-out should only be done when the Full version of IFs is being used, with each country represented as a separate region. The AllProvinces field should normally be checked. The only circumstance under which it is not checked is when the user is selecting one, or a few, provinces for use in the model in addition to the entire country; see the discussion of the Regionalization table below.

Provinces. In this table all of the provinces or states should be identified. The first column is the target country (the same for all provinces). The second is a unique FIPS Code created by the user (since country-specific codes are three letters and cannot be duplicated, two-letter codes normally work well for these). Finally, the name of the sub-regional unit should be specified.

The addition of province or state names to this table can be done manually, but it is also automated with the “Add New Historical Series File” option, which creates IFsHistSeriesXXX.mdb with only the basic tables needed. Users can access this automated option from the main menu of IFs by following the path:

Extended Features > Change Country Sub-Regionalization > Add New Historical Series File

The province/state names must match exactly the ones used in the tables of the IFsHistSeriesXXX.mdb file for variables holding data. They must also match the province/state names used in the global map file to display the breakdown of countries into provinces/states. The exact names used in the map file can be found in IFs_Province.mdb using a query of the form: select Province from provinces where Country='Republic of India'
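As a rough illustration of the name check, the sketch below runs the same query and reports data-table names that have no exact match in the map file. It uses an in-memory SQLite database as a stand-in for IFs_Province.mdb (which is an Access file and would need a different driver). The sample rows and the helper names are made up for illustration.

```python
import sqlite3


def map_province_names(conn: sqlite3.Connection, country: str) -> set[str]:
    """Run the query from the text and return the province names as a set."""
    rows = conn.execute(
        "SELECT Province FROM provinces WHERE Country = ?", (country,)
    )
    return {row[0] for row in rows}


def names_missing_from_map(data_names, map_names) -> list[str]:
    """Names used in the data tables that do not match the map file exactly."""
    return sorted(set(data_names) - set(map_names))


# Stand-in database with illustrative rows (not real IFs content).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provinces (Country TEXT, Province TEXT)")
conn.executemany(
    "INSERT INTO provinces VALUES (?, ?)",
    [("Republic of India", "Kerala"), ("Republic of India", "Goa")],
)

map_names = map_province_names(conn, "Republic of India")
# "goa" differs from "Goa" only by case, which still counts as a mismatch.
missing = names_missing_from_map(["Kerala", "goa"], map_names)
print(missing)
```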

Regionalization. This table repeats the target Country and Full Province Name as in the Provinces table. The new feature is the addition of a Region name. Each province could be a separate Region, in which case the names should be no more than 10-12 letters so as to allow easy display in IFs tables and graphs. Most often the provinces will be grouped into sub-regions of the target Country, in which case it is recommended that the Region name be the target country name or some abbreviated part of it followed by the sub-region, again a total of no more than 10-12 letters. Leading with the Country name facilitates the use of alphabetical listings in IFs.

If the user is only adding a single province (or a very small number) to the model and not splitting a country into regions based on all states/provinces, the lines added to the Regionalization table will be only for the added states/provinces (e.g. Hawaii could be added in addition to the US by adding a line to this table for it only, not other US states, and then not checking AllProvinces in the AllProvinces table). Even if all provinces/states are in the data set, as with India, this process allows a single state of India to be pulled out and added to IFs.  Warning:  if a user only adds a state/province/department rather than dividing a country into sub-regions based on all of them, there will be double counting of the country and the added unit(s) in the model; only selected variables in IFs, such as WPOP and variables of the agricultural model have been set up for correction of such double counting in the run of the model.

The process of creation of regions can be done within IFs, under: 

Extended Features > Change Country Sub-Regionalization > Change Sub-Regionalization

IFsxxx.mdb

This file holds many tables, but only a few are of interest for sub-regionalization:

PopAgeCohortCountryFemale and PopAgeCohortCountryMale. These tables store data on the age-sex cohort breakdown of a population. If possible, use 2012 data, because the country-level tables are populated with 2012 UNDP data. The first column, Cohort0, is reserved for infants (age <1). Cohort1 is for ages 1-4. Cohort2 and onward are 5-year age groups (e.g. Cohort2 is ages 5-9). Data accuracy is important: large jumps between cohorts can cause errors when running the model through 2100. If these data are not available, these tables read from the country-level data.
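The cohort numbering can be expressed as a small helper, which may be useful when mapping raw census age bands onto the Cohort columns. This is a minimal sketch: the function name is hypothetical, and it does not handle an open-ended top cohort, whose treatment is not described here.

```python
def cohort_age_range(index: int) -> tuple[int, int]:
    """Return the inclusive (youngest, oldest) age in years for a Cohort column.

    Cohort0 is infants (under 1), Cohort1 is ages 1-4, and Cohort2 onward
    are 5-year groups starting at age 5.
    """
    if index < 0:
        raise ValueError("cohort index must be non-negative")
    if index == 0:
        return (0, 0)
    if index == 1:
        return (1, 4)
    youngest = 5 * (index - 1)
    return (youngest, youngest + 4)


for i in range(4):
    print(f"Cohort{i}", cohort_age_range(i))
```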

HealthDetailedDeathsCtry. This table stores data on the mortality rates of a population by sex. Raw mortality data must be assigned to the 15 mortality sub-types in IFs to match the structure of this table. These categories are as follows:

  • Other communicable diseases
  • Malignant neoplasms
  • Cardiac
  • Digestive
  • Respiratory
  • Other noncommunicable diseases
  • Road injuries
  • Other unintentional injuries
  • Intentional injuries
  • Diabetes
  • HIV
  • Diarrhea
  • Malaria
  • Respiratory infections
  • Mental health

If these data are not available, this file will read from the country-level data that already exist in IFs.

PopMortalityCohortCountryFemale and PopMortalityCohortCountryMale. These tables store the survivor tables of a population by sex. The data series are calculated from the mortality data in the table HealthDetailedDeathsCtry and give the likelihood that a person at any given age will survive to the next cohort, given the prevailing trends in mortality. If these data are not available, these tables will read from the country-level data that already exist in IFs.

Users may need to modify the table EconBaseSector. Unlike other base tables in IFs.mdb, rows are not automatically added to this table when additional countries (in this case sub-regions) are added. If too many sub-regions are added, this can cause problems when rebuilding the base. EconBaseSector has six rows for each country, so, for example, if the user wants to break India into 36 sub-regions, a total of 1326 rows will be needed (221 countries × 6). As of July 2014, the default number of rows for this table is 1304. Additional rows must be added to the table before adding sub-regions to ensure they are copied to \RUNFILES.
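The row arithmetic can be checked with a short calculation. The sketch assumes a base model of 186 countries, a figure not stated here but consistent with the 221 total above if India itself is replaced by its 36 sub-regions (186 - 1 + 36 = 221). The function name is hypothetical.

```python
ROWS_PER_REGION = 6      # EconBaseSector rows per country or sub-region
DEFAULT_ROW_COUNT = 1304  # default table size as of July 2014 (from the text)


def econ_base_sector_rows_needed(
    base_countries: int, new_subregions: int, countries_replaced: int = 1
) -> int:
    """Rows needed once a country is replaced by its sub-regions."""
    regions = base_countries - countries_replaced + new_subregions
    return regions * ROWS_PER_REGION


# Assumption: 186 countries in the base model (see the note above).
needed = econ_base_sector_rows_needed(base_countries=186, new_subregions=36)
rows_to_add = max(0, needed - DEFAULT_ROW_COUNT)
print(needed, rows_to_add)
```

Under these assumptions, the India example requires rows to be added to the default table before sub-regionalizing.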

IFsXXX.DAT

Path: C:/ > Users > Public > IFs > DATA > IFsXXX.DAT

Generating sub-national forecasts within IFs will inevitably require exogenous adjustments in some cases to produce reasonable output. Forecast tuning is only done on rare occasions, when there is not an apparent structural fix to the problem. However, there are some cases when it is the appropriate course of action.

When it is determined that tuning is necessary, the user needs to build a suitable scenario and save the scenario file (.sce extension). Then open the .sce file (in C:\Users\Public\IFs\Scenario) and copy and paste the scenario code into an IFsXXX.DAT file. If a .DAT file does not exist for your country, copy one that is available for another country (for example, South Africa or the United States), change the country name, and delete the contents. Three very important notes:

  • .DAT files most likely cannot be opened without changing the extension. Add “.txt” to the end of the file name and click OK when Windows warns you about modifying the file extension. You will then be able to open the file with WordPad or Notepad.
  • When copying the code from the .sce file to the .DAT file, be sure NOT to include “CUSTOM,” “COMMENT,” and “START.” See the following example. The .DAT file should only include the highlighted script:
  • Once the IFsXXX.DAT file is done, users will have to delete the broken-out model and re-subregionalize. The process will incorporate the newly created .DAT file into the base case. 
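The copy step in the second note could be scripted along the following lines. This is only a sketch: it assumes that CUSTOM, COMMENT, and START each appear as the first comma- or space-separated token on their own lines of the .sce file, which is not confirmed here. The sample scenario lines are placeholders, not real IFs parameters.

```python
import re

EXCLUDED_KEYWORDS = {"CUSTOM", "COMMENT", "START"}


def sce_to_dat_lines(sce_text: str) -> list[str]:
    """Keep scenario lines, dropping blanks and the three header keywords."""
    kept = []
    for line in sce_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Compare only the first token so that a parameter whose name merely
        # begins with one of the keywords is not dropped by mistake.
        first_token = re.split(r"[,\s]", stripped, maxsplit=1)[0].upper()
        if first_token in EXCLUDED_KEYWORDS:
            continue
        kept.append(stripped)
    return kept


# Placeholder content; real .sce files may be laid out differently.
sample_sce = (
    "CUSTOM\n"
    "COMMENT,example scenario\n"
    "START\n"
    "param_a,World,1.0\n"
    "startup_b,World,2.0\n"
)
dat_lines = sce_to_dat_lines(sample_sce)
print("\n".join(dat_lines))
```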

Provincial Data Processing

The model executes four processes when provinces are added (see section 3, ‘Add or Delete Sub-Regions’, for procedural steps):

  • Checks for essential series in IFsHistSeriesXXX.mdb
  • Normalizes and fills holes in essential series
  • Normalizes base-year values in preprocessor variables present in IFsHistSeriesXXX.mdb when provinces are added
  • Estimates values for preprocessor series NOT present in IFsHistSeriesXXX.mdb when provinces are added

The following sections cover these processes in detail.

Checking the Essential Series

As described in section 1, data for sub-regions in IFs are stored in IFsHistSeriesXXX.mdb. In order for sub-regions to be added, data for the four essential series are required. These variables are described below.

Variable        Definition
GDP2011         Gross Domestic Product in 2011 Constant Dollars
GDP2011PCPPP    Gross Domestic Product per Capita in 2011 Constant Purchasing Power Parity International Dollars
Population      Population in Millions
LandArea        Land Area


When sub-regions are added, the model checks that each sub-region has at least one year of data for each of these required variables. If it does not, then (1) the specific state/province/department cannot be used, (2) it is excluded from the process, and (3) the system gives a message indicating the problem. With these four series, the shell model is established. The user can incrementally build upon this by adding more data. Note also that sub-regional data must be for years that are available in the country-level data. For example, if only 2011 population data are available for sub-regions, national-level population data must include 2011 as well.
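Before adding sub-regions, it can help to run a similar check on the prepared data outside of IFs. The sketch below is not the model's own routine. It assumes the data have been loaded into plain dictionaries, and the function name and data layout are hypothetical.

```python
ESSENTIAL_SERIES = ("GDP2011", "GDP2011PCPPP", "Population", "LandArea")


def check_subregion(sub_data: dict, country_years: dict) -> list[str]:
    """Return a list of problems for one sub-region (empty if it passes).

    sub_data maps series name -> {year: value or None};
    country_years maps series name -> set of years with country-level data.
    """
    problems = []
    for series in ESSENTIAL_SERIES:
        years = {y for y, v in sub_data.get(series, {}).items() if v is not None}
        if not years:
            problems.append(f"{series}: no data for any year")
            continue
        unmatched = sorted(years - country_years.get(series, set()))
        if unmatched:
            problems.append(f"{series}: no country-level data for {unmatched}")
    return problems


# Illustrative values only.
country_years = {series: {2011} for series in ESSENTIAL_SERIES}
sub_data = {
    "GDP2011": {2011: 10.0},
    "GDP2011PCPPP": {2011: 5.0},
    "Population": {2012: 2.0},  # year not present at the country level
    "LandArea": {},              # missing entirely
}
problems = check_subregion(sub_data, country_years)
print(problems)
```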

Normalization and Hole-Filling in Essential Series

Due to the special importance of these series, the model normalizes and fills in data across all years for each sub-region. Normalization means that the model adjusts sub-regional values so that they are consistent with country-level data. To do this, the model checks the base year (2010) for all sub-regions. If base-year data are not available, it uses the closest available year. The aggregation rules for GDP2011, Population, and LandArea are all SUM, so the model simply multiplies the country base-year value by each sub-region's share of the sub-regional total:

<math>\frac{subregion\ value\ [base]}{\sum{subregion\ values\ [base]}} \times country\ value\ [base]</math>
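The SUM normalization can be illustrated with a few lines of Python. This is a sketch of the arithmetic only, with made-up numbers and a hypothetical function name. It is not the code IFs runs.

```python
def normalize_sum_series(subregion_values: dict, country_value: float) -> dict:
    """Scale sub-regional values so that they add up to the country value."""
    total = sum(subregion_values.values())
    if total == 0:
        raise ValueError("sub-regional values sum to zero; cannot normalize")
    return {
        name: value / total * country_value
        for name, value in subregion_values.items()
    }


# Two sub-regions reporting 30 and 10 against a country total of 100.
normalized = normalize_sum_series({"A": 30.0, "B": 10.0}, country_value=100.0)
print(normalized)
```

After normalization the sub-regional values keep their relative sizes and sum to the country-level figure.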

Since GDP2011PCPPP is defined as a rate, it is normalized a little differently. Here, the sub-regional values are multiplied by their respective Population weights, then summed to give a weighted average. A multiplier is then computed by dividing the country-level GDP2011PCPPP value by this weighted average. The final normalized data are the product of the multiplier and the sub-regional values.
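A minimal sketch of the rate normalization follows. It assumes that the Population weight of a sub-region is its share of the total sub-regional population. The function name and numbers are illustrative only.

```python
def normalize_rate_series(
    sub_rates: dict, sub_population: dict, country_rate: float
) -> dict:
    """Rescale per-capita values so their population-weighted mean equals the country rate."""
    total_population = sum(sub_population.values())
    # Population-weighted average of the sub-regional rates.
    weighted_average = sum(
        sub_rates[name] * sub_population[name] / total_population
        for name in sub_rates
    )
    multiplier = country_rate / weighted_average
    return {name: rate * multiplier for name, rate in sub_rates.items()}


rates = {"A": 10.0, "B": 20.0}       # sub-regional GDP per capita (made up)
population = {"A": 3.0, "B": 1.0}    # sub-regional population (made up)
normalized_rates = normalize_rate_series(rates, population, country_rate=25.0)
print(normalized_rates)
```

In this example the weighted average is 12.5, so the multiplier is 2 and both sub-regional values double while their ratio to each other is unchanged.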