SubRegionalization Handbook
Edited by AltheaDitter
Revision as of 22:59, 28 April 2017
Foundational Elements
A Note on Terminology
Sub-regional units go by many names internationally. When we refer to a sub-region in this document, we may use a variety of names to refer to the same entity. In IFs, there is a bias toward the use of “province” to refer to a sub-regional unit, but it is functionally equivalent to states, departments, territories, or any other name for a sub-regional unit.
Much of the following discussion will involve variables included in the model, such as GDP or POPULATION. These may be referred to as variables or series; the two terms are used interchangeably here.
Database Modification
IFsHistSeriesXXX.mdb
Before a country in IFs can be divided into states/provinces/departments, a new data file called IFsHistSeriesXXX.mdb — for example IFsHistSeriesChina.mdb — must be created. That file is a variation of the file called IFsHistSeries.mdb, which contains data tables for all countries across time, for all variables.
For each data table in IFsHistSeriesXXX.mdb, rows correspond to all states/provinces/departments in the specified country. The file includes a table called DataDict (analogous to the file DataDict.mdb described below) that provides information on names, sources, and specialized procedures followed to prepare the state/province/department data.
To subregionalize countries that do not already have an IFsHistSeriesXXX.mdb file, users will need to create one. While this can be done manually, it is faster and simpler to add the file with a feature in IFs. Before creating the file, however, make sure that a file called _ifsHistSeriesxxx.mdb exists in C:\Users\Public\IFs\DATA. Then follow these steps:
- Click the path: Extended Features > Change Country Subregionalization > Add New Historical Series File
- Select the country that is intended to be subregionalized and click OK
This does two things. First, it creates a new IFsHistSeriesXXX.mdb file for the selected country in C:\Users\Public\IFs\DATA. The file will include tables for the four essential series and a DataDict table. Second, it adds the country to the Provinces tables in Provinces.mdb.
The entire process of sub-regionalization affects five files: IFs.mdb, IFsHistSeries.mdb, IFsCoVatra.mdb, IFsCoVatraSeries.mdb, and IFsWVSCohort.mdb. For each of these files, the process looks in the IFs\Data folder for an equivalent file with the name of the country at the end (again, IFsHistSeriesChina.mdb is an example), then creates a copy in the IFs\Runfiles directory. Modifications are made to the copies in the \Runfiles directory as well as to the original files in the \Data directory. Data for the individual sub-regional units and for the country as a whole are preserved, and can be copied back and forth between the \Data and \Runfiles directories. This allows the user to switch between model runs with different sub-regionalizations, and to reset everything to the starting point without any states/provinces/departments sub-regionalized.
DataDict.mdb
Path: C:/ > Users > Public > IFs > DATA > DataDict.mdb
DataDict.mdb documents the variable names, sources, units of measurement, years of data coverage, and aggregation rules for all historical data series in the model.
The IFsHistSeriesXXX.mdb file has a copy of the DataDict table, and every series in IFsHistSeriesXXX.mdb must have a corresponding DataDict row in both places (DataDict.mdb and the DataDict table inside IFsHistSeriesXXX.mdb); otherwise, the model will not recognize it. If the Years and Aggregation fields are not filled in correctly, the series may not be read correctly.
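The two requirements above (a DataDict row in both places, with Years and Aggregation filled in) can be checked before adding sub-regions. The sketch below is illustrative only: it assumes the table names and DataDict rows have already been exported from the .mdb files into plain Python dictionaries, and the function name `check_datadict` and the example values are hypothetical.

```python
# Sketch only: assumes table names and DataDict rows were already exported
# from the .mdb files into plain Python structures (reading Access files
# directly is out of scope here).

def check_datadict(series_tables, main_datadict, local_datadict):
    """Return a list of problems that could stop series from being read.

    series_tables  -- names of the data tables in IFsHistSeriesXXX.mdb
    main_datadict  -- {series name: row dict} from DataDict.mdb
    local_datadict -- {series name: row dict} from the DataDict table
                      inside IFsHistSeriesXXX.mdb
    """
    problems = []
    for name in sorted(series_tables):
        for label, datadict in (("DataDict.mdb", main_datadict),
                                ("local DataDict", local_datadict)):
            row = datadict.get(name)
            if row is None:
                problems.append(f"{name}: no row in {label}")
                continue
            # Empty Years or Aggregation fields may keep the series from reading.
            for field in ("Years", "Aggregation"):
                if not row.get(field):
                    problems.append(f"{name}: {field} is empty in {label}")
    return problems


# Hypothetical example values.
tables = ["Population", "LandArea"]
main = {"Population": {"Years": "2010", "Aggregation": "SUM"},
        "LandArea": {"Years": "2010", "Aggregation": "SUM"}}
local = {"Population": {"Years": "2010", "Aggregation": ""}}
for problem in check_datadict(tables, main, local):
    print(problem)
```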
Provinces.mdb
Path: C:/ > Users > Public > IFs > DATA > Provinces.mdb
In addition to the creation of IFsHistSeriesXXX.mdb, initiating the sub-regionalization process requires modifications to tables that exist in the file Provinces.mdb. The alterations to specific tables in Provinces.mdb are outlined below:
All Provinces. The name and FIPS code of the target country should be added to a line in this table. Those names should be identical to the ones used for the country in the IFs.mdb and IFsHistSeries.mdb files, and province break-out should only be done when the Full version of IFs is being used, with each country represented as a separate region. The AllProvinces field should normally be checked. The only circumstance under which it is left unchecked is when the user is selecting one, or a few, provinces for use in the model in addition to the entire country; see the discussion of the Regionalization table below.
Provinces. In this table all of the provinces or states should be identified. The first column is the target country (the same for all provinces). The second is a unique FIPS Code created by the user (since country-specific codes are three letters and cannot be duplicated, two-letter codes normally work well for these). Finally, the name of the sub-regional unit should be specified.
The addition of province or state names to this table can be done manually, but it is also automated with the “Add New Historical Series File” option, which creates IFsHistSeriesXXX.mdb with only the basic tables needed. Users can access this automated option from the main menu of IFs by following the path:
Extended Features > Change Country Sub-Regionalization > Add New Historical Series File
The province/state names need to coincide exactly with the ones used in the tables of the IFsHistSeriesXXX.mdb file that hold variable data. They also need to coincide with the province/state names used in the global map file to display the breakdown of countries into provinces/states. The exact names used in the map file can be found in IFs_Province.mdb using a query of the form: select Province from provinces where Country='Republic of India'
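Because the names must coincide exactly, a stray space or a different spelling is enough to break the link between the Provinces table, the data tables, and the map. The sketch below compares the three name lists once they have been exported (for example with the query above); the function name and the sample names are hypothetical.

```python
def name_mismatches(provinces_table, histseries_names, map_names):
    """Compare province names from the Provinces table with two other sources.

    Matching is exact (case- and whitespace-sensitive) because the names
    must coincide exactly. Returns {source: {"missing": [...], "extra": [...]}}
    for every source that disagrees with the Provinces table.
    """
    reference = set(provinces_table)
    sources = {
        "IFsHistSeriesXXX.mdb": set(histseries_names),
        "map file": set(map_names),
    }
    report = {}
    for label, names in sources.items():
        missing = sorted(reference - names)  # in Provinces, not in this source
        extra = sorted(names - reference)    # in this source, not in Provinces
        if missing or extra:
            report[label] = {"missing": missing, "extra": extra}
    return report


# Hypothetical example: a trailing space in one data table breaks the match.
print(name_mismatches(["Kerala", "Goa"], ["Kerala", "Goa "], ["Goa", "Kerala"]))
```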
Regionalization. This table repeats the target Country and Full Province Name from the Provinces table. The new feature is the addition of a Region name. Each province could be a separate Region, in which case the names should be no longer than 10-12 letters so that they display easily in IFs tables and graphs. Most often the provinces will be grouped into sub-regions of the target Country, in which case it is recommended that the Region name be the target country name, or an abbreviated part of it, followed by the sub-region, again a total of no more than 10-12 letters. Leading with the Country name keeps a country's regions together in the alphabetic listings of IFs.
If the user is only adding a single province (or a very small number) to the model and not splitting a country into regions based on all states/provinces, the lines added to the Regionalization table will be only for the added states/provinces (e.g. Hawaii could be added in addition to the US by adding a line to this table for it only, not for other US states, and then not checking AllProvinces in the AllProvinces table). Even if all provinces/states are in the data set, as with India, this process allows a single state of India to be pulled out and added to IFs. Warning: if a user adds only a state/province/department rather than dividing a country into sub-regions based on all of them, the country and the added unit(s) will be double counted in the model; only selected variables in IFs, such as WPOP and the variables of the agricultural model, have been set up to correct for such double counting during the model run.
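The double counting in the warning above is simple arithmetic: the parent country's value already contains the added province, so summing over all regions counts the province twice. The sketch below illustrates this and shows one obvious correction (subtracting each added unit once). It is a hypothetical illustration, not a description of how IFs corrects WPOP or the agricultural variables, and all numbers are made up.

```python
def world_total(values, added_units=(), correct=False):
    """Sum one variable over all regions in a run.

    values      -- {region: value}; the parent country's value already
                   includes any province that was also added separately
    added_units -- provinces added alongside (not instead of) their country
    correct     -- if True, subtract each added unit once so it is counted once
    """
    total = sum(values.values())
    if correct:
        total -= sum(values[unit] for unit in added_units)
    return total


# Hypothetical populations in thousands.
pops = {"United States": 320000, "Hawaii": 1400, "Rest of world": 7000000}
print(world_total(pops))  # Hawaii is counted twice here
print(world_total(pops, added_units=["Hawaii"], correct=True))
```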
The process of creation of regions can be done within IFs, under:
Extended Features > Change Country Sub-Regionalization > Change Sub-Regionalization
IFsxxx.mdb
This file holds many tables, but only a few are of interest for sub-regionalization:
PopAgeCohortCountryFemale and PopAgeCohortCountryMale. These tables store data on the age-sex cohort breakdown of a population. If possible, use 2012 data, because we populate the country-level tables with 2012 UNDP data. The first column, Cohort0, is reserved for infants (age <1). Cohort1 is for ages 1-4. Cohort2 and onward are 5-year age groups (e.g. Cohort2 is ages 5-9, etc.). Data accuracy is important because large jumps between cohorts can cause errors when running the model through 2100. If these data are not available, these tables read from the country-level data.
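The cohort layout and the warning about large jumps can be made concrete. The sketch below maps an age to its CohortN column and flags adjacent cohorts whose values differ sharply. Because Cohort0 spans 1 year and Cohort1 spans 4, it compares values per year of age so that the narrow early cohorts are not flagged just for being narrow. The `max_ratio` threshold is a hypothetical choice, and the open-ended top cohort (whose width is not given here) is not special-cased.

```python
# Cohort layout: Cohort0 = age <1, Cohort1 = ages 1-4,
# Cohort2 onward = 5-year groups (Cohort2 = 5-9, Cohort3 = 10-14, ...).

def cohort_for_age(age):
    """Return the index N of the CohortN column that holds the given age."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 1:
        return 0
    if age < 5:
        return 1
    return int(age // 5) + 1


def cohort_width(index):
    """Width in years of CohortN (1 for Cohort0, 4 for Cohort1, else 5)."""
    return {0: 1, 1: 4}.get(index, 5)


def flag_jumps(counts, max_ratio=2.0):
    """Return (i, i+1) pairs of adjacent cohorts with suspicious jumps.

    counts    -- people per cohort; counts[i] is the value for Cohort i
    max_ratio -- hypothetical threshold; tune it to your data
    """
    flagged = []
    for i in range(1, len(counts)):
        # Compare people per year of age, not raw cohort totals.
        prev = counts[i - 1] / cohort_width(i - 1)
        cur = counts[i] / cohort_width(i)
        if prev > 0 and cur > 0:
            if max(prev, cur) / min(prev, cur) > max_ratio:
                flagged.append((i - 1, i))
        elif prev != cur:  # one cohort is empty and its neighbour is not
            flagged.append((i - 1, i))
    return flagged


print(cohort_for_age(7), flag_jumps([10, 40, 50, 200]))
```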
HealthDetailedDeathsCtry. This table stores data on mortality rates of a population by sex. Raw mortality data must be assigned to the 15 mortality sub-types in IFs to match this table's structure. These categories are as follows:
- Other communicable diseases
- Malignant neoplasms
- Cardiac
- Digestive
- Respiratory
- Other noncommunicable diseases
- Road injuries
- Other unintentional injuries
- Intentional injuries
- Diabetes
- HIV
- Diarrhea
- Malaria
- Respiratory infections
- Mental health
If these data are not available, this table will read from the country-level data that already exist in IFs.
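Assigning raw mortality data to the 15 sub-types is essentially a many-to-one mapping followed by a sum. The sketch below assumes the user has written that mapping by hand as a dictionary. The raw cause labels and values are hypothetical, and summing is only valid for counts or for cause-specific rates that share the same denominator. Unmapped causes raise an error, so no deaths are silently dropped.

```python
IFS_CAUSES = (
    "Other communicable diseases", "Malignant neoplasms", "Cardiac",
    "Digestive", "Respiratory", "Other noncommunicable diseases",
    "Road injuries", "Other unintentional injuries", "Intentional injuries",
    "Diabetes", "HIV", "Diarrhea", "Malaria", "Respiratory infections",
    "Mental health",
)


def aggregate_deaths(raw_deaths, cause_map):
    """Collapse raw causes of death into the 15 IFs mortality sub-types.

    raw_deaths -- {raw cause label: deaths or cause-specific rate} for one
                  sex in one sub-region (rates must share a denominator)
    cause_map  -- {raw cause label: IFs sub-type}, written by the user
    """
    totals = dict.fromkeys(IFS_CAUSES, 0.0)
    for cause, value in raw_deaths.items():
        if cause not in cause_map:
            raise KeyError(f"no IFs sub-type assigned to raw cause {cause!r}")
        subtype = cause_map[cause]
        if subtype not in totals:
            raise ValueError(f"{subtype!r} is not one of the 15 IFs sub-types")
        totals[subtype] += value
    return totals


# Hypothetical raw labels and values.
raw = {"Ischaemic heart disease": 12.0, "Hypertensive heart disease": 3.0,
       "Lung cancer": 5.0}
mapping = {"Ischaemic heart disease": "Cardiac",
           "Hypertensive heart disease": "Cardiac",
           "Lung cancer": "Malignant neoplasms"}
totals = aggregate_deaths(raw, mapping)
print(totals["Cardiac"], totals["Malignant neoplasms"])
```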
PopMortalityCohortCountryFemale and PopMortalityCohortCountryMale. These tables store the survivor tables of a population by sex. They are calculated from the mortality data in the table HealthDetailedDeathsCtry: each value is the likelihood that a person at a given age will survive to the next cohort, given the prevailing trends in mortality. If these data are not available, these tables will read from the country-level data that already exist in IFs.
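One common way to turn a mortality rate into a probability of surviving to the next cohort is to assume a constant annual mortality rate within the cohort, which gives S = exp(-m × years). The sketch below uses that assumption. It is not necessarily the formula IFs uses to build its survivor tables, and the example rates are hypothetical.

```python
import math


def survival_probability(annual_rate, years):
    """Chance of surviving `years` years at a constant annual mortality rate.

    annual_rate -- deaths per person per year (divide a per-1,000 rate by 1,000)
    Assumes a constant force of mortality within the cohort: S = exp(-m * years).
    """
    if annual_rate < 0 or years < 0:
        raise ValueError("rate and years must be non-negative")
    return math.exp(-annual_rate * years)


def survivor_column(annual_rates, widths):
    """Survival probability for each cohort, given its rate and width in years."""
    return [survival_probability(m, w) for m, w in zip(annual_rates, widths)]


# Hypothetical rates for Cohort0 (1 year wide), Cohort1 (4 years), Cohort2 (5 years).
print(survivor_column([0.03, 0.002, 0.001], [1, 4, 5]))
```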
Users may need to modify the EconBaseSector table. Unlike other base tables in IFs.mdb, rows are not automatically added to this table when additional countries (in this case, sub-regions) are added. If too many sub-regions are added, this can cause problems when rebuilding the base. EconBaseSector has six rows for each country; for example, if the user wants to break India into 36 sub-regions, a total of 1326 rows will be needed (221 countries * 6 rows). As of July 2014, the default number of rows for this table is 1304. Additional rows must be added to the table before sub-regions are added, to ensure they are copied to \RUNFILES.
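The row count above can be reproduced as (countries − 1 + sub-regions) × 6, assuming the sub-regions take the place of the original country. The base count of 186 countries used below is an inference from the India example (221 − 36 + 1), not a figure stated in this handbook. The function names are hypothetical.

```python
ROWS_PER_COUNTRY = 6  # EconBaseSector holds six rows per country


def econbasesector_rows_needed(base_countries, subregions, replaces_country=True):
    """Rows EconBaseSector needs after one country is sub-regionalized.

    base_countries   -- countries in the model before sub-regionalization
    subregions       -- number of sub-regions the country is split into
    replaces_country -- True when the sub-regions take the country's place
    """
    units = base_countries + subregions - (1 if replaces_country else 0)
    return units * ROWS_PER_COUNTRY


def rows_to_add(current_rows, needed_rows):
    """How many rows to add by hand before adding the sub-regions."""
    return max(0, needed_rows - current_rows)


needed = econbasesector_rows_needed(186, 36)  # India example
print(needed, rows_to_add(1304, needed))
```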
IFsXXX.DAT
C: > Users > Public > IFs > Data > IFsXXX.DAT
When tuning is determined to be necessary, the user needs to build a suitable scenario and save the scenario file (.sce extension). Then open the .sce file (in C:\Users\Public\IFs\Scenario) and copy and paste the scenario code into an IFsXXX.DAT file. If a .DAT file does not exist for your country, simply make a copy of one available for another country (for example, South Africa or the United States), change the country name, and delete its contents. Three very important notes:
- You will most likely not be able to open .DAT files without changing the extension. Simply add “.txt” to the end of the file name and click OK when Windows warns you about modifying the file extension. Once this is done, you will be able to open the file with WordPad or Notepad.
- When copying the code from the .sce file to the .DAT file, be sure NOT to include “CUSTOM,” “COMMENT,” or “START.” See the following example. The .DAT file should include only the highlighted script:
- Once the IFsXXX.DAT file is done, users will have to delete the broken-out model and re-subregionalize. The process will incorporate the newly created .DAT file into the base case.
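The copy step can also be scripted. The sketch below assumes the .sce file is plain text and that each “CUSTOM,” “COMMENT,” and “START” entry begins its own line. Check this against a real .sce file before using it, because the file layout is an assumption here. Python reads the file regardless of its extension, so the “.txt” rename is only needed for manual editing. The function names and file names are hypothetical.

```python
import re
from pathlib import Path

# Lines whose first word is one of these keywords are left out of the .DAT file.
# \b stops a longer word that merely starts with "START" from being dropped.
_EXCLUDED = re.compile(r"\s*(CUSTOM|COMMENT|START)\b")


def scenario_lines_for_dat(sce_text):
    """Return the scenario-code lines of a .sce file, minus the excluded ones."""
    return [line for line in sce_text.splitlines() if not _EXCLUDED.match(line)]


def append_scenario_to_dat(sce_path, dat_path):
    """Append the filtered scenario code from a .sce file to IFsXXX.DAT."""
    lines = scenario_lines_for_dat(Path(sce_path).read_text())
    with open(dat_path, "a") as dat:
        for line in lines:
            dat.write(line + "\n")


# Hypothetical usage (file names are examples only):
# append_scenario_to_dat(r"C:\Users\Public\IFs\Scenario\MyTuning.sce",
#                        r"C:\Users\Public\IFs\Data\IFsIndia.DAT")
```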
Provincial Data Processing
The model executes four processes when provinces are added (see section 3, ‘Add or Delete Sub-Regions’, for procedural steps):
- Checks for essential series in IFsHistSeriesXXX.mdb
- Normalizes and fills holes in essential series
- Normalizes base-year values of preprocessor variables present in IFsHistSeriesXXX.mdb when provinces are added
- Estimates values for preprocessor series NOT present in IFsHistSeriesXXX.mdb when provinces are added
The following sections will cover these processes in detail.
Checking the Essential Series
As described in section 1, data for sub-regions in IFs are stored in IFsHistSeriesXXX.mdb. Sub-regions can be added only if data for the essential series are available. The four variables are described below.
Variable | Definition |
GDP2011 | Gross Domestic Product in 2011 Constant Dollars |
GDP2011PCPPP | Gross Domestic Product per Capita in 2011 Constant Purchasing Power Parity International Dollars |
Population | Population in Millions |
LandArea | Land Area |
When sub-regions are added, the model checks that each sub-region has at least one year of data for each of these required variables. If it does not, then (1) that state/province/department cannot be used, (2) it is excluded from the process, and (3) the system displays a message indicating the problem. These four series establish the shell model, which the user can then build on incrementally by adding more data. Note also that sub-regional data must be for years that are available in the country-level data. For example, if sub-regional population data are available only for 2011, the national-level population data must include 2011 as well.
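The two checks above can be sketched as follows, assuming the data have been exported into nested dictionaries. A sub-region is excluded if any essential series has no data at all. Sub-regional years that are missing from the country-level data are reported separately as warnings, because the handbook states that requirement without saying it leads to exclusion. All names and values are hypothetical.

```python
ESSENTIAL = ("GDP2011", "GDP2011PCPPP", "Population", "LandArea")


def check_essential(subregion_data, country_years):
    """Apply the essential-series checks described above.

    subregion_data -- {sub-region: {series: {year: value or None}}}
    country_years  -- {series: set of years with country-level data}
    Returns (usable sub-regions, {excluded sub-region: [missing series]},
             [warnings about years absent at country level]).
    """
    usable, excluded, warnings = [], {}, []
    for region in sorted(subregion_data):
        series = subregion_data[region]
        missing = []
        for name in ESSENTIAL:
            years = {y for y, v in series.get(name, {}).items() if v is not None}
            if not years:
                missing.append(name)  # not even one year of data
                continue
            unmatched = sorted(years - country_years.get(name, set()))
            if unmatched:
                warnings.append(f"{region} {name}: no country data for {unmatched}")
        if missing:
            excluded[region] = missing
        else:
            usable.append(region)
    return usable, excluded, warnings


# Hypothetical data for two sub-regions.
data = {
    "A": {"GDP2011": {2010: 1.0}, "GDP2011PCPPP": {2010: 2.0},
          "Population": {2011: 3.0}, "LandArea": {2010: 4.0}},
    "B": {"GDP2011": {2010: 1.0}, "Population": {2010: None}},
}
country = {name: {2010} for name in ESSENTIAL}
print(check_essential(data, country))
```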
Normalization and Hole-Filling in Essential Series
Because these series are especially important, the model normalizes and fills in data across all years for each sub-region. Normalization means adjusting sub-regional values so that they are consistent with the country-level data. To do this, the model checks the base year (2010) for all sub-regions; if base-year data are not available, it uses the closest available year. The aggregation rule for GDP2011, Population, and LandArea is SUM, so the model simply multiplies the country base-year value by each sub-region's share of the sub-regional total:
<math>subregion\ [normalized] = country\ [base]\ *\ \frac{subregion\ [base]}{\sum{subregion\ [base]}}</math>
Because GDP2011PCPPP is defined as a rate, it is normalized a little differently. Here, the sub-regional values are multiplied by their respective population weights and then summed to give a weighted average. A multiplier is then computed by dividing the country-level GDP2011PCPPP value by this weighted average:
<math>\frac{country\ [base]}{\sum{subregion\ [base]\ *\ weight\ [base]}}</math>
Where POP weight is:
<math>weight\ [base] = \frac{POP\ subregion\ [base]}{\sum{POP\ subregion\ [base]}}</math>
The final normalized data are the product of the multiplier and the sub-regional values.
These normalization procedures are important for two reasons. First and foremost, they allow the model to produce reasonable forecasts. They also allow the user to supply the essential series in any unit, since the sub-regional values are rescaled to match the country-level data. This can be quite helpful because sub-regional data are often harder to find than national figures, especially PPP data.
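The two base-year normalizations can be sketched directly from their descriptions: country value times each share for SUM series, and a population-weighted multiplier for GDP2011PCPPP. The sketch covers a single year only. How the model carries the result across other years and fills holes is not spelled out here, so that part is left out. The usage lines show why units do not matter: the same shares in different units give the same result.

```python
def normalize_sum(country_value, subregion_values):
    """SUM series (GDP2011, Population, LandArea): country value times each share."""
    total = sum(subregion_values.values())
    return {r: country_value * v / total for r, v in subregion_values.items()}


def normalize_weighted(country_value, subregion_values, populations):
    """Rate series (GDP2011PCPPP): scale by a population-weighted multiplier.

    weight     = sub-region population / sum of sub-region populations
    multiplier = country value / sum(value * weight)
    """
    pop_total = sum(populations[r] for r in subregion_values)
    weighted_avg = sum(v * populations[r] / pop_total
                       for r, v in subregion_values.items())
    multiplier = country_value / weighted_avg
    return {r: v * multiplier for r, v in subregion_values.items()}


# Same shares in different (hypothetical) units give the same normalized values.
print(normalize_sum(100.0, {"a": 1.0, "b": 3.0}))
print(normalize_sum(100.0, {"a": 1000.0, "b": 3000.0}))
```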
Normalization of Preprocessor Variables Present in IFsHistSeriesXXX.mdb When Provinces are Added
For preprocessor (non-essential) variables included in IFsHistSeriesXXX.mdb before provinces are added, base-year values are normalized. Variables used in the preprocessor are marked in the field ‘UsedInPreProcessor’ (formerly ‘UseforProvince’) in DataDict.mdb. Non-preprocessor variables are not normalized. Normalization matters because the preprocessor draws on base-year (2010) data to initialize the model run; put simply, provincial values must be consistent with the values for the country as a whole. The process is similar to the one used for the essential series. The main difference is that the model normalizes only the base-year value: values for other years are not changed and no hole-filling occurs.
First, the model computes a multiplier for provincial data by dividing the country value in the base year by the weighted average, simple average, or sum of the provinces (as designated by the Aggregation field). If base-year data are not available for the country or the provinces, the model looks to the Most Recent column for province and country data. Note that this could mean the multiplier is computed with data from different years. Multipliers are calculated as:
<math>\frac{country\ [base]}{\sum{subregion\ [base]\ *\ weight\ [base]}}</math>
Where weight is based on the aggregation rule of the variable (generally GDP or POP):
The computed multiplier is stored in the multiplier field for the series row in the datadict table of IFsHistSeriesXXX.mdb in \RUNFILES.
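The multiplier calculation can be sketched for the three aggregation rules, including the fallback to the Most Recent column. The labels "SUM", "AVG", and "WAVG" and the "MostRecent" key are hypothetical stand-ins for whatever the Aggregation field and the Most Recent column actually contain. Because of the fallback, the country and province values may come from different years, exactly as noted above.

```python
def base_or_most_recent(row, base_year=2010):
    """Base-year value, or the Most Recent value when the base year is empty."""
    value = row.get(base_year)
    return value if value is not None else row.get("MostRecent")


def preprocessor_multiplier(country_row, province_rows, aggregation,
                            weights=None, base_year=2010):
    """Multiplier = country value / aggregate of the province values.

    country_row   -- {year: value, "MostRecent": value} for the country
    province_rows -- {province: row shaped like country_row}
    aggregation   -- "SUM", "AVG" (simple average) or "WAVG" (weighted average);
                     hypothetical stand-ins for the DataDict Aggregation values
    weights       -- {province: base-year GDP or POP}, required for "WAVG"
    """
    country = base_or_most_recent(country_row, base_year)
    values = {p: base_or_most_recent(r, base_year) for p, r in province_rows.items()}
    if country is None or any(v is None for v in values.values()):
        raise ValueError("no base-year or Most Recent value to normalize with")
    if aggregation == "SUM":
        aggregate = sum(values.values())
    elif aggregation == "AVG":
        aggregate = sum(values.values()) / len(values)
    elif aggregation == "WAVG":
        total_weight = sum(weights[p] for p in values)
        aggregate = sum(values[p] * weights[p] / total_weight for p in values)
    else:
        raise ValueError(f"unknown aggregation rule {aggregation!r}")
    return country / aggregate


# Hypothetical values; province "b" falls back to its Most Recent value.
country_row = {2010: 120.0}
provinces = {"a": {2010: 40.0}, "b": {2010: None, "MostRecent": 20.0}}
print(preprocessor_multiplier(country_row, provinces, "SUM"))
```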
If provincial data differ significantly from country data (likely because of different units), users can check the ApplyMultAll field of the DataDict table in IFsHistSeriesXXX.mdb. This signals the model to apply the multiplier to all previous years of provincial data. This should be done before sub-regions are added.
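Once a multiplier exists, applying it differs only in which years it touches. The sketch below scales only the base year by default, leaving other years and gaps alone, and scales earlier years as well when a flag mirroring ApplyMultAll is set. It reads "all previous years" as years before the base year. That reading, and the function name, are assumptions.

```python
def apply_multiplier(province_series, multiplier, apply_mult_all=False,
                     base_year=2010):
    """Scale one province's series by its multiplier.

    province_series -- {year (int): value or None}
    Default: only the base-year value changes; other years stay as they are
    and gaps stay empty (no hole-filling). With apply_mult_all=True
    (mirroring the ApplyMultAll field), years before the base year are
    scaled as well.
    """
    scaled = dict(province_series)
    for year, value in province_series.items():
        if value is None:
            continue  # gaps are never filled
        if year == base_year or (apply_mult_all and year < base_year):
            scaled[year] = value * multiplier
    return scaled


# Hypothetical series and multiplier.
series = {2005: 1.0, 2008: None, 2010: 2.0, 2012: 3.0}
print(apply_multiplier(series, 10.0))
print(apply_multiplier(series, 10.0, apply_mult_all=True))
```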
Estimating Values for Preprocessor Series Not Present in IFsHistSeriesXXX.mdb When Provinces Are Added