Population Data
Revision as of 15:13, 17 April 2017
Population
The provincial population series for China runs from 1995 through 2015. The original data lacked observations for Chongqing in 1995 and 1996 because Chongqing did not gain its status as a municipality until 1997. Before 1997 Chongqing was part of Sichuan province, so Sichuan's figures for 1995 and 1996 include Chongqing, and Sichuan's population drops substantially in 1997. To remedy this, Chongqing's share of the combined population of Sichuan and Chongqing in 1997 was applied to Sichuan's 1995 and 1996 totals to estimate Chongqing's population in those years. This estimate was then subtracted from Sichuan's population in 1995 and 1996. The result is a smoother and more accurate historical series, in which the 1995 and 1996 values for both Chongqing and Sichuan are estimates rather than published figures.
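The share-based adjustment described above can be sketched as follows. This is a minimal illustration, not the procedure as actually run: the function name is hypothetical and all population values are invented placeholders in millions, not the yearbook figures.

```python
def split_combined_total(combined, share):
    """Split a combined total into (part, remainder) using a fixed share."""
    part = combined * share
    return part, combined - part


# Hypothetical 1997 values in millions (placeholders, NOT yearbook data).
chongqing_1997 = 30.0
sichuan_1997 = 90.0

# Chongqing's share of the combined Sichuan + Chongqing population in 1997.
share = chongqing_1997 / (chongqing_1997 + sichuan_1997)

# Hypothetical pre-1997 Sichuan totals, which still include Chongqing.
sichuan_with_chongqing = {1995: 112.0, 1996: 116.0}

# For each year: (estimated Chongqing, adjusted Sichuan).
estimates = {
    year: split_combined_total(total, share)
    for year, total in sichuan_with_chongqing.items()
}

for year, (chongqing, sichuan) in sorted(estimates.items()):
    print(year, chongqing, sichuan)
```

The sketch assumes Chongqing's share was constant from 1995 to 1997, which is the same assumption the adjustment itself makes.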
Despite this adjustment, some unexplained spikes and drops remain in the population data around 2000. Guangdong shows the most noticeable shift: its population is 73.9 million in 1999 and jumps to 86.9 million in 2000. These shifts are present in the published data and are not human errors (at least not on our end). The data source was contacted about this and other potential data issues found while pulling data, but has yet to respond.
The provincial population series for China's subregional model came from the China Statistical Yearbooks from 1996-2016. These yearbooks are available online on the National Bureau of Statistics of China website. Other sources of provincial population data for China are described at Alternative Population Data Sources. The population data is published in tens of thousands, rather than the millions used in the full 186 version of IFs. Instead of converting the data into millions, the series was imported as published, in tens of thousands, and ApplyMultAll was selected in the data dictionary. This normalizes the historical data to the population of China in the full 186 version of IFs. Normalizing rather than changing units is meant to reduce the likelihood of human error and to simplify future data updates.
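The effect of normalizing to the national total can be illustrated with a short sketch. The internals of ApplyMultAll are not documented here, so this assumes it amounts to one common multiplier that makes the provincial values sum to the national figure. The function name, province labels, and all numbers are hypothetical.

```python
def normalize_to_national(provincial, national_total):
    """Scale provincial values by one multiplier so they sum to national_total."""
    multiplier = national_total / sum(provincial.values())
    return {name: value * multiplier for name, value in provincial.items()}


# Hypothetical provincial values as published, in tens of thousands.
provincial_raw = {"ProvinceA": 5000.0, "ProvinceB": 3000.0, "ProvinceC": 2000.0}

# Hypothetical national population in millions from the full model.
national_millions = 100.0

normalized = normalize_to_national(provincial_raw, national_millions)
print(normalized)
```

Because a single multiplier is applied, each province keeps its share of the total and the unit conversion happens implicitly, so no manual rescaling is needed when new yearbook data is imported.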
Age-Sex Cohorts
China's age-sex cohort data was obtained from the University of Michigan's China Data Center database; it was originally collected and published in the 2010 Census. For all provinces the data came in five-year cohorts up to 100+ (the 21st cohort), with the exception of the first cohort. The first cohort, population 0-4 years of age, was calculated by summing the infant population and the population 1 to 4 years of age.
ForecastNetMigrationRateUNPD
The data series currently in use comes from a paper out of the University of Washington. The authors used a variety of data sources to estimate interprovincial migration, and the results were compiled and published as five-year averages of net migration rates. Because the model requires an annual series, annual net migration rates were estimated through a simple multi-step process. First, each five-year average was assigned to the middle year of its period. For instance, if the observation for Anhui province was -0.5 for 1995-2000, then -0.5 was assigned to 1998. The data was then interpolated between these observations to produce annual data for a fifteen-year span.
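The two steps above can be sketched as follows. This is an illustration under stated assumptions: the text does not say which interpolation method was used, so linear interpolation is assumed; the function name is hypothetical; and apart from the -0.5 value for 1995-2000 taken from the example above, the rates are invented.

```python
def annualize(five_year_rates):
    """Turn {(start, end): average_rate} into {year: rate}.

    Each average is assigned to the middle year of its period
    (1995-2000 -> 1998), then values are linearly interpolated
    between consecutive middle years.
    """
    anchors = sorted(
        ((start + end + 1) // 2, rate)
        for (start, end), rate in five_year_rates.items()
    )
    annual = {}
    for (y0, r0), (y1, r1) in zip(anchors, anchors[1:]):
        for year in range(y0, y1):
            annual[year] = r0 + (r1 - r0) * (year - y0) / (y1 - y0)
    # The final middle year is not covered by the loop above.
    last_year, last_rate = anchors[-1]
    annual[last_year] = last_rate
    return annual


# Only the first value comes from the text; the others are hypothetical.
rates = {(1995, 2000): -0.5, (2000, 2005): -1.0, (2005, 2010): 0.0}
annual = annualize(rates)

for year in sorted(annual):
    print(year, round(annual[year], 3))
```

Note that this approach only produces values between the first and last middle years; years outside that range would need extrapolation or a separate rule.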
This series should be treated as a placeholder, not as a legitimate series. Further research is required to reproduce the data using the paper's methodology. The paper's appendices should be used to locate the original sources for all of the data and to understand the estimation methods. The series should be replicated, and an update may be available that would give the model more recent data.
Because the model estimates migration using exogenous historical and forecast data from the UNPD, having historical data alone is insufficient for the model to produce good forecasts. Instead, the model forces all provinces to a rate of zero. Despite this behavior, the sum of the provinces' population forecasts is quite close to the forecast from the full 186 version of IFs.
Interprovincial migration is a major issue in China because of the mass rural-urban migration of the last few decades. Measuring interprovincial migration is difficult in all subnational models, simply because the data is frequently unavailable. China has a system to control migration that requires households to register in order to gain access to services. The system also restricts migration by denying certain rural households the ability to move legally to a different province. This system generates a large amount of data tracking migration, but the data has not been found publicly, only alluded to. An unintended consequence of the system is a great deal of illegal migration among the restricted households. Municipalities such as Beijing have a substantial migrant worker population, which undermines the model's ability to realistically forecast population and age-sex cohorts. Thus, PopMigration is an important series for the China Provincial Model. The national migration registry data is, thus far, inaccessible, and it is undoubtedly inaccurate because it does not account for the substantial illegal interprovincial migration.
TFRMedUNPD
The data used for this series is from a report published by the National Bureau of Statistics of China in 2007, aptly named Fertility Estimates for Provinces of China 1975-2000 (http://www.eastwestcenter.org/fileadmin/stored/pdfs/popfertilityestimateschina.pdf). The series is dated because the most recent data point is for 2000, but it is the only data that has been found. Total fertility rates for China's provinces remain a difficult series to find. They are not published by province on any of the National Data websites, nor have they been found in any Human Development Reports on China.
PopulationUrban
Urban population data for the China provincial model came from the China Statistical Yearbooks from 2006-2016. The data is published in tens of thousands rather than the millions used in the model. ApplyMultAll is therefore used on this series to normalize the data into the proper unit of millions.
Households
China's provincial household data came from the China Statistical Yearbooks 2012-2015, which produced a series that runs from 2011-2014. Household data is also published in the 2016 yearbook, but it was not included because it differs sharply from the preceding years, showing twice as many households in some provinces. No explanation for this inconsistency was found in the China Statistical Yearbooks' metadata.
Household data from the Chinese census is also available; it was received as part of the China Data Center database. This data was not included in the model because its most recent observation is for 2010, which is more dated than the China Statistical Yearbooks' data. It could not be blended with the China Statistical Yearbook data because there is a significant jump (approximately 3 million households) between 2010, the end of the census data, and 2011, the beginning of the China Statistical Yearbook data.
PopulationYouthDepend%
PopulationYouthDepend% is the percentage of the population that is under the age of 15. This series was found in the China Statistical Yearbooks 2012-2016, which means that the series runs from 2011-2015.