TY - JOUR
T1 - The possibility of using anonymized data from Japanese official statistics for implementing geospatial analysis at a small area level
T2 - Creating a geographically disaggregated synthetic microdata set by using a spatial microsimulation
AU - Hanaoka, Kazumasa
PY - 2012
Y1 - 2012
N2 - In recent years, growing attention to analysis using microdata has led to the release of anonymized data from Japanese official statistics; however, these data contain limited geographical detail and therefore cannot be used for geospatial analysis at a small area level. To overcome this problem, we applied a spatial microsimulation technique to create geographically disaggregated synthetic microdata from the anonymized data of the 2004 National Survey of Family Income and Expenditure. The study area was Kusatsu City, Japan. To create the synthetic microdata set, we used simulated annealing, a combinatorial optimization technique. For each cho-cho-moku (the small administrative area used in the census), samples were repeatedly swapped between the anonymized data and the synthetic microdata so that the synthetic microdata agreed with five constraint tables created from the small area statistics of the 2005 Japanese Population Census. The synthetic microdata were evaluated using the Overall Total Absolute Error and the Overall Relative Sum of Squared Z-scores. The results showed that the synthetic microdata matched the census constraint tables almost perfectly, except in a few districts in the southern part of the study area, where population distributions are skewed. We suggest that this high accuracy reflects the fact that the anonymized data contained about 1.8 thousand household samples with relatively small sampling biases. A further merit of anonymized data from public surveys is that they include a detailed and wide range of variables on demographic and socioeconomic attributes as well as on spending amounts. Taking advantage of this, we retabulated the synthetic microdata by cho-cho-moku to visualize spatial patterns of monthly spending per person by item, and we present maps of spending patterns for 28 items that reveal geographical diversity in consumption. These detailed maps are intended to support further analysis of food deserts, health, and income inequality. In conclusion, anonymized data, geographically disaggregated through spatial microsimulation, make it possible to implement geospatial analysis at a small area level using a microdata set.
KW - Anonymized data
KW - Kusatsu City
KW - National Survey of Family Income and Expenditure
KW - Simulated annealing
KW - Spatial microsimulation
KW - Synthetic microdata
UR - http://www.scopus.com/inward/record.url?scp=84865373885&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865373885&partnerID=8YFLogxK
U2 - 10.4200/jjhg.64.3_195
DO - 10.4200/jjhg.64.3_195
M3 - Article
AN - SCOPUS:84865373885
VL - 64
SP - 195
EP - 211
JO - Jimbun Chiri/Human Geography, Kyoto
JF - Jimbun Chiri/Human Geography, Kyoto
SN - 0018-7216
IS - 3
ER -