IPUMS Global Health Interoperable Variables

Date: December 2024

Introduction

At IPUMS Global Health, many of our users have expressed interest in combining harmonized data from IPUMS DHS, IPUMS MICS, and IPUMS PMA. To make this process easier, we are creating “Global Health” (GH) harmonized variables. These variables, identified with “_GH” at the end of the variable name, share names and codes across the three survey projects. This user note describes why you might use the GH variables, which variables we have created, where to find them in each project, and how to build a comparable dataset by appending the downloaded data files.

Why use IPUMS GH interoperable variables?

Our goal in the original data collections is to retain all the detail of variable codes while maximizing comparability across samples. These new interoperable Global Health variables instead identify major categories and impose consistent codes and variable names across IPUMS DHS, MICS, and PMA. This consistency allows users to more easily study broad trends or calculate Sustainable Development Goal indicators across time and place.

You will still need to create a separate extract for each IPUMS Global Health data collection. Instructions about gaining access to each harmonized survey database are available through the hyperlinks below.

What GH variables have we created?

To date, we have created GH variables for two units of analysis: Women and Households.

Our initial batch represents a pilot of different types of variables that can be created across IPUMS Global Health projects. Variables in this pilot phase were selected based on:

Based on feedback from users, we will continue to add new variables and units of analysis over time.

Technical variables (available for both Women and Households)

These variables uniquely identify samples within each project, while providing consistent labels for variables shared across projects (such as country and year).

Variable Label                 GH Variable Name
Country                        COUNTRY_GH
Sample                         SAMPLE_GH
Year of sample                 YEAR_GH
IPUMS Global Health project    PROJECT_GH
Urban status                   URBAN_GH

Women of Childbearing Age

For most surveys, these variables relate to individual women ages 15-49, but check the variable universes to confirm the age range and whether the sample is limited to ever-married women.

Variable Label                        GH Variable Name

Demographics
  Age of woman                        AGE_GH
  Marital status                      MARST_GH
  Age of partner/husband              PARTNERAGE_GH

Fertility & Family Planning
  Currently pregnant                  PREGNANT_GH
  Number of children ever born        CHEB_GH
  Currently using family planning     FPCURRUSE_GH

Domestic Violence Attitudes (Justifiable to beat a wife because she…)
  Goes out without telling husband    DVAGOOUT_GH
  Neglects the children               DVNEGLECTS_GH
  Argues with husband                 DVAARGUES_GH
  Refuses sex with husband            DVAREFUSESEX_GH
  Burns the food                      DVABURNFOOD_GH

Media & Information Technology
  Woman owns a mobile phone           MOBILEWM_GH
  Frequency of reading newspaper      NEWSFREQ_GH
  Frequency of watching television    TVFREQ_GH

Households

These variables are available for each sampled household.

Variable Label                                         GH Variable Name

Household Characteristics
  Material of walls                                    WALLS_GH
  Material of roof                                     ROOF_GH
  Type of cooking fuel                                 COOKFUEL_GH
  Type of toilet                                       TOILET_GH
  Material of floor                                    FLOOR_GH

Household Assets
  HH has mobile phone                                  MOBPHONE_GH
  HH has internet                                      INTERNET_GH
  HH has electricity                                   ELECTRC_GH
  HH has car                                           CAR_GH
  HH has radio                                         RADIO_GH
  HH has television                                    TV_GH
  HH has personal computer                             PC_GH
  HH has bicycle                                       BIKE_GH
  HH has motorcycle                                    MOTOCYCLE_GH
  HH has refrigerator                                  FRIDGE_GH
  HH or someone in the household has a bank account    BANKACC_GH

Where to find the GH variables in each of the IPUMS Global Health projects

Data within each IPUMS Global Health project are structured in different ways.

For each project, first select the unit of analysis.

Unit of analysis    IPUMS DHS                                    IPUMS MICS                   IPUMS PMA
Households          Constructed from "Household members" unit    Household characteristics    Constructed from "Person - Family Planning" unit
Women               Women                                        Women                        Constructed from "Person - Family Planning" unit

Location of the GH variables within each project

After selecting the unit of analysis mentioned above, the GH variables can be found using the drop-down navigation menus on the variable browsing page. Look for the following headings in the “Topics” menu.

GH variable    IPUMS DHS              IPUMS MICS             IPUMS PMA
Households     IPUMS Global Health    IPUMS Global Health    Other > Global Health
Women          IPUMS Global Health    IPUMS Global Health    Other > Global Health

Other information about how to select samples and browse data

In your extract from each data collection, you may want to add additional information needed for your analysis. For guidance on creating a customized dataset for each project, consult the user guides linked below.

How to create a comparable dataset from the downloaded datafiles

After you create an extract from each project and before you append the datafiles, you will need to perform some additional data manipulation to make the units of analysis comparable.

Additional data manipulation is necessary because the data are derived from different units of analysis. You must recode the data files using the guidelines below before the files can be merged together. The following are the recommended recodes based on the most comparable denominator (least amount of manipulation and recoding). Other ways of merging and combining are possible, given the great flexibility of all the microdata.

Households

The interoperable variables are supported for one observation per household.

IPUMS MICS: In the household unit of analysis, each observation already represents one household. No further recodes are needed for IPUMS MICS households.

IPUMS DHS: The household member file represents all members of the household. To achieve comparability for IPUMS DHS data, limit observations to only one person per household. After you have downloaded your data extract, keep only cases for which the variable LINENO equals 1 (usually the household head). The following command is written in Stata code.

keep if lineno==1

IPUMS PMA: The person - family planning unit of analysis contains an entire household roster, even for households without women of childbearing age. Select this unit of analysis, and when selecting samples, choose to keep “All Cases” under the “Sample Members” section, and include the variable LINENO. After you have downloaded your data extract, keep only cases for which the variable LINENO equals 1, which retains only one person per household. The following command is written in Stata code.

keep if lineno==1
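
For R users, the same household filter can be sketched as follows. This is a minimal illustration, not an official IPUMS recipe: it assumes the extract has been read into a data frame with a lowercase column named lineno (capitalization depends on how you read in the extract), and the hhid column and all values are hypothetical toy data standing in for a real extract.

```r
# Toy stand-in for a downloaded household-member extract.
# `hhid` is a hypothetical household identifier used only for illustration.
extract <- data.frame(
  hhid   = c(1, 1, 1, 2, 2, 3),
  lineno = c(1, 2, 3, 1, 2, 1)
)

# Keep only the record with LINENO equal to 1 in each household
# (the R counterpart of Stata's `keep if lineno==1`).
# which() drops rows where lineno is missing instead of returning NA rows.
households <- extract[which(extract$lineno == 1), ]

# Sanity check: each household should now appear exactly once.
stopifnot(!any(duplicated(households$hhid)))
```

Running a duplicate check like the one on the last line against your real household identifier is a quick way to confirm that the file has one observation per household before appending.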

Women

The interoperable variables apply to each woman of childbearing age (ages 15-49 in most samples).

IPUMS MICS and IPUMS DHS data have the same structure. When “Women” is the chosen unit of analysis, each row of data represents one woman of childbearing age. Check the universe statements to determine exact age ranges and any marital status limitations for each sample.

IPUMS PMA: Choose the "Person - Family Planning" unit of analysis. On the samples selection page, choose the option for “Female Respondents” under the Sample Members heading. This selection includes only women of childbearing age who completed the female questionnaire in the data extract, and it is thus comparable to MICS and DHS samples.

Appending data files from different GH projects together

Once you have created and downloaded an extract from each project, you will want to carry out any additional recodes and data cleaning in the individual files before appending them together.

In Stata, use the “append” command. You can append multiple files at a time. For example, if you had a file open called “dhs” and you wanted to append the “mics” and “pma” datasets to it, you could use the command:

append using mics pma

In R, download the files you would like to work with into the same directory, and insert the name of that directory into the list.files() call below. The code uses rbind.fill() from the plyr package, so it will work even if the files do not have exactly the same columns (i.e., variables); columns missing from a file are filled with NA.


		#rbind.fill() comes from the plyr package
			library(plyr)
		#read in the full paths of all files in this folder
			file_list <- list.files("[insert filepath here]", full.names = TRUE)
		#read the files from file_list into R as csvs and create a list called myfiles
			myfiles <- lapply(file_list, read.csv)
		#append all files in the myfiles object into one data frame
			appended <- do.call(rbind.fill, myfiles)
		

Either of these approaches will result in a single datafile with information from each of the different IPUMS Global Health data collections.
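
To see what appending files with different columns looks like, here is a small base-R sketch on toy data. It is an illustration under stated assumptions, not the method used above: rbind.fill is provided by the plyr package, whereas append_fill below is a hypothetical helper written only with base R. The src column and all values are invented for the example; AGE_GH and TVFREQ_GH are borrowed from the variable lists in this note, but the codes shown are not actual IPUMS codes.

```r
# Toy data frames standing in for two project extracts (values are made up).
# `src` is a hypothetical column recording which file each row came from.
dhs  <- data.frame(src = "dhs",  AGE_GH = c(23, 31), TVFREQ_GH = c(1, 2))
mics <- data.frame(src = "mics", AGE_GH = c(19, 45))

# Hypothetical helper: append data frames, filling absent columns with NA.
append_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- NA          # add each absent column as NA
    }
    df[all_cols]               # put columns in a common order
  })
  do.call(rbind, filled)
}

appended <- append_fill(list(dhs, mics))

# Rows from the file that lacked TVFREQ_GH carry NA in that column.
print(appended)
```

After appending real extracts, tabulating a variable such as PROJECT_GH is a simple way to confirm that every project contributed the expected number of rows.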

What’s coming next?

We understand that this current offering of GH variables may not be comprehensive enough. We are planning to expand our Global Health variables not only for the woman and household units of analysis, but also to add variables at the child level.

Do you have a suggestion for us? Please reach out at ipums@umn.edu!