The NHANES 2003-2004 Sample Person Demographics File provides the interview and MEC examination status variable, sample weights, and selected demographic variables such as gender, age, race/ethnicity, education, marital status, country of birth, pregnancy status, total family and household income, and ratio of income to poverty.
This updated Demographics file includes several new variables that provide information on citizenship status, years of U.S. residence, educational attainment, school attendance, household size, characteristics of the household reference person, and an indicator for the month of exam. The household reference person is the first household member 18 years of age or older listed on the Screener household member roster who owns or rents the residence where the members of the household reside. Brief descriptions of the variables that have been added to the 2003-2004 Demographics file appear in the Data Processing and Editing section of the documentation.
Several questionnaire items that were asked in the Family and Sample Person Demographics questionnaires are not included in the NHANES 2003-2004 data release file. Concerns about data disclosure and confidentiality protection prevented some of the interview information from being released publicly.
All survey participants who have a household interview record have a Demographics file record. The Demographics questionnaire items include family-level and individual-level information. The target age groups for the Demographics questions vary. Please review the NHANES 2003-2004 Family and Sample Person Demographics section questionnaires and codebooks.
Interview Setting and Mode of Administration
Demographics information was collected in the home prior to the health examination. Computer-assisted personal interviewing (CAPI) methodology was used. Persons 16 years of age and older and emancipated minors were interviewed directly. A proxy respondent provided information for survey participants who were less than 16 years of age and for persons who could not answer the questions themselves.
Quality Assurance & Quality Control
The NHANES computer-assisted personal interview (CAPI) software program that was used to collect the interview data had pre-programmed data edit and consistency checks. The data edit checks alerted the interviewer when unusual or potentially erroneous data values were recorded. The consistency checks were used to alert the interviewer when information was recorded that was inconsistent with previous data entries or respondent characteristics such as the respondent’s age. Questionnaire “skip” patterns were pre-programmed in the questionnaires to reduce respondent burden. Online information screens provided the interviewers with standardized descriptions of the terminology and concepts that were used in the questionnaires.
After data collection, interview records were reviewed by the NHANES field office staff for accuracy and completeness. A subset of the household interviews was verified by re-contacting the survey participants. The interviewers were required to record several interviews and the audio-taped interviews were reviewed by NCHS and contractor staff. The NHANES quality assurance and quality control procedures are described in the field procedures manuals that are posted on the NHANES website.
Data Processing and Editing
SDDSRVYR represents the data release number. A code of 3 denotes NHANES 2003-2004.
RIDSTATR is a recoded variable representing interview/examination status.
RIDEXMON is a variable indicating the six-month time period when the examination was performed. A value of “1” indicates November 1st through April 30th; a value of “2” indicates May 1st through October 31st.
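As a minimal sketch of the RIDEXMON coding, the function below maps a calendar date to the two six-month periods. The function name and the idea of passing an exam date are assumptions made for illustration only; the recode is already supplied in the data file.

```python
from datetime import date


def ridexmon(exam_date: date) -> int:
    """Return 1 for November 1 through April 30, 2 for May 1 through October 31."""
    # Hypothetical helper: the months November to April form period 1.
    return 1 if exam_date.month in (11, 12, 1, 2, 3, 4) else 2
```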
RIDAGEYR: This is the age of the sample person at the time of the screening interview. Age in years is reported by single year of age for persons from 1 through 84 years of age. For older adults, age in years was top coded at 85 years to reduce the risk of disclosure. All adults 85 years and older have a RIDAGEYR value of ’85’. In NHANES 2003-2004, the weighted mean age for participants 85 years and older is 88 years.
If exact date of birth information is provided during the interview, this information is used to calculate the exact age on the date of screening. Otherwise, when date of birth information is missing or refused but age in years is provided by the sample person, an imputed date of birth is created using the following procedures. If month of birth is missing or not given, it is imputed as 7. If day of birth is missing or not given, it is imputed as 1. If year of birth is missing or not given, it is imputed as the year of the screening interview less the age in years provided by the sample person at screening. Corrections are made to this imputed information for sample persons who are less than 1 year of age at the time of screening.
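The imputation rules above, together with the top-coding of RIDAGEYR at 85, can be sketched as follows. This is an illustration under stated assumptions, not the NCHS processing code: the function and parameter names are hypothetical, and the corrections applied to sample persons under 1 year of age are not reproduced because the documentation does not describe them.

```python
from datetime import date
from typing import Optional

TOP_CODE_AGE = 85  # ages 85 and over are released as 85


def impute_birth_date(screening: date, reported_age: int,
                      year: Optional[int] = None,
                      month: Optional[int] = None,
                      day: Optional[int] = None) -> date:
    """Fill in missing date-of-birth parts (hypothetical helper)."""
    if year is None:
        # Year of the screening interview less the reported age in years.
        year = screening.year - reported_age
    if month is None:
        month = 7  # missing month of birth is imputed as 7
    if day is None:
        day = 1    # missing day of birth is imputed as 1
    return date(year, month, day)


def age_in_years(birth: date, screening: date) -> int:
    """Completed years of age on the screening date, top-coded at 85."""
    years = screening.year - birth.year
    if (screening.month, screening.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the screening year
    return min(years, TOP_CODE_AGE)
```

For example, `impute_birth_date(date(2003, 10, 15), 40)` yields a birth date of July 1, 1963 when no part of the date of birth was reported.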
RIDAGEMN is age in months at household screening, provided only for those who were less than 85 years of age. If exact date of birth is not provided by the sample person, then the age in months is calculated based on the imputed age at screener to allow the sample person to proceed with the questionnaire and examination.
RIDAGEEX is age in months at MEC examination, provided only for persons who are less than 85 years of age at the time of the household screening interview. RIDAGEEX was not calculated for individuals with an imputed age.
RIDRETH1: This race/ethnicity variable is derived by combining responses to questions on race and Hispanic origin. Respondents who self-identified as “Mexican American” were coded as such (i.e., RIDRETH1=1) regardless of their other race-ethnicity identities. Otherwise, self-identified “Hispanic” ethnicity would result in code “2, Other Hispanic” in the RIDRETH1 variable. All other non-Hispanic participants would then be categorized based on their self-reported races: non-Hispanic white (RIDRETH1=3), non-Hispanic black (RIDRETH1=4), and other non-Hispanic race including non-Hispanic multiracial (RIDRETH1=5).
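The order of precedence in the RIDRETH1 derivation can be sketched as a short function. The input names (`mexican_american`, `hispanic`, `race`) and the string labels for race are hypothetical stand-ins for the underlying questionnaire responses, which are not part of the public file.

```python
def ridreth1(mexican_american: bool, hispanic: bool, race: str) -> int:
    """Derive the RIDRETH1 code from illustrative self-report flags."""
    if mexican_american:
        return 1  # Mexican American, regardless of other identities
    if hispanic:
        return 2  # Other Hispanic
    if race == "white":
        return 3  # non-Hispanic white
    if race == "black":
        return 4  # non-Hispanic black
    return 5      # other non-Hispanic race, including multiracial
```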
RIDRETH2 is the race/ethnicity recode that can be linked to the NHANES III race/ethnicity variable. Non-Hispanics who indicated more than one race (multiracial) and then selected a main race as black (non-Hispanic) or white (non-Hispanic) were recoded into those respective categories. In other cases, the coding was similar to RIDRETH1.
DMDBORN: Country of birth was recoded into three categories: 1) born in 50 U.S. states or Washington, D.C.; 2) born in Mexico; and 3) born in any other location or foreign country.
INDFMINC: This variable is the total family income variable. NCHS used the U.S. Bureau of the Census Current Population Survey (CPS) definition of “family” to group household members into one or more families (US Census 2003). The CPS defines a family as: “a group of two people or more (one of whom is the householder) related by birth, marriage, or adoption and residing together;” all such people (including related subfamily members) are considered to be members of one family. Over eighty percent of the NHANES households were single-family households; the remaining households were comprised of 2 or more CPS families.
After the information about sources of income was obtained in the Family Interview Income section questionnaire (INQ), the respondent was asked to report total combined family income for themselves and the other members of their family in dollars (question INQ200). If the respondent refused to answer INQ200 or did not know the total combined family income, an income screener question was asked (question INQ220) to query if the total family income was < $20,000 or ≥ $20,000. If the respondent answered INQ220, a follow-up question asked the respondent to select an income range (question INQ230) from a list of income ranges listed on a printed hand card. The midpoint of the income range was then used as the total family income value. Family income values were used to calculate the ratio of income to poverty (INDFMPIR) and estimated total household income (INDHHINC) as described below. Total family income is reported as a range in the NHANES data file.
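The fallback from a reported dollar amount (INQ200) to the midpoint of a hand-card range (INQ230) can be sketched as below. The function name, argument names, and the range bounds in the usage note are assumptions for illustration; the actual hand-card ranges are not listed in this document.

```python
from typing import Optional, Tuple


def family_income_value(inq200: Optional[float],
                        inq230_range: Optional[Tuple[float, float]]) -> Optional[float]:
    """Return the family income value used for derived variables."""
    if inq200 is not None:
        return inq200  # a reported dollar amount is used directly
    if inq230_range is not None:
        low, high = inq230_range
        return (low + high) / 2  # midpoint of the selected income range
    return None  # no usable income value (for example, only INQ220 answered)
```

With a hypothetical range of 25,000 to 35,000 dollars, `family_income_value(None, (25000, 35000))` returns `30000.0`.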
INDHHINC: This variable is the estimated total household income. The estimated household income was derived from family income data. If a household was comprised of a single CPS family, the family income value was used as previously described. When more than one CPS family resided in the household, two methods were used to compute estimated total household income. One method was to use income data reported by each CPS family that was interviewed (INQ200). The second method for multi-family households used total household income information (INQ200) provided by a household reference person. The income information provided by respondents for each CPS family (method 1) was used whenever possible because this information was considered to be more reliable than information provided by a household reference person who may or may not have had firsthand knowledge of the total household income.
When income information was obtained from each of the CPS families in a household, the reported CPS family income values were summed to compute total household income (INDHHINC). When information was missing for any of the CPS families in the household, the estimated household income value provided by a family reference person was used to compute estimated total household income. Total household income could not be calculated for multi-family households when income range data were reported by any of the families in the household and thus the household income data are coded as missing.
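The household income rules described above can be sketched as follows. The `FamilyIncome` class, the function name, and the `reference_estimate` argument are hypothetical; the sketch also assumes that a range report by any family takes precedence over the reference person's estimate in multi-family households, which is one reading of the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FamilyIncome:
    amount: Optional[float]   # dollar value, or None if not reported
    from_range: bool = False  # True if amount is the midpoint of a reported range


def household_income(families: List[FamilyIncome],
                     reference_estimate: Optional[float] = None) -> Optional[float]:
    """Estimate total household income from CPS family income values."""
    if len(families) == 1:
        return families[0].amount  # single-family household: family income value
    if any(f.from_range for f in families):
        return None  # range data in a multi-family household: coded as missing
    if all(f.amount is not None for f in families):
        return sum(f.amount for f in families)  # method 1: sum of family values
    return reference_estimate  # method 2: reference person's estimate
```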
INDFMPIR: This variable is an index for the ratio of family income to poverty. The Department of Health and Human Services’ (HHS) poverty guidelines were used as the poverty measure to calculate this index. These guidelines are issued each year, in the Federal Register, for determining financial eligibility for certain federal programs such as Head Start, Supplemental Nutrition Assistance Program (SNAP) (formerly Food Stamp Program), Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), and the National School Lunch Program.
The variable INDFMPIR was calculated by dividing family income by the poverty guidelines, specific to family size, as well as the appropriate year and state. The values were not computed if the income screener information (INQ220: < $20,000 or ≥ $20,000) was the only family income information reported. If family income was reported as a range value, the midpoint of the range was used to compute the variable. Values at or above 5.00 were coded as 5.00 or more because of disclosure concerns. The values were not computed if the family income data were missing.
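A minimal sketch of the INDFMPIR calculation is shown below. The function name is hypothetical, rounding to two decimal places is an assumption, and the guideline amount in the usage note is an illustrative placeholder rather than a value taken from this document; the applicable HHS guideline depends on family size, year, and state.

```python
from typing import Optional

TOP_CODE = 5.0  # values at or above 5.00 are released as 5.00


def indfmpir(family_income: Optional[float],
             poverty_guideline: float) -> Optional[float]:
    """Ratio of family income to the poverty guideline, top-coded at 5.00."""
    if family_income is None:
        return None  # not computed when family income is missing
    ratio = round(family_income / poverty_guideline, 2)  # rounding is an assumption
    return min(ratio, TOP_CODE)
```

With a placeholder guideline of 18,400 dollars, a family income of 27,600 dollars gives `indfmpir(27600, 18400) == 1.5`, and a family income of 200,000 dollars is top-coded to `5.0`.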
DMDMARTL is the derived marital status variable. Marital status data were collected for sample persons 14 years of age and older. Individuals belonging to single person households were not asked about their marital status during part of the 1999-2000 data collection cycle. For a number of these persons marital status was imputed from other questionnaire items that made reference to their marital status. Marital status remains missing for 566 sample persons 14 years of age and older due to lack of sufficient data for imputation.
PREGNANCY STATUS: Pregnancy status at the time of examination (RIDEXPRG) is reported for females 8-59 years of age. Females 8-59 years of age received a urine pregnancy test prior to the dual energy x-ray absorptiometry (DXA) exam. Persons who reported they were pregnant at the time of exam were assumed to be pregnant; if the urine test was negative, but the subject reported they were pregnant, the status was still coded as pregnant at exam (RIDEXPRG=1). If the urine pregnancy results were negative and the respondent said they were not pregnant, the respondent was coded not pregnant at examination (RIDEXPRG=2). Persons who were only interviewed have an RIDEXPRG value = 3 (could not be determined).
The following new variables were added to the NHANES 2003-2004 Demographic file:
DMDCITZN: Citizenship status is reported as follows: A code of ‘1’ denotes U.S. citizen by birth or naturalization. A code of ‘2’ was assigned to persons who were not citizens of the U.S. Persons who were born in the U.S. or U.S. territories and who acquired citizenship at birth were coded as U.S. citizens.
DMDYRSUS: This variable is the number of years the respondent has lived in the United States. Respondents who were born outside the U.S. were asked the month and year when they came to the U.S. to stay (DMQ.160). The responses to the question were recoded into 9 categories ranging from less than one year to 50 years or more.
DMDHHSIZ: This variable is the number of people in the respondent’s household. The values for this variable range from 1 to 7 with 7 being the code used for households comprised of 7 or more members.
DMDEDUC3: This variable provides information on the highest grade or level of education completed by respondents 6-19 years of age. The responses were recoded by NCHS as follows: single years of education (grades 1-12), high school graduate/GED, and post-high school.
DMDEDUC2: This variable is the highest grade or level of education completed by adults 20 years of age and older. The response categories are: less than 9th grade education, 9-11th grade education (includes 12th grade and no diploma), High school graduate/GED, some college or associates (AA) degree, and college graduate or higher. DMDEDUC2 provides more detailed information on educational attainment levels of adults compared to the categories that were released previously in the NHANES 2003-2004 Demographic file.
DMDSCHOL: This variable is school attendance status. It is asked for respondents 6-19 years of age.
RIDRETH2 race/ethnicity recode should be used to compare NHANES 2003–2004 estimates of health measures with those of NHANES III.
Income variables: Income information was not obtained from all of the families in the survey sample. Some respondents refused to provide this information, and others had little or no knowledge of family income. No attempt was made to assign or impute income in these instances. Incomplete information was reported to the extent possible.
Educational attainment: Five educational attainment variables are included in the Demographics File. Three of the education variables target survey participants. Two other education variables provide information about the educational attainment of the household reference person (DMDHREDU) and the household reference person’s spouse (DMDHSEDU), when applicable. A brief description of survey participant educational attainment variables follows.
DMDEDUC is a 3-category variable that groups survey participants 6 years of age and over into one of three educational attainment groups: 1) less than high school education attainment, 2) high school graduate (has a high school diploma or high school equivalency diploma such as a General Educational Development/GED), or 3) has more than a high school education.
In addition to DMDEDUC, the Demographics File contains 2 variables that provide additional information on the educational attainment level of survey participants. Educational attainment information for persons 6 through 19 years of age is included in DMDEDUC3. Detailed educational attainment information for adults who are 20 years of age and over is reported in DMDEDUC2.
The questionnaire, examination, and laboratory files can be linked to the sample person (SP) demographics variables using the unique survey participant identifier SEQN. RIDSTATR provides the MEC examination status of an SP.
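Linking on SEQN amounts to a key-based join. The sketch below uses plain dictionaries so that it is self-contained; the function name and the `EXAMVAR` field in the test data are hypothetical, and in practice a statistical package's merge facility would serve the same purpose.

```python
from typing import Dict, List


def link_by_seqn(demo_rows: List[Dict], other_rows: List[Dict]) -> List[Dict]:
    """Attach demographics variables to records from another file via SEQN."""
    demo_by_seqn = {row["SEQN"]: row for row in demo_rows}
    linked = []
    for row in other_rows:
        demo = demo_by_seqn.get(row["SEQN"])
        if demo is not None:
            # Combine the demographics fields with the other file's fields.
            linked.append({**demo, **row})
    return linked
```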
The 2-year sample weights (WTINT2YR, WTMEC2YR) should be used for all NHANES 2003–2004 analyses. There are no 4-year weights in this file. The 4-year weights were provided with the NHANES 2001–2002 release file because there were some transition issues related to the use of 1990 Census and 2000 Census information. Detailed instructions for linking earlier datasets (1999–2000 and 2001–2002) are provided in the NHANES Analytic Guidelines. Analysts are encouraged to review the NHANES Analytic Guidelines provided with the data release files to determine the appropriate analytic methodology.
Please refer to the NHANES Analytic Guidelines and the on-line NHANES Tutorial for further details on the use of sample weights and other analytic issues. Both of these are available on the NHANES website.
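As a rough illustration of how a 2-year sample weight enters a point estimate, the sketch below computes a weighted mean. The function name and the example variable `X` are hypothetical. The sketch covers only the point estimate; variance estimation and the choice between WTINT2YR and WTMEC2YR should follow the NHANES Analytic Guidelines.

```python
from typing import Dict, Iterable, Optional


def weighted_mean(rows: Iterable[Dict], value_key: str,
                  weight_key: str = "WTMEC2YR") -> Optional[float]:
    """Weighted mean of value_key, skipping records with a missing value or weight."""
    numerator = 0.0
    denominator = 0.0
    for row in rows:
        value = row.get(value_key)
        weight = row.get(weight_key)
        if value is None or weight is None:
            continue  # records with missing data do not contribute
        numerator += weight * value
        denominator += weight
    return numerator / denominator if denominator > 0 else None
```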