**********************************************************************
README

The Third National Health and Nutrition Examination Survey (NHANES III, 1988-1994): Multiply Imputed Data Set (Series 11, No. 7A)

Description

Multiple imputation is a statistical technique in which missing data are replaced by several sets of plausible, alternative simulated values. The multiple imputations distributed on this release provide an improved method for handling missing values in many analyses of NHANES III data. These files are intended as a companion to--not a replacement for--other NHANES III public-use data sets. National estimates from the NHANES III Multiply Imputed Data Set may differ from those obtained from Series 11, No. 1A (DHHS, 1997) and No. 2A (DHHS, 1998) files. Users of the NHANES III Multiply Imputed Data Set are advised to consult the detailed analytic guidelines in the documents provided in this release.

The National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC) collects, analyzes, and disseminates data on the health status of U.S. residents. The results of surveys, analyses, and studies are made known through a number of data release mechanisms including publications, mainframe computer data files, CD-ROMs, and the Internet.

The National Health and Nutrition Examination Survey (NHANES) is conducted periodically by the NCHS. NHANES III, conducted from 1988 through 1994, was the seventh such survey based on a complex, multi-stage sample design. It was designed to provide national estimates of the health and nutritional status of the United States civilian, noninstitutionalized population aged two months and older. NHANES III data were collected through a combination of personal home interviews and physical examinations at Mobile Examination Centers.

As with most large-scale data-collection efforts, NHANES III experienced moderate amounts of missing data due to unit and item nonresponse.
Historically, data missing due to unit nonresponse in NHANES (e.g., failure to conduct an examination because the subject did not show up) have been compensated for by weighting methods. Previously released public-use data sets from NHANES III on CD-ROM (DHHS Series 11: 1A, 1997; 2A, 1998) provide sample weights that reflect adjustments for unit nonresponse at various stages of the survey. For details on these weighting adjustment methods, refer to the report "Weighting and Estimation Methodology" on the NHANES III Reference Manuals and Reports CD-ROM (DHHS, 1996).

Beginning in 1992, a group of expert statisticians conducted a feasibility study to apply multiple imputation (MI), a state-of-the-art methodology (Rubin, 1987), to compensate for unit and item nonresponse in NHANES III. This feasibility study culminated in the production of The NHANES III Multiply Imputed Data Set.

In MI, each missing value is replaced by several plausible simulated values randomly generated under a statistical model. Each of the several imputed data files is analyzed in the same fashion as if it contained no missing values. These analyses should use statistical methods appropriate for the NHANES III complex sample design, as described in the "Analytic and Reporting Guidelines" document on the NHANES III Reference Manuals and Reports CD-ROM (DHHS, 1996). The several sets of point estimates and standard errors, which randomly vary as a reflection of missing-data uncertainty, are then combined using straightforward arithmetic operations to yield a final set of estimates and standard errors. Techniques for combining the results are described and illustrated in the document "Analyzing the NHANES III Multiply Imputed Data Set: Methods and Examples" provided in this release.

This is data release Series 11, No. 7A.
It contains one core data file and five imputed data files for interviewed persons (n=33,994) aged 2 months and older who participated in NHANES III and completed a computer-assisted personal interview (CAPI) at their home. The CORE data file contains demographic characteristics, sample design information, weights, imputation flags, and other non-imputed variables. The five imputed data files IMP1, ..., IMP5 contain observed values and imputed values for select variables from the NHANES III interview and examination. The imputation flags in the CORE data file allow the user to identify for each variable in IMP1, ..., IMP5 which data values were observed and which were imputed. The non-imputed values will be identical across the five files IMP1, ..., IMP5, whereas the imputed values will vary to reflect uncertainty due to missing data. Only a subset of NHANES III variables appears in the CORE and IMP1, ..., IMP5 files. These files provide an improved method for handling missing values in many analyses of NHANES III data. Users are advised to consult the documents provided in this release for detailed descriptions of the imputation method and for analytic guidelines.

The documentation file called NH3MI.DOC reviews in greater detail the history and goals of the NHANES III multiple imputation research project. This file also describes variables and naming conventions and provides analytic guidelines for The NHANES III Multiply Imputed Data Set.

Six data directories are found in this release: CORE, IMP1, IMP2, IMP3, IMP4, and IMP5. Each of these directories contains three files having the same name as the corresponding directory, but with different extensions (DAT, DOC and SAS). Each file with the extension .DAT contains data in ASCII format. Each file with the extension .DOC contains documentation for the corresponding data file in ASCII format. The documentation files are large. To view them, please import them into a word processor using a fixed-width font (e.g.
Courier) and setting the margins to zero. Each file with the extension .SAS contains SAS input statements to create a SAS data set from the corresponding ASCII file. For example, the CORE directory contains common core variables in CORE.DAT; documentation for these variables is in CORE.DOC; and SAS input statements to create the SAS data set are in CORE.SAS.

The main directory also contains a SAS program called NH3IMP.SAS. This program will merge the CORE data file with each of the imputed data files (IMP1, ..., IMP5) and perform minor recoding (converting -9 to the SAS missing-value code ".") to produce five merged data sets that can be analyzed by standard complete-data methods.

This Internet release also contains technical documents related to the NHANES III Multiply Imputed Data Set, the statistical methods used to create imputations, and example analyses. A guide to the documents is provided as MAIN.PDF. These documents are in Adobe Portable Document Format (PDF) and should be read with the Adobe Acrobat Reader Version 4 or later.

Guidelines for NHANES III Multiply Imputed Data Set Users

o Variables in the NHANES III Multiply Imputed Data Set correspond to variables in other NHANES III public-use data sets but differ with respect to the manner in which missing values have been processed. Therefore, national estimates produced from the NHANES III Multiply Imputed Data Set may differ from those obtained from previously released data sets (DHHS Series 11: 1A, 1997; 2A, 1998).

o Background information on the NHANES III survey including procedures, survey components, questionnaires, examination and laboratory methods, and statistical analysis guidelines is available on The NHANES III Reference Manuals and Reports (DHHS, 1996, CD-ROM). Data users are strongly encouraged to review these reference materials and reports before analyzing data from NHANES III.

o All NHANES III data files are linked with the common survey participant identification number (SEQN).
Merging information from NHANES III data files using this variable ensures that the appropriate information for each survey participant is linked correctly. However, users should be cautious about merging information from this release with other NHANES III data sets (see below).

o The NHANES III Multiply Imputed Data Set was designed to produce high-quality estimates and statistical inferences pertaining to the variables within this data set. Merging information from this data set with additional variables from other NHANES III public-use data files could produce misleading results with respect to inter-variable relationships, typically leading to their underestimation. For detailed information regarding the proper uses and limitations of the NHANES III Multiply Imputed Data Set, users are strongly encouraged to consult the detailed analytic guidelines in the documents provided in this release.

o For each data file, SAS program code with standard variable names and labels is provided as separate text files in the directory that contains the data files. This SAS program code can be used to create a SAS data set from the data file.

o Extremely high and low values have been verified whenever possible, and numerous consistency checks have been performed. Nonetheless, users should examine the range and frequency of values before analyzing data.

o Confidential and administrative data are not being released to the public. Additionally, some variables have been recoded to help protect the confidentiality of the survey participants. For example, all age-related variables were recoded to 90+ years for persons who were 90 years of age and older.

o Although the data files have been edited carefully, errors may be detected. Please notify NCHS staff (301-458-4636) of any errors in NHANES III data or documentation and refer to the NCHS website at www.cdc.gov/nchs for updates to data files.
For issues concerning the NHANES III Multiply Imputed Data Set, please contact Meena Khare at 301-458-4312.

o Some variable names in this data set differ from those used in other NHANES III public-use data sets, particularly for the variables that have been multiply imputed. For details on variable names, review the documentation file NH3MI.DOC.

Analytic Considerations

o NHANES III (1988-1994) was designed so that the survey's first three years (1988-1991), its last three years (1991-1994), and the entire six years were national probability samples. Analysts are encouraged to use all six years of survey results.

o Sample weights are provided in the NHANES III Multiply Imputed Data Set and should be used for analyses. The final interview weight (WTPFQX6) should be used for all estimation procedures. Estimates from variables that have been multiply imputed should be computed and saved five times, once for each of the five imputed data files included in this release.

o Important aspects of the NHANES III sample design (e.g., strata and PSU pairs) should be taken into account to obtain correct standard errors. For computing standard errors, special computer programs for data from complex samples, such as SUDAAN (Research Triangle Institute, 1998), WesVarPC (Westat, 1997), or SAS Version 8 (SAS Institute, Inc., 1999), are recommended. Variance estimation procedures based upon Taylor linearization should use the design information contained in the pseudo-PSU (SDPPSU6) and pseudo-stratum (SDPSTRA6). Replication-type variance estimation procedures should use the fifty-two replicate versions of the final interview weight (WTPQRP1, WTPQRP2, ..., WTPQRP52) and Fay's method with k=0.3 (Judkins, 1990). Standard errors from variables that have been multiply imputed should be computed and saved five times, once for each of the five imputed data files included in this release.
o After estimates and standard errors have been computed five times, once for each of the five imputed data files, the five sets of results should be combined using the methodology described by Rubin (1987) to obtain overall point estimates, standard errors and confidence intervals. Examples and sample SAS programs for combining the five sets of results are provided in this release.

Referencing or Citing NHANES III Data

o In publications, please acknowledge NCHS as the original data source. For instance, the reference for the NHANES III Multiply Imputed Data Set is:

U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics. Third National Health and Nutrition Examination Survey (NHANES III, 1988-1994): Multiply Imputed Data Set. CD-ROM, Series 11, No. 7A. Hyattsville, MD: Centers for Disease Control and Prevention, 2001. Includes access software: Adobe Systems, Inc. Acrobat Reader version 4.

References

Judkins, D.R. (1990) Fay's Method for Variance Estimation. Journal of Official Statistics, 6, 3, 223-239.

Research Triangle Institute (1998) SUDAAN: Software for the Statistical Analysis of Correlated Data, Version 7. Research Triangle Park, NC: Research Triangle Institute.

Rubin, D.B. (1987) Multiple Imputation for Nonresponse in Surveys. New York: J. Wiley & Sons.

SAS Institute, Inc. (1999) SAS/STAT User's Guide. Cary, NC: SAS Institute, Inc.

U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics, CD-ROM. NHANES III Reference Manuals and Reports. Hyattsville, MD: Centers for Disease Control and Prevention, 1996.

U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics. National Health and Nutrition Examination Survey, III, 1988-1994. CD-ROM, Series 11, No. 1A, ASCII Version. Hyattsville, MD: Centers for Disease Control and Prevention, 1997.

U.S. Department of Health and Human Services (DHHS). National Center for Health Statistics.
National Health and Nutrition Examination Survey (NHANES III, 1988-1994). CD-ROM, Series 11, No. 2A, ASCII Version. Hyattsville, MD: Centers for Disease Control and Prevention, 1998.

Westat, Inc. (1997) A User's Guide to WesVarPC. Rockville, MD: Westat, Inc.
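
Illustrative Sketch: Combining the Five Sets of Results

The combining step described under Analytic Considerations (Rubin, 1987) can be sketched in a few lines. The Python code below is not part of this release and is not the sample SAS code distributed with it; it is a minimal sketch of the standard combining rules, assuming the five point estimates and their design-based standard errors have already been computed with appropriate complex-sample software. The function name combine_rubin and the numbers in the usage example are hypothetical. A confidence interval would additionally require a Student t quantile at the returned degrees of freedom, which is not computed here.

```python
import math
from statistics import mean, variance


def combine_rubin(estimates, std_errors):
    """Combine per-imputation results with Rubin's (1987) rules.

    estimates  -- one point estimate per imputed data file
    std_errors -- the matching design-based standard errors
    Returns (combined estimate, combined standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates, each with a standard error")

    q_bar = mean(estimates)                      # combined point estimate
    u_bar = mean([se ** 2 for se in std_errors])  # within-imputation variance
    b = variance(estimates)                      # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b              # total variance

    if b == 0:
        df = math.inf                            # no between-imputation variability
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return q_bar, math.sqrt(total), df


if __name__ == "__main__":
    # Hypothetical values standing in for results from IMP1, ..., IMP5.
    est, se, df = combine_rubin([10.0, 12.0, 11.0, 13.0, 9.0], [1.0] * 5)
    print(est, se, df)
```

The function takes standard errors rather than variances because that is what the complex-sample programs named above typically report; the documents provided in this release remain the authoritative description of the method.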