National Health and Nutrition Examination Survey

Frequently Asked Questions (FAQs)

1. Where can I find the data files and list of data items that are available from particular survey cycles in NHANES?

2. Why are there so many data files?

3. Next to the name of each data file on the data page, there is a "Doc File" link. Where does that link take you?

4. On the data page I see links for data. How do I access the data from these links?

5. What format are the data files in? Can they be used with SAS, SPSS, STATA, or R?

6. Do I need to use SAS software to view NHANES data?

7. Have the NHANES data been released in different data formats?

8. When will other data be available from NHANES?

9. Where can I find the analytic guidelines (weighting, variance estimation, sample design)?

10. Will data and weights be available on public use files for single years, such as 1999, 2000, 2001, or 2002?

11. Will data and weights be available on public use files in combined datasets for four-year, six-year, or eight-year periods such as 2015-2018, 2013-2018, or 2011-2018?

12. What is the sample size for a particular data item, questionnaire section, examination component, or laboratory analyte?

13. Where can I find a description of the codebook contents?

14. How do I determine the skip patterns for a questionnaire section?

15. How are missing values, "blank but applicable", "don't know" and other values coded?

16. I have questions about using the data, protocols, etc. - where can I get help?

17. Why isn't the data on adolescent alcohol use, smoking, sexual behavior, reproductive health and drug use available as a public release file?

18. Are there variables which can identify whether survey participants are family members and/or live in the same household?

19. Can I identify what region of the country or what state or county a survey participant resides within?

20. I am interested in one or more questions which appear in the survey questionnaire but I cannot find the question in a codebook or data file available on your Internet site. What happened to it?

1. Where can I find the data files and list of data items that are available from particular survey cycles in NHANES?

Most of the NHANES data are publicly available and listed on the Questionnaires, Datasets, and Related Documentation page by survey cycle. For example, the NHANES 2017-2018 data can be located and downloaded for free by clicking NHANES 2017-2018.

Within each cycle's data page, available data items are listed in different categories. For example, below are the links for lists of data items that are currently available from NHANES 2017-2018:

Data, Documentation, Codebooks, SAS Code

There is also a "Search Variables" feature that allows users to locate variables by searching terms contained in the variable names, variable descriptions, SAS labels, or data file names.

2. Why are there so many data files?

Survey components are released in separate data files to reduce the time users need to download the data and documentation of interest. This release format also allows data files to be produced, edited, and validated in a more timely manner, which facilitates a more prompt public release. As a result, files must be merged together for analysis. NHANES provides a SAS code example that shows how to merge files.

Additional information about merging data files can be found on the Continuous NHANES Web Tutorials.
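
The merge step can also be sketched outside SAS. The minimal Python example below assumes each file has already been loaded as a list of dictionaries and that records are linked by the respondent sequence number SEQN. The identifier name, the sample variables, and the helper merge_on_seqn are illustrative assumptions, not taken from this FAQ.

```python
def merge_on_seqn(left, right, key="SEQN"):
    """Left-join two lists of record dicts on a shared participant key."""
    # Index the right-hand file by participant identifier for fast lookup.
    right_by_id = {row[key]: row for row in right}
    merged = []
    for row in left:
        combined = dict(row)  # copy so the input records are not modified
        # Participants absent from the right-hand file keep only their left-hand fields.
        combined.update(right_by_id.get(row[key], {}))
        merged.append(combined)
    return merged


# Hypothetical records standing in for a demographics file and an exam file.
demo = [{"SEQN": 1, "RIDAGEYR": 34}, {"SEQN": 2, "RIDAGEYR": 61}]
exam = [{"SEQN": 1, "BMXBMI": 27.4}]

merged = merge_on_seqn(demo, exam)
print(merged)
```

A left join is used here so that every participant in the first file is retained, even when a component file has no record for that participant.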

3. Next to the name of each data file on the data page, there is a "Doc File" link. Where does that link take you?

Data File Name | Doc File | Data File | Date Published
Demographic Variables and Sample Weights | DEMO_J.Doc | Data [XPT - 3.25 MB] | February 2020

The link allows you to view the documentation, which contains important information for the component. We strongly encourage data users to read this document before using the data. The documentation also includes the codebook with the response frequency for each item in the data file. This can be used to verify the sample size for a particular data item. The documentation is formatted as a webpage, so you should be able to view it directly in your browser.

4. On the data page I see links for data. How do I access the data from these links?

Data File Name | Doc File | Data File | Date Published
Demographic Variables and Sample Weights | DEMO_J.Doc | Data [XPT - 3.25 MB] | February 2020

Clicking on the Data link will open a dialog box from which you can specify a location to store the file (using the "Save" button) or open it directly with SAS (using the "Open" button).

The data files are released as SAS Transport files in .xpt format. They can be opened directly as a temporary work file or permanent libraries can be created using SAS. Please see the Continuous NHANES Web Tutorials for instructions. Users desiring alternate data formats can use the free SAS Universal Viewer to convert the transport file into a comma-delimited text file (.csv) for use in additional software programs. Please note that NHANES has a complex probability sample and proper analysis of the data requires statistical software that specifically incorporates sample design elements, such as weighting and clustering.

5. What format are the data files in? Can they be used with SAS, SPSS, STATA, or R?

The data files are in SAS transport file format. In addition to SAS, they can be used with any package that supports this file format, such as SUDAAN, SPSS, STATA, or R. For statistical/analytical packages that do not support SAS transport file format, you can convert the file to a different format using the free SAS Universal Viewer or another appropriate software package. Please note that NHANES has a complex probability sample and proper analysis of the data requires statistical software that specifically incorporates sample design elements, such as weighting and clustering.
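
As one illustration of converting a transport file without SAS, the Python sketch below relies on the third-party pandas library and its read_xport function. The helper name xpt_to_csv and the file names in the usage comment are hypothetical. Converting the file does not remove the need for software that accounts for the sample design.

```python
def xpt_to_csv(xpt_path, csv_path):
    """Read a SAS transport file and write it out as comma-delimited text."""
    import pandas as pd  # third-party; imported lazily so the sketch loads without it

    df = pd.read_xport(xpt_path)      # parses the SAS transport (XPORT) format
    df.to_csv(csv_path, index=False)  # one row per record, no index column
    return df.shape                   # (number of rows, number of columns)


# Hypothetical usage, assuming a transport file was saved to the working directory:
# xpt_to_csv("DEMO_J.XPT", "DEMO_J.csv")
```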

6. Do I need to use SAS software to view NHANES data?

No. You can view NHANES data with the SAS Universal Viewer, a free application downloadable from the SAS Institute website. Currently, most NHANES data are available in the SAS transport format (.xpt), which can be used in several statistical software programs, including SUDAAN, STATA, SPSS, and R. Users desiring alternate data formats can use the SAS Universal Viewer to convert the transport file into a comma-delimited text file (.csv) for use in additional software programs, such as Microsoft Excel.

Learn more about SAS Universal Viewer.

7. Have the NHANES data been released in different data formats?

Since 1999, data files have been released as SAS Transport files in .xpt format. This includes data files from 1999 onward (continuous survey files), as well as newly released or updated data files from NHANES III, II, and I. The SAS Transport files can be opened directly as a temporary work file, or permanent libraries can be created using SAS. Please see the Continuous NHANES Web Tutorials for instructions. The files can also be opened with the free SAS Universal Viewer and converted to other formats for use with other software packages.

NHANES III data that were released or updated after 1999 are available as SAS Transport (.xpt) files and can be used like the continuous survey files. NHANES III data released before 1999 were released as .dat files, which are formatted ASCII data files (text files). Running the associated SAS code creates a SAS dataset. Additionally, the text files can be used with other software packages. Please see your software package's instructions for working with text files (.dat or .txt).

NHANES II and I data files released or updated after 1999 are also available as SAS Transport (.xpt) files and can be used like the continuous NHANES data files. NHANES II and I data files released before 1999 are formatted text files (.txt) wrapped in a self-extracting executable file (.exe). Running the associated SAS code provided on the respective data page creates a SAS dataset from the text file. Additionally, the text files can be used with other software packages. Please see your software package's instructions for working with text files (.txt).

8. When will other data be available from NHANES?

As additional data files are processed, they will be publicly released on the NHANES website. Please check the website regularly and refer to the NHANES What's New page for details on the newly released or updated datasets. To receive e-mail announcements regarding NHANES activities, products, and release dates, please subscribe to the NHANES listserv.

Note that certain data will only be available at the NCHS Research Data Center (RDC). Documentation for some of these limited access datasets is available on the Limited Access Data page.

9. Where can I find the analytic guidelines (weighting, variance estimation, sample design)?

The NHANES Analytic Guidelines are available on the website. The guidelines provide information on the sample design and on recommended methodologies for analyzing the data. In particular, the guidelines provide information on how the sample persons were selected, how the various survey weights were calculated, which particular survey weight should be used to provide survey estimates, how to compute sampling variances for those estimates, and recommended sample sizes for analysis.
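
To illustrate why the choice of survey weight matters, the minimal Python sketch below compares an unweighted mean with a weighted mean. The values, the weights, and the helper weighted_mean are hypothetical. The sketch produces a point estimate only; variance estimation requires the sample design information and dedicated survey software, as described in the guidelines.

```python
def weighted_mean(values, weights):
    """Weighted mean over records that have a non-missing value."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical measurements and sample weights for four participants.
values = [10.0, 20.0, 30.0, None]
weights = [1000.0, 1000.0, 3000.0, 2000.0]

observed = [v for v in values if v is not None]
unweighted = sum(observed) / len(observed)
weighted = weighted_mean(values, weights)

print(unweighted, weighted)  # 20.0 24.0
```

The third participant represents three times as many people as each of the first two, so the weighted mean is pulled toward that participant's value.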

10. Will data and weights be available on public use files for single years, such as 1999, 2000, 2001, or 2002?

No. Due to concerns regarding potential disclosure risks and analytic limitations of the annual sample, no single year of data in the continuous NHANES 1999+ will be publicly released. Although the data are available through the NCHS Research Data Center, analysts should be aware that estimates for single year data are relatively unreliable (have large variance estimates) because NHANES can only go to a small number of primary sampling units (PSUs) each year. Please refer to the NHANES Analytic Guidelines for more details.

11. Will data and weights be available on public use files in combined datasets for four-year, six-year, or eight-year periods such as 2015-2018, 2013-2018, or 2011-2018?

No. Any two-year survey cycle may be combined with adjacent two-year cycles to increase sample size or analyze trends. NCHS does not construct or include all possible weights for the combinations of multiple 2-year cycles in the public release files. Instead, NCHS supplies analysts with information on how to combine these cycles and construct the appropriate weights in the NHANES Analytic Guidelines available on the NHANES website. When combining two or more 2-year cycles from 2001-2002 onward, sample weights must be computed before beginning any analyses. For all data that include 1999-2002, the 4-year weights provided by NCHS must be used for those years, together with the weights for each subsequent 2-year cycle. The rules for combining surveys also apply to subsamples. Please refer to the NHANES Analytic Guidelines to determine the appropriate methodology for analyses of combined years of data. The Continuous NHANES Web Tutorials module on Weighting also provides guidance.
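
The weight construction can be sketched as follows. This minimal Python example assumes a commonly described rescaling rule: each 2-year weight is divided by the number of pooled cycles, and the 1999-2002 4-year weight, which spans two cycles, is multiplied by 2 and divided by that number. The rule as written here, the field names (cycle, wt_2yr, wt_4yr), and the helper combined_weight are assumptions for illustration; confirm the method against the NHANES Analytic Guidelines before use.

```python
EARLY_CYCLES = {"1999-2000", "2001-2002"}


def combined_weight(record, cycles):
    """Rescale one record's weight for an analysis that pools `cycles`."""
    n = len(cycles)
    if EARLY_CYCLES <= set(cycles) and record["cycle"] in EARLY_CYCLES:
        # 1999-2002 is covered by the NCHS 4-year weight, which spans two cycles.
        return record["wt_4yr"] * 2 / n
    # Every other cycle contributes its 2-year weight, scaled by 1/n.
    return record["wt_2yr"] / n


# Hypothetical records pooled across three cycles (1999-2004).
cycles = ["1999-2000", "2001-2002", "2003-2004"]
early = {"cycle": "1999-2000", "wt_2yr": 20000.0, "wt_4yr": 9000.0}
later = {"cycle": "2003-2004", "wt_2yr": 12000.0}

w_early = combined_weight(early, cycles)  # 9000 * 2 / 3
w_later = combined_weight(later, cycles)  # 12000 / 3
print(w_early, w_later)
```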

12. What is the sample size for a particular data item, questionnaire section, examination component, or laboratory analyte?

For any particular questionnaire section, examination component, or laboratory data file, you will only find records for survey participants who were eligible for inclusion in that component. For example, assume 6,000 people were eligible for an examination in the mobile examination center (MEC), but only 5,000 were eligible for the oral health component due to age restrictions. Of those 5,000 persons, only 4,500 participated in the oral health exam; the other 500 either refused or did not have enough time to participate in the exam. The oral health data file would then have 5,000 records, 500 of which have missing data. For further details, refer to the "frequency" counts document for each of the data files.
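
The distinction between records in a file and records with data can be checked with a simple count. The minimal Python sketch below reproduces the example numbers above using made-up records; the field name exam_result is hypothetical.

```python
# Hypothetical oral health file: 5,000 eligible participants, 4,500 with exam data.
records = [{"SEQN": i, "exam_result": 1} for i in range(4500)]
records += [{"SEQN": i, "exam_result": None} for i in range(4500, 5000)]

n_records = len(records)  # everyone eligible for the component
n_with_data = sum(1 for r in records if r["exam_result"] is not None)
n_missing = n_records - n_with_data

print(n_records, n_with_data, n_missing)  # 5000 4500 500
```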

13. Where can I find a description of the codebook contents?

A description of the codebook contents can be found in the Documentation Contents.

14. How do I determine the skip patterns for a questionnaire section?

The first step is to review all of the documentation for the questionnaires. To review skip patterns look at the complete questionnaire instrument. Please note that due to small sample sizes and confidentiality/sensitivity issues, some response categories in the original questionnaires may have been combined on the data file, or new variables were derived from questionnaire items. However, skip pattern integrity was maintained and validated during data processing and coding.

15. How are missing values, "blank but applicable", "don't know" and other values coded?

Responses are coded as follows: refused is a 7-fill (that is, 7, 77, 777, and so on, depending on the number of digits required for a particular data value), don't know is a 9-fill, and a missing value is a blank field, which means the person was not asked the question or given the test. There is no longer a specific code for those cases where the variable response is "blank but applicable"; for such cases the values are designated as missing values. For laboratory data there are special considerations. When a laboratory value was less than the lower limit of detection (LOD), an imputed fill value based on the LOD was used instead of the actual analyte result. An indicator variable is included to identify whether the laboratory analyte result was below the limit of detection. Please review the documentation for each of the data files for item-specific information.
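
The fill-code convention can be sketched as a small recoding step. In the minimal Python example below, the helpers fill_code and recode are hypothetical, and the field width must be taken from the codebook for each variable. Because 7 or 9 can be a valid response for some items, apply such recoding only where the documentation lists these codes.

```python
def fill_code(digit, width):
    """Return the fill code made of `digit` repeated `width` times, e.g. 7 -> 777."""
    return int(str(digit) * width)


def recode(value, width):
    """Map refused, don't know, and blank responses to None."""
    if value is None or value == "":
        return None  # blank field: missing
    if value in (fill_code(7, width), fill_code(9, width)):
        return None  # refused (7-fill) or don't know (9-fill)
    return value


# Hypothetical responses for a two-digit item.
raw = [45, 77, 99, None, 7]
clean = [recode(v, 2) for v in raw]
print(clean)  # [45, None, None, None, 7]
```

The last value, 7, is kept because the refused code for a two-digit item is 77, not 7.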

16. I have questions about using the data, protocols, etc. - where can I get help?

Please refer to the questionnaire, examination component, or laboratory descriptions. If you need help beyond the information contained in these resources, you can send an email to CDC-info and submit your question.

17. Why isn't the data on adolescent alcohol use, smoking, sexual behavior, reproductive health and drug use available as a public release file?

These files have not been released on the NHANES website due to confidentiality concerns. Data files containing protected information for adolescents will be made available at the NCHS Research Data Center.

18. Are there variables which can identify whether survey participants are family members and/or live in the same household?

In continuous NHANES 1999+, publicly released data cannot be used to identify whether survey participants are related or live in the same household. Only limited information on the relationships among family and household members was collected in the survey, and this information, along with the information identifying whether participants live in the same household, is available only through the NCHS Research Data Center. Note that NHANES was designed to produce estimates at the person level only, not at the household or family level. As such, it is not recommended that analysts use the NHANES public or limited access data to produce household or family level estimates.

19. Can I identify what region of the country or what state or county a survey participant resides within?

Geographic identifiers are available but only through the NCHS Research Data Center, in order to protect the confidentiality of our participants. A list of NHANES Geocode variables is available through the Limited Access Data page.

20. I am interested in one or more questions which appear in the survey questionnaire but I cannot find the question in a codebook or data file available on your Internet site. What happened to it?

Sometimes this means that the data are not yet ready to be publicly released. Other times, the staff have determined that a question poses a risk of disclosure to our survey participants. Under these circumstances the data are made available only through the NCHS Research Data Center. Documentation for some of these datasets is available on the Limited Access Data page. Please send an email to cdcinfo@cdc.gov to inquire about the status of a particular question.