
Module 3: Weighting

The NHANES Tutorials are currently being reviewed and revised, and are subject to change. Specialized tutorials (e.g. Dietary, etc.) will be included in the future.

This module addresses why weights are created and how they are calculated, the importance of weights in making estimates that are representative of the U.S. civilian non-institutionalized population, how to select the appropriate weight to use in your analysis, when and how to construct weights when combining survey cycles, and how to correctly create subsets within your analysis population.

Weights are created in NHANES to account for the complex survey design (including oversampling), survey non-response, and post-stratification. When a sample is weighted in NHANES it is representative of the U.S. Census civilian non-institutionalized population. A sample weight is assigned to each sample person. It is a measure of the number of people in the population represented by that sample person.

Weighting in NHANES


How weights are created in the Continuous NHANES

Each sample person in the NHANES dataset is assigned a sample weight. This sample weight is created in three steps:

  1. the base weight is calculated;
  2. adjustments for non-response are made; and
  3. post-stratification adjustments are made to match 2000 U.S. Census population totals.

1. Calculating the base weight

In general a sample person is assigned a weight that is equivalent to the reciprocal of his/her probability of selection. In other words:

Sample person's weight = 1 / (probability of selection)

However, calculating the base weight for a sample person in NHANES is much more complicated due to the survey's complex, multistage design. In NHANES, the following equation, which takes into account the survey design, is used to determine the base weight for a sample person:

Base weight = 1 / (final probability)

where

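Final probability = (probability the PSU is selected) × (probability the segment is selected) × (probability the household is selected) × (probability the individual is selected)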
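The reciprocal relationship can be illustrated with a minimal Python sketch. The function name and the four stage probabilities are hypothetical values chosen only for illustration; they are not NHANES selection probabilities.

```python
from math import prod


def base_weight(stage_probabilities):
    """Return the reciprocal of the overall (final) selection probability."""
    final_probability = prod(stage_probabilities)
    if not 0 < final_probability <= 1:
        raise ValueError("final selection probability must be in (0, 1]")
    return 1.0 / final_probability


# Hypothetical probabilities for four sampling stages (assumed values).
example_probabilities = [0.05, 0.10, 0.20, 0.50]
example_weight = base_weight(example_probabilities)
print(round(example_weight))  # number of people this sample person represents
```

In this sketch, a person selected with an overall probability of 0.0005 stands in for roughly 2,000 people in the population.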

2. Adjusting for nonresponse

To the interview or exams

The base weights were adjusted for nonresponse to the in-home interview when creating interview weights and further adjusted for non-response to the MEC exam when creating exam weights.

In NHANES, an individual can be classified as a non-respondent to the interview portion of the survey, the exam portion, or both. An individual is considered a non-respondent to the interview if he/she was selected to be in the sample but did not participate in the in-home interview. Similarly, an individual who agreed to complete the interview but did not agree to, or come in for, the MEC portion of the survey is considered a non-respondent to the exam. Adjustments made for survey non-response account only for sample person interview or exam non-response, not for component/item non-response (e.g., a sample person declined to have their blood pressure measured in the examination component but completed all other examination components).

[Table: Nonresponse Rates, NHANES 1999-2002, All Ages (non-response analysis for those MEC examined)]

For more information on component/item nonresponse adjustment and re-weighting the data for analyses, see:

  1. Lohr, Sharon L. Sampling: Design and Analysis, pp. 265-272. Duxbury Press, 1999.

Examples of papers with re-weighted NHANES data

  1. Gregg E, Sorlie P, Paulose-Ram R, Gu Q, Wolz M, Eberhardt MS, Burt VL, Engelgau MM, and Geiss LS. Prevalence of lower extremity disease among persons 40 years and older in the US with and without diabetes. Diabetes Care. 2004 Jul;27(7):1591-7.
  2. Ostchega Y, Dillon CF, Lindle R, Carroll M, Hurley BF. Isokinetic leg muscle strength in older Americans and its relationship to a standardized walk test: data from the National Health and Nutrition Examination Survey 1999-2000. J Am Geriatr Soc. 2004 Jun;52(6):977-82.

To NHANES subsample components

NHANES respondents are asked to participate in a variety of survey components that are statistically defined (or random) subsamples of the NHANES MEC-examined sample. These include a variety of lab, nutrition/dietary, environmental, or mental health components. (Please see the respective survey protocol/documentation for more specific information.) For example, some, but not all, participants are selected to give a fasting blood sample on the morning of their MEC exam. The subsamples selected for these components are chosen at random with a specified sampling fraction (for example, 1/2 or 1/3 of the total examined group) according to the protocol for that component. Each component subsample has its own designated weight, which accounts for the additional probability of selection into the subsample component, as well as the additional nonresponse.

An example of a component with subsamples is described below. Subsamples of NHANES environmental chemicals are most often mutually exclusive; therefore, it is not possible to conduct an analysis in which analytes from different subsamples are examined together. For example, in 2005-2006, phthalates were measured in subsample "B", but polyfluorinated compounds were measured in subsample "A". Sometimes analytes are obtained in the same subsample, and these can be analyzed together using their subsample weights; most often these are available for analysis beginning in 2003. For example, in 2007-2008 urinary mercury and urinary arsenic were both measured in the 1/3 subsample "A". As with all of the data files, users are encouraged to combine like subsample components across survey cycles (for example, 2005-2006 heavy metals in subsample "A" and 2007-2008 heavy metals in subsample "A"), which will improve the statistical reliability of the estimate. In rare cases, subsamples overlap with one another, but not completely; for example, the persons who are part of the 2003-2004 1/3 subsample for urinary arsenic would also be found in the ½ subsample for volatile organic compounds in blood. In this situation, the data from the two subsamples cannot be combined, because neither the existing 1/3 nor the ½ subsample weights would be appropriate for the combined analysis.

In summary, users are encouraged to combine like subsample components across survey cycles; for example 2005-2006 heavy metals in subsample "A" and 2007-2008 heavy metals in subsample "A". Subsample weights from the same survey cycle (e.g. 2003-2004) are not designed to be combined because many subsamples from the same survey cycle are mutually exclusive or partially overlapping. If it is necessary to combine two or more subsamples for your analyses that are mutually exclusive or partially overlapping, then appropriate weights would need to be recalculated. However, details on how to recalculate weights when combining subsamples go well beyond the scope of this tutorial. Therefore, it is strongly advised that you do not attempt to combine different subsamples from a single survey cycle in any analysis.

[Diagram: Nonresponse Rates]

The diagram above demonstrates the varying levels of sampling nonresponse. In this example, the selected sample of persons age 20 and over included a total of 13,312 sample persons for the years 1999-2002. Only 10,291 of those sample persons actually completed the in-home interview, so about 23% of the individuals sampled (3,021 of 13,312) did not complete it. This is interview nonresponse. Among the 10,291 sample persons who were interviewed, only 9,471 completed the MEC exam, so an additional 8% of the interviewed sample persons did not respond to the MEC exam. This is the MEC exam nonresponse. This example also shows the additional subsampling for the AM fasting blood sample.

Approximately 50% of MEC participants (4,696 persons) were assigned to fast for 9 hours and come to the morning MEC exam. Of the 4,696 persons assigned to the morning subsample, only 4,157 actually fasted, so the AM fasting sample weight was adjusted for the additional 11.5% nonresponse to the AM fast.
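The nonresponse percentages in this example can be recomputed directly from the counts given in the text. The short Python sketch below does only that arithmetic; the function and variable names are illustrative and not part of any NHANES tool.

```python
def nonresponse_rate(eligible, responded):
    """Share of eligible sample persons who did not respond at a stage."""
    return 1 - responded / eligible


# (stage, number eligible, number responding), counts taken from the text above
stages = [
    ("in-home interview", 13312, 10291),
    ("MEC exam", 10291, 9471),
    ("AM fast", 4696, 4157),
]

rates = {name: nonresponse_rate(eligible, responded)
         for name, eligible, responded in stages}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} nonresponse")
```

Each rate is conditional on the previous stage: the MEC exam rate uses interviewed persons as its denominator, and the AM fast rate uses only persons assigned to the morning subsample.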

3. Post-stratification adjustment to match 2000 U.S. Census population control totals

In addition to accounting for sample person non-response, weights are also post-stratified to match the population control totals for each sampling subdomain. This additional adjustment makes the weighted counts the same as an independent count of the Current Population Survey (CPS) of the U.S. Census.

Please see CPS website for more information: http://www.bls.gov/cps/home.htm
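The idea behind post-stratification can be sketched in Python as a simple ratio adjustment: within each cell, weights are scaled so that their sum equals an independent control total. This is a simplified illustration under assumed inputs, not the NCHS procedure; the cell labels, weights, and control totals are hypothetical.

```python
from collections import defaultdict


def poststratify(records, control_totals):
    """Scale weights so each cell's weighted count equals its control total.

    records: list of (cell, weight) pairs.
    control_totals: dict mapping cell -> population control total.
    """
    cell_sums = defaultdict(float)
    for cell, weight in records:
        cell_sums[cell] += weight
    return [
        (cell, weight * control_totals[cell] / cell_sums[cell])
        for cell, weight in records
    ]


# Hypothetical nonresponse-adjusted weights in two cells (assumed values).
records = [("A", 100.0), ("A", 300.0), ("B", 50.0)]
controls = {"A": 500.0, "B": 40.0}
adjusted = poststratify(records, controls)
print(adjusted)
```

After the adjustment, the weights in cell "A" sum to 500 and the weight in cell "B" equals 40, matching the assumed control totals.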

Summary

In summary, it is important to utilize the weights in analyses to account for the complex survey design (including oversampling), survey nonresponse, and post-stratification in order to ensure that calculated estimates are truly representative of the U.S. civilian noninstitutionalized population.

Examples Demonstrating Importance of Using Weights in Your Analyses

Adjusting for oversampling

[Graph: Comparing NHANES Unweighted Interview Sample and U.S. Population (bar charts of race-ethnicity distributions)]

If you look at the graphs above, you will see that the unweighted interview sample from NHANES 1999-2002 is composed of 47% non-Hispanic white and Other participants, 25% non-Hispanic black participants, and 28% Mexican American participants. The U.S. civilian noninstitutionalized population in 2000, in contrast, was 78% non-Hispanic white and Other, 13% non-Hispanic black, and 9% Mexican American. Therefore, unweighted estimates for any survey item associated with race/ethnicity would be biased, because they would not be representative of the actual U.S. civilian noninstitutionalized population.

Why weight?

Below are three examples of estimates calculated using NHANES 1999-2002 data. In all examples, unweighted and correctly weighted estimates are shown to demonstrate the effects of not including the proper sample weights in the analysis.

In the first example, the data from NHANES 1999-2002 in the table below shows that weighted estimates reflect the U.S. civilian noninstitutionalized population very closely, but unweighted estimates are much higher in oversampled subgroups, such as non-Hispanic blacks, Mexican Americans, and persons age 12-19 years.

[Table: Why Weight? Weighted and unweighted estimates, showing that weighted estimates closely mirror the U.S. population]

Thirteen percent of the 2000 U.S. Census civilian noninstitutionalized population was non-Hispanic black, whereas the unweighted sample for NHANES 1999-2002 was 25% non-Hispanic black because non-Hispanic blacks were oversampled in NHANES. Once the appropriate weights are applied, the weighted sample is only 12% non-Hispanic black. This estimate is much closer to that seen in the 2000 U.S. Census civilian noninstitutionalized population (numbers differ slightly due to rounding).

Similarly, Mexican Americans and persons age 12-19 years, two subpopulations also oversampled in NHANES, make up 9% and 12%, respectively, of both the U.S. Census civilian noninstitutionalized population and the weighted sample, but the percentages in the unweighted sample (28% and 24%, respectively) are much greater.
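The effect of oversampling on an unweighted percentage can be reproduced with a toy Python example. The sample below is entirely hypothetical: 25 of 100 sample persons belong to an oversampled group, and the weights are invented so that the group's weighted share is much smaller than its unweighted share.

```python
def unweighted_share(sample, group):
    """Fraction of sample persons who belong to the group."""
    return sum(1 for g, _ in sample if g == group) / len(sample)


def weighted_share(sample, group):
    """Fraction of the weighted (population) total that belongs to the group."""
    total = sum(w for _, w in sample)
    return sum(w for g, w in sample if g == group) / total


# Hypothetical sample: (group, sample weight). The oversampled group has
# smaller weights because each person had a higher chance of selection.
sample = [("oversampled", 1000.0)] * 25 + [("other", 2500.0)] * 75

print(f"unweighted: {unweighted_share(sample, 'oversampled'):.1%}")
print(f"weighted:   {weighted_share(sample, 'oversampled'):.1%}")
```

The unweighted share reflects the sample design, while the weighted share estimates the group's share of the population.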

In the next two examples, high LDL estimates were calculated for non-Hispanic blacks age 20 years and over and the herpes positive estimate was calculated for males age 40-49 years.

[Table: Weighted and Unweighted Estimates Comparison]

The estimates differ greatly when they are calculated with the correct weight compared to when they are calculated without being weighted at all. This is especially true in the herpes positive example, because being herpes positive is closely related to non-Hispanic black race/ethnicity, which is one of the subgroups oversampled in the survey. Therefore, the effect of not accounting for sample weights is even more pronounced when oversampled subgroups are included in the analysis.

Selecting the Correct Weight in NHANES

To produce estimates appropriately adjusted for survey non-response it is important to check all of the variables in your analysis and select the weight of the smallest analysis subpopulation.

All interview and MEC exam weights can be found on the demographic file for the respective survey. Weights for a given component conducted on only a subsample of the original NHANES sample are available on the data file for that particular component.

Task 1: How to Select the Correct Weights for NHANES Analysis

You must use the weight of the smallest subpopulation that includes all the variables you want to include in your analysis. To select the correct weight for your analysis, you need to find out in which component of the survey your variables of interest were included.

Examples

Example 1: All of the variables were collected in the in-home interview

You are performing an NHANES 1999-2002 analysis to look at the association of race/ethnicity and poverty with previous diagnosis of diabetes. All of these variables were collected in the in-home interview (N=10,291).

ANSWER: You would use the interview weights for your analysis (wtint4yr).

Example 2: Some of the variables were collected in the MEC

You are performing an NHANES 1999-2002 analysis looking at the association of race/ethnicity, age, poverty and the prevalence of high blood pressure. All three demographic variables were collected during the in-home interview (N=10,291). But blood pressure was collected during the MEC exam and MEC questionnaire portion of the survey (N=9,471). MEC-examined sample persons are a subset of those interviewed in the survey.

ANSWER: You would use the MEC exam weight for your analysis (wtmec4yr).

Example 3: Some of the variables were part of a component subsample of the survey

You are performing an NHANES 1999-2002 analysis looking at the association of race/ethnicity, age, blood pressure and fasting triglycerides among persons age 20 and over. Race/ethnicity and age were available from the in-home interview (N=10,291). Blood pressure came from the MEC exam (N=9,471). MEC-examined sample persons are a subset of those interviewed in the survey. Fasting triglycerides are collected from those sample persons who were subsampled to do the 9-hour AM fast and who actually fasted (N=4,157). This group is approximately half the sample of those who were MEC examined.

ANSWER: You would use the morning fasting subsample weights (wtsaf4yr).

It is important to check all the variables in your analysis and use the weight of the smallest sample subpopulation, otherwise you will not obtain estimates appropriately adjusted for survey non-response.
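The selection rule in the three examples can be expressed as a small lookup, sketched here in Python. The component labels ("interview", "mec", "fasting") and the function name are illustrative assumptions; the weight variable names are the 4-year weights named in the examples above. The sketch only covers this simple nested case (interview, then MEC exam, then the morning fasting subsample).

```python
# Ordered from the largest to the smallest analysis subpopulation.
COMPONENT_WEIGHTS = (
    ("interview", "wtint4yr"),
    ("mec", "wtmec4yr"),
    ("fasting", "wtsaf4yr"),
)


def select_weight(components_used):
    """Return the weight of the smallest subpopulation among the components used."""
    known = {name for name, _ in COMPONENT_WEIGHTS}
    unknown = set(components_used) - known
    if unknown or not components_used:
        raise ValueError(f"unsupported or empty component list: {sorted(unknown)}")
    chosen = None
    for name, weight in COMPONENT_WEIGHTS:
        if name in components_used:
            chosen = weight  # later entries are smaller subpopulations
    return chosen


print(select_weight({"interview"}))                    # Example 1
print(select_weight({"interview", "mec"}))             # Example 2
print(select_weight({"interview", "mec", "fasting"}))  # Example 3
```

Components from other subsamples would need their own entries, and subsamples that are not nested within one another cannot be handled by a simple ordering like this.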


Constructing Weights for Combined NHANES Survey Cycles

The sample design for NHANES makes it possible to combine two or more survey cycles to increase the sample size and analytic options. Each 2-year cycle, and any combination of 2-year cycles, is a nationally representative sample. However, the sample in an individual 2-year cycle is sometimes too small to produce statistically reliable estimates, and combining cycles addresses this.

To produce estimates with greater statistical reliability for demographic subdomains (e.g., sex-age-race/ethnicity groups) and rare events, combining two or more 2-year cycles of the continuous NHANES is encouraged and strongly recommended. When combining cycles of data, it is extremely important that you:

  1. verify that data items collected in all combined years are comparable in wording and methods, and
  2. select the proper weight to use for the combined dataset.

For more information about determining the compatibility of datasets, please see the Locate Variables and Structure & Contents modules.

IMPORTANT NOTE

Beginning in 2003, the survey content for each 2-year period is held as constant as possible to be consistent with the data release cycle. In the first 4 years of the continuous survey (1999-2002), this was not always the case, and special data release and access procedures had to be developed for selected survey content collected in "other than 2-year" intervals. (For more details see the NHANES Data Release and Access Policy).

Task 2: When and How to Construct Weights When Combining Survey Cycles

Step 1: When to construct weights for combined survey cycles

Sample weights for NHANES 1999-2000 were based on population estimates developed by the Bureau of the Census before the Year 2000 Decennial Census counts became available. The 2-year sample weights for NHANES 2001-2002, and all other subsequent 2-year cycles, are based on population estimates that incorporate the year 2000 Census counts. Because different population bases were used, the 2-year weights for 1999-2000 and 2001-2002 are not directly comparable. Therefore, when combining 1999-2000 with 2001-2002 survey years in analyses, you must use the 4-year sample weights provided by NCHS since these have been created to account for the two different reference populations.

For both 1999-2000 and 2001-2002 survey cycles, the demographic file contains the weight variables:

  • wtint2yr and wtint4yr for all interviewed sample persons,
  • wtmec2yr and wtmec4yr for the sample persons who have MEC data items, and
  • two-year and four-year subsample weights for selected sample persons (four-year weights are provided for subsample datasets with consistent data elements across the two survey cycles).

Example 1: How to combine those MEC examined for 1999-2000 with 2001-2002

You must use the 4 year weights provided (wtmec4yr) in the SAS demographic file (see explanation above).

Because NHANES 2003-2004 and all future survey cycles use the same year 2000 Census counts that were used for NHANES 2001-2002, NCHS does not need to create special 4-year weights. NCHS does not construct and include all possible weights for the combinations of multiple 2-year cycles in the public release files because it would be impractical to do so. Instead, NCHS supplies analysts with information on how to combine these cycles and construct the appropriate weights.

Step 2: How to construct weights for combined survey cycles for NHANES 2001-2002 and beyond

When you combine two or more 2-year cycles of the continuous NHANES for NHANES 2001-2002 and beyond, you must construct sample weights before beginning any analyses. When survey cycles are combined, the estimates will be representative of the population at the midpoint of the combined survey period. When you construct weights appropriately, as described below, you rescale the weights so that the sum of the weights matches the survey population at the midpoint of that period.

In order to construct 4-year, 6-year, 8-year, etc. weights for survey cycles of the continuous NHANES for NHANES 2001-2002 and beyond, the following formulae should be used:

Formulae for Constructing Weights for NHANES 2001-2002 and Beyond**

Number of Survey Years Used | Survey Cycle Code* | Code with Formula for Combining Weights across Survey Cycles
4 years | if sddsrvyr in (2,3) then | MEC4YR = 1/2 * WTMEC2YR; /* for 2001-2004 */
4 years | if sddsrvyr in (3,4) then | MEC4YR = 1/2 * WTMEC2YR; /* for 2003-2006 */
4 years | if sddsrvyr in (4,5) then | MEC4YR = 1/2 * WTMEC2YR; /* for 2005-2008 */
6 years | if sddsrvyr in (2,3,4) then | MEC6YR = 1/3 * WTMEC2YR; /* for 2001-2006 */
6 years | if sddsrvyr in (3,4,5) then | MEC6YR = 1/3 * WTMEC2YR; /* for 2003-2008 */
8 years | if sddsrvyr in (2,3,4,5) then | MEC8YR = 1/4 * WTMEC2YR; /* for 2001-2008 */

*SDDSRVYR is the survey cycle variable, i.e.
1 = 1999-2000
2 = 2001-2002
3 = 2003-2004
4 = 2005-2006
5 = 2007-2008
Etc.

** To construct weights across survey cycles that include the 1999-2000 survey cycle, see the examples below.
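The table's pattern, dividing each 2-year weight by the number of cycles combined, can be sketched in Python. This is an illustration of the arithmetic only, not a replacement for the SAS code; the function name and the example weights are hypothetical. It also checks the property described earlier: the combined weights sum to the average of the per-cycle weight totals.

```python
def combine_2yr_weight(sddsrvyr, wtmec2yr, cycles):
    """Combined MEC weight for cycles from 2001-2002 onward (sddsrvyr >= 2)."""
    if 1 in cycles:
        raise ValueError("combinations with 1999-2000 need the NCHS 4-year weights")
    if sddsrvyr not in cycles:
        return None  # record is outside the combined period
    return wtmec2yr / len(cycles)


# Hypothetical records: (sddsrvyr, wtmec2yr)
records = [(2, 100.0), (2, 180.0), (3, 120.0), (3, 170.0)]
cycles = (2, 3)  # 2001-2004

mec4yr = [combine_2yr_weight(cycle, weight, cycles) for cycle, weight in records]
total_cycle_2 = sum(w for c, w in records if c == 2)
total_cycle_3 = sum(w for c, w in records if c == 3)

print(mec4yr)
print(sum(mec4yr), (total_cycle_2 + total_cycle_3) / 2)
```

Because each cycle's 2-year weights already sum to a population total, halving them makes the 4-year weights sum to the average of the two totals rather than to their sum.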

Examples for Constructing 4-year, 6-year, 8-year, etc... Weights for Survey Cycles of the Continuous NHANES (Including Combinations with NHANES 1999-2000):

4 years of data from 2001-2004

For 4 years of data from 2001-2004 a weight should be constructed as:

if sddsrvyr=2 or sddsrvyr=3 then
MEC4YR = 1/2 * WTMEC2YR ;

Example 2: How to combine 4 years of data from 2001-2002 with 2003-2004

ANSWER: As stated above, create a weight variable by combining the 2-year weights (WTMEC2YR) found in the SAS demographic file using the following code:

if sddsrvyr=2 or sddsrvyr=3 then
MEC4YR = 1/2 * WTMEC2YR ;

6 years of data from 1999-2004

For 6 years of data from 1999-2004 a weight should be constructed as:

if sddsrvyr=1 or sddsrvyr=2 then
MEC6YR = 2/3 * WTMEC4YR ; /* for 1999-2002 */

if sddsrvyr=3 then
MEC6YR = 1/3 * WTMEC2YR ; /* for 2003-2004 */

WARNING

For all data that includes 1999-2002, you must use the 4 year weights provided by NCHS, then include the additional weights for each 2-year cycle added.

Example 3: How to combine 6 years of data from 1999-2004

ANSWER: Because you are combining the 1999-2000 survey with 2001-2002, you must begin with the 4-year weights provided (WTMEC4YR) and use them in conjunction with the 2-year weights for 2003-2004 (WTMEC2YR) found in the SAS demographic files. This allows you to create a 6-year weight variable (MEC6YR) using the following code:

if sddsrvyr=1 or sddsrvyr=2 then
MEC6YR = 2/3 * WTMEC4YR ; /* for 1999-2002 */

If sddsrvyr=3 then
MEC6YR = 1/3 * WTMEC2YR ; /* for 2003-2004 */

6 years of data from 2001-2006

For 6 years of data from 2001-2006 a weight should be constructed as:

if sddsrvyr in (2,3,4) then
MEC6YR = 1/3 * WTMEC2YR;

Example 4: How to combine multiple 2-year weights that do not include 1999-2000 to make 6 or more years of data (e.g., 2001-2002, 2003-2004, 2005-2006)

ANSWER: Because you are NOT using any data from 1999-2000, you can combine the 2-year weights (WTMEC2YR) found in the SAS demographic files to create a 6-year weight variable (MEC6YR) for 2001-2006 using the following code:

if sddsrvyr in (2,3,4) then
MEC6YR = 1/3 * WTMEC2YR;

Similarly, for 8 years of data from 2001-2008 and beyond, you would combine the 2-year weights (WTMEC2YR) using the correct proportion of each. For example:

For 8 years of data from 2001-2008 a weight should be constructed as:

if sddsrvyr in (2,3,4,5) then
MEC8YR = 1/4 * WTMEC2YR;

8 years of data from 1999-2006

For 8 years of data from 1999-2006 a weight should be constructed as:

if sddsrvyr=1 or sddsrvyr=2 then
MEC8YR = 1/2 * WTMEC4YR ; /* for 1999-2002 */

if sddsrvyr=3 or sddsrvyr=4 then
MEC8YR = 1/4 * WTMEC2YR ; /* for 2003-2006 */


Example 5: How to combine 8 years of data that include 1999-2000 through 2005-2006

ANSWER: You must use the 4-year weights provided for 1999-2002 (WTMEC4YR) with the 2-year weights for both 2003-2004 and 2005-2006 (WTMEC2YR) to create an 8-year weight variable (MEC8YR) using the following code:

if sddsrvyr=1 or sddsrvyr=2 then
MEC8YR = 1/2 * WTMEC4YR ; /* for 1999-2002 */

if sddsrvyr=3 or sddsrvyr=4 then
MEC8YR = 1/4 * WTMEC2YR ; /* for 2003-2006 */

Again, future years of data can continue to be added using the same methods as above for combining cycles by taking the correct proportion of the 4-year and 2-year weights.

10 years of data from 1999-2008

For 10 years of data from 1999-2008 a weight should be constructed as:

if sddsrvyr=1 or sddsrvyr=2 then
MEC10YR = 2/5 * WTMEC4YR ; /* for 1999-2002 */

if sddsrvyr=3 or sddsrvyr=4 or sddsrvyr=5 then
MEC10YR = 1/5 * WTMEC2YR ; /* for 2003-2008 */

Example 6: How to combine 10 years of data that include 1999-2000 through 2007-2008

ANSWER: You must use the 4-year weights provided for 1999-2002 (WTMEC4YR) with the 2-year weights for 2003-2004, 2005-2006 and 2007-2008 (WTMEC2YR) to create a 10-year weight variable (MEC10YR) using the following code:

if sddsrvyr=1 or sddsrvyr=2 then
MEC10YR = 2/5 * WTMEC4YR ; /* for 1999-2002 */

if sddsrvyr=3 or sddsrvyr=4 or sddsrvyr=5 then
MEC10YR = 1/5 * WTMEC2YR ; /* for 2003-2008 */

Again, future years of data can continue to be added using the same methods as above for combining cycles by taking the correct proportion of the 4-year and 2-year weights.
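The examples above follow one pattern: when k two-year cycles are combined, records from 1999-2002 receive 2/k of the 4-year weight and records from later cycles receive 1/k of the 2-year weight. A minimal Python sketch of that pattern follows; the function name and the example weights are hypothetical, and the sketch assumes 1999-2000 and 2001-2002 are always included together.

```python
def combined_mec_weight(sddsrvyr, wtmec2yr, wtmec4yr, cycles):
    """Combined MEC weight for a set of consecutive survey cycle codes.

    Cycles 1 and 2 (1999-2002) use the NCHS 4-year weight; later cycles
    use their 2-year weight.
    """
    if (1 in cycles) != (2 in cycles) and 1 in cycles:
        raise ValueError("this sketch assumes 1999-2000 is combined with 2001-2002")
    if sddsrvyr not in cycles:
        return None  # record is outside the combined period
    k = len(cycles)
    if 1 in cycles and sddsrvyr in (1, 2):
        return 2 * wtmec4yr / k
    return wtmec2yr / k


# Hypothetical weights for one record from 1999-2000 and one from 2003-2004.
print(combined_mec_weight(1, 60000.0, 30000.0, (1, 2, 3)))           # 6 years
print(combined_mec_weight(3, 30000.0, None, (1, 2, 3)))              # 6 years
print(combined_mec_weight(1, 60000.0, 30000.0, (1, 2, 3, 4)))        # 8 years
print(combined_mec_weight(5, 30000.0, None, (1, 2, 3, 4, 5)))        # 10 years
```

With three cycles the multipliers are 2/3 and 1/3, with four cycles 1/2 and 1/4, and with five cycles 2/5 and 1/5, matching the SAS code in Examples 3, 5 and 6.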

Creating Appropriate Subsets of Data for NHANES Analyses

Sometimes you may wish to analyze only a certain demographic subgroup of interest, such as a particular age range or gender, or whether survey participants were tested for a particular lab analyte or other examination criteria.

For SUDAAN procedures it is important that you do not create a smaller subgroup based on any non-weight-related variables of interest (e.g., demographic, laboratory or examination variables) in the SAS data step before executing the SUDAAN procedure. Instead, it is highly recommended that you create a subset of your sample population using the subpopn statement in the SUDAAN procedure itself. In addition, SUDAAN procedures require that all observations in the dataset being read into a procedure have the same sample weight. Therefore, prior to the SUDAAN procedure you should create a subset of your data to include only those observations with the appropriate sample weight for your analysis.

For SAS Survey procedures, there is no subpopn statement. Instead, most SAS 9.2 Survey procedures use a domain statement for domain analysis, also known as subgroup analysis or subpopulation analysis. In SAS 9.1 Survey Procedures, proc surveymeans, proc surveyreg, proc surveyfreq, and proc surveylogistic have different methods for selecting a subpopulation.

Reference

SAS Technical Support.

Task 3a: How to Create an Appropriate Subset of Your Data for NHANES Analyses in SUDAAN

The following example demonstrates the critical code necessary to create subsets of your data appropriately for SUDAAN and SAS Survey procedure analyses. These examples only highlight the portion of code necessary to illustrate creation of appropriate subsets of data. For examples of full SUDAAN and SAS Survey procedure codes, please see the Logistic Regression module.

Example used throughout this task: You are interested in analyzing only 20-49 year old females who were tested for total cholesterol in a 2-year dataset.

Step 1: Create Dataset

First, you determine that you will include all MEC examined individuals in your data set.

The ridstatr variable on your demographic file designates interviewed participants with a value = 1, and interviewed plus examined participants with a value = 2. Therefore, in the SAS data step, you use the ridstatr variable (ridstatr=2) to create a MEC-examined subset of data.

Step 2: Specify correct weight in program

Next, in SUDAAN you specify the correct weight to be used in the procedure by using a weight statement. Since you are using a single 2-year cycle, use the wtmec2yr variable.

Step 3: Include selected subset

Then, in the SUDAAN procedure you will include a subpopn statement that creates a subset of the data which includes those who are greater than or equal to age 20 and less than or equal to age 49 years, are female, and have a valid measure for the total cholesterol variable lbxtc.

IMPORTANT NOTE

The correct method for creating a subset of your sample population for SUDAAN analyses is to use the subpopn statement to designate sample subdomains to analyze and only use sample weight-related variables in the SAS data step.

Sample Statements to Include Weight and Select Subset of Dataset in SUDAAN Procedures

Statements | Explanation
if ridstatr=2; | The ridstatr variable on your demographic file designates interviewed participants with a value = 1, and interviewed plus examined participants with a value = 2. Therefore, in the SAS data step, you use this SAS statement to create a MEC-examined subset of data.
weight wtmec2yr; | Specify the correct weight to be used in the procedure by using a weight statement.
subpopn ridageyr >= 20 and ridageyr <= 49 and riagendr = 2 and lbxtc > -1 ; | The subpopn statement creates a subset of the data which includes those who are greater than or equal to age 20 and less than or equal to age 49 years, are female, and have a valid measure for the total cholesterol variable lbxtc.

IMPORTANT NOTE

SUDAAN does not accept some SAS syntax, such as GE for >= or LE for <=. You also cannot use chained comparisons such as 20 le ridageyr le 49, or use '.' to designate a missing value. See the SUDAAN manual for other limitations.
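The distinction between deleting rows and marking a subpopulation can be sketched in Python. The example keeps every MEC-examined record and only adds a flag for females age 20-49 with a non-missing total cholesterol value, which is the role the subpopn statement plays in SUDAAN. The records, the seqn identifiers, and the function names are hypothetical, and missing values are represented by None rather than by a SAS missing value.

```python
def in_domain(person):
    """True for females aged 20-49 with a non-missing total cholesterol value."""
    age = person["ridageyr"]
    return (
        age is not None
        and 20 <= age <= 49
        and person["riagendr"] == 2
        and person["lbxtc"] is not None
    )


def flag_domain(mec_examined):
    """Return copies of every record with an added in_domain flag; no rows are dropped."""
    return [dict(person, in_domain=in_domain(person)) for person in mec_examined]


# Hypothetical MEC-examined records (ridstatr = 2 already applied).
people = [
    {"seqn": 1, "ridageyr": 34, "riagendr": 2, "lbxtc": 187.0},
    {"seqn": 2, "ridageyr": 34, "riagendr": 1, "lbxtc": 201.0},
    {"seqn": 3, "ridageyr": 62, "riagendr": 2, "lbxtc": 220.0},
    {"seqn": 4, "ridageyr": 25, "riagendr": 2, "lbxtc": None},
]

flagged = flag_domain(people)
print(len(flagged), [p["seqn"] for p in flagged if p["in_domain"]])
```

All four records remain in the dataset, so the full sample design is still available to the analysis procedure, while only one record falls in the subpopulation of interest.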

Task 3b: How to Create an Appropriate Subset of Your Data for NHANES Analyses in SAS Survey Procedures

The following text explains the critical code necessary to create subsets of your data appropriately for SAS Survey procedure analyses. For examples of full SAS Survey procedure codes, please see the Logistic Regression module.

Example used throughout this task: You are interested in analyzing only 20-49 year old females who were tested for total cholesterol in a 2-year dataset.

Step 1: Create Dataset

First, you determine that you will include all MEC examined individuals in your data set.

The ridstatr variable on your demographic file designates interviewed participants with a value = 1, and interviewed plus examined participants with a value = 2. Therefore, in the SAS data step, you use the ridstatr variable (ridstatr=2) to create a MEC-examined subset of data.

Step 2: Specify correct weight in program

Next, in SAS Survey Procedures you specify the correct weight to be used in the procedure by using a weight statement. Since you are using a single 2-year cycle, use the wtmec2yr variable.

Step 3: Include selected subset

If you wanted to complete an analysis of those who are greater than or equal to age 20 and less than or equal to age 49 years, are female, and have a valid measure for the total cholesterol variable lbxtc, then you need to define that subpopulation within the procedure rather than by deleting the other observations. SAS Survey procedures have no subpopn statement. Instead, most SAS 9.2 Survey procedures use a domain statement for domain analysis, also known as subgroup analysis or subpopulation analysis. In SAS 9.1 Survey Procedures, proc surveymeans, proc surveyreg, proc surveyfreq, and proc surveylogistic have different methods for selecting a subpopulation, described below.

IMPORTANT NOTE

You should not use a where clause or by-group processing in order to analyze a subpopulation with SAS Survey procedures.

Methods for Subpopulation Analysis in SAS 9.1 Survey Procedures

proc surveymeans

proc surveymeans has a domain statement for domain or subpopulation analysis. Syntax details are in the SAS OnlineDoc:

http://support.sas.com/documentation/index.html

proc surveyreg

You can use the %sregsub macro available on the SAS website at:

http://support.sas.com/documentation/index.html

A domain statement is being added to proc surveyreg in SAS 9.2.

proc surveyfreq

You can perform a domain analysis by including your domain variable(s) in the tables statement. Details are at: http://support.sas.com/documentation/index.html

proc surveylogistic

To get an approximate domain analysis, you assign a near zero weight to observations that do not belong to your current domain. The reason that you cannot make the weight zero is that the procedure will exclude any observation with zero weight. For example, if you have a domain gender=male or female, and if you specify in a data step:

if gender='male' then newweight=weight;
else newweight=1e-6;

you could then perform the logistic regression using the newweight variable as:

weight newweight;

SAS hopes to add a domain statement for proc surveylogistic in future releases, although no timetable has been set.
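The effect of the near-zero weight on a point estimate can be checked with a small Python sketch. It computes a weighted mean for a hypothetical domain (males) twice: once from the domain records alone, and once from all records with out-of-domain weights set to 1e-6. The records and function names are invented for illustration, and the sketch does not show variance estimation, which is the reason for keeping all observations in the first place.

```python
NEAR_ZERO_WEIGHT = 1e-6


def domain_weight(in_domain, weight):
    """Keep the sample weight inside the domain; use a near-zero weight outside it."""
    return weight if in_domain else NEAR_ZERO_WEIGHT


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical records: (gender, sample weight, binary outcome)
records = [("male", 1000.0, 1), ("male", 3000.0, 0), ("female", 2000.0, 1)]

outcomes = [y for _, _, y in records]
new_weights = [domain_weight(g == "male", w) for g, w, _ in records]
approx_domain_mean = weighted_mean(outcomes, new_weights)

exact_domain_mean = weighted_mean(
    [y for g, _, y in records if g == "male"],
    [w for g, w, _ in records if g == "male"],
)
print(approx_domain_mean, exact_domain_mean)
```

The two estimates agree to many decimal places because the out-of-domain record contributes almost nothing to either the numerator or the denominator.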

Reference

SAS Technical Support.
