In this task, you will use Stata commands to calculate a t-statistic and assess whether mean systolic blood pressure (SBP) differs significantly between males and females age 20 years and older.
Step 1: Set Up Stata to Produce Means
Follow the steps in the summary table below to produce the mean SBP and a t-test of whether the difference in mean SBP between males and females is statistically significant, using the Stata command svy:mean.
IMPORTANT NOTE
There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.
Step 2: Use svyset to define survey design variables
Remember that you need to define the SVYSET before using the SVY series of commands. The general format of this command is below:
svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)
To define the survey design variables for your SBP analysis, use the weight variable for 4 years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor linearization. Here is the svyset command for four years of MEC data:
svyset [w= wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
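As an optional check, and only as a sketch, the design settings can be displayed once they have been declared. The example below assumes the NHANES data file is already loaded in memory. It writes the weight type out in full as pweight (sampling weights) and otherwise uses the same variables as the command above.

```stata
* Declare the survey design (same variables as above; weight type written out as pweight)
svyset [pweight=wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)

* Typing svyset with no arguments redisplays the current design settings
svyset

* svydescribe tabulates the strata and the number of PSUs in each stratum
svydescribe
```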
Step 3: Use svy:mean to generate means and standard errors in Stata
Now that the survey design has been defined with svyset, you can use the Stata command svy: mean to generate means and standard errors. The general command for obtaining weighted means and standard errors of a subpopulation is below.
svy: mean varname, subpop(if condition)
Use the svy: mean command with the systolic blood pressure variable (bpxsar) to estimate the mean systolic blood pressure for people age 20 years and older. Use the subpop() option to select a subpopulation for analysis, rather than restricting the study population while preparing the data file. This example uses an if statement to define the subpopulation based on the value of the age variable (ridageyr). The condition ridageyr<. excludes missing ages, because Stata treats missing values as larger than any number. Another option is to create a dichotomous variable where the subpopulation of interest is assigned a value of 1, and everyone else is assigned a value of 0.
svy: mean bpxsar, subpop(if ridageyr>=20 & ridageyr<.)
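The dichotomous-variable alternative mentioned above could look like the sketch below. The variable name adult20 is hypothetical, and the example assumes a Stata version (10 or later) in which subpop() is given as an option of the svy prefix.

```stata
* adult20 is a hypothetical indicator: 1 for age 20 and older, 0 otherwise.
* Respondents with a missing age are coded 0 because of the ridageyr < . condition.
generate byte adult20 = (ridageyr >= 20 & ridageyr < .)

* Estimate mean SBP within the subpopulation flagged by adult20
svy, subpop(adult20): mean bpxsar
```

Keeping everyone in the data file and flagging the subpopulation, rather than dropping observations, lets Stata use the full survey design when it computes the standard errors.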
Step 4: Use the over() option of the svy:mean command to generate means and standard errors for different subgroups in Stata
You can also add the over() option to the svy:mean command to generate the means for different subgroups. When you do this, you can type a second command, estat size, to have the output display the subgroup observation numbers. Here is the general format of these commands for this example:
svy: mean varname, subpop(if condition) over(var1 var2)
estat size
As in Step 3, use the svy: mean command with the systolic blood pressure variable (bpxsar) and the subpop() option to estimate the mean systolic blood pressure for people age 20 years and older. Define the subpopulation either with an if statement on the age variable (ridageyr) or with a dichotomous variable coded 1 for the subpopulation of interest and 0 for everyone else. Use the over() option to get stratified results. This example produces estimates by gender. Use the estat size post estimation command to display the number of subpopulation observations and weighted numbers.
svy: mean bpxsar, subpop(if ridageyr>=20 & ridageyr<.) over(riagendr)
estat size, obs size
Step 5a: Test the hypothesis using the lincom post estimation command
If you have already done some estimations, then you can use the lincom command to test the hypothesis that the difference between the means for the subpopulations equals 0. Use square brackets around the variable you are estimating. After the variable in square brackets, put the value of the stratifier that you want to test (e.g., a value of the variable in the over() option). If you used labels for the variable, you can use labels instead of the coded values. Here is the general format of this command for this example:
lincom [varname]stratval1 - [varname]stratval2
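The test that lincom carries out for a difference of two subgroup means can be written out as below. This is a sketch in generic notation that does not appear in the original text: \bar{y}_1 and \bar{y}_2 stand for the two estimated subgroup means.

```latex
t = \frac{\bar{y}_1 - \bar{y}_2}{\widehat{\mathrm{SE}}(\bar{y}_1 - \bar{y}_2)},
\qquad
\widehat{\mathrm{SE}}(\bar{y}_1 - \bar{y}_2)
  = \sqrt{\widehat{\mathrm{Var}}(\bar{y}_1) + \widehat{\mathrm{Var}}(\bar{y}_2)
          - 2\,\widehat{\mathrm{Cov}}(\bar{y}_1, \bar{y}_2)}
```

With survey data, Stata refers this statistic to a t distribution whose degrees of freedom are the number of PSUs minus the number of strata.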
Because you have done some prior estimation, you can use the lincom post estimation command to test the hypothesis that the difference between mean SBP (bpxsar) for males and females equals 0. This example uses labeled values (male, female) instead of the coded values (1, 2) for the gender variable (riagendr).
lincom [bpxsar]male - [bpxsar]female
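The labeled form of the command assumes that value labels were attached to riagendr before the svy: mean command was run. A minimal sketch follows. The label name sexlbl is hypothetical, and the coding 1 = male, 2 = female follows the coded values given above.

```stata
* Define a value label (sexlbl is a hypothetical name) and attach it to riagendr
label define sexlbl 1 "male" 2 "female"
label values riagendr sexlbl
```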
Step 5b: Test the hypothesis using the svy:reg command
The svy:reg command could also be used to calculate the t-statistic. The difference between using svy:reg and lincom is that svy:reg can be used without prior estimation. The xi prefix is placed before the command so that categorical variables are expanded into indicator variables, and the i. prefix marks each categorical variable. Here is the general format of this command for this example:
xi: svy, subpop(if condition): reg dependentvar i.varname
Use the svy:reg command with the xi prefix to calculate the t-statistic and assess whether the mean SBP (bpxsar) for males and females age 20 years and older is statistically different. The i. prefix denotes the categorical variable, which in this example is riagendr. Use the char command to choose the reference group for the categorical variable; here the omitted group is females (riagendr = 2).
char riagendr[omit] 2
xi: svy, subpop(if ridageyr>=20 & ridageyr<.): reg bpxsar i.riagendr
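In Stata 11 and later, factor-variable notation offers an alternative to the xi prefix and the char command. The sketch below assumes such a version. The ib2. operator declares riagendr as categorical with level 2 (female) as the base category, so the coefficient on level 1 is the male minus female difference in mean SBP.

```stata
* Factor-variable form: no xi prefix or char command needed (assumes Stata 11 or later)
svy, subpop(if ridageyr>=20 & ridageyr<.): regress bpxsar ib2.riagendr
```

The t-statistic and p value reported for the level 1 coefficient test the same hypothesis as the lincom command in Step 5a.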
Step 6: Review Stata means and t-test output
Here is a table summarizing the results of the previous analyses:
Summary of Results
Variable | Subpopulation analyzed | Number of respondents with data | Mean (mm Hg) | p value
Systolic blood pressure (bpxsar) | Adults age 20 and older | 9,056 | 123 | n/a
Systolic blood pressure (bpxsar) | Men age 20 and older | 4,301 | 124 | 0.0132 (men vs women)
Systolic blood pressure (bpxsar) | Women age 20 and older | 4,755 | 122 |
According to the stratified analysis, men's mean systolic blood pressure is 2 mm Hg higher than women's. This difference is statistically significant: if there were no true difference, a difference this big or bigger would occur by chance in a sample of this size only about 1.3% of the time. A total of 9,056 respondents had information on systolic blood pressure (SBP).