**********************************************************************************************************************
** Example Stata code to replicate NCHS Data Brief No. 303, Figure 1                                                **
** Prevalence of Depression Among Adults Aged 20 and Over: United States, 2013–2016                                 **
**                                                                                                                  **
** Brody DJ, Pratt LA, Hughes JP. Prevalence of Depression Among Adults Aged 20 and Over: United States, 2013–2016. **
** NCHS Data Brief. No 303. Hyattsville, MD: National Center for Health Statistics. 2018.                           **
** Available at: https://www.cdc.gov/nchs/products/databriefs/db303.htm                                             **
**********************************************************************************************************************

** Note to tutorial users: you must update some lines of code (e.g. file paths) 
**    to run this code yourself. Search for comments labeled "TutorialUser"


** Display Stata Version Number **
version

* This example code was written and verified using the syntax available in Stata/SE version 16.
* Stata 16 introduced new syntax for some of the commands used in this example code (for instance, the
* names that mean, over() gives its estimates, which the lincom commands refer to).
* If you are using an earlier or later version of Stata, be aware that some lines may need to be modified.
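** Optional version check (a sketch, assuming you prefer the do-file to stop here rather than fail later):
** c(stata_version) holds the running Stata version. Versions before 16 may name the mean, over() estimates
** differently, in which case the lincom commands below would not find their coefficients.
if c(stata_version) < 16 {
    display as error "This do-file assumes Stata 16 syntax; see the version notes above."
    exit 9
}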

* Change working directory to a directory where we can save temporary files *
* TutorialUser: Update this path to a valid location on your computer!
cd "C:\Stata_workspace\"

** Download Demographic (DEMO) Data and Keep Variables Of Interest **
import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.XPT", clear
keep seqn riagendr ridageyr sdmvstra sdmvpsu wtmec2yr
save "DEMO_H.dta", replace

import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT", clear
keep seqn riagendr ridageyr sdmvstra sdmvpsu wtmec2yr

** Append Files **
append using "DEMO_H.dta"
save "DEMO.dta", replace
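** Sanity check (sketch): seqn should uniquely identify participants in the appended file, assuming SEQN
** values do not overlap between the 2013-2014 and 2015-2016 cycles. isid stops with an error if they do.
isid seqn
count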

** Download Mental Health - Depression Screener (DPQ) Data **
import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DPQ_H.XPT", clear
save "DPQ_H.dta", replace

import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DPQ_I.XPT", clear

** Append Files **
append using "DPQ_H.dta"

** Merge Files **
merge 1:1 seqn using "DEMO.dta"
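** Check the merge (sketch): _merge==3 marks participants found in both DPQ and DEMO; _merge==2 marks
** DEMO-only participants (e.g., those younger than 18, who were not eligible for the screener).
** Every DPQ record is assumed to have a matching DEMO record, so no _merge==1 rows are expected.
tab _merge
assert _merge != 1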

** Set Refused/Don't Know To Missing (for all variables that start with prefix dpq) **
recode dpq* (7/9 = .)
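** Check (sketch): after the recode, each PHQ-9 item should be 0-3 or missing, assuming the DPQ codebook
** coding 0 = "Not at all" through 3 = "Nearly every day", with 7 (Refused) and 9 (Don't know) now missing.
foreach item of varlist dpq010 dpq020 dpq030 dpq040 dpq050 dpq060 dpq070 dpq080 dpq090 {
    assert inrange(`item', 0, 3) | missing(`item')
}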

** Create Binary Depression Indicator as 0/100 variable **
** note that the score will be missing if any of the items are missing **
gen Depression_Score = dpq010+dpq020+dpq030+dpq040+dpq050+dpq060+dpq070+dpq080+dpq090
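** Check (sketch): the sum above is missing exactly when at least one of the nine items is missing.
** egen rowmiss() counts missing items per row; by contrast, egen rowtotal() would treat missing items
** as 0, which is why the score is built with + instead. n_items_missing is a throwaway helper variable.
egen byte n_items_missing = rowmiss(dpq010 dpq020 dpq030 dpq040 dpq050 dpq060 dpq070 dpq080 dpq090)
assert missing(Depression_Score) == (n_items_missing > 0)
drop n_items_missing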
recode Depression_Score (0/9 = 0) (10/27 = 100), generate(Depression_Indicator)
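** Check (sketch): the 0/100 indicator should equal 100 exactly when the PHQ-9 score is 10 or higher and
** be missing whenever the score is missing. The if qualifier matters: in Stata, missing values compare as
** larger than any number, so a missing score would otherwise satisfy Depression_Score >= 10.
assert Depression_Indicator == 100*(Depression_Score >= 10) if !missing(Depression_Score)
assert missing(Depression_Indicator) == missing(Depression_Score)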

** Create a new variable with age categories: 20-39, 40-59, 60 and over ** 
recode ridageyr (0/19 = .) (20/39 = 1) (40/59 = 2) (60/80 = 3), generate(Age_Group)
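** Check (sketch): show the age range within each group and confirm that everyone aged 20 and over was
** assigned a group. RIDAGEYR is assumed to be top-coded at 80, so 60/80 captures all older adults.
tabstat ridageyr, by(Age_Group) statistics(n min max)
assert !missing(Age_Group) if ridageyr >= 20 & !missing(ridageyr)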

** Labels for categorized variables **
label define Gender_Labels 1 "Male" 2 "Female"
label values riagendr Gender_Labels
label define Age_Labels 1 "20-39" 2 "40-59" 3 "60+"
label values Age_Group Age_Labels

** Define analysis population: adults age 20 and over with a non-missing depression score
gen inAnalysis=0
replace inAnalysis=1 if ridageyr >=20 & !missing(Depression_Indicator)
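** Summary of the analysis population (sketch): how many adults are kept, and how many adults aged 20 and
** over are excluded because their depression score is missing.
count if inAnalysis
count if ridageyr >= 20 & missing(Depression_Indicator)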

** Specify survey design variables and request Taylor linearized variance estimation **
** Note: using the MEC Exam Weights (WTMEC2YR), per the analytic notes on the 
**       Mental Health - Depression Screener (DPQ_H) documentation
**  Divide weight by 2 because we are appending 2 survey cycles for 2013-2014 and 2015-2016
gen wtmec4yr = wtmec2yr / 2
svyset [pweight=wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
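** Check (sketch): everyone in the analysis population should have a positive MEC weight, since the screener
** was part of the MEC exam; participants who were interviewed but not examined are assumed to have
** wtmec2yr = 0. svydescribe then summarizes the strata and PSUs declared by svyset.
assert wtmec4yr > 0 & !missing(wtmec4yr) if inAnalysis
svydescribe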

** Sample Size (unweighted) by sex and age for analysis population **
tab riagendr Age_Group if inAnalysis

** Prevalence of depression **
svy, subpop(inAnalysis): mean Depression_Indicator
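** Alternative sketch: because Depression_Indicator takes only the values 0 and 100, svy: proportion reports
** the same prevalence as a proportion (the share at level 100) rather than a percentage. Its default
** confidence intervals use a logit transform, so they can be asymmetric and differ slightly from svy: mean.
svy, subpop(inAnalysis): proportion Depression_Indicator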

** Prevalence of depression by gender **
svy, subpop(inAnalysis): mean Depression_Indicator, over(riagendr)
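** Sketch: list the names Stata 16 gives the estimates from mean, over(); these are the names the lincom
** commands refer to (e.g., c.Depression_Indicator@1.riagendr). estat size reports the unweighted sample
** size and the estimated population size behind each estimate.
matrix list e(b)
estat size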

** Compare prevalence of depression between men and women **
lincom c.Depression_Indicator@1.riagendr - c.Depression_Indicator@2.riagendr  // men vs. women
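** Equivalent test (sketch): test states the same hypothesis as a constraint. With a single constraint it
** reports an F statistic equal to the square of lincom's t statistic on the same design degrees of freedom,
** so the p-value should match the lincom result.
test c.Depression_Indicator@1.riagendr = c.Depression_Indicator@2.riagendr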

** Prevalence of depression by age group **
svy, subpop(inAnalysis): mean Depression_Indicator, over(Age_Group)

** Pairwise Comparison Of Age Groups **
lincom c.Depression_Indicator@1.Age_Group - c.Depression_Indicator@2.Age_Group  // 20-39 vs. 40-59
lincom c.Depression_Indicator@1.Age_Group - c.Depression_Indicator@3.Age_Group  // 20-39 vs. 60 and over
lincom c.Depression_Indicator@2.Age_Group - c.Depression_Indicator@3.Age_Group  // 40-59 vs. 60 and over
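** Joint test (sketch): before looking at pairs, one may test whether prevalence differs at all across the
** three age groups. test accepts chained equalities; after svy it reports an adjusted Wald F with 2 numerator df.
test c.Depression_Indicator@1.Age_Group = c.Depression_Indicator@2.Age_Group = c.Depression_Indicator@3.Age_Group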

** Prevalence By Gender And Age Group **
svy, subpop(inAnalysis): mean Depression_Indicator, over(riagendr Age_Group)

** Compare Prevalence Between Men And Women By Age Group **
lincom c.Depression_Indicator@1.riagendr#1.Age_Group - c.Depression_Indicator@2.riagendr#1.Age_Group // men vs. women: aged 20-39
lincom c.Depression_Indicator@1.riagendr#2.Age_Group - c.Depression_Indicator@2.riagendr#2.Age_Group // men vs. women: aged 40-59
lincom c.Depression_Indicator@1.riagendr#3.Age_Group - c.Depression_Indicator@2.riagendr#3.Age_Group // men vs. women: aged 60 and over
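** Compact summary (sketch): loop over the age groups and print the men-minus-women difference, its standard
** error, and a two-sided p-value computed from lincom's stored results r(estimate), r(se), and r(df); this
** should reproduce the three lincom lines above. Labels come from the Age_Labels value label defined earlier.
forvalues g = 1/3 {
    local age_label : label Age_Labels `g'
    quietly lincom c.Depression_Indicator@1.riagendr#`g'.Age_Group - c.Depression_Indicator@2.riagendr#`g'.Age_Group
    local p = 2*ttail(r(df), abs(r(estimate)/r(se)))
    display as text "Aged `age_label': men - women = " as result %6.2f r(estimate) ///
        as text "  SE = " as result %5.2f r(se) as text "  p = " as result %6.4f `p'
}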

** Pairwise Comparison of Age Groups By Gender **
lincom c.Depression_Indicator@1.riagendr#1.Age_Group - c.Depression_Indicator@1.riagendr#2.Age_Group // 20-39 vs. 40-59       : men
lincom c.Depression_Indicator@1.riagendr#1.Age_Group - c.Depression_Indicator@1.riagendr#3.Age_Group // 20-39 vs. 60 and over : men
lincom c.Depression_Indicator@1.riagendr#2.Age_Group - c.Depression_Indicator@1.riagendr#3.Age_Group // 40-59 vs. 60 and over : men
lincom c.Depression_Indicator@2.riagendr#1.Age_Group - c.Depression_Indicator@2.riagendr#2.Age_Group // 20-39 vs. 40-59       : women
lincom c.Depression_Indicator@2.riagendr#1.Age_Group - c.Depression_Indicator@2.riagendr#3.Age_Group // 20-39 vs. 60 and over : women
lincom c.Depression_Indicator@2.riagendr#2.Age_Group - c.Depression_Indicator@2.riagendr#3.Age_Group // 40-59 vs. 60 and over : women

************************************************************

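** Alternative method of testing: pairwise comparisons on a "cell means model" fit with the logit command **
** This method gives slightly different test results (p-values) than the svy: mean-based results shown above, **
** because it tests differences on the log-odds (logit) scale rather than on the prevalence scale.          **
** It is slightly better suited to prevalences, whose sampling distributions can be skewed near 0 or 100%.   **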

** Prevalence By Gender And Age Group **
* specify ibn. for each factor variable and the noconstant option to include all levels of categorical variables in the model *
svy, subpop(inAnalysis): logit Depression_Indicator ibn.Age_Group#ibn.riagendr, noconstant
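** Sketch: convert the cell coefficients back to prevalence. In this saturated cell-means model each coefficient
** is the log-odds of depression in one age-by-sex cell, so 100*invlogit() of it should reproduce the weighted
** prevalence from svy: mean, over(riagendr Age_Group); only the tests differ between the two methods.
** Assumes _b[] resolves the unannotated names (e.g., 1.Age_Group#1.riagendr) for the ibn. terms.
forvalues a = 1/3 {
    local age_label : label Age_Labels `a'
    forvalues s = 1/2 {
        local sex_label : label Gender_Labels `s'
        display as text "`sex_label', aged `age_label': " as result %5.1f 100*invlogit(_b[`a'.Age_Group#`s'.riagendr])
    }
}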

** Pairwise comparison of age groups, among men (riagendr=1) and women (riagendr=2) **
pwcompare Age_Group#1.riagendr, pveffects
pwcompare Age_Group#2.riagendr, pveffects
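** Sketch: one pwcompare comparison written out by hand (20-39 vs. 40-59, among men). pwcompare compares the
** cell coefficients, that is, log-odds; lincom with the or option reports the same contrast as an odds ratio.
** The p-value should match the corresponding pwcompare row, which may list the levels in the opposite order
** and therefore show the opposite sign on the log-odds scale.
lincom _b[1.Age_Group#1.riagendr] - _b[2.Age_Group#1.riagendr], or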

** Pairwise comparison by gender, for each age group **
pwcompare riagendr#1.Age_Group, pveffects
pwcompare riagendr#2.Age_Group, pveffects
pwcompare riagendr#3.Age_Group, pveffects