Technical Info

Data Source Design and Case Definition

Data source

The Current Population Survey (CPS) is conducted to measure national labor force participation and employment. It is administered by the U.S. Census Bureau for the U.S. Bureau of Labor Statistics (BLS). The homepage for CPS, providing further information including data tables and publications, is http://www.bls.gov/cps/.

The data provided in this query system are limited to selected variables and subset by NIOSH to include only those persons whose age is greater than or equal to 15 and who are classified as "Employed - At work" or "Employed - Absent" according to the monthly labor force variable. Complete monthly CPS datasets are available for download at https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html. Variable labels included below correspond to the BLS data files.

Sample design

CPS data are collected via personal and telephone surveys from a sample of 50,000 to 60,000 households per month. The CPS is based on a multistage stratified sample compiled from independent samples in each state and the District of Columbia to represent the civilian, noninstitutional population. Sample size is determined by reliability requirements as established by the coefficient of variation. Survey respondents are asked to provide information about the employment status of each household member who is 15 years of age or older.

Complete details on the CPS sample design can be found in Chapter 2 of Technical Paper 77: Design and Methodology.

Case definition

To be eligible for inclusion in the CPS, persons must meet the following criteria:

  • 15 years old or greater (There is no upper age limit for inclusion.)
  • Civilian (Non-military)
  • Non-institutionalized (i.e., not residing in institutions such as prisons, long-term care hospitals, and nursing homes)

The NIOSH subset of CPS data used in this query system includes only those persons classified as employed (Monthly labor force variable = "Employed - At work" or "Employed - Absent").
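The subset rule above can be sketched in a few lines of Python. This is only an illustration, not the NIOSH processing code: the numeric labor force codes (1 = Employed - At work, 2 = Employed - Absent) are assumed from the CPS data dictionary and should be verified against the dictionary for the data year in use, and the sample records are invented.

```python
# Assumed PEMLR codes (verify against the CPS data dictionary):
# 1 = Employed - At work, 2 = Employed - Absent.
EMPLOYED_CODES = {1, 2}
MIN_AGE = 15


def in_niosh_subset(record):
    """Return True if a person record meets the age and employment criteria."""
    return record["PRTAGE"] >= MIN_AGE and record["PEMLR"] in EMPLOYED_CODES


# Invented records for illustration only.
records = [
    {"PRTAGE": 34, "PEMLR": 1},  # employed, at work: kept
    {"PRTAGE": 15, "PEMLR": 2},  # employed, absent: kept
    {"PRTAGE": 14, "PEMLR": 1},  # younger than 15: dropped
    {"PRTAGE": 52, "PEMLR": 5},  # any other labor force code: dropped
]

subset = [r for r in records if in_niosh_subset(r)]
print(len(subset))
```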

Data files

NIOSH Division of Safety Research, Surveillance and Field Investigations Branch staff download public use BLS CPS microdata files annually (https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html). These files are subset for internal use to include only the employed labor force with a reduced number of variables. Employed labor force population estimates from this query system are obtained from these files. Each annual file contains 697,000 to 925,000 records, with an average of 800,000 records per year.

Beginning in January 2011, additional safeguards were implemented with the CPS public use microdata files to ensure that respondent identifying information is not disclosed. This involved altering or perturbing respondents' ages to protect the confidentiality of survey respondents and their data. Due to these measures, labor force estimates from the public use microdata files will no longer exactly match most official BLS published estimates, which are based on internal, nonpublic-use files. All differences should fall well within the sampling variability for CPS estimates.

Estimates and Errors

Employed labor force estimates

The NIOSH subset of the CPS dataset is used to produce national estimates of employed persons by demographics and type of employment by extrapolation using the statistical weight of each case. While CPS data can be broken down to provide monthly estimates, it is recommended that annual estimates be used whenever possible. Estimates should be formatted in thousands. Any estimates below 1,000 are considered to be unstable and should be avoided.

Weights

Each record in the CPS dataset represents a portion of the national labor force as indicated by its weight. Three weights in the CPS dataset are applicable to the selected population and variables within this query system. Each weight has restrictions in application and produces slightly different estimates. For this query system, an algorithm was designed to designate the preferred weight noted in the Advance Options section of the query screen. Note that expanding the Advance Options section allows the user to choose an alternate weight if there is another applicable weight based on the query parameters. Complete details regarding the available weights are available in Chapter 2 in Technical Paper 77: Design and Methodology. The following is provided to give the user a brief overview of the three weights:

  1. Final weight (1980-1988 FWT; 1989-1993 FNLWGT) and second stage weight (1994-present PWSSWGT). This is the basic weight for the population. It is adjusted to control the sample estimates for several geographic and demographic subgroups of the population. This ensures that the sample-based estimates match independent population controls in the various geographic and demographic groups. It is the only weight inclusive of persons 15 years old or younger. It is also the only final population weight on the publicly available microdata files prior to 1998.
  2. Composited final weight (1998-present PWCMPWGT). This is the weight used by BLS to compute the estimates found in the BLS publication, Employment and Earnings. Estimates calculated with the composited final weight are more stable as this weight is a percentage of the second stage weight from the current survey month plus a percentage of the second stage weight from the previous month. This weight is only applicable to persons 16 years old or older. It was not publicly available on the CPS microdata files until 1998.
  3. Outgoing rotation weight (1994-present PWORWGT). This weight is used for selected population estimates involving second job industry or occupation, which are collected only during the outgoing rotation (month-in-sample = 4 or 8). In this query system, it is applied to all queries specific to second job industry and occupation. This weight is only applicable to persons 16 years old or older. It was not publicly available on the CPS microdata files until 1994.
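The restrictions listed above can be expressed as a small selection routine. The sketch below is hypothetical and is not the algorithm used by the query system, which is not published here: the function names, the parameters, and the choice to prefer the composited weight whenever it is applicable are all assumptions made for illustration. Only the variable names and year ranges come from the overview above.

```python
def second_stage_weight_name(year):
    """Name of the final / second stage weight variable for a data year."""
    if 1980 <= year <= 1988:
        return "FWT"
    if 1989 <= year <= 1993:
        return "FNLWGT"
    if year >= 1994:
        return "PWSSWGT"
    raise ValueError("no weight is listed for years before 1980")


def preferred_weight(year, include_15_year_olds=False, second_job_query=False):
    """Hypothetical preference order; assumed, not the query system's rule."""
    if second_job_query:
        # The outgoing rotation weight covers ages 16+ and starts in 1994.
        if year < 1994 or include_15_year_olds:
            raise ValueError("outgoing rotation weight is not applicable")
        return "PWORWGT"
    if year >= 1998 and not include_15_year_olds:
        # Assumption: prefer the composited weight when it is applicable.
        return "PWCMPWGT"
    return second_stage_weight_name(year)


print(preferred_weight(2005))
print(preferred_weight(2005, include_15_year_olds=True))
print(preferred_weight(2010, second_job_query=True))
```

Raising an error for inapplicable combinations, rather than silently falling back to another weight, keeps a mismatched query visible to the caller.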

Calculation of number of worker estimates

To produce monthly number of worker estimates, the designated weights for all records in the specified month(s) are summed. To produce annual number of worker estimates, the designated weights for all records in the specified year(s) are divided by 12 and then summed.
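A minimal sketch of both calculations follows. It assumes each record is a dictionary holding HRYEAR4, HRMONTH, and a weight that is already scaled to persons; the function names and the sample records are invented for illustration.

```python
from collections import defaultdict


def monthly_worker_estimates(records, weight="PWCMPWGT"):
    """Sum the designated weight over all records in each (year, month)."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["HRYEAR4"], r["HRMONTH"])] += r[weight]
    return dict(totals)


def annual_worker_estimates(records, weight="PWCMPWGT"):
    """Divide each record's weight by 12, then sum within each year."""
    totals = defaultdict(float)
    for r in records:
        totals[r["HRYEAR4"]] += r[weight] / 12
    return dict(totals)


# Invented data: two records per month for one year.
records = []
for month in range(1, 13):
    records.append({"HRYEAR4": 2020, "HRMONTH": month, "PWCMPWGT": 1200.0})
    records.append({"HRYEAR4": 2020, "HRMONTH": month, "PWCMPWGT": 2400.0})

monthly = monthly_worker_estimates(records)
annual = annual_worker_estimates(records)
print(monthly[(2020, 1)], annual[2020])
```

Dividing by 12 before summing makes the annual estimate the average of the twelve monthly estimates, so it stays on the same scale as a single month.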

Calculation of full-time equivalents (FTE) (Primary job) estimates

Primary job FTE are defined as working 40 hours per week for 50 weeks per year in their primary job. The following formula is used to calculate Primary job FTE:

  • Primary job FTE = ((Primary job hours worked * 52) *(weight/12))/(40 * 50)

    (Where primary job hours = PEHRACT1 and weight = PWCMPWGT or PWSSWGT)

Hours worked with a value in excess of 168 are set to 168 for FTE calculations. Negative values are set to zero.

Monthly FTE estimates are calculated by grouping the data by month and using the FTE formula above. FTE estimates by month assume that all months have the same number of days.
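The formula and the hours adjustments above can be sketched as follows. The function names and sample records are invented, and the weight is assumed to be already scaled to persons.

```python
def clamp_hours(hours):
    """Set negative hours to zero and cap hours at 168 per week."""
    return min(max(hours, 0), 168)


def record_fte(hours, weight):
    """FTE contribution of one record: (hours * 52) * (weight / 12) / (40 * 50)."""
    return (clamp_hours(hours) * 52) * (weight / 12) / (40 * 50)


def primary_job_fte(records, weight="PWCMPWGT"):
    """Sum the primary job FTE contributions over a set of records."""
    return sum(record_fte(r["PEHRACT1"], r[weight]) for r in records)


# Invented records for illustration only.
records = [
    {"PEHRACT1": 40, "PWCMPWGT": 1200.0},   # 40 hours per week
    {"PEHRACT1": -1, "PWCMPWGT": 1200.0},   # negative value, treated as 0 hours
    {"PEHRACT1": 200, "PWCMPWGT": 1200.0},  # capped at 168 hours
]

print(record_fte(40, 1200.0))
print(primary_job_fte(records))
```

Grouping the same per-record contributions by HRMONTH instead of summing them all would give the monthly breakdown described above.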

Calculation of full-time equivalents (FTE) (Secondary job) estimates

Secondary job FTE are defined as working 40 hours per week for 50 weeks per year in their secondary job. The other job hours worked variable used in this calculation represents hours for all jobs other than the primary job. To restrict results to second job only, secondary job FTE calculations are limited to workers with exactly two jobs (PEMJNUM = 2). Consequently, the results estimate the lower bound for workers with two or more jobs. The following formula is used to calculate secondary FTE:

  • Secondary job FTE = ((Secondary job hours worked * 52) *(weight/12))/(40 * 50)

    (Where other job hours = PEHRACT2 and weight = PWCMPWGT or PWSSWGT)

Hours worked with a value in excess of 168 are set to 168 for FTE calculations. Negative values are set to zero. Monthly FTE estimates are calculated by grouping the data by month and using the FTE formula above. FTE estimates by month assume that all months have the same number of days.
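The restriction to workers with exactly two jobs can be sketched as a filter applied before the FTE formula. The function name and sample records are invented for illustration, and the weight is assumed to be already scaled to persons.

```python
def secondary_job_fte(records, weight="PWCMPWGT"):
    """Sum secondary job FTE over records with exactly two jobs (PEMJNUM == 2)."""
    total = 0.0
    for r in records:
        if r["PEMJNUM"] != 2:
            # Workers with one job, or with three or more jobs, are excluded.
            continue
        hours = min(max(r["PEHRACT2"], 0), 168)  # negative -> 0, cap at 168
        total += (hours * 52) * (r[weight] / 12) / (40 * 50)
    return total


# Invented records for illustration only.
records = [
    {"PEMJNUM": 2, "PEHRACT2": 20, "PWCMPWGT": 1200.0},   # counted
    {"PEMJNUM": 3, "PEHRACT2": 30, "PWCMPWGT": 1200.0},   # three jobs: excluded
    {"PEMJNUM": -1, "PEHRACT2": -1, "PWCMPWGT": 1200.0},  # no second job: excluded
]

print(secondary_job_fte(records))
```

Because the record with three jobs is dropped even though it reports other-job hours, the total understates hours worked outside the primary job, which is why the result is described above as a lower bound.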

Calculation of full-time equivalents (FTE) (All jobs) estimates

All jobs FTE are defined as working 40 hours per week for 50 weeks per year, in all jobs. Because of the use of all hours worked in all jobs, All jobs FTE cannot be linked to specific jobs, occupations, or industries. The following formula is used to calculate All jobs FTE:

  • All jobs FTE = ((All job hours worked * 52) * (weight/12))/(40 * 50)

    (Where all job hours worked = PEHRACTT and weight = PWCMPWGT or PWSSWGT)

Hours worked with a value in excess of 168 are set to 168 for FTE calculations. Negative values are set to zero.

Monthly FTE estimates are calculated by grouping the data by month and using the FTE formula above. FTE estimates by month assume that all months have the same number of days.

Standard error calculations

Approximate standard errors are calculated for CPS employment estimates by using specified parameters in generalized variance functions. For CPS data through 2014, guidance for calculating standard error estimates may be found in the "Reliability of the estimates" section of the Household Data Technical Notes (Employment and Earnings, February 2006, pages 193-200). For CPS data from 2015 onward, refer to the BLS document titled "Calculating Approximate Standard Errors and Confidence Intervals for Current Population Survey Estimates" and the associated Excel tables titled "Parameters and factors for calculating standard errors for estimates from the Current Population Survey".

The approximate standard error se(x) of x, an estimated monthly level, can be obtained using the formula below, where a and b, parameters from the BLS documents above, are associated with a particular characteristic and c is an adjustment factor when x is an average of several monthly levels (e.g., quarterly or annually).

Approximate standard error formula: se(x) = c * sqrt(a * x^2 + b * x)

Parameters a and b for a specific year are first chosen based on the appropriate employment characteristics. When multiple characteristics may apply for specific subsets of sex and/or age groups, it is recommended that parameters be chosen to provide the most conservative error estimate (i.e., the largest error estimate). Second, the adjustment factor c is chosen in a similar fashion based on the specific time period. For single monthly estimates or any estimates of non-contiguous months, the adjustment factor c equals 1. For quarterly (any three contiguous months) or annual estimates, the standard error is reduced by multiplying the monthly standard error by the appropriate adjustment factor (which is less than 1).
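The calculation can be sketched as below, assuming the generalized variance function takes the form se(x) = c * sqrt(a * x^2 + b * x) given in the BLS documents cited above. The parameter values in the example are placeholders invented for illustration; they are not published BLS parameters, and real values must be taken from the BLS tables for the relevant year, characteristic, and time period.

```python
import math


def approximate_se(x, a, b, c=1.0):
    """Approximate standard error of an estimated level x.

    a, b: generalized variance parameters for the characteristic.
    c: adjustment factor (1 for a single month, less than 1 for averages).
    """
    radicand = a * x ** 2 + b * x
    if radicand < 0:
        raise ValueError("a*x^2 + b*x is negative; check the parameters")
    return c * math.sqrt(radicand)


# Placeholder values for illustration only (NOT published BLS parameters).
A_DEMO = -0.00001
B_DEMO = 3000.0
C_ANNUAL_DEMO = 0.5

estimate = 2_000_000
monthly_se = approximate_se(estimate, A_DEMO, B_DEMO)
annual_se = approximate_se(estimate, A_DEMO, B_DEMO, c=C_ANNUAL_DEMO)
print(monthly_se, annual_se)
```

When several parameter sets could apply, evaluating the function for each and keeping the largest result follows the conservative choice recommended above.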

Data Elements and Reports

Data elements

Data elements and their respective size, description, location, and variable responses can be found in the data dictionaries stored at https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html. Brief descriptions of the variables, variable names as listed in the most recent data file, and groupings unique to this query system are listed below.

  • Year of Interview (HRYEAR4)

    Identifies the year in which the survey was completed.

  • Month of Interview (HRMONTH)

    Identifies the month in which the survey was completed.

  • Region(s) (GEREG)

    Identifies the region of the country in which the worker resides. There are four response choices for this variable (Northeast, Midwest, South, West). The states included in each of these regions are as follows:

    • Northeast: CT, ME, MA, NH, NJ, NY, PA, RI, VT
    • Midwest: IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI
    • South: AL, AR, DE, DC, FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV
    • West: AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY
  • State(s) (GESTFIPS)

    Identifies the U.S. state in which the worker resides. This is coded according to Federal Information Processing Standards (FIPS) codes that are listed in the data dictionaries stored at https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html.

  • Include 15 yr olds

    Includes 15 year-old participants. These participants are not typically included in reports produced by BLS. Inclusion of this age group will limit the weight selection to the final weight (PWSSWGT). In addition, analyses cannot be done on industry, occupation, or class of worker for secondary job.

  • Age group(s) (5 year and 10 year) (PRTAGE)

    Collapsed age groupings created within this query system based on the worker's age at the time of the survey. Breakdowns of adolescent age and young adult age groups (15-17; 18-19) are available in the 5 year groupings.

  • Sex (PESEX)

    Defines whether the worker is male or female.

  • Education (PEEDUCA)

    Identifies the education level of the worker. The response choices in this variable changed in 1992. Data collected prior to 1992 are not comparable with data collected in 1992 and after. A slight change in variable meaning is noted in 1994 as the values were written to reflect the highest level of school completed. Prior to 1994, the values in this variable reflect the highest grade attended.

  • Race (PTDTRACE)

    Classifies workers with similar biological, social, and cultural heritage into specified race groups. There have been three changes to the variable response choices within the race variable since 1980:

    • In 1989, the response choices of American Indian/Aleutian Eskimo and Asian/Pacific Islander were added to the already existing White, Black, and Other choices.
    • In 1994, the Other choice was eliminated.
    • In 2003, the response choices were changed again to comply with new federal data standards for race. This change allowed respondents to choose up to five races. As a result of this change, published labor force estimates for single-race groups will be smaller than those of previous years as they no longer include persons reporting multiple races. Consequently, these numbers may not be comparable to previous years.
  • Hispanic (PEHSPNON)

    Defines whether or not workers are Hispanic. This dichotomous variable is available beginning with data collected in 1994.

  • Origin (PRDTHSP)

    Classifies workers of Hispanic origin into more detailed origin groups. The origin variable is available beginning with data collected in 1989. In 2003, the response choices were changed to comply with new standards for maintaining, collecting, and presenting federal data on ethnicity. In addition to updating response choices, CPS began asking the ethnicity questions prior to asking the questions about race rather than inferring ethnicity from the country of origin. In 2014, CPS expanded the number of response options for this variable. However, several of the new options could not be mapped to those of prior years. Thus, to avoid a break in series in ELF, all Hispanic origin data responses have been collapsed into the categories of Mexican, Puerto Rican, Cuban, and Other Spanish. Use the original CPS microdata files if you wish to look at this variable in greater detail.

  • Self-nativity (PENATVTY)

    Identifies the worker's country of birth. This variable is available beginning with data collected in 1994. Not all countries were coded consistently since 1994. If a country was not coded for all data years, there are parentheses after the country name indicating the time frame during which the country code was used. When interpreting data results for this variable, you must be cautious and recognize that results are not restricted by timeframe. For instance, if you query data years 2000-2007 by nativity, it will include data for country codes that were not available until 2007. However, the aggregate 2000-2007 estimate for those select countries will only reflect one year of data (i.e., 2007). You may want to use the Time Period parameters to limit your query in order to match the timeframes of select countries.

  • Mother's nativity (PEMNTVTY)

    Identifies the worker's mother's country of birth. This variable is available beginning with data collected in 1994. Not all countries were coded consistently since 1994. If a country was not coded for all data years, there are parentheses after the country name indicating the time frame during which the country code was used. When interpreting data results for this variable, you must be cautious and recognize that results are not restricted by timeframe. For instance, if you query data years 2000-2007 by mother’s nativity, it will include data for country codes that were not available until 2007. However, the aggregate 2000-2007 estimate for those select countries will only reflect one year of data (i.e., 2007). You may want to use the Time Period parameters to limit your query in order to match the timeframes of select countries.

  • Father's nativity (PEFNTVTY)

    Identifies the worker's father's country of birth. This variable is available beginning with data collected in 1994. Not all countries were coded consistently since 1994. If a country was not coded for all data years, there are parentheses after the country name indicating the time frame during which the country code was used. When interpreting data results for this variable, you must be cautious and recognize that results are not restricted by timeframe. For instance, if you query data years 2000-2007 by father’s nativity, it will include data for country codes that were not available until 2007. However, the aggregate 2000-2007 estimate for those select countries will only reflect one year of data (i.e., 2007). You may want to use the Time Period parameters to limit your query in order to match the timeframes of select countries.

  • Citizenship status (PRCITSHP)

    Identifies whether the worker is native or foreign born. If they are foreign born, identifies whether they have become a citizen of the United States. This variable is available beginning with data collected in 1994.

  • Number of jobs (PEMJNUM)

    Identifies how many jobs the worker holds. This question is only asked if the worker is identified as holding more than one job. This variable is available beginning with data collected in 1994.

  • Hours worked in primary job (PEHRACT1)

    Records how many hours the participant worked in one week in their designated primary job. These data have been grouped to facilitate analysis. The detailed groups of hours worked reflect the groupings used in Employment and Earnings, a BLS publication of data output from the CPS. The broad groups of hours worked reflect groupings found in other BLS publications.

  • Hours worked in all other jobs (PEHRACT2)

    Records how many hours the participant worked in one week in all jobs other than their primary job. These data have been grouped to facilitate analysis. The detailed groups of hours worked reflect the groupings used in Employment and Earnings, a BLS publication of data output from the CPS. The broad groups of hours worked reflect groupings found in other BLS publications. This variable is available beginning with data collected in 1994.

  • Total Hours Worked in All Jobs (PEHRACTT)

    Sums the number of hours the participant worked in one week in their primary job and the number of hours the participant worked in one week in all jobs other than their primary job. These data have been grouped to facilitate analysis. The detailed groups of hours worked reflect the groupings used in Employment and Earnings, a BLS publication of data output from the CPS. The broad groups of hours worked reflect groupings found in other BLS publications. This variable is available beginning with data collected in 1994.

  • Labor force status (PEMLR)

    Identifies whether the worker was employed and present at work or employed but absent from work for the work week referenced during the survey. While there are additional response options for this variable in the original CPS microdata files, the data used for this query system are subset by this variable to capture only those who are employed at the time of the survey.

  • Class of worker – primary job (PEIO1COW)

    Identifies whether the worker’s primary job is in the government or private industry or if they are self-employed. These broad groupings are available beginning in 1980 and allow comparisons across all years. More detailed groupings are available beginning in 1989, distinguishing the types of government (i.e., local, state, or federal). In 1994, the variable was updated to distinguish between private for profit businesses and private nonprofit businesses.

  • Class of worker – secondary job (PEIO2COW)

    Provides detailed groups indicating whether the worker’s second job is in government or private industry or if they are self-employed. This variable is available beginning in 1994.

  • Covered by a union (PEERNCOV)

    Indicates whether the worker was covered by a union or an employee association contract. This variable is available beginning in 1994.

  • Member of a union (PEERNLAB)

    Indicates whether the participant was a member of a labor union or an employee association similar to a union. This variable is available beginning in 1994.

  • Industry code - Primary job (PEIO1ICD)

    Identifies the industry in which the worker's primary job is held. This variable is available beginning with data collected in 1983 and uses the Bureau of Census (BOC) industry codes to identify the specific industry.

    While the transition from 1980 to 1990 BOC industry codes was considered minor and it is possible to crosswalk the data, the transition to the 2002 BOC industry codes involved significant changes. Thus, industry data from 2003 onward are not comparable to those from prior years. The transitions from 2002 BOC codes to 2007 BOC codes to 2012 BOC codes were minor and can be crosswalked. Details for crosswalking the BOC industry codes can be found on the U.S. Census Bureau website. The transition from the 2012 BOC codes to the 2017 BOC codes involved significant changes that prohibit the development of a crosswalk for all industries. For details on these changes, refer to the 2017 BOC industry codes document.

    In addition to code changes, select industry codes were collapsed as determined by results of the Economic Census. These codes were collapsed beginning in May 2012. While collapsed codes can be seen in the industry code trees, codes that have been collapsed cannot be selected for query.

  • Industry code - Secondary job (PEIO2ICD)

    Identifies the industry in which the worker's secondary job is held. This variable is available beginning with data collected in 1994 and uses the Bureau of Census (BOC) industry codes to identify the specific industry.

    The transition to the 2002 BOC industry codes involved significant changes. Thus, industry data from 2003 onward are not comparable to those from prior years. The transitions from 2002 BOC codes to 2007 BOC codes to 2012 BOC codes were minor and can be crosswalked. Details for crosswalking the BOC industry codes can be found on the U.S. Census Bureau website. The transition from the 2012 BOC codes to the 2017 BOC codes involved significant changes that prohibit the development of a crosswalk for all industries. For details on these changes, refer to the 2017 BOC industry codes document.

    In addition to code changes, select industry codes were collapsed as determined by results of the Economic Census. These codes were collapsed beginning in May 2012. While collapsed codes can be seen in the industry code trees, codes that have been collapsed cannot be selected for query.

  • Occupation code - Primary job (PEIO1OCD)

    Identifies the occupation describing the worker's primary job. This variable is available beginning with data collected in 1983 and uses the Bureau of Census (BOC) occupation codes to identify the specific occupation.

    While the transition from 1980 to 1990 BOC occupation codes was considered minor and it is possible to crosswalk the data, the transition to the 2002 BOC occupation codes involved significant changes. Thus, occupation data from 2003 onward are not comparable to those from prior years. A crosswalk for the 2010 BOC occupation codes to earlier years can be found on the U.S. Census Bureau website. The transition from the 2010 BOC codes to the 2018 BOC codes also involved significant changes. For details on these changes, refer to the 2018 BOC occupation codes document.

    In addition to code changes, occupation codes were collapsed. In January of 2012, the CPS disclosure board mandated that occupations with fewer than 10,000 persons based on results from the American Community Survey should be collapsed. Occupation codes not meeting the threshold were collapsed as of May 2012. While collapsed codes can be seen in the occupation code trees, codes that have been collapsed cannot be selected for query.

  • Occupation code - Secondary job (PEIO2OCD)

    Identifies the occupation describing the worker's secondary job. This variable is available beginning with data collected in 1994 and uses the Bureau of Census (BOC) occupation codes to identify the specific occupation.

    The transition to the 2002 BOC occupation codes involved significant changes. Thus, occupation data from 2003 onward are not comparable to those from prior years. A crosswalk for the 2010 BOC occupation codes to earlier years can be found on the U.S. Census Bureau website. The transition from the 2010 BOC codes to the 2018 BOC codes also involved significant changes. For details on these changes, refer to the 2018 BOC occupation codes document.

    In addition to code changes, occupation codes were collapsed. In January of 2012, the CPS disclosure board mandated that occupations with fewer than 10,000 persons based on results from the American Community Survey should be collapsed. Occupation codes not meeting the threshold were collapsed as of May 2012. While collapsed codes can be seen in the occupation code trees, codes that have been collapsed cannot be selected for query.

Data reports

Data results can be output in the form of number of workers, primary job full-time equivalents (FTE), secondary job full-time equivalents (FTE), or all jobs full-time equivalents (FTE). Each of these output forms is explained below.

  • Number of workers

    Selecting to output counts produces national estimates detailing the total number(s) of employees, regardless of the number of hours worked. Counts are calculated by summing the weighted number of employees.

  • Primary job full-time equivalents (FTE)

    Selecting to output primary job FTE produces national estimates detailing the number of workers working 40 hours per week for 50 weeks in their primary job. Primary job FTE are calculated using the following formula:

    • Primary job FTE = ((primary job hours worked * 52) * (weight/12))/(40 * 50)
  • Secondary job full-time equivalents (FTE)

    Selecting to output secondary FTE produces national estimates detailing the number of workers working 40 hours per week for 50 weeks in their secondary job. To restrict results to second job only, secondary job FTE calculations are limited to workers with exactly two jobs (PEMJNUM = 2). Consequently, the results estimate the lower bound for workers with two or more jobs. Secondary job FTE are calculated using the following formula:

    • Secondary job FTE = ((all other job hours worked * 52) * (weight/12))/(40 * 50)
  • All jobs full-time equivalents (FTE)

    Selecting to output all jobs FTE produces national estimates detailing the number of workers working 40 hours per week for 50 weeks in all jobs. Due to the use of all hours worked, all jobs FTE cannot be linked to specific jobs, occupations, or industries. All jobs FTE are calculated using the following formula:

    • All jobs FTE = ((total job hours worked * 52) * (weight/12))/(40 * 50)

BLS reporting recommendations

The following reporting recommendations have been made by BLS:

  1. When possible, report annual estimates rather than monthly estimates.
  2. Any estimates less than 1,000 are considered unstable and should be avoided.
  3. Estimates should be formatted in thousands.
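The second and third recommendations can be applied when preparing output, as in the sketch below. The function name is invented, and returning None for an unstable estimate is an arbitrary choice made for this illustration rather than BLS or NIOSH practice.

```python
UNSTABLE_THRESHOLD = 1000


def format_estimate(estimate):
    """Return the estimate in thousands as text, or None if it is below 1,000."""
    if estimate < UNSTABLE_THRESHOLD:
        # Estimates under 1,000 are considered unstable and are not reported.
        return None
    return f"{estimate / 1000:,.0f}"


print(format_estimate(1_234_567))  # reported in thousands
print(format_estimate(999))        # unstable, not reported
```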

Disclaimer

Although NIOSH extends considerable effort to ensure reasonable data quality for Employed Labor Force (ELF) estimates, there are no warranties expressed or implied with these data. The underlying data for queries are subject to change without notice as errors, inconsistencies, or other data issues arise. The objective of the ELF query system is to provide NIOSH-level access to national employment data for scientific purposes, including computation of denominator data and exploration of various subsets of employed persons for inclusion in reports, manuscripts, project proposals, etc.

Contact information

The NIOSH subset of the CPS data is compiled and maintained by Suzanne Marsh in the Division of Safety Research. Should you have questions about these data or the query system or should you want to request the annual NIOSH subset of CPS data files, please e-mail her at smm2@cdc.gov.

BLS also offers help in understanding and interpreting the CPS data. To contact their helpdesk, send questions electronically to BLS.

Page last reviewed: December 9, 2023
Content source: National Institute for Occupational Safety and Health (NIOSH) Division of Safety Research