A new, individual-level mortality data set built from Social Security Administration mortality data

The CenSoc team at UC Berkeley is excited to announce the release of the Berkeley Unified Numident Mortality Dataset (BUNMD). The BUNMD is a microlevel dataset providing researchers access to over 49 million US mortality records, including nearly complete coverage of deaths of individuals aged 65 and older from 1988 to 2005. The demographic covariates and fine geographic detail allow for high-resolution mortality research.

The BUNMD is a very large, stand-alone dataset. It is being released along with two other datasets that can be linked to the rich covariates in the 1940 Census. (For details and downloads, go to http://censoc.berkeley.edu.)

BUNMD Dataset

The National Archives’ 2019 release of the Social Security Numident Records created a new administrative data resource for researchers studying mortality. We purchased copies of this public data and are now releasing a publicly available dataset. The data, originally in dozens of files of different types, has been cleaned and unified into a single file with one record per person, allowing researchers to jump right in and begin analysis. Details of how the data was processed are available in the BUNMD working paper by Joshua R. Goldstein and Casey Breen.

The BUNMD includes several demographic covariates: 

  • Sex
  • Race
  • Place of Birth
  • ZIP Code of residence at time of death
  • State where Social Security Card was issued
  • Individual identifiers are also available

For a complete list of variables, please see the BUNMD codebook.

Mortality Estimation 

The BUNMD has nearly complete death coverage of individuals aged 65 and older who died between 1988 and 2005. Weights are provided to match the death counts to those estimated for these ages and years by the Human Mortality Database.

BUNMD death coverage for persons 65+. “Complete cases” refer to records with non-missing sex, race, and place of birth covariates. 
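To make the idea of complete cases and weights concrete, here is a minimal sketch of one way such weights could be built: count the complete-case deaths in each year, age, and sex cell, then scale each cell to a reference count. This is an illustration under stated assumptions, not the procedure documented in the working paper. The field names, the helper functions, and the reference counts (standing in for Human Mortality Database estimates) are all hypothetical.

```python
from collections import Counter

# Hypothetical records; field names are illustrative, not taken from the codebook.
records = [
    {"death_year": 1990, "death_age": 70, "sex": "F", "race": "White", "birthplace": "Ohio"},
    {"death_year": 1990, "death_age": 70, "sex": "F", "race": None, "birthplace": "Texas"},
    {"death_year": 1990, "death_age": 70, "sex": "M", "race": "Black", "birthplace": "Ohio"},
    {"death_year": 1990, "death_age": 70, "sex": "M", "race": "White", "birthplace": None},
    {"death_year": 1990, "death_age": 70, "sex": "M", "race": "White", "birthplace": "Iowa"},
]

# Made-up reference death counts per (year, age, sex) cell, standing in for HMD values.
reference_deaths = {(1990, 70, "F"): 3.0, (1990, 70, "M"): 4.0}


def is_complete(rec):
    """A complete case has non-missing sex, race, and place of birth."""
    return all(rec[key] is not None for key in ("sex", "race", "birthplace"))


def cell(rec):
    """Cell used for weighting: year of death, age at death, sex."""
    return (rec["death_year"], rec["death_age"], rec["sex"])


def cell_weights(recs, reference):
    """Weight for each cell = reference deaths / observed deaths in that cell."""
    counts = Counter(cell(r) for r in recs)
    return {c: reference[c] / n for c, n in counts.items()}


complete = [r for r in records if is_complete(r)]
weights = cell_weights(complete, reference_deaths)

# Summing the weights over complete cases reproduces the reference total.
weighted_total = sum(weights[cell(r)] for r in complete)
print(len(complete), weights, weighted_total)
```

With weights of this kind, a weighted count of records in any cell matches the reference count for that cell, which is the property the weighting is meant to deliver.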

Research possibilities

The BUNMD can be used to explore country-of-birth mortality differentials. Large sample sizes lend enough precision to see interesting patterns by individual country. For example, we can confirm there is an immigrant mortality advantage, with the exception of Irish men, as shown in the figure below.

Mortality differentials by country-of-birth for birth cohorts of 1910-1919. The x-axis gives the age at death of those aged 65+ relative to the native-born, controlling for birth cohort. For example, Mexican-born women had about a 0.7-year longevity advantage over native-born women.
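One simple way to read "controlling for birth cohort" is to compare mean age at death within each birth year and then average those within-cohort gaps. The sketch below does exactly that on made-up numbers. It is not the estimation method behind the figure (which may well be a regression); the tuple layout, the group labels, and the function `cohort_adjusted_gap` are assumptions for illustration.

```python
from collections import defaultdict

# Made-up rows: (birth year, birthplace group, age at death).
deaths = [
    (1910, "native", 80.0), (1910, "native", 82.0), (1910, "foreign", 83.0),
    (1915, "native", 76.0), (1915, "foreign", 77.0), (1915, "foreign", 78.0),
]


def cohort_adjusted_gap(rows, group, reference="native"):
    """Average within-cohort gap in mean age at death (group minus reference).

    Each cohort's gap is weighted by the number of deaths in `group`.
    """
    ages = defaultdict(list)
    for birth_year, place, age in rows:
        ages[(birth_year, place)].append(age)

    gap_sum, n_total = 0.0, 0
    for (birth_year, place), vals in ages.items():
        # Only cohorts where both the group and the reference are observed.
        if place != group or (birth_year, reference) not in ages:
            continue
        ref = ages[(birth_year, reference)]
        gap = sum(vals) / len(vals) - sum(ref) / len(ref)
        gap_sum += gap * len(vals)
        n_total += len(vals)

    if n_total == 0:
        raise ValueError("no birth cohort contains both groups")
    return gap_sum / n_total


gap = cohort_adjusted_gap(deaths, "foreign")
print(round(gap, 3))
```

Comparing within birth year keeps a group that happens to be concentrated in later cohorts from looking shorter-lived simply because its members are observed at younger ages.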

Researchers can also take advantage of the ZIP Code geographic resolution of the BUNMD. The choropleth map below shows e(65), life expectancy conditional on living to age 65, for Cleveland’s Cuyahoga County. These old-age mortality disparities are likely driven by racial segregation.

Difference in life expectancy at age 65 (e65) by ZIP Code in Cuyahoga County for the birth cohorts of 1910-1919.
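As a rough illustration of a ZIP-level comparison, the sketch below computes the weighted mean number of years lived past age 65 in each ZIP Code and subtracts the county-wide mean. This is a crude stand-in, not a life-table estimate of e(65): it ignores the fact that deaths are only observed in a limited window of years, and it is not necessarily how the map was produced. The ZIP Codes, ages, and weights are placeholders.

```python
from collections import defaultdict

# Placeholder rows: (ZIP Code, age at death, record weight).
deaths = [
    ("00001", 75.0, 1.0), ("00001", 85.0, 1.0),
    ("00002", 70.0, 2.0), ("00002", 76.0, 1.0),
]


def mean_years_past_65(rows):
    """Weighted mean of (age at death - 65) over the given rows."""
    total_years = sum(w * (age - 65.0) for _, age, w in rows)
    total_weight = sum(w for _, _, w in rows)
    return total_years / total_weight


# Group records by ZIP Code.
by_zip = defaultdict(list)
for row in deaths:
    by_zip[row[0]].append(row)

county_mean = mean_years_past_65(deaths)

# Difference between each ZIP Code and the county as a whole.
diff_from_county = {
    zip_code: mean_years_past_65(rows) - county_mean
    for zip_code, rows in by_zip.items()
}
print(county_mean, diff_from_county)
```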

The BUNMD’s individual death counts by day allow researchers to study who was hit hardest by the flu, by ZIP Code, race, exact date of birth, and more. The figure below shows the four big US flu seasons at the end of the 1990s.
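A daily series of this kind can be built by tallying records by exact date of death. The sketch below is a minimal example on made-up dates; the list `death_dates` and the function `daily_counts` are hypothetical, and real use would first filter records by ZIP Code, race, or other covariates.

```python
from collections import Counter
from datetime import date, timedelta

# Made-up dates of death, one entry per record.
death_dates = [
    date(1999, 1, 1), date(1999, 1, 1), date(1999, 1, 2),
    date(1999, 1, 4), date(1999, 1, 4), date(1999, 1, 4),
]


def daily_counts(dates):
    """Return (day, deaths) pairs for every day from the first to the last
    observed date, including days with zero deaths."""
    counts = Counter(dates)
    day, end = min(dates), max(dates)
    series = []
    while day <= end:
        series.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return series


series = daily_counts(death_dates)

# Day with the most deaths in the series.
peak_day, peak_count = max(series, key=lambda item: item[1])
print(series, peak_day, peak_count)
```

Filling in zero-count days keeps the series evenly spaced, which matters when smoothing or comparing one season against another.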

For more information, please see the BUNMD working paper. Replication materials are available on GitHub.
