Medicare Bayesian Improved Surname Geocoding (MBISG) Data Now Available

CMS is pleased to announce the availability of a new Research Identifiable File (RIF) that uses the Medicare Bayesian Improved Surname Geocoding (MBISG) algorithm to predict the race and ethnicity of Medicare beneficiaries.

CMS developed the MBISG algorithm to augment existing race and ethnicity data from the Social Security Administration and to produce more accurate indirect estimates of the race and ethnicity of the Medicare beneficiary population. For each beneficiary, the MBISG data include a set of probabilities of membership in each of six racial and ethnic groups: American Indian or Alaska Native (AI/AN), Asian American and Native Hawaiian or Other Pacific Islander (AA and NHPI), Black, Hispanic, Multiracial, and White. MBISG probabilities are based on U.S. Census Bureau data on race and ethnicity distributions by surname and by Census block group, as well as on CMS’s administrative race and ethnicity data and additional administrative elements, including first name, demographics, and coverage characteristics.
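To illustrate the core idea behind surname-and-geography estimates, the sketch below implements the classic BISG Bayes update: a surname-based prior P(race | surname) is multiplied by the likelihood of living in the beneficiary's block group given each race, and the products are normalized. This is an assumption-laden simplification, not the MBISG model itself, which also draws on first name, CMS administrative race and ethnicity data, and other elements. The function name, group labels, and all input numbers are hypothetical and for illustration only.

```python
# Minimal sketch of the classic BISG Bayes update. This is NOT the MBISG
# model, which also uses first name, CMS administrative race/ethnicity data,
# and other elements. All numbers below are made up for illustration.

GROUPS = ["AIAN", "AA_NHPI", "Black", "Hispanic", "Multiracial", "White"]


def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Return P(race | surname, block group) for each group.

    p_race_given_surname: group -> P(race | surname), e.g. from a surname table.
    p_geo_given_race: group -> P(block group | race), i.e. the share of that
        group's population living in the beneficiary's block group.
    """
    # Bayes' rule numerator: surname prior times geographic likelihood.
    joint = {g: p_race_given_surname[g] * p_geo_given_race[g] for g in GROUPS}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("inputs assign zero probability to every group")
    # Normalize so the six posterior probabilities sum to 1.
    return {g: v / total for g, v in joint.items()}


# Hypothetical inputs for one beneficiary.
surname_prior = {"AIAN": 0.01, "AA_NHPI": 0.02, "Black": 0.10,
                 "Hispanic": 0.50, "Multiracial": 0.02, "White": 0.35}
geo_likelihood = {"AIAN": 1e-6, "AA_NHPI": 2e-6, "Black": 1e-6,
                  "Hispanic": 8e-6, "Multiracial": 2e-6, "White": 1e-6}

posterior = bisg_posterior(surname_prior, geo_likelihood)
for group, p in posterior.items():
    print(f"{group}: {p:.3f}")
```

In this made-up example the block group is disproportionately Hispanic, so the Hispanic probability rises from the surname prior of 0.50 to about 0.88. The update treats surname and geography as conditionally independent given race, which is the standard BISG simplification.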

The MBISG data also include a Spanish Preference Category variable, which groups each beneficiary's predicted probability of preferring Spanish-language survey materials into categories.

The MBISG data consist of a single file containing race and ethnicity probabilities for Medicare beneficiaries enrolled on March 1, 2023. This dataset is separate from the Chronic Conditions Data Warehouse (CCW) Master Beneficiary Summary File (MBSF), which is partitioned by calendar year. The MBISG cohort overlaps with the MBSF files but does not exactly match any single MBSF calendar-year dataset.
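Because the March 1, 2023 cohort does not line up exactly with any one MBSF calendar year, researchers linking the two files may want to check how many beneficiaries appear in each. A minimal Python sketch, assuming each file has already been reduced to a list of a common beneficiary identifier; the function name and the toy identifiers are hypothetical:

```python
def cohort_overlap(mbisg_ids, mbsf_ids):
    """Count beneficiaries found in both cohorts or in only one.

    mbisg_ids, mbsf_ids: iterables of a shared beneficiary identifier
    (hypothetical; use whatever linkage key your data files provide).
    """
    mbisg = set(mbisg_ids)
    mbsf = set(mbsf_ids)
    return {
        "in_both": len(mbisg & mbsf),      # linkable beneficiaries
        "mbisg_only": len(mbisg - mbsf),   # in MBISG but not this MBSF year
        "mbsf_only": len(mbsf - mbisg),    # in this MBSF year but not MBISG
    }


# Toy identifiers for illustration only.
print(cohort_overlap(["A1", "A2", "A3"], ["A2", "A3", "A4", "A5"]))
# prints {'in_both': 2, 'mbisg_only': 1, 'mbsf_only': 2}
```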

For more information about the MBISG data, please contact