Differences between RIF, LDS, and PUF Data Files

Purpose

CMS offers data files that range from aggregate data to individual, person-level data. This article describes the differences between the aggregate public use files, the limited data sets, and the research identifiable files.

Current Version Date:
08/10/2016

Public Use Files (PUFs), also called Non-Identifiable Data Files, have been edited and stripped of all information that could be used to identify individuals. In general, the PUFs contain aggregate-level information. The CMS Public Use Files for Researcher Use page on the ResDAC website provides resources and information about PUFs.

Limited Data Set (LDS) files also contain beneficiary-level protected health information, similar to the Research Identifiable Files (RIFs) described below. In fact, many of the RIFs have an LDS equivalent. LDS files are considered identifiable because of the potential to re-identify a beneficiary. The difference between RIF and LDS files is that selected variables within the LDS files are blanked or ranged. LDS requests require a Data Use Agreement (DUA) but do not go through a Privacy Board review. LDS files are available as a 100% file or a 5% random sample file. The DUA-Limited Data Sets (LDS) page on the CMS website describes the ways in which the LDS files may be used.

Research Identifiable Files (RIFs) contain beneficiary-level protected health information (PHI). Requests for RIF data require a Data Use Agreement (DUA) and are reviewed by CMS’s Privacy Board to ensure that the beneficiary’s privacy is protected and that only the minimum data necessary are requested and justified. The Identifiable Data Files page of the CMS website provides information about the release of these data.

| | Public Use File | Limited Data Sets | Research Identifiable |
|---|---|---|---|
| Requires Privacy Board Review? | No | No | Yes |
| Requires a Data Use Agreement? | No | Yes | Yes |
| Files include beneficiary-level data? | No | Yes | Yes |
| Researchers may request customized cohorts (e.g. Diabetics residing in MN)? | No | No | Yes |
| Data can be linked at beneficiary level to non-CMS data using a beneficiary identifier? | No | No | Yes[1] |
| Claim run off period[2] | NA | Annual file: 6-month run off; Quarterly file: 3-month run off | Annual file: 12-month run off; Quarterly file: 3-month run off |
Table 1. Overview of file differences by privacy level
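
As an illustration only, Table 1 can be encoded as a small lookup that picks the least restricted file type covering a project's needs. The names `TIERS`, `CAPABILITIES`, `REQUIREMENTS`, and `least_restricted_tier` are hypothetical and are not part of any CMS or ResDAC tool; the sketch assumes that the table's Yes/No values are the only inputs to the decision.

```python
# Table 1 encoded as lookups. Tier order runs from least to most restricted.
# All names here are illustrative assumptions, not CMS or ResDAC identifiers.
TIERS = ["PUF", "LDS", "RIF"]

CAPABILITIES = {
    "PUF": {"beneficiary_level": False, "custom_cohort": False, "external_linkage": False},
    "LDS": {"beneficiary_level": True, "custom_cohort": False, "external_linkage": False},
    # Per footnote [1], external linkage with RIFs still requires CMS approval.
    "RIF": {"beneficiary_level": True, "custom_cohort": True, "external_linkage": True},
}

REQUIREMENTS = {
    "PUF": {"dua": False, "privacy_board": False},
    "LDS": {"dua": True, "privacy_board": False},
    "RIF": {"dua": True, "privacy_board": True},
}


def least_restricted_tier(**needs):
    """Return the first tier, least restricted first, that covers every need set to True."""
    for tier in TIERS:
        caps = CAPABILITIES[tier]
        if all(caps[name] for name, wanted in needs.items() if wanted):
            return tier
    raise ValueError("no tier covers the requested needs")


tier = least_restricted_tier(beneficiary_level=True, custom_cohort=True)
print(tier, REQUIREMENTS[tier])
```

A project that needs beneficiary-level data but no customized cohort or outside linkage resolves to the LDS tier, which requires a DUA but no Privacy Board review.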

The RIF and LDS files both contain beneficiary-level data; however, some variables included in the RIF data may be presented differently (ranged or absent) in the LDS counterpart. See Table 2 for the key differences.

| Variable | File | Limited Data Set | Research Identifiable File |
|---|---|---|---|
| Unique Beneficiary Identifier | Claims & Enrollment files | Encrypted identifier | Encrypted identifier |
| Unique Beneficiary Identifier | MedPAR | No identifier | Encrypted identifier |
| Health Insurance Claim (HIC) or Social Security Number (SSN) | Claims & Enrollment files | Not included in file | Included as an add-on with special permission only |
| Dates (MM/DD/YYYY) | Claims files | Included as of CY2010[3] | Included |
| Dates (MM/DD/YYYY) | MedPAR | Quarter and year only | Included |
| Claim from date | Claims files | Not included | Included |
| Claim through date | Claims files | Included | Included |
| Beneficiary Zip Code[4] | Claims & Enrollment files | County and state | Included |
| Beneficiary Zip Code | MedPAR | State only | Included |
| Beneficiary Date of Birth | Claims, MedPAR & Enrollment files | Not included. Age year or age range[5] | Included |
| Date of Death | Enrollment files | Included, for validated dates of death only[6][7] | Included |
| NPI/UPIN for person level provider | Claims files | As of 2013, the real NPI is included[8] | Included |
| NPI/UPIN for person level provider | MedPAR | Not included | Not included |
| Facility provider number[9] | Claims files & MedPAR | Included | Included |
| NPI of the facility | Claims files & MedPAR | Included | Included |
Table 2. Variable differences between RIF and LDS files
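
The date rows of Table 2, together with footnote [3], amount to a small rule for how dates appear in an LDS file. The sketch below restates that rule and shows what coarsening a full date to quarter and year looks like. The function names and the `(quarter, year)` tuple are assumptions made for illustration; the actual field layout in CMS files may differ.

```python
from datetime import date


def lds_date_detail(file_type: str, file_year: int) -> str:
    """Describe how dates appear in an LDS file, per Table 2 and footnote [3]."""
    if file_type == "medpar":
        return "quarter and year"
    if file_type == "claims":
        if file_year >= 2010:
            return "full dates"
        if file_year == 2009:
            return "full dates in a separate file"
        return "quarter and year"
    raise ValueError(f"unknown file type: {file_type!r}")


def to_quarter_year(d: date) -> tuple[int, int]:
    """Coarsen a full date to (calendar quarter, year); the output format is assumed."""
    return ((d.month - 1) // 3 + 1, d.year)


print(lds_date_detail("claims", 2008))
print(to_quarter_year(date(2008, 5, 17)))
```
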

Appendix

NPI/UPINs are encrypted in the LDS files from 1999-2012. A free crosswalk is available on request. The crosswalk maps encrypted UPIN and NPI numbers to their unencrypted values back to 1999.

Please see the "Add LDS Files to an Existing LDS DUA" section of the Limited Data Set (LDS) Files page to order this file. 
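
Once the crosswalk is in hand, applying it is a simple lookup from encrypted to unencrypted identifier. The following is a minimal sketch under stated assumptions: the column names `encrypted_id` and `npi`, the claim field `provider_id`, and all sample values are placeholders invented for illustration, since the actual record layout is not reproduced in this article.

```python
import csv
import io

# Hypothetical crosswalk extract. Real column names and layout may differ,
# and the identifier values below are placeholders, not real NPIs.
CROSSWALK_CSV = """encrypted_id,npi
ENC0001,1111111111
ENC0002,2222222222
"""


def load_crosswalk(text):
    """Build an encrypted-ID to unencrypted-ID lookup from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["encrypted_id"]: row["npi"] for row in reader}


def resolve_provider_ids(claims, crosswalk, field="provider_id"):
    """Return copies of the claim records with a 'provider_npi' key added.

    IDs missing from the crosswalk map to None instead of raising.
    """
    return [{**claim, "provider_npi": crosswalk.get(claim[field])} for claim in claims]


claims = [
    {"claim_id": "A", "provider_id": "ENC0002"},
    {"claim_id": "B", "provider_id": "ENC9999"},  # not in the crosswalk
]
resolved = resolve_provider_ids(claims, load_crosswalk(CROSSWALK_CSV))
for record in resolved:
    print(record)
```

Leaving unmatched identifiers as `None` keeps every claim in the output, so gaps in crosswalk coverage can be counted rather than silently dropped.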

[1] The inclusion of patient identifiers linkable to outside data requires CMS approval. Without this approval, the RIF patient identifiers are not linkable to outside data.

[2] More detailed information about the runoff periods and availability are found in the articles, "RIF Medicare Quarterly Data" and "Medicare Limited Data Set (LDS) Quarterly Claims and Enrollment Data".

[3] LDS files include dates as of 2010. For 2009 files, CMS provides the dates as a separate file. Prior to 2009, the files present dates as a quarter and year.

[4] The Medicare Current Beneficiary Survey (MCBS) LDS and Health Outcomes Survey (HOS) LDS files contain zip code, date of birth, and date of death.

[5] See footnote 4 above.

[6] See footnote 4 above.

[7] Based on a ResDAC analysis of the 2012 RIF Master Beneficiary Summary file, 4% of the death dates are not validated.

[8] NPI/UPINs are encrypted in LDS data files from 1999-2012. A free crosswalk file is available on request to identify individual providers back to 1999. See the Appendix for ordering information.

[9] The facility provider number is also called the CMS Certification Number (CCN) or the Medicare provider number and identifies the institutional facility.