The Medicare Cost Report data can be downloaded from the CMS website. The table at that link contains the Hospital, Home Health Agency (HHA), and Skilled Nursing Facility (SNF) cost reports dating back to 1996.
You will notice in Figure 1 that hospital cost reports are labelled in the Facility Type column as either “HOSPITAL” or “Hospital-2010.” Cost reports labelled as “HOSPITAL” were submitted on the 2552-96 form, which was used from 1996 until 2010, when CMS switched to the 2552-10 form. The reports submitted on the 2552-10 form will be labelled as “Hospital-2010.”
You can navigate to the cost report of interest using the navigation arrows at the bottom of the table and/or sorting by Fiscal Year or Facility Type (indicated by arrows in Figure 1).
The cost reports for hospitals, as well as other facility types, are formatted as a relational database. Let’s start by taking a look at the cost report data files to better understand what this means.
Start by navigating to the 2015 Hospital-2010 download and clicking on “2015.” A zipped folder will open with 3 CSV files:
- Alpha-numeric file (Hosp10_2015_ALPHA): This file contains data for alphanumeric variables for every cost report submitted in FY 2015.
- Numeric file (Hosp10_2015_NMRC): This file contains data for numeric variables for every cost report submitted in FY 2015.
- Report file (Hosp10_2015_RPT): This file contains 1 record for each cost report and provides a description of the report (e.g., the fiscal year beginning and ending dates, the Medicare Provider Number of the submitting hospital, etc.).
The first 10 lines of the 2015 Hospital Numeric file are displayed in Figure 2.
Note that there are no column headings in the Numeric file; the same is true of the Alpha-numeric and Report files. Documentation for these files can be found in the Downloads section of the Hospital Form 2552-10 website, which contains supplemental reports, SAS datasets, and documentation for understanding the cost reports (Figure 3).
The PDF document labelled “HCRIS_Data_model” contains all of the column names in the order they show up in the data files, as well as a schematic demonstrating how the files are linked together (Figure 4). The “RPT_REC_NUM : NUMBER (PK)” variable is an HCRIS-assigned Report Record Number variable used as the linking variable between each of these files. There is no facility identifier in the Alpha-numeric and Numeric files, so you will need to look up any facilities of interest in the Report File, identify the Report Record Number for the report(s) submitted by that facility, and look up that facility’s cost report data in the other files using their Report Record Number.
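The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows are made up, and the position of the provider number (assumed here to be the third column of the Report file) should be confirmed against the HCRIS_Data_model document before use. The function name `report_numbers_for_provider` is hypothetical.

```python
import csv
import io

# Made-up rows standing in for Hosp10_2015_RPT (the real file has no header row).
# Assumption: column 1 is the Report Record Number and column 3 is the
# Medicare Provider Number; check HCRIS_Data_model for the actual order.
SAMPLE_RPT = """\
500001,2,010001,,1,10/01/2014,09/30/2015
500002,2,010005,,1,01/01/2015,12/31/2015
500003,2,010001,,1,10/01/2015,12/31/2015
"""

def report_numbers_for_provider(rpt_file, provider_number, provider_col=2):
    """Return the Report Record Numbers of every report filed by one provider."""
    reader = csv.reader(rpt_file)
    return [
        row[0]
        for row in reader
        if len(row) > provider_col and row[provider_col] == provider_number
    ]

report_numbers = report_numbers_for_provider(io.StringIO(SAMPLE_RPT), "010001")
print(report_numbers)
```

With a real download, the `io.StringIO(...)` argument would be replaced by an open file handle for the Report file. The returned Report Record Numbers can then be used to select rows in the Alpha-numeric and Numeric files.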
More information about each of the variables listed in the HCRIS_Data_Model can be found in the data dictionary (HCRIS_Data Dictionary), which is stored in the same zipped file.
Note that the Report Record Number is in the first column (Column A in Figure 2) of each data file. In Figure 2, you can see that the Report Record Number is the same for the first 10 rows. This is because each row in the Numeric and Alpha-numeric files represents 1 value from 1 cost report. The first column identifies the report, while the next 3 columns (Worksheet Indicator, Line Number, and Column Number) identify the specific location within the cost report. The last column contains the value of the element. The next section describes the resources needed to determine the Worksheet Indicator, Line Number, and Column Number, which will allow you to look up a specific value (e.g., number of beds) in the cost reports.
The Alpha-numeric and Numeric data files each have five variables:
- Report Record Number
- Worksheet Indicator
- Line Number
- Column Number
- Value of the Variable
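Because the files have no header row, it can help to attach these five names when reading them. The following is a minimal sketch using made-up sample rows; the field names and the `read_values` helper are illustrative choices, not names taken from the CMS documentation. All fields are kept as text so that leading zeros in the Line and Column Numbers are preserved.

```python
import csv
import io
from typing import NamedTuple

class CostReportValue(NamedTuple):
    """One row of the Numeric or Alpha-numeric file (field names are illustrative)."""
    report_record_number: str
    worksheet_indicator: str
    line_number: str
    column_number: str
    value: str

# Made-up rows in the five-column layout described above.
SAMPLE_NMRC = """\
500001,S300001,01400,00200,250
500001,S300001,01400,00300,91250
"""

def read_values(csv_file):
    """Read header-less rows into named records, skipping malformed rows."""
    return [CostReportValue(*row) for row in csv.reader(csv_file) if len(row) == 5]

values = read_values(io.StringIO(SAMPLE_NMRC))
for v in values:
    print(v.report_record_number, v.worksheet_indicator, v.line_number, v.column_number, v.value)
```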
We have already discussed the Report Record Number, which is used to link between the Alpha-numeric, Numeric, and Report Files. The Worksheet, Line, and Column Numbers refer to the position of the data element of interest within the Cost Reports submitted by hospitals. Therefore, in order to identify a specific variable in the files downloaded from the CMS website, the variable must be located within the forms submitted by Hospitals.
The cost report forms submitted by hospitals, as well as other provider types, can be found in the Provider Reimbursement Manual – Part 2. Note that each provider type submits a different cost report form, and for some provider types, the form has changed over time.
Hospitals submitted the 2552-96 form from 1996 through 2010, when they began submitting the 2552-10 form. The 2552-96 form can be found in Chapter 36 of the Provider Reimbursement Manual, while the 2552-10 form can be found in Chapter 40. While the precise manner in which these items are stored can differ between chapters, each chapter contains the following items:
- Blank cost reporting forms: listed as “R25p236F” in Chapter 36; “R8P240f” in Chapter 40
- Instructions for completing cost report: listed as “pr2_36XX” in Chapter 36; “pr2_40” in Chapter 40
- Data specifications: listed as “R25p236S” in Chapter 36; found toward the bottom of “pr2_40” in Chapter 40
Next, open the blank reporting forms for Chapter 40 (R8P240f). Note that this document is 163 pages long, so identifying a specific element can require some searching if you are not certain where to find it. One strategy is to perform keyword searches within the document by pressing “Ctrl + F” on your keyboard.
For example, if you are interested in the number of beds reported by hospitals, try searching for the term “beds.” The first 6 matches within the document are not helpful, but the 7th match locates a variable titled, “No. of Beds,” as observed in Figure 5.
This worksheet contains all 3 pieces of information necessary to identify the number of beds in the cost reports: Worksheet, Column, and Line. The worksheet is identified in the upper right-hand corner of the page, and in this case, is Worksheet S-3 Part 1. The line numbers appear to the left of each line description, while the column numbers can be found below each of the column descriptions (both are indicated by arrows in Figure 5). Therefore, the total number of beds for a given hospital (if you sum all of the units) can be found in line 14, column 2, of Worksheet S-3 Part 1.
Note that if a box in a worksheet is greyed out, that Line/Column/Worksheet combination is not reported by hospitals, and thus will not be found in the Cost Reports downloaded from the CMS website.
At this point, if you are interested in learning more about the data element that you have identified, you can find this information in the instructions document, “pr2_40.” For instance, if you are uncertain of what counts as a “bed” in the cost reports, you can search the instructions document for the specific instructions provided to hospitals for filling out Worksheet S-3, Part 1, Column 2.
Now that we have identified the Worksheet, Line, and Column that “Number of Beds” is reported in, we are nearly ready to look up the value in the Cost Report Data files. However, there are 2 remaining questions to be addressed:
- Will my variable of interest show up in the Numeric or Alpha-numeric file?
- How are the Worksheet, Line and Column Numbers coded in the Cost Report Files?
While it is often obvious whether your element of interest is numeric or alpha-numeric, data specifications exist to help you determine this definitively and, in turn, determine which file the element will show up in. As mentioned above, these data specifications can be found in Chapter 40 of the Provider Reimbursement Manual – Part 2.
In the zipped folder that opens for Chapter 40, open the PDF document titled, “pr2_40.” The data specifications can be found in Section 4095. To find the specifications for “Number of Beds,” navigate to Section 4095, Worksheet S-3, Part 1, Column 2 (Figure 6).
Whether a variable is numeric or alpha-numeric can be determined by looking at the “Usage” column. The number “9” denotes a numeric variable, and “X” denotes an alpha-numeric variable. If the usage column contains a “-9,” this indicates a numeric variable that can be negative. Since “Number of beds” has a “9,” this variable will be found in the Numeric file.
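The rule above is simple enough to express as a small lookup. The sketch below is illustrative only: the function name `file_for_usage` and the "NMRC"/"ALPHA" return labels are assumptions chosen to echo the file names in the download, and only the three usage codes mentioned above are handled.

```python
def file_for_usage(usage):
    """Map a 'Usage' code from the data specifications to the file to search."""
    code = usage.strip().upper()
    if code in ("9", "-9"):
        # "9" is numeric; "-9" is numeric and may be negative.
        return "NMRC"
    if code == "X":
        # "X" is alpha-numeric.
        return "ALPHA"
    raise ValueError(f"Unrecognized usage code: {usage!r}")

# "Number of beds" has usage "9", so it belongs to the Numeric file.
print(file_for_usage("9"))
```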
The specification also indicates which lines are reported for a given column and the field size. (Lines and columns not reported are represented by greyed-out boxes on the cost report forms.)
Now that we know to look for “Number of Beds” in the numeric file, we just need to find how the worksheet, column and line are coded in the data files to identify this element in the numeric file.
The worksheet indicators are found in the Downloads section on the Hospital Form 2552-10 page (i.e., in the zip file with the data model and data specifications documents). In the zipped file, open the document titled “HOSP2010_Worksheet Codes.” The worksheet indicator for Worksheet S-3, Part 1, is on Page 2 of the document (Figure 7). Worksheet S-3, Part 1, is identified by the worksheet indicator “S300001.”
The Line and Column Numbers are 5-position codes in which the third position holds the digit immediately to the left of the decimal point. In other words, the first 3 positions hold the whole number, padded with leading zeros, and the last 2 positions hold any digits after the decimal point. For instance, for Column 2, Line 14, “00200” would represent the column, and “01400” would represent the line. Sometimes sub-columns exist in the cost reports (e.g., column 3.1). Column 3.1 would appear as “00310”.
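One way to reproduce this coding is to multiply the line or column number by 100 and pad the result to 5 digits. The sketch below follows only the examples given above (2, 14, and 3.1); the helper name `encode_position` is hypothetical, and the number is passed as text so that a value such as 3.1 is not distorted by floating-point rounding.

```python
from decimal import Decimal

def encode_position(number):
    """Convert a line or column number such as "14" or "3.1" to its 5-position code."""
    hundredths = Decimal(str(number)) * 100
    # Reject values with more than 2 decimal places or too many digits.
    if hundredths != hundredths.to_integral_value() or not 0 <= hundredths <= 99999:
        raise ValueError(f"Cannot encode {number!r} as a 5-position code")
    return f"{int(hundredths):05d}"

print(encode_position("2"))    # column 2
print(encode_position("14"))   # line 14
print(encode_position("3.1"))  # sub-column 3.1
```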
In summary, the number of beds will be located in the Numeric file. To identify the number of beds for every report in the Numeric file, filter the records where the second column (Worksheet Indicator) is “S300001,” the third column (Line Number) is “01400”, and the fourth column (Column Number) is “00200”. To identify the number of beds for a specific report submitted by a specific facility, filter the records by the Report Record Number, which is reported in Column 1. The Report Record Number for a specific facility can be found in the Report data file.
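The filter described in this summary might look like the following in Python. This is a sketch under stated assumptions, not a definitive implementation: the sample rows and values are made up, the helper name `beds_by_report` is hypothetical, and with a real download the in-memory sample would be replaced by an open handle to the Numeric file.

```python
import csv
import io

# Made-up rows in the Numeric file layout:
# Report Record Number, Worksheet Indicator, Line Number, Column Number, Value.
SAMPLE_NMRC = """\
500001,S300001,01400,00200,250
500001,S300001,01400,00800,61000
500002,S300001,01400,00200,120
500002,A000000,00100,00100,98765
"""

# Worksheet S-3 Part 1, line 14, column 2.
BEDS_LOCATION = ("S300001", "01400", "00200")

def beds_by_report(nmrc_file, report_numbers=None):
    """Return {Report Record Number: number of beds}, optionally for selected reports only."""
    beds = {}
    for row in csv.reader(nmrc_file):
        if len(row) != 5:
            continue  # skip blank or malformed rows
        rpt_rec_num, worksheet, line, column, value = row
        if (worksheet, line, column) != BEDS_LOCATION:
            continue
        if report_numbers is not None and rpt_rec_num not in report_numbers:
            continue
        beds[rpt_rec_num] = float(value)
    return beds

all_beds = beds_by_report(io.StringIO(SAMPLE_NMRC))
one_report = beds_by_report(io.StringIO(SAMPLE_NMRC), report_numbers={"500002"})
print(all_beds)
print(one_report)
```

Passing `report_numbers` restricts the result to the reports of a specific facility, once that facility's Report Record Numbers have been found in the Report file.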