How to Pick the Best Algorithm

Purpose

ResDAC faculty and Technical Advisors (TAs) are frequently asked to recommend a “best” algorithm for identifying cases, treatments, outcomes, etc.

In the experience of ResDAC faculty, there is never a single best algorithm that fits all situations. There is usually ambiguity: a mix of clear “yes” cases, clear “no” cases, and a group in the middle (the “uncertain” ones). How exactly we define the three groups, and what we do with the “uncertain” group, depends on a variety of factors.

Here are things we think about when picking our approach:

What type of error are we most worried about?

We make different choices depending on whether we are most worried about under-identification (missing a case) or over-identification (calling something a case that is not actually a case).

  • Sometimes we want to make sure that everyone identified as a case is absolutely a case; other times we want to cast the broadest net possible. These two goals lead to different decisions.
  • Example: many algorithms count someone with two outpatient/carrier claims or one inpatient claim as a case; researchers worried about missing someone might also count people with a single outpatient/carrier claim (see the sketch after this list).
  • Sometimes it can be helpful to ask whether real cases of X would follow a specific pattern. For example, does the algorithm make it possible to differentiate between a true positive and a “re-test” at 2 weeks, 3 months, 6 months, and so on?
  • Consider the impact of decisions related to coverage and enrollment.
    • For example, are observation windows wider or narrower than needed for your inference? Will you be including or excluding people or making assumptions about what happened before you could observe them?
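To make the “clear yes / uncertain / clear no” split concrete, here is a minimal sketch in Python. The claim structure, field names, and thresholds below are simplified assumptions for illustration only; they do not reflect actual Medicare file layouts or any specific published algorithm.

    from dataclasses import dataclass
    from datetime import date

    # Hypothetical, simplified claim record. Real Medicare data live in
    # separate inpatient, outpatient, and carrier files with their own
    # variable names; this structure exists only for illustration.
    @dataclass
    class Claim:
        claim_type: str    # "inpatient", "outpatient", or "carrier"
        service_date: date

    def classify_beneficiary(claims: list[Claim]) -> str:
        """Return 'case', 'uncertain', or 'non-case' for one beneficiary.

        Mirrors the common pattern described above: one inpatient claim or
        two outpatient/carrier claims -> clear case; a single
        outpatient/carrier claim -> uncertain (keep or drop depending on
        which type of error worries you more); nothing -> non-case.
        """
        inpatient = sum(1 for c in claims if c.claim_type == "inpatient")
        outpatient_carrier = sum(
            1 for c in claims if c.claim_type in ("outpatient", "carrier")
        )
        if inpatient >= 1 or outpatient_carrier >= 2:
            return "case"
        if outpatient_carrier == 1:
            return "uncertain"
        return "non-case"

    def claims_at_least_days_apart(claims: list[Claim], days: int) -> bool:
        """True if any two claims are at least `days` apart -- one way to
        avoid counting a quick "re-test" as independent evidence."""
        dates = sorted(c.service_date for c in claims)
        return len(dates) >= 2 and (dates[-1] - dates[0]).days >= days

    # Example: a single carrier claim lands in the "uncertain" group.
    print(classify_beneficiary([Claim("carrier", date(2022, 3, 15))]))

A stricter definition might also require the qualifying outpatient/carrier claims to fall a minimum number of days apart (one way to address the “re-test” concern), while a broader one might simply promote the “uncertain” group to cases.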

When evaluating an existing algorithm, how detailed is its description?

  • Think here about whether the description of the algorithm is detailed enough to implement. Do the authors use standard names for Medicare files and variables? We are more confident in published work that is more detailed.

Do we agree with the choices about codes that are and are not included?

  • This involves actually looking at code books; the question is not just whether we agree with the codes an algorithm includes but also with the codes it leaves out. In some cases, particularly around coding of procedures, we will reach out to a coder at a hospital and ask how they code it.

Are the elements of the algorithm consistent with CMS payment policy?

  • CMS has fantastic policy manuals. The manuals will specify rules for submitting claims and whether the rules vary by location of care, payment structure, ownership, etc.
    • For example, there may be different rules for hospitals that are paid by prospective payment and those that are not.
    • Critical access hospitals also sometimes have slightly different payment rules. The manuals will explain these rules and allow us to adjust our algorithms as needed.

Does the algorithm rely on factors that inconsistently matter when calculating payment?

  • Remember, the data are originally generated for billing and payment purposes, not for research or medical record keeping. For example, while diagnosis codes for obesity exist, there are limited circumstances in which coding obesity will affect payment. Comparing beneficiaries with obesity diagnosis codes who receive a procedure for which coded obesity affects payment to those receiving a procedure for which it does not will probably not result in comparable populations.

Can the algorithm change over time?

  • There are many aspects of healthcare billing that change over time and can introduce unexpected time trends (see the sketch after this list). For example:
    • COVID-19 diagnosis codes and testing codes were created after the pandemic started. Thus, studies of COVID-19 in February and March 2020 will need to use different codes than studies focusing on later periods.
    • Changes to claim forms and to payment policy can also matter (for example, in 2011 the UB-04 increased the number of diagnosis slots from 10 to 25, which might impact coding of comorbidities).
    • The October 2015 shift from ICD-9 to ICD-10 changed all diagnosis codes as well as the procedure codes used by hospitals.
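Because the code systems themselves change, algorithms often need date-dependent code lists rather than a single fixed list. Below is a minimal Python sketch of that idea built around the October 1, 2015 ICD-9-to-ICD-10 transition; the specific diagnosis codes and function names are illustrative assumptions and should be replaced only after reviewing the relevant code books.

    from datetime import date

    # Illustrative placeholder code lists (asthma-like examples); a real
    # study would populate these after reviewing the ICD-9-CM and
    # ICD-10-CM code books for the condition of interest.
    ICD9_CODES = {"493.90"}
    ICD10_CODES = {"J45.909"}

    ICD10_TRANSITION = date(2015, 10, 1)  # U.S. switch from ICD-9 to ICD-10

    def codes_for_service_date(service_date: date) -> set[str]:
        """Return the diagnosis code list appropriate to a claim's date."""
        return ICD9_CODES if service_date < ICD10_TRANSITION else ICD10_CODES

    def claim_matches(diagnosis_codes: list[str], service_date: date) -> bool:
        """Check a claim's diagnosis codes against the era-appropriate list."""
        valid = codes_for_service_date(service_date)
        return any(code in valid for code in diagnosis_codes)

    # A 2014 claim is checked against ICD-9; a 2016 claim against ICD-10.
    print(claim_matches(["493.90"], date(2014, 6, 1)))   # True
    print(claim_matches(["493.90"], date(2016, 6, 1)))   # False

The same pattern extends to codes that only exist after a certain date, such as the COVID-19 diagnosis and testing codes: treat the code list as a function of the service date rather than as a constant.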

Summary

In summary, ResDAC faculty recognize that there is no perfect algorithm that applies in all circumstances for all study goals. Even with published algorithms, ResDAC faculty always spend time investigating options before settling on a strategy and will often use slightly different approaches depending on the goals of their study.