How to Pick the Best Algorithm


ResDAC faculty and Technical Advisors (TAs) are frequently asked to recommend a best algorithm for identifying cases, treatments, outcomes, etc.

In the experience of ResDAC faculty, there is never a single best algorithm that fits all situations. There is usually ambiguity: a mix of clear “yes” cases, clear “no” cases, and a group in the middle (the “uncertain” ones). How exactly we define the three groups, and what we do with that “uncertain” group, depends on a variety of factors.

Here are things we think about when picking our approach:

What type of error are we most worried about?

We make different choices if we are most worried about under-identification (missing a case) or over-identification (calling something a case that is not actually a case).

  • In some cases, we want to make sure that everyone identified as a case is absolutely a case; other times, we want to cast the broadest net possible. We often make different decisions in those two situations.
  • Example: many algorithms count someone with two outpatient/carrier claims or one inpatient claim as a case. Researchers worried about missing someone might also count people with a single outpatient/carrier claim.
  • Sometimes it can be helpful to ask whether real cases of X would follow a specific pattern. For example, does the algorithm make it possible to differentiate between a true positive and a “re-test” 2 weeks, 3 months, or 6 months later?
  • Consider the impact of decisions related to coverage and enrollment.
    • For example, are observation windows wider or narrower than needed for your inference? Will you be including or excluding people or making assumptions about what happened before you could observe them?
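The trade-off between under- and over-identification described above can be made concrete in code. The sketch below is purely illustrative: the claim structure, field names, and thresholds are assumptions for demonstration, not actual Medicare file or variable names.

```python
from datetime import date

def is_case(claims, strict=True):
    """Classify a beneficiary as a case using a common claims-counting rule.

    Each claim is an illustrative (setting, service_date) tuple, where
    setting is 'inpatient', 'outpatient', or 'carrier'.

    strict=True  -> require 1 inpatient claim OR 2+ outpatient/carrier claims
                    (favors specificity: fewer false positives)
    strict=False -> a single claim of any type counts
                    (favors sensitivity: casts the broadest net)
    """
    inpatient = [c for c in claims if c[0] == "inpatient"]
    other = [c for c in claims if c[0] in ("outpatient", "carrier")]
    if not strict:
        return len(inpatient) + len(other) >= 1
    return len(inpatient) >= 1 or len(other) >= 2

claims = [("carrier", date(2019, 3, 1)), ("outpatient", date(2019, 5, 10))]
print(is_case(claims, strict=True))                           # True
print(is_case([("carrier", date(2019, 3, 1))], strict=True))  # False
print(is_case([("carrier", date(2019, 3, 1))], strict=False)) # True
```

A single `strict` flag like this makes it easy to re-run an analysis under both definitions and see how sensitive the results are to the choice.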

When evaluating an existing algorithm, how detailed is the algorithm?

  • Think here about whether the description of the algorithm is detailed enough to implement. Do the authors use standard names for Medicare files and variables? We are more confident in published work that is more detailed.

Do we agree with the choices about codes that are and are not included?

  • This involves actually looking at code books; it is not just about whether we agree with what the authors include but also with what they do not. In some cases, particularly around the coding of procedures, we will reach out to a coder at a hospital and ask how they code it.

Are the elements of the algorithm consistent with CMS payment policy?

  • CMS has fantastic policy manuals. The manuals will specify rules for submitting claims and whether the rules vary by location of care, payment structure, ownership, etc.
    • For example, there may be different rules for hospitals that are paid by prospective payment and those that are not.
    • Critical access hospitals also sometimes have slightly different payment rules. The manuals explain these rules and allow us to adjust our algorithms as needed.

Does the algorithm rely on factors that inconsistently matter when calculating payment?

  • Remember, the data are originally generated for billing and payment purposes, not for research or medical record keeping. For example, while diagnosis codes for obesity exist, there are limited circumstances in which coding obesity will affect payment. Comparing beneficiaries with obesity diagnosis codes who receive a procedure that requires coded obesity to beneficiaries receiving a procedure that does not will probably not result in comparable populations.

Can the algorithm change over time?

  • There are many aspects of healthcare billing that change over time and can introduce unexpected time trends. For example:
    • COVID-19 diagnosis codes and testing codes were created after the pandemic started. Thus, studies of COVID in February and March 2020 will need to use different codes than studies focusing on later periods.
    • Changes to the forms and to payment policy (in 2011, the UB-04 increased the number of diagnosis slots from 10 to 25, which might impact the coding of comorbidities).
    • The October 2015 shift from ICD-9 to ICD-10 changed all diagnosis codes as well as the procedure codes used by hospitals.
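One way to handle the ICD-9-to-ICD-10 transition mentioned above is to select the code set by each claim's service date. The sketch below is a simplified illustration: the code sets are placeholders, and a real study would substitute its own validated ICD-9 and ICD-10 code lists.

```python
from datetime import date

# ICD-10 applies to claims with service dates on or after October 1, 2015;
# earlier claims use ICD-9.
ICD10_START = date(2015, 10, 1)

# Placeholder code sets for illustration only -- substitute the validated
# codes for the condition under study.
ICD9_CODES = {"4280"}    # hypothetical ICD-9 diagnosis code
ICD10_CODES = {"I50.9"}  # hypothetical ICD-10 diagnosis code

def matching_codes(service_date):
    """Return the diagnosis code set appropriate for the claim's era."""
    return ICD10_CODES if service_date >= ICD10_START else ICD9_CODES

def claim_matches(dx_code, service_date):
    """Check a claim's diagnosis code against the era-appropriate code set."""
    return dx_code in matching_codes(service_date)

print(claim_matches("4280", date(2015, 6, 1)))   # True: ICD-9 era
print(claim_matches("4280", date(2016, 6, 1)))   # False: wrong era
print(claim_matches("I50.9", date(2016, 6, 1)))  # True: ICD-10 era
```

Keeping the era boundary in one place, as here, also makes it easier to check whether an apparent time trend in case counts coincides with a coding change rather than a real change in disease.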

In summary, ResDAC faculty recognize that there is no perfect algorithm that applies in all circumstances for all study goals. Even with published algorithms, ResDAC faculty always spend time investigating options before settling on a strategy and will often use slightly different approaches depending on the goals of their study.