Mining the Medicare Physician Dataset

Michael Bohl

On April 9, 2014, CMS released its Medicare Provider Utilization and Payment Data, a dataset of more than 9 million records aggregating 2012 volume, charge, and payment information by physician, CPT code, and place of service. It is important that the radiology community not reject these data out of hand, but instead recognize that the database contains meaningful information about us and our environment, and learn from it.

Medicare gleaned the data, including volume, charge, payment, and place of service information from its CY2012 National Claims History (NCH) Standard Analytic Files (SAFs). Provider demographics, including name, credentials, gender, address, and entity type information, came from the National Plan & Provider Enumeration System (NPPES).

Leading up to the release, virtually all physician groups, including the ACR and the American Medical Association, opposed it, primarily out of concerns about privacy and the lack of context for the data. On the day after the release, and for several days thereafter, we were treated to headlines about which providers in our local area or state received the most payments. These stories made for interesting reading, but offered little else.

Radiology Listservs and other social media filled with critical comments, mostly along these lines: the release was nothing more than a data dump; the data are not verifiable; anecdotal stories reveal many errors; payments are not broken out by professional component (PC) and technical component (TC); the information needs so many qualifiers to be usable; and the data do not agree with what we see in our practices.

The unabridged version

Much of the early criticism was likely due to commenters' use of an online tool that returned only a single aggregated payment total with little to no detail. From my perspective, the aggregated data, while great for generating headlines, are largely useless and misleading. Had I relied only on the aggregation tools, I would share the critics' opinions.

However, CMS also made available the complete dataset of more than 9 million records with 27 separate data elements, including:

  • number of times a provider submitted each CPT code;
  • average charge each physician submitted by CPT code;
  • average Medicare allowed amount for each CPT code by physician;
  • average payment each physician received for each CPT code;
  • specialty under which each physician is registered with Medicare;
  • billing address of each physician;
  • indication of whether the procedure was performed in a “facility” or “non-facility” location; and
  • place-of-service (POS) code for each procedure.
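To make the record layout concrete, here is a minimal Python sketch of a single row, covering a subset of the elements listed above. The column names used in `parse_record` are assumptions modeled on the CMS file header, and the class and function names are hypothetical; check them against the header of the file you actually download.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PhysicianServiceRecord:
    """One row of the file: one provider, one CPT code, one place of service."""
    npi: str                    # provider identifier
    provider_type: str          # specialty as registered with Medicare
    city: str                   # part of the billing address
    state: str                  # part of the billing address
    place_of_service: str       # facility / non-facility flag (coding assumed)
    hcpcs_code: str             # CPT code
    service_count: float        # times the provider submitted the code
    avg_submitted_charge: float
    avg_allowed_amount: float
    avg_payment: float


def parse_record(row: Mapping[str, str]) -> PhysicianServiceRecord:
    """Convert one csv.DictReader row (all strings) into a typed record.

    The column names below are ASSUMPTIONS; verify them against the
    header line of the downloaded file before relying on this.
    """
    return PhysicianServiceRecord(
        npi=row["npi"],
        provider_type=row["provider_type"],
        city=row["nppes_provider_city"],
        state=row["nppes_provider_state"],
        place_of_service=row["place_of_service"],
        hcpcs_code=row["hcpcs_code"],
        service_count=float(row["line_srvc_cnt"]),
        avg_submitted_charge=float(row["average_submitted_chrg_amt"]),
        avg_allowed_amount=float(row["average_Medicare_allowed_amt"]),
        avg_payment=float(row["average_Medicare_payment_amt"]),
    )
```

Typing the numeric fields once, at parse time, avoids repeating string-to-number conversions in every later calculation.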

The dataset is available for download on the CMS website. Perhaps the easiest way to locate it is to enter “Medicare Provider Utilization and Payment Data: Physician and Other Supplier” into your favorite search engine. The data come either as a single 1.6GB tab-delimited text file or as 12 separate Excel files.

The most efficient way to work with the data is in a SQL database server, but that is beyond my skill set and, I suspect, that of most readers of this article. I found it convenient simply to “link” the tab-delimited text file as a table in Microsoft Access. I also imported the 12 Excel files directly into Access, but because of table size limits in Access, I had to split them across four tables, which makes querying more difficult. Either way, you then have to build a query to extract and parse the data you want to compare, which requires an understanding of how the data are organized and how to construct Access queries.
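For readers comfortable with a little scripting, one free alternative to Access (not the approach I describe above) is to stream the tab-delimited file into a SQLite database with Python's built-in `csv` and `sqlite3` modules, which sidesteps the need to split the data across tables. A minimal sketch, assuming the first line of the file is a header of column names; the table name `physician_data` is hypothetical.

```python
import csv
import sqlite3
from typing import Iterable

BATCH_SIZE = 50_000  # rows per executemany() call; keeps memory use modest


def load_tab_file(conn: sqlite3.Connection, lines: Iterable[str],
                  table: str = "physician_data") -> int:
    """Stream tab-delimited lines (header first) into one SQLite table.

    Every column is stored as TEXT; cast in queries, e.g.
    CAST(line_srvc_cnt AS REAL). Rows whose field count does not match
    the header are skipped rather than aborting the load.
    Returns the number of rows inserted.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    insert_sql = f'INSERT INTO "{table}" VALUES ({placeholders})'

    inserted = 0
    batch: list[list[str]] = []
    for row in reader:
        if len(row) != len(header):
            continue  # malformed or note line: skip it
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            conn.executemany(insert_sql, batch)
            inserted += len(batch)
            batch.clear()
    if batch:
        conn.executemany(insert_sql, batch)
        inserted += len(batch)
    conn.commit()
    return inserted
```

Because an open file object is itself an iterable of lines, the whole 1.6GB file can be passed in directly (for example, `load_tab_file(sqlite3.connect("medicare.db"), open(path, newline=""))`, with `path` pointing at your download) without ever being held in memory at once.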

In my review, the complete dataset, while imperfect, is reasonably accurate and highly consistent with data I have from other internal systems. For example, contrary to some early statements, it is absolutely possible to separate facility (ie, hospital-based) payments from global payments.
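Because each row reports an average payment per service rather than a total, a provider's payments in each setting are the sum of service count times average payment, grouped by the facility flag. A minimal Python sketch, assuming the column names shown and that the flag is coded "F" for facility and "O" for non-facility; the function name is hypothetical, and treating non-facility rows as global payments is an interpretation to confirm against your own billing data.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def payments_by_setting(rows: Iterable[Mapping[str, str]],
                        npi: str) -> dict[str, float]:
    """Total Medicare payment for one provider, split by the facility flag.

    Each row carries an AVERAGE payment per service, so the row total is
    service count x average payment. Column names and the "F"/"O" coding
    are assumptions to check against the downloaded file.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        if row["npi"] != npi:
            continue  # keep only the provider of interest
        count = float(row["line_srvc_cnt"])
        avg_payment = float(row["average_Medicare_payment_amt"])
        totals[row["place_of_service"]] += count * avg_payment
    return dict(totals)
```

In a database, the same idea is a single grouped query that sums the product of the two columns.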

Another oft-expressed criticism is that the reported payment totals are significantly lower than what commenters' internal billing systems show, or what they expected. In my analysis, the dataset holds up here as well. While there are some errors and omissions, most of the apparent shortfall arises because the dataset does not include payments from Medicare Advantage plans or patient co-pays. As a result, for most radiology practices the Medicare dataset is likely to capture only about 55% to 65% of total Medicare-related payments.
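The 55% to 65% estimate can also be run in reverse as a sanity check: if the public file captures a share s of a practice's total Medicare-related payments, the implied total is the reported amount divided by s. A minimal Python sketch; the function name is hypothetical and the default shares are simply the range estimated above.

```python
def implied_total_range(reported: float,
                        low_share: float = 0.55,
                        high_share: float = 0.65) -> tuple[float, float]:
    """Range of total Medicare-related payments implied by the dataset figure.

    If the public file captures between low_share and high_share of the
    total, then total = reported / share. A larger share implies a smaller
    total, so the lower bound uses high_share and the upper uses low_share.
    """
    if not 0 < low_share <= high_share <= 1:
        raise ValueError("shares must satisfy 0 < low_share <= high_share <= 1")
    return reported / high_share, reported / low_share
```

For example, a practice whose rows in the file total $110,000 (a made-up figure) would expect total Medicare-related payments of roughly $169,000 to $200,000.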

There is no shortage of inquiries into this dataset and interested person with moderate