Mining the Medicare Physician Dataset

Michael Bohl

On April 9, 2014, CMS released its Medicare Provider Utilization and Payment Data, a dataset representing an aggregation of 2012 volume, charge, and payment information by physician, by CPT code, by place of service—more than 9 million records. It is important that the radiology community not flatly reject these data, but instead recognize that the database contains meaningful information about us and our environment—and learn from the data. 

Medicare gleaned the data, including volume, charge, payment, and place-of-service information, from its CY2012 National Claims History (NCH) Standard Analytic Files (SAFs). Provider demographics, including name, credentials, gender, address, and entity type information, came from the National Plan & Provider Enumeration System (NPPES).

Leading up to the data’s release, virtually all physician groups—including the ACR and the American Medical Association—opposed the release, primarily out of privacy concerns and a lack of data context. On the day after the release, and for several days thereafter, we were treated to headlines about which providers in our local area or state received the most payments. These stories made for interesting headlines, but little else.

Radiology Listservs and other social media were filled with critical comments, mostly along the lines of: nothing more than a data dump; data are not verifiable; anecdotal stories reveal lots of errors; payments are not broken out by PC and TC; so many qualifiers needed to make the information usable; and the data do not agree with what we see in our practice.

The unabridged version

Much of the early criticism likely was due to the commenters’ use of an online tool that was limited to returning a single aggregated payment total with little to no detail. From my perspective, the aggregated data, while great for generating headlines, are largely useless and misleading. Based only on the aggregation tools, I would share the critics’ opinions.

However, CMS also made available the complete dataset containing more than 9 million records with 27 separate database elements including:

  • number of times a provider submitted each CPT code;
  • average charge each physician submitted by CPT code;
  • average Medicare allowed amount for each CPT code by physician;
  • average payment each physician received for each CPT code;
  • specialty each physician is registered as with Medicare;
  • billing address of each physician;
  • indication of whether the procedure was performed in a “facility” or “non-facility” location; and
  • place-of-service (POS) code for each procedure.

The dataset is available for download at the CMS.gov website. Perhaps the easiest way to locate it is to enter “Medicare Provider Utilization and Payment Data: Physician and Other Supplier” into your favorite search engine. The data are available as a single 1.6GB tab-delimited text file or 12 separate Excel files.

The most efficient method for accessing the data is a SQL Server database, but that is beyond my skill set and, I suspect, that of most readers of this article. I found it convenient to simply “link” the tab-delimited text file as a table in Microsoft Access. I also imported the 12 Excel files directly into Access, but due to table-size limitations in Access, I had to create four separate tables, which makes querying more difficult. You then have to build a query to extract and parse the data you want to compare. Needless to say, this requires an understanding of how the data are organized and how to construct Access queries.
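For readers more comfortable with scripting than with Access, the same linking-and-querying workflow can be sketched in Python with pandas. The rows and column names below are illustrative assumptions modeled on the CMS file layout (the real file has 27 fields); in practice you would point read_csv at the 1.6GB tab-delimited download, ideally in chunks.

```python
import io
import pandas as pd

# Illustrative sample mimicking the dataset's tab-delimited layout.
# Column names and all figures here are assumptions for demonstration only.
sample = io.StringIO(
    "npi\tprovider_type\tplace_of_service\thcpcs_code\tline_srvc_cnt\t"
    "average_submitted_chrg_amt\taverage_medicare_payment_amt\n"
    "1000000001\tDiagnostic Radiology\tF\t71020\t500\t120.00\t30.00\n"
    "1000000001\tDiagnostic Radiology\tF\t74177\t200\t900.00\t250.00\n"
    "1000000002\tInterventional Radiology\tF\t71020\t300\t150.00\t30.00\n"
)
df = pd.read_csv(sample, sep="\t")

# Estimated total Medicare payment per provider:
# number of services rendered x average payment per service, summed by NPI.
df["est_payment"] = df["line_srvc_cnt"] * df["average_medicare_payment_amt"]
totals = df.groupby("npi")["est_payment"].sum()
print(totals)
```

Because the file reports per-code averages rather than line-item payments, multiplying the service count by the average payment is the natural way to rebuild totals before grouping by provider, specialty, or place of service.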

Based on my review, while imperfect, I’ve found the complete dataset to be reasonably accurate and highly consistent with data I have from other internal systems. For example, unlike some of the early statements, it is absolutely possible to parse facility (ie, hospital-based) payments from global payments.

Another oft-expressed criticism is that the reported payment totals are significantly lower than what the commenter’s internal billing systems show, or what they expected. In my analysis, the dataset is reasonably accurate. While there are some errors and omissions, most of any perceived gap in reported payments occurs because the dataset does not include payments received from Medicare Advantage Plans or patient co-pays. This means that for most radiology practices, the Medicare dataset is likely to report only about 55% to 65% of a practice’s total Medicare-related payments. 

There is no shortage of inquiries into this dataset that an interested person with moderate to advanced Access/Excel skills could make. Some ideas follow.

  • Reconstruct one or several groups’ (of your choosing) fee schedules at the CPT code and practice levels.
  • Determine the multiple of Medicare each group uses when establishing their fees.
  • Explore and compare generalized practice patterns between groups or providers; for example, the percent of limited abdominal ultrasound versus complete abdominal ultrasound performed; or the distribution of with, without and with and without contrast studies.
  • Determine who is providing what services in your area and quantify and organize the results in presentable tables, charts, and graphs.
Table. Average Volume-Weighted Charge as a Multiple of Medicare Allowable, by State/Territory
Courtesy of Michael Bohl.

Here is an illustration of how one could use the data. The table (left) shows the average volume-weighted charge as a multiple of the Medicare allowable, by state/territory, for radiologists practicing in hospitals. I limited the data to physicians registered with a diagnostic radiology, interventional radiology, or nuclear medicine specialty code who practice in hospitals. As you can see, the average charge-to-allowable multiple ranges from a low of just under three (Hawaii, Wyoming, and Puerto Rico) to near seven and above (Wisconsin and the Virgin Islands). The percentile distribution of the charge multiple of the Medicare allowable is as follows: 25th percentile = 3.7; median = 3.9; 75th percentile = 4.3; and 90th percentile = 4.7. Could this be actionable information? Absolutely.
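The volume-weighted multiple above can be computed directly from the per-code averages: weight each row's average charge and average allowed amount by its service count, sum within a state, and take the ratio. A minimal sketch, again using made-up rows and assumed column names rather than the real file:

```python
import io
import pandas as pd

# Illustrative rows only; states, counts, and dollar figures are invented,
# and the column names are assumptions modeled on the CMS file layout.
sample = io.StringIO(
    "nppes_provider_state\tline_srvc_cnt\t"
    "average_submitted_chrg_amt\taverage_medicare_allowed_amt\n"
    "IA\t400\t100.00\t25.00\n"
    "IA\t100\t600.00\t200.00\n"
    "WI\t250\t700.00\t100.00\n"
)
df = pd.read_csv(sample, sep="\t")

# Volume-weight by converting per-service averages into totals, then take
# the ratio of total submitted charges to total Medicare allowed amounts.
df["total_chrg"] = df["line_srvc_cnt"] * df["average_submitted_chrg_amt"]
df["total_allowed"] = df["line_srvc_cnt"] * df["average_medicare_allowed_amt"]
by_state = df.groupby("nppes_provider_state")[["total_chrg", "total_allowed"]].sum()
by_state["charge_multiple"] = by_state["total_chrg"] / by_state["total_allowed"]
print(by_state["charge_multiple"].round(2))
```

Weighting by service count matters: a simple average of per-code multiples would let a rarely billed code distort a state's figure as much as a high-volume one.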

Do the data have limitations? Yes. Do we have to be careful about how we use them? Yes. Can the data be misinterpreted or misapplied? Without a doubt. But these are limitations of any dataset. In the end, I can say that after having spent some time querying and analyzing the data, I know more about radiology in my region today than I did before, and I will be able to use that knowledge to support decisions going forward.

Michael Bohl is executive director, Radiology Group, PC, SC, Davenport, Iowa, and past-president of the Radiology Business Management Association.