What is an NPI?
National Provider Identifiers (NPIs) uniquely identify healthcare providers across a wide variety of healthcare applications such as billing and payment, compliance, personal health records, and healthcare eligibility and enrollment. NPIs are publicly available and can be searched in the CMS National Plan and Provider Enumeration System (NPPES). Any healthcare provider that transmits health information in electronic form is registered through CMS and has an NPI. Two types of provider entities are assigned NPIs:
- Type 1: Individual. These identify people, such as physicians, nurses, dentists, and technicians. For example, Jane Doeblin, an OB/GYN in Rochester, NY, has the NPI 1780647248.
- Type 2: Organizational. These identify anything that is not a person, such as a hospital, group, laboratory, or nursing home. For example, the Mount Sinai hospital at 1468 Madison Ave has the NPI 1629410592.
Why may they not be valid?
Invalid NPIs are common and arise for a mix of reasons, depending on the source of the data:
- Typos and other human error from manual entry
- A unique identifier that is not an NPI, e.g. a license number or an organization's internal identifier
- Leakage of other, non-unique information, e.g. a taxonomy code or other billing-related codes
- General data uncleanliness, e.g. erroneous string padding, missing values, bad defaults, etc.
Because NPIs are used in so many places, it may often be necessary to check that an NPI is valid to identify usable records or enforce form validation.
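Several of the causes listed above (padding, missing values, bad defaults) can be handled before any validation runs. The following is a minimal sketch of such a cleanup step, not a definitive implementation. The function name `normalize_npi` and the `PLACEHOLDER_VALUES` set are hypothetical; the placeholder list is an assumption and should be replaced with whatever defaults appear in your own data.

```python
from typing import Optional

# Hypothetical bad defaults; adjust to the placeholders your sources actually use.
PLACEHOLDER_VALUES = {"", "0", "0000000000", "9999999999", "NULL", "N/A", "NONE"}


def normalize_npi(raw: object) -> Optional[str]:
    """Return a trimmed candidate NPI string, or None if the value is missing."""
    if raw is None:
        return None
    # Remove erroneous padding around the value
    text = str(raw).strip()
    # Treat known placeholder values as missing
    if text.upper() in PLACEHOLDER_VALUES:
        return None
    return text
```

This step only cleans the value; the result still has to pass one of the validation methods below.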
Method 1: Dataset comparison
The most reliable way to identify valid NPIs is to compare against the NPPES dataset directly, which can be downloaded from the NPPES website as a full replacement file or an incremental file.
This lends itself well to databases, for example:
select a.npi
, nppes.npi is not null as is_valid_npi
, nppes.entity_type_code
, ...
from my_dataset as a
left join nppes
on a.npi = nppes.npi
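Outside a database, the same lookup can be sketched in Python by loading the NPI column of the downloaded file into a set. This is an illustration under assumptions: the helper names `load_nppes_npis` and `flag_known_npis` are hypothetical, and the column header `NPI` is assumed to match the header in the file you download. The small in-memory sample stands in for the real file.

```python
import csv
import io
from typing import Dict, Iterable, Set, TextIO


def load_nppes_npis(csv_file: TextIO, npi_column: str = "NPI") -> Set[str]:
    """Collect every non-empty value of the NPI column into a set."""
    reader = csv.DictReader(csv_file)
    return {row[npi_column].strip() for row in reader if row.get(npi_column)}


def flag_known_npis(candidates: Iterable[str], known: Set[str]) -> Dict[str, bool]:
    """Map each candidate to True if it appears in the NPPES set."""
    return {npi: npi in known for npi in candidates}


# Tiny in-memory stand-in for the real NPPES file (assumed column names)
sample = io.StringIO("NPI,Entity Type Code\n1780647248,1\n1629410592,2\n")
known_npis = load_nppes_npis(sample)
results = flag_known_npis(["1780647248", "1234567890"], known_npis)
```

For the real file, pass an open file handle instead of the `io.StringIO` sample. Holding every NPI in memory is a design choice that suits batch jobs; a database join remains the better fit for very large or frequently refreshed datasets.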
Pros | Cons |
---|---|
|
|
Method 2: Simple validation
A more programmatic approach is validating the identifier based on the definition. From CMS:
An NPI is a 10 digit numerical identifier for providers of health care services. It is national in scope and unique to the provider… The number itself is not a “smart” number, i.e., there is no intelligence built into it
In addition, all NPIs must start with 1 or 2. We can construct a simple function that can check quickly that the identifier meets these requirements:
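def is_npi_valid(npi: str) -> bool:
    # Chain with `and` so an empty or missing value short-circuits before npi[0]
    return (
        bool(npi)
        and npi.isascii()  # isdigit() alone also accepts non-ASCII digits
        and npi.isdigit()
        and len(npi) == 10
        and npi[0] in ("1", "2")
    )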
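The same shape rules (10 digits, first digit 1 or 2) can also be expressed as a regular expression, which is convenient for form validation. This is a sketch rather than the canonical approach; the names `NPI_SHAPE` and `has_npi_shape` are hypothetical.

```python
import re

# First digit 1 or 2, followed by exactly nine more ASCII digits
NPI_SHAPE = re.compile(r"[12][0-9]{9}")


def has_npi_shape(npi: object) -> bool:
    """Return True if the value is a string that looks like an NPI."""
    # fullmatch requires the whole string to match, so extra characters fail
    return isinstance(npi, str) and NPI_SHAPE.fullmatch(npi) is not None
```

Using `[0-9]` instead of `\d` keeps the match restricted to ASCII digits.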
Pros | Cons |
---|---|
|
|
Method 3: Check digit validation
Although the NPI is not a smart identifier and is randomly generated (e.g. it is not incremental; new NPIs are assigned via a scattering algorithm), it does have a built-in check digit, computed with the Luhn algorithm, that can be used to validate the number, much like a credit card number. CMS documents how it applies the check digit to the NPI.
A simplified version of the Luhn algorithm, specialized to 10-digit NPI input, could look like this in Python:
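def is_valid_npi(npi: str) -> bool:
    # Reject anything that is not exactly 10 ASCII digits before doing arithmetic
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    # Maps each digit d to the digit sum of 2*d (the Luhn doubling step)
    ret = {0:0,1:2,2:4,3:6,4:8,5:1,6:3,7:5,8:7,9:9}
    # 24 is the fixed Luhn contribution of the 80840 prefix applied to NPIs
    s = 24 + sum(int(d) for d in npi[-3::-2]) + sum(ret[int(d)] for d in npi[-2::-2])
    # The check digit is whatever brings s up to the next multiple of 10
    return 9*s % 10 == int(npi[-1])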
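To show where the constant 24 in the function above comes from, here is a sketch that runs the general Luhn check on the NPI prefixed with 80840. The prefix digits contribute 8 + 8 + 8 = 24 to the Luhn sum no matter which NPI follows. The names `luhn_checksum_ok` and `is_valid_npi_full_luhn` are hypothetical and used only for illustration.

```python
NPI_PREFIX = "80840"


def luhn_checksum_ok(number: str) -> bool:
    """General Luhn check over a string of ASCII digits, check digit last."""
    total = 0
    # Walk right to left; position 0 is the check digit itself
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            # Double every second digit and reduce two-digit results to a digit sum
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_valid_npi_full_luhn(npi: str) -> bool:
    """Validate a 10-digit NPI by applying Luhn to the prefixed 15-digit number."""
    if not (npi.isascii() and npi.isdigit() and len(npi) == 10):
        return False
    return luhn_checksum_ok(NPI_PREFIX + npi)
```

Both example NPIs from the start of this article, 1780647248 and 1629410592, pass this check, while changing any single digit makes it fail.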
Pros | Cons |
---|---|
|
|
Summary
Three approaches were presented for NPI validation, along with simple Python and SQL implementations and the pros/cons of each. The best method for your use case depends heavily on:
- Point in time of validation: Are you doing form validation, or just cleaning up data inside your database? How fast does the validation need to be?
- Cleanliness of NPIs: How often are NPIs invalid? Do you know why, and is it a problem you created or one that you can control? Are most invalid NPIs 9 digits long? Do most contain alphabetical characters?
- Potential consequences of false positives: What happens if you have an invalid NPI in your final dataset? Do you care? Do downstream processes require or assume NPI to be unique?
- Desired complexity and scale: How much infrastructure and time are you willing to put into this issue? How many NPIs do you have, how often will you receive new ones, and do you need to check that the NPI is both valid and active?
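If more than one method fits your situation, they can be layered from cheapest to most expensive. The sketch below is one possible arrangement, not a recommendation: the function name `classify_npi` and its status labels are hypothetical, and `known_npis` is assumed to be a set of NPIs you have already loaded from NPPES. It does not check whether an NPI is active.

```python
from typing import Optional, Set


def classify_npi(npi: Optional[str], known_npis: Optional[Set[str]] = None) -> str:
    """Return a status label, running the cheapest checks first."""
    # Shape check: 10 ASCII digits starting with 1 or 2
    if (
        not npi
        or not (npi.isascii() and npi.isdigit())
        or len(npi) != 10
        or npi[0] not in ("1", "2")
    ):
        return "malformed"
    # Luhn check; 24 is the fixed contribution of the 80840 prefix
    doubled = sum((2 * int(d)) // 10 + (2 * int(d)) % 10 for d in npi[-2::-2])
    plain = sum(int(d) for d in npi[-3::-2])
    if (24 + doubled + plain + int(npi[-1])) % 10 != 0:
        return "bad_check_digit"
    # Dataset lookup, only if a reference set was supplied
    if known_npis is not None and npi not in known_npis:
        return "not_in_nppes"
    return "valid"
```

Returning a label rather than a boolean makes it easier to measure why NPIs fail, which feeds back into the cleanliness questions above.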