US Insurance & Healthcare Provider Foundation
Healthcare provider and insurance-related data including NPIs from the NPPES and benefits plans of all large US employers via Form 5500 filings
This product includes data on actively registered US healthcare providers and on the benefits plans (e.g. healthcare, medical, life insurance) of all large US employers. The dataset is well suited to serve as a spine for any healthcare provider analysis because of the unique NPI (National Provider Identifier) code that is used across HIPAA-covered entities.
Example topics covered:
- Insurance providers of specific companies
- Insurance carrier market penetration
- Registered healthcare provider specialty, location, email, and phone number
- Primary practice location of medical students
- NPI issuance, deactivation and reactivation dates
The healthcare providers data is sourced from the National Plan and Provider Enumeration System (NPPES) and the benefits data is sourced from US Department of Labor Form 5500 filings.
Geographic Coverage | United States |
Entity Level | Individual Provider, Company Sponsor |
Update Frequency | |
History | Form 5500 reports date back to January 2010; NPIs date back to 2005; Metadata on healthcare providers (e.g. names, type of provider, location, specialization) is included for providers actively registered with the NPPES and those that deactivated after August 1, 2023 |
All Cybersyn products follow the EAV (entity, attribute, value) model with a unified schema. Entities are tangible objects (e.g. geography, company). Entities may have characteristics (i.e. descriptors of the entity) in an index table and values (i.e. statistics, measures) in a timeseries table. Data is joinable across all Cybersyn products that have a `GEO_ID`. Refer to Cybersyn Concepts for more details.

The healthcare providers data includes the names, licenses, addresses, specialties, and business and practice locations of all healthcare providers actively registered with NPPES and those that deactivated after August 1, 2023. It also includes NPI issuance, deactivation, and reactivation dates as applicable, dating back to 2005. The NPPES is a system created by the Centers for Medicare and Medicaid Services (CMS) to issue mandatory National Provider Identifier (NPI) codes to all healthcare providers in the United States. NPI codes are unique 10-digit numbers used to identify healthcare providers. Providers include both individual practitioners (e.g. a doctor or nurse) and organizations (e.g. a hospital, clinic, or doctor's office) that have an NPI. The administrative simplification provisions of HIPAA require all covered healthcare providers, plans, and clearinghouses to use NPIs in administrative and financial transactions (e.g. claim payments, referrals, diagnosis coding). Providers are classified along a 3-level taxonomy from the National Uniform Claim Committee (NUCC) that includes Provider Grouping, Classification, and Area of Specialization. If a provider is no longer practicing, they are responsible for manually deactivating their NPI with the NPPES.
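Because NPIs are fixed-length 10-digit codes, it can be useful to sanity-check NPI values before joining them against this dataset. The sketch below is not part of the product. It assumes the check-digit rule from the CMS NPI standard (a Luhn check computed over the NPI prefixed with `80840`), and the function name `is_valid_npi` is hypothetical.

```python
def is_valid_npi(npi: str) -> bool:
    """Return True if `npi` is 10 ASCII digits and passes the Luhn check.

    Assumption: the check digit is computed over the NPI prefixed with
    '80840', as described in the CMS NPI standard (not in this document).
    """
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False

    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left, doubling every second digit
    # (starting with the digit just left of the check digit).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(is_valid_npi("1234567893"))  # True  (sample value, passes the check)
print(is_valid_npi("1234567890"))  # False (wrong check digit)
```

A check like this only confirms that a value is well formed. Whether an NPI has actually been issued or is currently active still has to be looked up in the data.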
The Department of Labor requires all large US employers that provide benefits to complete Form 5500. The Form 5500 filing includes basic firmographic data about the sponsor (employer) and the benefit recipients (employees), as well as the benefits plans, if any. The benefits data includes information about the type of plan (e.g. medical, healthcare, annuity), the carrier (i.e. insurance company), and select contractual information such as payments made to brokers and the effective and expiration dates.
Additional form types, schedules, and data sources around medical claims, providers and drug codes may be covered in the future. Email [email protected] for specific requests.
As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Table Names | Source | Source Schedule |
---|---|---|
US_DEPARTMENT_OF_LABOR_FORM_5500_POLICY_INDEX, US_DEPARTMENT_OF_LABOR_FORM_5500_FILING_INDEX, US_DEPARTMENT_OF_LABOR_FORM_5500_BROKER_INDEX | | Monthly - within the first week |
NPPES_NUCC_TAXONOMY, NPPES_NUCC_MEDICARE_TAXONOMY_CROSSWALK | | Bi-annually - January, July |
NPPES_ORGANIZATION_ATTRIBUTES, NPPES_PRACTITIONER_ATTRIBUTES, NPPES_PROVIDER_ADDRESSES, NPPES_PROVIDER_LICENSE_NUMBERS, NPPES_NPI_INDEX | | Weekly - Monday between 4am-7am ET |
GEOGRAPHY_HIERARCHY, GEOGRAPHY_INDEX, GEOGRAPHY_RELATIONSHIPS | | Data Commons is an aggregator of government data sources. Release calendars vary by underlying source. The US Census Bureau publishes datasets about the US population and its economy; release schedules vary by dataset. |
Sole proprietors are included in both the `nppes_organization_attributes` and the `nppes_practitioner_attributes` tables because they are counted as both an individual practitioner and an organization. Sole proprietors are flagged in the organization table with a boolean indicator in the `is_sole_proprietor` column.

- Practitioner - an individual who has an NPI (e.g. doctors, physicians, nurse practitioners)
- Organization - a business with an NPI (e.g. hospital, clinic, doctor's office). Note that when a practitioner is a sole proprietor (e.g. she has her own doctor's office), the person is considered both a practitioner and an organization.
- Provider - broad term for any practitioner or organization that has an NPI
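Because sole proprietors appear in both attribute tables, adding the two row counts together would count them twice. The following is a minimal, self-contained sketch of how to avoid that, using in-memory SQLite tables as stand-ins for the real ones. The rows are made up, only the `npi` and `is_sole_proprietor` columns are modeled, and it assumes a sole proprietor carries the same NPI in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy stand-ins for the two attribute tables (hypothetical rows).
    CREATE TABLE nppes_practitioner_attributes (npi TEXT PRIMARY KEY);
    CREATE TABLE nppes_organization_attributes (
        npi TEXT PRIMARY KEY,
        is_sole_proprietor INTEGER  -- SQLite has no BOOLEAN type; 1 means TRUE
    );
    INSERT INTO nppes_practitioner_attributes VALUES ('P1'), ('P2');
    INSERT INTO nppes_organization_attributes VALUES ('P2', 1), ('O1', 0);
""")

# UNION drops duplicate NPIs, so the sole proprietor 'P2' is counted once.
distinct_providers = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT npi FROM nppes_practitioner_attributes
        UNION
        SELECT npi FROM nppes_organization_attributes
    )
""").fetchone()[0]

# Filtering on the flag leaves only organizations that are not sole proprietors.
pure_organizations = conn.execute("""
    SELECT COUNT(*) FROM nppes_organization_attributes
    WHERE is_sole_proprietor = 0
""").fetchone()[0]

print(distinct_providers, pure_organizations)  # 3 1
```

The same two patterns (a `UNION` over NPIs, or a filter on `is_sole_proprietor`) should carry over to the warehouse tables, subject to the assumption above.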
The `nppes_npi_index` table contains information on when NPIs were first issued, deactivated, or reactivated - dating back to 2005. This table also includes a boolean flag to indicate if an NPI is currently active.

While all NPIs appear in the `nppes_npi_index` table, only actively registered NPIs as well as NPIs deactivated after August 1, 2023 appear in the `nppes_practitioner_attributes` and `nppes_organization_attributes` tables. This means the dataset does not include attribute-level data (names, type of providers, specialization) on providers with NPIs deactivated before August 2023.

In the `nppes_provider_addresses` table, Cybersyn adds a `geo_id` identifier to standardize geography names and map geographies across various datasets. These `geo_ids` can be joined to the `geography_relationships` table to move from one level of geography to another (e.g., map zip code to county). Each provider can be linked to more than one address.

Cybersyn builds Streamlit demos to visualize the data available in this product and provide a jumping-off point.
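To make the `geo_id` join on provider addresses concrete, here is a small sketch that maps provider zip codes to counties. It runs against in-memory SQLite tables rather than the real data. The column names `related_geo_id` and `related_level`, the identifier values, and all rows are assumptions made for illustration; check the actual `geography_relationships` schema before adapting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified columns and made-up geo_id values.
    CREATE TABLE nppes_provider_addresses (npi TEXT, zip_code TEXT, geo_id TEXT);
    CREATE TABLE geography_relationships (
        geo_id TEXT, related_geo_id TEXT, related_level TEXT
    );
    INSERT INTO nppes_provider_addresses VALUES
        ('N1', '00001', 'zip/00001'),
        ('N1', '00002', 'zip/00002'),  -- one provider, two addresses
        ('N2', '00001', 'zip/00001');
    INSERT INTO geography_relationships VALUES
        ('zip/00001', 'county/A', 'County'),
        ('zip/00002', 'county/B', 'County'),
        ('zip/00001', 'state/X',  'State');
""")

# Join each address to its related geographies, keeping only county-level ones.
rows = conn.execute("""
    SELECT address.npi, address.zip_code, rel.related_geo_id AS county_geo_id
    FROM nppes_provider_addresses AS address
    INNER JOIN geography_relationships AS rel
        ON rel.geo_id = address.geo_id
    WHERE rel.related_level = 'County'
    ORDER BY address.npi, address.zip_code
""").fetchall()

for row in rows:
    print(row)
```

Since a provider can have several addresses, and a zip code can relate to more than one county, a join like this can return multiple rows per NPI. Deduplicate or aggregate before counting providers.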
Practitioners by specialty or location
List the names and addresses of all dentists in NYC.
SELECT DISTINCT
    practitioner.last_name,
    practitioner.first_name,
    address.address_first_line,
    address.city,
    address.zip_code
FROM cybersyn.nppes_practitioner_attributes AS practitioner
INNER JOIN cybersyn.nppes_nucc_taxonomy AS taxonomy
    ON (taxonomy.npi = practitioner.npi)
INNER JOIN cybersyn.nppes_provider_addresses AS address
    ON (address.npi = taxonomy.npi)
INNER JOIN cybersyn.geography_index AS geo
    ON (geo.geo_id = address.geo_id_city)
WHERE
    -- Keep dentists, using only the primary taxonomy and primary practice address
    taxonomy.level_1_grouping = 'Dental Providers'
    AND taxonomy.is_primary_taxonomy = TRUE
    AND address.address_type = 'Primary Practice'
    AND geo.geo_name ILIKE 'NEW YORK'
ORDER BY address.zip_code;
Names and primary practice location of medical students
Find the names and primary practice location of all medical students in the United States.
SELECT
    practitioner.npi,
    practitioner.last_name,
    practitioner.first_name,
    address.address_first_line,
    address.city,
    address.state,
    address.zip_code
FROM cybersyn.nppes_practitioner_attributes AS practitioner
INNER JOIN cybersyn.nppes_nucc_taxonomy AS taxonomy
    ON (taxonomy.npi = practitioner.npi)
INNER JOIN cybersyn.nppes_provider_addresses AS address
    ON (address.npi = taxonomy.npi)
WHERE
    address.address_type = 'Primary Practice'
    AND taxonomy.is_primary_taxonomy = TRUE
    AND taxonomy.level_1_grouping = 'Student, Health Care';
Hospitals with a particular specialty
Find the names, NPIs, and taxonomy codes of healthcare organizations that specialize in oncology.
SELECT
    org.organization_name,
    org.npi,
    tax.nucc_taxonomy_code,
    tax.level_2_classification AS classification,
    tax.level_3_specialization AS specialization
FROM cybersyn.nppes_nucc_taxonomy AS tax
INNER JOIN cybersyn.nppes_organization_attributes AS org
    ON (org.npi = tax.npi)
WHERE
    -- Exclude sole proprietors so only standalone organizations remain
    org.is_sole_proprietor = FALSE
    AND tax.is_primary_taxonomy = TRUE
    AND tax.level_3_specialization ILIKE '%oncology%';
Insurance carrier penetration by industry, market, and type of insurance
Show the top life insurance providers in Florida that serve companies in the Educational Services industry.
SELECT
    insurance_carrier_name,
    COUNT(DISTINCT sponsor_ein) AS count_sponsors,
    SUM(employees_covered_eoy) AS count_covered_employees
FROM cybersyn.us_department_of_labor_form_5500_policy_index AS policy
JOIN cybersyn.us_department_of_labor_form_5500_filing_index AS filing
    ON (policy.ack_id = filing.ack_id)
WHERE YEAR(policy.policy_end_date) = 2021
    AND filing.sponsor_state = 'FL'
    AND ARRAY_CONTAINS('Life Insurance'::VARIANT, insurance_types)
    AND filing.sponsor_naics_description = 'Educational Services'
GROUP BY insurance_carrier_name
ORDER BY count_covered_employees DESC NULLS LAST;
Companies using specific insurance providers
Find Texas-based companies offering Bluecross Blueshield health insurance.
SELECT
    sponsor_name,
    sponsor_ein,
    SUM(employees_covered_eoy) AS count_covered_employees
FROM cybersyn.us_department_of_labor_form_5500_policy_index AS policy
JOIN cybersyn.us_department_of_labor_form_5500_filing_index AS filing
    ON (policy.ack_id = filing.ack_id)
WHERE YEAR(policy.policy_end_date) = 2021
    AND filing.sponsor_state = 'TX'
    AND ARRAY_CONTAINS('Health Insurance'::VARIANT, insurance_types)
    AND insurance_carrier_name ILIKE 'BLUECROSS BLUESHIELD%'
GROUP BY sponsor_name, sponsor_ein
ORDER BY count_covered_employees DESC NULLS LAST
LIMIT 500;
Expanded the US Department of Labor data to include information found on Form 5500 Schedule A Part 1. The new table, `us_department_of_labor_form_5500_broker_index`, provides commission and fee amounts received by a broker for an insurance policy. Additional information about the brokers in the table includes their address, classification as an insurance broker, and notes pertaining to the compensation disbursed to them.

The `us_department_of_labor_form_5500_broker_index` table can be joined to insurance carrier and policy information and to individual Form 5500 filings using `INSURANCE_POLICY_ID` and `ACK_ID`.

9/15/23 - Added healthcare provider emails; combined `TELEPHONE` and `TELEPHONE_EXTENSION` into one field; changed `TELEPHONE` to an array to accommodate multiple values
- Added `EMAIL` field to the `nppes_provider_addresses` table with provider emails per address.
- Combined `TELEPHONE` and `TELEPHONE_EXTENSION` from the `nppes_provider_addresses` table into a single field, `TELEPHONE`, and removed the `TELEPHONE_EXTENSION` field.
- Aggregated all values for `TELEPHONE`, `FAX`, and `EMAIL` that are associated with the same NPI and address into arrays in one row. Rows in the `nppes_provider_addresses` table are now uniquely defined by NPI and full address.
Added the `NPPES_PROVIDER_TAXONOMY_AND_LICENSE_NUMBERS` table that relates taxonomy classifications to practitioners’ license numbers. This table provides users the ability to filter for license numbers based on practitioners’ primary taxonomy.

- Added new fields to `us_department_of_labor_form_5500_filing_index` with Form 5500 contact information including names and phone numbers: `ADMIN_SIGNED_NAME`, `SPONSOR_SIGNED_NAME`, `DIRECT_FILING_ENTITY_SIGNED_NAME`, `ADMIN_PHONE_NUM`, and `SPONSOR_DIRECT_FILING_ENTITY_PHONE_NUM`.
- Added new fields to `us_department_of_labor_form_5500_policy_index` with payments to agents and brokers: `COMMISSIONS_PAID_TO_BROKER` and `FEES_PAID_TO_BROKER`.
- Removed ~10k rows from `us_department_of_labor_form_5500_policy_index` that had NULL values for every field because the filers reported no insurance policy data on Form 5500 Schedule A.
Added a new table, `nppes_npi_index`, that contains information on when NPIs were first issued, deactivated, or reactivated - dating back to 2005. This table also includes a boolean flag to indicate if an NPI is currently active.

Note that while all NPIs appear in the `nppes_npi_index` table, only actively registered NPIs as well as NPIs deactivated after August 1, 2023 appear in the `nppes_practitioner_attributes` and `nppes_organization_attributes` tables. This means the dataset does not include attribute-level data (names, type of providers, specialization) on providers with NPIs deactivated before August 2023.

The data in this dataset is sourced here. Links to provider terms and disclaimers are provided where appropriate:
Cybersyn is not endorsed by or affiliated with any of these providers. Contact [email protected] for questions.