USPTO

Overview

This product includes patent applications and grants from the United States Patent and Trademark Office (USPTO), with publications dating back to January 1976. Because each record carries the raw metadata text of the patent, including the abstract text, the listing is well suited for use as a training set for LLMs.

Example topics covered:

  • Date of granted patent application
  • Patent type
  • Invention title
  • Contributor name and location

Key Attributes

Geographic Coverage: United States
Entity Level: Contributor, Patent
Time Granularity: Daily
Release Frequency: Weekly - Tuesday
History: Since January 1976

Description

Patents and Contributors are the two core entities in US Patent Grants. The Patent entity represents an individual granted patent document. The Contributor entity refers to a person or organization affiliated with a patent. Because a patent can have several contributors and a contributor can appear on several patents, the links between the two entities are tracked in a separate relationships table (a many-to-many relationship).

Each Contributor entity includes characteristics describing its type (individual or organization) and its address. Each Contributor entity is identified by a Cybersyn-created identifier (contributor_id), which is the unique identifier for this entity type. A Contributor entity can be an applicant, an inventor, or an assignee (an individual or organization with an ownership interest) of a Patent. The uspto_patent_contributor_relationships table describes how a Contributor relates to a specific Patent entity.
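To see which roles actually occur in the relationships table, an exploratory query along the following lines may help. It is a minimal sketch that relies only on the contribution_type column used in the sample queries; the set of values it returns is not listed in this document, so inspect the output before hard-coding any of them in a filter.

```sql
-- Count relationship rows per contribution type, most frequent first.
SELECT
    contribution_type,
    COUNT(*) AS relationship_count
FROM cybersyn.uspto_patent_contributor_relationships
GROUP BY contribution_type
ORDER BY relationship_count DESC;
```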

Each Patent entity includes characteristics describing its application and publication dates, as well as the Cooperative Patent Classification (CPC) category that describes the type of invention it claims. Each Patent entity can be uniquely identified by its patent_id. The patent_id field can also be used to locate the patent in the USPTO Patent Public Search (PPUBS) portal, and the patent_extended_id can be used to locate the patent in Google Patents.
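As a quick way to see the two identifiers side by side, a sketch like the following pulls a handful of rows from the patent index. It uses only columns named in this document; the LIMIT of 10 is an arbitrary choice for illustration.

```sql
-- Show both lookup identifiers next to the invention title for a small sample.
SELECT
    patent_id,           -- intended for lookup in USPTO Patent Public Search
    patent_extended_id,  -- intended for lookup in Google Patents
    invention_title
FROM cybersyn.uspto_patent_index
LIMIT 10;
```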

This product contains both the metadata and the text of each granted patent filing. The metadata includes document-level metadata and the corresponding contributor-level metadata.

At the Patent level, metadata includes IDs that are searchable in both the USPTO patent search portal and Google Patents. These IDs can be used to obtain the PDF versions of the patent documents via the USPTO portal. The metadata also includes invention titles, patent types (e.g., utility, design, reissue), dates of application and publication, and the number of claims granted to the patent. Hierarchical information from the Cooperative Patent Classification (CPC) system, a common international standard for patent classification, is also included when available. Note that the amount of text available for each patent varies with the patent type. This listing includes no images or design drawings.

At the Contributor level, metadata includes the contributor name, the type of contributor (i.e., organization or individual), and geographic information (i.e., country, state, and city), as well as the geography entity identifiers for each location (joinable to other Cybersyn datasets).

Notes & Methodology

Patent identifier

The unique identifiers of the uspto_patent_index table, patent_id and patent_extended_id, are generated from the document number combined with other metadata properties in the data, such as the patent issue date. The patent_id and patent_extended_id are designed to be searchable in USPTO Patent Public Search and Google Patents, respectively.

Because patent document numbers and classifications have changed over time, Cybersyn’s identifier may in some cases not match exactly what USPTO Patent Public Search or Google Patents can look up.

Sample Queries

Patents assigned to specific corporations

Find all patents where Nvidia is the designated assignee.

-- Join contributors to patents through the relationships table.
SELECT *
FROM cybersyn.uspto_contributor_index AS contributor_index
INNER JOIN cybersyn.uspto_patent_contributor_relationships AS relationships
    ON contributor_index.contributor_id = relationships.contributor_id
INNER JOIN cybersyn.uspto_patent_index AS patent_index
    ON relationships.patent_id = patent_index.patent_id
-- ILIKE without wildcards is a case-insensitive exact match on the name.
WHERE contributor_index.contributor_name ILIKE 'NVIDIA CORPORATION'
    AND relationships.contribution_type = 'Assignee - United States Company Or Corporation';

Patents by contributor

Search for patents by Steven P. Jobs.

-- Same join as above, filtered on the contributor name only (any contribution type).
SELECT *
FROM cybersyn.uspto_contributor_index AS contributor_index
INNER JOIN cybersyn.uspto_patent_contributor_relationships AS relationships
    ON contributor_index.contributor_id = relationships.contributor_id
INNER JOIN cybersyn.uspto_patent_index AS patent_index
    ON relationships.patent_id = patent_index.patent_id
WHERE contributor_index.contributor_name ILIKE 'Steven P. Jobs';
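The filter above matches only rows whose contributor_name equals the given string, ignoring case. This document does not say how names are normalized, so it may be worth checking which spellings exist before relying on an exact match. The sketch below lists candidate contributors by surname; the '%Jobs%' pattern is just an illustrative choice and will also return unrelated people with the same surname.

```sql
-- List distinct contributors whose name contains "jobs" (case-insensitive),
-- so the exact stored spelling can be picked for the main query.
SELECT DISTINCT
    contributor_id,
    contributor_name
FROM cybersyn.uspto_contributor_index
WHERE contributor_name ILIKE '%Jobs%'
ORDER BY contributor_name;
```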

Patents by associated keywords

Find all patents that include OLED in the title.

SELECT * FROM cybersyn.uspto_patent_index WHERE invention_title ILIKE ANY ('%OLED%');
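Two variations on the keyword search may be useful. ILIKE ANY accepts several patterns, so related phrasings can be searched together; the second phrase below is only an illustrative guess at a synonym. Also note that the substring pattern '%OLED%' matches words that merely contain those letters, such as "cooled" or "pooled". The second query is a sketch of a whole-word match using Snowflake's REGEXP_LIKE with the case-insensitive 'i' parameter, assuming titles are single-line strings and that the \b word-boundary escape behaves as in common regex dialects.

```sql
-- Several patterns at once: a title matches if it contains any listed term.
SELECT patent_id, invention_title
FROM cybersyn.uspto_patent_index
WHERE invention_title ILIKE ANY ('%OLED%', '%organic light emitting%');

-- Whole-word variant: matches "OLED" or "OLEDs" but not "cooled".
-- REGEXP_LIKE must match the entire string, hence the leading and trailing ".*".
-- Backslashes are doubled because the pattern is a single-quoted string literal.
SELECT patent_id, invention_title
FROM cybersyn.uspto_patent_index
WHERE REGEXP_LIKE(invention_title, '.*\\bOLEDs?\\b.*', 'i');
```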

Disclaimers

The data in this product is sourced from the United States Patent and Trademark Office (USPTO).

Cybersyn is not endorsed by or affiliated with any of these providers. Contact snowflake-public-data@snowflake.com for questions.