USPTO

Text and metadata for patents granted by the United States Patent and Trademark Office

Overview

This product includes patent grants from the United States Patent and Trademark Office (USPTO), with publications dating back to January 1976. Each patent's raw metadata and text (including the abstract) make the listing a good training set for LLMs.

Example topics covered:

  • Date of granted patent application

  • Patent type

  • Invention title

  • Contributor name and location

Key Attributes

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.

Description

All Cybersyn products follow the EAV (entity, attribute, value) model with a unified schema. Entities are tangible objects (e.g., geography, company) that Cybersyn provides data on. Index tables contain all entities of a certain type. Timeseries tables contain the dates and values of every timeseries that refers to an entity type. Additional tables, such as the relationships table and attributes table, describe the entities and timeseries. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.

Patents and Contributors are the two core entities in US Patent Grants. The Patent entity represents an individual granted patent document. The Contributor entity refers to an individual person or organization affiliated with a patent. Relationships between Patent and Contributor entities are tracked in the corresponding relationships table (a many-to-many relationship).

Each Contributor entity includes characteristics such as its type (individual or organization) and its address. Each Contributor entity is uniquely identified by a Cybersyn-created identifier (contributor_id). A Contributor can be an applicant, an inventor, or an assignee (an individual or organization with an ownership interest) of a Patent. The uspto_patent_contributor_relationships table describes how a Contributor relates to a specific Patent entity.
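Because the relationships table records how each Contributor relates to a Patent, it can help to list the relationship labels before filtering on one. The query below is a minimal sketch that uses only the table and the contribution_type column referenced in the sample queries in this document. The set of values it returns is not documented here, so treat the output as exploratory.

```sql
-- List every contribution_type value and how many patent-contributor
-- relationships carry it, most common first.
SELECT
    contribution_type,
    COUNT(*) AS relationship_count
FROM cybersyn.uspto_patent_contributor_relationships
GROUP BY contribution_type
ORDER BY relationship_count DESC;
```

The exact strings returned here are the ones to use in a WHERE clause on contribution_type.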

Each Patent entity includes characteristics such as its application and publication dates as well as the category (CPC) that describes the type of invention it claims. Each Patent entity can be uniquely identified using the patent_id. The patent_id field can also be used to locate the patent in the USPTO PPUBS search portal, and the patent_extended_id can be used to locate the patent in Google Patents.
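As a rough illustration of retrieving both identifiers for a manual lookup, the sketch below selects patent_id and patent_extended_id alongside the title. It relies only on columns named in this document (patent_id, patent_extended_id, invention_title). The title filter and the row limit are arbitrary choices made for the example.

```sql
-- Fetch a handful of patents with both lookup identifiers.
-- patent_id          -> intended for the USPTO PPUBS search portal
-- patent_extended_id -> intended for Google Patents
SELECT
    patent_id,
    patent_extended_id,
    invention_title
FROM cybersyn.uspto_patent_index
WHERE invention_title ILIKE '%OLED%'  -- example filter only
LIMIT 10;
```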

This product contains both the metadata and the text of each granted patent filing. The metadata includes document-level metadata and the corresponding contributor-level metadata. Note that the amount of text available for each patent varies with the patent type. This listing includes no images or design drawings.

At the Patent level, metadata includes IDs that are searchable in both the USPTO patent search portal and Google Patents. These IDs can be used to obtain the PDF versions of the patent documents via the USPTO portal. The metadata also includes invention titles, patent types (e.g., utility, design, reissue), dates of application and publication, and the number of claims granted to the patent. Hierarchical information from the Cooperative Patent Classification (CPC) system, a common international standard for patent classification, is also included when available.

At the Contributor level, metadata includes the contributor name, the type of contributor (i.e., organization or individual), and geographic information (i.e., country, state, and city), as well as the geography entity identifiers for each location (joinable to other Cybersyn datasets).

Data Dictionary

Notes & Methodology

Patent identifier

The uspto_patent_index table has two unique identifiers, patent_id and patent_extended_id. Both are generated from the document number combined with other metadata properties in the data, such as patent issue dates. The patent_id and patent_extended_id are designed to be searchable in USPTO Patent Public Search and Google Patents, respectively.

Due to changes in patent document numbers and classifications over time, Cybersyn’s identifiers may not always match a record in USPTO Patent Public Search or Google Patents.

Examples & Sample Queries

Patents assigned to specific corporations

Find all patents where Nvidia is the designated assignee.

-- Join contributors to patents through the relationships table.
-- ILIKE without wildcards is a case-insensitive match on the full name.
SELECT *
FROM cybersyn.uspto_contributor_index AS contributor_index
INNER JOIN
    cybersyn.uspto_patent_contributor_relationships AS relationships
    ON contributor_index.contributor_id = relationships.contributor_id
INNER JOIN
    cybersyn.uspto_patent_index AS patent_index
    ON relationships.patent_id = patent_index.patent_id
WHERE contributor_index.contributor_name ILIKE 'NVIDIA CORPORATION'
    AND relationships.contribution_type = 'Assignee - United States Company Or Corporation';
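The query above matches one exact contributor name and one exact contribution_type string. A looser variant is sketched below. It rests on two assumptions that this document does not confirm: that other assignee categories also begin with the word "Assignee", and that name variants of the company begin with "NVIDIA". Check both against the actual values before relying on the result.

```sql
-- Looser variant: prefix matches on the name and the contribution type.
-- ASSUMPTION: all assignee categories start with 'Assignee'.
-- ASSUMPTION: name variants of the company start with 'NVIDIA'.
SELECT
    patent_index.patent_id,
    patent_index.invention_title,
    contributor_index.contributor_name,
    relationships.contribution_type
FROM cybersyn.uspto_contributor_index AS contributor_index
INNER JOIN
    cybersyn.uspto_patent_contributor_relationships AS relationships
    ON contributor_index.contributor_id = relationships.contributor_id
INNER JOIN
    cybersyn.uspto_patent_index AS patent_index
    ON relationships.patent_id = patent_index.patent_id
WHERE contributor_index.contributor_name ILIKE 'NVIDIA%'
    AND relationships.contribution_type ILIKE 'Assignee%';
```

Selecting explicit columns rather than * avoids returning the join keys (contributor_id, patent_id) once per joined table.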

Patents by contributor

Search for patents by Steven P. Jobs.

-- Return every patent linked to this contributor, in any role
-- (no filter on contribution_type).
SELECT *
FROM cybersyn.uspto_contributor_index AS contributor_index
INNER JOIN
    cybersyn.uspto_patent_contributor_relationships AS relationships
    ON contributor_index.contributor_id = relationships.contributor_id
INNER JOIN
    cybersyn.uspto_patent_index AS patent_index
    ON relationships.patent_id = patent_index.patent_id
WHERE contributor_index.contributor_name ILIKE 'Steven P. Jobs';

Patents by associated keywords

Find all patents that include OLED in the title.

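-- Case-insensitive substring match; this also matches titles in which
-- "oled" appears inside a longer word.
SELECT *
FROM cybersyn.uspto_patent_index
WHERE invention_title ILIKE '%OLED%';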

Releases & Changelog

There are no updates at this time.

Disclaimers

The data in this product is sourced from the United States Patent and Trademark Office (USPTO).

Cybersyn is not endorsed by or affiliated with any of these providers. Contact support@cybersyn.com for questions.

Copyright © 2024 Cybersyn