Tech & Innovation Essentials

Technology-focused reference and activity data including web domains, US patents, GitHub repos, and IMEI type allocation codes

Overview

This product provides reference and activity data on technology and innovation.

Example topics covered include:

  • A repository of over 300M web domains cleaned and aggregated in a standardized format, including redirects and HTTP response status

  • GitHub stars, pull requests, and issues across users and repos

  • US patent grants and inventor information

  • IMEI Type Allocation Codes (TAC)

  • OpenAlex index of scholarly entities (e.g. works, sources, authors, funders, publishers) and how they are connected to one another

Data Sources, Attributes, Sample Queries

A detailed description of the data is available by source. Source pages include key attributes (e.g. geographic coverage, time granularity, history, entity level), release frequency, notes & methodologies, and sample queries.

All Cybersyn products follow the EAV (entity, attribute, value) model with a unified schema. Entities are tangible objects (e.g. geography, company) that Cybersyn provides data on. A timeseries table holds every date and value that refers to an entity, and an attributes table holds the descriptors of each timeseries. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.
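The split between a timeseries table and an attributes table can be illustrated with a small local sketch. It uses Python's built-in sqlite3 module rather than Snowflake, and every table name, column name other than GEO_ID, and data value below is a made-up assumption for illustration; consult the Data Dictionary for the actual schema.

```python
import sqlite3

# Illustrative only: the table names, the columns other than GEO_ID, and all
# values below are invented for this sketch and are not the product's schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE example_attributes (
        variable      TEXT PRIMARY KEY,
        variable_name TEXT,
        unit          TEXT
    );
    CREATE TABLE example_timeseries (
        geo_id   TEXT,
        variable TEXT,
        date     TEXT,
        value    REAL
    );
    INSERT INTO example_attributes VALUES ('patent_grants', 'Patent grants', 'count');
    INSERT INTO example_timeseries VALUES
        ('geoId/06', 'patent_grants', '2023-01-01', 120.0),
        ('geoId/06', 'patent_grants', '2023-02-01', 135.0);
    """
)


def describe_timeseries(connection, geo_id):
    """Join each timeseries row to its descriptor in the attributes table."""
    query = """
        SELECT ts.date, attr.variable_name, ts.value, attr.unit
        FROM example_timeseries AS ts
        JOIN example_attributes AS attr ON attr.variable = ts.variable
        WHERE ts.geo_id = ?
        ORDER BY ts.date
    """
    return connection.execute(query, (geo_id,)).fetchall()


rows = describe_timeseries(conn, "geoId/06")
for row in rows:
    print(row)
```

Keeping descriptors in a separate table means that adding a new measure adds rows rather than columns, which is the usual motivation for an EAV layout.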

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
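One common way to detect that an upstream source has changed is to compare a content fingerprint against the last one seen. The sketch below shows that general technique only; it is an assumption for illustration and does not describe Cybersyn's actual release process. The function names are hypothetical.

```python
import hashlib
from typing import Optional, Tuple


def fingerprint(payload: bytes) -> str:
    """Return a stable SHA-256 fingerprint of a downloaded source file."""
    return hashlib.sha256(payload).hexdigest()


def detect_change(payload: bytes, last_fingerprint: Optional[str]) -> Tuple[bool, str]:
    """Compare the current payload with the previously stored fingerprint.

    Returns (changed, current_fingerprint). A source that has never been
    seen before (last_fingerprint is None) counts as changed.
    """
    current = fingerprint(payload)
    return current != last_fingerprint, current


# Simulated polling of a source across three checks.
changed_first, seen = detect_change(b"release-1", None)
changed_second, seen = detect_change(b"release-1", seen)
changed_third, seen = detect_change(b"release-2", seen)
```

In a periodic job, the stored fingerprint would be persisted between runs and a new release triggered only when `changed` is true.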

Data Dictionary

See the Data Dictionary page.

Releases & Changelog

1/26/24 - Added open catalog of scholarly entities and how they are connected from OpenAlex

Added OpenAlex's catalog of scholarly entities and how they are connected to each other. Entities are defined as scholarly works (e.g. journal articles, books, theses), authors, sources, affiliated organizations, topics covered, publishers, and funders. This data is derived from a wide range of sources, offering an extensive overview of academic research and its contributors.

The data is available in the following tables:

  • OPENALEX_AUTHORS_INDEX

  • OPENALEX_CONCEPTS_INDEX

  • OPENALEX_FUNDERS_INDEX

  • OPENALEX_INSTITUTIONS_INDEX

  • OPENALEX_PUBLISHERS_INDEX

  • OPENALEX_SOURCES_INDEX

  • OPENALEX_WORKS_INDEX

12/14/23 - Added web domain redirect relationships and domain characteristics

Added two new tables, DOMAIN_REDIRECT_RELATIONSHIPS and DOMAIN_CHARACTERISTICS:

  • DOMAIN_REDIRECT_RELATIONSHIPS includes the original web domain and the domain that it redirects to, as well as the start and end dates between which the redirect relationship was observed. At the time of this release, the table covers ~50K of the 300M+ domains Cybersyn tracks.

  • DOMAIN_CHARACTERISTICS details which web domains are active/inactive based on the HTTP response status of the domain and whether the domain is the primary landing page or redirects to another domain. At the time of this release, the table covers 800K+ domains that Cybersyn tracks.
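As a rough illustration of how these two tables could be used together, the sketch below looks up the redirect target that was observed for a domain on a given date and classifies a domain by HTTP status. The field names, the rule that 2xx and 3xx responses count as active, and the sample rows are all assumptions made for this example; the tables' actual columns and classification logic may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class RedirectRelationship:
    # Hypothetical field names modeled on the table description above.
    domain: str
    redirects_to: str
    start_date: date
    end_date: Optional[date]  # None means the redirect is still observed


def redirect_target_on(
    relationships: Iterable[RedirectRelationship], domain: str, as_of: date
) -> Optional[str]:
    """Return the domain that `domain` redirected to on `as_of`, if any."""
    for rel in relationships:
        if rel.domain != domain:
            continue
        ended = rel.end_date is not None and as_of > rel.end_date
        if rel.start_date <= as_of and not ended:
            return rel.redirects_to
    return None


def is_active(http_status: Optional[int]) -> bool:
    """Assumed rule: a 2xx or 3xx response marks the domain as active."""
    return http_status is not None and 200 <= http_status < 400


# Made-up sample rows using reserved example domains.
sample = [
    RedirectRelationship("example.org", "example.com", date(2023, 1, 1), date(2023, 6, 30)),
    RedirectRelationship("example.net", "example.com", date(2023, 9, 1), None),
]

print(redirect_target_on(sample, "example.org", date(2023, 3, 1)))
print(redirect_target_on(sample, "example.org", date(2023, 12, 1)))
print(is_active(200), is_active(404), is_active(None))
```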

10/06/23 - Added repository of web domains, GitHub Archive tables, and US Patents tables

Cleaned and aggregated over 300M domains into a single source, the new domain_index table, to track websites globally.

Added GitHub Archive and US Patent Grants tables to the product. Rebranded the product from "IMEI Type Allocation Codes" to "Tech & Innovation Essentials."

Disclaimer

The sources for the data in this dataset are listed on the individual source pages. Links to provider licenses, terms, and disclaimers are provided where appropriate.

Cybersyn is not endorsed by or affiliated with any of these providers. Contact support@cybersyn.com for questions.

Copyright © 2024 Cybersyn