
Tech & Innovation Essentials

Technology-focused reference and activity data, such as web domains, patents, and GitHub repos.

Overview

This product provides reference and activity data on technology and innovation.
Example topics covered include:
  • A repository of over 300M web domains
  • GitHub repository events and stars
  • Patent applications and inventor information

Key Attributes

Geographic Coverage: Global
Entity Level: IMEI, Domain, GitHub repository, Contributor, Patent
Update Frequency: Depending on source; see table below

Description

Cybersyn has cleaned and aggregated over 300M web domains into a single source for tracking websites globally. Each domain is standardized by stripping the protocol and any subdomains (e.g., cybersyn.com), and helpful reference columns give the “core” domain (cybersyn) and the public suffix domain (com).
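To make the standardized format concrete, here is a minimal Python sketch of the kind of cleaning described above. It is not Cybersyn's actual pipeline: the function name `normalize_domain`, the `domain` key, and the small `MULTI_LABEL_SUFFIXES` set are assumptions for illustration, and a real implementation would likely consult the full Public Suffix List. The `core_domain` and `public_suffix_domain` keys mirror the column names used in the sample query later on this page.

```python
from urllib.parse import urlsplit

# Hypothetical, tiny sample of multi-label public suffixes. A real pipeline
# would consult the full Public Suffix List instead of this set.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def normalize_domain(raw: str) -> dict:
    """Strip protocol, port, path, and subdomains from a URL or hostname."""
    text = raw.strip().lower()
    # urlsplit only populates the host when a "//" authority marker is present.
    if "://" not in text:
        text = "//" + text
    host = (urlsplit(text).hostname or "").rstrip(".")
    labels = host.split(".")
    if len(labels) < 2:
        raise ValueError(f"not a registrable domain: {raw!r}")

    # Decide how many trailing labels make up the public suffix.
    last_two = ".".join(labels[-2:])
    suffix_len = 2 if len(labels) >= 3 and last_two in MULTI_LABEL_SUFFIXES else 1

    suffix = ".".join(labels[-suffix_len:])
    core = labels[-suffix_len - 1]
    return {
        "domain": f"{core}.{suffix}",       # assumed name for the cleaned domain
        "core_domain": core,
        "public_suffix_domain": suffix,
    }


print(normalize_domain("https://www.cybersyn.com/about"))
```

The sketch keeps only the label directly to the left of the public suffix, so `app.cybersyn.com` and `www.cybersyn.com` both collapse to `cybersyn.com`.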
The GitHub Archive dataset provides access to public GitHub activity, offering a view of open-source developers' contributions to repositories. See more details here.
The USPTO patent data includes patent grants in the US with publications dating back to January 1976. See more details here.
IMEI Type Allocation Code (TAC) data is sourced from the Open Source Mobile Communications (OSMOCOM) project, which maintains a database mapping each TAC to brand and model names. The data covers approximately 6,000 unique models and includes links to GSMArena for both brand and model when available.
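For readers unfamiliar with how a TAC relates to an IMEI, the following Python sketch splits a 15-digit IMEI into its 8-digit TAC, 6-digit serial number, and Luhn check digit. The function names `split_imei` and `luhn_is_valid` are illustrative assumptions and are not part of this product; the sample IMEI is a commonly cited example value, not a record from the dataset.

```python
def luhn_is_valid(digits: str) -> bool:
    """Return True when the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def split_imei(imei: str) -> dict:
    """Split a 15-digit IMEI into TAC, serial number, and check digit."""
    cleaned = imei.replace("-", "").replace(" ", "")
    if len(cleaned) != 15 or not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"expected 15 digits, got {imei!r}")
    return {
        "tac": cleaned[:8],          # Type Allocation Code: identifies the model
        "serial": cleaned[8:14],     # unit serial number within that TAC
        "check_digit": cleaned[14],  # Luhn check digit
        "luhn_valid": luhn_is_valid(cleaned),
    }


print(split_imei("49-015420-323751-8"))
```

The `tac` value is the piece that would be matched against the TAC column of the device table.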

Data Dictionary

Data Sources & Release Frequency

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
| Table Names | Source | Source Schedule |
| --- | --- | --- |
| domain_index | | Daily at 7am ET |
| github_events, github_repos, github_stars | | Daily at 11pm ET |
| uspto_patent_index, uspto_contributor_index, uspto_patent_contributor_relationships | | Weekly - Tuesday |
| IMEI_tac_device | OSMOCOM | OSMOCOM updates the data when they receive updates from their community. |

Streamlit Demos

Cybersyn builds Streamlit demos to visualize the data available in this product and provide a jumping-off point for your own analysis.

Example Use Cases & Queries

Use Case: Pull lists of websites
Screen for websites that are registered using the “.ai” suffix domain.
SELECT domain_id, core_domain, public_suffix_domain
FROM cybersyn.domain_index
WHERE public_suffix_domain = 'ai'
LIMIT 500;
Use Case: TAC of the Apple iPhone 13
Query the database to find the iPhone 13 TAC and see how many of these devices are connected to your network.
SELECT TAC, BRAND_NAME, MODEL_NAME
FROM CYBERSYN.TAC_DEVICE
WHERE BRAND_NAME = 'Apple' AND MODEL_NAME = 'iPhone 13';
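The query above returns the TAC rows only; counting devices on a network is a separate step. One way to sketch that step in Python is shown below. Everything here is assumed for illustration: the function name `count_devices_by_model`, the placeholder TAC values, and the sample IMEIs are hypothetical, and the sketch assumes TACs are stored as 8-character strings.

```python
from collections import Counter


def count_devices_by_model(imeis, tac_rows):
    """Count observed IMEIs per (brand, model) using TAC rows from the query.

    tac_rows is an iterable of (tac, brand_name, model_name) tuples, matching
    the column order of the query. The query may return more than one TAC for
    the same model, so counts are grouped by model rather than by TAC.
    """
    tac_to_model = {tac: (brand, model) for tac, brand, model in tac_rows}
    counts = Counter()
    for imei in imeis:
        model = tac_to_model.get(imei[:8])  # first 8 digits of an IMEI are the TAC
        if model is not None:
            counts[model] += 1
    return counts


# Placeholder values only; these are not real TACs or IMEIs.
sample_tac_rows = [
    ("00000001", "Apple", "iPhone 13"),
    ("00000002", "Apple", "iPhone 13"),
]
sample_imeis = ["000000011234567", "000000027654321", "999999990000000"]

print(count_devices_by_model(sample_imeis, sample_tac_rows))
```

IMEIs whose TAC is not in the query result are simply ignored, which keeps the count limited to the model being screened.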
See here for sample queries for GitHub Archive and US Patent Grants.

Releases & Changelog

10/06/23 - Added a repository of web domains along with GitHub Archive and US Patent Grants tables
Cleaned and aggregated over 300M domains into the new domain_index table, a single source for tracking websites globally.
Added GitHub Archive and US Patent Grants tables to the product. Rebranded the product from "IMEI Type Allocation Codes" to "Tech & Innovation Essentials."

Disclaimer

The data in this dataset is sourced here. Links to provider licenses, terms and disclaimers are provided where appropriate:
Cybersyn is not endorsed by or affiliated with any of these providers. Contact [email protected] for questions.