Domain Project, ICANN, Majestic
Overview
Domain Project is the world's largest public internet domains dataset.
ICANN's Centralized Zone Data Service provides zone files for top level domains (TLDs) via an online portal.
Majestic Million provides rankings for the top million domains with the most referring subnets.
Example topics covered:
300M+ cleaned web domains in a standardized format
Redirect domains and the start/end date for which the redirect was observed
Active/inactive status based on the HTTP response
Key Attributes
Geographic Coverage | Global |
Entity Level | Domain |
As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Notes
Cybersyn has cleaned and aggregated over 300M domains in a single source to track the list of websites globally. The domains are cleaned into a standardized format stripping away any protocols and subdomains (e.g., cybersyn.com) and include helpful reference columns such as the “core” domain (cybersyn) and the public suffix domain (com). For a subset of these domains, Cybersyn provides information on redirects including the redirect domain and the start/end dates for which the redirect relationship was observed. Details on whether a domain is active/inactive based on the HTTP response status and whether a domain is the primary landing page or redirects are also included.
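The cleaning step described above can be sketched in a few lines of Python. This is only an illustration, not Cybersyn's actual pipeline: the function name `normalize_domain` is hypothetical, and the sketch assumes the public suffix is always the final label. Multi-label suffixes such as `co.uk` would need a lookup against the Public Suffix List.

```python
from urllib.parse import urlsplit


def normalize_domain(raw):
    """Return (domain, core, suffix) for a URL or bare hostname.

    Assumption: the public suffix is the final label only, so
    multi-label suffixes such as "co.uk" are not handled here.
    """
    text = raw.strip().lower()
    # urlsplit only fills in the host when a "//" separator is present.
    if "://" not in text:
        text = "//" + text
    host = urlsplit(text).hostname or ""
    labels = [label for label in host.split(".") if label]
    if len(labels) < 2:
        raise ValueError(f"not a registrable domain: {raw!r}")
    # Keep the last two labels; everything before them is a subdomain.
    core, suffix = labels[-2], labels[-1]
    return f"{core}.{suffix}", core, suffix


domain, core, suffix = normalize_domain("https://www.cybersyn.com/about")
```

Here `domain` holds the standardized form with the protocol, subdomain, and path removed, while `core` and `suffix` correspond to the two reference columns mentioned above.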
Cybersyn periodically issues GET requests to domains to determine the HTTP status code returned and any redirect destinations.
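As a rough sketch of what one such check might look like, the Python below issues a single GET without following redirects and summarizes the response. The function names are hypothetical, and the rule that 2xx and 3xx responses count as "active" is an assumption made for illustration; the exact criteria Cybersyn uses are not documented here.

```python
import http.client
from urllib.parse import urlsplit


def summarize_response(status, location=None):
    """Classify one HTTP response. Assumption: 2xx and 3xx count as active."""
    redirect_domain = None
    if 300 <= status < 400 and location:
        # A relative Location ("/home") has no hostname, so this stays None.
        redirect_domain = urlsplit(location).hostname
    return {
        "status": status,
        "is_active": 200 <= status < 400,
        "redirect_domain": redirect_domain,
    }


def check_domain(domain, timeout=10):
    """Issue one GET to the domain root; connection failures mean inactive."""
    conn = http.client.HTTPSConnection(domain, timeout=timeout)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return summarize_response(response.status, response.getheader("Location"))
    except (OSError, http.client.HTTPException):
        return {"status": None, "is_active": False, "redirect_domain": None}
    finally:
        conn.close()
```

Calling `check_domain("cybersyn.com")` on a schedule and storing each result with a timestamp would be one way to derive observed start/end dates for a redirect relationship.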
EAV Model: All Cybersyn products follow the EAV (entity, attributes, value) model with a unified schema. Entities are tangible objects (e.g. geography, company) that Cybersyn provides data on. All timeseries' dates and values that refer to the entity are included in a timeseries table. Descriptors of the timeseries are included in an attributes table. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.
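To make the split between a timeseries table and an attributes table concrete, here is a minimal Python sketch. The field names (`entity`, `variable`, `date`, `value`, `name`, `unit`) and the sample rows are hypothetical and chosen for illustration; the real Cybersyn column names may differ.

```python
# Attributes table: one descriptor per variable (hypothetical content).
attributes = {
    "http_status": {"name": "HTTP response status", "unit": "code"},
}

# Timeseries table: one row per entity, variable, and date (hypothetical content).
timeseries = [
    {"entity": "cybersyn.com", "variable": "http_status", "date": "2024-01-01", "value": 200},
    {"entity": "cybersyn.com", "variable": "http_status", "date": "2024-02-01", "value": 301},
]


def describe(rows, attrs):
    """Attach each row's descriptor by joining on the variable key."""
    return [{**row, **attrs[row["variable"]]} for row in rows]


described = describe(timeseries, attributes)
```

Keeping descriptors in their own table means a new variable can be added as new rows, without changing the schema of the timeseries table.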
Tables & Sources
Table | Source(s) |
---|---|
Cybersyn Products
Tables above are available in the following Cybersyn data products:
Examples & Sample Queries
Pull a list of websites with a specific domain
Screen for websites that are registered using the “.ai” suffix domain
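The query for this example is not reproduced on this page. As a stand-in, the sketch below runs an equivalent filter against an in-memory SQLite table from Python. The table name `domain_index`, its columns, and the sample rows are all hypothetical, so the SQL would need to be adapted to the actual table and column names in the Cybersyn product.

```python
import sqlite3

# Hypothetical table and column names; the real Cybersyn schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE domain_index (domain TEXT, core_domain TEXT, public_suffix TEXT)"
)
conn.executemany(
    "INSERT INTO domain_index VALUES (?, ?, ?)",
    [
        ("example.ai", "example", "ai"),
        ("cybersyn.com", "cybersyn", "com"),
        ("sample.ai", "sample", "ai"),
    ],
)


def domains_with_suffix(connection, suffix):
    """Return all domains whose public suffix equals `suffix`, sorted by name."""
    rows = connection.execute(
        "SELECT domain FROM domain_index WHERE public_suffix = ? ORDER BY domain",
        (suffix,),
    )
    return [row[0] for row in rows]


ai_domains = domains_with_suffix(conn, "ai")
```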
Pull a list of active websites with a specific domain
Select only domains that use the ‘.ai’ top level domain and for which the most recent HTTP response check by Cybersyn was successful
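The query for this example is also missing from this page. The sketch below shows one way to express the "most recent check was successful" condition, again using an in-memory SQLite table from Python. The table `domain_status`, its columns, and the sample rows are hypothetical, and treating status codes 200 to 399 as successful is an assumption.

```python
import sqlite3

# Hypothetical table, columns, and rows; the real Cybersyn schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain_status (
    domain TEXT, public_suffix TEXT, checked_on TEXT, status_code INTEGER
);
INSERT INTO domain_status VALUES
    ('example.ai',   'ai',  '2024-01-01', 503),
    ('example.ai',   'ai',  '2024-02-01', 200),
    ('sample.ai',    'ai',  '2024-01-01', 200),
    ('sample.ai',    'ai',  '2024-02-01', 404),
    ('cybersyn.com', 'com', '2024-02-01', 200);
""")

# Keep only each domain's latest check, then require a successful status.
QUERY = """
SELECT s.domain
FROM domain_status AS s
WHERE s.public_suffix = 'ai'
  AND s.checked_on = (
      SELECT MAX(t.checked_on) FROM domain_status AS t WHERE t.domain = s.domain
  )
  AND s.status_code BETWEEN 200 AND 399
ORDER BY s.domain
"""

active_ai_domains = [row[0] for row in conn.execute(QUERY)]
```

In the sample rows, `sample.ai` responded successfully in January but not in its latest check, so only the most recent observation decides whether a domain is returned.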
Disclaimers
The data in this product is sourced from the following:
Domain Project: License; Copyright (c) 2020-2021, Bohdan Turkynewych. All rights reserved.
ICANN
Majestic Million: License
Cybersyn is not endorsed by or affiliated with any of these providers. Contact support@cybersyn.com for questions.