Domain Project, ICANN, Majestic

Overview

Domain Project describes itself as the world's largest public dataset of internet domains.

ICANN's Centralized Zone Data Service (CZDS) provides zone files for top-level domains (TLDs) through an online portal.

Majestic Million ranks the top one million domains by their number of referring subnets.

Example topics covered:

  • 300M+ cleaned web domains in a standardized format

  • Redirect domains and the start/end date for which the redirect was observed

  • Active/inactive status based on HTTP response codes

Key Attributes

Geographic Coverage

Global

Entity Level

Domain

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.

Notes

Cybersyn has cleaned and aggregated over 300M domains into a single source that tracks websites globally. Domains are standardized by stripping any protocol and subdomain (e.g., cybersyn.com), and the data includes helpful reference columns such as the “core” domain (cybersyn) and the public suffix domain (com). For a subset of these domains, Cybersyn provides redirect information, including the redirect domain and the start/end dates over which the redirect relationship was observed. The data also indicates whether a domain is active or inactive based on its HTTP response status, and whether it is a primary landing page or redirects elsewhere.
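As a quick check of the standardized format, the lookup below reads the reference columns for a single domain. It reuses only the DOMAIN_INDEX columns from the sample query further down; based on the description above, cybersyn.com should map to a core domain of cybersyn and a public suffix of com, though the exact stored values are worth verifying against the live table.

```sql
-- Look up the standardized reference columns for one domain.
-- domain_id holds the cleaned domain (protocol and subdomain stripped).
SELECT domain_id,
       core_domain,           -- e.g. 'cybersyn'
       public_suffix_domain   -- e.g. 'com'
FROM cybersyn.domain_index
WHERE domain_id = 'cybersyn.com';
```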

Cybersyn periodically sends HTTP GET requests to domains to record the response status code and any redirect destinations.
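The redirect observations can then be read back per domain. This is a minimal sketch only: apart from domain_id, the column names below (related_domain_id, relationship_start_date, relationship_end_date) are hypothetical, chosen by analogy with DOMAIN_CHARACTERISTICS, and it assumes domain_id holds the redirecting domain. Check the actual DOMAIN_REDIRECT_RELATIONSHIPS schema before running it.

```sql
-- Redirect history for one domain, oldest observation first.
-- ASSUMPTION: related_domain_id, relationship_start_date and
-- relationship_end_date are hypothetical column names.
SELECT domain_id,                -- the redirecting domain (assumed)
       related_domain_id,        -- hypothetical: the redirect destination
       relationship_start_date,  -- first date the redirect was observed
       relationship_end_date     -- assumed NULL while the redirect is still observed
FROM cybersyn.domain_redirect_relationships
WHERE domain_id = 'cybersyn.com'
ORDER BY relationship_start_date;
```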

EAV Model: All Cybersyn products follow the EAV (entity, attribute, value) model with a unified schema. Entities are tangible objects (e.g., geography, company) that Cybersyn provides data on. All timeseries dates and values that refer to an entity are included in a timeseries table, and descriptors of each timeseries are included in an attributes table. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.
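Read through the EAV lens, DOMAIN_CHARACTERISTICS appears to pair an entity (domain_id) with an attribute (relationship_type) and a value (value). That mapping is our assumption, inferred from the columns used in the active-domains sample query below. The sketch lists which attribute types currently have open records, treating a NULL relationship_end_date as "current" in the same way that sample query does.

```sql
-- Count domains per attribute type among records that are still open.
SELECT relationship_type,
       COUNT(DISTINCT domain_id) AS domain_count
FROM cybersyn.domain_characteristics
WHERE relationship_end_date IS NULL   -- most recent observation only
GROUP BY relationship_type
ORDER BY domain_count DESC;
```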

Tables & Sources

Tables:

  • DOMAIN_INDEX

  • DOMAIN_CHARACTERISTICS

  • DOMAIN_REDIRECT_RELATIONSHIPS

Cybersyn Products

Tables above are available in the following Cybersyn data products:

Examples & Sample Queries

Pull a list of websites with a specific public suffix

Screen for websites registered under the “.ai” public suffix

SELECT domain_id, core_domain, public_suffix_domain
FROM cybersyn.domain_index
WHERE public_suffix_domain = 'ai'
LIMIT 500;

Pull a list of active websites with a specific public suffix

Select only domains that use the “.ai” top-level domain and for which Cybersyn's most recent HTTP response check was successful

SELECT domain_id
FROM cybersyn.domain_characteristics
WHERE domain_id ILIKE '%.ai'
  AND relationship_type = 'successful_http_response_status'
  AND value = 'true'
  AND relationship_end_date IS NULL;
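The two sample queries above can be combined to return the reference columns for active domains only. This sketch joins DOMAIN_INDEX to DOMAIN_CHARACTERISTICS on domain_id and uses the same filters as the queries above; filtering on public_suffix_domain = 'ai' instead of domain_id ILIKE '%.ai' is our choice, on the assumption that the two select the same domains.

```sql
-- Active ".ai" domains with their core domain and public suffix.
SELECT i.domain_id,
       i.core_domain,
       i.public_suffix_domain
FROM cybersyn.domain_index AS i
JOIN cybersyn.domain_characteristics AS c
  ON c.domain_id = i.domain_id
WHERE i.public_suffix_domain = 'ai'
  AND c.relationship_type = 'successful_http_response_status'
  AND c.value = 'true'                -- last HTTP check succeeded
  AND c.relationship_end_date IS NULL -- keep only the current observation
LIMIT 500;
```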

Disclaimers

The data in this product is sourced from the following:

  • Domain Project: License; Copyright (c) 2020-2021, Bohdan Turkynewych. All rights reserved.

  • ICANN

  • Majestic Million: License

Cybersyn is not endorsed by or affiliated with any of these providers. Contact support@cybersyn.com for questions.

Copyright © 2024 Cybersyn