Web Traffic Foundation
Projected web traffic metrics
Web Traffic Foundation is an experimental product currently in alpha testing. Metrics from the past 7 weeks may change slightly as more sample data is added.
Overview
Web Traffic Foundation provides projected web traffic metrics for the top ~65,000 domains globally.
Metrics include weekly:
Sessions
Pageviews
Users
Key Attributes
| Geographic Coverage | Global |
| Entity Level | Domain (e.g. google.com) |
| Time Granularity | Weekly - see visual for more details |
| Update Frequency | Weekly on Thursdays |
| History | January 1, 2021 - Present |
| Lag | 4-day lag - see visual for more details |
Description
All Cybersyn products follow the EAV (entity, attributes, value) model with a unified schema. Entities are tangible objects (e.g. URL domain, company, geography) that Cybersyn provides data on. Entities may have characteristics (i.e. descriptors of the entity) in an index table and values (i.e. statistics) that refer to an entity and a date in a timeseries table. Descriptors of a timeseries are included in an attributes table. Refer to Cybersyn Concepts for more details.
The WEBTRAFFIC_SYNDICATE_TIMESERIES table provides projected values for weekly sessions, pageviews, and users by domain over time.
The WEBTRAFFIC_SYNDICATE_ATTRIBUTES table describes each variable in a wide format. Fields include MEASURE, UNIT, FREQUENCY, DEVICE, and MODEL_VERSION.
The DOMAIN_INDEX table includes a repository of over 300M domains cleaned and aggregated in a standardized format. The DOMAIN_ID strips away any subdomain (e.g. www) and protocol (e.g. https) information.
The COMPANY_DOMAIN_RELATIONSHIPS table serves as a mapping between companies and the domains that they own. The table maps DOMAIN_ID to COMPANY_ID and COMPANY_NAME, which can be tied back to CIK, LEI, EIN, and company-level PermID information. All Cybersyn datasets that include Company entities use the COMPANY_ID field as the unique ID for the Company, allowing users to join as needed.
The COMPANY_INDEX table serves as the spine for Cybersyn data that involves company-level identifiers. This table is a list of ~100K public and private companies aggregated from the Securities and Exchange Commission (SEC), Refinitiv, the Global Legal Entity Identifier Foundation (GLEIF), and the IRS. Each of these sources has its own unique identifier for companies (EIN, CIK, LEI, PermID), and Cybersyn maps these IDs together to allow users to join datasets using common unique identifiers. All Cybersyn datasets that include Company entities use the COMPANY_ID field as the unique ID for the Company.
The COMPANY_CHARACTERISTICS table serves as a complement to the COMPANY_INDEX table. This table includes a unique ID for each company (COMPANY_ID) and associated categorical characteristics: address, legal structure, previous names, SEC industry group, EIN, CIK, LEI, PermID, Refinitiv business sector and industry code/description, and SIC code/description. A characteristic may be temporal, with start and end dates indicating the range for which the data is valid.
Data Dictionary
Data Dictionary
Entity Relationship Diagram
Notes & Methodology
Updates to Data - Experimental Product
Web Traffic Foundation is an experimental product currently in alpha testing. Metrics from the past 7 weeks may change slightly as more sample data is added. Beyond the most recent 7 weeks, the output of each model should not change, though we reserve the right to change data with each release while the product is in alpha testing.
Domains Included
Today, web traffic estimates for ~65,000 domain URLs (e.g. google.com) are included. This number will continue to grow as we improve our data and methodologies.
Subdomain estimates (e.g. maps.google.com) are not currently available but will be added in future releases.
Estimate Accuracy
We use an out-of-sample domain (“ground truth”) set to calculate accuracy metrics:
Average Correlation: For larger domains (> 250k users per week), our estimates have an average correlation of ~60% with ground truth. For smaller domains (< 250k users per week), our estimates have an average correlation of ~50% with ground truth.
Mean Absolute Percent Error for Nominal User Counts (MAPE): For larger domains (> 250k users per week), our estimates have an average percent error of ~40% compared with ground truth. For smaller domains (< 250k users per week), our estimates have an average percent error of ~70% compared with ground truth.
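For readers who want to reproduce comparable metrics against their own first-party analytics, the two measures above can be sketched as follows. This is a minimal illustration, not Cybersyn's evaluation code: the function names are hypothetical, the input values in the usage note are placeholders, and the exact averaging procedure across domains is not described here. The sketch assumes one series of weekly user estimates and one matching series of ground-truth values for a single domain.

```python
import math


def pearson_correlation(estimates, truth):
    """Pearson correlation between two equal-length numeric series."""
    n = len(estimates)
    mean_e = sum(estimates) / n
    mean_t = sum(truth) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimates, truth))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(var_e * var_t)


def mean_absolute_percent_error(estimates, truth):
    """MAPE as a fraction (0.40 means 40%); truth values must be non-zero."""
    errors = [abs(e - t) / abs(t) for e, t in zip(estimates, truth)]
    return sum(errors) / len(errors)
```

For example, placeholder estimates of 110, 180, and 330 weekly users against ground truth of 100, 200, and 300 are each off by 10%, so `mean_absolute_percent_error` returns 0.10 for that series.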
Weekly Aggregations
Data is provided at the weekly level. Each date represents the week ending on that date (always a Sunday). For example, 1/14/24 represents data from 1/8/24 - 1/14/24. See the Release Timeline visual for details. Monthly and daily aggregations, including monthly active users (MAUs) and daily active users (DAUs), will be added in future releases.
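The week-ending convention can be sketched in a few lines. The helper names below are hypothetical. The release-date helper also assumes that the 4-day lag listed under Key Attributes is counted from the week-ending Sunday, which is consistent with Thursday updates but is an inference rather than a documented rule.

```python
from datetime import date, timedelta


def week_window(week_ending: date) -> tuple[date, date]:
    """Return the (Monday, Sunday) range covered by a week-ending date."""
    if week_ending.weekday() != 6:  # Monday is 0, Sunday is 6
        raise ValueError("week-ending dates are expected to be Sundays")
    return week_ending - timedelta(days=6), week_ending


def expected_release(week_ending: date, lag_days: int = 4) -> date:
    """Assumed release date: the week-ending Sunday plus the stated lag."""
    return week_ending + timedelta(days=lag_days)
```

Calling `week_window(date(2024, 1, 14))` returns January 8, 2024 through January 14, 2024, matching the example above.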
Makeup of User Panel
The panel of web traffic users is primarily sourced from desktop users. Domains that skew heavily mobile may be underrepresented in the panel. The user panel does not include users from China. As a result, the largest domains from China are largely excluded.
User-level Metrics
Measures for “Users” are meant to represent the number of unique active users for the given time period set in the FREQUENCY field.
Model Versions
A “model version” is included to create transparency in estimates and changes in methodologies. As models improve, this will give users the ability to evaluate how predictions have changed over time. Additionally, customers can choose to use a previous model version to limit the impact of data changes on outputs that rely on the data.
Numerous model versions will be published in parallel as new methodologies are developed. More recently timestamped model versions will feature incremental improvements over older model versions.
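Because several model versions may coexist, downstream code generally needs to either pin one version or pick the most recent. A minimal sketch of that choice is below. The row layout and the function name are hypothetical, and the sketch assumes MODEL_VERSION values are timestamp-like strings that sort chronologically (for example ISO dates), which is not confirmed here.

```python
def select_model_version(rows, pinned=None):
    """Keep rows from one model version: the pinned one, or the latest.

    `rows` is a list of dicts with a "model_version" key (assumed layout).
    """
    versions = {row["model_version"] for row in rows}
    if not versions:
        return []
    if pinned is not None:
        if pinned not in versions:
            raise KeyError(f"model version {pinned!r} not present")
        chosen = pinned
    else:
        # Assumes version strings sort chronologically, e.g. ISO dates.
        chosen = max(versions)
    return [row for row in rows if row["model_version"] == chosen]
```

Pinning a version keeps historical outputs stable, while omitting `pinned` follows the newest methodology as it is released.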
Release Timeline
Example Use Cases & Queries
Benchmark company web traffic metrics against industry peers
Compare weekly users of airbnb.com relative to vrbo.com.
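A comparison of this kind can be sketched as a pivot over the timeseries table. The example below runs against an in-memory SQLite stand-in so it is self-contained. The column names (domain_id, variable, date, value), the variable label 'users', and every number in it are placeholder assumptions for illustration, not the published schema or real traffic figures. Consult the Data Dictionary for the actual fields before adapting the query.

```python
import sqlite3

# In-memory stand-in for WEBTRAFFIC_SYNDICATE_TIMESERIES (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webtraffic_syndicate_timeseries "
    "(domain_id TEXT, variable TEXT, date TEXT, value REAL)"
)
placeholder_rows = [  # illustrative values only, not real estimates
    ("airbnb.com", "users", "2024-01-07", 300.0),
    ("vrbo.com", "users", "2024-01-07", 100.0),
    ("airbnb.com", "users", "2024-01-14", 330.0),
    ("vrbo.com", "users", "2024-01-14", 110.0),
]
conn.executemany(
    "INSERT INTO webtraffic_syndicate_timeseries VALUES (?, ?, ?, ?)",
    placeholder_rows,
)

# Pivot the two domains into side-by-side columns, one row per week.
QUERY = """
SELECT date,
       SUM(CASE WHEN domain_id = 'airbnb.com' THEN value END) AS airbnb_users,
       SUM(CASE WHEN domain_id = 'vrbo.com' THEN value END) AS vrbo_users
FROM webtraffic_syndicate_timeseries
WHERE domain_id IN ('airbnb.com', 'vrbo.com')
  AND variable = 'users'
GROUP BY date
ORDER BY date
"""

# Append the airbnb-to-vrbo ratio for each week.
comparison = [
    (week, airbnb, vrbo, airbnb / vrbo)
    for week, airbnb, vrbo in conn.execute(QUERY)
]
```

In practice the filter would likely also need to fix DEVICE and MODEL_VERSION so that each week contributes exactly one value per domain.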
Releases & Changelog
Errata & Future Improvements
We note known issues and planned future improvements here. To submit a bug report or feature request, email us at support@cybersyn.com.
Terms
Customers are subject to the Cybersyn terms of service.