Links

URL Parsing Functions

Toolbox of utilities for parsing parts of domains and URLs based on user-defined functions (UDFs)

Overview

Cybersyn URL Parsing Functions is a Snowflake Native App that includes a library of functions to help clean and process URLs directly in Snowflake SQL. The user-defined functions (UDFs) allow users to easily extract website parts from URLs within Snowflake queries, which is particularly useful for data cleaning and analysis tasks related to clickstream and web traffic data.
Each UDF takes a single parameter, URL, which should be a string.

Functions

Function
Text
GETDOMAIN(url)
Returns the domain with the suffix or top level domain without any URL parameters or subdomains.
GETDOMAINONLY(url)
Returns only the core domain text without any subdomain, parameters, suffix or top level domain.
GETPUBLICSUFFIXDOMAINONLY(url)
Returns only the public suffix domain. Public suffixes include suffixes under which users or companies can register domains.
GETSUBDOMAIN(url)
Returns the domain in addition to the subdomain. The subdomain “www” is excluded from results.
GETSUBDOMAINONLY(url)
Returns only the subdomain portion of the domain.
Internally, Cybersyn uses these functions to extract portions of URLs to more accurately and consistently parse and match domains across datasets. Use these functions to clean your own data or to easily join your internal data to Cybersyn’s web data.

Sample Queries

Parse the domain details for the following url: https://chat.openai.com/?model=gpt-4-browsing
-- GETDOMAIN(https://chat.openai.com/?model=gpt-4-browsing) - openai.com
-- GETDOMAINONLY(https://chat.openai.com/?model=gpt-4-browsing) - openai
-- GETPUBLICSUFFIXDOMAINONLY(https://chat.openai.com/?model=gpt-4-browsing) - com
-- GETSUBDOMAIN(https://chat.openai.com/?model=gpt-4-browsing) - chat.openai.com
-- GETSUBDOMAINONLY(https://chat.openai.com/?model=gpt-4-browsing) - chat

Releases & Changelog

No updates at this time.

Disclaimer

This function is built off of a library licensed from here.
Cybersyn is not endorsed by or affiliated with any of these developers. Contact [email protected] for questions.
Last modified 2mo ago