top of page
Search

Sequences in Snowflake - Unsung Data Integrity Heroes

Writer: Tim BurnsTim Burns

Updated: Feb 21, 2023

Sequence Basics

Sequences are the unsung heroes of a data application. I like to think about sequences in two forms:

  • Surrogate keys - the auto number sequences assigned to a table to track rows

  • Natural keys (IDs) - sequences that form groupings of business logic

Surrogate keys

Surrogate keys are the backbone of the data warehouse. This article by Robert Kimball forms the philosophical basis of surrogate keys.


The key takeaways on surrogate keys are as follows:

  • They are the foundation of the data warehouse

  • Create them as anonymous integers using an automatic number sequence

  • They protect you from changes in the incoming data as it changes over time

  • They are responsible for the data integrity of the warehouse

In snowflake, the best habit is to assign a surrogate key to every table as an automated row increment and give it the KEY suffix along with the table name to emphasize its attachment to the data in the table.


So if I created a dimensional table on my New Music for Data Analytics sandbox to contain the hosts for each show in the radio station, the table name would be WAREHOUSE.DIM_HOST and the primary Surrogate key is DIM_HOST_KEY.

CREATE TABLE IF NOT EXISTS WAREHOUSE.DIM_HOST
(
    DIM_HOST_KEY   INT PRIMARY KEY AUTOINCREMENT START 1 INCREMENT 1,
    HOST_ID        INT     NOT NULL,
    NAME           VARCHAR NULL,
    URI            VARCHAR NULL,
    IMAGE_URI      VARCHAR NULL,

    THUMBNAIL_URI  VARCHAR NULL,
    IS_ACTIVE      BOOLEAN NULL,
    DW_ACTIVE      BOOLEAN          DEFAULT TRUE NOT NULL,
    DW_FROM_DATE   TIMESTAMPLTZ,
    DW_TO_DATE     TIMESTAMPLTZ     DEFAULT CAST('2099-12-31 00:00:00' AS TIMESTAMP_LTZ(9)) NOT NULL,
    DW_CREATE_DATE TIMESTAMPTZ      DEFAULT CURRENT_TIMESTAMP(),
    DW_CREATE_USER VARCHAR NOT NULL DEFAULT CURRENT_USER(),
    DW_UPDATE_DATE TIMESTAMPTZ      DEFAULT CURRENT_TIMESTAMP(),
    DW_UPDATE_USER VARCHAR NOT NULL DEFAULT CURRENT_USER()
);

Natural Keys (IDs)

Natural keys create a meaningful way to identify records. Again, the best practice is to use these values as integers. It should have the NOT NULL restriction on the table to prevent a row without meaningful identification from entering the warehouse. For the KEXP app, these keys are created by the incoming data feed.


Sequence Syntax

Sometimes we need to create sequences that don't necessarily fall into a simple auto-increment pattern.


For this example, I have an analytics table where the auto-increment took the order when a host was added as default. Still, since my favorites change over time, this value has no meaning outside of the last time I recorded my preference.

insert into ANALYTICS.FAVORITE_HOST(DIM_HOST_KEY)
with fav_hosts as (select DIM_HOST_KEY, HOST_ID, NAME, URI, IMAGE_URI, THUMBNAIL_URI, IS_ACTIVE
                   from WAREHOUSE.DIM_HOST
                   where NAME in ('Cheryl Waters', 'Evie', 'Gabriel Teodros'))
select DIM_HOST_KEY
from fav_hosts;

This table uses the following sequence to register favorites in the order they were added automatically.

CREATE OR REPLACE SEQUENCE ANALYTICS.FAVORITE_SEQUENCE
    START = 1
    INCREMENT = 1
;


No CURRVAL for Current Value

One drawback Snowflake has is the lack of a CURRVAL method to get the current value. The rationale is that Snowflake doesn't want the user to use small, single-row queries.

 
 
 

Comments


  • Facebook
  • Twitter
  • LinkedIn

©2019 by Owl Mountain Software, LLC. Proudly created with Wix.com

bottom of page