Data ingestion model for the Snowflake Connector for Google Analytics Raw Data¶
This topic provides information on the data ingestion models supported by the Snowflake Connector for Google Analytics Raw Data.
Google Analytics to BigQuery export¶
Google Analytics supports two types of BigQuery exports:
- Daily - Google Analytics exports data to tables named events_XXXXXX. Tables are created once daily, after the end of the day, once all the events for the given day are collected.
- Streaming - Google Analytics continuously exports data throughout the day, and stores it in a table named
The connector supports both types of exports and automatically downloads all the tables it finds in BigQuery, regardless of whether they are daily or intraday. No additional configuration is needed.
For each property, the connector saves the events into property-specific tables, which are created in the database and schema specified in the connector configuration. For each property, two tables are created: one for daily export and another for intraday export, named
ANALYTICS_INTRADAY_XXXXXXXXX respectively. When both types of export are configured in Google Analytics, the connector ingests both tables: the intraday table first, and the daily table afterward.
Daily table ingestion¶
The connector downloads the entire table in a single run as soon as it detects that the table is present in BigQuery. Once a table is downloaded, it is never reconsidered for future processing. Google cautions that daily tables can be updated up to 72 hours after the table was created. The current version of the connector does not reflect this kind of update in sink tables.
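The one-shot rule and the 72-hour update window can be illustrated with a short sketch. The class and method names below (`DailyTableTracker`, `should_download`, `may_miss_updates`) are hypothetical and are not part of the connector; the code only models the behavior described above.

```python
from datetime import datetime, timedelta

# Window during which Google may still update a daily table (from the text above).
UPDATE_WINDOW = timedelta(hours=72)


class DailyTableTracker:
    """Hypothetical model of one-shot daily table ingestion."""

    def __init__(self):
        self._downloaded = {}  # table name -> time the table was downloaded

    def should_download(self, table_name):
        # A table that was already downloaded is never reconsidered.
        return table_name not in self._downloaded

    def mark_downloaded(self, table_name, downloaded_at):
        self._downloaded[table_name] = downloaded_at

    def may_miss_updates(self, table_name, created_at):
        # True if the download happened while Google could still update the table.
        return self._downloaded[table_name] < created_at + UPDATE_WINDOW


tracker = DailyTableTracker()
created = datetime(2024, 1, 2, 6, 0)
tracker.mark_downloaded("events_XXXXXX", created + timedelta(hours=1))
```

In this model, a table downloaded one hour after its creation is flagged as possibly missing later updates, because the download fell inside the 72-hour window.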
Intraday table ingestion¶
The connector supports downloading historical intraday tables (if they are present in BigQuery) and ongoing ingestion of intraday tables still receiving updates.
For past days, the connector downloads intraday tables the same way it does daily ones: each table is downloaded in whole, one table at a time, until the process reaches the present day’s data.
When the connector recognizes that an intraday table is the most recent one in BigQuery, it starts processing the table incrementally. This means it downloads incoming batches of data from the table throughout the day at a constant interval, which is 8 hours by default.
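The split between whole-table downloads and incremental processing can be sketched as follows. This is a minimal illustration, not the connector's implementation: the function name `plan_intraday_ingestion` is hypothetical, and each intraday table is represented only by the day it covers.

```python
from datetime import date


def plan_intraday_ingestion(table_days):
    """Split intraday tables into whole downloads and one incremental table.

    `table_days` holds the day covered by each intraday table found in BigQuery.
    Returns (days downloaded whole, oldest first; day processed incrementally).
    """
    ordered = sorted(table_days)
    if not ordered:
        return [], None
    # Every table except the most recent one is downloaded in whole.
    # The most recent table is processed incrementally.
    return ordered[:-1], ordered[-1]


whole, incremental = plan_intraday_ingestion(
    [date(2024, 1, 3), date(2024, 1, 1), date(2024, 1, 2)]
)
```

Here the tables for January 1 and January 2 are downloaded whole, in that order, and the January 3 table is the one processed incrementally.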
The connector does a final ingestion for the given intraday table and switches to the next one when either of the following conditions is met:
- A next-day table has appeared in the BigQuery dataset.
- 24 hours have passed since the first load for the given table.
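The switch-over conditions can be expressed as a single check. The sketch below is illustrative only: `should_finalize` and its parameters are hypothetical names, and treating exactly 24 hours as "passed" is an assumption.

```python
from datetime import datetime, timedelta

# Maximum time an intraday table is processed incrementally (from the text above).
MAX_TABLE_AGE = timedelta(hours=24)


def should_finalize(next_day_table_exists, first_load_at, now):
    """Return True when the intraday table should get its final ingestion."""
    # Condition 1: a next-day table has appeared in the BigQuery dataset.
    # Condition 2: 24 hours have passed since the first load of this table
    # (assumption: exactly 24 hours counts as passed).
    return next_day_table_exists or (now - first_load_at >= MAX_TABLE_AGE)


first_load = datetime(2024, 1, 3, 0, 30)
```

For example, `should_finalize(False, first_load, first_load + timedelta(hours=8))` is false, while the same call with `next_day_table_exists=True` is true regardless of elapsed time.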
A small number of events may not be ingested: specifically, events that are delayed by more than 10 minutes. An upcoming feature is expected to resolve this issue.
Use CONFIGURE_INGESTION_INTERVAL to change the default interval value if you need more frequent updates.