Data ingestion model for the Snowflake Connector for Google Analytics Raw Data¶
The Snowflake connector for Google Analytics Raw Data is subject to the Connector Terms.
This topic provides information on the data ingestion models supported by the Snowflake Connector for Google Analytics Raw Data.
Google Analytics to BigQuery export¶
- Google Analytics supports two types of BigQuery exports:
Daily - Google Analytics exports data to tables named
events_XXXXXX
. Tables are created once daily, after the end of the day, once all the events for the given day are collected.Streaming - Google Analytics continuously exports data throughout the day, and stores it into a table named
events_intraday_XXXXXX
.Users - Google Analytics export containing user data related to the collected events. Tables are stored in BigQuery under the names
users_XXXXXX
andpseudonymous_users_XXXXXX
.
The connector supports both types of exports and automatically downloads all the tables it finds in BigQuery, regardless as to whether they are daily or intraday. No additional configuration is needed.
Sink tables¶
For each property, the connector saves the events into property-specific tables, which are created in a database and a schema provided during connector configuration.
For each of the properties, there might be up to four sink tables created, depending on which export types were enabled. The tables are named as follows:
ANALYTICS_<propertyId>
ANALYTICS_INTRADAY_<propertyId>
USERS_<propertyId>
PSEUDONYMOUS_USERS_<propertyId>
Daily table ingestion¶
The connector downloads the entire table in a single run when it recognizes the table is present in BigQuery. Google cautions the daily tables can be updated up to 72 hours after the table was created. To ensure data consistency, the connector reloads tables after 72 hours, (Note that the exact reload time is dependent on the connector ingestion schedule). Updates in BigQuery made after 72 hours since table creation, won’t be reflected in Snowflake. Such tables can be reloaded manually, using one of the RELOAD_PROPERTY procedures.
Intraday ingestion¶
The connector supports downloading historical intraday tables (if they are present in BigQuery) and ongoing ingestion of intraday tables still receiving updates.
For past days, the connector downloads intraday tables the same way it foes daily ones – each table is downloaded in whole, one table at a time, until the process reaches the present day’s data.
When the connector recognizes that an intraday table is the last one in BigQuery, it starts processing the table incrementally. This means it downloads incoming batches of data from the table throughout the day, at a constant interval, which is 8 hours by default.
When any of the following conditions are met:
A next-day table appeared in the BigQuery dataset
24 hours passed since the first load for the given table
the connector does a final ingestion for the given intraday table and switches to the next one.
Note
A small number of events may not be ingested if events are delayed more than 10 minutes. Immediately after the incremental load of a intraday table is finished, the connector verifies whether there are any lost events, and if so schedules a table reload to ensure data consistency between Snowflake and BigQuery.
User data tables ingestion¶
User data table ingestion is based on the same mechanism as daily tables ingestion.
Scheduling¶
Connector checks whether new tables exist in BigQuery and then schedules ingestions of them (or its parts in case of incremental intraday ingestions) into Snowflake when:
- Task is triggered according to configured schedule
By default it is every 8 hours
Using CONFIGURE_INGESTION_INTERVAL you can change the default interval value if you need more/less frequent updates.
- Connector finished ingestion of last scheduled table
In consequence, this means that schedules are more frequent than it stems from the configuration, since there should be at least one ingestion per day, which means at least one extra check.
In particular, when there is initial load ongoing, and there are a lot of tables to ingest, after ingesting each of the tables, the scheduling mechanism is triggered.