Setting Up data ingestion for your ServiceNow® data¶
The Snowflake Connector for ServiceNow® is subject to the Snowflake Connector Terms.
This topic describes how to set up data ingestion for the Snowflake Connector for ServiceNow®.
Note
The Snowflake Connector for ServiceNow® ingests data from ServiceNow® tables into Snowflake. Data ingestion
depends on v2
of the ServiceNow® table API.
Strategies for Ingesting ServiceNow® Tables¶
Note
The connector can only ingest tables with
sys_id
columns present.ServiceNow views are not supported. Instead of ingesting these views, you should synchronize all tables for the underlying view and join the synchronized tables in Snowflake.
The connector uses different ingestion strategies, depending on the table schema. The connector uses three ingestion modes:
The initial load of data occurs for each table when the table is enabled for synchronization.
In this mode, the table is ingested by iterating through the records identified by the IDs in the
sys_id
column. Once all records are ingested, the initial load phase is complete. For certain tables, you can also set the data range start time value which can restrict which records are ingested.Incremental updates occur only for tables with
sys_updated_on
orsys_created_on
columns.Incremental updates begin after the initial load is done and occur on a regular schedule that you can configure. In this mode, the connector ingests only the records that were added, updated or deleted since last synchronization.
Note
Information about deletions comes from the journal table provided during connector configuration.
For tables that do not have
sys_updated_on
orsys_created_on
columns, the connector uses truncate and load mode.In this mode, the table is always ingested using the initial load approach, and newly ingested data replaces the old data. The connector replaces the data by running the
INSERT OVERWRITE
command.
Note
In the “incremental updates” mode, the connector uses the
sys_updated_on
column, if that column is present.If the column is not present, the connector uses the
sys_created_on
column instead.For rotated tables, the connector always uses the
sys_created_on
column. If the table is rotated using a different column thansys_created_on
, the ingestion of that table might cause performance issues.
Note
If the
sys_updated_on
orsys_created_on
fields are not updated when the record is modified in ServiceNow, those modifications won’t be propagated to Snowflake, which results in data inconsistency. Snowflake recommends that you avoid disabling the update of system fields.If a record deletion is not audited, information about deleted records won’t be propagated to Snowflake, resulting in a data inconsistency.
Note
Due to restrictions on the Snowflake and ServiceNow® REST APIs, the connector cannot ingest a table if a single row exceeds 16 MB of data. In that case, the connector tries to ingest data with the frequency defined in the table schedule. If a row exceeds the limit, the connector generates an error message and continues ingesting other table. To overcome this limitation, you can configure column filtering to exclude large columns from ingestion.
Archived records¶
The connector does not actively reflect the records archived in ServiceNow on Snowflake side for the ingested tables. Assuming that you archive inactive records older than a certain date, the following apply:
Any record archived before the connector ingested it (for example, before the initial load of the table) will not be present in the table on Snowflake side at all.
Any record archived after it was already ingested by the connector remains on the Snowflake side with no indication of archive action occurring.
Any archived record restored for a table that is already operating in incremental updates mode will not be ingested on the Snowflake side unless that record is also modified afterwards (with its
sys_updated_on
value being updated to current time).An archived record restored during the initial load of the table may be ingested on Snowflake side depending on its ID in the
sys_id
column.
If you want to bring the table with an active archive rule up to date, you can reload the entire table but any record archived or restored after the reload is finished will follow the same principles listed above.
ServiceNow archive tables ar_[table_name]
can be enabled for synchronization. However, the first incremental update
that follows the initial load of such table are searched for records created/updated past the date
the initial load of the archive table has started. Because neither sys_updated_on
nor sys_created_on
are modified when the record is archived, records archived after the initial load of the archive
table up to a certain point in time are missing in it on the Snowflake side. For example, if you archive records older
than a year, then any record archived for a year after the initial load of the archive table is not ingested
to the archive table on the Snowflake side. The archived records that were restored or deleted by a destroy rule
following initial load of an archive table is never removed from it on the Snowflake side.
Parallel Ingestion of ServiceNow® Tables¶
The connector ingests a few tables in parallel, but the ingestion of each individual table is a synchronous process. This means that ingesting large tables might block the connector from updating other tables. This issue is more likely to occur during the initial load phase than in other phases. By default the connector uses 10 worker threads, which is considered an optimal value to not overload the ServiceNow® instance. If you are sure that your instance can support additional concurrency, you can increase this value to a maximum of 30 by calling CONFIGURE_CONCURRENCY procedure.
Setting Up Data Ingestion Using Snowsight¶
To set up data ingestion using Snowsight, do the following:
Sign in to Snowsight as a user with the ACCOUNTADMIN role.
In the navigation menu, select Data Products » Apps.
Search for the Snowflake Connector for ServiceNow® app, then select the tile for the connector.
In the page for the Snowflake Connector for ServiceNow®, select Data Sync tab.
This displays a list of all the ServiceNow® tables.
Note
The connector can only ingest tables with
sys_id
columns present.Select the tables you want to ingest:
Search for the table you want to ingest.
Select the checkbox in the Status column from the left to the table you want to select.
Under Sync Schedule, select how frequently you want to synchronize the table between Snowflake and ServiceNow®.
Repeat these steps for each table you want to ingest into Snowflake.
Select the heading of the Status column to see the tables you have currently selected.
Select Start sync to begin ingesting data into your Snowflake account.
The connector status changes to Syncing data. When at least one of the tables is ingested successfully, the connector status changes to Last Sync: just now.
Refer to Monitoring the Snowflake Connector for ServiceNow® for information on how to view the contents of the tables in Snowflake.
Modifying Data Ingestion Using Snowsight¶
To modify the ServiceNow® tables to be ingested or the synchronization schedule for the tables, do the following:
Sign in to Snowsight as a user with the ACCOUNTADMIN role.
In the navigation menu, select Data Products » Apps.
Search for the Snowflake Connector for ServiceNow® app, then select the tile for the connector.
In the page for the Snowflake Connector for ServiceNow®, select Data Sync.
Select Edit tables button to enter into the editing mode.
Modify the tables you want to ingest:
Search for the table you want to ingest.
Select the checkbox in the Status column from the left to the table you want to select or deselect.
Under Sync Schedule, select how frequently you want to synchronize the table between Snowflake and ServiceNow®.
Select Update data sync.
Setting Up Data Ingestion Using SQL Statements¶
To set up data ingestion using SQL Statements, do the following:
Note
To configure these settings, you use stored procedures that are defined in the PUBLIC schema of the database that serves as an instance of the connector.
Before calling these stored procedures, select that database as the database to use for the session.
For example, if that database is named my_connector_servicenow
, run the following command:
USE DATABASE my_connector_servicenow;
Enabling or disabling the table synchronization¶
This section describes how to enable or disable the synchronization of a table in ServiceNow®. Synchronization enablement can be done with both default and custom configuration.
Enabling multiple tables using the default configuration¶
To enable the synchronization of data for at least one table in ServiceNow®, call the ENABLE_TABLES
stored procedure with the following arguments:
CALL ENABLE_TABLES(<tables_to_enable>);
Where:
tables_to_enable
Specifies an array of ServiceNow® table names.
Use the table name, not the label displayed in the ServiceNow® UI. You can find the table name in the data dictionary tables in ServiceNow. In the ServiceNow® UI, go to System Definition » Tables. The Name column displays the name of the table.
For example, to enable the synchronization of the tables named table1
, table2
, and table3
, run
the following command:
CALL ENABLE_TABLES(['table1', 'table2', 'table3']);
Disabling multiple tables¶
To disable table data synchronization for a specific table in ServiceNow®, call the DISABLE_TABLES
stored procedure with the following arguments:
CALL DISABLE_TABLES(<tables_to_disable>);
Where:
tables_to_disable
Specifies an array of ServiceNow® table names.
Use the table name, not the label displayed in the ServiceNow® UI. You can find the table name in the data dictionary tables in ServiceNow. In the ServiceNow® UI, go to System Definition » Tables. The Name column displays the name of the table.
For example, to disable the synchronization of the tables named table1
and table2
, run the following command:
CALL DISABLE_TABLES(['table1', 'table2']);
Disabling the table stops synchronization gracefully, as soon as it’s possible. When the table synchronization is re-enabled, ingestion resumes from where it was paused.
Note
Disabling all tables from synchronization does not mean that the Snowflake Connector for ServiceNow® stops incuring cost. Background tasks, such as those related to notifications, can continue to execute.
The ENABLE_TABLES
and DISABLE_TABLES
procedures add the specified table names to the CONFIGURED_TABLES
view.
Note
The connector does not support roll backs or delete recoveries in ServiceNow®.
Using the roll back and delete recovery features may result in data inconsistency. Records that are recovered in ServiceNow® may still be marked as deleted in Snowflake. To resolve it you can reload the table.
Enabling a single table using custom configuration¶
To enable the synchronization of data with custom configuration for a specific table in ServiceNow®, call the ENABLE_TABLE
stored procedure with the following arguments:
CALL ENABLE_TABLE('<table_to_enable>', <table_config>);
Where:
table_to_enable
Specifies a ServiceNow® table name.
table_config
Optional, specifies an object with table ingestion configuration. If not specified table ingestion uses the default configuration.
Currently supported configurations are:
column filtering, where you can provide
include_columns
orexclude_columns
properties with a list of column names,row filtering, where you can provide
filter
property with a filter expression,synchronization schedule, where you can provide the
schedule
property with custom ingestion schedule,deletions synchronization enablement, where you can provide the
sync_deletions
boolean property.
Note
All of the custom configurations can be combined in a single object and used simultaneously for a single table ingestion, for example to enable ingestion of table
sys_audit
with the below configuration:the table should be synchronized every Saturday at 10:00 AM UTC,
only the columns
newvalue
andreason
should be ingested,only the rows that have the
newvalue
column starting with the stringprivacy
should be ingested,the deletions shouldn’t be synchronized if a journal table is configured.
run the following command:
CALL ENABLE_TABLE('sys_audit', { 'schedule': { 'type': 'custom', 'value': { 'hour': 10, 'day_of_week': '6' } }, 'include_columns': ['newvalue', 'reason'], 'row_filter': 'newvalue STARTSWITH "privacy"', 'sync_deletions': false });
Enabling a single table using column filtering¶
If you don’t need all columns from a ServiceNow® table in Snowflake, the connector can ignore them. For example, skip to columns if a single row exceeds the maximum row size of 16 MB.
To enable table ingestion with specified columns run the following command:
CALL ENABLE_TABLE('<table_to_enable>', <table_config>);
Where:
table_to_enable
Specifies a ServiceNow® table name.
table_config
Object including
include_columns
orexclude_columns
properties with a list of column names. Ifsys_id
,sys_created_on
, andsys_updated_on
exist, they are always included. You don’t have to add them toincluded_columns
array and cannot exclude them usingexcluded_columns
as the connector uses them in the ingestion process.
Note
Since columns in ServiceNow® are written in lowercase and the API that the connector uses is case-sensitive, the values provided for specified columns must also be in lowercase.
Note
You shouldn’t provide both include_columns
and exclude_columns
. If you want to list include_columns
, you should skip the exclude_columns
property, and vice versa.
If both arrays are not empty and there aren’t any conflicting columns, include_columns
takes precedence over exclude_columns
.
If both include_columns
and exclude_columns
are empty arrays, all the available columns will be ingested.
For example with a ServiceNow® table named u_table
with columns sys_id
, sys_updated_on
, col_1
and col_2
and executing:
CALL ENABLE_TABLE('u_table', { 'include_columns': ['sys_id', 'sys_updated_on'] });
will ingest only sys_id
and sys_updated_on
columns for the given table, but calling:
CALL ENABLE_TABLE('u_table', { 'exclude_columns': ['col_1'] });
will ingest sys_id
, sys_updated_on
and also col_2
.
The connector validates the provided columns and rejects the enablement request if any of the columns are not available in ServiceNow®.
ServiceNow® API supports only include mode. As a result the connector transforms provided column arrays into included columns list and sends them with each request to ServiceNow®.
The URL with included columns could possible be too long to be handled by ServiceNow®. The connector validates this limitation when the ENABLE_TABLE
is invoked.
Columns configuration for each table can be found in the INCLUDED_COLUMNS
column of the CONFIGURED_TABLES
view.
To modify the list of ingested columns, you need to disable the specific table first.
If column filtering is configured for a table, you can enable the table only using the ENABLE_TABLE
procedure. You cannot use the ENABLE_TABLES
, which accepts a list of tables as an argument.
Flattened views only include the columns specified when the table was enabled. They are updated every time the list of included columns changes. If column filtering is not configured, views contain all the available columns.
Note
Configuration change does not affect the previously ingested data. Column filtering applies only to the newly ingested records. To apply the filter to the previously ingested data, the table needs to be reloaded.
Enabling a single table using row filtering¶
You can exclude data ingestion for select rows from a ServiceNow® table by specifying a filter condition. For example, to exclude the rows which include sensitive data that you don’t want in Snowflake, or exclude the rows which include unnecessary data in order to reduce cost.
To enable table ingestion with specified row filter run the following command:
CALL ENABLE_TABLE('<table_to_enable>', <table_config>);
Where:
table_to_enable
Specifies a ServiceNow® table name.
table_config
Object including
row_filter
property with a filter expression, which is a valid string.Currently supported filter operators are:
Operator
Description
Example
AND
Logical operator to join conditions, where both must be fulfilled.
active = "true" AND impact = "2"
OR
Logical operator to join conditions, where at least one of them must be fulfilled.
Important
Takes precedence over
AND
operator. See the examples below.tablename = "incident" OR tablename = "problem"
=
Checks if values are equal.
priority = "1"
!=
Checks if values are not equal.
state != "7"
LIKE
Checks if the value contains specified character sequence.*
newvalue LIKE "privacy"
STARTSWITH
Checks if the value starts with specified character sequence.*
description STARTSWITH "important"
ENDSWITH
Checks if the value ends with specified character sequence.*
description ENDSWITH "important"
IN
Checks if the value is equal to any from the list of values.**
tablename IN ("incident", "task", "cmdb_ci")
(*) - fields must be of
string
data type. (**) - choice fields must contain strings.Filter expression rules and limitations:
any two filter expressions must be joined with the
AND
or theOR
operator.Operators must be separated by space and be in uppercase.
Value expressions must be enclosed in double quotes.
expressions are case-sensitive.
the expression cannot operate on
sys_id
,sys_updated_on
, orsys_created_on
columns.
Note
Configuration changes do not affect the previously ingested data. Row filtering applies only to the newly ingested records. To apply the filter to the already ingested data, the table must be reloaded.
Examples¶
To enable ingestion of table
sys_audit
, but synchronize only the rows that are related to the privacy incidents in theINCIDENT
table, execute:
CALL ENABLE_TABLE('sys_audit', {
'row_filter': 'tablename = "incident" AND fieldname = "cause" AND newvalue LIKE "privacy"'
});
To enable ingestion of table
incident
, but synchronize only the rows under such conditions that:active
field is equal totrue
,sys_created_by
field starts withsupport
or ends withadmin
,category
field is one ofNetwork
,Cloud Management
,
execute:
CALL ENABLE_TABLE('incident', {
'row_filter': 'active = "true" AND sys_created_by STARTSWITH "support" OR sys_created_by ENDSWITH "admin" AND category IN ("Network", "Cloud Management")'
});
To enable ingestion of table
incident
, but ingest only the rows in the specified incident state and only the given columns, execute:
CALL ENABLE_TABLE('incident', {
'row_filter': 'incident_state IN ("1", "2", "3")', -- "New", "In Progress", "On Hold"
'include_columns': ['incident_state', 'description']
});
Specifying the Synchronization Schedule¶
The Snowflake Connector for ServiceNow® synchronizes data from all ServiceNow® tables to Snowflake on a specified schedule. By default, all of the tables are synchronized once every hour (1h).
To change the default synchronization schedule for all tables, call the CONFIGURE_DATA_INGESTION_SCHEDULE
stored procedure
with the following arguments:
CALL CONFIGURE_DATA_INGESTION_SCHEDULE(<schedule>);
Where:
schedule
Specifies the frequency of the synchronization. You can specify one of the following JSON values:
{ 'type': 'interval', 'value': '<interval_value>' }
, whereinterval_value
is one of the following string values:
'30m'
'1h'
'3h'
'6h'
'12h'
'1d'
{ 'type': 'custom', 'value': { 'hour': <hour>, 'day_of_week': '<day_of_week>' } }
, wherehour
specifies the hour in UTC timezone at which the ingestion should start, andday_of_week
specifies day of the week when the ingestion should be performed. It is possible to use special expressions as a day of week:
'*'
to run the ingestion everyday.
'1-3'
to run the ingestion from Monday to Wednesday.
'0,5,6'
to run the ingestion on Friday, Saturday and Sunday.Possible values that can be used in the expression for
day_of_week
configuration are:
'0'
- Sunday
'1'
- Monday
'2'
- Tuesday
'3'
- Wednesday
'4'
- Thursday
'5'
- Friday
'6'
- SaturdayOther non-digit values like
'5L'
indicating the last Friday of a month or'FRI-SUN'
indicating the range from Friday to Sunday are not supported.
It’s possible to configure ingestion schedule for a specific table during its enablement.
To enable a single table and set its ingestion schedule, call the ENABLE_TABLE
stored procedure with the following arguments:
CALL ENABLE_TABLE('<table_name>', <table_config>);
Where:
table_name
Specifies a ServiceNow® table name to enable.
table_config
Object including
schedule
property, which specifies the configuration of the table synchronization. Checkschedule
ofCONFIGURE_DATA_INGESTION_SCHEDULE
stored procedure for details.
For example to enable ingestion of table table_1
and synchronize data every 3h call the following stored procedure:
CALL ENABLE_TABLE('table_1', { 'schedule': { 'type': 'interval', 'value': '3h' } });
The connector also allows you to specify a different schedule for each table that is enabled for
synchronization. To change the synchronization schedule for a selected set of tables, call the
CONFIGURE_TABLES_SCHEDULE
stored procedure with the following arguments:
CALL CONFIGURE_TABLES_SCHEDULE(<table_names>, <schedule>);
Where:
table_names
Specifies an array of table names for which you want to configure the synchronization schedule.
schedule
Specifies the frequency of the synchronization. Check
schedule
ofCONFIGURE_DATA_INGESTION_SCHEDULE
stored procedure for details.
For example to ingest tables table_1
and table_2
each Saturday and Sunday at 11:00 PM UTC call the following stored
procedure:
CALL CONFIGURE_TABLES_SCHEDULE(['table_1', 'table_2'], { 'type': 'custom', 'value': { 'hour': 23, 'day_of_week': '0,6' } });
By default the connector tries to start the ingestion in 3 hour time window from scheduled start time. If it
is not possible to start the ingestion within that time frame, for example, when the connector is ingesting other
tables, the current scheduled run is not executed. The connector attempts to run the ingestion at the next
scheduled time frame. It is possible to change the duration of the time frame by calling CONFIGURE_CUSTOM_SCHEDULE_START_INGESTION_WINDOW
stored procedure:
CALL CONFIGURE_CUSTOM_SCHEDULE_START_INGESTION_WINDOW(<window_length>);
where window_length
is the window length in ISO 8601 duration format. The duration must be rounded up to
the next whole hour, and must last for at least 1 hour. For example, value 'PT12H'
specifies a window that lasts for
12 hours, and 'P2D'
specifies a window that lasts for 2 days.
If you only enable tables with custom schedules, this configuration only affects time it takes to create and refresh flattened views for the configured tables. The flattened views are created in the first ingestion cycle after the following conditions are met:
Ingestion of metadata tables is finished
Ingestion of the configured table has started.
If email alerts are enabled, Snowflake recommends changing the alert frequency to Once per day when using custom scheduling.
Specifying if deletions should be synchronized¶
You can specify if the connector should synchronize deletions from ServiceNow® to Snowflake. By default, the connector synchronizes deletions if a journal table is configured. However, you might want to disable deletions synchronization of a specific table and not change the global configuration.
To enable table ingestion with specified deletions synchronization setting, run the following command:
CALL ENABLE_TABLE('<table_to_enable>', <table_config>);
Where:
table_to_enable
Specifies a ServiceNow® table name.
table_config
Object including
sync_deletions
boolean property. If the value is set totrue
, the connector synchronizes deletions for the table; if the value is set tofalse
, the connector does not synchronize deletions for the table.
For example, to enable ingestion of the table incident
but not synchronize the deletions, run the following command:
CALL ENABLE_TABLE('incident', { 'sync_deletions': false });
Note
If you want to use the default configuration, don’t provide the sync_deletions
property in the configuration object.
If the journal table is not configured, the connector does not synchronize deletions regardless of the provided configuration.
Specifying the data range start time¶
By default, the Snowflake Connector for ServiceNow® synchronizes all the records in the corresponding ServiceNow® tables. For the tables with: sys_updated_on
or sys_created_on
columns (from now on here called time columns) present, it is possible to restrict the range of synchronized
data by setting a data range start time - i.e. lower bound for the corresponding time column value of the records.
With such a configuration the records with the corresponding time column value older than the data range start timestamp is not ingested. The corresponding time column used by this procedure is determined in the same way as for the incremental updates .
To change the data range start time value, call the CONFIGURE_TABLES_RANGE_START
stored procedure with the following arguments:
CALL CONFIGURE_TABLES_RANGE_START(<table_names>, <range_start>);
Where:
table_names
Specifies an array of table names for which you want to configure the data range start time.
range_start
Timestamp specifying the data range start time in TIMESTAMP_TZ format or NULL to unset the current value.
Note
You cannot set the data range start time for the tables with neither sys_updated_on
nor sys_created_on
column present.
If the ingestion of the table has not been started yet, the data range start time value is taken into account with the first ingestion.
If the ingestion of the table has already been started (e.g. a reload is in progress), the data range start time value is ignored and (another) reload of the table(s) is required to filter out the records with too old corresponding time column value.
It is therefore recommended to set the data range start time before starting the first ingestion of a table (hence also before enabling it).
For example, if tables table1
and table2
have the required time column(s), in order to set the data range start time to 2022-11-23 07:00:00 UTC for theses two tables,
run the following command:
CALL CONFIGURE_TABLES_RANGE_START(['table1', 'table2'], TO_TIMESTAMP_TZ('2022-11-23 07:00:00 +00:00'));
Then:
for table
table1
, for example, if ts ingestion has not been started yet, all the records with corresponding time column value before 2022-11-23 07:00:00 is not ingested.for table
table2
, for example, if its ingestion has already been started, the data range time start value is ignored in all data synchronizations until reloading this table. During the reload all the records with corresponding time column value before 2022-11-23 07:00:00 is not ingested.
It is also possible to unset the data range start time. For example, in order to unset it for table table1
, run the following command:
CALL CONFIGURE_TABLES_RANGE_START(['table1'], NULL);
Again, if an ingestion of table table1
has already been started, reloading this table is required to ingest all the records back from ServiceNow®.
Note
Loading the data with respecting the data range start time may take longer than loading all historical data due to lower performance of incremental update.
Reloading Data in a Table¶
To reload data in particular table, call the RELOAD_TABLE
stored procedure:
CALL RELOAD_TABLE('<table_name>');
Where:
table_name
Specifies the name of the table to reload.
When you call the RELOAD_TABLE
stored procedure, the connector performs the following example:
The connector suspends the original table for ingestion temporarily.
Note
While the table is being reloaded, you cannot re-enable the table for ingestion.
The connector creates a separate temporary table for ingestion.
The connector ingests the data into this new temporary table. This table is visible in the CONNECTOR_STATS view as a table named with a
__tmp
suffix).Note
Every reload takes data range start time value into account, which can restrict which records are ingested.
After the data is ingested, the connector replaces the data in the original table with the data in the temporary table.
The connector deletes the temporary table.
The connector re-enables the original table for ingestion.
During this process, you can continue to query the existing data in the original table. However, changes to the data in the ServiceNow® table won’t be reflected in the Snowflake table until the ingestion process completes.
To avoid overloading your ServiceNow® instance, reload only one table at time.
Canceling Table Reload¶
To cancel the process of reloading the data in a table, use the CANCEL_RELOAD_TABLE
stored procedure as
shown in the following example:
CALL CANCEL_RELOAD_TABLE('<table_name>');
Where:
table_name
Specifies the name of the table whose reload you want to cancel.
When you cancel the reload, the connector drops all temporary objects created during the reload. The table is then available for ingestion as part of the normal synchronization schedule.
Configuring the Size of a Single Page Fetch for a Table¶
The connector fetches data from a table by dividing it into smaller chunks called pages. Each API request to ServiceNow® fetches one page.
To account for this, the connector limits the number of rows fetched within a single API request. This limit is the page size.
The connector uses the following process to determine the page size:
Initially, the default page size is set to 10,000 rows.
If the fetch request fails during ingestion because the response size is exceeded, the page size is gradually decreased by 1000, 100, 10 and 1 until the request succeeds or the final page size is set to 1.
The successful page size is saved in the connector state and this value is used by subsequent requests.
The current page size for a table is available in the TABLES_STATE
view. To see the page size, run
the following command:
SELECT PAGE_SIZE FROM TABLES_STATE WHERE TABLE_NAME = '<table_name>';
Where:
table_name
Specifies the name of the ServiceNow® table being ingested.
The process the connector uses for determining the page size may lead to inefficiencies. This process only reduces the page size. It does not increase the page size. This can happen in situations where a table has a single large row that causes the page size to be set to a lower value.
To avoid this situation, you can manually set the page size by calling the RESET_PAGE_SIZE
stored
procedure as shown in the following examples:
CALL RESET_PAGE_SIZE('<table_name>');
or
CALL RESET_PAGE_SIZE('<table_name>', <page_size>);
Where:
table_name
Specifies the name of the ServiceNow® table being ingested.
page_size
(Optional) Specifies the number of rows to fetch in a single page. If not provided, the default value provided in the connector configuration is used. The default and recommended value is 10000. The minimum value is 1 and the maximum value is 25000.
Note
The page size can be also set for a configured journal table, usually sys_audit_delete
. If failures occur
during the deletions ingestion from an underperforming journal table, you can lower the page size to avoid further failures.
Note that the journal table does not need to be explicitly enabled for ingestion to make the connector synchronize deleted rows.
Ingestion run¶
Ingestion runs for a given table are triggered according to the configured schedule. A run downloads all the relevant rows divided into pages mentioned in the previous paragraph from the source table in a loop.
Initial load and updates
As soon as a page of data is fetched, it is inserted into the corresponding event log table. At this stage the newly fetched changes are not yet available in the sync table or through flattened views. When it is done the next request with updated criteria is issued as long as any data is returned. When the ingestion run is complete, and there is no more data to fetch in the source table, an asynchronous merge task is triggered, that takes all the changes from the event log inserted since the last merge and applies them to the sync table. When it is complete, the data becomes available in sync table and flattened views.
Truncate and load
In truncate and load mode a temporary table is created for each ingestion run. Each fetched page of rows is first inserted into this temporary table (this table exists in the internal connector schema and is not available to connector users). At this stage the newly fetched changes are not yet available in the sync table or through flattened views, they still show data fetched in the previous run. When the ingestion run is completed, and there is no more data available in the source table, data from the temporary table replaces existing data in the sync table. All the fetched rows are also added to the event log. At the end the temporary table is dropped.
Monitoring progress
To check the status of a current or past ingestion run, you can query the CONNECTOR_STATS
view. It’s visible in the STATUS
column.
It’s set to DONE
only if data was successfully fetched and all the changes were applied to the sync table.
When the ingestion is running or the merge to the sync table/replace of rows in the sync table has not been completed yet, the status is RUNNING
.
Next steps¶
After configuring ingestion, perform the steps described in Accessing the ServiceNow® data in Snowflake to view and otherwise access ServiceNow® data.