Register a service connection

This topic covers how to register your service connection credentials with Snowflake or your third-party service (for example, Apache Spark). The Snowflake Open Catalog administrator registers a service connection.

The example code in this topic shows how to configure a service connection in Spark and is written in PySpark.

Prerequisites

Before you can register a service connection, you must first configure one. For instructions, see Configure a service connection.

Configure a service connection

The following example code is for configuring a single service connection.

Note

If needed, you can configure multiple service connections. For an example, see Example 2: Configure two service connections.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,<maven_coordinate>') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.opencatalog.warehouse','<catalog_name>') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:<principal_role_name>') \
    .getOrCreate()

Parameters

<catalog_name>
  Specifies the name of the catalog to connect to.
  Important: <catalog_name> is case sensitive.

<maven_coordinate>
  Specifies the Maven coordinate for your external cloud storage provider:
    • S3: software.amazon.awssdk:bundle:2.20.160
    • Google Cloud Storage (GCS): org.apache.iceberg:iceberg-gcp-bundle:1.5.2
    • Azure: org.apache.iceberg:iceberg-azure-bundle:1.5.2

<client_id>
  Specifies the Client ID for the service principal to use.

<client_secret>
  Specifies the Client Secret for the service principal to use.

<open_catalog_account_identifier>
  Specifies the account identifier for your Open Catalog account. Depending on the region and cloud platform for the account, this identifier might be the account locator by itself (for example, xy12345) or include additional segments. For more information, see Using an account locator as an identifier.

<principal_role_name>
  Specifies the principal role that is granted to the service principal.

Configure a cross-region service connection (Amazon S3 only)

The following example code is for configuring a service connection when all of the following are true:

  • Your Open Catalog account is hosted on Amazon Web Services (AWS).

  • Your external storage provider is Amazon S3.

  • Your Open Catalog account is hosted in an AWS region that is different from the region of the S3 bucket that contains your Apache Iceberg™ tables.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.opencatalog.warehouse','<catalog_name>') \
    .config('spark.sql.catalog.opencatalog.client.region','<target_s3_region>') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:<principal_role_name>') \
    .getOrCreate()

Parameters

<catalog_name>
  Specifies the name of the catalog to connect to.
  Important: <catalog_name> is case sensitive.

<client_id>
  Specifies the Client ID for the service principal to use.

<client_secret>
  Specifies the Client Secret for the service principal to use.

<open_catalog_account_identifier>
  Specifies the account identifier for your Open Catalog account. Depending on the region and cloud platform for the account, this identifier might be the account locator by itself (for example, xy12345) or include additional segments. For more information, see Using an account locator as an identifier.

<target_s3_region>
  Specifies the region code where the S3 bucket containing your Apache Iceberg™ tables is located. For the region codes, see AWS service endpoints and refer to the Region column in the table.

<principal_role_name>
  Specifies the principal role that is granted to the service principal.
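As a sketch, the cross-region template above might be filled in as follows. The account locator ab12345, catalog name Catalog1, principal role data_engineer, and region code us-west-2 are hypothetical placeholders, mirroring the values used in Example 1 below; substitute your own, including the real credential pair.

```python
import pyspark
from pyspark.sql import SparkSession

# Hypothetical values throughout: account locator (ab12345), catalog
# (Catalog1), principal role (data_engineer), and target S3 region
# (us-west-2) are placeholders, not real settings.
spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.client.region','us-west-2') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()
```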

Examples

This section contains examples of configuring a service connection in Spark.

Example 1: Configure a single service connection (S3)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Example 2: Configure two service connections (S3)

Important

When configuring multiple service connections, each connection must use a unique catalog name in its Spark configuration keys. For example, in the following code, the opencatalog instances for the first connection become opencatalog1 for the second connection.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_scientist') \
    .config('spark.sql.catalog.opencatalog1', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog1.type', 'rest') \
    .config('spark.sql.catalog.opencatalog1.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog1.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog1.credential','222222222222222222222222222=:3333333333333333333333333333333333333333333=') \
    .config('spark.sql.catalog.opencatalog1.warehouse','Catalog2') \
    .config('spark.sql.catalog.opencatalog1.scope','PRINCIPAL_ROLE:data_scientist') \
    .getOrCreate()
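The unique-name rule above can be sketched as a small helper that builds the spark.sql.catalog.<name>.* keys for one connection. The catalog_conf function and its parameters are hypothetical illustrations (not part of the Spark or Open Catalog APIs); the point is that a distinct catalog name per connection keeps the configuration key prefixes from colliding.

```python
def catalog_conf(name, uri, credential, warehouse, principal_role):
    """Build the spark.sql.catalog.<name>.* settings for one Open Catalog
    connection. A distinct `name` per connection keeps the key prefixes
    unique, which is what Example 2 does with 'opencatalog1'."""
    prefix = f'spark.sql.catalog.{name}'
    return {
        prefix: 'org.apache.iceberg.spark.SparkCatalog',
        f'{prefix}.type': 'rest',
        f'{prefix}.uri': uri,
        f'{prefix}.header.X-Iceberg-Access-Delegation': 'vended-credentials',
        f'{prefix}.credential': credential,
        f'{prefix}.warehouse': warehouse,
        f'{prefix}.scope': f'PRINCIPAL_ROLE:{principal_role}',
    }

# Two connections: the second uses the unique catalog name 'opencatalog1'.
uri = 'https://ab12345.snowflakecomputing.com/polaris/api/catalog'
conf1 = catalog_conf('opencatalog', uri, '<client_id>:<client_secret>',
                     'Catalog1', 'data_scientist')
conf2 = catalog_conf('opencatalog1', uri, '<client_id>:<client_secret>',
                     'Catalog2', 'data_scientist')
assert set(conf1).isdisjoint(conf2)  # no key collisions between the catalogs
```

The resulting dictionaries can be applied to a SparkSession builder in a loop with .config(key, value), which is equivalent to the chained calls in Example 2.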

Example 3: Configure a single service connection (GCS)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,org.apache.iceberg:iceberg-gcp-bundle:1.5.2') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Example 4: Configure a single service connection (Azure)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,org.apache.iceberg:iceberg-azure-bundle:1.5.2') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Verify the connection to Open Catalog

To verify that Spark is connected to Open Catalog, list the namespaces for the catalog. For more information, see List namespaces.
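As a minimal sketch, assuming the spark session configured earlier in this topic with the catalog registered under the name opencatalog, you can list the namespaces directly from Spark SQL:

```python
# Assumes the `spark` session configured earlier in this topic, with the
# catalog registered under the name 'opencatalog'. If the connection is
# working, this prints the namespaces defined in the catalog.
spark.sql('SHOW NAMESPACES IN opencatalog').show()
```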