Register a service connection

This topic covers how to register your service connection credentials with Snowflake or your third-party service (for example, Apache Spark). The Snowflake Open Catalog administrator registers a service connection.

The example code in this topic shows how to configure a service connection in Spark and is written in PySpark.

Prerequisites

Before you can register a service connection, you must first configure one. For instructions, see Configure a service connection.

Configure a service connection

The following example code is for configuring a single service connection.

Note

If needed, you can configure multiple service connections. For an example, see Example 2: Configure two service connections.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,<maven_coordinate>') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.opencatalog.warehouse','<catalog_name>') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:<principal_role_name>') \
    .getOrCreate()

Parameters

<catalog_name>
  Specifies the name of the catalog to connect to.
  Important: <catalog_name> is case sensitive.

<maven_coordinate>
  Specifies the Maven coordinate for your external cloud storage provider:
    • S3: software.amazon.awssdk:bundle:2.20.160
    • Google Cloud Storage (GCS): org.apache.iceberg:iceberg-gcp-bundle:1.5.2
    • Azure: org.apache.iceberg:iceberg-azure-bundle:1.5.2

<client_id>
  Specifies the Client ID for the service principal to use.

<client_secret>
  Specifies the Client Secret for the service principal to use.

<open_catalog_account_identifier>
  Specifies the account identifier for your Open Catalog account. Depending on the region and cloud platform for the account, this identifier might be the account locator by itself (for example, xy12345) or include additional segments. For more information, see Using an account locator as an identifier.

<principal_role_name>
  Specifies the principal role that is granted to the service principal.

Configure a cross-region service connection (Amazon S3 only)

The following example code is for configuring a service connection when all of the following are true:

  • Your Open Catalog account is hosted on Amazon Web Services (AWS).

  • Your external storage provider is Amazon S3.

  • Your Open Catalog account is hosted in an AWS region that is different from the region of the S3 bucket that contains your Apache Iceberg™ tables.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://<open_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.opencatalog.warehouse','<catalog_name>') \
    .config('spark.sql.catalog.opencatalog.client.region','<target_s3_region>') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:<principal_role_name>') \
    .getOrCreate()

Parameters

<catalog_name>
  Specifies the name of the catalog to connect to.
  Important: <catalog_name> is case sensitive.

<client_id>
  Specifies the Client ID for the service principal to use.

<client_secret>
  Specifies the Client Secret for the service principal to use.

<open_catalog_account_identifier>
  Specifies the account identifier for your Open Catalog account. Depending on the region and cloud platform for the account, this identifier might be the account locator by itself (for example, xy12345) or include additional segments. For more information, see Using an account locator as an identifier.

<target_s3_region>
  Specifies the region code where the S3 bucket containing your Apache Iceberg™ tables is located. For the region codes, see AWS service endpoints and refer to the Region column in the table.

<principal_role_name>
  Specifies the principal role that is granted to the service principal.
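As a sketch, the cross-region template above might be filled in as follows. The account locator ab12345, catalog name Catalog1, principal role data_engineer, and region code us-west-2 are hypothetical placeholders, mirroring the values used in Example 1 below; substitute your own, including the real credential pair.

```python
import pyspark
from pyspark.sql import SparkSession

# Hypothetical values throughout: account locator (ab12345), catalog
# (Catalog1), principal role (data_engineer), and target S3 region
# (us-west-2) are placeholders, not real settings.
spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.client.region','us-west-2') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()
```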

Examples

This section contains examples of configuring a service connection in Spark.

Example 1: Configure a single service connection (S3)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Example 2: Configure two service connections (S3)

Important

When configuring multiple service connections, each connection must use a unique catalog name in its Spark configuration keys. For example, in the following code, the opencatalog instances for the first connection become opencatalog1 for the second connection.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_scientist') \
    .config('spark.sql.catalog.opencatalog1', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog1.type', 'rest') \
    .config('spark.sql.catalog.opencatalog1.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog1.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog1.credential','222222222222222222222222222=:3333333333333333333333333333333333333333333=') \
    .config('spark.sql.catalog.opencatalog1.warehouse','Catalog2') \
    .config('spark.sql.catalog.opencatalog1.scope','PRINCIPAL_ROLE:data_scientist') \
    .getOrCreate()
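The unique-name rule above can be sketched as a small helper that builds the spark.sql.catalog.<name>.* keys for one connection. The catalog_conf function and its parameters are hypothetical illustrations (not part of the Spark or Open Catalog APIs); the point is that a distinct catalog name per connection keeps the configuration key prefixes from colliding.

```python
def catalog_conf(name, uri, credential, warehouse, principal_role):
    """Build the spark.sql.catalog.<name>.* settings for one Open Catalog
    connection. A distinct `name` per connection keeps the key prefixes
    unique, which is what Example 2 does with 'opencatalog1'."""
    prefix = f'spark.sql.catalog.{name}'
    return {
        prefix: 'org.apache.iceberg.spark.SparkCatalog',
        f'{prefix}.type': 'rest',
        f'{prefix}.uri': uri,
        f'{prefix}.header.X-Iceberg-Access-Delegation': 'vended-credentials',
        f'{prefix}.credential': credential,
        f'{prefix}.warehouse': warehouse,
        f'{prefix}.scope': f'PRINCIPAL_ROLE:{principal_role}',
    }

# Two connections: the second uses the unique catalog name 'opencatalog1'.
uri = 'https://ab12345.snowflakecomputing.com/polaris/api/catalog'
conf1 = catalog_conf('opencatalog', uri, '<client_id>:<client_secret>',
                     'Catalog1', 'data_scientist')
conf2 = catalog_conf('opencatalog1', uri, '<client_id>:<client_secret>',
                     'Catalog2', 'data_scientist')
assert set(conf1).isdisjoint(conf2)  # no key collisions between the catalogs
```

The resulting dictionaries can be applied to a SparkSession builder in a loop with .config(key, value), which is equivalent to the chained calls in Example 2.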

Example 3: Configure a single service connection (GCS)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,org.apache.iceberg:iceberg-gcp-bundle:1.5.2') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Example 4: Configure a single service connection (Azure)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,org.apache.iceberg:iceberg-azure-bundle:1.5.2') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'opencatalog') \
    .config('spark.sql.catalog.opencatalog', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.opencatalog.type', 'rest') \
    .config('spark.sql.catalog.opencatalog.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.opencatalog.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.opencatalog.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.opencatalog.warehouse','Catalog1') \
    .config('spark.sql.catalog.opencatalog.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Verify the connection to Open Catalog

To verify that Spark is connected to Open Catalog, list the namespaces for the catalog. For more information, see List namespaces.
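As a minimal sketch, assuming the spark session configured earlier in this topic with the catalog registered under the name opencatalog, you can list the namespaces directly from Spark SQL:

```python
# Assumes the `spark` session configured earlier in this topic, with the
# catalog registered under the name 'opencatalog'. If the connection is
# working, this prints the namespaces defined in the catalog.
spark.sql('SHOW NAMESPACES IN opencatalog').show()
```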