Register a service connection

This topic covers how to register your service connection credentials with a third-party service (for example, Apache Spark). The Polaris Catalog™ administrator registers a service connection.

The example code in this topic shows how to configure a service connection in Spark and is written in PySpark.

Prerequisites

Before you can register a service connection, you need to create a service connection. For instructions, see Create a service connection.

Configure a service connection

The following example code is for configuring a single service connection.

Note

If needed, you can configure multiple service connections. For an example, see Example 2: Configure two service connections.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,<maven_coordinate>') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'polaris') \
    .config('spark.sql.catalog.polaris', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.polaris.type', 'rest') \
    .config('spark.sql.catalog.polaris.uri','https://<polaris_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.polaris.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.polaris.warehouse','<catalog_name>') \
    .config('spark.sql.catalog.polaris.scope','PRINCIPAL_ROLE:<principal_role_name>') \
    .getOrCreate()

Parameters

<catalog_name>

    Specifies the name of the catalog to connect to.

    Important: <catalog_name> is case sensitive.

<maven_coordinate>

    Specifies the Maven coordinate for your external cloud storage provider:

      • S3: software.amazon.awssdk:bundle:2.20.160
      • Google Cloud Storage (GCS): org.apache.iceberg:iceberg-gcp-bundle:1.5.2
      • Azure: org.apache.iceberg:iceberg-azure-bundle:1.5.2

<client_id>

    Specifies the Client ID for the service principal to use.

<client_secret>

    Specifies the Client Secret for the service principal to use.

<polaris_catalog_account_identifier>

    Specifies the account identifier for your Polaris Catalog account. Depending on the region and cloud platform for the account, this identifier might be the account locator by itself (for example, xy12345) or include additional segments. For more information, see Using an account locator as an identifier.

<principal_role_name>

    Specifies the principal role that is granted to the service principal.
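After the session is created, you can query Iceberg tables through the catalog with standard Spark SQL. The following is a minimal sketch; my_namespace and my_table are hypothetical names standing in for a namespace and table that already exist in your catalog:

```python
# Assumes the SparkSession `spark` configured above, with 'polaris' as the
# default catalog. The namespace and table names here are hypothetical.
spark.sql("SELECT * FROM polaris.my_namespace.my_table").show()
```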

Configure a cross-region service connection (Amazon S3 only)

The following example code is for configuring a service connection when all of the following are true:

  • Your Polaris Catalog account is hosted on Amazon S3.

  • Your external storage provider is Amazon S3.

  • Your Polaris Catalog account is hosted in an S3 region that is different from the S3 region where the storage bucket containing your Apache Iceberg™ tables is located.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'polaris') \
    .config('spark.sql.catalog.polaris', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.polaris.type', 'rest') \
    .config('spark.sql.catalog.polaris.uri','https://<polaris_catalog_account_identifier>.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.polaris.credential','<client_id>:<client_secret>') \
    .config('spark.sql.catalog.polaris.warehouse','<catalog_name>') \
    .config('spark.sql.catalog.polaris.client.region','<target_s3_region>') \
    .config('spark.sql.catalog.polaris.scope','PRINCIPAL_ROLE:<principal_role_name>') \
    .getOrCreate()

Parameters

<catalog_name>

    Specifies the name of the catalog to connect to.

    Important: <catalog_name> is case sensitive.

<client_id>

    Specifies the Client ID for the service principal to use.

<client_secret>

    Specifies the Client Secret for the service principal to use.

<polaris_catalog_account_identifier>

    Specifies the account identifier for your Polaris Catalog account. Depending on the region and cloud platform for the account, this identifier might be the account locator by itself (for example, xy12345) or include additional segments. For more information, see Using an account locator as an identifier.

<target_s3_region>

    Specifies the region code where the S3 bucket containing your Apache Iceberg™ tables is located. For the region codes, see AWS service endpoints and refer to the Region column in the table.

<principal_role_name>

    Specifies the principal role that is granted to the service principal.

Examples

This section contains examples of configuring a service connection in Spark.

Example 1: Configure a single service connection (S3)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'polaris') \
    .config('spark.sql.catalog.polaris', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.polaris.type', 'rest') \
    .config('spark.sql.catalog.polaris.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.polaris.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.polaris.warehouse','Catalog1') \
    .config('spark.sql.catalog.polaris.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Example 2: Configure two service connections (S3)

Important

When configuring multiple service connections, each connection must use a unique catalog name in its configuration property keys. In the following code, the polaris used in the property keys for the first connection is changed to polaris1 for the second connection.

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'polaris') \
    .config('spark.sql.catalog.polaris', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.polaris.type', 'rest') \
    .config('spark.sql.catalog.polaris.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.polaris.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.polaris.warehouse','Catalog1') \
    .config('spark.sql.catalog.polaris.scope','PRINCIPAL_ROLE:data_scientist') \
    .config('spark.sql.catalog.polaris1', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.polaris1.type', 'rest') \
    .config('spark.sql.catalog.polaris1.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.polaris1.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.polaris1.credential','222222222222222222222222222=:3333333333333333333333333333333333333333333=') \
    .config('spark.sql.catalog.polaris1.warehouse','Catalog2') \
    .config('spark.sql.catalog.polaris1.scope','PRINCIPAL_ROLE:data_scientist') \
    .getOrCreate()
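With two connections configured, you address each catalog by the name used in its configuration property keys. As a sketch, assuming the SparkSession from the example above:

```python
# 'polaris' maps to Catalog1 and 'polaris1' maps to Catalog2, per the
# configuration properties in the example above.
spark.sql("SHOW NAMESPACES IN polaris").show()
spark.sql("SHOW NAMESPACES IN polaris1").show()
```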

Example 3: Configure a single service connection (GCS)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,org.apache.iceberg:iceberg-gcp-bundle:1.5.2') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'polaris') \
    .config('spark.sql.catalog.polaris', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.polaris.type', 'rest') \
    .config('spark.sql.catalog.polaris.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.polaris.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.polaris.warehouse','Catalog1') \
    .config('spark.sql.catalog.polaris.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Example 4: Configure a single service connection (Azure)

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('iceberg_lab') \
    .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,org.apache.iceberg:iceberg-azure-bundle:1.5.2') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.defaultCatalog', 'polaris') \
    .config('spark.sql.catalog.polaris', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.polaris.type', 'rest') \
    .config('spark.sql.catalog.polaris.uri','https://ab12345.snowflakecomputing.com/polaris/api/catalog') \
    .config('spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation','vended-credentials') \
    .config('spark.sql.catalog.polaris.credential','000000000000000000000000000=:1111111111111111111111111111111111111111111=') \
    .config('spark.sql.catalog.polaris.warehouse','Catalog1') \
    .config('spark.sql.catalog.polaris.scope','PRINCIPAL_ROLE:data_engineer') \
    .getOrCreate()

Verify the connection to Polaris Catalog™

To verify that Spark is connected to Polaris Catalog, list the namespaces for the catalog. For more information, see List namespaces.
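As a sketch, assuming the SparkSession `spark` created earlier with polaris as the default catalog, you can list the namespaces with Spark SQL:

```python
# Because spark.sql.defaultCatalog is set to 'polaris', the unqualified
# statement lists the namespaces in the Polaris catalog.
spark.sql("SHOW NAMESPACES").show()
```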