Create a catalog¶
The Polaris Catalog™ administrator creates and manages a catalog.
The steps to create a catalog depend on your cloud storage provider.
When you create a catalog, you supply information about your external cloud storage, and Polaris Catalog uses that information to create a storage configuration. This configuration stores an identity and access management (IAM) entity for your storage. Polaris Catalog uses the IAM entity to securely connect to your storage locations in order to access table data, Apache Iceberg™ metadata, and manifest files.
For instructions, see the following sections:
Create a catalog using Amazon S3¶
This section covers how to:
Grant Polaris Catalog restricted access to your Amazon S3 bucket.
Create a catalog.
Before you configure access to S3, you need the following:
An S3 storage bucket in the same region that hosts your Snowflake account.
Polaris Catalog doesn’t support bucket names that contain dots (for example, my.s3.bucket). Polaris Catalog uses virtual-hosted-style paths and HTTPS to access data in S3, and S3 does not support SSL for virtual-hosted-style buckets with dots in the name.
For data recovery features, see your storage provider.
Permissions in AWS to create and manage IAM policies and roles. If you aren’t an AWS administrator, ask your AWS administrator to perform these tasks.
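The bucket-name restriction above can be checked before you start. The following is a minimal sketch in Python; the function name is_supported_bucket_name is hypothetical, and the check covers only the dot restriction described here, not the full set of S3 bucket naming rules.

```python
def is_supported_bucket_name(bucket_name: str) -> bool:
    """Return True if the bucket name contains no dots.

    Only the dot restriction is checked; other S3 naming rules
    (length, allowed characters) are out of scope for this sketch.
    """
    return "." not in bucket_name


for name in ("my-s3-bucket", "my.s3.bucket"):
    print(name, is_supported_bucket_name(name))
```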
Step 1: Create an IAM policy that grants access to your S3 location¶
To configure access permissions for Polaris Catalog in the AWS Management Console, do the following:
Log in to the AWS Management Console.
From the home dashboard, search for and select IAM.
From the left-hand navigation pane, select Account settings.
Under Security Token Service (STS) in the Endpoints list, find the Polaris Catalog region where your account is located. If the STS status is inactive, move the toggle to Active.
From the left-hand navigation pane, select Policies.
Select Create Policy.
For Policy editor, select JSON.
Add a policy to provide Polaris Catalog with the required permissions to read and write data to your S3 location. The following example policy grants access to all locations in the specified bucket.
Note
Replace my_bucket with your actual bucket name. You can also specify a path in the bucket; for example, my_bucket/path.
Setting the "s3:prefix": condition to ["*"] grants access to all prefixes in the specified bucket; setting it to ["path/*"] grants access to a specified path in the bucket.
For buckets in government regions, the bucket ARNs use the arn:aws-us-gov:s3::: prefix.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<my_bucket>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<my_bucket>",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "*"
          ]
        }
      }
    }
  ]
}
Select Next.
Enter a Policy name (for example, polaris_catalog_access) and an optional Description.
Select Create policy.
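If you create policies for several buckets or paths, the example policy from this step can be generated rather than edited by hand. The following Python sketch is illustrative only: the function name build_s3_access_policy and its parameters are hypothetical, and it assumes that an optional path narrows the object resource and the "s3:prefix" condition while the list statement stays on the bucket ARN itself.

```python
import json
from typing import Optional


def build_s3_access_policy(bucket: str, path: Optional[str] = None,
                           gov_region: bool = False) -> dict:
    """Build the example access policy for a bucket or a path inside it."""
    # Government regions use the arn:aws-us-gov:s3::: prefix.
    arn_prefix = "arn:aws-us-gov:s3:::" if gov_region else "arn:aws:s3:::"
    # Drop leading and trailing slashes so the path joins cleanly.
    clean_path = (path or "").strip("/")
    if clean_path:
        object_resource = f"{arn_prefix}{bucket}/{clean_path}/*"
        prefix_condition = [f"{clean_path}/*"]
    else:
        object_resource = f"{arn_prefix}{bucket}/*"
        prefix_condition = ["*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": object_resource,
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                # Assumption: list access is granted on the bucket ARN.
                "Resource": f"{arn_prefix}{bucket}",
                "Condition": {"StringLike": {"s3:prefix": prefix_condition}},
            },
        ],
    }


# Print a policy limited to one path, ready to paste into the JSON editor.
print(json.dumps(build_s3_access_policy("my_bucket", path="path"), indent=2))
```

Review the generated JSON against your own security requirements before you paste it into the policy editor.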
Step 2: Create an IAM role¶
Create an AWS IAM role to grant privileges on your S3 bucket.
From the left-hand navigation pane in the Identity and Access Management (IAM) Dashboard, select Roles.
Select Create role.
For the trusted entity type, select AWS account.
Under An AWS account, select This account. In a later step, you modify the trusted relationship and grant access to Polaris Catalog.
If you want to create an external ID, select the Require external ID option. Enter an external ID of your choice; for example, polaris_catalog_external_id.
Note
If you don’t create an external ID, Polaris Catalog generates one for you to use when you create a catalog. An external ID is used to grant a third party, such as Polaris Catalog, access to your AWS resources (such as S3 buckets).
Select Next.
Select the policy that you created in the previous step, then select Next.
Enter a Role name and description for the role, then select Create role. You have now created an IAM policy for an S3 location, created an IAM role, and attached the policy to the role.
Select View role to view the role summary page. Locate and record the ARN (Amazon Resource Name) value for the role.
Step 3: Create a catalog¶
Use Polaris Catalog to create a catalog.
To create a catalog in Polaris Catalog, follow these steps:
Sign in to Polaris Catalog.
From the Polaris Catalog home page, in the Catalogs area, select + Create.
From the Create Catalog dialog, complete the fields:
In the Name field, enter a name for the catalog.
Important
Catalog names are case sensitive.
If you’re creating an external catalog, move the External toggle to On. For information on external catalogs, see Catalog types.
In the Storage Provider field, select S3.
In the Default base location field, enter the default base location for your AWS S3 storage bucket.
If the catalog will contain objects stored in more than one location, in the Additional locations (optional) field, list each additional storage location, separated by a comma.
In the S3 role arn field, enter the ARN of the IAM role that you created for Polaris Catalog.
If you created an external ID when you created an IAM role, in the External ID field, enter the external ID.
Select Create.
Step 4: Retrieve the AWS IAM user for your Polaris Catalog™ account¶
From the Polaris Catalog home page, in the Catalogs area, select the catalog that you created.
Under Storage Details, copy the IAM user arn; for example, arn:aws:iam::123456789001:user/abc1-b-self1234. Polaris Catalog provisions a single IAM user for your entire Polaris Catalog account. All S3 storage configurations in your account use that IAM user.
Note
If you didn’t specify an external ID when you created your IAM role, Polaris Catalog generates an external ID for you to use. Record the value so that you can update your IAM role trust policy with the generated external ID.
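Before you paste the copied value into a trust policy, you may want to confirm that it looks like an IAM user ARN. The following Python sketch splits an ARN using the standard arn:partition:service:region:account-id:resource layout; the names IamArn and parse_iam_arn are hypothetical helpers, not part of Polaris Catalog or AWS tooling.

```python
from typing import NamedTuple


class IamArn(NamedTuple):
    partition: str
    account_id: str
    resource_type: str
    resource_name: str


def parse_iam_arn(arn: str) -> IamArn:
    """Split an IAM ARN into its parts, raising ValueError if malformed."""
    # IAM ARNs have an empty region field, so there are six colon-separated parts.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    resource_type, sep, resource_name = parts[5].partition("/")
    if not sep or not resource_name:
        raise ValueError(f"missing resource name in ARN: {arn!r}")
    return IamArn(parts[1], parts[4], resource_type, resource_name)


parsed = parse_iam_arn("arn:aws:iam::123456789001:user/abc1-b-self1234")
print(parsed.resource_type, parsed.account_id)  # prints: user 123456789001
```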
Step 5: Grant the IAM user permissions to access bucket objects¶
In this step, you configure permissions that allow the IAM user for your Polaris Catalog account to access objects in your S3 bucket.
Log in to the AWS Management Console.
From the home dashboard, search for and select IAM.
From the left-hand navigation pane, select Roles.
Select the IAM role that you created for your storage configuration.
Select the Trust relationships tab.
Select Edit trust policy.
Modify the policy document with the catalog storage details that you recorded.
Policy document for IAM role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<polaris_catalog_user_arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<polaris_catalog_external_id>"
        }
      }
    }
  ]
}
Where:
polaris_catalog_user_arn is the IAM user arn that you recorded.
polaris_catalog_external_id is your external ID. If you already specified an external ID when you created the role, and used the same ID to create your storage configuration, leave the value as-is. Otherwise, update sts:ExternalId with the value that you recorded.
Note
You must update this policy document if you create a new storage configuration and don’t provide your own external ID. For security reasons, a new or recreated storage configuration has a different external ID and cannot resolve the trust relationship unless you update this trust policy.
Select Update policy to save your changes.
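The trust policy from this step can also be assembled programmatically from the two values that you recorded. The following Python sketch mirrors the policy document shown in this step; the function name build_trust_policy is hypothetical, and the argument values in the usage line are the example values from this section, not real credentials.

```python
import json


def build_trust_policy(polaris_catalog_user_arn: str,
                       polaris_catalog_external_id: str) -> dict:
    """Build the IAM role trust policy for the Polaris Catalog IAM user."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                # Only the Polaris Catalog IAM user may assume the role.
                "Principal": {"AWS": polaris_catalog_user_arn},
                "Action": "sts:AssumeRole",
                # The external ID must match the one in the storage configuration.
                "Condition": {
                    "StringEquals": {
                        "sts:ExternalId": polaris_catalog_external_id
                    }
                },
            }
        ],
    }


policy = build_trust_policy(
    "arn:aws:iam::123456789001:user/abc1-b-self1234",
    "polaris_catalog_external_id",
)
print(json.dumps(policy, indent=2))
```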
Create a catalog using Google Cloud Storage¶
This section covers how to create a catalog and grant Polaris Catalog restricted access to a Google Cloud Storage (GCS) bucket using a storage configuration.
An administrator in your organization grants the Polaris Catalog service account permissions in your Google Cloud account.
Note
To complete the instructions in this topic, you must have permissions in Google Cloud to create and manage IAM policies and roles. If you are not a Google Cloud administrator, ask your Google Cloud administrator to perform these tasks.
For data recovery features, see your storage provider.
Step 1: Create a catalog¶
Use Polaris Catalog to create a catalog.
To create a catalog in Polaris Catalog, follow these steps:
Sign in to Polaris Catalog.
From the Polaris Catalog home page, in the Catalogs area, select + Create.
From the Create Catalog dialog, complete the fields:
In the Name field, enter a name for the catalog.
Important
Catalog names are case sensitive.
If you’re creating an external catalog, move the External toggle to On. For information on external catalogs, see Catalog types.
In the Storage Provider field, select GCS.
In the Default base location field, enter the default base location for your GCS storage bucket.
If the catalog will contain objects stored in more than one location, in the Additional locations (optional) field, list each additional storage location, separated by a comma.
Select Create.
Step 2: Retrieve the GCS service account for your Polaris Catalog account¶
From the Polaris Catalog home page, in the Catalogs area, select the catalog that you created.
Under Storage Details, copy the GCP_SERVICE_ACCOUNT ID; for example, service-account-id@project1-123456.iam.gserviceaccount.com. Polaris Catalog provisions a single GCS service account for your entire Polaris Catalog account. Polaris Catalog uses that service account when accessing storage on GCS.
Step 3: Grant the service account permissions to access bucket objects¶
In this step, you configure IAM access permissions for Polaris Catalog in your Google Cloud Platform Console.
Create a custom IAM role¶
Create a custom role that has the permissions required to access the bucket and to read, write, delete, and list objects.
Log in to the Google Cloud Platform Console as a project editor.
From the home dashboard, select IAM & Admin » Roles.
Select Create Role.
Enter a Title and optional Description for the custom role.
Select Add Permissions.
In Filter, select Service and then select storage.
Filter the list of permissions, and add the following from the list:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
Select Add.
Select Create.
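If you manage several custom roles, it can help to compare a role's permissions against the list above. The following Python sketch is a plain set comparison; the function name missing_permissions is hypothetical, and the granted permissions are assumed to be supplied by you (for example, copied from the role's details page).

```python
from typing import Iterable, List

# Permissions listed in this section for the custom role.
REQUIRED_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
})


def missing_permissions(granted: Iterable[str]) -> List[str]:
    """Return the required permissions absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


# A role with only read permissions is missing create, delete, and list.
print(missing_permissions(["storage.buckets.get", "storage.objects.get"]))
```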
Assign the custom role to the GCS service account¶
Log in to the Google Cloud Platform Console as a project editor.
From the home dashboard, select Cloud Storage » Buckets.
Filter the list of buckets, and select the bucket that you specified in your Polaris Catalog storage configuration.
Select Permissions » View by principals, then select Grant access.
Under Add principals, paste the service account name that you copied in Step 2: Retrieve the GCS service account for your Polaris Catalog account.
Under Assign roles, select the custom IAM role that you created previously, then select Save.
Create a catalog using Azure storage¶
This topic covers how to grant Polaris Catalog restricted access to your own Microsoft Azure container using a storage configuration. Polaris Catalog supports the following Azure cloud storage services for storage configurations:
Blob storage
Data Lake Storage Gen2
General-purpose v1
General-purpose v2
An administrator in your organization grants the Polaris Catalog service principal permissions in your Azure account.
Note
Completing the instructions in this topic requires permissions in Azure to create and manage IAM policies and roles. If you are not an Azure administrator, ask your Azure administrator to perform these tasks.
For data recovery features, see your storage provider.
Step 1: Create a catalog¶
Use Polaris Catalog to create a catalog.
To create a catalog in Polaris Catalog, follow these steps:
Sign in to Polaris Catalog.
From the Polaris Catalog home page, in the Catalogs area, select + Create.
In the Create Catalog dialog, complete the fields:
In the Name field, enter a name for the catalog.
Important
Catalog names are case sensitive.
If you’re creating an external catalog, move the External toggle to On. For information on external catalogs, see Catalog types.
In the Storage Provider field, select AZURE.
In the Default base location field, enter the default base location for your Azure storage container. Build the value from the path to the primary endpoint for your container, using the format that applies to your endpoint type.
Note
You copied this path and the name of your container when you created a Microsoft Azure container.
In the path to the primary endpoint for your container, the name of your storage account is the text between https:// and the first period in the path.
Use the abfss:// prefix, not https://.
Endpoint type | Format | Default base location example |
---|---|---|
Blob | abfss://<container_name>@<storage_account_name>.blob.core.windows.net/<directory_name>/ | abfss://my_container1@my_storageaccount1.blob.core.windows.net/my_directory1/ |
Azure Data Lake Storage (ADLS) | abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<directory_name>/ | abfss://my_container2@my_storageaccount2.dfs.core.windows.net/my_directory2/ |
If the catalog will contain objects stored in more than one location, in the Additional locations (optional) field, list each additional storage location, separated by a comma.
In the Tenant ID field, enter the Azure Tenant ID.
Select Create.
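The default base location format described in this step can be derived from the primary endpoint path. The following Python sketch is a minimal illustration: the function names storage_account_from_endpoint and default_base_location are hypothetical, and the sketch assumes the endpoint host already carries the blob or dfs suffix that matches your endpoint type.

```python
from urllib.parse import urlparse


def storage_account_from_endpoint(endpoint_url: str) -> str:
    """Return the text between https:// and the first period in the path."""
    host = urlparse(endpoint_url).hostname
    if not host or "." not in host:
        raise ValueError(f"not a storage endpoint URL: {endpoint_url!r}")
    return host.split(".", 1)[0]


def default_base_location(endpoint_url: str, container: str,
                          directory: str) -> str:
    """Build an abfss:// base location from a primary endpoint URL."""
    host = urlparse(endpoint_url).hostname
    if not host or "." not in host:
        raise ValueError(f"not a storage endpoint URL: {endpoint_url!r}")
    # Use the abfss:// prefix, not https://, and keep the endpoint host.
    return f"abfss://{container}@{host}/{directory.strip('/')}/"


print(default_base_location(
    "https://my_storageaccount1.blob.core.windows.net/",
    "my_container1",
    "my_directory1",
))
# prints: abfss://my_container1@my_storageaccount1.blob.core.windows.net/my_directory1/
```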
Step 2: Grant Polaris Catalog access to the storage location¶
From the Polaris Catalog home page, in the Catalogs area, select the catalog that you created.
Under Storage Details, copy the following values:
Property | Description |
---|---|
Azure consent URL | URL to the Microsoft permissions request page. |
Azure multi-tenant app name | Name of the Snowflake client application created for your account. In a later step in this section, you grant this application permission to obtain an access token on your allowed storage location. |
You use these values in the following steps.
In a web browser, navigate to the Microsoft permissions request page (the Azure consent URL).
Select Accept. This action allows the Azure service principal created for your Polaris Catalog account to obtain an access token on specified resources inside your tenant. Obtaining an access token succeeds only if you grant the service principal the appropriate permissions on the container (see the next step). The Microsoft permissions request page redirects to the Snowflake corporate site (snowflake.com).
Log in to the Microsoft Azure portal.
Go to Azure Services » Storage Accounts. Select the name of the storage account that the Polaris Catalog service principal needs to access.
Note
You must set IAM permissions for a storage configuration at the storage account level, not the container level.
Select Access Control (IAM) » Add role assignment.
Select the desired role to grant to the Polaris Catalog service principal, such as the Storage Blob Data Contributor role. The Storage Blob Data Contributor role grants the Polaris Catalog service principal read and write access to the storage location.
Note
Polaris Catalog issues user delegation SAS tokens. The SAS token for accessing storage blobs is scoped to the container rather than to a blob or directory. The role you select must have permission to create the user delegation key. For a list of these roles, see Assign permissions with RBAC.
Select Next.
Select + Select members.
Search for the Polaris Catalog service principal. This is the Azure multi-tenant app name property. Search for the string before the underscore in the property value.
Important
It can take an hour or longer for Azure to create the Polaris Catalog service principal requested through the Microsoft request page in this section. If the service principal is not available immediately, wait an hour or two and then search again.
If you delete the service principal, the catalog will stop working due to authentication failure.
Select the Polaris Catalog service principal and select Select.
Select Review + assign.
Note
It can take up to 10 minutes for changes to take effect when you assign a role. For more information, see Symptom - Role assignment changes are not being detected in the Microsoft Azure documentation.
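The search string used when you select members is the part of the Azure multi-tenant app name before the underscore. The following Python sketch shows that extraction; the function name service_principal_search_term and the sample value are hypothetical and are not taken from a real account.

```python
def service_principal_search_term(app_name: str) -> str:
    """Return the text before the first underscore in the app name.

    If the value contains no underscore, the whole value is returned.
    """
    return app_name.split("_", 1)[0]


# Hypothetical property value, for illustration only.
print(service_principal_search_term("abcdef_1234567890123"))  # prints: abcdef
```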