Note
The Snowflake Connector for PostgreSQL and Snowflake Connector for MySQL are subject to the Connector Terms.
Working with the MySQL and PostgreSQL connectors for Snowflake¶
Introduction¶
Welcome to our tutorial on using the Snowflake Database Connectors. This guide will help you seamlessly transfer data from relational databases into Snowflake.
In this tutorial, you’ll gain the skills to:
Set up MySQL and PostgreSQL on Docker, complete with sample data for ingestion.
Install and configure two native applications, one for each database.
Set up and fine-tune two agents, again one for each database.
Initiate and manage data ingestion processes.
Monitor the data ingestion workflow.
Let’s get started!
Prerequisites¶
Before beginning this tutorial, ensure you meet the following requirements:
Docker is installed and operational on your local machine.
You have a tool available for connecting to the database. This can be a database-specific tool or a general-purpose tool such as IntelliJ or Visual Studio Code.
Creating MySQL and PostgreSQL Source Databases¶
In this section, we will guide you through the following steps:
Starting the Database Instances - Learn how to launch your MySQL and PostgreSQL instances using Docker.
Connecting to the Database - Instructions on how to establish a connection to your databases.
Loading Sample Data - A walkthrough on how to populate your databases with sample data.
Starting the Database Instances¶
To initiate the MySQL and PostgreSQL databases using Docker, you’ll need to create a file called docker-compose.yaml.
The content of the file should resemble:
version: '1'
services:
  mysql:
    container_name: mysql8
    restart: always
    image: mysql:8.0.28-oracle
    command: --log-bin=/var/lib/mysql/mysql-bin --max-binlog-size=4096 --binlog-format=ROW --binlog-row-image=FULL --binlog-row-metadata=FULL --sql_mode="ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION,PAD_CHAR_TO_FULL_LENGTH"
    environment:
      MYSQL_ROOT_PASSWORD: 'mysql'
    volumes:
      - ./mysql-data:/var/lib/mysql
    ports:
      - "3306:3306"
  postgres:
    image: "postgres:11"
    container_name: "postgres11"
    environment:
      POSTGRES_DB: 'psql_rows_db'
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: 'postgres'
    ports:
      - "5432:5432"
    command:
      - "postgres"
      - "-c"
      - "wal_level=logical"
    volumes:
      - ./postgres-data:/var/lib/postgresql/data
Once your docker-compose.yaml is ready, follow these steps:
Open your terminal.
Navigate to the directory containing your docker-compose.yaml file.
Execute the following command to launch the source databases:
docker-compose up -d
After running this command, you should see two Docker containers actively running the source databases.
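To double-check, you can list the running containers; with the container names from the docker-compose.yaml above, the output should include mysql8 and postgres11:
# List running containers with their status and ports
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"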
Connecting to the Database¶
To connect to the pre-configured databases using IntelliJ’s or Visual Studio Code database connections, perform the following steps with the provided credentials:
Open your tool of choice for connecting to MySQL.
Click the ‘+’ sign or similar to add a data source.
Fill in the connection details:
User: root
Password: mysql
URL: jdbc:mysql://localhost:3306
Test the connection and save.
Open your tool of choice for connecting to PostgreSQL.
Click the ‘+’ sign or similar to add a data source.
Fill in the connection details:
User: postgres
Password: postgres
URL: jdbc:postgresql://localhost:5432
Test the connection and save.
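If you prefer the command line over a GUI client, a minimal alternative (assuming the container names from the docker-compose.yaml above) is to use the clients bundled inside the containers:
# Open the MySQL client inside the mysql8 container
docker exec -it mysql8 mysql -uroot -pmysql
# Open psql inside the postgres11 container
docker exec -it postgres11 psql -U postgres -d psql_rows_db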
Loading Sample Data¶
To initialize and load the sample data, execute the following scripts using the connections you just created.
Execute this script against MySQL to generate its sample data:
create database mysql_ingest_database;
use mysql_ingest_database;
CREATE TABLE mysql_rows(
id INT AUTO_INCREMENT PRIMARY KEY,
random_string VARCHAR(255),
random_number INT);
INSERT INTO mysql_rows (random_string, random_number) VALUES
('fukjxyiteb', 100),
('ndhbbipodi', 37),
('laebpztxzh', 83);
select * from mysql_ingest_database.mysql_rows;
Execute this script against PostgreSQL to generate its sample data:
CREATE SCHEMA psql_rows_schema;
SET search_path TO psql_rows_schema;
CREATE TABLE psql_rows_schema.postgres_rows (
id SERIAL PRIMARY KEY,
a_text TEXT,
a_boolean BOOLEAN,
a_number INTEGER,
a_decimal DOUBLE PRECISION);
INSERT INTO psql_rows_schema.postgres_rows (a_text, a_boolean, a_number, a_decimal) VALUES
('QfJhyWwFuC', True, 37, 15.46),
('GwmIFgwvFy', True, 14, 13.21),
('jYvqOSEtam', True, 25, 20.85);
-- The publication is required to start the replication process, as the Connector is based on PostgreSQL Logical Replication
CREATE PUBLICATION agent_postgres_publication FOR ALL TABLES;
select * from psql_rows_schema.postgres_rows;
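Optionally, you can confirm the publication was created by querying the pg_publication catalog; it should return one row for agent_postgres_publication:
SELECT * FROM pg_publication;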
At the end you should see three rows in each populated database.
Install and configure the Native App¶
During this step, you will install and then configure the native app for each source database.
Install the Native Applications¶
Follow these steps to install the Application from the Snowflake Native Apps Marketplace:
On your Snowflake Account, navigate to the Data Products, then to the Marketplace section.
Search for the apps Snowflake Connector for MySQL and Snowflake Connector for PostgreSQL.
Install both applications.
You should find your Apps listed under the Data Products, Apps section.
Configuring the Native Applications¶
On your Snowflake Account, navigate to the Data Products, Apps section.
Open each application and do the following:
For the Snowflake Connector for MySQL:
Select Download Driver and save the file for use during Agent configuration. The file name should resemble mariadb-java-client-3.4.1.jar (or a newer version when available).
Select Mark all as done, as we will create our source databases from scratch. No additional network configuration is needed, because the Agent is configured later in this tutorial.
Click Start configuration.
On the Configure Connector screen, select Configure.
On the Verify Agent Connection screen, select Generate file to download the Agent Configuration file. The file should resemble snowflake.json. Save this file for later use in Agent configuration.
For the Snowflake Connector for PostgreSQL:
Select Mark all as done, as we will create our source databases from scratch. No additional network configuration is required, because the Agent is configured later in this tutorial.
Click Start configuration.
On the Configure Connector screen, select Configure.
On the Verify Agent Connection screen select Generate file to download the Agent Configuration file. The downloaded file name should resemble snowflake.json. Save this file for use during the Agent configuration section.
Configure the Agents¶
During this section, we will configure the Agent that will operate alongside our Source Databases.
The first step is to create two folders. These should be named agent-mysql and agent-postgresql.
Within each folder, create two directories named agent-keys and configuration. Your folder structure should resemble:
.
├── agent-mysql
│   ├── agent-keys
│   └── configuration
└── agent-postgresql
    ├── agent-keys
    └── configuration
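On Linux or macOS, one way to create this layout in a single command (assuming bash or zsh for brace expansion) is:
# Create both agent folders with their agent-keys and configuration subdirectories
mkdir -p agent-mysql/{agent-keys,configuration} agent-postgresql/{agent-keys,configuration}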
Creating configuration files¶
In this step, we will fill in the configuration files that each agent needs to operate correctly. The configuration files include:
snowflake.json file to connect to Snowflake.
datasources.json file to connect to the Source Databases.
postgresql.conf/mysql.conf files with additional Agent Environment Variables.
JDBC Driver file for MySQL Agent.
Navigate to the folder called agent-mysql.
Create the docker-compose file named docker-compose.yaml with the following content:
version: '1'
services:
mysql-agent:
container_name: mysql-agent
image: snowflakedb/database-connector-agent:latest
volumes:
- ./agent-keys:/home/agent/.ssh
- ./configuration/snowflake.json:/home/agent/snowflake.json
- ./configuration/datasources.json:/home/agent/datasources.json
- ./configuration/mariadb-java-client-3.4.1.jar:/home/agent/libs/mariadb-java-client-3.4.1.jar
env_file:
- configuration/mysql.conf
mem_limit: 6g
Put the previously downloaded snowflake.json file in the configuration folder.
Put the previously downloaded mariadb-java-client-3.4.1.jar file in the configuration folder.
Create the file named datasources.json in the configuration folder with the following content:
{
"MYSQLDS1": {
"url": "jdbc:mariadb://host.docker.internal:3306/?allowPublicKeyRetrieval=true&useSSL=false",
"username": "root",
"password": "mysql",
"ssl": false
}
}
Create the file named mysql.conf in the configuration folder with the following content:
JAVA_OPTS=-Xmx5g
MYSQL_DATASOURCE_DRIVERPATH=/home/agent/libs/mariadb-java-client-3.4.1.jar
Start the agent using the following commands. There shouldn’t be any error messages, and the agent should generate a public/private key pair for authenticating to Snowflake.
docker-compose stop # stops the previous container in case you've launched it before
docker-compose rm -f # removes the agent container to recreate it with the latest image in case you had one before
docker-compose pull # refresh remote latest tag in case you have cached previous version
docker-compose up -d # run the agent
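To verify the agent started cleanly, you can follow its logs (the container name is set in the docker-compose.yaml above):
# Follow the MySQL agent logs; press Ctrl+C to stop
docker logs -f mysql-agent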
Please note that the driver jar file name must be identical to the one downloaded and referenced in:
the docker-compose.yaml file
the mysql.conf file
Navigate to the folder called agent-postgresql.
Create the docker-compose file named docker-compose.yaml with the following content:
version: '1'
services:
postgresql-agent:
container_name: postgresql-agent
image: snowflakedb/database-connector-agent:latest
volumes:
- ./agent-keys:/home/agent/.ssh
- ./configuration/snowflake.json:/home/agent/snowflake.json
- ./configuration/datasources.json:/home/agent/datasources.json
env_file:
- configuration/postgresql.conf
mem_limit: 6g
Put the previously downloaded snowflake.json file in the configuration folder.
Create the file named datasources.json with the following content:
{
"PSQLDS1": {
"url": "jdbc:postgresql://host.docker.internal:5432/postgres",
"username": "postgres",
"password": "postgres",
"publication": "agent_postgres_publication",
"ssl": false
}
}
Create the file named postgresql.conf with the following content:
JAVA_OPTS=-Xmx5g
Start the agent using the following command. There shouldn’t be any error messages, and the agent should generate a public/private key pair for authenticating to Snowflake.
docker-compose up -d
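As with the MySQL agent, you can follow the logs to confirm a clean start:
# Follow the PostgreSQL agent logs; press Ctrl+C to stop
docker logs -f postgresql-agent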
At the end, your folder structure should resemble the following. Please note the inclusion of the automatically generated private and public keys within the agent-keys directories.
.
├── agent-mysql
│ ├── agent-keys
│ │ ├── database-connector-agent-app-private-key.p8
│ │ └── database-connector-agent-app-public-key.pub
│ ├── configuration
│ │ ├── datasources.json
│ │ ├── mariadb-java-client-3.4.1.jar
│ │ ├── mysql.conf
│ │ └── snowflake.json
│ └── docker-compose.yaml
└── agent-postgresql
├── agent-keys
│ ├── database-connector-agent-app-private-key.p8
│ └── database-connector-agent-app-public-key.pub
├── configuration
│ ├── datasources.json
│ ├── postgresql.conf
│ └── snowflake.json
└── docker-compose.yaml
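One quick way to confirm the keys were generated is to list the agent-keys directories:
# Each directory should contain a .p8 private key and a .pub public key
ls agent-mysql/agent-keys agent-postgresql/agent-keys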
Verifying connection with Snowflake¶
Go back to your previously created Native Apps. Click on the Refresh button in the Agent Connection Section.
When the connection is successfully configured, you should see the following message:
Agent is fully set up and connected. To select data to ingest Open Worksheet.
Configure and monitor the data ingestion process¶
In this step, we will instruct the Connector to begin replicating the selected tables. First, let’s create a shared sink database in Snowflake.
CREATE DATABASE CONNECTORS_DEST_DB;
GRANT CREATE SCHEMA ON DATABASE CONNECTORS_DEST_DB TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL;
GRANT CREATE SCHEMA ON DATABASE CONNECTORS_DEST_DB TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_MYSQL;
Once the database is ready, we can move on to the configuration process.
To begin table replication from MySQL, first add the data source, then specify the tables to replicate:
CALL SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.ADD_DATA_SOURCE('MYSQLDS1', 'CONNECTORS_DEST_DB');
CALL SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.ADD_TABLES('MYSQLDS1', 'mysql_ingest_database', ARRAY_CONSTRUCT('mysql_rows'));
To monitor the replication, you can execute the following queries:
select * from SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.REPLICATION_STATE;
select * from SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.CONNECTOR_STATS;
Similarly for PostgreSQL, add the data source, then specify the tables to replicate:
CALL SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL.PUBLIC.ADD_DATA_SOURCE('PSQLDS1', 'CONNECTORS_DEST_DB');
CALL SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL.PUBLIC.ADD_TABLES('PSQLDS1', 'psql_rows_schema', ARRAY_CONSTRUCT('postgres_rows'));
To monitor the replication, you can execute the following queries:
select * from SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL.PUBLIC.REPLICATION_STATE;
select * from SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL.PUBLIC.CONNECTOR_STATS;
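To see incremental replication in action, you can insert an extra row into a source database and watch it arrive in the destination. A minimal sketch for MySQL, using a made-up row value; run the INSERT against the source and the SELECT in Snowflake after a short delay:
-- On the MySQL source
INSERT INTO mysql_ingest_database.mysql_rows (random_string, random_number) VALUES ('incrtestrow', 42);
-- In Snowflake; the new row should eventually appear
SELECT * FROM CONNECTORS_DEST_DB."mysql_ingest_database"."mysql_rows";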
Understanding connector status¶
The REPLICATION_STATE view is crucial for monitoring the status of table replication. This process encompasses three distinct phases:
SCHEMA_INTROSPECTION: Ensures that the schema of the source table is accurately replicated.
INITIAL_LOAD: Transfers the existing data from the source table to the destination.
INCREMENTAL_LOAD: Continuously replicates ongoing changes from the source.
Upon successful replication, the status display will resemble the following:
REPLICATION_PHASE    SCHEMA_INTROSPECTION_STATUS    SNAPSHOT_REPLICATION_STATUS    INCREMENTAL_REPLICATION_STATUS
INCREMENTAL_LOAD     DONE                           DONE                           IN PROGRESS
You can read more about it in the official Connector Documentation.
View data¶
Execute the following commands to view the data; each table should contain the three rows loaded into its source database.
SELECT * FROM CONNECTORS_DEST_DB."psql_rows_schema"."postgres_rows";
SELECT * FROM CONNECTORS_DEST_DB."mysql_ingest_database"."mysql_rows";
Clean up and additional resources¶
Congratulations! You have successfully completed this tutorial.
To clean up your environment, execute the following queries in your Snowflake account. If you skip this step, the connectors will keep running 24/7 and generating costs.
Cleaning the Native App¶
DROP APPLICATION SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL CASCADE;
DROP APPLICATION SNOWFLAKE_CONNECTOR_FOR_MYSQL CASCADE;
Clean up created warehouses, roles and users¶
During the installation, multiple warehouses, roles, and users were created. Execute the following queries to drop these objects.
DROP ROLE MYSQL_ADMINISTRATIVE_AGENT_ROLE;
DROP ROLE MYSQL_AGENT_ROLE;
DROP USER POSTGRESQL_AGENT_USER;
DROP WAREHOUSE POSTGRESQL_OPS_WH;
DROP WAREHOUSE POSTGRESQL_COMPUTE_WH;
DROP ROLE POSTGRESQL_ADMINISTRATIVE_AGENT_ROLE;
DROP ROLE POSTGRESQL_AGENT_ROLE;
DROP USER MYSQL_AGENT_USER;
DROP WAREHOUSE MYSQL_COMPUTE_WH;
DROP WAREHOUSE MYSQL_OPS_WH;
Cleaning local docker images¶
You can also stop and remove the Docker containers for both the source databases and the agents. In each folder containing a docker-compose.yaml file, execute docker-compose down -v to stop and remove the containers along with their volumes.
Additional resources¶
Continue learning about the connectors in the official Connector Documentation.