Note

The Snowflake Connector for PostgreSQL and Snowflake Connector for MySQL are subject to the Connector Terms.

Working with the MySQL and PostgreSQL connectors for Snowflake

Introduction

Welcome to our tutorial on using the Snowflake Database Connectors. This guide will help you seamlessly transfer data from relational databases into Snowflake.

In this tutorial, you’ll gain the skills to:

  • Set up MySQL and PostgreSQL on Docker, complete with sample data for ingestion.

  • Install and configure two native applications, one for each database.

  • Set up and fine-tune two agents, again one for each database.

  • Initiate and manage data ingestion processes.

  • Monitor the data ingestion workflow.

Let’s get started!

Prerequisites

Before beginning this tutorial, ensure you meet the following requirements:

  • Docker is installed and operational on your local machine.

  • You have a tool available for connecting to the database. This can be a database-specific tool or a general-purpose tool such as IntelliJ or Visual Studio Code.

Creating MySQL and PostgreSQL Source Databases

In this section, we will guide you through the following steps:

Starting the Database Instances

To initiate the MySQL and PostgreSQL databases using Docker, you’ll need to create a file called docker-compose.yaml. The content of the file should resemble:

version: '1'
services:
  mysql:
    container_name: mysql8
    restart: always
    image: mysql:8.0.28-oracle
    command: --log-bin=/var/lib/mysql/mysql-bin
      --max-binlog-size=4096
      --binlog-format=ROW
      --binlog-row-image=FULL
      --binlog-row-metadata=FULL
      --sql_mode="ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION,PAD_CHAR_TO_FULL_LENGTH"
    environment:
      MYSQL_ROOT_PASSWORD: 'mysql'
    volumes:
      - ./mysql-data:/var/lib/mysql
    ports:
      - "3306:3306"
  postgres:
    image: "postgres:11"
    container_name: "postgres11"
    environment:
      POSTGRES_DB: 'psql_rows_db'
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: 'postgres'
    ports:
      - "5432:5432"
    command:
      - "postgres"
      - "-c"
      - "wal_level=logical"
    volumes:
      - ./postgres-data:/var/lib/postgresql/data
Copy

Once your docker-compose.yaml is ready, follow these steps:

  1. Open your terminal.

  2. Navigate to the directory containing your docker-compose.yaml file.

  3. Execute the following command to launch the source databases:

    docker-compose up -d
    
    Copy

After running this command, you should see two Docker containers actively running the source databases.

Connecting to the Database

To connect to the pre-configured databases using IntelliJ’s or Visual Studio Code database connections, perform the following steps with the provided credentials:

  1. Open your tool of choice for connecting to the MySQL.

  2. Click the ‘+’ sign or similar to add data source.

  3. Fill in the connection details:

    • User: root

    • Password: mysql

    • URL: jdbc:mysql://localhost:3306

  4. Test the connection and save.

Loading Sample Data

To initialize and load sample please execute those scripts in those connections.

  1. Execute the script to generate sample data

create database mysql_ingest_database;
use mysql_ingest_database;

CREATE TABLE  mysql_rows(
    id INT AUTO_INCREMENT PRIMARY KEY,
    random_string VARCHAR(255),
    random_number INT);

INSERT INTO mysql_rows (random_string, random_number) VALUES
    ('fukjxyiteb', 100),
    ('ndhbbipodi', 37),
    ('laebpztxzh', 83);

select * from mysql_ingest_database.mysql_rows;
Copy

At the end you should see three rows in each populated database.

Install and configure the Native App

During this step you will:

Install the Native Applications

Follow these steps to install the Application from the Snowflake Native Apps Marketplace:

  1. On your Snowflake Account, navigate to the Data Products, then to the Marketplace section.

  2. Search for the apps Snowflake Connector for MySQL and Snowflake Connector for PostgreSQL.

  3. Install both applications.

  4. You should find your Apps listed under the Data Products, Apps section.

Configuring the Native Applications

  1. On your Snowflake Account, navigate to the Data Products, Apps section.

  2. Open each application and do the following:

  1. Select Download Driver and keep the file for use with the agent run. The file name should resemble mariadb-java-client-3.3.2.jar or with newer version when available. Save this file for use during Agent configuration.

  2. Select Mark all as done as we will create our source databases from scratch. Note, we don’t need any additional network configuration as we configure the Agent later in this tutorial.

  3. Click Start configuration

  4. On the Configure Connector screen, select Configure.

  5. On the Verify Agent Connection screen, select Generate file to download the Agent Configuration file. The file should resemble snowflake.json. Save this file for later use in Agent configuration.

Configure the Agents

During this section, we will configure the Agent that will operate alongside our Source Databases.

The first step is to create two folders. These should be named agent-mysql and agent-postgresql.

Within each folder, create two directories named agent-keys and configuration. Your folder structure should resemble:

.
├── agent-mysql
│   ├── agent-keys
│   └── configuration
└── agent-postgresql
    ├── agent-keys
    └── configuration

Creating configuration files

In this step’s we will fill the configuration files for each agent to operate correctly. The configuration files include:

  • snowflake.json file to connect to the Snowflake.

  • datasources.json file to connect to the Source Databases.

  • postgresql.conf/mysql.conf files with additional Agent Environment Variables.

  • JDBC Driver file for MySQL Agent.

  1. Navigate to the folder called agent-mysql.

  2. Create the docker-compose file named docker-compose.yaml with the following content:

version: '1'
services:
  mysql-agent:
    container_name: mysql-agent
    image: snowflakedb/database-connector-agent:latest
    volumes:
      - ./agent-keys:/home/agent/.ssh
      - ./configuration/snowflake.json:/home/agent/snowflake.json
      - ./configuration/datasources.json:/home/agent/datasources.json
      - ./configuration/mariadb-java-client-3.3.2.jar:/home/agent/libs/mariadb-java-client-3.3.2.jar
    env_file:
      - configuration/mysql.conf
    mem_limit: 6g
Copy
  1. Put the previously downloaded snowflake.json file in the configuration folder.

  2. Put the previously downloaded mariadb-java-client-3.3.2.jar file in the configuration folder.

  3. Create the file named datasources.json in the configuration folder with the following content:

{
  "MYSQLDS1": {
    "url": "jdbc:mariadb://host.docker.internal:3306/?allowPublicKeyRetrieval=true&useSSL=false",
    "username": "root",
    "password": "mysql",
    "ssl": false
  }
}
Copy
  1. Create the file named mysql.conf in the configuration folder with the following content:

JAVA_OPTS=-Xmx5g
MYSQL_DATASOURCE_DRIVERPATH=/home/agent/libs/mariadb-java-client-3.3.2.jar
Copy
  1. Stat the agent using the following command. There shouldn’t be any error message and the agent should generate public/private key for authorization to the Snowflake.

docker-compose stop # stops the previous container in case you've launched it before
docker-compose rm -f # removes the agent container to recreate it with the latest image in case you had one before
docker-compose pull # refresh remote latest tag in case you have cached previous version
docker-compose up -d # run the agent
Copy
  1. Please note that the driver jar file name should be incidental to the one downloaded and used in the:

    • docker-compose.yaml file.

    • mysql.conf file

At the end, your folder structure should resemble the following. Please note the inclusion of the automatically generated private and public keys within the agent-keys directories.

.
├── agent-mysql
│   ├── agent-keys
│   │   ├── database-connector-agent-app-private-key.p8
│   │   └── database-connector-agent-app-public-key.pub
│   ├── configuration
│   │   ├── datasources.json
│   │   ├── mariadb-java-client-3.3.2.jar
│   │   ├── mysql.conf
│   │   └── snowflake.json
│   └── docker-compose.yaml
└── agent-postgresql
    ├── agent-keys
    │   ├── database-connector-agent-app-private-key.p8
    │   └── database-connector-agent-app-public-key.pub
    ├── configuration
    │   ├── datasources.json
    │   ├── postgresql.conf
    │   └── snowflake.json
    └── docker-compose.yaml

Verifying connection with Snowflake

Go back to your previously created Native Apps. Click on the Refresh button in the Agent Connection Section.

When successfully Configured you should see:

Agent is fully set up and connected. To select data to ingest Open Worksheet.
Copy

Configure and monitor the data ingestion process

In this step, we will instruct the Connector to begin replicating the selected tables. First, let’s create a shared sink database in Snowflake.

CREATE DATABASE CONNECTORS_DEST_DB;
GRANT CREATE SCHEMA ON DATABASE CONNECTORS_DEST_DB TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL;
GRANT CREATE SCHEMA ON DATABASE CONNECTORS_DEST_DB TO APPLICATION SNOWFLAKE_CONNECTOR_FOR_MYSQL;
Copy

Once the database is ready, we can move on to the configuration process.

  1. To begin table replication, you must first add a datasource from which to replicate and then specify the table to be replicated.

CALL SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.ADD_DATA_SOURCE('MYSQLDS1', 'CONNECTORS_DEST_DB');
CALL SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.ADD_TABLES('MYSQLDS1', 'mysql_ingest_database', ARRAY_CONSTRUCT('mysql_rows'));
Copy
  1. To monitor the replication you can execute the following queries

select * from SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.REPLICATION_STATE;
select * from SNOWFLAKE_CONNECTOR_FOR_MYSQL.PUBLIC.CONNECTOR_STATS;
Copy

Understanding connector status

The REPLICATION_STATE view is crucial for monitoring the status of table replication. This process encompasses three distinct phases:

  1. SCHEMA_INTROSPECTION: Ensures that the schema of the source table is accurately replicated.

  2. INITIAL_LOAD: Transfers the existing data from the source table to the destination.

  3. INCREMENTAL_LOAD: Continuously replicates ongoing changes from the source.

Upon successful replication, the status display will resemble the following:

REPLICATION_PHASE

SCHEMA_INTROSPECTION_STATUS

SNAPSHOT_REPLICATION_STATUS

INCREMENTAL_REPLICATION_STATUS

INCREMENTAL_LOAD

DONE

DONE

IN PROGRESS

You can read more about it in the official Connector Documentation.

View data

Execute the following commands to view data, which should include roughly 3 rows per database.

SELECT * FROM CONNECTORS_DEST_DB."psql_rows_schema"."postgres_rows";
SELECT * FROM CONNECTORS_DEST_DB."mysql_ingest_database"."mysql_rows";
Copy

Clean up and additional resources

Congratulations! You have successfully completed this tutorial.

To clean up your environment, please execute the following queries in your Snowflake Account. If you don’t do it, the connector will be running 24/7 generating costs.

Cleaning the Native App

DROP APPLICATION SNOWFLAKE_CONNECTOR_FOR_POSTGRESQL CASCADE;
DROP APPLICATION SNOWFLAKE_CONNECTOR_FOR_MYSQL CASCADE;
Copy

Clean up created warehouses, roles and users

During the installation multiple warehouses, roles and users were created. Execute the following queries to drop those objects.

DROP ROLE MYSQL_ADMINISTRATIVE_AGENT_ROLE;
DROP ROLE MYSQL_AGENT_ROLE;

DROP USER POSTGRESQL_AGENT_USER;

DROP WAREHOUSE POSTGRESQL_OPS_WH;
DROP WAREHOUSE POSTGRESQL_COMPUTE_WH;
Copy

Cleaning local docker images

You can also stop and remove your Docker containers for both source databases and agents. In folder containing the docker-compose.yaml files execute command docker-compose down -v to remove unwanted docker containers.

Additional resources

Continue learning about connectors using the following resources: