Establishing Database Connection
Ponder uses your data warehouse as its compute engine, so before you can query your data you need to establish a connection to that warehouse. Below are examples of how to configure the database connection for different data warehouses.
To get started, you’ll need to initialize Ponder.
import ponder
ponder.init()
To learn more about what initializing Ponder does, check out this page!
The sections below cover connecting to cloud data warehouses (Snowflake and BigQuery). If you don’t have a warehouse yet, we encourage you to try Ponder on DuckDB!
Connecting to Snowflake
To establish a connection to Snowflake, we use Snowflake’s Python connector. Whether you want to work with data already stored in Snowflake or with flat files in a local directory, Ponder lets you use Snowflake as the compute engine.
import snowflake.connector
# Replace each "****" placeholder with your own Snowflake connection details
db_con = snowflake.connector.connect(user="****", password="****", account="****", role="****", database="****", schema="****", warehouse="****")
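To keep credentials out of your notebook, you may prefer to read the connection details from environment variables instead of typing them inline. The sketch below assumes hypothetical variable names (SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, and so on) and hypothetical helper names; only snowflake.connector.connect() and its keyword arguments come from the example above. The connector is imported inside connect_snowflake() so the parameter-collecting helper can be tried out without it installed.

```python
import os

# Hypothetical environment variable names; rename them to match your own setup.
SNOWFLAKE_ENV_VARS = {
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "account": "SNOWFLAKE_ACCOUNT",
    "role": "SNOWFLAKE_ROLE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
}


def snowflake_params_from_env(env=None):
    """Collect keyword arguments for snowflake.connector.connect() from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in SNOWFLAKE_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise ValueError(f"Missing Snowflake settings: {', '.join(missing)}")
    return {param: env[var] for param, var in SNOWFLAKE_ENV_VARS.items()}


def connect_snowflake(env=None):
    # Imported lazily so snowflake_params_from_env() works without the connector installed
    import snowflake.connector

    return snowflake.connector.connect(**snowflake_params_from_env(env))
```

With the variables exported in your shell, `db_con = connect_snowflake()` would replace the inline call; a missing variable fails early with a message naming it rather than surfacing as an authentication error.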
If you cannot find the Snowflake account information needed to set up your database connection, or you want to use another authentication method, please follow our step-by-step guide here for more information.
Connecting to BigQuery
To establish a connection to BigQuery, we leverage Google Cloud’s Python client for Google BigQuery. Below, we create the BigQuery database connection by authenticating with a service account key.
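from google.cloud import bigquery
from google.cloud.bigquery import dbapi
from google.oauth2 import service_account
import json

# Load the service account key (JSON) downloaded from Google Cloud
with open("my_serviceaccount_key.json") as key_file:
    key_info = json.load(key_file)

# Build credentials scoped to BigQuery from the key contents
credentials = service_account.Credentials.from_service_account_info(
    key_info,
    scopes=["https://www.googleapis.com/auth/bigquery"],
)

# Wrap a BigQuery client in a DB-API connection
db_con = dbapi.Connection(bigquery.Client(credentials=credentials))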
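If the key file lives outside your working directory, one option is to honor GOOGLE_APPLICATION_CREDENTIALS, the environment variable Google Cloud tooling conventionally uses to point at a key file, and fall back to the local file otherwise. This is a sketch under that assumption; the helper names are hypothetical, and the sanity check relies on service account key files carrying a "type": "service_account" field. The Google libraries are imported inside connect_bigquery() so the path and key helpers can run without them.

```python
import json
import os

DEFAULT_KEY_PATH = "my_serviceaccount_key.json"
BIGQUERY_SCOPES = ["https://www.googleapis.com/auth/bigquery"]


def service_account_key_path(env=None):
    """Prefer the path in GOOGLE_APPLICATION_CREDENTIALS, else use the local key file."""
    env = os.environ if env is None else env
    return env.get("GOOGLE_APPLICATION_CREDENTIALS") or DEFAULT_KEY_PATH


def load_key_info(path):
    """Read the key JSON and reject files that are not service account keys."""
    with open(path) as key_file:
        info = json.load(key_file)
    if info.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")
    return info


def connect_bigquery(env=None):
    # Imported lazily so the helpers above work without the Google libraries installed
    from google.cloud import bigquery
    from google.cloud.bigquery import dbapi
    from google.oauth2 import service_account

    info = load_key_info(service_account_key_path(env))
    credentials = service_account.Credentials.from_service_account_info(
        info, scopes=BIGQUERY_SCOPES
    )
    return dbapi.Connection(bigquery.Client(credentials=credentials))
```

Calling `db_con = connect_bigquery()` then produces the same kind of connection as the inline example, while a wrong file (for example, user credentials instead of a service account key) is rejected before any network call.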
If you don’t have a service account key file yet, or haven’t created a dataset, please follow our step-by-step guide here for more information.
Connecting to DuckDB
To establish a connection to DuckDB, all you need to do is call duckdb.connect(). Called with no arguments, it creates an in-memory database.
import duckdb
db_con = duckdb.connect()
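Before handing db_con to Ponder, it can be worth confirming that the connection actually answers queries. The minimal sketch below uses a hypothetical check_connection() helper that relies only on cursor(), execute(), fetchone(), and close(), calls that DuckDB connections share with DB-API connections such as those from the Snowflake connector and BigQuery's dbapi module. The demo uses Python's built-in sqlite3 purely as a stand-in, since it needs no installation or credentials.

```python
import sqlite3


def check_connection(db_con):
    """Run a trivial query through a DB-API style connection and confirm it answers."""
    cursor = db_con.cursor()
    try:
        cursor.execute("SELECT 1")
        row = cursor.fetchone()
    finally:
        cursor.close()
    # Index the first column so both plain tuples and indexable row objects work
    return row is not None and row[0] == 1


if __name__ == "__main__":
    # sqlite3 stands in for a warehouse connection because it needs no setup
    demo_con = sqlite3.connect(":memory:")
    print(check_connection(demo_con))
    demo_con.close()
```

With DuckDB installed, `check_connection(duckdb.connect())` should behave the same way; a False result or an exception points at the connection rather than at Ponder.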
If you’d like to use a persistent database, pass the path to your .db or .duckdb file to duckdb.connect(). If the file does not exist, a new database with the specified name will be created.
import duckdb
db_con = duckdb.connect("ponder.db")
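When the database file sits in a subdirectory, a small helper can validate the path before connecting. This is a sketch, not part of Ponder or DuckDB: prepare_duckdb_path() and connect_persistent() are hypothetical names, the suffix check simply mirrors the .db and .duckdb extensions mentioned above, and the helper creates any missing parent directory up front rather than relying on DuckDB to do so. duckdb is imported lazily so the path helper can be used on its own.

```python
from pathlib import Path

# Mirrors the extensions mentioned in the text; adjust if you use other names
ALLOWED_SUFFIXES = {".db", ".duckdb"}


def prepare_duckdb_path(path):
    """Validate a DuckDB file path and make sure its parent directory exists."""
    db_path = Path(path)
    if db_path.suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Expected a .db or .duckdb file, got {db_path.name!r}")
    # Creates the directory only; the database file itself is left for DuckDB to create
    db_path.parent.mkdir(parents=True, exist_ok=True)
    return db_path


def connect_persistent(path="ponder.db"):
    # Imported lazily so prepare_duckdb_path() works without duckdb installed
    import duckdb

    return duckdb.connect(str(prepare_duckdb_path(path)))
```

For example, `db_con = connect_persistent("data/ponder.duckdb")` would create the data directory if needed and then open (or create) the database file inside it.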
Next, we will look at how to connect to your data source with the pd.read_* commands.