Welcome to Ponder 👋
Ponder is a scalable data science platform that lets you run your pandas workflows directly in your data warehouse. Ponder gives you the scalability and security benefits of your data warehouse, while still preserving the ease-of-use and flexibility of pandas.
Ponder builds on the open-source project Modin, adding support for data warehouses and tailoring it to the needs of production-scale workloads in enterprise settings.
How it works
Ponder uses your data warehouse as its execution engine. The current version of Ponder supports cloud data warehouses (Snowflake and BigQuery) as well as a local execution mode (with DuckDB); support for additional databases and warehouses is coming soon. Getting started takes two steps: initialize Ponder, then configure a database connection. First, initialize Ponder:
import ponder

ponder.init()
To learn more about what initializing Ponder does, check out this page!
Next, you can connect to different database engines:
To establish a connection to Snowflake, we leverage Snowflake’s Python connector.
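import snowflake.connector

# Replace each "****" placeholder with your own Snowflake credentials and settings
db_con = snowflake.connector.connect(
    user="****",
    password="****",
    account="****",
    role="****",
    database="****",
    schema="****",
    warehouse="****",
)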
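Hard-coding credentials in a script makes them easy to leak. A minimal sketch of one alternative, assuming you export your Snowflake settings as environment variables: the variable names below (SNOWFLAKE_USER and so on) and the helper snowflake_params_from_env are hypothetical, not part of Ponder or the Snowflake connector.

```python
import os
from typing import Dict, Mapping

# Hypothetical mapping from connect() keyword arguments to environment variable names.
SNOWFLAKE_ENV_VARS = {
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "account": "SNOWFLAKE_ACCOUNT",
    "role": "SNOWFLAKE_ROLE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
}


def snowflake_params_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the connection keyword arguments from environment variables."""
    # Fail early, listing every missing variable, instead of on the first lookup.
    missing = [var for var in SNOWFLAKE_ENV_VARS.values() if var not in environ]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {param: environ[var] for param, var in SNOWFLAKE_ENV_VARS.items()}
```

You could then pass the result straight to the connector, for example db_con = snowflake.connector.connect(**snowflake_params_from_env()). Checking for missing variables up front tends to give a clearer error than a failed login.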
To establish a connection to BigQuery, we leverage Google Cloud’s Python client for Google BigQuery. Here, we are connecting to the CUSTOMER dataset by authenticating via your BigQuery service account key.
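import json

from google.cloud import bigquery
from google.cloud.bigquery import dbapi
from google.oauth2 import service_account

# Load the service account key, closing the file once it has been read
with open("my_serviceaccount_key.json") as key_file:
    key_info = json.load(key_file)

# Wrap an authenticated BigQuery client in a DB-API connection
db_con = dbapi.Connection(
    bigquery.Client(
        project=key_info["project_id"],
        credentials=service_account.Credentials.from_service_account_info(
            key_info,
            scopes=["https://www.googleapis.com/auth/bigquery"],
        ),
    )
)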
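If the key file is incomplete, the failure may only surface deep inside the client libraries. The hypothetical helper load_service_account_info below is a minimal sketch of an early sanity check; the list of required fields is an assumption based on the usual layout of a Google Cloud service account key, and google-auth still performs its own validation.

```python
import json
from typing import Any, Dict

# Assumed subset of the fields found in a Google Cloud service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def load_service_account_info(path: str) -> Dict[str, Any]:
    """Read a service account key file and fail early if it looks incomplete."""
    # The with-block closes the file even if parsing raises.
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in info]
    if missing:
        raise ValueError(f"key file {path!r} is missing: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {info['type']!r}")
    return info
```

Its return value is the dict that service_account.Credentials.from_service_account_info expects, so it can replace the inline JSON loading above.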
If you don’t have a service account key file yet, or haven’t created a dataset, please follow our step-by-step guide here.
To establish a connection to DuckDB, call duckdb.connect(), which creates an in-memory database.
import duckdb

db_con = duckdb.connect()
Once Ponder is initialized and the database connection is established, you can load a table into a DataFrame:
import modin.pandas as pd

# Read your table named "CUSTOMER" into a DataFrame
df = pd.read_sql("CUSTOMER", db_con)
Now you can start hacking away with pandas! 🐼
df.describe()
# Average the numeric columns within each market segment
df.groupby("C_MKTSEGMENT").mean(numeric_only=True)
pd.concat([df, df])
# .. and much more! 🧹📊🔍🧪
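To make "uses your data warehouse as an engine" a bit more concrete, here is a rough illustration of the kind of SQL that pandas calls like the ones above correspond to, so the database does the work instead of the client. This is not Ponder's actual query generation: it uses Python's built-in sqlite3 module as a stand-in database and a made-up three-row CUSTOMER table.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table with made-up rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMER (C_CUSTKEY INTEGER, C_MKTSEGMENT TEXT, C_ACCTBAL REAL)")
con.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [(1, "BUILDING", 100.0), (2, "BUILDING", 300.0), (3, "MACHINERY", 50.0)],
)

# Roughly what a per-segment mean of the numeric columns asks for:
# one AVG per numeric column, grouped by segment, computed inside the database.
group_means = con.execute(
    "SELECT C_MKTSEGMENT, AVG(C_CUSTKEY), AVG(C_ACCTBAL) "
    "FROM CUSTOMER GROUP BY C_MKTSEGMENT ORDER BY C_MKTSEGMENT"
).fetchall()

# Roughly what concatenating the table with itself asks for: UNION ALL keeps duplicates.
row_count = con.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM CUSTOMER UNION ALL SELECT * FROM CUSTOMER)"
).fetchone()[0]
```

Here group_means holds one row per segment with the averaged columns, and row_count is twice the table's size, mirroring what the DataFrame calls would return.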