Welcome to Ponder 👋#
Ponder is a scalable data science platform that lets you run your pandas workflows directly in your data warehouse. Ponder gives you the scalability and security benefits of your data warehouse, while still preserving the ease-of-use and flexibility of pandas.
Ponder builds on top of the open source project, Modin, with added support for data warehouses, tailored to the needs of production-scale workloads in enterprise settings.
How it works#
Ponder uses your data warehouse as an engine. The current version of Ponder supports cloud data warehouses (Snowflake, BigQuery) as well as a local execution mode (with DuckDB). Support for additional databases and warehouses is coming soon. To get started, first initialize Ponder, then configure the database connection.
import ponder
ponder.init()
To learn more about what initializing Ponder does, check out this page!
Next, you can connect to different database engines:
To establish a connection to Snowflake, we leverage Snowflake’s Python connector.
import snowflake.connector
db_con = snowflake.connector.connect(user=****, password=****, account=****, role=****, database=****, schema=****, warehouse=****)
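Rather than typing credentials into the call directly, one option is to read them from environment variables and pass them to `snowflake.connector.connect` as keyword arguments. The following is a minimal sketch of that idea: the `SNOWFLAKE_*` variable names and the helper `snowflake_kwargs_from_env` are hypothetical and are not part of Ponder or the Snowflake connector.

```python
import os

# Keyword arguments used in the connect() call above.
REQUIRED_KEYS = ("user", "password", "account", "role",
                 "database", "schema", "warehouse")


def snowflake_kwargs_from_env(env=None, prefix="SNOWFLAKE_"):
    """Build connect() keyword arguments from environment variables.

    The variable names (e.g. SNOWFLAKE_USER) are an assumed convention.
    """
    env = os.environ if env is None else env
    # Collect every required variable that is unset or empty.
    missing = [prefix + key.upper() for key in REQUIRED_KEYS
               if not env.get(prefix + key.upper())]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[prefix + key.upper()] for key in REQUIRED_KEYS}


# Usage (requires snowflake-connector-python and the variables to be set):
# import snowflake.connector
# db_con = snowflake.connector.connect(**snowflake_kwargs_from_env())
```

Failing early with the list of missing variables tends to be easier to debug than a connection error raised by the warehouse.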
To establish a connection to BigQuery, we leverage Google Cloud’s Python client for Google BigQuery. Here, we are connecting to the CUSTOMER dataset by authenticating via your BigQuery service account key.
from google.cloud import bigquery
from google.cloud.bigquery import dbapi
from google.oauth2 import service_account
import json
db_con = dbapi.Connection(
    bigquery.Client(
        credentials=service_account.Credentials.from_service_account_info(
            json.loads(open("my_serviceaccount_key.json").read()),
            scopes=["https://www.googleapis.com/auth/bigquery"],
        )
    )
)
If you do not already have your account key file or don’t have a dataset created, please follow our step-by-step guide here for more information.
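Before handing the key to `service_account.Credentials.from_service_account_info`, it can help to confirm that the file parses and looks like a service account key. The sketch below is one possible way to do that, using only the standard library. The helper name `load_service_account_info` is hypothetical, and the field check covers only a few of the fields found in a Google Cloud service account key file.

```python
import json

# A few of the fields present in a Google Cloud service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_info(path):
    """Read a service account key file and check that it looks complete."""
    # The context manager closes the file even if parsing fails.
    with open(path, encoding="utf-8") as key_file:
        info = json.load(key_file)
    missing = [field for field in EXPECTED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"{path} is missing fields: {missing}")
    if info["type"] != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return info


# Usage: pass the returned dict as the first argument to
# service_account.Credentials.from_service_account_info(info, scopes=[...])
```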
To establish a connection to DuckDB, all you need to do is call duckdb.connect(), which creates an in-memory database.
import duckdb
db_con = duckdb.connect()
Once you establish and initialize the database connection, you can connect to the table via pd.read_sql.
# Ponder is used through Modin's pandas API
import modin.pandas as pd
# Connect to your table named "CUSTOMER"
df = pd.read_sql("CUSTOMER", db_con)
Now you can start hacking away with pandas! 🐼
df.describe()
df.groupby("C_MKTSEGMENT").mean()
pd.concat([df, df])
# .. and much more! 🧹📊🔍🧪
You can find a list of pandas APIs we support here. To get started, check out this 5-minute quickstart guide.
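To give some intuition for what "using the warehouse as an engine" means, the sketch below runs a query that is roughly equivalent to `df.groupby("C_MKTSEGMENT").mean()` inside a database rather than in Python memory. It uses Python's built-in `sqlite3` as a stand-in engine. The sample rows and the `C_ACCTBAL` column are made up for illustration, and the SQL shown is not necessarily what Ponder generates.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (an assumption
# made for this illustration; Ponder itself targets other engines).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMER (C_MKTSEGMENT TEXT, C_ACCTBAL REAL)")
con.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?)",
    [("BUILDING", 100.0), ("BUILDING", 300.0), ("MACHINERY", 50.0)],
)

# The aggregation runs in the database; only the small result comes back.
rows = con.execute(
    "SELECT C_MKTSEGMENT, AVG(C_ACCTBAL) "
    "FROM CUSTOMER GROUP BY C_MKTSEGMENT ORDER BY C_MKTSEGMENT"
).fetchall()
print(rows)  # [('BUILDING', 200.0), ('MACHINERY', 50.0)]
con.close()
```

The point of the design is that the full table never has to be loaded into the Python process; only the aggregated result is transferred.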
Key Features#