Selecting Your Data Source#
Before we can start our analysis, we first need to connect to a data source. Ponder currently supports read_csv for operating on CSV files, read_sql for operating on tables that are already stored in your data warehouse, and read_parquet for operating on Parquet files.
Note
Note that unlike in pandas, the data ingestion (read_*) commands in Ponder do not actually load the data into a dataframe in memory. Instead, you can think of the Ponder DataFrame as a pointer to the table in your warehouse, which stores the data and performs the computation.
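The examples that follow assume a database connection object db_con has already been created for your warehouse. As a self-contained stand-in (SQLite is used here purely for illustration; with Ponder you would create the connection with your warehouse's own connector):

```python
# Illustration only: Ponder expects a connection to a supported warehouse;
# an in-memory SQLite database stands in here so the snippet is runnable
# anywhere.
import sqlite3

db_con = sqlite3.connect(":memory:")  # your warehouse connection object

# Sanity check that the connection works.
print(db_con.execute("SELECT 1").fetchone()[0])  # 1
```

This db_con object is what gets handed to Ponder, either directly via the con parameter of read_sql or globally via ponder.configure.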

read_sql: Working with existing tables#
To work with data stored in an existing table in your warehouse, we use the read_sql command, provide the name of the table CUSTOMER, and pass your database connection object db_con to the con parameter.
df = pd.read_sql("CUSTOMER", con=db_con)
Now that we have a Ponder DataFrame that points to the CUSTOMER table in your data warehouse, you can work on your DataFrame df just like you would with any pandas dataframe – with all the computation happening in your database!
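Since Ponder mirrors the pandas API, the call shape is the same as in plain pandas. A runnable sketch of the pattern using plain pandas and an in-memory SQLite table (note that with a raw DBAPI connection, plain pandas read_sql expects a query rather than a bare table name; the CUSTOMER columns below are made up for illustration):

```python
import sqlite3

import pandas as pd

# Stand-in for a warehouse connection, with a small CUSTOMER table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMER (id INTEGER, name TEXT)")
con.executemany("INSERT INTO CUSTOMER VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Read the table into a dataframe through the connection.
df = pd.read_sql("SELECT * FROM CUSTOMER", con)
print(df.shape)  # (2, 2)
```

With Ponder, the resulting dataframe stays a pointer into the warehouse instead of being materialized in memory.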
read_csv: Working with CSV files#
Going beyond read_sql, if the pandas command doesn’t take in a database connection as a parameter, such as in the case of read_csv, we need to configure Ponder to leverage the database connection that we established earlier.
ponder.configure(default_connection=db_con)
Then, use the read_csv command and pass in the file path to the CSV file.
df = pd.read_csv("https://github.com/ponder-org/ponder-datasets/blob/main/tpch/orders.csv?raw=True", header=0)
Ponder will automatically process your CSV file and load it into a temporary table in your warehouse for analysis.
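The header=0 argument tells pandas (and Ponder) that the first row of the file holds the column names. A minimal plain-pandas illustration with an inline CSV standing in for the orders file (the column names below are made up for the example):

```python
import io

import pandas as pd

# A tiny inline CSV standing in for a file like orders.csv.
csv_data = "order_id,amount\n1,99.5\n2,12.0\n"

# header=0: treat the first row as column names.
df = pd.read_csv(io.StringIO(csv_data), header=0)
print(list(df.columns))  # ['order_id', 'amount']
```

With Ponder, the same call additionally stages the parsed rows into a temporary table in your warehouse.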
read_parquet: Working with Parquet files#
Going beyond read_sql, if the pandas command doesn’t take in a database connection as a parameter, such as in the case of read_parquet, we need to configure Ponder to leverage the database connection that we established earlier.
ponder.configure(default_connection=db_con)
Then, use the read_parquet command and pass in the file path to the Parquet file.
df = pd.read_parquet("https://github.com/ponder-org/ponder-datasets/blob/main/userdatasample.parquet?raw=True")
Ponder will automatically process your Parquet file and load it into a temporary table in your warehouse for analysis.
Now that we have seen how pd.read_* works in Ponder, we will discuss how you can use pd.to_* to save your dataframes with Ponder.