Setting Up Ponder

Ponder requires you to both configure and initialize it before you begin your workflow. On this page, we’ll take a look at why that’s the case and, more importantly, what happens during each of these steps!

Configuring Ponder

Configuring Ponder allows you to indicate which database to use as a backend. Until it is configured, Ponder does not know how to connect to where your data lives, and cannot run any APIs that ingest data (e.g. read_csv, read_parquet). The one exception is read_sql, which can be given a SQL connection to use directly; doing so does not set the default connection for other data ingestion APIs. Specify the default connection by passing the default_connection keyword argument to ponder.configure.

Ponder currently supports Snowflake, BigQuery, and DuckDB databases as backends, and accepts their corresponding Python connection objects as arguments to ponder.configure. We’re always adding more, so be sure to check back here to see if your favorite database is supported!

Configuring Ponder also allows you to specify certain system-wide parameters:

  • Row Transfer Limit: Ponder likes to keep all data inside the warehouse. Sometimes, however, it may be necessary to pull some of the data out, for example if a library needs to work with a pure pandas DataFrame. The row_transfer_limit keyword argument specifies the maximum number of rows to pull out of the database into memory in such circumstances. It defaults to 10,000, but can be adjusted as necessary. You can learn more about when this may happen by checking out the Integrations Page!

  • Query Timeout (Snowflake Only): Specifies how long Ponder allows queries to run before cancelling them. The value is passed directly to Snowflake. Specify it using the query_timeout keyword argument.

  • BigQuery Dataset (BigQuery Only): The schema to use when ingesting data from outside of BigQuery, for example when reading a CSV file via read_csv. It must be specified even if the default connection is specified. Specify it using the bigquery_dataset keyword argument.
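
To make the keyword arguments above concrete, here’s a minimal sketch of collecting them before calling ponder.configure. The helper names build_configure_kwargs and configure_ponder, and the backend labels "snowflake", "bigquery", and "duckdb", are our own inventions for illustration and are not part of Ponder; only ponder.configure and the four keyword arguments come from this page. The sketch treats bigquery_dataset as required whenever the backend is BigQuery, which is our reading of the description above.

```python
def build_configure_kwargs(backend, connection, row_transfer_limit=10_000,
                           query_timeout=None, bigquery_dataset=None):
    """Collect keyword arguments for ponder.configure (hypothetical helper).

    `backend` is an illustrative label: "snowflake", "bigquery", or "duckdb".
    """
    kwargs = {
        "default_connection": connection,
        "row_transfer_limit": row_transfer_limit,  # 10,000 is the documented default
    }
    if query_timeout is not None:
        # The query timeout is only meaningful for Snowflake.
        if backend != "snowflake":
            raise ValueError("query_timeout applies to Snowflake only")
        kwargs["query_timeout"] = query_timeout
    if backend == "bigquery":
        # Assumption: we always require the dataset for BigQuery.
        if bigquery_dataset is None:
            raise ValueError("bigquery_dataset is required for BigQuery")
        kwargs["bigquery_dataset"] = bigquery_dataset
    elif bigquery_dataset is not None:
        raise ValueError("bigquery_dataset applies to BigQuery only")
    return kwargs


def configure_ponder(backend, connection, **options):
    """Pass the collected options on to ponder.configure."""
    # Imported lazily so the helper above can be tried without Ponder installed.
    import ponder

    ponder.configure(**build_configure_kwargs(backend, connection, **options))


# Example (requires Ponder and DuckDB to be installed):
#   import duckdb
#   configure_ponder("duckdb", duckdb.connect(), row_transfer_limit=50_000)
```

Validating the backend-specific options up front means a misplaced query_timeout or a missing bigquery_dataset fails before anything is sent to the database.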

Initializing Ponder

Initializing Ponder allows it to check your API Key and unlock Ponder’s capabilities. You can specify your API Key in one of three ways:

  • As a config file: When you first download Ponder, you can run ponder login from the command line. This opens up the Web UI, where you can grab your API Key and then provide it via the command line. The key is written to your config file, which can be found in your home directory.

  • As an argument: Your API Key can be specified via Python by passing it as a string argument to ponder.init().

  • As an environment variable: Your API Key can be specified by setting the PONDER_API_KEY environment variable.

Ponder checks for an API Key in the order listed above: an API Key found in a config file will always be used, and an API Key passed in as an argument to ponder.init() takes priority over one set in an environment variable.
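
The lookup order can be sketched in a few lines of plain Python. This is only an illustration of the precedence described above, not Ponder’s actual implementation: resolve_api_key and its parameters are hypothetical names, and the only identifier taken from this page is the PONDER_API_KEY environment variable.

```python
import os


def resolve_api_key(config_file_key=None, init_argument=None, environ=None):
    """Return (source, key) for the first API Key found (hypothetical helper).

    Candidates are checked in the order this page lists them:
    config file, then ponder.init() argument, then environment variable.
    """
    environ = os.environ if environ is None else environ
    candidates = [
        ("config file", config_file_key),
        ("argument", init_argument),
        ("environment variable", environ.get("PONDER_API_KEY")),
    ]
    for source, key in candidates:
        if key:  # skip missing or empty values
            return source, key
    return None, None


# A key in the config file wins even when the other two are set.
source, key = resolve_api_key(
    config_file_key="key-from-config",
    init_argument="key-from-argument",
    environ={"PONDER_API_KEY": "key-from-env"},
)
print(source)
```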

Warning

Importing Ponder after importing Modin, without then initializing Ponder, will make Modin unusable. Either initialize Ponder with a valid API Key to use Modin with Ponder, or run ponder.restore_modin() to use the Open Source version of Modin.
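
The two remedies in the warning can be sketched as a small helper. Treat this as a hedged illustration: make_modin_usable and its parameters are hypothetical, and passing the key as the api_key keyword is an assumption, since this page only says the key is passed as a string argument to ponder.init(). The ponder_module parameter exists only so the sketch can be tried with a stand-in object when Ponder is not installed.

```python
def make_modin_usable(use_ponder, api_key=None, ponder_module=None):
    """Apply one of the two remedies from the warning (hypothetical helper).

    Returns "ponder" if Ponder was initialized, or "open-source" if the
    Open Source version of Modin was restored.
    """
    if ponder_module is None:
        # Imported lazily; requires Ponder to be installed.
        import ponder as ponder_module

    if use_ponder:
        if api_key:
            # Assumption: the key is accepted as the `api_key` keyword.
            ponder_module.init(api_key=api_key)
        else:
            # Rely on a key from the config file or PONDER_API_KEY.
            ponder_module.init()
        return "ponder"

    # Give up on Ponder for this session and go back to plain Modin.
    ponder_module.restore_modin()
    return "open-source"
```

Calling make_modin_usable(True) right after the imports keeps Modin working with Ponder, while make_modin_usable(False) falls back to the Open Source version.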