Saving Your Results#

Let’s say that you used Ponder to run some analysis and you want to store the results back to the database. Here, we will show how you can use the to_sql command to write your dataframe to table in your database.

to_sql: Save as Database Table#

df = pd.read_sql("PONDER_CITIBIKE",con=snowflake_con)

After connecting to our PONDER_CITIBIKE table, we see that there are a lot of records with missing values. So we drop these rows to clean up our dataset.

df_cleaned = df.dropna()

Note

Our design of the I/O interface for Ponder stays true to the expected behavior of pandas. This means that:

  • Like in pandas, you are always working on a temporary copy of the data. By default, all pandas operations returns a copy of the dataframe that was operated on. We will never edit your original data. This is especially important in the data warehouse context as the tables are often the “source of truth” shared across many teams.

  • Like in pandas, we will not edit your original data (stored in the CSV file or database table) even if you specify inplace=True. For example, by doing df.dropna(inplace=True), you are only modifying the Ponder DataFrame (i.e., the temporary copy of the data represented by the variable df) rather than data source. You can think of Ponder as primarily a read-only application, unless you choses to explicitly write back the result to a table.

What this also means is that the reference to your dataframe df is only accessible throughout the session. That means that once you exit the session, the dataframe is no longer accessible.

So if we plan to use the cleaned up data for our analysis later, we can persist it in a table via to_sql.

df_cleaned.to_sql("PONDER_CITIBIKE_CLEANED",con=snowflake_con, index=False)

Now we can access new new table and continue our analysis whereever we want:

pd.read_sql("PONDER_CITIBIKE_CLEANED",con=snowflake_con)

to_csv: Save as CSV file#

You can also save your dataframe as a local CSV file via pd.to_csv.

df_cleaned.to_csv("PONDER_CITIBIKE_CLEANED")

Note saving your dataframe as a CSV requires pulling data out of your warehouse, so it can introduce significant I/O challenges. Please exercise caution if you perform to_csv on a large dataframe as it can led to memory issues if it exceeds the size that can fit in-memory.