BigQuery¶
paystone.bigquery is a lightweight wrapper around the BigQuery Python client library. It provides two main capabilities on top of the GCP client library:
- Async querying.
- Conversion of Pydantic data models to BigQuery schema.
Aside from these two things, the package essentially exposes all of the basic BigQuery functionality via PSBigQuery. Using this class, one can:
- Query tables.
- List tables.
- Check for the existence of a table.
- Create a table.
- Insert rows via the streaming API.
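To make the surface area above concrete, here is a hypothetical, in-memory stand-in for that set of operations. Every method name below is an illustrative assumption, not PSBigQuery's actual signature; a table is modeled as a plain list of row dicts.

```python
# Toy stand-in sketching the capabilities listed above: create,
# existence check, listing, row insertion, and querying. This is
# NOT the PSBigQuery API -- names and shapes are assumptions.
from typing import Any


class InMemoryBigQuery:
    """Tables are dicts mapping table name -> list of row dicts."""

    def __init__(self) -> None:
        self._tables: dict[str, list[dict[str, Any]]] = {}

    def create_table(self, name: str) -> None:
        self._tables.setdefault(name, [])

    def table_exists(self, name: str) -> bool:
        return name in self._tables

    def list_tables(self) -> list[str]:
        return sorted(self._tables)

    def insert_rows(self, name: str, rows: list[dict[str, Any]]) -> None:
        # Mirrors streaming-insert semantics only loosely: appends rows.
        self._tables[name].extend(rows)

    def query(self, name: str) -> list[dict[str, Any]]:
        # Real querying takes SQL; here we just return a table's rows.
        return list(self._tables[name])


bq = InMemoryBigQuery()
bq.create_table("payments")
bq.insert_rows("payments", [{"id": 1, "amount": 9.99}])
```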
Async Querying¶
Whereas synchronous querying via .query() works by immediately calling .result() on the created query job -- a blocking call that waits for the full results of the query and returns them as an iterator -- asynchronous querying via .query_async() polls the status of the created query job on a heartbeat, async sleeping between polls while the result is not yet available.
This is useful when running multiple queries at the same time, since the jobs are effectively parallelized. It also matters when issuing large queries from a serving application, where an otherwise blocking call could greatly reduce the throughput of the service.
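The heartbeat-polling pattern described above can be sketched as follows. The sketch is decoupled from BigQuery so it runs anywhere: it polls a job's `done()` status, async-sleeping between checks, then fetches the result once available. A real implementation would poll a google-cloud-bigquery QueryJob the same way; the helper and fake-job names here are illustrative assumptions.

```python
import asyncio


async def wait_for_job(job, heartbeat: float = 0.01):
    """Poll job status on a heartbeat instead of blocking on result()."""
    while not job.done():               # non-blocking status check
        await asyncio.sleep(heartbeat)  # yield control to other coroutines
    return job.result()                 # result is ready; no blocking wait


class FakeJob:
    """Stand-in for a query job that completes after a few polls."""

    def __init__(self, polls_until_done: int):
        self._remaining = polls_until_done

    def done(self) -> bool:
        self._remaining -= 1
        return self._remaining <= 0

    def result(self):
        return [{"n": 1}, {"n": 2}]


rows = asyncio.run(wait_for_job(FakeJob(polls_until_done=3)))
```

Because `wait_for_job` sleeps cooperatively rather than blocking, several such jobs can be awaited concurrently with `asyncio.gather`, which is what parallelizes multiple queries.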
Pydantic to BigQuery Schema¶
Any BigQuery tables that are created solely for machine learning purposes will likely have their schema maintained by machine learning engineers. As Pydantic is our primary tool for creating data models, we keep our tooling centralized by developing our BigQuery schema in Pydantic as well. Thus, to interact with the BigQuery API using Pydantic models, we use a Pydantic-to-BigQuery schema converter.
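The conversion described above can be sketched with plain type annotations, so the snippet runs without pydantic or the GCP client installed. A real converter would read a Pydantic model's fields and emit `google.cloud.bigquery.SchemaField` objects rather than plain dicts; the type mapping below is an illustrative subset, and the `Payment` model is a hypothetical example.

```python
from datetime import datetime
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Illustrative subset of the Python-type -> BigQuery-type mapping.
_BQ_TYPES = {str: "STRING", int: "INTEGER", float: "FLOAT",
             bool: "BOOLEAN", datetime: "TIMESTAMP"}


def model_to_schema(model: type) -> list[dict]:
    """Convert a class's annotated fields to BigQuery schema dicts."""
    schema = []
    for name, hint in get_type_hints(model).items():
        mode = "REQUIRED"
        if get_origin(hint) is Union and type(None) in get_args(hint):
            # Optional[X] becomes a NULLABLE column of type X.
            hint = next(a for a in get_args(hint) if a is not type(None))
            mode = "NULLABLE"
        schema.append({"name": name, "type": _BQ_TYPES[hint], "mode": mode})
    return schema


class Payment:  # stands in for a Pydantic BaseModel
    id: int
    amount: float
    memo: Optional[str]


schema = model_to_schema(Payment)
```

Here `schema` contains one dict per field, e.g. a REQUIRED INTEGER column for `id` and a NULLABLE STRING column for `memo`, which is the shape BigQuery's JSON schema format expects.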