duckdb / pg_duckdb
- Sunday, 18 August 2024, 00:00:02
DuckDB-powered Postgres for high performance apps & analytics.
pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications.
pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck.
Pre-built binaries and additional installation options are coming soon.
To build pg_duckdb, you need:
To build and install, run:
```shell
make install
```

Next, load the pg_duckdb extension:
```sql
CREATE EXTENSION pg_duckdb;
```

IMPORTANT: Once loaded, you can enable DuckDB execution by running `SET duckdb.execution TO true`. This is opt-in to avoid breaking existing queries. To avoid setting it in every session, you can configure it for a specific user with `ALTER USER my_analytics_user SET duckdb.execution TO true`.
SELECT queries executed by the DuckDB engine can directly read Postgres tables.
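As a minimal sketch, assuming a hypothetical `orders` table that already exists in Postgres:

```sql
-- Opt in to DuckDB execution for this session.
SET duckdb.execution TO true;

-- This aggregate runs on DuckDB's vectorized engine,
-- reading the regular Postgres table directly.
SELECT status, count(*)
FROM orders
GROUP BY status;
```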
Read parquet and CSV files from object storage (AWS S3, Cloudflare R2, or Google GCS).
```sql
SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int);
SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int);
```

Enable the DuckDB Iceberg extension using `SELECT duckdb.enable_extension('iceberg')` and read Iceberg tables with `iceberg_scan`.
Write a query — or an entire table — to parquet in object storage.
```sql
COPY (SELECT foo, bar FROM baz) TO 's3://...';
COPY table TO 's3://...';
```

Read and write Parquet format in a single query:

```sql
COPY (
  SELECT count(*) AS count, name
  FROM read_parquet('s3://bucket/file.parquet') AS (name text)
  GROUP BY name
  ORDER BY count DESC
) TO 's3://bucket/results.parquet';
```

Query and JOIN data in object storage with Postgres tables, views, and materialized views.
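For example, such a join might look like this (the `items` table and all column names are illustrative, not part of pg_duckdb):

```sql
-- Join purchase records in object storage against a Postgres table.
SELECT p.item_id, i.name, SUM(p.price) AS revenue
FROM read_parquet('s3://bucket/purchases.parquet')
  AS p (item_id int, price float)
JOIN items i ON i.id = p.item_id
GROUP BY p.item_id, i.name;
```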
Create indexes on Postgres tables to accelerate your DuckDB queries.
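A sketch of the idea, using the hypothetical `purchases` table (index and column names are illustrative):

```sql
-- An ordinary Postgres index on a filter column can speed up
-- queries that DuckDB executes against this table.
CREATE INDEX ON purchases (purchased_at);

SELECT count(*)
FROM purchases
WHERE purchased_at > now() - interval '7 days';
```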
Install DuckDB extensions using SELECT duckdb.install_extension('extension_name');
Toggle DuckDB execution on/off with a setting:
```sql
SET duckdb.execution = true|false
```

The best way to get started is to connect Postgres to a new or existing object storage bucket (AWS S3, Cloudflare R2, or Google GCS) with pg_duckdb. You can query data in Parquet, CSV, and Iceberg format using `read_parquet`, `read_csv`, and `iceberg_scan` respectively.
Add a credential to enable DuckDB's httpfs support.
```sql
INSERT INTO duckdb.secrets
(cloud_type, cloud_id, cloud_secret, cloud_region)
VALUES ('S3', 'access_key_id', 'secret_access_key', 'us-east-1');
```

Copy data directly to your bucket - no ETL pipeline!
```sql
COPY (SELECT user_id, item_id, price, purchased_at FROM purchases)
TO 's3://your-bucket/purchases.parquet';
```

Perform analytics on your data.
```sql
SELECT SUM(price) AS total, item_id
FROM read_parquet('s3://your-bucket/purchases.parquet')
  AS (price float, item_id int)
GROUP BY item_id
ORDER BY total DESC
LIMIT 100;
```

Please see the project roadmap for upcoming planned tasks and features.
pg_duckdb integration with MotherDuck will enable hybrid execution with Differential Storage.
pg_duckdb was developed in collaboration with our partners, Hydra and MotherDuck. We look forward to their continued contributions and leadership.
Hydra is a Y Combinator-backed database company, focused on DuckDB-Powered Postgres for app developers.
MotherDuck is the cloud-based data warehouse that extends the power of DuckDB.
We welcome all contributions, big and small: