While working on a project, I recently needed a lightweight, serverless vector database solution. The vectors needed to be stored alongside additional metadata, and the database itself would only be updated once a day, making it effectively read-only for its consumers. Additionally, the dataset was relatively small.
I initially considered using SQLite with Alex Garcia’s sqlite-vec extension. However, I realized that my application was primarily generating ranked product feeds, where an embedded OLAP database would likely be a better fit. So, I decided to try DuckDB’s VSS extension.
The application ingests data daily as a Parquet file and loads it into a DuckDB database using the following script. This script also installs the VSS extension and creates an HNSW index for the embedding column (a 256-dimensional vector) to enable fast approximate nearest neighbor search using cosine similarity.
# /// script
# dependencies = [
#     "polars==1.22.0",
#     "pyarrow==19.0.0",
#     "duckdb==1.2.0",
# ]
# ///
import sys

import duckdb
import polars as pl


def main(parquet_file: str, duckdb_file: str) -> None:
    df = pl.read_parquet(parquet_file)

    con = duckdb.connect(duckdb_file)
    con.execute("DROP TABLE IF EXISTS articles;")
    # Register the Polars frame as an Arrow table so DuckDB can query it directly.
    con.register("df_temp", df.to_arrow())
    con.execute("""
        INSTALL vss;
        LOAD vss;
        -- HNSW indexes in disk-backed databases are still experimental
        -- and must be enabled explicitly.
        SET hnsw_enable_experimental_persistence = true;

        CREATE TABLE articles AS
        SELECT
            * EXCLUDE (embedding),
            -- The index requires a fixed-size ARRAY column, not a variable-length LIST.
            CAST(embedding AS FLOAT[256]) AS embedding
        FROM df_temp;

        CREATE INDEX embedding_hnsw_index
        ON articles
        USING HNSW (embedding)
        WITH (metric = 'cosine');
    """)
    con.unregister("df_temp")
    con.close()
    print(f"Created single DuckDB table 'articles' in {duckdb_file}.")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"Usage: {sys.argv[0]} <input_parquet_file> <output_duckdb_file>")
        sys.exit(1)
    main(sys.argv[1], sys.argv[2])
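Thanks to the inline dependency metadata (PEP 723) at the top, the script can be run directly with uv, e.g. uv run ingest.py articles.parquet inventory.duckdb (file names here are hypothetical). A quick way to sanity-check that the planner actually uses the HNSW index, rather than sorting the whole table, is to EXPLAIN a nearest-neighbor query; a minimal sketch, assuming the database was written to inventory.duckdb:

import duckdb

con = duckdb.connect("inventory.duckdb")  # hypothetical output file name
con.execute("LOAD vss;")

# Inline a dummy 256-dimensional vector as a SQL literal so the whole
# statement can be EXPLAINed as-is, without parameter binding.
vec_sql = "[" + ", ".join(["0.1"] * 256) + "]"
plan = con.execute(f"""
    EXPLAIN
    SELECT article_id
    FROM articles
    ORDER BY array_cosine_distance(embedding, {vec_sql}::FLOAT[256])
    LIMIT 10;
""").fetchone()
print(plan[1])  # the physical plan should show an HNSW index scan
con.close()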
To expose this database via a REST API, I created a serverless service using FastAPI, Docker, and Cloud Run. The database is copied into the Docker image—this is feasible because it’s read-only and (still) small. When the data is updated, I simply redeploy the service with the latest dataset.
FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Bake the read-only database into the image under the name the app expects.
COPY inventory.duckdb data.duckdb

# Install the VSS extension at build time so the container
# doesn't have to download it on startup.
RUN python -c "\
import duckdb; \
conn = duckdb.connect('data.duckdb'); \
conn.execute('INSTALL vss;'); \
conn.close()"

COPY main.py .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
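The requirements.txt is minimal; something along these lines, where the exact pins are up to you, but the duckdb version should match the one used for ingestion:

duckdb==1.2.0
fastapi
uvicorn

Redeploying after a data refresh can then be a single command, e.g. gcloud run deploy feed-service --source . (the service name is hypothetical), which rebuilds the image with the fresh database file baked in.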
The endpoint code for querying the vector database is as follows:
from typing import Any, Dict, List

import duckdb
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

DATABASE_PATH = "data.duckdb"


class FeedRequest(BaseModel):
    offset: int = 0
    limit: int = 20
    embedding: List[float]


@app.post("/feed")
def get_feed(request: FeedRequest) -> List[Dict[str, Any]]:
    # array_cosine_distance matches the metric the HNSW index was built with
    # ('cosine'), so the index is eligible to accelerate this query.
    query = """
        SELECT
            article_id,
            main_name,
            retail_price
        FROM articles
        ORDER BY array_cosine_distance(embedding, $emb::FLOAT[256])
        LIMIT $limit OFFSET $offset
    """
    with duckdb.connect(DATABASE_PATH) as conn:
        # LOAD runs on its own so that the parameterized query remains a
        # single prepared statement.
        conn.execute("LOAD vss;")
        cursor = conn.execute(
            query,
            {
                "emb": request.embedding,
                "limit": request.limit,
                "offset": request.offset,
            },
        )
        columns = [desc[0] for desc in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
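Calling the endpoint is then straightforward; a minimal sketch, assuming the service runs locally and using a dummy query vector (in practice the vector would come from the same embedding model that produced the stored ones):

import requests

resp = requests.post(
    "http://localhost:8000/feed",
    json={
        "embedding": [0.1] * 256,  # dummy 256-dimensional query vector
        "offset": 0,
        "limit": 5,
    },
)
resp.raise_for_status()
for article in resp.json():
    print(article["article_id"], article["main_name"], article["retail_price"])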
I can’t yet speak to its real-world performance, but initial results look promising. Scaling to larger datasets might introduce new challenges, yet the simplicity, cost-efficiency, and ease of deployment make this approach an excellent fit for smaller projects. If performance bottlenecks arise, the next steps could be optimizations such as external storage, query tuning, or a dedicated vector search library or database such as Qdrant.