# config.yaml
Environment placeholders like ${VAR_NAME} resolve at runtime.
## Top-level Keys
- `sources`: connector list for ingestion jobs
- `embedding`: embedding provider + model config
- `inference`: LLM provider + model config for rephrase/chat tasks
- `vector_store`: chunking and pgvector/HNSW settings
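A minimal skeleton showing how the four top-level keys fit together might look like the following. All provider names, models, and values here are illustrative placeholders, not documented defaults:

```yaml
sources:
  - type: web                  # hypothetical connector type
    config:
      url: https://example.com/docs

embedding:
  provider: openai             # illustrative backend name
  model_config: text-embedding-3-small
  embedding_dim: 1536

inference:
  provider: openai             # illustrative backend name
  model_config: gpt-4o-mini

vector_store:
  table_name: embeddings
  chunk_size: 512
```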
## sources
Each item in the list defines one connector. Its `config` fields differ by connector type; see the connector pages for the exact keys.
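As a sketch, a single list item could look like this. The `web` connector type and the keys inside `config` are hypothetical examples, not keys guaranteed by any specific connector:

```yaml
sources:
  - type: web                  # hypothetical connector type
    config:                    # connector-specific keys; vary by type
      url: https://example.com/docs
```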
### Common Connector Fields
The following fields are supported by all connector types:

| Field | Required | Default | Description |
|---|---|---|---|
| `schedules` | no | `3600` | Ingestion interval in seconds |
| `request_delay` | no | `0` | Seconds to wait between outbound API requests; increase (e.g. `0.1`) to avoid rate limiting |
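The common fields sit at the same level as the connector-specific `config` block. A sketch with a hypothetical `web` connector:

```yaml
sources:
  - type: web                  # hypothetical connector type
    schedules: 7200            # re-ingest every 2 hours instead of the 3600 default
    request_delay: 0.1         # wait 100 ms between API requests to avoid rate limits
    config:
      url: https://example.com/docs
```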
## embedding
- `provider`: embedding backend
- `model_config`: model id/name used by the provider
- `embedding_dim`: vector dimension returned by the model
WARNING: once the embedding dimension is set, it cannot be changed without rebuilding the index.
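An `embedding` block might look like this; the provider and model names are illustrative, and the dimension simply has to match what the chosen model returns:

```yaml
embedding:
  provider: openai                      # illustrative backend name
  model_config: text-embedding-3-small  # illustrative model id
  embedding_dim: 1536                   # must match the model's output dimension;
                                        # changing it later requires a full index rebuild
```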
## inference
- `provider`: inference backend
- `model_config`: chat/rephrase model id
NOTE: the inference backend is not used for vector search, only for the rephrase endpoint.
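A minimal `inference` block, with illustrative placeholder names for the backend and model:

```yaml
inference:
  provider: openai            # illustrative backend name
  model_config: gpt-4o-mini   # illustrative model id; used for rephrase/chat only,
                              # never for vector search
```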
## vector_store
- `table_name`: embeddings table name
- `hybrid_search`: vector + keyword retrieval toggle
- `chunk_size`: chunk size for indexing
- `chunk_overlap`: overlap between adjacent chunks
- `hnsw.hnsw_m`: graph neighbors per node
- `hnsw.hnsw_ef_construction`: index build quality/speed tradeoff
- `hnsw.hnsw_ef_search`: search recall/speed tradeoff
- `hnsw.hnsw_dist_method`: vector distance operator
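A sketch of a full `vector_store` block. All values are illustrative; `vector_cosine_ops` is pgvector's cosine-distance operator class, assuming that is the form `hnsw_dist_method` expects:

```yaml
vector_store:
  table_name: embeddings
  hybrid_search: true            # combine vector and keyword retrieval
  chunk_size: 512                # illustrative chunk size
  chunk_overlap: 64              # overlap between adjacent chunks
  hnsw:
    hnsw_m: 16                   # neighbors per graph node
    hnsw_ef_construction: 200    # higher = better index quality, slower build
    hnsw_ef_search: 100          # higher = better recall, slower queries
    hnsw_dist_method: vector_cosine_ops  # pgvector cosine operator class
```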