What It Does
- reads files from a directory path mounted inside the container
- supports recursive traversal of subdirectories
- filters files by extension, hidden status, or empty content
- optionally limits the number of files ingested
- runs ingestion on a configurable schedule
Prerequisites
The directory must be accessible inside the celery_worker container. Mount it as a Docker volume in compose.yaml, then use the container-side path (e.g. /data/docs) as the path value in config.yaml.
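A minimal sketch of such a mount, using the Compose short volume syntax. The host path `./docs` is a placeholder, and the read-only flag (`:ro`) is an assumption rather than a documented requirement; it is included only because the connector reads files and does not need to write to them.

```yaml
# compose.yaml (sketch; ./docs is a placeholder host path)
services:
  celery_worker:
    volumes:
      # host directory : container directory : read-only (assumed)
      - ./docs:/data/docs:ro
```

The right-hand side of the mount (`/data/docs`) is the value that goes into `path`.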
config.yaml Example
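The sketch below shows one possible shape for a single directory source. The field names come from the Configuration Reference table. The nesting under `sources` and the entry name `directory1` are assumptions inferred from the Multiple Directory Sources section, so check them against your existing config.yaml before copying.

```yaml
# config.yaml (sketch; the nesting under `sources` is assumed, not confirmed)
sources:
  directory1:                  # entry name inferred from "directory2, directory3"
    path: /data/docs           # container-side path from the volume mount
    recursive: true            # also read subdirectories (default)
    required_exts: txt,md,pdf  # comma-separated; omit to include all files
    exclude_hidden: true       # skip names starting with "." (default)
    schedules: 3600            # ingestion interval in seconds (default)
```

Only `path` is required; the other fields are shown with their defaults or example values for illustration.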
Configuration Reference
| Field | Required | Default | Description |
|---|---|---|---|
| path | yes | — | Absolute path to the directory inside the container |
| recursive | no | true | Whether to recurse into subdirectories |
| required_exts | no | — | Comma-separated file extensions to include (e.g. txt,md,pdf). All files are included when omitted. |
| exclude_hidden | no | true | Skip hidden files and directories (those starting with .) |
| exclude_empty | no | false | Skip empty files |
| num_files_limit | no | — | Maximum number of files to ingest. Unlimited when omitted. |
| schedules | no | 3600 | Ingestion interval in seconds |
| request_delay | no | 0 | Delay in seconds between processing each file. Useful for throttling I/O on large directories. |
Multiple Directory Sources
Add more sources entries (directory2, directory3, etc.), each with its own volume mount and its own path.
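As a rough illustration, two sources might look like the following. The `sources` layout is an assumption, and `/data/reports` is a hypothetical second mount point that would need its own volume entry on the celery_worker service.

```yaml
# config.yaml (sketch; layout assumed, second path is hypothetical)
sources:
  directory1:
    path: /data/docs
  directory2:
    path: /data/reports   # must be mounted separately in compose.yaml
    recursive: false      # top-level files only
    schedules: 86400      # ingest once a day instead of hourly
```

Each entry carries its own options, so filters and schedules can differ between directories.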