What It Does
- reads pages from a MediaWiki instance by namespace
- converts content to Markdown for indexing
- optionally filters out redirect pages
- runs ingestion on configurable schedules
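As a rough illustration of reading pages by namespace while skipping redirects, the sketch below builds a MediaWiki Action API request that uses `list=allpages`. Whether the connector itself uses this endpoint is an assumption, and `build_allpages_url` is a hypothetical helper name; the `apnamespace`, `apfilterredir`, and `aplimit` parameters are the standard ones for that API module.

```python
from urllib.parse import urlencode


def build_allpages_url(host, namespace, scheme="https", path="/w/",
                       filter_redirects=True, limit=50):
    """Build an Action API URL that lists pages in a single namespace.

    Hypothetical helper: the connector's real request logic may differ.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": limit,
        "format": "json",
    }
    if filter_redirects:
        # Ask the API to leave redirect pages out of the listing.
        params["apfilterredir"] = "nonredirects"
    return f"{scheme}://{host}{path}api.php?{urlencode(params)}"


if __name__ == "__main__":
    print(build_allpages_url("en.wikipedia.org", 0))
```

Namespace `0` is the main article namespace on a default MediaWiki installation.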
Authentication
Public wikis require no authentication. For private wikis, provide a username and password via environment variables; see the config example below.

Use bot passwords rather than your main wiki account credentials. Bot passwords are scoped, revocable tokens that limit access to only the API actions the connector needs.
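For orientation, a bot-password login against the MediaWiki Action API is a two-step exchange: fetch a login token, then post it with the credentials to `action=login`. The sketch below only builds the two request payloads. The helper names are hypothetical, and it is an assumption that the connector authenticates this way. With a bot password, the login name normally takes the form `User@BotName`.

```python
def login_token_params():
    """Query parameters for fetching a login token from api.php."""
    return {
        "action": "query",
        "meta": "tokens",
        "type": "login",
        "format": "json",
    }


def login_payload(username, password, login_token):
    """POST body for action=login.

    Hypothetical helper. For a bot password, `username` is usually
    'User@BotName' and `password` is the generated bot password.
    """
    return {
        "action": "login",
        "lgname": username,
        "lgpassword": password,
        "lgtoken": login_token,
        "format": "json",
    }


if __name__ == "__main__":
    # Placeholder values only; never hard-code real credentials.
    print(login_payload("Alice@indexer", "example-bot-password", "example-token"))
```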
Environment Variables
Set these in .env.rag:
Required:
- MEDIAWIKI1_HOST: hostname of the MediaWiki instance (example: en.wikipedia.org)
Optional:
- MEDIAWIKI1_SCHEDULES: ingestion interval in seconds (default: 3600)
- MEDIAWIKI1_USERNAME: username for private wiki authentication
- MEDIAWIKI1_PASSWORD: password for private wiki authentication
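The variables above could be read along these lines. This is a minimal sketch, not the connector's actual loader: `load_source_env` is a hypothetical function, and treating a missing host as an error while defaulting the schedule to 3600 seconds simply mirrors the descriptions in this section.

```python
import os


def load_source_env(prefix="MEDIAWIKI1", env=None):
    """Collect one source's settings from environment variables.

    Hypothetical helper; `env` can be any mapping, which makes testing easy.
    """
    env = os.environ if env is None else env
    host = env.get(f"{prefix}_HOST")
    if not host:
        raise ValueError(f"{prefix}_HOST is required")
    return {
        "host": host,
        # Fall back to the documented default of 3600 seconds.
        "schedules": int(env.get(f"{prefix}_SCHEDULES", "3600")),
        # Credentials stay None for public wikis.
        "username": env.get(f"{prefix}_USERNAME"),
        "password": env.get(f"{prefix}_PASSWORD"),
    }


if __name__ == "__main__":
    print(load_source_env(env={"MEDIAWIKI1_HOST": "en.wikipedia.org"}))
```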
config.yaml Example
Configuration Reference
| Field | Required | Default | Description |
|---|---|---|---|
| host | yes | none | Hostname of the MediaWiki instance (e.g. en.wikipedia.org) |
| path | no | /w/ | Path to the MediaWiki API root |
| scheme | no | https | URL scheme (http or https) |
| page_limit | no | unlimited | Maximum pages to fetch per namespace |
| namespaces | no | content namespaces | Comma-separated namespace IDs to ingest |
| filter_redirects | no | true | Exclude redirect pages from ingestion |
| username | no | none | Username for private wiki authentication |
| password | no | none | Password for private wiki authentication |
| schedules | no | 3600 | Ingestion interval in seconds |
| request_delay | no | 0 | Delay in seconds between API requests (useful for rate-limited wikis) |
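To make the defaults in the table concrete, here is a sketch that models one source as a Python dataclass. The class name `MediaWikiSource` and both methods are hypothetical. Using `None` for "unlimited" and for "content namespaces", and assuming the API entry point is `api.php` under `path`, are choices made for this illustration rather than documented connector behavior.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MediaWikiSource:
    """Illustrative model of one source, using the table's defaults."""
    host: str
    path: str = "/w/"
    scheme: str = "https"
    page_limit: Optional[int] = None   # None stands in for "unlimited"
    namespaces: Optional[str] = None   # None stands in for "content namespaces"
    filter_redirects: bool = True
    username: Optional[str] = None
    password: Optional[str] = None
    schedules: int = 3600              # seconds between ingestion runs
    request_delay: float = 0.0         # seconds to wait between API requests

    def namespace_ids(self) -> Optional[Tuple[int, ...]]:
        """Parse the comma-separated namespace IDs into integers."""
        if self.namespaces is None:
            return None
        return tuple(int(part) for part in self.namespaces.split(",") if part.strip())

    def api_url(self) -> str:
        """Assumed API entry point: api.php under the configured path."""
        return f"{self.scheme}://{self.host}{self.path.rstrip('/')}/api.php"


if __name__ == "__main__":
    source = MediaWikiSource(host="en.wikipedia.org", namespaces="0, 4")
    print(source.api_url(), source.namespace_ids())
```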
Multiple MediaWiki Sources
Add more sources entries (wiki2, wiki3, etc.), each with its own set of environment variables.
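One way to picture per-source variables is to scan the environment for numbered prefixes. The sketch below assumes that additional sources follow the same naming pattern as the first one (`MEDIAWIKI2_HOST`, `MEDIAWIKI3_HOST`, and so on), which this document does not state explicitly; `discover_sources` is a hypothetical helper.

```python
import re


def discover_sources(env):
    """Map source numbers to hosts, based on MEDIAWIKI<n>_HOST variables.

    Assumes the numbered-prefix naming scheme; adjust if yours differs.
    """
    pattern = re.compile(r"^MEDIAWIKI(\d+)_HOST$")
    found = {}
    for key, value in env.items():
        match = pattern.match(key)
        if match and value:
            found[int(match.group(1))] = value
    # Return the sources ordered by their number.
    return dict(sorted(found.items()))


if __name__ == "__main__":
    example_env = {
        "MEDIAWIKI1_HOST": "en.wikipedia.org",
        "MEDIAWIKI2_HOST": "wiki.example.org",
    }
    print(discover_sources(example_env))
```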