Use the MediaWiki Connector to ingest pages from MediaWiki sites into the mAItion knowledge base.

What It Does

  • reads pages from a MediaWiki instance by namespace
  • converts content to Markdown for indexing
  • optionally filters out redirect pages
  • runs ingestion on configurable schedules
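
To make namespace-based reading and redirect filtering concrete, here is a minimal sketch of the query string one could send to the standard MediaWiki Action API (`list=allpages`). The function name `build_allpages_query` is hypothetical, and whether the connector uses this exact API module internally is an assumption; only the API parameters themselves are standard MediaWiki.

```python
from urllib.parse import urlencode


def build_allpages_query(namespace: int, filter_redirects: bool = True, limit: int = 50) -> str:
    """Build a query string for the MediaWiki Action API `allpages` list.

    Hypothetical helper: illustrates the idea, not the connector's own code.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,  # one namespace ID per request
        "aplimit": limit,          # page titles returned per request
        "format": "json",
    }
    if filter_redirects:
        # Ask the API to leave redirect pages out of the listing.
        params["apfilterredir"] = "nonredirects"
    return urlencode(params)


if __name__ == "__main__":
    print(build_allpages_query(0))
```

Listing one namespace per request mirrors the per-namespace behaviour described above; a real client would also follow the API's continuation tokens to page through results.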

Authentication

Public wikis need no authentication. For private wikis, supply a username and password through environment variables, as in the config.yaml example below.
Use a bot password rather than your main wiki account credentials. Bot passwords are scoped, revocable credentials that limit access to the API actions the connector needs.
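
The rule above (no credentials for public wikis, both values for private ones) can be sketched as a small check. The helper names `resolve_credentials` and `looks_like_bot_password` are hypothetical, and treating a half-configured pair as an error is an assumption about sensible behaviour, not documented connector behaviour. MediaWiki bot-password logins use a name of the form `Account@BotName`, which the second helper tests for.

```python
from typing import Mapping, Optional, Tuple


def resolve_credentials(env: Mapping[str, str], prefix: str = "MEDIAWIKI1") -> Optional[Tuple[str, str]]:
    """Return (username, password), or None for anonymous access.

    Hypothetical helper; raising on a half-set pair is an assumption.
    """
    username = env.get(f"{prefix}_USERNAME", "").strip()
    password = env.get(f"{prefix}_PASSWORD", "").strip()
    if not username and not password:
        return None  # public wiki: no authentication needed
    if not username or not password:
        raise ValueError(f"set both {prefix}_USERNAME and {prefix}_PASSWORD, or neither")
    return username, password


def looks_like_bot_password(username: str) -> bool:
    """Bot-password login names have the form 'Account@BotName'."""
    account, sep, bot_name = username.partition("@")
    return bool(account and sep and bot_name)


if __name__ == "__main__":
    sample = {"MEDIAWIKI1_USERNAME": "Alice@ingest", "MEDIAWIKI1_PASSWORD": "example-secret"}
    creds = resolve_credentials(sample)
    print(creds is not None and looks_like_bot_password(creds[0]))
```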

Environment Variables

Set these in .env.rag:
Required:
  • MEDIAWIKI1_HOST: hostname of the MediaWiki instance (example: en.wikipedia.org)
Optional:
  • MEDIAWIKI1_SCHEDULES: ingestion interval in seconds (default: 3600)
  • MEDIAWIKI1_USERNAME: username for private wiki authentication
  • MEDIAWIKI1_PASSWORD: password for private wiki authentication
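
As an illustration of the required/optional split, the sketch below reads the host and schedule variables from a mapping such as `os.environ` and applies the documented default of 3600 seconds. The function name `load_source_settings` is hypothetical, and rejecting non-positive intervals is an assumption rather than documented behaviour.

```python
from typing import Dict, Mapping, Union


def load_source_settings(env: Mapping[str, str], prefix: str = "MEDIAWIKI1") -> Dict[str, Union[str, int]]:
    """Read the host (required) and schedule (optional) for one source.

    Hypothetical helper, not part of mAItion.
    """
    host = env.get(f"{prefix}_HOST", "").strip()
    if not host:
        raise ValueError(f"{prefix}_HOST is required")

    raw_schedule = env.get(f"{prefix}_SCHEDULES", "").strip()
    schedules = int(raw_schedule) if raw_schedule else 3600  # documented default
    if schedules <= 0:
        # Assumption: an interval must be a positive number of seconds.
        raise ValueError(f"{prefix}_SCHEDULES must be a positive number of seconds")

    return {"host": host, "schedules": schedules}


if __name__ == "__main__":
    print(load_source_settings({"MEDIAWIKI1_HOST": "en.wikipedia.org"}))
```

In practice one would pass `os.environ` after the .env.rag file has been loaded into the process environment.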

config.yaml Example

sources:
  - type: "mediawiki"
    name: "wiki1"
    config:
      host: "${MEDIAWIKI1_HOST}"
      path: "/w/"          # optional, default /w/
      scheme: "https"      # optional, default https
      page_limit: 500      # optional, max pages per namespace (default: unlimited)
      namespaces: "0,1"    # optional, comma-separated namespace IDs (default: content namespaces)
      filter_redirects: true  # optional, exclude redirect pages (default: true)
      username: "${MEDIAWIKI1_USERNAME}"  # optional, for private wikis
      password: "${MEDIAWIKI1_PASSWORD}"  # optional, for private wikis
      schedules: "${MEDIAWIKI1_SCHEDULES}"  # optional, ingestion interval in seconds (default: 3600)
      request_delay: 0.1     # optional, delay in seconds between requests (default: 0)
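
The `${MEDIAWIKI1_HOST}`-style placeholders in the example are filled from the environment. How mAItion resolves them is not described here, so the following is only a sketch of the general technique; the function name `expand_placeholders` is hypothetical, and replacing an unset variable with an empty string is an assumption.

```python
import re
from typing import Mapping

# Matches ${NAME}, where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value: str, env: Mapping[str, str]) -> str:
    """Replace each ${NAME} in `value` with env[NAME].

    Assumption: an unset variable expands to an empty string.
    """
    return _PLACEHOLDER.sub(lambda match: env.get(match.group(1), ""), value)


if __name__ == "__main__":
    sample_env = {"MEDIAWIKI1_HOST": "en.wikipedia.org"}
    print(expand_placeholders("${MEDIAWIKI1_HOST}", sample_env))
```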

Configuration Reference

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| host | yes | | Hostname of the MediaWiki instance (e.g. en.wikipedia.org) |
| path | no | /w/ | Path to the MediaWiki API root |
| scheme | no | https | URL scheme (http or https) |
| page_limit | no | unlimited | Maximum pages to fetch per namespace |
| namespaces | no | content namespaces | Comma-separated namespace IDs to ingest |
| filter_redirects | no | true | Exclude redirect pages from ingestion |
| username | no | | Username for private wiki authentication |
| password | no | | Password for private wiki authentication |
| schedules | no | 3600 | Ingestion interval in seconds |
| request_delay | no | 0 | Delay in seconds between API requests (useful for rate-limited wikis) |
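
The defaults above can be summarised as a small data class, which also shows how scheme, host, and path might combine into an API endpoint. This is a sketch only: the class `MediaWikiSourceConfig` and its methods are hypothetical, and appending `api.php` to the path follows the usual MediaWiki layout rather than anything the connector documents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaWikiSourceConfig:
    """Hypothetical mirror of the fields in the reference table."""

    host: str
    path: str = "/w/"
    scheme: str = "https"
    page_limit: Optional[int] = None   # None stands for "unlimited"
    namespaces: Optional[str] = None   # None stands for "content namespaces"
    filter_redirects: bool = True
    schedules: int = 3600
    request_delay: float = 0.0

    def api_url(self) -> str:
        # Normalise the path so it starts and ends with exactly one slash.
        trimmed = self.path.strip("/")
        path = f"/{trimmed}/" if trimmed else "/"
        # Assumption: api.php sits directly under the configured path.
        return f"{self.scheme}://{self.host}{path}api.php"

    def namespace_ids(self) -> Optional[List[int]]:
        # "0,1" -> [0, 1]; None means the connector's default selection applies.
        if not self.namespaces:
            return None
        return [int(part) for part in self.namespaces.split(",") if part.strip()]


if __name__ == "__main__":
    cfg = MediaWikiSourceConfig(host="en.wikipedia.org", namespaces="0,1")
    print(cfg.api_url(), cfg.namespace_ids())
```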

Multiple MediaWiki Sources

Add more entries under sources (wiki2, wiki3, etc.), each with its own set of environment variables (MEDIAWIKI2_*, MEDIAWIKI3_*, and so on):
sources:
  - type: "mediawiki"
    name: "wiki1"
    config:
      host: "${MEDIAWIKI1_HOST}"
      schedules: "${MEDIAWIKI1_SCHEDULES}"

  - type: "mediawiki"
    name: "wiki2"
    config:
      host: "${MEDIAWIKI2_HOST}"
      username: "${MEDIAWIKI2_USERNAME}"  # optional, for private wikis
      password: "${MEDIAWIKI2_PASSWORD}"  # optional, for private wikis
      schedules: "${MEDIAWIKI2_SCHEDULES}"
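
With several numbered sources it can help to check which ones actually have a host set before starting ingestion. The sketch below scans a mapping such as `os.environ` for `MEDIAWIKI<N>_HOST` variables; the function name `configured_sources` is hypothetical, and mapping index N to the source name `wikiN` simply follows the naming used in the example above.

```python
import re
from typing import List, Mapping

# Matches MEDIAWIKI1_HOST, MEDIAWIKI2_HOST, ... and captures the index.
_HOST_VAR = re.compile(r"^MEDIAWIKI(\d+)_HOST$")


def configured_sources(env: Mapping[str, str]) -> List[str]:
    """Return source names (wiki1, wiki2, ...) whose host variable is non-empty."""
    indices = []
    for name, value in env.items():
        match = _HOST_VAR.match(name)
        if match and value.strip():
            indices.append(int(match.group(1)))
    return [f"wiki{index}" for index in sorted(indices)]


if __name__ == "__main__":
    sample_env = {
        "MEDIAWIKI2_HOST": "wiki.internal.example",
        "MEDIAWIKI1_HOST": "en.wikipedia.org",
    }
    print(configured_sources(sample_env))
```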