A tool to extract plain (unformatted) multilingual text, redirects, links, and categories from Wikipedia backups (dumps). Designed to prepare clean training data for AI and machine-learning software.
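A minimal sketch of the kind of extraction described above. It is not this tool's implementation. It assumes the standard MediaWiki XML export format (`<page>`, `<title>`, `<redirect title="..."/>`, `<revision><text>`). It streams pages with `xml.etree.ElementTree.iterparse` and pulls out links and categories with simple regular expressions. The names `iter_pages` and `parse_wikitext` are hypothetical, and the regexes only handle simple, non-nested markup.

```python
import io
import re
import xml.etree.ElementTree as ET

# [[Target]] or [[Target|label]]; assumption: links are not nested.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# {{...}} templates without nested braces; real dumps need a proper parser.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")
# '' (italic) and ''' (bold) quote runs.
QUOTES_RE = re.compile(r"'{2,}")


def local(tag):
    """Drop the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def parse_wikitext(text):
    """Hypothetical helper: return (plain_text, links, categories)."""
    links, categories = [], []

    def replace_link(match):
        target = match.group(1).strip()
        label = match.group(2)
        if target.lower().startswith("category:"):
            categories.append(target.split(":", 1)[1].strip())
            return ""  # category tags carry no visible text
        links.append(target)
        return label if label is not None else target

    plain = TEMPLATE_RE.sub("", text)
    plain = LINK_RE.sub(replace_link, plain)
    plain = QUOTES_RE.sub("", plain)
    plain = re.sub(r"[ \t]+", " ", plain)
    plain = "\n".join(line.strip() for line in plain.splitlines() if line.strip())
    return plain, links, categories


def iter_pages(stream):
    """Hypothetical helper: stream <page> records without loading the whole dump."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, redirect, text = None, None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "redirect":
                redirect = child.get("title")
            elif name == "text":
                text = child.text or ""
        plain, links, categories = parse_wikitext(text)
        yield {"title": title, "redirect": redirect, "text": plain,
               "links": links, "categories": categories}
        elem.clear()  # release the finished page's subtree


# Tiny hand-written sample in the MediaWiki export layout (not real dump data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Encyclopedia</title>
    <ns>0</ns>
    <revision><text>{{Short description|Reference work}}
An '''encyclopedia''' is a [[reference work]] with articles on many [[Knowledge|subjects]].
[[Category:Encyclopedias]]</text></revision>
  </page>
  <page>
    <title>Encyclopaedia</title>
    <ns>0</ns>
    <redirect title="Encyclopedia" />
    <revision><text>#REDIRECT [[Encyclopedia]]</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page in iter_pages(io.BytesIO(SAMPLE)):
        print(page["title"], page["redirect"], page["categories"])
```

Streaming with `iterparse` and clearing each finished `<page>` keeps memory roughly flat, which matters because full dumps are many gigabytes. A production tool would also need to handle nested templates, tables, references, and per-language category prefixes, which these regexes ignore.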
machine-learning training-data ai-learning wiki-parser ai-training machine-learning-tool wiki-to-txt wiki-to-text wiki-to-plaintext wikidump-to-txt wikidump-to-plaintext wikidump-parser ai-learning-tool tool-for-ai wikidumps-parser wiki2plaintext data-parser-for-ai data-for-robots plaintext-data-for-ai wikipedia-to-txt
Updated Nov 11, 2023 - Python