MinT is a machine translation service hosted by the Wikimedia Foundation. It uses multiple neural machine translation models to provide translation between a large number of languages.
Currently used models include NLLB-200, OpusMT, IndicTrans2, Softcatala, and MADLAD-400 (see the license table below for details).

The models are optimized for performance using OpenNMT's CTranslate2.
Clone the repository. Install the system dependencies:

```bash
sudo apt install wget unzip build-essential cmake
```
Create a Python virtual environment and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Then run the service:

```bash
./entrypoint.sh
```

By default it will run at http://0.0.0.0:8989.
Clone the repository, build the Docker image, and run it:

```bash
docker build -t wikipedia-mt .
docker run -dp 8989:8989 wikipedia-mt:latest
```
Open http://0.0.0.0:8989/ in a browser.
- `CT2_INTER_THREADS`: maximum number of batches executed in parallel. Refer to https://opennmt.net/CTranslate2/parallel.html. Default is the number of CPUs.
- `CT2_INTRA_THREADS`: number of computation threads used per batch. Refer to https://opennmt.net/CTranslate2/parallel.html. Default is 0 (auto).

For the above configurations, use a value less than or equal to the number of available CPU cores.
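These two settings correspond to CTranslate2's `inter_threads` and `intra_threads` translator options. A minimal sketch of how a service might read them from the environment (the helper name and the commented `Translator` call are illustrative, not MinT's actual code):

```python
import os


def ct2_thread_config():
    """Read CTranslate2 threading options from the environment.

    CT2_INTER_THREADS defaults to the number of CPUs; CT2_INTRA_THREADS
    defaults to 0, which lets CTranslate2 choose automatically.
    """
    inter = int(os.environ.get("CT2_INTER_THREADS", os.cpu_count() or 1))
    intra = int(os.environ.get("CT2_INTRA_THREADS", 0))
    return inter, intra


# The values would then be passed to the translator, e.g.:
# ctranslate2.Translator(model_path, inter_threads=inter, intra_threads=intra)
```

Raising `inter_threads` increases batch-level parallelism, while `intra_threads` controls the threads used inside a single batch; their product should not exceed the available cores.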
The application can be monitored using Graphite. Run the graphite-statsd Docker container and point the statsd host to it:
```bash
docker run -d \
  --name graphite \
  --restart=always \
  -p 80:80 \
  -p 2003-2004:2003-2004 \
  -p 2023-2024:2023-2024 \
  -p 8125:8125/udp \
  -p 8126:8126 \
  graphiteapp/graphite-statsd
```
Now set the environment variable `STATSD_HOST` to `localhost` and `STATSD_PORT` to `8125`. The `STATSD_PREFIX` environment variable can be used to override the default `machinetranslation` prefix.
Example:

```bash
STATSD_HOST=127.0.0.1 gunicorn
```
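statsd metrics are plain UDP datagrams, so the prefix simply becomes the first dotted component of each metric name. A sketch of what a counter increment looks like on the wire, using only the standard library (the function and metric names here are illustrative, not MinT's internals):

```python
import os
import socket


def send_counter(name, value=1):
    """Send a statsd counter increment as a single UDP datagram.

    Uses the same environment variables the service reads:
    STATSD_HOST, STATSD_PORT, and STATSD_PREFIX.
    """
    prefix = os.environ.get("STATSD_PREFIX", "machinetranslation")
    host = os.environ.get("STATSD_HOST", "localhost")
    port = int(os.environ.get("STATSD_PORT", "8125"))
    # statsd wire format: <prefix>.<metric>:<value>|c for a counter
    payload = f"{prefix}.{name}:{value}|c"
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload.encode("utf-8"), (host, port))
    sock.close()
    return payload
```

Because the transport is fire-and-forget UDP, metrics are cheap to emit and a missing Graphite container does not break the service.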
MinT is licensed under the MIT license. See License.txt.
MinT internally uses machine translation models from various projects. Refer to the following table for their respective license details.
Project | License for Code/Library | License for Documentation/Public Models/Data Set
---|---|---
NLLB-200 | MIT | CC-BY-SA-NC 4.0
OpusMT | MIT | CC-BY 4.0
IndicTrans2 | MIT | CC-0 (No Rights Reserved)
Softcatala | MIT | MIT
MADLAD-400 | Apache 2.0 | Apache 2.0