MinT machine translation system

MinT is a machine translation system hosted by the Wikimedia Foundation. It uses multiple neural machine translation models to provide translation between a large number of languages.

Currently used models:

The models are optimized for performance using OpenNMT CTranslate2.

Usage

Installation

Clone the repository. Install the system dependencies:

sudo apt install wget unzip build-essential cmake

Create a Python virtual environment and install the dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Then run the service:

./entrypoint.sh

By default, the service listens on http://0.0.0.0:8989.
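To confirm that the service came up, you can check whether something accepts TCP connections on the port. The following is a minimal sketch using only the Python standard library. The helper name `is_listening` is hypothetical and not part of MinT, and the default host of 127.0.0.1 assumes the service runs on the same machine.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8989, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening()` after starting `./entrypoint.sh` should return `True` once the server has finished loading. It only checks that the port is open, not that translation requests succeed.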

Using Docker

Clone the repository, build the Docker image, and run it:

docker build -t wikipedia-mt .
docker run -dp 8989:8989 wikipedia-mt:latest

Open http://0.0.0.0:8989/ in a browser.

Environment variables

For the above configurations, use a value less than or equal to the number of available CPU cores.
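One way to pick a safe value is to cap the requested number at the core count reported by the operating system. This is only a sketch: `clamp_threads` is a hypothetical helper, not something MinT provides. Note that `os.cpu_count()` reports the machine's logical CPUs, so the usable number may be lower inside a container with a CPU limit.

```python
import os


def clamp_threads(requested: int) -> int:
    """Limit a requested thread or worker count to the range 1..available CPU cores."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available))
```

For example, `clamp_threads(64)` on an 8-core machine returns 8, which can then be used as the value for the corresponding environment variable.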

Monitoring

The application can be monitored using Graphite. Run the graphite-statsd Docker container and point the service's StatsD host at it:

docker run -d \
 --name graphite \
 --restart=always \
 -p 80:80 \
 -p 2003-2004:2003-2004 \
 -p 2023-2024:2023-2024 \
 -p 8125:8125/udp \
 -p 8126:8126 \
 graphiteapp/graphite-statsd

Now set the environment variable STATSD_HOST to localhost and STATSD_PORT to 8125. The STATSD_PREFIX environment variable can be used to override the default "machinetranslation" prefix.

Example:

STATSD_HOST=127.0.0.1 gunicorn
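To verify that the Graphite container receives data before starting the service, you can send a test counter by hand. The sketch below reads the same three environment variables and emits one counter in the plain StatsD line format (`name:value|c`) over UDP. The function names and the metric name `smoke_test` are hypothetical, and joining the prefix and metric name with a dot is an assumption about how MinT names its metrics.

```python
import os
import socket


def build_counter(name: str, value: int = 1) -> bytes:
    """Format a StatsD counter line: <prefix>.<name>:<value>|c"""
    # Assumption: prefix and metric name are joined with a dot.
    prefix = os.environ.get("STATSD_PREFIX", "machinetranslation")
    return f"{prefix}.{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1) -> None:
    """Send one counter increment to the StatsD daemon over UDP."""
    host = os.environ.get("STATSD_HOST", "127.0.0.1")
    port = int(os.environ.get("STATSD_PORT", "8125"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_counter(name, value), (host, port))
```

After calling `send_counter("smoke_test")`, the counter should show up in the Graphite web interface once the StatsD flush interval has passed. UDP is fire-and-forget, so no error is raised if nothing is listening.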

License

MinT is licensed under the MIT license. See License.txt.

MinT uses multiple machine translation models from various projects internally. Please refer to the following table for their respective license details.

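| Project     | License for Code/Library | Documentation/Public Models License/Data Set |
|-------------|--------------------------|----------------------------------------------|
| NLLB-200    | MIT                      | CC-BY-SA-NC 4.0                              |
| OpusMT      | MIT                      | CC-BY 4.0                                    |
| IndicTrans2 | MIT                      | CC-0 (No Rights Reserved)                    |
| Softcatala  | MIT                      | MIT                                          |
| MADLAD-400  | Apache 2.0               | Apache 2.0                                   |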