Help:Toolforge/Jobs framework
Every non-trivial task performed in Toolforge (like executing a script or running a bot) should be dispatched to a job scheduling backend (in this case, Kubernetes), which ensures that the job is run in a suitable place with sufficient resources.
The basic principle of running jobs is fairly straightforward:
- You create a job from a submission server (usually login.toolforge.org)
- Kubernetes finds a suitable execution node to run the job on, and starts it there once resources are available
- As it runs, your job will send output and errors to files until the job completes or is aborted.
Jobs can be scheduled synchronously or asynchronously, continuously, or simply executed once.
Jobs should be run from a tool account.
Creating jobs
Information about job creation using the toolforge jobs run command.
Creating one-off jobs
One-off jobs (or normal jobs) are workloads that will be scheduled by Toolforge Kubernetes and run until finished. They will run once, and are expected to finish at some point.
Select a runtime image and a command in your tool home directory, then use toolforge jobs run to create the job. For example, using the job name myjob:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image somelang1.23
The --command option supports input arguments if you quote the whole command, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command "./mycommand.sh --witharguments" --image somelang1.23
With the --wait option, the command line waits and does not return until the job is finished. By default the timeout is 10 minutes (600 seconds), but a custom number of seconds can be specified instead:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image somelang1.23 --wait
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run nothing --command "sleep 600" --image somelang1.23 --wait 630
Creating scheduled jobs (cron jobs)
To schedule a recurring job (also known as a cron job), use the --schedule WHEN option when creating it:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run mycronjob --command ./daily.sh --image somelang1.23 --schedule "@daily"
The schedule argument uses cron syntax (see also cron on Wikipedia).
Please use the @hourly, @daily, @weekly, @monthly and @yearly macros if possible. They make it possible to spread the cluster load evenly throughout the time period, which makes maintaining the cluster much easier.
You can force a rerun of a scheduled job with:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs restart mycronjob
Creating continuous jobs
Continuous jobs are programs that are never meant to end. If they end (for example, because of an error) the Toolforge Kubernetes system will restart them.
To create a continuous job, use the --continuous option:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myalwaysrunningjob --command ./myendlesscommand.sh --image somelang1.23 --continuous
About the executable
In all job types (one-off, continuous, cronjob) the --command parameter should meet the following conditions:
- it should refer to an executable file.
- mind the path: the command's working directory is the tool's home directory, so --command mycommand.sh will likely fail (the command is looked up in $PATH), and --command ./mycommand.sh is likely what you mean.
- arguments are optional, but if present, quote the whole command, for example: --command "./mycommand.sh --arg1 x --arg2 y".
Failing to meet any of these conditions will lead to errors either before launching the job, or shortly after the job is processed by the backend.
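These conditions can be checked before submitting a job. The following is a minimal sketch, assuming you run it on a bastion as the tool account, where the tool home directory is also the job's working directory. check_job_command is a hypothetical helper, not part of the toolforge CLI; it only inspects the first word of the command string and does not know which programs exist inside the chosen image.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper, not part of the toolforge CLI.
# Usage: check_job_command "<command string>" [working directory]
# The working directory defaults to $HOME (assumed to be the tool home directory).
check_job_command() {
  local cmd="$1"
  local workdir="${2:-$HOME}"
  local exe="${cmd%% *}"   # first word of the command string
  local path

  if [[ "$exe" != */* ]]; then
    # A bare name is looked up in $PATH inside the image. If a file with that
    # name exists in the working directory, ./name was probably intended.
    if [[ -e "$workdir/$exe" ]]; then
      echo "warning: '$exe' will be looked up in \$PATH; did you mean ./$exe?" >&2
      return 1
    fi
    return 0
  fi

  # Resolve relative paths against the working directory.
  if [[ "$exe" == /* ]]; then
    path="$exe"
  else
    path="$workdir/$exe"
  fi

  if [[ ! -f "$path" ]]; then
    echo "error: '$path' does not exist" >&2
    return 1
  fi
  if [[ ! -x "$path" ]]; then
    echo "error: '$path' is not executable; try: chmod u+x $exe" >&2
    return 1
  fi
  return 0
}
```

For example, check_job_command "./mycommand.sh --arg1 x --arg2 y" && toolforge jobs run myjob --command "./mycommand.sh --arg1 x --arg2 y" --image somelang1.23 only creates the job if the script exists and is executable. A bare name such as python3 passes the check, because it is expected to come from the image's $PATH.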
About the job name
The job name is a unique string identifier. The string should meet these criteria:
- between 1 and 52 characters long.
- any combination of numbers, lower-case letters and the . (dot) and - (dash) characters.
- no spaces, no underscores, no special symbols.
Failing to meet any of these conditions will lead to errors either before launching the job, or shortly after the job is processed by the backend.
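The criteria above translate into a short check. This is a sketch that encodes only the rules listed here; is_valid_job_name is a hypothetical helper, and the backend may enforce additional constraints. The allowed letters are spelled out instead of using a range like a-z, which some locales interpret differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the name meets the criteria listed above
# (1 to 52 characters; only lower-case letters, digits, '.' and '-').
is_valid_job_name() {
  local name="$1"
  (( ${#name} >= 1 && ${#name} <= 52 )) || return 1
  [[ "$name" =~ ^[abcdefghijklmnopqrstuvwxyz0123456789.-]+$ ]]
}
```

For example, is_valid_job_name my_job fails because of the underscore, while is_valid_job_name my-job succeeds.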
Choosing the execution runtime
In Toolforge Kubernetes you can use any image you built with the build service (preferred) or you can use one of the pre-defined container images.
To view which execution runtimes are available, run the toolforge jobs images command (note that if you are using the build service, your image will only show up once you have built it).
Example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs images
Short name Container image URL
------------ ----------------------------------------------------------------------
bookworm docker-registry.tools.wmflabs.org/toolforge-bookworm-sssd:latest
bullseye docker-registry.tools.wmflabs.org/toolforge-bullseye-sssd:latest
jdk17 docker-registry.tools.wmflabs.org/toolforge-jdk17-sssd-base:latest
mariadb docker-registry.tools.wmflabs.org/toolforge-mariadb-sssd-base:latest
mono6.8 docker-registry.tools.wmflabs.org/toolforge-mono68-sssd-base:latest
node16 docker-registry.tools.wmflabs.org/toolforge-node16-sssd-base:latest
node18 docker-registry.tools.wmflabs.org/toolforge-node18-sssd-base:latest
perl5.32 docker-registry.tools.wmflabs.org/toolforge-perl532-sssd-base:latest
perl5.36 docker-registry.tools.wmflabs.org/toolforge-perl536-sssd-base:latest
php7.4 docker-registry.tools.wmflabs.org/toolforge-php74-sssd-base:latest
php8.2 docker-registry.tools.wmflabs.org/toolforge-php82-sssd-base:latest
python3.9 docker-registry.tools.wmflabs.org/toolforge-python39-sssd-base:latest
python3.11 docker-registry.tools.wmflabs.org/toolforge-python311-sssd-base:latest
ruby2.1 docker-registry.tools.wmflabs.org/toolforge-ruby21-sssd-base:latest
ruby2.7 docker-registry.tools.wmflabs.org/toolforge-ruby27-sssd-base:latest
ruby3.1 docker-registry.tools.wmflabs.org/toolforge-ruby31-sssd-base:latest
tcl8.6 docker-registry.tools.wmflabs.org/toolforge-tcl86-sssd-base:latest
In addition, there are several deprecated images that are available for older tools that rely on them but should not be used for new use cases.
NOTE: if your tool uses Python, you may want to use a virtualenv; see Help:Toolforge/Python#Jobs.
Retry policy
You can specify the retry policy for failed jobs.
The default policy is not to restart failed jobs, but you can choose to have them retried up to five times before the scheduling engine gives up.
Use the --retry N option, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./myjob.sh --image somelang1.23 --retry 2
Note that the retry policy is ignored for continuous jobs, since they are always restarted in case of failure.
Using envvars for configuration
You can use envvars to pass secrets and other configuration variables to your jobs.
Loading jobs from a YAML file
You can define a list of jobs in a YAML file and load them all at once using the toolforge jobs load command, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs load jobs.yaml
NOTE: loading jobs from a file will flush jobs with the same name if their definition varies.
You can use the --job <name> option to load only one of the jobs defined in the YAML file, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs load jobs.yaml --job "everyminute"
Example YAML file:
# https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework
---
# a cronjob
- name: hourly
  command: ./myothercommand.sh -v
  image: bullseye
  no-filelog: true
  schedule: "@hourly"
  emails: onfailure
# a continuous job
- name: endlessjob
  image: python3.11
  command: python3 dumps-daemon.py --endless
  continuous: true
  emails: all
# wait for this one-off job before loading the next
- name: myjob
  image: bullseye
  command: ./mycommand.sh --argument1
  wait: true
  emails: onfinish
# another one-off job after the previous one finished running
- name: anotherjob
  image: bullseye
  command: ./mycommand.sh --argument1
  emails: none
# this job sets custom stdout/stderr log files
- name: normal-job-with-custom-logs
  image: bullseye
  command: ./mycommand.sh --argument1
  filelog-stdout: logs/stdout.log
  filelog-stderr: logs/stderr.log
# this job sets a custom retry policy
- name: normal-job-with-custom-retry-policy
  image: bullseye
  command: ./mycommand.sh --argument1
  retry: 2
# this job requests a higher memory limit
- name: normal-job-with-higher-memory-limit
  image: bullseye
  command: ./mycommand.sh --argument1
  mem: 500Mi
# this continuous job runs a healthcheck script
- name: job-with-healthcheck-script
  image: bullseye
  command: ./some-command.sh
  continuous: true
  health-check-script: ./some-healthcheck-script.sh
# this continuous job has multiple replicas configured
- name: job-with-3-replicas
  image: bullseye
  command: ./some-command.sh
  continuous: true
  replicas: 3
You can also do the opposite operation and get all the defined jobs in YAML format, for example for a later load. Examples:
tools.mytool@tools-bastion-12:~$ toolforge jobs dump
- command: ./some-script.sh
  continuous: true
  image: bookworm
  name: test
- command: ./some-script.sh
  continuous: true
  image: bookworm
  mem: 1G
  name: test2
tools.mytool@tools-bastion-12:~$ toolforge jobs dump --to-file myjobs.yaml
tools.mytool@tools-bastion-12:~$ toolforge jobs load myjobs.yaml
Configuring health-checks for jobs
Sometimes a continuous job can get stuck at the code level but still appear to be running when you run toolforge jobs list. Configuring a health check helps Toolforge detect issues like this and restart your continuous job.
To configure a health check, specify the --health-check-script argument. Its value should be either an inline command string or an executable file, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run continuous-job-with-health-check --command ./myendlesscommand.sh --image somelang1.23 --continuous --health-check-script "cat /etc/os-release"
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run continuous-job-with-health-check --command ./myendlesscommand.sh --image somelang1.23 --continuous --health-check-script ./health-check.sh
If you use a script file, make sure it is executable (for example, by running chmod u+x health-check.sh) before creating your job with the health check configured.
In order to work properly with health checks, your tool/job code needs to be aware of the health check.
A common example:
- The tool's main code loop includes some code to create a control file, for example /tmp/myjob-alive.
- You configure the health check to verify that this file exists, and to delete it if present. For example: --health-check-script "test -e /tmp/myjob-alive && rm /tmp/myjob-alive"
- Because the control file was deleted by the health check, if the job is alive it should create the file again in the next loop iteration. If it is not created, the health check will fail, indicating that the job is not healthy, and Toolforge will therefore restart the job.
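The control-file pattern can be sketched in a job script as follows. This assumes, as the example above implies, that the health check runs in the same container as the job and therefore sees the same /tmp; ALIVE_FILE, do_one_iteration and main_loop_step are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the control-file pattern. ALIVE_FILE, do_one_iteration and
# main_loop_step are hypothetical names used for illustration.
ALIVE_FILE="${ALIVE_FILE:-/tmp/myjob-alive}"

do_one_iteration() {
  # Placeholder for the job's real work, e.g. processing one batch of items.
  :
}

# One pass of the main loop: do the work, then (re)create the control file.
main_loop_step() {
  do_one_iteration
  touch "$ALIVE_FILE"
}

# Same logic as --health-check-script "test -e /tmp/myjob-alive && rm /tmp/myjob-alive":
# succeeds (and deletes the file) only if the loop recreated it since the last check.
health_check() {
  test -e "$ALIVE_FILE" && rm "$ALIVE_FILE"
}

# In the real job script the loop would run forever, touching the file more
# often than the 10-second liveness check interval:
#   while true; do main_loop_step; sleep 5; done
```

Because the health check deletes the file, each success shows that the loop made progress since the previous check, not merely that the process still exists.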
Checks happen in two different phases: startup and liveness. The startup checks happen when the job is first launched. During this phase the health-check script will be called once each second. If it fails 120 times in a row then the job will be restarted. One success will end the startup phase.
The liveness checks happen every 10 seconds. If the health-check script fails 3 times in a row the job will be restarted.
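To try out a health-check command before creating the job, the liveness rule above (a check every 10 seconds, restart after 3 failures in a row) can be approximated locally. This is a simplified sketch, not how Kubernetes implements the checks, and it does not model the startup phase; run_liveness_checks is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical local approximation of the liveness phase: run CHECK every
# PERIOD seconds and stop once it has failed THRESHOLD times in a row, which
# is when Toolforge would restart the job. MAX_CHECKS=0 means run forever.
# Usage: run_liveness_checks CHECK [PERIOD] [THRESHOLD] [MAX_CHECKS]
run_liveness_checks() {
  local check="$1" period="${2:-10}" threshold="${3:-3}" max_checks="${4:-0}"
  local failures=0 n=0
  while (( max_checks == 0 || n < max_checks )); do
    n=$(( n + 1 ))
    if sh -c "$check"; then
      failures=0   # any success resets the consecutive-failure counter
    else
      failures=$(( failures + 1 ))
      if (( failures >= threshold )); then
        echo "check $n: $failures failures in a row, the job would be restarted"
        return 1
      fi
    fi
    sleep "$period"
  done
  echo "$n checks completed without reaching $threshold consecutive failures"
  return 0
}
```

For example, running run_liveness_checks "test -e /tmp/myjob-alive && rm /tmp/myjob-alive" 10 3 while your job script runs locally in another terminal shows whether the loop recreates the control file often enough.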
Configuring internal domain names for your jobs
To run a job that expects to receive requests from other jobs (say, a backend job that expects requests from a frontend job), you need to configure the job's internal domain name. This way, the jobs making requests don't need to know and keep track of the internal IP address of the target job, which is necessary because the internal IP addresses of jobs are ephemeral.
To configure the internal domain name, you only need to specify the target port with the --port option. Once that is done, your new job will be reachable at https://<jobname>:<port>. Example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run backend-continuous-job --command ./server.sh --image somelang1.23 --continuous --port 8080
The above job will now be reachable from other jobs using https://backend-continuous-job:8080
Running a continuous job with multiple replicas
By default, the Toolforge jobs framework creates one instance of a job. Sometimes there's a need to run multiple instances of exactly the same thing, for example multiple runner processes.
To create a multi-replica job, you can use the --replicas option:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run backend-continuous-job --command ./server.sh --image somelang1.23 --continuous --replicas 2
Listing your existing jobs
You can get information about the jobs created for your tool using toolforge jobs list, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs list
Job name: Job type: Status:
-------------- ----------------- ---------------------------
myscheduledjob schedule: @hourly Last schedule time: 2021-06-30T10:26:00Z
alwaysrunning continuous Running
myjob normal Completed
Listing even more information at once is possible using --output long:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs list --output long
Job name: Command: Job type: Image: File log: Output log: Error log: Emails: Resources: Retry: Status:
-------------- ----------------------- ----------------- -------- ----------- ------------- ------------ --------- ------------ -------- ---------
myscheduledjob ./read-dumps.sh schedule: @hourly bullseye no /dev/null /dev/null none default no Running
alwaysrunning ./myendlesscommand.sh continuous bullseye yes test2.out test2.err none default no Running
myjob ./mycommand.sh --debug normal bullseye yes logs/mylog logs/mylog onfinish default 2 Completed
You can also get the list of defined jobs in YAML format using the dump operation. Examples:
tools.mytool@tools-sgebastion-10:~$ toolforge jobs list
Job name: Job type: Status:
----------- ----------- ---------
myjob continuous Running
myjob2 continuous Running
tools.mytool@tools-sgebastion-10:~$ toolforge jobs dump
- command: ./some-script.sh
  continuous: true
  image: bookworm
  name: myjob
- command: ./some-script.sh
  continuous: true
  image: bookworm
  mem: 1G
  name: myjob2
You can then save this YAML dump to a file, either by redirecting the output or by selecting the file directly with the -f or --to-file options. The following examples are all equivalent:
tools.mytool@tools-sgebastion-10:~$ toolforge jobs dump > jobs.yaml
tools.mytool@tools-sgebastion-10:~$ toolforge jobs dump -f jobs.yaml
tools.mytool@tools-sgebastion-10:~$ toolforge jobs dump --to-file jobs.yaml
You can use this YAML dump file later in a load operation.
Deleting your jobs
You can delete your jobs in two ways:
- manually delete each job, identified by name, using the toolforge jobs delete command.
- delete all defined jobs at once, using the toolforge jobs flush command.
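Because flush removes every job definition, it can be worth saving them first with the dump operation described earlier on this page. This is a minimal sketch; backup_and_flush_jobs and the TOOLFORGE_CMD variable are hypothetical (the variable only exists so the function can be tried with a stub), and you should check toolforge jobs flush -h for how your CLI version behaves, for example whether it asks for confirmation.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the toolforge CLI: dump all job
# definitions to a timestamped YAML file and only flush the jobs if the dump
# succeeded and is non-empty. TOOLFORGE_CMD is an assumption that allows a
# stub to be used instead of the real CLI; it defaults to "toolforge".
backup_and_flush_jobs() {
  local tf="${TOOLFORGE_CMD:-toolforge}"
  local backup
  backup="jobs-backup-$(date +%Y%m%d-%H%M%S).yaml"

  "$tf" jobs dump --to-file "$backup" || return 1
  if [[ ! -s "$backup" ]]; then
    echo "error: $backup is empty, not flushing" >&2
    return 1
  fi
  "$tf" jobs flush || return 1
  # Print the backup file name so it can be passed to `toolforge jobs load` later.
  echo "$backup"
}
```

The printed file name can later be passed to toolforge jobs load to define the jobs again.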
Showing information about your job
You can get information about a defined job using the toolforge jobs show command, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs show myscheduledjob
+--------------+---------------------------------------------------------------+
| Job name: | myscheduledjob |
+--------------+---------------------------------------------------------------+
| Command: | ./read-dumps.sh myargument |
+--------------+---------------------------------------------------------------+
| Job type: | schedule: * * * * * |
+--------------+---------------------------------------------------------------+
| Image: | bullseye |
+--------------+---------------------------------------------------------------+
| File log: | yes |
+--------------+---------------------------------------------------------------+
| Output log: | /data/project/tool-name/myscheduledjob.out |
+--------------+---------------------------------------------------------------+
| Error log: | /data/project/tool-name/myscheduledjob.err |
+--------------+---------------------------------------------------------------+
| Emails: | none |
+--------------+---------------------------------------------------------------+
| Resources: | mem: 10Mi, cpu: 100 |
+--------------+---------------------------------------------------------------+
| Replicas: | 1 |
+--------------+---------------------------------------------------------------+
| Mounts: | all |
+--------------+---------------------------------------------------------------+
| Retry: | no |
+--------------+---------------------------------------------------------------+
| Health check:| none |
+--------------+---------------------------------------------------------------+
| Status: | Last schedule time: 2021-06-30T10:26:00Z |
+--------------+---------------------------------------------------------------+
| Hints: | Last run at 2021-06-30T10:26:08Z. Pod in 'Pending' phase. |
| | State 'waiting' for reason 'ContainerCreating'. |
+--------------+---------------------------------------------------------------+
The output includes information about the job status and, for example in case of failure, some hints.
Restarting your jobs
You can restart cron jobs or continuous jobs.
Use toolforge jobs restart <jobname>, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs restart myjob
You can use this to reset the internal state of stuck jobs or jobs in a failed state. Internally, this behaves similarly to removing the job and defining it again.
Trying to restart a non-existent job will do nothing.
Job logs
There are currently two ways to collect logs from jobs:
- Internal logs, where the log output can be accessed with the toolforge jobs logs command while a job is running and for a short period after the job has finished.
- File logs, where stdout and stderr are redirected to files in the tool home directory. File logs are enabled by default for jobs that use one of the pre-defined container images, but disabled by default for jobs using build service images (including the prebuilt Pywikibot image).
File logging
Jobs log stdout/stderr to files in your tool home directory.
For a job named myjob, you will find:
- a myjob.out file, containing stdout generated by your job.
- a myjob.err file, containing stderr generated by your job.
Example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image bullseye
tools.mytool@tools-sgebastion-11:~$ ls myjob*
myjob.out myjob.err
Subsequent runs of a job with the same name will append to the same files.
Log generation can be disabled with the --no-filelog parameter when creating a new job, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image bullseye --no-filelog
Custom log files
You can control where you store your logs. This allows for things like:
- using a custom directory
- merging stdout/stderr logs together into a single file
- ignoring one of the two log streams
To do that, make use of the following options when running a new job:
- (for stdout) -o path/to/file.log or --filelog-stdout path/to/file.log
- (for stderr) -e path/to/file.log or --filelog-stderr path/to/file.log
Example, running a job that merges both log streams into a single log file:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image bullseye --filelog-stdout myjob.log --filelog-stderr myjob.log
Example, running a job that uses the default myjob.out file for stdout but ignores stderr:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image bullseye --filelog-stderr /dev/null
Example, running a job that logs both streams separately in a custom directory:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image bullseye --filelog-stdout mylogs/myjob.out.log --filelog-stderr mylogs/myjob.err.log
Custom directories must be created by hand before the job runs. Selecting an invalid directory here will likely result in the job failing with exit code 2.
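Since custom directories must exist before the job runs, a small helper can create them from the same paths you pass to --filelog-stdout and --filelog-stderr. This is a sketch assuming that relative log paths are resolved against the tool home directory, as in the examples above; ensure_log_dirs and LOG_BASE_DIR are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the parent directories of the given log paths.
# Relative paths are resolved against LOG_BASE_DIR, which defaults to $HOME
# (assumed to be the tool home directory). /dev/null is skipped.
ensure_log_dirs() {
  local base="${LOG_BASE_DIR:-$HOME}"
  local f
  for f in "$@"; do
    if [[ "$f" == /dev/null ]]; then
      continue
    fi
    if [[ "$f" != /* ]]; then
      f="$base/$f"
    fi
    mkdir -p -- "$(dirname -- "$f")" || return 1
  done
  return 0
}
```

For example, run ensure_log_dirs mylogs/myjob.out.log mylogs/myjob.err.log first, then the toolforge jobs run command with the same --filelog-stdout and --filelog-stderr values.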
Pruning log files
You should take care that your log files do not grow too large.
The mariadb image includes the logrotate program, which can be used with the Toolforge jobs framework to control the size of log files.
Single job logs
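If you have a continuous job, you will want to use copytruncate mode for log rotation: the job keeps writing to the same open file, and copytruncate copies the log aside and then truncates the original in place instead of moving it. To set it up, create a configuration file logrotate-myjob.conf similar to this: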
tools.mytool@tools-sgebastion-11:~$ nano logrotate-myjob.conf
"./logs/myjob.log"
{
    daily
    rotate 6
    copytruncate
    dateext
}
This configuration will rotate your log files daily, and keep 6 days of old logs in addition to the log for the current day.
The dateext option appends the rotation date to the names of rotated log files instead of a number, which makes it easier to tell old logs apart.
Then you can start automatic log rotation with:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run logrotate-myjob --command 'logrotate -v $TOOL_DATA_DIR/logrotate-myjob.conf --state $TOOL_DATA_DIR/logrotate-myjob.state' --image mariadb --schedule "@daily"
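The configuration file above can also be generated by a script, which makes it easy to create one per job. This is a sketch; write_logrotate_conf is a hypothetical helper that reproduces the settings shown above, with the number of rotated logs to keep as an optional argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a logrotate configuration equivalent to the one
# above. Arguments: log file path, config file to write, number of old logs
# to keep (default 6).
write_logrotate_conf() {
  local log="$1" conf="$2" keep="${3:-6}"
  cat > "$conf" <<EOF
"$log"
{
    daily
    rotate $keep
    copytruncate
    dateext
}
EOF
}
```

For example, running write_logrotate_conf ./logs/myjob.log logrotate-myjob.conf in the tool home directory recreates the file shown above. logrotate's -d (debug) option makes no changes to the logs and does not update the state file, so adding -d to the logrotate command of a one-off job using the mariadb image is a way to check the configuration first.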
All logfiles at once
For rotating all your logs, you can use globs like:
tools.mytool@tools-sgebastion-11:~$ cat logrotate-all.conf
"./*.err" "./*.out" {
    daily
    rotate 6
    copytruncate
    dateext
}
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run logrotate-all --command 'logrotate -v $TOOL_DATA_DIR/logrotate-all.conf --state $TOOL_DATA_DIR/logrotate-all.state' --image mariadb --schedule "@daily"
More modern approaches and facilities for log management, metrics, etc. are on the current roadmap of the WMCS team. See Phabricator T127367, for example.
Internal log storage
If a job has file logs disabled (it uses a build service image or was created with --no-filelog), the Toolforge Kubernetes infrastructure will internally store the output of any currently running jobs. This means that logs are deleted once a job finishes executing (for one-off and scheduled jobs) or restarts. To view these logs, use the toolforge jobs logs command:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs logs myjob
This command also takes some flags:
- -f to follow logs in real time
- -l [number] to only see the given number of newest log lines
It is intended that these logs will eventually be stored in a more persistent storage system.
Job quotas
Each tool account has a limited quota available. The same quota is used for jobs and other things potentially running on Kubernetes, like webservices.
To check your quota, run:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs quota
Running jobs Used Limit
-------------------------------------------- ------ -------
Total running jobs at once (Kubernetes pods) 0 10
Running one-off and cron jobs 0 15
CPU 0 2
Memory 0 8Gi
Per-job limits Limit
---------------- -------
CPU 1
Memory 4Gi
Job definitions Used Limit
---------------------------------------- ------ -------
Cron jobs 0 50
Continuous jobs (including web services) 0 3
As of this writing, new jobs get 512Mi memory and 0.5 CPU by default.
You can run jobs with additional CPU and memory using the --mem MEM and --cpu CPU parameters, for example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command "./heavycommand.sh" --image bullseye --mem 1Gi --cpu 2
Requesting more memory or CPU will fail if the tool quota is exceeded.
You can find details on the underlying Kubernetes quotas here.
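Before raising --mem, it can help to compare the request with the per-job limit reported by toolforge jobs quota (4Gi in the example output above; your tool's limit may differ). This sketch only understands whole numbers with the binary Mi and Gi suffixes used on this page; mem_to_mib and fits_per_job_mem_limit are hypothetical helpers, and the real quota check is done by Kubernetes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. mem_to_mib converts a whole-number quantity with a
# binary Mi or Gi suffix (e.g. 512Mi, 1Gi) to MiB; anything else, including
# decimal suffixes such as 1G, is rejected.
mem_to_mib() {
  local q="$1" n
  case "$q" in
    *Gi) n="${q%Gi}"; [[ "$n" =~ ^[0-9]+$ ]] || return 1; echo $(( 10#$n * 1024 )) ;;
    *Mi) n="${q%Mi}"; [[ "$n" =~ ^[0-9]+$ ]] || return 1; echo $(( 10#$n )) ;;
    *) return 1 ;;
  esac
}

# Succeeds if the requested memory is within the per-job limit. The default
# of 4Gi is taken from the example `toolforge jobs quota` output above.
fits_per_job_mem_limit() {
  local req limit
  req="$(mem_to_mib "$1")" || return 1
  limit="$(mem_to_mib "${2:-4Gi}")" || return 1
  (( req <= limit ))
}
```

For example, fits_per_job_mem_limit 1Gi succeeds while fits_per_job_mem_limit 5Gi fails. The total memory across running jobs (8Gi in the example output) and the CPU limits are not checked.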
Quota increases
It is possible to request a quota increase if you can demonstrate your tool's need for more resources than the default namespace quota allows. Instructions and a template link for creating a quota request can be found at Toolforge (Quota requests) in Phabricator.
Please read all the instructions there before submitting your request.
Note for Toolforge admins: there are docs on how to do quota upgrades.
Job email notifications
You can choose to receive email notifications about your job's activity by using the --emails EMAILS option when creating a job.
The available choices are:
- none: don't get any email notifications. This is the default behavior.
- onfailure: receive email notifications in case of a failure event.
- onfinish: receive email notifications when the job finishes (both successfully and on failure).
- all: receive all possible notifications.
Example:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run myjob --command ./mycommand.sh --image bullseye --emails onfinish
The email will be sent to the tool account's email address, which is an alias that by default redirects to all tool maintainers associated with that particular tool account.
Help command
List all available jobs-framework commands using the toolforge jobs -h command:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs -h
usage: toolforge jobs [-h] {images,run,show,logs,list,delete,flush,load,restart,quota,dump} ...
Toolforge Jobs Framework, command line interface
positional arguments:
{images,run,show,logs,list,delete,flush,load,restart,quota,dump}
possible operations (pass -h to know usage of each)
images list information on available container image types for Toolforge jobs
run run a new job of your own in Toolforge
show show details of a job of your own in Toolforge
logs show output from a running job
list list all running jobs of your own in Toolforge
delete delete a running job of your own in Toolforge
flush delete all running jobs of your own in Toolforge
load flush all jobs and load a YAML file with job definitions and run them
restart restarts a running job
quota display quota information
dump dump all defined jobs in YAML format, suitable for a later `load` operation
options:
-h, --help show this help message and exit
List all available run command arguments using the toolforge jobs run -h command:
tools.mytool@tools-sgebastion-11:~$ toolforge jobs run -h
usage: toolforge jobs run [-h] --command COMMAND --image IMAGE [--no-filelog | --filelog] [-o FILELOG_STDOUT] [-e FILELOG_STDERR] [--retry {0,1,2,3,4,5}] [--mem MEM] [--cpu CPU]
[--emails {none,all,onfinish,onfailure}] [--mount {all,none}] [--schedule SCHEDULE | --continuous | --wait [WAIT]] [--health-check-script HEALTH_CHECK_SCRIPT] [-p PORT]
[--replicas REPLICAS]
name
positional arguments:
name new job name
options:
-h, --help show this help message and exit
--command COMMAND full path of command to run in this job
--image IMAGE image shortname (check them with `images`)
--no-filelog disable redirecting job output to files in the home directory
--filelog explicitly enable file logs on jobs using a build service created image
-o FILELOG_STDOUT, --filelog-stdout FILELOG_STDOUT
location to store stdout logs for this job
-e FILELOG_STDERR, --filelog-stderr FILELOG_STDERR
location to store stderr logs for this job
--retry {0,1,2,3,4,5}
specify the retry policy of failed jobs.
--mem MEM specify additional memory limit required for this job
--cpu CPU specify additional CPU limit required for this job
--emails {none,all,onfinish,onfailure}
specify if the system should email notifications about this job. (default: 'none')
--mount {all,none} specify which shared storage (NFS) directories to mount to this job. (default: 'none' on build service images, 'all' otherwise)
--schedule SCHEDULE run a job with a cron-like schedule (example '1 * * * *')
--continuous run a continuous job
--wait [WAIT] wait for job one-off job to complete, optionally specify a value to override default timeout of 600s
--health-check-script HEALTH_CHECK_SCRIPT
specify a health check command to run on the job if any.
-p PORT, --port PORT specify the port to expose for this job. only valid for continuous jobs
--replicas REPLICAS specify the number of job replicas to be used. only valid for continuous jobs
Grid Engine migration
This section contains specific documentation for Grid Engine users that are trying to migrate their jobs to Kubernetes.
In particular, here is a list of common command equivalences between Grid Engine (legacy, with
NOTE: the old grid
Useful links
The following tools have been built by the Toolforge admin team to help others see job status:
- k8s-status.toolforge.org — status board of Kubernetes nodes and tools (webservices, jobs) they are currently running.
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud or the bridged Telegram group
- Discuss via email after you have subscribed to the cloud@ mailing list
- Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
- Read the News wiki page
Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)
See also
- Help:Toolforge/Web
- Help:Toolforge/Kubernetes
- News/2022 Toolforge Stretch deprecation
- News/2020 Kubernetes cluster migration
- Alternate procedure for managing jobs in Toolforge Kubernetes, using the raw k8s API, only recommended if you are an advanced user.
- Portal:Toolforge/Admin/Jobs framework - Engineering documentation about this system.
External links
- Source code of the toolforge-jobs command
- Wikimedia Techblog: Toolforge Jobs Framework Arturo Borrero González, Site Reliability Engineer, Wikimedia Cloud Services Team, March 18, 2022