Limit number of jobs users can execute in parallel
Closed, ResolvedPublic

Description

In pmtpa, we limited the number of jobs that could be executed in parallel per queue to 16 IIRC. During the migration to eqiad, this seems to have been lost, and thus at the moment tools.currentevents has 86 jobs running (further jobs are only queued because the exec nodes are already saturated):

scfc@tools-dev:~$ qstat -u tools.currentevents | fgrep ' r ' | wc -l
86
scfc@tools-dev:~$

So we need to limit the number of jobs executed in parallel again. Last time, there was some confusion about which configuration option limited what; that initially caused only the number of /pending/ jobs to be limited and other jobs to be deleted, so we need to be careful about that.

Details

Reference
bz65777

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:24 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz65777.
scfc removed coren as the assignee of this task. Apr 7 2015, 5:00 AM
scfc triaged this task as Medium priority.
scfc updated the task description.
scfc set Security to None.

To make this a bit less confusing, let's make this task about limiting the number of parallel tasks a user can execute and T123270 about setting up execution nodes as submit hosts. T123270 has some information on how we limited the number of parallel tasks in the past.

Some notes from the duplicate T196495: Limit ability of a single user/tool to overwhelm job grid:

The resource quota system looks like it would require us to list each user specifically. There is however the maxujobs global scheduler setting:

maxujobs

The maximum number of jobs any user may have running in a Sun Grid Engine cluster at the same time. If set to 0 (default) the users may run an arbitrary number of jobs.

We currently have this set in the grid config with a value of 1000, which coincidentally(?) is also the upper limit on jobs per queue. This means that the limit will functionally never take effect.
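For reference, the current value can be inspected from a submit or admin host with qconf (a minimal sketch; the output line is just illustrative of the state described above):

$ qconf -ssconf | grep maxujobs
maxujobs                          1000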

That limit apparently was set by @valhallasw for giftbot:

@Giftpflanze actually needs more than this because of their array jobs. I'm setting this to 1000 now, which should be enough even for extreme use cases. It might also still be enough to kill gridengine for jobs that are spread out (as opposed to the single giftbot queue), but at least it's better than infinite.

The custom queue for giftbot was closed in T194615: Delete tools-exec-gift-trusty-01.tools.eqiad.wmflabs and giftbot queue. That makes me think that this very high limit for that single tool is no longer necessary. It looks to me like there are two possible paths forward here:

  1. Set maxujobs to a reasonable number (possibly 16, per the initial task description here) to apply a constant limit for each user across the grid
  2. Implement a service that works similarly to the existing maintain-kubeusers and maintain-dbusers services and creates a resource quota for each user/tool within the grid engine config. This would let us set limits on things other than job count (slots), like num_proc, mem_free, mem_total, etc. It would also give us a place to add limit variances by user/tool (a rough sketch of such a resource quota follows below).

The second option could be made more flexible and comprehensive than the first, but requires additional development work and some amount of ongoing maintenance. I would suggest that we start with the global per user limit of maxujobs and then reevaluate when we find credible need for a more advanced setup.
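For illustration, the resource quota from the second option could look roughly like the following gridengine resource quota set (a sketch only; the rule name and the slots value are placeholders that a maintain-* style service would generate and vary per tool):

{
   name         per_user_slots
   description  "Cap concurrent slots for any single user/tool"
   enabled      TRUE
   limit        users {*} to slots=16
}

A file like this could be loaded with qconf -Arqs <file> (or edited in place with qconf -mrqs), and per-tool variances would just be additional rules naming specific users ahead of the catch-all.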

Mentioned in SAL (#wikimedia-cloud) [2019-01-07T15:54:22Z] <bstorm_> T67777 Set stretch grid user job limit to 16

I figure the new grid can at least start with 16 to see how and where that creates problems.

I am curious if the scheduler will simply dump user jobs into a long tail of qw state if we restrict it a lot. So what I'm thinking of trying is reducing the main grid to 50 first to see what happens and then tighten to 16.

I am curious if the scheduler will simply dump user jobs into a long tail of qw state if we restrict it a lot.

That should be what it does, yes. There is a max_u_jobs setting that puts an upper bound on the number of jobs a user can have enqueued in any state, which could be used to limit qw flooding by a single tool.

Ok, so perhaps I can set that to 50 and maxujobs to 16.
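Assuming the usual qconf workflow on the grid master, setting both limits could look something like this (a sketch, not the exact commands used; the file-based -Msconf/-Mconf forms are shown to avoid the interactive editor):

$ qconf -ssconf > /tmp/sconf                               # dump the scheduler configuration
$ sed -i 's/^maxujobs .*/maxujobs 16/' /tmp/sconf          # cap concurrently *running* jobs per user
$ qconf -Msconf /tmp/sconf                                 # load it back

$ qconf -sconf global > /tmp/global                        # dump the global cluster configuration
$ sed -i 's/^max_u_jobs .*/max_u_jobs 50/' /tmp/global     # cap total *enqueued* jobs per user
$ qconf -Mconf /tmp/global                                 # file name should match the config name ("global")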

Mentioned in SAL (#wikimedia-cloud) [2019-01-07T17:21:11Z] <bstorm_> T67777 - set the max_u_jobs global grid config setting to 50 in the new grid

I do believe we have decided to leave these limits in place only on the new grid.

I did some testing with my user account and saw that both the concurrent and max-enqueued limits are active and working as hoped. A quick test is to do something like this:

$ for n in $(seq 1 51); do jsub -N limit-test -j yes -o $(pwd)/limit-test.log -stderr sleep 10; done
Your job 413 ("limit-test") has been submitted
Your job 414 ("limit-test") has been submitted
...
Your job 462 ("limit-test") has been submitted
Unable to run job: job rejected: only 50 jobs are allowed per user (current job count: 50)
Exiting.
$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
    430 0.25001 limit-test bd808        r     01/08/2019 03:17:35 [email protected].     1
    431 0.25001 limit-test bd808        r     01/08/2019 03:17:35 [email protected].     1
...
    445 0.25000 limit-test bd808        r     01/08/2019 03:17:35 [email protected].     1
    446 0.00000 limit-test bd808        qw    01/08/2019 03:16:58                                    1
    447 0.00000 limit-test bd808        qw    01/08/2019 03:16:58                                    1
...
    462 0.00000 limit-test bd808        qw    01/08/2019 03:16:59                                    1
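If you repeat the test, the leftover sleep jobs are easy to clean up afterwards (a small sketch, substituting your own user name):

$ qdel -u bd808     # delete all of this user's pending and running jobs
$ qstat -u bd808    # confirm the list is empty again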

I do believe we have decided to leave these limits in place only on the new grid.

Are the new settings managed by Puppet, or are they just runtime config in the cluster? If possible I'd like to see this set in Puppet somewhere, with a reference to this task next to the setting, so that we keep some institutional memory of why these flags are active and what they do.

The grid script that uses Puppet's files on NFS only configures most of the functional grid environment; it leaves the global and scheduler configuration alone. Since qconf can take input from files for both the global and scheduler configs, it is possible to build those files from templates and then apply them with Python, like I did for the rest of the grid. It should be pretty easy to extend the script like that.

I'll spawn a task for that. As it is, the institutional memory is basically just the SAL.

bd808 assigned this task to Bstorm.

Let's call this done. I have a feeling we may end up tweaking the limits once we get more tools running on the new grid, but that can be a follow-up ticket/discussion. I'm mostly thinking that the 50 queued jobs limit may be too aggressive (T123270#1925290).