
I am setting up 4 hosts, each exporting one local storage device over iSCSI with target. Every other host imports it, so that each host has concurrent access to all 4 storage devices. I built an LVM shared volume group that includes these 4 iSCSI devices. In this volume group, I created 4 logical volumes, each backed by one of the iSCSI-imported devices. I use the LVM shared VG synchronization mechanism, with lvmlockd and dlm, to make sure that only one host uses these logical volumes at a time. Finally, I built a raid6 array on top of these 4 logical volumes, so that in principle I can lose up to 2 hosts without interrupting the storage service.
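
A minimal sketch of that stack on the node that assembles it, assuming the 4 iSCSI-imported disks show up as /dev/sdb through /dev/sde and using illustrative VG/LV/md names (the iSCSI export/import and the lvmlockd/dlm setup are omitted):

    # Shared VG over the 4 iSCSI-imported disks (lvmlockd must be running)
    vgcreate --shared vg_iscsi /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # One LV per underlying disk, pinned to its PV
    lvcreate -n lv1 -l 100%PVS vg_iscsi /dev/sdb
    lvcreate -n lv2 -l 100%PVS vg_iscsi /dev/sdc
    lvcreate -n lv3 -l 100%PVS vg_iscsi /dev/sdd
    lvcreate -n lv4 -l 100%PVS vg_iscsi /dev/sde
    # raid6 over the 4 LVs: tolerates the loss of any 2 members
    mdadm --create /dev/md0 --level=6 --raid-devices=4 \
        /dev/vg_iscsi/lv1 /dev/vg_iscsi/lv2 /dev/vg_iscsi/lv3 /dev/vg_iscsi/lv4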

I manage the whole thing with pacemaker, from exporting the iSCSI volumes with target to assembling the raid6 array. So far everything works wonders, except for scenarios where 1 or 2 nodes are down: the data is safe, but since I set constraints to start the raid6 array only after all 4 logical volume resources are started, pacemaker stops the array as soon as a single host goes offline. I would like pacemaker to keep the service running up to the loss of 2 hosts.
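
Roughly what the relevant order constraint looks like today, written as a pcs command with illustrative resource names (lv1..lv4 for the LV resources, md_raid6 for the array); with the default require-all=true, a single unstartable LV resource keeps md_raid6 down:

    pcs constraint order set lv1 lv2 lv3 lv4 sequential=false \
        set md_raid6 setoptions kind=Mandatory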

For that, I would need order (and colocation) constraints that enable the raid6 array if and only if at least 2 of these logical volumes are online. Better still: enable the raid6 array if and only if at most 2 logical volumes could not be brought online, for whatever reason. Unfortunately, pacemaker only allows a predecessor resource set in order and colocation constraints (i.e., the resource or resource set that must start first) to be considered as started either when all resources in the set are started (require-all=true) or when at least one is started (require-all=false), but not when at least two are started or at most two are missing.
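
For reference, the only alternative the resource-set syntax offers is require-all=false, which would let md_raid6 start with just one LV present, too permissive for a raid6 that needs at least 2 of its 4 members (same illustrative names as above):

    # "all four" (the default) or "at least one" -- nothing in between
    pcs constraint order set lv1 lv2 lv3 lv4 sequential=false require-all=false \
        set md_raid6 setoptions kind=Mandatory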

As a workaround, I am considering creating 11 raid6 resources, one for each usable scenario, i.e., one for each combination where at most 2 logical devices are missing:

  • LVs 1 and 2 are available
  • LVs 1 and 3 are available
  • LVs 1 and 4 are available
  • LVs 2 and 3 are available
  • LVs 2 and 4 are available
  • LVs 3 and 4 are available
  • LVs 1, 2 and 3 are available
  • LVs 1, 2 and 4 are available
  • LVs 1, 3 and 4 are available
  • LVs 2, 3 and 4 are available
  • LVs 1, 2, 3 and 4 are available

I would create raid6 resources with order and colocation constraints, each matching one line in the enumeration above (sketched below). Then I would need an additional constraint that makes these raid6 resources mutually exclusive, so that at any given time the actual array is assembled only once.
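
As an illustration of that workaround, here is what one of the 11 variants could look like with pcs. Resource names, the mdraid parameters and the anti-colocation score are assumptions, not a tested configuration; note in particular that a -INFINITY colocation only keeps two variants off the same node, it does not by itself prevent them from running on different nodes.

    # Variant for "LVs 1 and 2 are available"
    pcs resource create md_raid6_12 ocf:heartbeat:mdraid \
        mdadm_conf=/etc/mdadm.conf md_dev=/dev/md0
    pcs constraint order set lv1 lv2 sequential=false set md_raid6_12
    pcs constraint colocation add md_raid6_12 with lv1 INFINITY
    pcs constraint colocation add md_raid6_12 with lv2 INFINITY
    # One mutual-exclusion constraint per pair of variants
    pcs constraint colocation add md_raid6_12 with md_raid6_13 -INFINITY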

So here are my 3 questions:

  1. Is there any way to express "at most 2 missing" within the predecessor set of an order or colocation constraint, or, failing that, is there a similar "at least 2 active" constraint construct?
  2. If the answer to question 1 is no, is there any way to express mutual exclusion between a pair of resources, or within a resource set, preferably with priority settings favoring the resource variants that use the highest number of devices?
  3. Is there any other workaround any pacemaker wizard out there can suggest?

1 Answer


It turns out that the ocf:heartbeat:mdraid agent I use is perfectly able to assemble an array with missing volumes. Therefore I created 11 dummy resources following the scheme described in my question, i.e.:

  • One that starts after the LV 1 and LV 2 resources.
  • One that starts after the LV 1 and LV 3 resources.
  • One that starts after the LV 1 and LV 4 resources.
  • etc.

And I made the raid resource start after all of these dummy resources, with require-all=false set on the dummy resource set in that constraint.
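
A hedged sketch of that scheme with pcs, using illustrative names (ready_XY dummies, lv1..lv4, md_raid6) and assumed mdraid parameters:

    # One dummy per usable LV combination, ordered after its LVs
    # (require-all defaults to true inside each LV set)
    pcs resource create ready_12 ocf:heartbeat:Dummy
    pcs constraint order set lv1 lv2 sequential=false set ready_12
    pcs resource create ready_13 ocf:heartbeat:Dummy
    pcs constraint order set lv1 lv3 sequential=false set ready_13
    # ... same pattern for the other 9 combinations ...
    # The array starts as soon as any one dummy is up
    pcs resource create md_raid6 ocf:heartbeat:mdraid \
        mdadm_conf=/etc/mdadm.conf md_dev=/dev/md0
    pcs constraint order set ready_12 ready_13 ready_14 ready_23 ready_24 ready_34 \
        ready_123 ready_124 ready_134 ready_234 ready_1234 \
        sequential=false require-all=false \
        set md_raid6 setoptions kind=Mandatory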

The next trouble was that the colocation for LVs 1, 2, 3 and 4 implicitly adds an order constraint from LV 1 to LV 2, LV 2 to LV 3, and so on, which means that if for whatever reason LV 3 is stopped, then LV 4 is also stopped, even if nothing else blocks it. I solved this by removing the colocation constraint between the LV resources, adding another dummy resource, and creating 4 colocation constraints between this dummy resource and each LV.
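
Something along these lines (the anchor resource name and the colocation direction are my assumptions about the fix described above):

    pcs resource create lv_anchor ocf:heartbeat:Dummy
    for lv in lv1 lv2 lv3 lv4; do
        pcs constraint colocation add "$lv" with lv_anchor INFINITY
    done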

That problem seems to be fixed. There is more to solve still, but it is outside the scope of this question.
