
My goal is to have the primary OSD on a specific host (read-from-replica is not an option for non-RBD clients, so reads always hit the primary), and the replicas on any host, including the host already chosen, but never on the primary OSD itself. My current CRUSH rule is:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class nvme
device 1 osd.1 class ssd
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class ssd
device 5 osd.5 class nvme
device 6 osd.6 class ssd
device 7 osd.7 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host nanopc-cm3588-nas {
    id -3       # do not change unnecessarily
    id -4 class nvme        # do not change unnecessarily
    id -5 class ssd     # do not change unnecessarily
    id -26 class hdd        # do not change unnecessarily
    # weight 3.06104
    alg straw2
    hash 0  # rjenkins1
    item osd.0 weight 0.23288
    item osd.2 weight 0.23288
    item osd.5 weight 1.81940
    item osd.7 weight 0.77588
}
host mbpcp {
    id -7       # do not change unnecessarily
    id -8 class nvme        # do not change unnecessarily
    id -9 class ssd     # do not change unnecessarily
    id -22 class hdd        # do not change unnecessarily
    # weight 0.37560
    alg straw2
    hash 0  # rjenkins1
    item osd.3 weight 0.37560
}
host mba {
    id -10      # do not change unnecessarily
    id -11 class nvme       # do not change unnecessarily
    id -12 class ssd        # do not change unnecessarily
    id -23 class hdd        # do not change unnecessarily
    # weight 0.20340
    alg straw2
    hash 0  # rjenkins1
    item osd.4 weight 0.20340
}
host mbpsp {
    id -13      # do not change unnecessarily
    id -14 class nvme       # do not change unnecessarily
    id -15 class ssd        # do not change unnecessarily
    id -24 class hdd        # do not change unnecessarily
    # weight 0.37155
    alg straw2
    hash 0  # rjenkins1
    item osd.1 weight 0.18578
    item osd.6 weight 0.18578
}
root default {
    id -1       # do not change unnecessarily
    id -2 class nvme        # do not change unnecessarily
    id -6 class ssd     # do not change unnecessarily
    id -28 class hdd        # do not change unnecessarily
    # weight 4.01160
    alg straw2
    hash 0  # rjenkins1
    item nanopc-cm3588-nas weight 3.06104
    item mbpcp weight 0.37560
    item mba weight 0.20340
    item mbpsp weight 0.37157
}
chassis chassis-nanopc {
    id -16      # do not change unnecessarily
    id -20 class nvme       # do not change unnecessarily
    id -21 class ssd        # do not change unnecessarily
    id -27 class hdd        # do not change unnecessarily
    # weight 3.06104
    alg straw2
    hash 0  # rjenkins1
    item nanopc-cm3588-nas weight 3.06104
}
chassis chassis-others {
    id -17      # do not change unnecessarily
    id -18 class nvme       # do not change unnecessarily
    id -19 class ssd        # do not change unnecessarily
    id -25 class hdd        # do not change unnecessarily
    # weight 0.95056
    alg straw2
    hash 0  # rjenkins1
    item mbpcp weight 0.37560
    item mba weight 0.20340
    item mbpsp weight 0.37157
}
# rules
rule replicated_rule {
    id 0
    type replicated
    step take chassis-nanopc
    step chooseleaf firstn 1 type host
    step emit
    step take default
    step chooseleaf firstn 0 type osd
    step emit
}
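For completeness, I apply the edited map with the usual decompile/edit/recompile cycle (standard ceph/crushtool commands; the local file names are just placeholders):

ceph osd getcrushmap -o crush.bin      # dump the compiled map from the cluster
crushtool -d crush.bin -o crush.txt    # decompile into the text shown above
# ... edit crush.txt ...
crushtool -c crush.txt -o crush.new    # recompile the edited text
ceph osd setcrushmap -i crush.new      # inject it back into the cluster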

However, it resulted in a pg dump like this:

version 14099
stamp 2024-10-13T11:46:25.490783+0000
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN LAST_SCRUB_DURATION SCRUB_SCHEDULING OBJECTS_SCRUBBED OBJECTS_TRIMMED
6.3f 3385 0 0 3385 0 8216139409 0 0 1732 3000 1732 active+clean+remapped 2024-10-13T02:21:07.580486+0000 5024'13409 5027:39551 [5,5] 5 [5,4] 5 4373'10387 2024-10-12T09:46:54.412039+0000 1599'106 2024-10-09T15:41:52.360255+0000 0 2 periodic scrub scheduled @ 2024-10-13T17:41:52.579122+0000 2245 0
6.3e 3217 0 0 3217 0 7806374402 0 0 1819 1345 1819 active+clean+remapped 2024-10-13T03:36:53.629380+0000 5025'13549 5027:36882 [7,7] 7 [7,4] 7 4373'10667 2024-10-12T09:46:51.075549+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T13:27:11.454963+0000 2132 0
6.3d 3256 0 0 3256 0 7780755159 0 0 1733 3000 1733 active+clean+remapped 2024-10-13T02:21:46.947129+0000 5024'13609 5027:28986 [5,5] 5 [5,4] 5 4371'11218 2024-10-12T09:39:44.502516+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T14:12:17.856811+0000 2202 0

Note the [5,5] in the UP column: both replicas of those PGs map to osd.5, so the cluster stays stuck in active+clean+remapped. Is there any way I can achieve the goal stated above?
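(The duplicate pick can also be reproduced offline with crushtool's test mode, so it is not just a transient mapping; --num-rep 2 below is an assumption matching the two-entry UP sets in the dump above:)

crushtool -c crush.txt -o crush.new                                  # compile the map shown above
crushtool -i crush.new --test --rule 0 --num-rep 2 --show-mappings   # print the OSD set chosen for each sample input
# duplicate sets such as [5,5] appear directly in this output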

  • To have a better overview it would help to have the output of ceph osd tree as well. – eblock, Oct 17 at 6:47
