My goal is to have the primary OSD on a specific host (read replicas are not an option for non-RBD workloads), and the replicas on any host, including the host already chosen for the primary, as long as they do not land on the primary OSD itself. My current decompiled CRUSH map is:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class nvme
device 1 osd.1 class ssd
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class ssd
device 5 osd.5 class nvme
device 6 osd.6 class ssd
device 7 osd.7 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host nanopc-cm3588-nas {
	id -3 # do not change unnecessarily
	id -4 class nvme # do not change unnecessarily
	id -5 class ssd # do not change unnecessarily
	id -26 class hdd # do not change unnecessarily
	# weight 3.06104
	alg straw2
	hash 0 # rjenkins1
	item osd.0 weight 0.23288
	item osd.2 weight 0.23288
	item osd.5 weight 1.81940
	item osd.7 weight 0.77588
}
host mbpcp {
	id -7 # do not change unnecessarily
	id -8 class nvme # do not change unnecessarily
	id -9 class ssd # do not change unnecessarily
	id -22 class hdd # do not change unnecessarily
	# weight 0.37560
	alg straw2
	hash 0 # rjenkins1
	item osd.3 weight 0.37560
}
host mba {
	id -10 # do not change unnecessarily
	id -11 class nvme # do not change unnecessarily
	id -12 class ssd # do not change unnecessarily
	id -23 class hdd # do not change unnecessarily
	# weight 0.20340
	alg straw2
	hash 0 # rjenkins1
	item osd.4 weight 0.20340
}
host mbpsp {
	id -13 # do not change unnecessarily
	id -14 class nvme # do not change unnecessarily
	id -15 class ssd # do not change unnecessarily
	id -24 class hdd # do not change unnecessarily
	# weight 0.37155
	alg straw2
	hash 0 # rjenkins1
	item osd.1 weight 0.18578
	item osd.6 weight 0.18578
}
root default {
	id -1 # do not change unnecessarily
	id -2 class nvme # do not change unnecessarily
	id -6 class ssd # do not change unnecessarily
	id -28 class hdd # do not change unnecessarily
	# weight 4.01160
	alg straw2
	hash 0 # rjenkins1
	item nanopc-cm3588-nas weight 3.06104
	item mbpcp weight 0.37560
	item mba weight 0.20340
	item mbpsp weight 0.37157
}
chassis chassis-nanopc {
	id -16 # do not change unnecessarily
	id -20 class nvme # do not change unnecessarily
	id -21 class ssd # do not change unnecessarily
	id -27 class hdd # do not change unnecessarily
	# weight 3.06104
	alg straw2
	hash 0 # rjenkins1
	item nanopc-cm3588-nas weight 3.06104
}
chassis chassis-others {
	id -17 # do not change unnecessarily
	id -18 class nvme # do not change unnecessarily
	id -19 class ssd # do not change unnecessarily
	id -25 class hdd # do not change unnecessarily
	# weight 0.95056
	alg straw2
	hash 0 # rjenkins1
	item mbpcp weight 0.37560
	item mba weight 0.20340
	item mbpsp weight 0.37157
}
# rules
rule replicated_rule {
	id 0
	type replicated
	step take chassis-nanopc
	step chooseleaf firstn 1 type host
	step emit
	step take default
	step chooseleaf firstn 0 type osd
	step emit
}
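For reference, this is roughly the round trip I use to edit and re-apply the map (a sketch; the file names are just placeholders):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# edit the rule in crush.txt, then recompile and inject it
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new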
However, after applying this map, the pg dump looks like this:
version 14099
stamp 2024-10-13T11:46:25.490783+0000
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN LAST_SCRUB_DURATION SCRUB_SCHEDULING OBJECTS_SCRUBBED OBJECTS_TRIMMED
6.3f 3385 0 0 3385 0 8216139409 0 0 1732 3000 1732 active+clean+remapped 2024-10-13T02:21:07.580486+0000 5024'13409 5027:39551 [5,5] 5 [5,4] 5 4373'10387 2024-10-12T09:46:54.412039+0000 1599'106 2024-10-09T15:41:52.360255+0000 0 2 periodic scrub scheduled @ 2024-10-13T17:41:52.579122+0000 2245 0
6.3e 3217 0 0 3217 0 7806374402 0 0 1819 1345 1819 active+clean+remapped 2024-10-13T03:36:53.629380+0000 5025'13549 5027:36882 [7,7] 7 [7,4] 7 4373'10667 2024-10-12T09:46:51.075549+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T13:27:11.454963+0000 2132 0
6.3d 3256 0 0 3256 0 7780755159 0 0 1733 3000 1733 active+clean+remapped 2024-10-13T02:21:46.947129+0000 5024'13609 5027:28986 [5,5] 5 [5,4] 5 4371'11218 2024-10-12T09:39:44.502516+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T14:12:17.856811+0000 2202 0
Note the UP set [5,5]: as far as I can tell, CRUSH does not de-duplicate choices across separate take ... emit blocks, so the second step take default can pick osd.5 again, and the cluster stays stuck in the remapped state. Is there any way I can achieve the goal stated above?
I can share the output of ceph osd tree as well, if that helps.
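If it helps with reproducing this, the mapping can also be checked offline with crushtool; a sketch, assuming the compiled map is in crush.bin and that rule 0 with two replicas matches the pool (as the UP sets above suggest):

crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-mappings
# list only the results with fewer than num-rep distinct OSDs
crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-bad-mappings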