My goal is to have the primary OSD on a specific host (read replicas are not an option for non-RBD workloads), and the replicas on any host, including the host already chosen for the primary, as long as they do not land on the primary OSD itself. My current decompiled CRUSH map is:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class nvme
device 1 osd.1 class ssd
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class ssd
device 5 osd.5 class nvme
device 6 osd.6 class ssd
device 7 osd.7 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host nanopc-cm3588-nas {
	id -3 # do not change unnecessarily
	id -4 class nvme # do not change unnecessarily
	id -5 class ssd # do not change unnecessarily
	id -26 class hdd # do not change unnecessarily
	# weight 3.06104
	alg straw2
	hash 0 # rjenkins1
	item osd.0 weight 0.23288
	item osd.2 weight 0.23288
	item osd.5 weight 1.81940
	item osd.7 weight 0.77588
}
host mbpcp {
	id -7 # do not change unnecessarily
	id -8 class nvme # do not change unnecessarily
	id -9 class ssd # do not change unnecessarily
	id -22 class hdd # do not change unnecessarily
	# weight 0.37560
	alg straw2
	hash 0 # rjenkins1
	item osd.3 weight 0.37560
}
host mba {
	id -10 # do not change unnecessarily
	id -11 class nvme # do not change unnecessarily
	id -12 class ssd # do not change unnecessarily
	id -23 class hdd # do not change unnecessarily
	# weight 0.20340
	alg straw2
	hash 0 # rjenkins1
	item osd.4 weight 0.20340
}
host mbpsp {
	id -13 # do not change unnecessarily
	id -14 class nvme # do not change unnecessarily
	id -15 class ssd # do not change unnecessarily
	id -24 class hdd # do not change unnecessarily
	# weight 0.37155
	alg straw2
	hash 0 # rjenkins1
	item osd.1 weight 0.18578
	item osd.6 weight 0.18578
}
root default {
	id -1 # do not change unnecessarily
	id -2 class nvme # do not change unnecessarily
	id -6 class ssd # do not change unnecessarily
	id -28 class hdd # do not change unnecessarily
	# weight 4.01160
	alg straw2
	hash 0 # rjenkins1
	item nanopc-cm3588-nas weight 3.06104
	item mbpcp weight 0.37560
	item mba weight 0.20340
	item mbpsp weight 0.37157
}
chassis chassis-nanopc {
	id -16 # do not change unnecessarily
	id -20 class nvme # do not change unnecessarily
	id -21 class ssd # do not change unnecessarily
	id -27 class hdd # do not change unnecessarily
	# weight 3.06104
	alg straw2
	hash 0 # rjenkins1
	item nanopc-cm3588-nas weight 3.06104
}
chassis chassis-others {
	id -17 # do not change unnecessarily
	id -18 class nvme # do not change unnecessarily
	id -19 class ssd # do not change unnecessarily
	id -25 class hdd # do not change unnecessarily
	# weight 0.95056
	alg straw2
	hash 0 # rjenkins1
	item mbpcp weight 0.37560
	item mba weight 0.20340
	item mbpsp weight 0.37157
}
# rules
rule replicated_rule {
	id 0
	type replicated
	step take chassis-nanopc
	step chooseleaf firstn 1 type host
	step emit
	step take default
	step chooseleaf firstn 0 type osd
	step emit
}
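For reference, this is roughly the round trip I use to edit and re-apply the map (a sketch; the file names are just placeholders):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# edit the rule in crush.txt, then recompile and inject it
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new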
However, after applying this map, the pg dump looks like this:
version 14099
stamp 2024-10-13T11:46:25.490783+0000
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN LAST_SCRUB_DURATION SCRUB_SCHEDULING OBJECTS_SCRUBBED OBJECTS_TRIMMED
6.3f 3385 0 0 3385 0 8216139409 0 0 1732 3000 1732 active+clean+remapped 2024-10-13T02:21:07.580486+0000 5024'13409 5027:39551 [5,5] 5 [5,4] 5 4373'10387 2024-10-12T09:46:54.412039+0000 1599'106 2024-10-09T15:41:52.360255+0000 0 2 periodic scrub scheduled @ 2024-10-13T17:41:52.579122+0000 2245 0
6.3e 3217 0 0 3217 0 7806374402 0 0 1819 1345 1819 active+clean+remapped 2024-10-13T03:36:53.629380+0000 5025'13549 5027:36882 [7,7] 7 [7,4] 7 4373'10667 2024-10-12T09:46:51.075549+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T13:27:11.454963+0000 2132 0
6.3d 3256 0 0 3256 0 7780755159 0 0 1733 3000 1733 active+clean+remapped 2024-10-13T02:21:46.947129+0000 5024'13609 5027:28986 [5,5] 5 [5,4] 5 4371'11218 2024-10-12T09:39:44.502516+0000 0'0 2024-10-08T07:13:08.545820+0000 0 2 periodic scrub scheduled @ 2024-10-13T14:12:17.856811+0000 2202 0
Note the UP set [5,5]: as far as I can tell, CRUSH does not de-duplicate choices across separate take ... emit blocks, so the second step take default can pick osd.5 again, and the cluster stays stuck in the remapped state. Is there any way I can achieve the goal stated above?
I can share the output of ceph osd tree as well, if that helps.
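If it helps with reproducing this, the mapping can also be checked offline with crushtool; a sketch, assuming the compiled map is in crush.bin and that rule 0 with two replicas matches the pool (as the UP sets above suggest):

crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-mappings
# list only the results with fewer than num-rep distinct OSDs
crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-bad-mappings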