OceanStor Dorado
Storage CTO
The Cutting Edge of Storage Innovation
Primary Storage Leader in Gartner Magic Quadrant
2018 MQ for General-Purpose Arrays; 2019 MQ for Primary Storage; 2018 MQ for Solid-State Arrays
“Huawei has progressively become one of the leading providers of primary storage on the global stage.”
“Its external enterprise storage portfolio for primary storage workloads - OceanStor - spans all market segments”
“Huawei announced new versions of OceanStor Dorado6000 V3 and Dorado18000 V3 that support internal NVMe SSD”
“Huawei’s SmartVirtualization plus SmartMigration software enables users to nondisruptively migrate data from competitive external enterprise storage systems to OceanStor, or to migrate from an older OceanStor platform to a new OceanStor platform.”
— Quote by Gartner
OceanStor Dorado Product Portfolio
FlashLink ® - The Foundation of Evolution
Multi-Protocol Network Chip: Hi1822
• Support both FC and Ethernet
AI Chip: Ascend 310
• AI SoC for small scale training
Processing power requirement
• > 1 TeraFLOPS for real-time analytics
Real-time analytics
• Data Correlations; Data Similarity; Adaptive Optimization; Health Analytics; Data Temperature; Failure Prediction
Use case
• Intelligent Cache
• Smart QoS
• Intelligent Data Dedup
• ……
BMC Chip: Hi1710
• Troubleshooting accuracy 93%
Array Controller Chip: Kunpeng 920
• SPECint 930+, #1 performance ARM processor
• Processor embedded intelligent disk enclosure
SSD Controller Chip: Hi1812e
• Half the latency of previous model
Kunpeng® CPU - The Heart of New Storage
[Figure: 48-core CPU, with Submission Queue / Completion Queue pairs shown for the cores]
SmartMatrix - Symmetric A/A Controller Architecture
[Figure: two engines with eight shared frontends in total, connected through an RDMA network]
• Symmetric active/active controller with fully meshed topology
• Shared everything architecture from frontend, backend, to drive enclosure
• Persistent cache mirroring with max of 3 copies
• Non-disruptive firmware upgrade, IO hang-up time is limited within 1 second
• End to end NVMe support
• Backend RDMA network over 100Gb/s Ethernet
• SCM support for read acceleration*
OceanStor Dorado - New Gen of Mission Critical Storage
Every Second is Valuable
Time is Money
[Figure: bar chart of dollar amounts on an axis from $0 to $5,000,000; values shown include $4,300,000, $3,400,000, $2,400,000, $1,300,000, $1,240,000, $970,320 and $913,242]
One-Second Controller Failover
[Figure: controller failover comparison including Solution A and Solution B; each diagram shows FE, Ctrl and BE layers with an IOPS curve. Failover times shown, in order: 4S, 6S, 6S, 9S, >9S, 1S, 1S, 1S]
* The figures above are test results from the Huawei lab.
One-Second’s Magic - Shared Frontend Adapter
[Figure: a server connected to an engine through four shared frontends]
Multiple Controller Fault Tolerance
[Figure: multiple-controller fault tolerance diagrams built from FE (Front-End Adapter), Ctrl and BE blocks]
Best-of-Breed Reliability & Availability
Frontend adapter
• Solution A: can’t be shared between controllers in one engine.
• Solution B: can be shared to all the controllers in one engine.
• OceanStor Dorado: can be shared to all the controllers in one engine.
LUN ownership
• LUN has to be owned by a single controller.
• LUN ownership is eliminated.
SSD enclosure
• Solution A: can be shared by all controllers in one engine.
• Solution B: can be shared by all controllers in one engine.
• OceanStor Dorado: can be shared by all controllers in multiple engines.
[Figure: Ctrl and BE layout of each solution]
Firmware Non-Disruptive Upgrade (NDU)
[Figure: storage firmware modules: Service, Management, Data, Protocol, Control, Inter-Communication, ……; upgrade marked 1S]
• 94% of firmware components
• Modular design
• Online upgrade
• One-second to active
• No connection loss with server
• Transparent to application
Intelligent DAE - The SSD Shelf with Processing Power
[Figure: engine with four FE / Ctrl / BE stacks above a DAE with two Intelligent DAE Controllers; the Ctrl blocks are labelled Data Compression, Erasure Coding and Data Rebuilding]
Storage Controller Offloading
• Each DAE has two controllers, and each controller has its own processor, cache and adapter.
• The DAE controller takes over some workloads from the array controller, including:
  - Data rebuilding
  - Erasure Coding (EC)*
  - Data compression*
With its own controllers, the DAE is more intelligent than before. This distributed computing design halves the data rebuilding time and reduces the performance impact (max. IOPS) on the array controller from 15% to 5%. Data rebuilding bandwidth increases from 80MB/s to 200MB/s while the array controller’s CPU utilization remains at 70%.
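As a rough illustration of what the two rebuild bandwidth figures mean in practice, the sketch below estimates how long one drive would take to rebuild. The 3.84 TB drive size, the assumption that the drive is completely full, and the constant bandwidth are all illustrative assumptions, not results from the rebuild test.

```python
# Illustrative arithmetic only; drive size and "full drive" are assumptions.
DRIVE_BYTES = 3.84e12          # assumed 3.84 TB SSD, decimal terabytes

def rebuild_hours(bandwidth_mb_s: float, data_bytes: float = DRIVE_BYTES) -> float:
    """Hours needed to rebuild data_bytes at a constant bandwidth in MB/s."""
    seconds = data_bytes / (bandwidth_mb_s * 1e6)
    return seconds / 3600

before = rebuild_hours(80)     # rebuild bandwidth quoted as the starting point
after = rebuild_hours(200)     # rebuild bandwidth quoted with DAE offloading
print(f"80 MB/s: {before:.1f} h, 200 MB/s: {after:.1f} h")
```

Under these assumptions the rebuild drops from roughly 13 hours to a little over 5 hours.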
Comprehensive HA/DR Solutions
[Figure: HA/DR topologies. HyperMetro: synchronous mirroring between two production storage systems over an IP network, serving a VMware vSphere / FusionSphere cluster, with a quorum server at standby Site C. Three-site layouts with sites A, B and C linked over WAN, combining synchronous or HyperMetro links with asynchronous links, in parallel and star arrangements.]
More Robust Storage HA Cluster
[Figure: nine scenarios (Scenario #1 to Scenario #9) for a storage HA cluster with a storage witness/quorum, compared with Solution A]
Extreme Performance Experience
DB Acceleration
• Transaction Per Second: 57,000 vs. 11,500
• 3.84TB SSD*40, SwingBench OE2 transaction generator
VM Delivery
• 1.5 Minutes vs. 52 Minutes
• 100 VM clone, 50GB each
VDI Support
• 7200 VDI vs. 1500 VDI
• 3.84TB SSD*100 with data reduction
Test Case: Mixed workload, 8K, 7:3/1ms average latency/8x LUN, 32x outstanding
[Figure: chart labels 10%, 10%, 0%, 43.4%, 78.9%, 0%]
End-to-End Load Balancing
[Figure: I/O arriving at four storage controllers that share a Global Cache, each with a backend adapter]
Global Cache
• Write I/O requests for a single LUN can be placed into the cache space of multiple storage controllers.
• To improve the read cache hit ratio, a storage controller can place prefetched data in the global cache, where it can serve potential read requests from any front-end link.
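One way to picture the two global cache points is the minimal sketch below. The slice size, the modulo placement rule and the class and function names are hypothetical; the slides do not describe the actual placement algorithm.

```python
# Hypothetical sketch: slice size, controller count and the modulo placement
# rule are assumptions, not the product's actual algorithm.
from collections import Counter

SLICE_BLOCKS = 1024      # assumed number of blocks per slice
NUM_CONTROLLERS = 4      # four controllers, as drawn in the figure

def cache_owner(lun_id: int, lba: int) -> int:
    """Pick the controller whose cache partition holds this block."""
    slice_index = lba // SLICE_BLOCKS
    return (lun_id + slice_index) % NUM_CONTROLLERS

class GlobalCache:
    """One cache partition per controller, reachable from every front-end link."""
    def __init__(self) -> None:
        self.partitions = [dict() for _ in range(NUM_CONTROLLERS)]

    def write(self, lun_id, lba, data):
        self.partitions[cache_owner(lun_id, lba)][(lun_id, lba)] = data

    def read(self, lun_id, lba):
        # Any controller can serve the read: it looks up the owning partition.
        return self.partitions[cache_owner(lun_id, lba)].get((lun_id, lba))

cache = GlobalCache()
slice_starts = range(0, 8 * SLICE_BLOCKS, SLICE_BLOCKS)   # eight slices of LUN 0
for lba in slice_starts:
    cache.write(0, lba, f"block-{lba}")
usage = Counter(cache_owner(0, lba) for lba in slice_starts)
print(dict(usage))
```

In this toy model the eight slices of a single LUN land evenly on all four cache partitions, and a read arriving on any link finds the data through the same lookup.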
CPU Resource Dynamic Scheduling
[Figure: slices #0 to #9 being processed by the cores of three storage controllers; core groups carry labels such as I/O Read & Write, Data Flushing, Data Reduction and I/O Write]
• CPU Core Grouping
  CPU cores are divided into multiple groups, and each group is assigned a specific job.
• Dynamic Scheduling
  Higher-priority jobs can acquire more cores from shared core groups.
• Workload Isolation
  Each CPU core has its own I/O requests to process, which avoids interlock.
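The grouping and dynamic scheduling ideas can be sketched as follows. The group names, core numbers, priorities and the `assign_shared_cores` helper are hypothetical and only illustrate the principle of handing shared cores to higher-priority jobs first.

```python
# Hypothetical sketch of core grouping with a shared pool; group names,
# core counts and priorities are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class CoreGroup:
    job: str
    priority: int                 # larger number = more important job
    cores: list = field(default_factory=list)

def assign_shared_cores(groups, shared_cores, demand):
    """Hand out shared cores, highest-priority job first.

    demand maps job name -> number of extra cores the job would like.
    Returns the shared cores that nobody claimed.
    """
    free = list(shared_cores)
    for group in sorted(groups, key=lambda g: g.priority, reverse=True):
        wanted = demand.get(group.job, 0)
        granted, free = free[:wanted], free[wanted:]
        group.cores.extend(granted)
    return free

groups = [
    CoreGroup("io_read_write", priority=3, cores=[0, 1]),
    CoreGroup("data_flushing", priority=2, cores=[2, 3]),
    CoreGroup("data_reduction", priority=1, cores=[4, 5]),
]
left = assign_shared_cores(groups, shared_cores=[6, 7, 8],
                           demand={"io_read_write": 2, "data_reduction": 2})
for g in groups:
    print(g.job, g.cores)
```

Here the I/O group gets both extra cores it asks for, while the lower-priority data reduction group receives only the single shared core that remains.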
Powered by NVMe and RoCE
[Figure: I/O path from the server over RoCE/FC to the engine (frontend adapters, storage controllers, backend adapters) and on to the DAE; latency labels 50us, 30us, 100us and 30us]
The Latest Protocol & Network Standard
• The latest protocols (NVMe & RoCE v2) reduce latency by 50%.
Front-End/Back-End Adapter Protocol Offloading
• A 10% latency reduction comes from:
  ‐ Self-developed TOE frontend adapter chip.
  ‐ ASIC IO balancing/distribution
Intelligent DAE and Self-Developed SSD
• Read priority technology: Read requests on SSDs are preferentially executed to respond to hosts in a timely manner. The latency in hybrid scenarios is reduced by 20%.
• SAS DAE connection multiplexing technology improves performance by 30%.
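The read priority idea can be illustrated with a small priority queue. This is a minimal sketch: the two-level priority, the FIFO tie-break and the `ReadPriorityQueue` class are assumptions, not the SSD's actual scheduling policy.

```python
# Hypothetical sketch of read-priority dispatch; the two-level priority and
# FIFO tie-break are assumptions, not the SSD firmware's actual policy.
import heapq
import itertools

READ, WRITE = 0, 1               # lower value is dispatched first

class ReadPriorityQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # preserves arrival order within a class

    def submit(self, kind, request):
        heapq.heappush(self._heap, (kind, next(self._seq), request))

    def dispatch(self):
        kind, _, request = heapq.heappop(self._heap)
        return request

q = ReadPriorityQueue()
q.submit(WRITE, "write A")
q.submit(READ, "read B")
q.submit(WRITE, "write C")
q.submit(READ, "read D")
order = [q.dispatch() for _ in range(4)]
print(order)   # ['read B', 'read D', 'write A', 'write C']
```

Both reads are served before the writes that arrived earlier, which is the behaviour the slide describes for mixed read/write workloads. A production scheduler would presumably also bound how long writes can be delayed.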
Load Balancing - Core Demand of Mission Critical Application
Bank X Case Study
[Figure: database with 24 active logs (Activelog #1 to #24) on a storage array drawn as DKC with four FED, four VSD and four BED, plus DKU]
Database Log Switching
• The customer, in the financial services industry (FSI), was using DB2 for its core banking application. 24 active logs were used in circular mode, one log per hour.
• In each hour, only one active log was busy.
LUN Ownership
• The customer enabled “DB2 full logging” for potential problem analysis, so the workload became much higher than before.
• The storage then ran into performance issues, because the mapping between active logs and storage controllers was fixed (also known as LUN ownership).
• The I/O workload on a specific storage controller could not be shared by other controllers, and the unbalanced workload led to a performance bottleneck.
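The bottleneck in this case study can be reproduced with a toy model. The controller count, the round-robin mapping of logs to controllers and the load unit are assumptions made for illustration; only the 24 hourly logs come from the case study.

```python
# Hypothetical model of the case study: the controller count, the round-robin
# log-to-controller mapping and the load unit are assumptions for illustration.
NUM_LOGS = 24
NUM_CONTROLLERS = 4
LOAD_PER_HOUR = 100            # arbitrary units written to the busy log each hour

def peak_load_with_ownership():
    """Each log's LUN is owned by one controller, which takes all of its I/O."""
    peak = 0
    for hour in range(NUM_LOGS):
        load = [0] * NUM_CONTROLLERS
        owner = hour % NUM_CONTROLLERS      # fixed mapping: log -> controller
        load[owner] += LOAD_PER_HOUR
        peak = max(peak, max(load))
    return peak

def peak_load_without_ownership():
    """The busy log's I/O is spread evenly across all controllers."""
    return LOAD_PER_HOUR / NUM_CONTROLLERS

print(peak_load_with_ownership(), peak_load_without_ownership())   # 100 25.0
```

Because only one log is busy per hour, a fixed owner always carries the full load while the other controllers sit idle; spreading the same I/O across four controllers cuts the per-controller peak to a quarter in this model.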
Processor-Level Load Balancing
Solution B
[Figure: FE, Ctrl, Proc and BE layers over two SSD enclosures: DKU owned by DKC #0 and DKU owned by DKC #1]
• Owner controller has to take most of the workload.
• The 2nd engine cannot be involved in load balancing.
• SSD enclosures owned by the 2nd engine cannot be shared with the 1st engine, so write I/O flushing is constrained within one engine.

Shared SSD enclosures
[Figure: FE, Ctrl, Proc and BE layers over two shared SSD enclosures]
• Workload can be spread across all the controllers of the 1st and 2nd engines at processor level.
• SSD enclosures are shared between engines via the RDMA network, and data can be flushed from both engines.
Business Always-On
Business Always-On with Lower TCO
[Figure: two cost charts comparing Traditional Solution and Huawei Solution; labels 70% max and 90+%]
• FlashEver: Replace Controller Module Only (No Downtime)
• FlashEver: Intermix of various gens of DAE (No Downtime)
• No Data Migration
• Less Data Migration
• No Cabling
Non-Disruptive Tech. Refresh
• Supports non-disruptive controller upgrades, including to the next several generations over 10 years
• Tech refresh of existing assets to obtain the advantages of the latest technology

• Up to 128 controllers
• Supports OceanStor Dorado and the following generations
• Can mix different gens of OceanStor Dorado in one federation cluster
• Supports non-disruptive data mobility and online node reorganization

• Virtualizes third-party storage by taking over the access path
• Reuses old storage to protect customer investment
• Smoothly cuts the business over to run on OceanStor Dorado and the following generations
FlashEver & Storage Federation Use Case
Incomparable Flexibility for The Next Decade
[Figure: cost by year for Traditional Solution and Huawei Solution, with Initial Purchase, Upgrade and Tech Refresh steps; columns headed Solution A and Solution B]
• Only available for VSP G1000 upgrade to VSP G1500, a temporary design.
• Replace old storage controller module only, protect investment as much as possible.
Wrap Up
Strong Capability of Intelligent Chips Development
• Array Controller
• BMC
• Multi-Protocol Chip (FE/BE Adapter)
• SSD Controller
• AI Chip
Symmetric A/A Storage Controller Architecture
• Shared Frontend & Backend Adapter
• Fully-Meshed Topology
• No LUN-Ownership
• Cross Engine Load Balancing
Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.