OceanStor Dorado

Born for Mission Critical Business

Storage CTO
The Cutting Edge of Storage Innovation

Primary Storage Leader in Gartner Magic Quadrant
Gartner Magic Quadrants: 2018 MQ for General-Purpose Arrays; 2019 MQ for Primary Storage; 2018 MQ for Solid-State Arrays

“Huawei has progressively become one of the leading providers of primary storage on the global stage.”
“Its external enterprise storage portfolio for primary storage workloads - OceanStor - spans all market segments”
“Huawei announced new versions of OceanStor Dorado6000 V3 and Dorado18000 V3 that support internal NVMe SSD”
“Huawei’s SmartVirtualization plus SmartMigration software enables users to nondisruptively migrate data from competitive external enterprise storage systems to OceanStor, or to migrate from an older OceanStor platform to a new OceanStor platform.”
— Quote by Gartner

The Cutting Edge of Storage Innovation

OceanStor Dorado Product Portfolio

Entry-level, mid-range and high-end models:

Model                               3000     5000        6000          8000               18000
Height / controllers per engine     2U/2C    2U/2C       2U/2C         4U/4C              4U/4C
Controller expansion                2-16     2-16        2-16          2-16               2-32
Cores per controller                24       64          96            128                192
Maximum disks                       1,200    1,600       2,400         3,200              6,400
Cache per dual controller (GB)      192      256/512     512/1,024     512/1,024/2,048    512/1,024/2,048
Front-end ports (all models)        8/16/32G FC; 1/10/25/40/100G Ethernet
Back-end ports                      SAS 3.0; SAS 3.0/100G Ethernet (model-dependent)

FlashLink® - The Foundation of Evolution

• Multi-Protocol Network Chip: Hi1822. Supports both FC and Ethernet.
• AI Chip: Ascend 310. AI SoC for small-scale training. Capability: FP16 8 TeraFLOPS; INT8 16 TeraOPS; max power 8 W.
• BMC Chip: Hi1710. Troubleshooting accuracy of 93%.
• Array Controller Chip: Kunpeng 920. SPECint 930+, the #1-performing ARM processor; the processor is also embedded in the intelligent disk enclosure.
• SSD Controller Chip: Hi1812e. Half the latency of the previous model.

Processing power requirement: > 1 TeraFLOPS for real-time analytics.
Real-time analytics: data correlations; data similarity; adaptive optimization; health analytics; data temperature; failure prediction.
Use cases: Intelligent Cache; Smart QoS; Intelligent Data Dedup; ……

Kunpeng® CPU - The Heart of New Storage

[Figure: a 48-core processor, with cores shown driving their own Submission Queue / Completion Queue pairs.]
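Each core in the figure drives its own NVMe submission/completion queue pair, which is what lets many cores issue I/O without contending for one shared queue. The following Python sketch is a minimal simulation of that queue-pair model under stated assumptions: the class and method names, the string commands and the simulated `device_process` step are illustrative inventions, not Huawei or NVMe driver APIs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """One NVMe-style submission queue (SQ) and completion queue (CQ), owned by one core."""
    sq: deque = field(default_factory=deque)
    cq: deque = field(default_factory=deque)


class PerCoreQueues:
    """Hypothetical model: one queue pair per core, so no cross-core locking is needed."""

    def __init__(self, num_cores: int) -> None:
        self.pairs = [QueuePair() for _ in range(num_cores)]

    def submit(self, core: int, command: str) -> None:
        # The issuing core writes only to its own SQ.
        self.pairs[core].sq.append(command)

    def device_process(self, core: int) -> int:
        # Simulated device: drain this core's SQ and post completions to the same core's CQ.
        qp = self.pairs[core]
        completed = 0
        while qp.sq:
            qp.cq.append(f"done:{qp.sq.popleft()}")
            completed += 1
        return completed


queues = PerCoreQueues(num_cores=48)
queues.submit(3, "read lba=0x10")
queues.submit(3, "write lba=0x20")
print(queues.device_process(3))  # 2
print(list(queues.pairs[3].cq))  # ['done:read lba=0x10', 'done:write lba=0x20']
print(len(queues.pairs[4].cq))   # 0: other cores' queues are untouched
```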

SmartMatrix - Symmetric A/A Controller Architecture
[Figure: two engines, each with four storage controllers that share frontend and backend adapters; the backend connects over an RDMA network to four DAEs, each with two intelligent DAE controllers.]

• Symmetric active/active controllers with a fully meshed topology
• Shared-everything architecture, from frontend and backend to drive enclosure
• Persistent cache mirroring with a maximum of 3 copies
• Non-disruptive firmware upgrade, with I/O hang-up time limited to within 1 second
• End-to-end NVMe support
• Backend RDMA network over 100 Gb/s Ethernet
• SCM support for read acceleration*

OceanStor Dorado - New Gen of Mission Critical Storage

                                  OceanStor Dorado (2017)    OceanStor Dorado (2019)
Max. performance                  7M IOPS                    20M IOPS
Max. storage controllers          16                         32
NVMe support                      Back-end                   End-to-end
Backend network                   SAS/PCIe                   100Gb RoCE v2
SSD form factor                   25 drives/2U shelf         36 drives/2U shelf
SSD shelf                         Standard DAE (no CPU)      Intelligent DAE
Data deduplication                Fixed-length               Fixed & variable length
Controller fault tolerance        1 of 2                     7 of 8
Engine fault tolerance            N/A                        1 of 2
Artificial intelligence (AI)      N/A                        AI module with Ascend chip


Commitment to Business Continuity

Every Second is Valuable

[Figure: typical timeout at each layer of the I/O stack: Application 1XX seconds; Operating System XX seconds; Host Bus Adapter (HBA) XX seconds; Network 1X seconds; Storage X seconds.]

Each second of timeout for mission-critical business could mean:
• Tens of percent of transactions lost
• Thousands of dollars of profit lost
• Tens of thousands of unsatisfied customers (especially on Black Friday)

Some large FSI enterprises require storage to keep timeouts down to 1 SECOND, e.g.:
• Industrial and Commercial Bank of China
• Itaú Unibanco
• China Construction Bank
• Agricultural Bank of China
• ……

Time is Money

Profit per Hour of the Top 10 Banks

[Chart: hourly profit of the top 10 banks: $5,000,000; $4,300,000; $3,400,000; $3,000,000; $2,400,000; $2,500,000; $1,300,000; $1,240,000; $970,320; $913,242.]

One-Second Controller Failover

[Figure: IOPS over time during a controller failure for Solution A, Solution B and OceanStor Dorado (FE = front-end adapter, Ctrl = controller, BE = back-end adapter). I/O stalls of 4 s, 6 s, 6 s, 9 s and >9 s are shown for the competing solutions, versus 1 s in every OceanStor Dorado case.]

* The above figures refer to test results from the Huawei lab.
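As a back-of-the-envelope check of "time is money", the sketch below turns the failover stalls above into forgone profit, using the highest hourly figure from the Top 10 banks chart. It assumes profit accrues evenly through the hour, which real transaction peaks will not follow, so treat the results as rough orders of magnitude.

```python
SECONDS_PER_HOUR = 3600


def profit_lost(profit_per_hour: float, stall_seconds: float) -> float:
    """Profit forgone during an I/O stall, assuming profit accrues evenly across the hour."""
    return profit_per_hour / SECONDS_PER_HOUR * stall_seconds


# Highest profit-per-hour figure in the Top 10 banks chart.
TOP_BANK_PROFIT_PER_HOUR = 5_000_000

for stall in (1, 4, 6, 9):
    print(f"{stall} s stall: ${profit_lost(TOP_BANK_PROFIT_PER_HOUR, stall):,.0f}")
# 1 s stall: $1,389
# 4 s stall: $5,556
# 6 s stall: $8,333
# 9 s stall: $12,500
```

At roughly $1,400 per second for the top bank, the gap between a 1-second and a 9-second failover is on the order of $11,000 per event.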

One-Second’s Magic - Shared Frontend Adapter
[Figure: a server connected over an FC/Ethernet network to an engine whose shared frontend adapters reach all four storage controllers through the back plane; shared backend adapters sit below the controllers.]

• The frontend adapter maintains the connection with the server independently; the storage controller is not involved.
• Normally, each I/O is directed to one storage controller through the back plane.
• If a controller fails, its I/O is redirected to the surviving controllers while the connection between the frontend adapter and the server stays up as normal, so the server is not aware of the failure.
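The failover path described above can be modeled in a few lines: the adapter keeps the host link, normally sends each I/O to one preferred controller, and only the I/O that would have gone to a failed controller is redirected. This is a hypothetical Python model; `SharedFrontendAdapter`, the LBA-modulo routing rule and the controller names are assumptions for illustration, not the adapter's real dispatch logic.

```python
from dataclasses import dataclass, field


@dataclass
class SharedFrontendAdapter:
    """Hypothetical model: the adapter, not a controller, holds the host connection."""
    controllers: list
    failed: set = field(default_factory=set)
    host_link_up: bool = True  # controller failures never change this

    def route(self, lba: int) -> str:
        # Normal case: each I/O goes to one preferred controller via the back plane.
        preferred = self.controllers[lba % len(self.controllers)]
        if preferred not in self.failed:
            return preferred
        # Failover: redirect to a surviving controller; the host link stays up.
        survivors = [c for c in self.controllers if c not in self.failed]
        if not survivors:
            raise RuntimeError("no surviving controller")
        return survivors[lba % len(survivors)]

    def fail(self, controller: str) -> None:
        self.failed.add(controller)


fe = SharedFrontendAdapter(["A", "B", "C", "D"])
print(fe.route(1))      # B
fe.fail("B")
print(fe.route(1))      # C (redirected)
print(fe.route(0))      # A (unaffected)
print(fe.host_link_up)  # True
```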

Multiple Controller Fault Tolerance

Legend: FE = Front-End Adapter; Ctrl = Controller; BE = Back-End Adapter; data copies in cache are marked.
[Figure: three configurations of two engines (eight controllers) attached to DAEs, illustrating tolerance of:]
• Simultaneous failure of any 2 of 8 controllers
• Failure of 1 of 2 engines
• Continuous failure of 7 controllers, one after another
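One way to see why three cache copies give this tolerance: if each cached page lives on three distinct controllers spread across both engines, any two controller failures, or the loss of a whole engine, still leave at least one copy. The Python check below brute-forces every 2-of-8 failure combination for a hypothetical placement rule (`place_copies` alternates engines; the real placement policy is not described in this deck).

```python
from itertools import combinations

# Hypothetical layout: 2 engines x 4 controllers, as in the figure.
ENGINES = {0: ["E0-A", "E0-B", "E0-C", "E0-D"], 1: ["E1-A", "E1-B", "E1-C", "E1-D"]}
CONTROLLERS = [c for ctrls in ENGINES.values() for c in ctrls]


def place_copies(page_id: int, copies: int = 3) -> list:
    """Place cache copies on distinct controllers, alternating engines (illustrative policy)."""
    per_engine = len(ENGINES[0])
    first_engine, slot = page_id % 2, page_id % per_engine
    return [ENGINES[(first_engine + i) % 2][(slot + i) % per_engine] for i in range(copies)]


def survives(placement: list, failed: set) -> bool:
    """Data is safe if at least one copy sits on a healthy controller."""
    return any(c not in failed for c in placement)


def tolerates_any(k: int, copies: int, pages=range(64)) -> bool:
    """True if every page survives every combination of k simultaneous controller failures."""
    return all(
        survives(place_copies(p, copies), set(down))
        for p in pages
        for down in combinations(CONTROLLERS, k)
    )


def tolerates_engine_loss(copies: int, pages=range(64)) -> bool:
    """True if every page survives the loss of either whole engine."""
    return all(survives(place_copies(p, copies), set(ENGINES[e])) for p in pages for e in ENGINES)


print(tolerates_any(2, copies=3))       # True
print(tolerates_any(2, copies=2))       # False
print(tolerates_engine_loss(copies=3))  # True
```

The continuous-failure case (7 controllers lost one after another) additionally depends on re-creating copies after each failure, which this sketch does not model.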

Best-of-Breed Reliability & Availability
Solution A
• Frontend adapters can't be shared between the controllers in one engine.
• A LUN has to be owned by a single controller.
• An SSD enclosure can be shared by all controllers in one engine.

Solution B
• Frontend adapters can be shared by all the controllers in one engine (DKC).
• An SSD enclosure (DKU) can be shared by all controllers in one engine.

OceanStor Dorado
• Frontend adapters can be shared by all the controllers in one engine.
• LUN ownership is eliminated.
• An SSD enclosure can be shared by all controllers in multiple engines.

[Figure: side-by-side two-engine layouts of FE, Ctrl and BE for each solution; Solution B uses DKC controller chassis and DKU drive enclosures, the others use engines and DAEs.]

Firmware Non-Disruptive Upgrade (NDU)

[Figure: storage firmware split into service components (data, management, protocol, control, inter-communication, ……) running on top of the kernel.]

• Service components (94% of firmware components): modular design; online upgrade; active again within one second; no connection loss with the server; transparent to applications.
• Kernel (6% of firmware components): rolling upgrade.
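The kernel's rolling upgrade can be pictured as upgrading one controller at a time and letting it rejoin before the next one goes offline, so the remaining controllers keep serving I/O throughout. A minimal Python sketch, assuming a hypothetical `Controller` record and a `min_online` safety threshold that the deck does not specify:

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    version: str
    online: bool = True


def rolling_upgrade(controllers: list, new_version: str, min_online: int) -> tuple:
    """Upgrade one controller at a time; refuse if that would leave too few online.

    Returns the upgrade order and the lowest number of online controllers observed.
    """
    order, lowest_online = [], len(controllers)
    for ctrl in controllers:
        if ctrl.version == new_version:
            continue  # already upgraded
        others_online = sum(1 for c in controllers if c.online and c is not ctrl)
        if others_online < min_online:
            raise RuntimeError(f"upgrading {ctrl.name} would leave {others_online} online")
        ctrl.online = False         # peers take over this controller's I/O
        lowest_online = min(lowest_online, others_online)
        ctrl.version = new_version  # kernel image replaced while offline
        ctrl.online = True          # rejoin before touching the next controller
        order.append(ctrl.name)
    return order, lowest_online


cluster = [Controller(f"C{i}", "v1") for i in range(4)]
print(rolling_upgrade(cluster, "v2", min_online=3))  # (['C0', 'C1', 'C2', 'C3'], 3)
print({c.version for c in cluster})                  # {'v2'}
```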

Intelligent DAE - The SSD Shelf with Processing Power

[Figure: an engine's storage controllers offloading data compression, erasure coding and data rebuilding to the two intelligent DAE controllers.]

Storage Controller Offloading
• Each DAE has two controllers, and each controller has its own processor, cache and adapter.
• The DAE controller takes over some workloads from the array controller, including:
  - Data rebuilding
  - Erasure Coding (EC)*
  - Data compression*

With the help of its controllers, the DAE is more intelligent than ever. This distributed computing design halves data rebuilding time and reduces the performance impact on the array controller (max. IOPS) from 15% to 5%; the data rebuilding bandwidth increases from 80 MB/s to 200 MB/s while the array controller's CPU utilization remains at 70%.

* These workloads, along with garbage collection, will be available in the near future.
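Rebuilding and EC both come down to recomputing lost data from the surviving data. The simplest erasure code, a single XOR parity strip (RAID 5 style), shows the idea; RAID-TP, mentioned elsewhere in this deck, tolerates more failures with additional parity and is not what this sketch implements. The `rebuild_hours` helper is a throughput-only lower bound using the 80 MB/s and 200 MB/s figures quoted above and an example 3.84 TB SSD; measured rebuild time also depends on other factors.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(strips: list) -> bytes:
    """Single XOR parity over equal-length data strips (RAID 5-style, one-failure tolerance)."""
    return reduce(xor_bytes, strips)


def rebuild_strip(surviving: list, parity: bytes) -> bytes:
    """Recover the single missing strip by XOR-ing the parity with every surviving strip."""
    return reduce(xor_bytes, surviving, parity)


def rebuild_hours(capacity_tb: float, bandwidth_mb_s: float) -> float:
    """Throughput-only rebuild time, in decimal units (1 TB = 1,000,000 MB)."""
    return capacity_tb * 1_000_000 / bandwidth_mb_s / 3600


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = encode_parity(data)  # equals b"\xee\x22"
assert rebuild_strip([data[0], data[2]], parity) == data[1]

print(f"{rebuild_hours(3.84, 80):.1f} h")   # 13.3 h
print(f"{rebuild_hours(3.84, 200):.1f} h")  # 5.3 h
```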

Comprehensive HA/DR Solutions

[Figure: HA/DR topologies. HyperMetro (active-active) between production storage at Site A and Site B serves Oracle RAC, VMware vSphere, FusionSphere and other clusters over an FC/IP network, with a quorum server at Site C. Three-data-center (3DC) options combine synchronous replication (HyperMetro or synchronous mirroring) between sites A and B with asynchronous replication over WAN to site C, in Serial, Parallel and Star topologies; the Star topology keeps a standby link.]
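HyperMetro's quorum server at Site C exists to prevent split-brain: when the link between Sites A and B fails, only one site may keep serving writes. The Python sketch below shows a generic first-claimant-wins arbitration; the class names, the "first to claim wins" rule and the "stop if the arbiter is unreachable" behavior are assumptions, not HyperMetro's documented arbitration policy.

```python
import threading
from typing import Optional


class QuorumServer:
    """Hypothetical arbiter at Site C: the first site to claim it after a split wins."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.winner: Optional[str] = None

    def claim(self, site: str) -> bool:
        with self._lock:  # claims from both sites may race
            if self.winner is None:
                self.winner = site
            return self.winner == site


def on_link_down(site: str, quorum: Optional[QuorumServer]) -> str:
    """Decide what a site does when the inter-site replication link fails."""
    if quorum is None:
        return "stop serving"  # arbiter unreachable: stopping avoids split-brain
    return "keep serving" if quorum.claim(site) else "stop serving"


q = QuorumServer()
print(on_link_down("Site A", q))  # keep serving
print(on_link_down("Site B", q))  # stop serving
```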

More Robust Storage HA Cluster
[Figure: nine failure scenarios (#1 to #9) for a storage HA cluster with a storage witness/quorum, with per-scenario results shown for Solution A and for the Huawei solution.]
Extreme Performance Experience

Extreme Performance Experience
OceanStor Dorado 6000 vs. Solution A (high-end AFA, dual controller):
• DB acceleration: 57,000 vs. 11,500 transactions per second (40 x 3.84 TB SSDs, SwingBench OE2 transaction generator)
• VM delivery: 1.5 minutes vs. 52 minutes (100 VM clones, 50 GB each)
• VDI support: 7,200 vs. 1,500 VDI (100 x 3.84 TB SSDs, with data reduction)


Consistent Performance Experience
[Chart: IOPS (0-350K scale) and percentage of baseline performance as features are enabled: RAID 5 baseline; RAID 6 / RAID-TP; GC with 80% pre-conditioning; inline compression; inline dedup; snapshot; HA storage cluster. Results are split into RAID performance impact and advanced-feature performance impact; the two panels end at 43.4% and 78.9% of baseline.]

Test case: mixed workload, 8K I/O, 7:3 mix, 1 ms average latency, 8 LUNs, 32 outstanding I/Os.

End-to-End Load Balancing
[Figure: host I/O spread across the engine's shared front-end adapters, a global cache spanning the storage controllers, and back-end adapters into a global storage pool in the DAEs.]

Shared Front-end Adapter
• Requests from the host can be evenly distributed across every front-end link.
• LUNs are shared by all controllers (no controller ownership).

Global Cache
• Write I/O requests for a single LUN can be placed into cache space on multiple storage controllers.
• For a better cache read hit rate, a storage controller can place prefetched data in the global cache for potential read requests from any front-end link.

Global Storage Pool
• The global storage pool can be accessed by multiple storage controllers.
• With RAID 2.0+, multiple LUNs are naturally distributed over multiple SSDs.
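RAID 2.0+, as Huawei has publicly described it, divides each SSD into fixed-size chunks and builds RAID groups out of chunks taken from different SSDs, which is why LUNs spread over many SSDs naturally. The allocator below is a minimal Python sketch of that idea; the greedy most-free-first selection rule and the parameter names are assumptions for illustration.

```python
from collections import Counter


def allocate_chunk_groups(num_disks: int, chunks_per_disk: int,
                          group_width: int, num_groups: int) -> list:
    """Hypothetical RAID 2.0+-style allocator.

    Each chunk group takes one chunk from `group_width` different SSDs, choosing the
    SSDs with the most free chunks, so LUN capacity spreads across the whole pool.
    """
    free = [chunks_per_disk] * num_disks
    groups = []
    for _ in range(num_groups):
        # Most free chunks first; ties broken by disk index for determinism.
        disks = sorted(range(num_disks), key=lambda d: (-free[d], d))[:group_width]
        if len(disks) < group_width or free[disks[-1]] == 0:
            raise RuntimeError("not enough free chunks on distinct disks")
        for d in disks:
            free[d] -= 1
        groups.append(disks)
    return groups


# 10 SSDs, 4+1 chunk groups (width 5), 20 groups allocated for one or more LUNs.
groups = allocate_chunk_groups(num_disks=10, chunks_per_disk=100, group_width=5, num_groups=20)
usage = Counter(d for g in groups for d in g)
print(sorted(usage.values()))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```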

CPU Resource Dynamic Scheduling

[Figure: a LUN split into slices #0 to #10, each mapped to a core in one of several storage controllers; each controller's cores are grouped by job (e.g. data switching, I/O read & write, data flushing, data reduction).]

• LUN Space Sharding: each LUN is sliced into multiple pieces (shards), and each shard is mapped to a specific CPU in a storage controller for the relevant I/O processing.
• CPU Core Grouping: CPU cores are divided into multiple groups, and each group is assigned a specific job.
• Dynamic Scheduling: higher-priority jobs can acquire more cores from shared core groups.
• Workload Isolation: each CPU core processes its own I/O requests, avoiding interlocks between cores.
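A minimal Python sketch of two of these ideas, under stated assumptions: shards map deterministically to cores (so a shard's state is only ever touched by one core), and a shared core pool is lent to a higher-priority job and then returned. `SHARD_BYTES`, the mapping `(lun_id * 31 + shard)`, the job names and the core numbers are all hypothetical.

```python
from dataclasses import dataclass, field

SHARD_BYTES = 4 * 1024 * 1024  # hypothetical shard size; not a documented Dorado value


def shard_of(offset_bytes: int) -> int:
    """Which shard of the LUN a byte offset falls into."""
    return offset_bytes // SHARD_BYTES


def core_for_shard(lun_id: int, shard: int, io_cores: list) -> int:
    """Deterministic shard-to-core mapping: the same shard always lands on the same
    core, so only that core touches the shard's metadata and no lock is needed."""
    return io_cores[(lun_id * 31 + shard) % len(io_cores)]


@dataclass
class CoreScheduler:
    """Core groups dedicated to jobs, plus a shared pool lent to higher-priority jobs."""
    groups: dict
    shared: list = field(default_factory=list)

    def borrow(self, job: str, n: int) -> list:
        taken, self.shared = self.shared[:n], self.shared[n:]
        self.groups[job].extend(taken)
        return taken

    def give_back(self, job: str, cores: list) -> None:
        for c in cores:
            self.groups[job].remove(c)
        self.shared.extend(cores)


io_cores = [0, 1, 2, 3]
print(core_for_shard(lun_id=1, shard=shard_of(10 * 1024 * 1024), io_cores=io_cores))  # 1

sched = CoreScheduler(
    groups={"read_write": [0, 1, 2, 3], "flush": [4, 5], "reduction": [6]},
    shared=[7, 8, 9],
)
burst = sched.borrow("read_write", 2)  # front-end burst: read/write gets priority
print(sched.groups["read_write"])       # [0, 1, 2, 3, 7, 8]
sched.give_back("read_write", burst)
print(sched.shared)                     # [9, 7, 8]
```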

Powered by NVMe and RoCE
[Figure: I/O path from the server over RoCE/FC through front-end adapters, storage controllers and back-end adapters to the DAE; latency annotations read 50 μs and 30 μs at the front end, and 100 μs and 30 μs at the back end.]

The Latest Protocol & Network Standard
• A 50% latency reduction can be achieved with the latest protocols (NVMe and RoCE v2).

Front-End/Back-End Adapter Protocol Offloading
• A 10% latency reduction can be achieved with:
  - A self-developed TOE (TCP offload engine) front-end adapter chip
  - ASIC-based I/O balancing/distribution

Intelligent DAE and Self-Developed SSD
• Read priority technology: read requests on SSDs are executed preferentially so hosts get timely responses; latency in mixed read/write scenarios is reduced by 20%.
• SAS DAE connection multiplexing technology improves performance by 30%.
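Read priority can be sketched as a two-class priority queue in front of the SSD: reads jump ahead of queued writes, and each class stays in arrival order. This Python sketch is a hypothetical model, not the SSD firmware's scheduler.

```python
import heapq
import itertools

READ, WRITE = 0, 1  # lower number = dispatched first


class ReadPriorityQueue:
    """Hypothetical SSD command queue: queued reads dispatch before queued writes,
    and commands of the same type keep their arrival (FIFO) order."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = itertools.count()

    def submit(self, op: int, lba: int) -> None:
        heapq.heappush(self._heap, (op, next(self._arrival), lba))

    def dispatch(self) -> tuple:
        op, _, lba = heapq.heappop(self._heap)
        return ("read" if op == READ else "write", lba)

    def __len__(self) -> int:
        return len(self._heap)


q = ReadPriorityQueue()
for op, lba in [(WRITE, 10), (WRITE, 11), (READ, 20), (WRITE, 12), (READ, 21)]:
    q.submit(op, lba)
print([q.dispatch() for _ in range(len(q))])
# [('read', 20), ('read', 21), ('write', 10), ('write', 11), ('write', 12)]
```

Strict priority like this can starve writes under a sustained read load; a practical scheduler would presumably bound how long a write may wait, which this sketch leaves out.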

Load Balancing - A Core Demand of Mission-Critical Applications

[Figure: 24 database active logs (Activelog #1 to #24) mapped through a DKC's FED, VSD and BED layers to a DKU, with each log's LUN owned by one controller.]

Bank X Case Study

Database Log Switching
• The customer, in the FSI sector, was using DB2 for its core banking application. 24 active logs were used in circular mode, one log per hour.
• In each hour, only one active log was busy.

LUN Ownership
• The customer enabled "DB2 full logging" for potential problem analysis, so the workload became much higher than before.
• The storage then hit a performance issue, because the mapping between active logs and storage controllers was fixed (known as LUN ownership).
• The I/O workload on a given storage controller couldn't be shared by other controllers, and the unbalanced workload led to a performance bottleneck.
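The case study's bottleneck is easy to reproduce in a toy model: with fixed ownership, all I/O for the one busy log LUN lands on a single controller, while sharding that LUN across controllers spreads the same I/O evenly. The Python sketch below assumes 4 controllers, 64 shards per LUN, an illustrative owner assignment, and log writes that walk through the shards in turn; none of these values come from the case study.

```python
from collections import Counter

NUM_CONTROLLERS = 4
SHARDS_PER_LUN = 64  # hypothetical


def load_with_ownership(busy_log: int, io_count: int) -> Counter:
    """Fixed LUN ownership: every I/O for the log's LUN lands on its owner controller."""
    owner = busy_log % NUM_CONTROLLERS  # illustrative owner assignment
    return Counter({owner: io_count})


def load_without_ownership(busy_log: int, io_count: int) -> Counter:
    """No ownership: the LUN is sharded, and shards are spread over all controllers.

    Assumes each successive log write lands in the next shard, wrapping around.
    """
    load = Counter()
    for i in range(io_count):
        shard = i % SHARDS_PER_LUN
        load[(busy_log + shard) % NUM_CONTROLLERS] += 1
    return load


# One busy log this hour; the other 23 logs are idle.
print(load_with_ownership(busy_log=5, io_count=8000))    # Counter({1: 8000})
print(sorted(load_without_ownership(5, 8000).values()))  # [2000, 2000, 2000, 2000]
```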

Processor-Level Load Balancing

[Figure: processor-level load balancing. Solution B: two engines (DKC #0, DKC #1), each with its own SSD enclosure (DKU owned by that DKC). OceanStor Dorado: two engines (Engine #0, Engine #1) with shared SSD enclosures. Each controller's processors (Proc) are shown under FE, Ctrl and BE.]

Solution B
• The owner controller has to take most of the workload.
• The 2nd engine cannot be involved in load balancing.
• SSD enclosures owned by the 2nd engine cannot be shared with the 1st engine, so write I/O flushing is constrained within one engine.

OceanStor Dorado
• Workload can be spread across all the controllers of the 1st and 2nd engines at processor level.
• SSD enclosures are shared between engines via the RDMA network, so data can be flushed from both engines.

Business Always-On

Business Always-On with Lower TCO
[Chart: cost over the years (initial purchase, upgrade, tech refresh, upgrade, upgrade) for the traditional solution vs. the Huawei solution. With FlashEver, the Huawei solution replaces the controller module only and intermixes various generations of DAE, both with no downtime; the chart labels read "70% max" and "90+%".]

[Chart: labor over the years (deployment, optimization, provisioning, data migration & provisioning) for the traditional solution vs. the Huawei solution, which needs no data migration or less data migration, and no cabling.]

Non-Disruptive Tech. Refresh

FlashEver Program
• Supports non-disruptive controller upgrades, including to the next several generations over 10 years
• Tech-refreshes existing assets to gain the advantages of the latest technology

Storage Federation
• Up to 128 controllers
• Supports OceanStor Dorado and following generations
• Can mix different generations of OceanStor Dorado in one federation cluster
• Supports non-disruptive data mobility and online node reorganization

SmartVirtualization
• Virtualizes third-party storage by taking over the access path
• Reuses old storage to protect customer investment
• Smoothly cuts business over to run on OceanStor Dorado and following generations

FlashEver & Storage Federation Use Case

Storage Controller Tech-Refresh
Replace only the existing storage controllers with next-generation controllers, without application downtime or data migration (DIP: Data In Place upgrade).

SSD Enclosure EOL
After the old controllers have been replaced with new ones (DIP), if the SSD and disk enclosures reach EOL later, new enclosures can be added and data migrated, again without application downtime.

Whole System Tech-Refresh
Replace the whole storage system without application downtime; data migration is done internally within a cluster (Storage Federation).

Incomparable Flexibility for The Next Decade
Storage Federation: manage and manipulate data movement across multiple generations of arrays
• Solution A: N/A
• Solution B: N/A

Mixed Use of DAE Generations: eliminate data migration as much as possible to simplify capacity upgrades
• Solution A: N/A
• Solution B: N/A

FlashEver Program: replace only the old storage controller module, protecting investment as much as possible
• Solution A: N/A
• Solution B: only available for the VSP G1000 upgrade to VSP G1500, a temporary design

[Chart: cost over the years (initial purchase, upgrade, tech refresh, upgrades) for the traditional solution vs. the Huawei solution.]

Wrap Up

Strong Capability of Intelligent Chip Development
• Array controller
• BMC
• Multi-protocol chip (FE/BE adapter)
• SSD controller
• AI chip

Symmetric A/A Storage Controller Architecture
• Shared frontend & backend adapters
• Fully meshed topology
• No LUN ownership
• Cross-engine load balancing

The Highest Level of Availability
• Tolerates failure of any 2 of 8 controllers
• Tolerates failure of any 1 of 2 engines
• 1-second controller failover and NDU

Distributed Computing Design
• Intelligent DAE
• Frontend and backend adapter TOE engine

Comprehensive HA/DR Solution
• A/A storage cluster
• Serial/parallel/star 3DC topology

Incomparable Flexibility
• FlashEver
• Storage Federation
• SmartVirtualization

Thank you.

Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright © 2018 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purposes only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
