/**
* @copyright Copyright (C) Mellanox Technologies Ltd. 2021. ALL RIGHTS RESERVED.
*
* See file LICENSE for terms.
*/
## 1.0.0 (TBD)
### Features
#### API
- Added Avg reduce operation
- Added nonblocking team destroy option
- Added user-defined datatype definitions
- Added Bfloat16 type
- Clarified the semantics of core abstractions, including teams and contexts
- Added timeout option
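
The Avg reduce operation listed above can be pictured as an allreduce whose result is the element-wise sum divided by the number of participating ranks. The following is a minimal pure-Python simulation of that idea, not UCC code; the helper name `allreduce_avg` and the list-of-lists representation of ranks are assumptions made for illustration only.

```python
def allreduce_avg(contributions):
    """Simulate an averaging allreduce (hypothetical helper, not a UCC call).

    contributions[r] is the vector supplied by rank r. Every rank receives
    the same result: the element-wise mean over all ranks.
    """
    if not contributions:
        raise ValueError("need at least one rank")
    n = len(contributions)
    length = len(contributions[0])
    if any(len(v) != length for v in contributions):
        raise ValueError("all ranks must contribute vectors of equal length")
    # Element-wise sum across ranks, then divide by the team size.
    summed = [sum(column) for column in zip(*contributions)]
    averaged = [s / n for s in summed]
    # An allreduce delivers the full result to every rank.
    return [list(averaged) for _ in range(n)]


if __name__ == "__main__":
    print(allreduce_avg([[1.0, 2.0], [3.0, 6.0]]))
```

In this sketch the division happens once after the sum; a real implementation may choose a different point for the scaling step.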
#### Core
- Added coll scoring and selection support
- Added support for triggered collectives
- Added support for timeouts in collectives
- Added support for team create without ep in post
- Added support for multithreaded context progress
- Added support for nonblocking team destroy
#### CL
- Added support for hierarchical collectives
- Added support for hierarchical allreduce collective operation
- Added support for collectives based on one-sided communication routines
#### TL
- Added SHARP TL
##### UCP
- Added Bcast SAG algorithm for large messages
- Added knomial-based reduce algorithm
- Made allgather and alltoall agree with the API
- Added SRA knomial allreduce algorithm
- Added pairwise alltoall and alltoallv algorithms
- Added allgather and allgatherv ring algorithms
- Added support for collective operations based on one-sided semantics
- Added support for alltoall with one-sided transfer semantics
- Bug fixes
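
As an illustration of the ring pattern named in the allgather entries above, here is a small pure-Python simulation of a textbook ring allgather. It is a sketch under stated assumptions and does not reflect the UCP TL implementation; the function name `ring_allgather` and the in-memory buffers are hypothetical.

```python
def ring_allgather(blocks):
    """Simulate a textbook ring allgather (hypothetical helper, not UCC code).

    blocks[r] is the block owned by rank r. Returns one buffer per rank,
    each holding every block in rank order.
    """
    n = len(blocks)
    bufs = [[None] * n for _ in range(n)]
    for rank in range(n):
        bufs[rank][rank] = blocks[rank]
    # n - 1 steps: each rank forwards the block it received most recently
    # to its right neighbour and receives one block from its left neighbour.
    for step in range(n - 1):
        # Collect all sends first so the exchange behaves as simultaneous.
        sends = []
        for rank in range(n):
            idx = (rank - step) % n
            sends.append((idx, bufs[rank][idx]))
        for rank in range(n):
            left = (rank - 1) % n
            idx, data = sends[left]
            bufs[rank][idx] = data
    return bufs


if __name__ == "__main__":
    print(ring_allgather(["a", "b", "c"]))
```

Each rank sends and receives exactly one block per step, so the per-step traffic stays constant as the number of ranks grows, at the cost of needing n - 1 steps.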
##### SHARP
- Added support for switch-based hardware collectives (SHARP)
##### NCCL
- Added support for NCCL allreduce, alltoall, alltoallv, barrier, reduce, reduce
scatter, bcast, allgather, and allgatherv
#### Tests
- Updated tests to test the newly added algorithms and operations
## 0.1.0 (TBD)
### Features
#### API
- UCC API to support library, contexts, teams, collective operations, execution
engine, memory types, and triggered operations
#### Core
- Added implementation for UCC abstractions - library, context, team,
collective operations, execution engine, memory types, and triggered
operations
- Added support for CUDA and CPU memory types
- Added support for configuring UCC library and contexts
#### CL
- Added support for collectives whose source and destination buffers reside in
either CPU or device (GPU) memory
- Added support for UCC_THREAD_MULTIPLE
- Added support for CUDA stream-based collectives
#### TL
- Added support for send/receive based collectives using UCX/UCP as a transport
layer
- Added support in the UCP TL for basic collective types, including barrier,
alltoall, alltoallv, broadcast, allgather, allgatherv, and allreduce
- Added support for using NCCL as a transport layer
- Added support for collective types including alltoall, alltoallv, allgather,
allgatherv, allreduce, and broadcast
#### Tests
- Added support for unit testing (gtest) infrastructure
- Added support for MPI tests