80

I'm reading the Book Java Concurrency in Practice. In chapter 15, they are talking about the nonblocking algorithms and the compare-and-swap (CAS) method.

It is written that CAS perform much better than the locking methods. I want to ask the people who already worked with both of these concepts and would like to hear when you are preferring which one of these concepts? Is it really so much faster?

For me, the usage of locks is much clearer and easier to understand and maybe even better to maintain (please correct me if I am wrong). Should we really focus on creating our concurrent code related to CAS than locks to get a better performance boost or is sustainability more important?

I know there is maybe not a strict rule when to use what. But I just would like to hear some opinions, experiences with the new concept of CAS.

1
  • Usual locking with monitor-objects involves CAS as well as the locking kernel-calls. The cass alleviates the locking so that there will be no kernel call without contention. And when there is contention the CAS might spin for a time and then the kernel call would take place or the kernel call would follow immediately. Commented Sep 3, 2019 at 11:38

5 Answers 5

55

CAS is generally much faster than locking, but it does depend on the degree of contention. Because CAS may force a retry if the value changes between reading and comparing, a thread can theoretically get stuck in a busy-wait if the variable in question is being hit hard by many other threads (or if it is expensive to compute a new value from the old value (or both)).

The main issue with CAS is that it is much more difficult to program with correctly than locking. Mind you, locking is, in turn, much harder to use correctly than message-passing or STM, so don't take this as a ringing endorsement for the use of locks.

1
  • 2
    I agree, putting accent on expense of computing a new value. Often it is so big, making non-locking strategy lose to lock-based. Commented Apr 22, 2015 at 11:32
30

The relative speed of the operations is largely a non-issue. What is relevant is the difference in scalability between lock-based and nonblocking algorithms. And if you're running on a 1 or 2 core system, stop thinking about such things.

Nonblocking algorithms generally scale better because they have shorter "critical sections" than lock-based algorithms.

0
24

You can look at the numbers between a ConcurrentLinkedQueue and a BlockingQueue. What you will see is that CAS is noticeably faster under moderate (more realistic in real world applications) thread contention.

The most attractive property of nonblocking algorithms is the fact that if one thread fails (cache miss, or worse, seg fault) then other threads will not notice this failure and can move on. However, when acquiring a lock, if the lock holding thread has some kind of OS failure, every other thread waiting for the lock to be freed will be hit with the failure also.

To answer your questions, yes, nonblocking thread-safe algorithms or collections (ConcurrentLinkedQueue, ConcurrentSkipListMap/Set) can be significantly faster than their blocking counterparts. As Marcelo pointed out though, getting nonblocking algorithms correct is very difficult and requires a lot of consideration.

You should read about the Michael and Scott Queue, this is the queue implementation for ConcurrentLinkedQueue and explains how to handle a two-way, thread-safe, atomic function with a single CAS.

1
  • 1
    The article about the "Michael and Scott Queue" is interesting. Thx!
    – Prine
    Commented Apr 20, 2010 at 20:57
14

There's good book strongly related to the topic of lock-free concurrency: "The Art of multi-processor programming" by Maurice Herlihy

1
11

If you are looking for a real world comparison, here is one. Our application has two (2) threads 1) A reader thread for network packet capture and 2) a consumer thread that takes the packet, counts it and reports statistics.

Thread #1 exchanges a single packet at a time with thread #2

Result #1 - uses a custom CAS based exchange using the same principles as SynchronousQueue, where our class is called CASSynchronousQueue:

30,766,538 packets in 59.999 seconds ::  500.763Kpps, 1.115Gbps 0 drops
libpcap statistics: recv=61,251,128, drop=0(0.0%), ifdrop=0

Result #2 - when we replace our CAS implementation with the standard java SynchronousQueue:

8,782,647 packets in 59.999 seconds ::  142.950Kpps, 324.957Mbps 0 drops 
libpcap statistics: recv=69,955,666, drop=52,369,516(74.9%), ifdrop=0

I don't think the difference in performance couldn't be any more clearer.

Not the answer you're looking for? Browse other questions tagged or ask your own question.