
I am building an application that processes very large amounts of data (more than 3 million records). I am new to Cassandra, and I am using a 5-node Cassandra cluster to store the data. I have two column families:

Table 1: CREATE TABLE keyspace.table1 (
    partkey1 text,
    partkey2 text,
    clusterKey1 text,
    attributes text,
    PRIMARY KEY ((partkey1, partkey2), clusterKey1)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Table 2: CREATE TABLE keyspace.table2 (
    partkey1 text,
    partkey2 text,
    clusterKey2 text,
    attributes text,
    PRIMARY KEY ((partkey1, partkey2), clusterKey2)
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Note: clusterKey1 and clusterKey2 are randomly generated UUIDs.

My concern is that, according to nodetool cfstats, I am getting good throughput on table1, with these stats:

  • SSTable count: 2
  • Space used (total): 365189326
  • Space used by snapshots (total): 435017220
  • SSTable Compression Ratio: 0.2578485727722293
  • Memtable cell count: 18590
  • Memtable data size: 3552535
  • Memtable switch count: 171
  • Local read count: 0
  • Local read latency: NaN ms
  • Local write count: 2683167
  • Local write latency: 1.969 ms
  • Pending flushes: 0
  • Bloom filter false positives: 0
  • Bloom filter false ratio: 0.00000
  • Bloom filter space used: 352

whereas for table2 I am getting very bad read performance, with these stats:

  • SSTable count: 33
  • Space used (live): 212702420
  • Space used (total): 212702420
  • Space used by snapshots (total): 262252347
  • SSTable Compression Ratio: 0.1686948750752438
  • Memtable cell count: 40240
  • Memtable data size: 24047027
  • Memtable switch count: 89
  • Local read count: 24027
  • Local read latency: 0.580 ms
  • Local write count: 1075147
  • Local write latency: 0.046 ms
  • Pending flushes: 0
  • Bloom filter false positives: 0
  • Bloom filter false ratio: 0.00000
  • Bloom filter space used: 688

I was wondering why table2 is creating 33 SSTables and why its read performance is so low. Can anyone help me figure out what I am doing wrong here?

This is how I query the table:

BoundStatement selectStmt;   // a field, so the statement is prepared only once

if (selectStmt == null) {
    // session.prepare() costs a round trip to Cassandra; done once and reused
    PreparedStatement prprdStmnt = session
            .prepare("select * from table2 where partkey1 = ? and partkey2 = ? and clusterKey2 = ?");
    selectStmt = new BoundStatement(prprdStmnt);
}
synchronized (selectStmt) {
    // values are bound in the same order as the ? markers
    res = session.execute(selectStmt.bind("partkey1", "partkey2", "clusterKey2"));
}

In another thread, I am doing update operations on this table, on different data, in the same way.

As for measuring throughput: I measure the number of records processed per second, and it is processing only 50-80 records/sec.

  • A low (sub-ms) "Local read latency" is a good thing. Commented Mar 18, 2015 at 20:38
  • Yeah, 500 microseconds would make almost any high-frequency trader happy :). Although cfstats (depending on version, I think) does get reset after execution, so maybe the performance Rijo is seeing is worse than this shows. Commented Mar 18, 2015 at 21:08

3 Answers


When you have a lot of SSTables, the distribution of your data among them is very important. Since you are using SizeTieredCompactionStrategy, SSTables get compacted and merged roughly whenever there are 4 SSTables of about the same size.

If you are updating data within the same partition frequently and at different times, your data is likely spread across many SSTables, which degrades read performance because multiple SSTables must be consulted for each read.

In my opinion, the best way to confirm this is to execute cfhistograms on your table:

nodetool -h localhost cfhistograms keyspace table2

Depending on the version of Cassandra you have installed, the output will differ, but it will include a histogram of the number of SSTables consulted per read operation.

If you are updating data within the same partition frequently and at different times, you could consider using LeveledCompactionStrategy (When to use Leveled Compaction). LCS keeps data from the same partition together in the same SSTable within a level, which greatly improves read performance at the cost of more disk I/O during compaction. In my experience, the extra compaction disk I/O more than pays off in read performance if you have a high ratio of reads to writes.
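For reference, switching strategies is a single schema change. Below is a minimal sketch using the same Java driver session as in the question (table name taken from the question; the sstable_size_in_mb value is just the commonly cited default, not a tuned recommendation):

    // Sketch: switch table2 to LeveledCompactionStrategy via the Java driver.
    // Plain CQL in cqlsh works just as well.
    session.execute(
        "ALTER TABLE keyspace.table2 WITH compaction = {"
        + " 'class': 'LeveledCompactionStrategy',"
        + " 'sstable_size_in_mb': '160' }");

Expect a burst of compaction activity after the change while the existing SSTables are rewritten into levels.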


EDIT: Regarding your throughput concerns, there are a number of things limiting it:

  1. A possible big issue is that unless you have many threads making that same query at a time, you are making your requests serially (one at a time). This severely limits your throughput, as no new request can be sent until you get a response from Cassandra. Also, since you are synchronizing on selectStmt, even if this code were executed by multiple threads, only one request could execute at a time anyway. You can dramatically improve throughput by having multiple worker threads make the requests for you (if you aren't already doing this), or, even better, by using executeAsync to execute many requests asynchronously (see the sketch after this list). See Asynchronous queries with the Java driver for an explanation of how the request flow works in the driver and how to use it effectively to make many queries.
  2. If you are executing this same code each time you make a query, you are creating an extra round trip by calling session.prepare each time to create your PreparedStatement. session.prepare sends a request to Cassandra to prepare your statement. You only need to do this once, and you can reuse the PreparedStatement for every query. You may be doing this already, given your null-checking (can't tell without more code).
  3. Instead of reusing selectStmt and synchronizing on it, just create a new BoundStatement off of the single PreparedStatement each time you make a query. This way no synchronization is needed at all.
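Here is a minimal sketch of points 1-3 combined (DataStax Java driver 2.x, as in the question; the keys list and its shape are assumptions for illustration):

    import java.util.ArrayList;
    import java.util.List;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;

    List<ResultSet> fetchAll(Session session, List<String[]> keys) {
        // Prepared once: one round trip, then reused for every query.
        PreparedStatement prepared = session.prepare(
                "select * from table2 where partkey1 = ? and partkey2 = ? and clusterKey2 = ?");

        // Fire every request without waiting for responses; a fresh
        // BoundStatement per query means no synchronization is needed.
        List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
        for (String[] k : keys) {
            futures.add(session.executeAsync(prepared.bind(k[0], k[1], k[2])));
        }

        // Collect results; each call blocks only until that query finishes.
        List<ResultSet> results = new ArrayList<ResultSet>();
        for (ResultSetFuture f : futures) {
            results.add(f.getUninterruptibly());
        }
        return results;
    }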
  • Thanks for the response... I am getting very low reads, as in fewer than 50 reads per sec. Do you know why more SSTables are created for table2, even though its structure is similar to table1? Commented Mar 19, 2015 at 4:14
  • I think I need more information to adequately answer that question. Can you update your question to include the following: 1) What exact query are you making? 2) How are you making the queries (what client)? 3) Are you making the queries serially (one at a time, one after another), or submitting them asynchronously? 4) How are you measuring your throughput? Commented Mar 19, 2015 at 4:39
  • Andy Tolbert, I have updated my question. Can you take a look? Commented Mar 19, 2015 at 5:11
  • Thank you! And is there a way to see how many partitions were created for this table? Commented Mar 19, 2015 at 5:15
  • @RioJoseph I have updated my answer. With regards to # of partitions there's a number of ways to do that, but it's never going to be a cheap query. For testing, you could do a range query and count the rows, i.e.: select count(*) from keyspace.table2;. You could use something like the spark connector to do a count query effectively. Cassandra is a distributed system, so to collect that information you would have to query all ranges, so it's not going to be cheap. Commented Mar 19, 2015 at 5:33

Aside from switching compaction strategies (this is expensive; you will compact hard for a while after the change), which as Andy suggests will certainly help your read performance, you can also tune your current compaction strategy to try to get rid of some of the fragmentation:

  1. If you have pending compactions (nodetool compactionstats), try to catch up by raising the compaction throughput cap (nodetool setcompactionthroughput). Keep concurrent compactors to half of your CPU cores to keep compaction from hogging all your cores.
  2. Increase the bucket size (raise bucket_high, lower bucket_low) -- this dictates how similar in size SSTables must be to be compacted together.
  3. Lower the compaction threshold (min_threshold) -- this dictates how many SSTables must fit in a bucket before compaction occurs.

For details on 2 and 3, check out the compaction subproperties documentation. A sketch of such a change follows.
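As a hedged example (the values are illustrative, not recommendations; the defaults are bucket_high=1.5, bucket_low=0.5, min_threshold=4):

    // Sketch: widen the STCS buckets and lower the threshold so similar-size
    // SSTables get merged sooner. Same Java driver session as the question.
    session.execute(
        "ALTER TABLE keyspace.table2 WITH compaction = {"
        + " 'class': 'SizeTieredCompactionStrategy',"
        + " 'bucket_high': '2.0',"
        + " 'bucket_low': '0.3',"
        + " 'min_threshold': '3' }");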

Note: do not use nodetool compact. This puts the whole table into one huge SSTable and you'll lose the benefits of compacting slices at a time.

In case of emergency, use JMX (the force user defined compaction operation on the CompactionManager MBean) to trigger minor compactions.
  • That's a great point -- tuning within SizeTieredCompaction. Sometimes LeveledCompaction is used as a "one size fits all" (I am very guilty of this), but that's not always the right choice. Commented Mar 19, 2015 at 5:34

You have many SSTables and slow reads. The first thing you should do is find out how many SSTables are read per SELECT.

The easiest way is to inspect the corresponding MBean: in the MBean domain org.apache.cassandra.metrics you will find your keyspace, below it your table, and then the SSTablesPerReadHistogram MBean. Cassandra records min, max, mean, and also percentiles.

A very good value for the 99th percentile of SSTablesPerReadHistogram is 1, which means reads normally touch only a single SSTable. If the number is about as high as the number of SSTables, Cassandra is inspecting every SSTable on each read. In the latter case you should double-check your SELECT, and whether you are restricting the whole primary key.
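If you would rather script it than click through jconsole, here is a minimal sketch of reading that histogram over JMX (the Cassandra 2.x ObjectName layout with type=ColumnFamily is an assumption; keyspace/table names are from the question, and 7199 is the default JMX port):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
    JMXConnector jmxc = JMXConnectorFactory.connect(url);
    try {
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
        // Per-table metrics in 2.x live under type=ColumnFamily.
        ObjectName histogram = new ObjectName(
                "org.apache.cassandra.metrics:type=ColumnFamily,"
                + "keyspace=keyspace,scope=table2,name=SSTablesPerReadHistogram");
        System.out.println("SSTables per read, p99: "
                + mbs.getAttribute(histogram, "99thPercentile"));
    } finally {
        jmxc.close();
    }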
