We have a single machine cassandra setup on which we are trying different things for analytics. One of Column family we have goes with this discription:
CREATE TABLE playground.event_user_table (
event_date date,
event_time timestamp,
author text,
content_id text,
content_model text,
event_id text,
event_type text,
PRIMARY KEY (event_date, event_time)
) WITH CLUSTERING ORDER BY (event_time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX author ON playground.event_user_table (author);
With this table, we populated data in it for different apps/models. Now when we query system with something like:
cqlsh:playground> select * from event_user_table where event_date = '2013-06-02' ;
event_date | event_time | author | content_id | content_model | event_id | event_type
------------+--------------------------+----------------------------------------+-------------------------------------------------+---------------+----------+------------
2013-06-02 | 2013-06-02 00:00:00+0000 | anuragabes@example.com | anuragabes@example.com##2##2012 | A | | submitted
2013-06-02 | 2013-06-02 01:28:13+0000 | dear_kirnesh@example.com | 1000910424 | B | | closed
2013-06-02 | 2013-06-02 01:59:31+0000 | dsammaiah@example.com | dsammaiah@example.com##1##2013 | A | | created
2013-06-02 | 2013-06-02 02:00:44+0000 | jacobg@example.com | jacobg@example.com##5##2013 | A | | created
2013-06-02 | 2013-06-02 02:02:16+0000 | kriti.jneja@example.com | kriti.jneja@example.com##4##2013 | A | | created
Result looks good and as expected. Now I query system on the secondary index of author and I get empty or partial results:
cqlsh:playground> select * from event_user_table where author = 'anuragabes@example.com' ;
event_date | event_time | author | content_id | content_model | event_id | event_type
------------+------------+--------+------------+---------------+----------+------------
(0 rows)
cqlsh:playground> select * from event_user_table where author = 'senthil.ramachandran@example.com' ;
event_date | event_time | author | content_id | content_model | event_id | event_type
------------+--------------------------+------------------------------------+------------+---------------+----------+------------
2014-01-18 | 2014-01-18 09:01:52+0000 | senthil.ramachandran@example.com | 1001068325 | SRF | | closed
(1 rows)
And I have tried this combinations of PRIMARY KEY too ((event_date, event_time), author)
but with same results. There are known issues with secondary indexes
and scaling1 but it affects single node systems too? I am not
sure about it. Time to confirm things.
Update1 : As mentioned here2, Cassandra has "'lazy' updating to secondary indexes. When you change an indexed value, you need to remove the old value from the index." Could that be the reason?