3

I have a table with multiple columns in HBase. The structure of the table is something like this:

row1 column=cf:c1, timestamp=xxxxxx, value=v1
row1 column=cf:c2, timestamp=xxxxxx, value=v2
row1 column=cf:c3, timestamp=xxxxxx, value=v3
...

I want to write a custom filter that keeps or drops rows based on the value in a certain column. For example, if column c3 contains the value v3, I want to include the whole row; otherwise, I want to drop it. As far as I understand, HBase filters operate on individual cells, so they include or skip just one column at a time. Is there a filter type in HBase that does the row-level filtering I want, and how should I implement it?

Thanks.

2 Answers

3

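You could use SingleColumnValueFilter for this problem. It tests a single column but includes or drops the whole row based on the result. Using your example, you could do this:

SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("cf"), Bytes.toBytes("c3"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v3"));
// By default, rows with no cf:c3 cell at all are still returned; this drops them too.
filter.setFilterIfMissing(true);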

Then, you can add the filter to your scan this way:

Scan scan = new Scan();
scan.setFilter(filter);
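
One detail worth checking against the HBase documentation for your version: by default, SingleColumnValueFilter keeps rows in which the tested column is absent, and only drops them once setFilterIfMissing(true) has been called. The sketch below is a plain-Java model of that row-level decision, not HBase code. The class RowFilterModel, its methods, and the sample rows row2 and row3 are hypothetical and exist only to illustrate the behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of row-level filtering on one column (not the HBase API).
final class RowFilterModel {

    // Returns true if the whole row should be kept.
    static boolean keepRow(Map<String, String> row, String column,
                           String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // Column absent: kept unless filterIfMissing is set.
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    // Returns the keys of all rows that pass the check, in table order.
    static List<String> scan(Map<String, Map<String, String>> table, String column,
                             String expected, boolean filterIfMissing) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : table.entrySet()) {
            if (keepRow(e.getValue(), column, expected, filterIfMissing)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    // Sample data: row1 matches, row2 has a different value, row3 lacks cf:c3.
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> t = new LinkedHashMap<>();
        t.put("row1", Map.of("cf:c1", "v1", "cf:c2", "v2", "cf:c3", "v3"));
        t.put("row2", Map.of("cf:c1", "v1", "cf:c3", "other"));
        t.put("row3", Map.of("cf:c1", "v1"));
        return t;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleTable(), "cf:c3", "v3", false)); // [row1, row3]
        System.out.println(scan(sampleTable(), "cf:c3", "v3", true));  // [row1]
    }
}
```

In this model, only the second call matches the behaviour asked for in the question, where a row without the value v3 in c3 is dropped.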

If you want multiple conditions, you can do that too: add the filters to a FilterList and pass it to your scan with the setFilter method. MUST_PASS_ONE keeps a row if any filter passes (OR), while MUST_PASS_ALL requires every filter to pass (AND).

SingleColumnValueFilter f1 = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("c3"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v3"));
SingleColumnValueFilter f2 = new SingleColumnValueFilter(Bytes.toBytes("cf"), Bytes.toBytes("c2"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v2"));

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE); //could be FilterList.Operator.MUST_PASS_ALL instead
filterList.addFilter(f1);
filterList.addFilter(f2);

Scan scan = new Scan();
scan.setFilter(filterList);
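
To make the difference between the two operators concrete, here is a small plain-Java model of how a list of row checks combines. It is a sketch, not HBase code: FilterListModel, columnEquals, and keepRow are hypothetical names, and each check here treats a missing column as a failure, which is an assumption of the model.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of combining row checks with AND / OR (not the HBase API).
final class FilterListModel {

    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // A check that passes only when the column exists and equals the expected value.
    static Predicate<Map<String, String>> columnEquals(String column, String expected) {
        return row -> expected.equals(row.get(column));
    }

    // MUST_PASS_ALL: every check must pass. MUST_PASS_ONE: at least one must pass.
    static boolean keepRow(Map<String, String> row, Operator op,
                           List<Predicate<Map<String, String>>> filters) {
        if (op == Operator.MUST_PASS_ALL) {
            return filters.stream().allMatch(f -> f.test(row));
        }
        return filters.stream().anyMatch(f -> f.test(row));
    }

    public static void main(String[] args) {
        // c3 matches, c2 does not.
        Map<String, String> row = Map.of("cf:c2", "other", "cf:c3", "v3");
        List<Predicate<Map<String, String>>> filters = List.of(
                columnEquals("cf:c3", "v3"),
                columnEquals("cf:c2", "v2"));

        System.out.println(keepRow(row, Operator.MUST_PASS_ONE, filters)); // true
        System.out.println(keepRow(row, Operator.MUST_PASS_ALL, filters)); // false
    }
}
```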
1

You could use SingleColumnValueFilter for both single and multiple conditions. In your case, if you need to match the qualifier (field) value exactly, you can try the following in the HBase shell:

scan '<table_name>',{FILTER=>"SingleColumnValueFilter('cf','c3',=,'binary:v3')",COLUMNS=>['cf']}

For conditions on multiple columns, the syntax is:

scan '<table_name>',{FILTER=>"SingleColumnValueFilter('<column_family>','<column_qualifier>',<comp_operator>,'binary:<qualifier_value>') AND SingleColumnValueFilter('<column_family>','<column_qualifier>',<comp_operator>,'binary:<qualifier_value>')",COLUMNS=>['<column_family>']}
