I'm working on a problem which involves going through a lot of data. To reduce the work (because current calculations take about two weeks of compute time, and I'd like to reduce that dramatically) I came up with an algorithm which would be much faster if it was able to avoid a certain type of duplication. (The current algorithm avoids storing this information because it is too large, unreduced, to fit in memory.)
I have a collection of sets, and I don't want to insert a set A if there is already a set B in the collection which is a subset of A. At the moment the sets are represented by integers, where each binary digit represents a particular element being present or absent. In that interpretation, the set/integer A should not be inserted if there is already a set/integer B such that (~A) & B is 0, where ~ is bitwise negation and & is bitwise AND.
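
For concreteness, here is a minimal sketch of that check in Python, assuming the collection is just a flat list of bitmask integers (the names `dominated` and `maybe_insert` are invented for illustration):

```python
def dominated(new_set: int, collection) -> bool:
    # True if some existing B is a subset of new_set,
    # i.e. (~new_set) & B == 0, so new_set is redundant.
    return any((~new_set) & b == 0 for b in collection)

def maybe_insert(new_set: int, collection: list) -> bool:
    # Insert new_set only if no existing member is a subset of it.
    if dominated(new_set, collection):
        return False
    collection.append(new_set)
    return True
```

This is just the naive linear scan per insertion, included to pin down the semantics of the check.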
For example, if my collection has the following sets
[ {a,b}, {b,c}, {b,d,e} ]
and I asked to add {b,c,e}, it should not be added (since {b,c} is already there); similarly {a,b} should not be added (since {a,b} is already there), but {a,e} should be added.
The numeric equivalent would be starting with
[ `0b11`, `0b110`, `0b11010` ]
where 0b10110 is not added since (~0b10110) & 0b110 == 0, 0b11 is not added since (~0b11) & 0b11 == 0, but 0b10001 can be added.
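
Using the sketch above, the worked example plays out like this (illustrative only):

```python
collection = [0b11, 0b110, 0b11010]    # {a,b}, {b,c}, {b,d,e}

maybe_insert(0b10110, collection)    # {b,c,e}: False, {b,c} is already a subset
maybe_insert(0b11,    collection)    # {a,b}:   False, {a,b} is already there
maybe_insert(0b10001, collection)    # {a,e}:   True, no existing subset
print([bin(x) for x in collection])  # ['0b11', '0b110', '0b11010', '0b10001']
```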
Ideally the structure would prune itself as new sets are added, so if {c} were added, all existing sets containing c would be removed. But it's acceptable if it doesn't update in that way, as long as I can normalize it to that form in some not-too-expensive way every so often.
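
One possible shape for the self-pruning insert and the occasional batch normalization, again in Python with invented names (`insert_and_prune`, `normalize`); this is still linear per insert and roughly quadratic for the batch pass, so it's only a baseline:

```python
def insert_and_prune(new_set: int, collection: list) -> bool:
    # Reject new_set if an existing set is a subset of it.
    if any((~new_set) & b == 0 for b in collection):
        return False
    # Otherwise drop every existing superset of new_set:
    # new_set is a subset of b exactly when (~b) & new_set == 0.
    collection[:] = [b for b in collection if (~b) & new_set != 0]
    collection.append(new_set)
    return True

def normalize(collection: list) -> list:
    # Batch version: keep only the minimal sets, i.e. no kept set
    # has another kept set as a subset. Processing by increasing
    # popcount guarantees any potential subset is seen first.
    result = []
    for s in sorted(set(collection), key=lambda x: bin(x).count("1")):
        if not any((~s) & b == 0 for b in result):
            result.append(s)
    return result
```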