95

I remotely remember that tries don't store the whole data per node, only the suffix to the parent node.

Where trees do store the whole data, but only organize themselves based on a prefix.

So tries get smaller, which allows for example to compress dictionaries very well.

Is that really the only difference?

From actual applications I remember that tries are faster in range queries. There are even special solr/lucene trie fields to speed up range queries. But how is that so?

What is the actual difference and what are the advantages and disadvantages of tries and trees?

4 Answers 4

108

A tree is a general structure of recursive nodes. There are many types of trees. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, and retrieval tree (hence the name 'trie').

Each kind of tree has a different purpose, structure and behaviour. For example, a binary tree stores a collection of comparable items (eg numbers). It can therefore be used to store a set of numbers, or to index other data that can be represented by numbers (eg objects that can be hashed). Its structure is sorted so it can be searched quickly to find a single item. Other tree structures, such as a balanced tree, are similar in principle.

A trie represents a sequence in its structure. It is very different in that it stores sequences of values rather than individual single values. Each level of recursion says 'what is the value of item I of the input list'. This is different to a binary tree which compares the single searched value to each node.

6
  • Isn't trie like sort of lame? Doesn't binary tree beat trie in every respect except for storage space?
    – Pacerier
    Commented Sep 4, 2016 at 0:01
  • There's a place for every data structure. what about finding all strings with the same prefix? O(n) access?
    – Joe
    Commented Sep 4, 2016 at 15:35
  • 2
    Wouldn't tree do that too? Let's have 1 billion entries, finding prefix of 20. Trie does it in 20 steps. Tree does it in lg 1B / lg 2 = 30 steps. Now with the same 1B entries, we find prefix of 40. Tree still does it in 30 steps, but trie does it in 40. With prefix of 100, trie now takes 100 steps while tree still takes 30.
    – Pacerier
    Commented Sep 7, 2016 at 15:11
  • 2
    Wait what, to me Tries seem if anything better. Since lookups in a trie are O(m) in the length of the string, whereas in a tree they are O(m * log n) where n is the number of nodes, since comparing two strings takes worst case O(m) time, and you need to do worst case O(log n) comparisons. Whereas Tries get to do O(1) character comparison.
    – semicolon
    Commented Jun 8, 2017 at 15:36
  • 2
    @Pacerier Trie's have better asymptotics for basically all operations, since comparisons in a Trie are O(1) because you are comparing a constant size sub-part of the value, whereas they are O(m) in a Tree.
    – semicolon
    Commented Jun 29, 2017 at 15:09
18

In tree structure, we store whole words. There is a lot of overhead. If you search for antic, you would traverse each word and compare all the strings in antic. This takes up too much time and memory:

enter image description here

However, in tries, compression is the key. We store only the common prefixes. this will reduce the reduncancy.

enter image description here

Pros of tries:

  • The search time only depends on the length of the searched string.

  • Search misses only involve examining a few characters (in particular, just the longest common prefix between the search string and the corpus stored in the tree).

  • There are no collisions of unique keys in a trie.

  • There is no need to provide a hash function or to change hash functions as more keys are added to a trie.

  • A trie can provide an alphabetical ordering of the entries by key.

unfortunately, there is no perfect data structure. So even tries do have some downsides:

  • Tries can be slower than hash tables at looking up data whenever a container is too big to fit in memory. Hash tables would need fewer disk accesses, even down to a single access, while a trie would require O(m) disk reads for a string of length m.

  • Hash tables are usually allocated in a single big and contiguous chunk of memory, while trie nodes can span the whole heap. So, the former would better exploit the principle of locality.

  • A trie’s ideal use case is storing text strings. We could, in theory, stringify any value, from numbers to objects, and store it. Yet, if we were to store floating-point numbers, for instance, there are some edge cases that can produce long meaningless paths, such as periodic or transcendent numbers, or results of certain floating points operations such as 0.1+0.2, due to issues with double-precision representation.

  • Tries have memory overhead for nodes and references.

Use tries when you have to frequently perform prefix searches ( longestPrefix or keysWithPrefix ). Use hash tables when data is stored on slow supports like disk or whenever memory locality is important.

enter image description here

Reference

9

A binary tree or a bst is typically used to store numerical values. The time complexity in a bst is O(log(n)) for insertion, deletion and searching. Each node in a binary tree has at most 2 child nodes.

Trie : Every node of trie consists of multiple branches. Each branch represents a possible character of keys. We need to mark the last node of every key as leaf node. A trie node field value will be used to distinguish the node as leaf node (there are other uses of the value field)

To learn about tries refer this topcoder tutorial. https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/

6

Just got some insights from this talk, even through the Radix tree used in linux kernel is slight different to the one on wikipedia.

Trees only store keys, they don't store the value associated with the keys.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.