Trie Design

Overview

The trie-based container has the following declaration:

template<
    typename Key,
    typename Mapped,
    typename Cmp_Fn = std::less<Key>,
    typename Tag =  pat_trie_tag,
    template<
        typename Const_Node_Iterator,
        typename Node_Iterator,
        typename E_Access_Traits_,
        typename Allocator_>
    class Node_Update = null_trie_node_update,
    typename Allocator = std::allocator<char> >
class trie;

The parameters have the following meaning:

  1. Key is the key type.
  2. Mapped is the mapped-policy, and is explained in Tutorial::Associative Containers::Associative Containers Others than Maps.
  3. E_Access_Traits is described in Element-Access Traits.
  4. Tag specifies which underlying data structure to use, and is described shortly.
  5. Node_Update is a policy for updating node invariants. This is described in Node Invariants.
  6. Allocator is an allocator type.

The Tag parameter specifies which underlying data structure to use. Instantiating it by pat_trie_tag, specifies an underlying PATRICIA trie (explained shortly); any other tag is currently illegal.


Following is a description of a (PATRICIA) trie (pb_ds follows specifically [okasaki98mereable] and [filliatre2000ptset]).

A (PATRICIA) trie is similar to a tree, but with the following differences:

  1. It explicitly views keys as a sequence of elements. E.g., a trie can view a string as a sequence of characters; a trie can view a number as a sequence of bits.
  2. It is not (necessarily) binary. Each node has fan-out n + 1, where n is the number of distinct elements.
  3. It stores values only at leaf nodes.
  4. Internal nodes have the properties that A) each has at least two children, and B) each shares the same prefix with any of its descendant.

Element-Access Traits shows an example of such a trie.

A (PATRICIA) trie has some useful properties:

  1. It can be configured to use large node fan-out, giving it very efficient find performance (albeit at insertion complexity and size).
  2. It works well for common-prefix keys.
  3. It can support efficiently queries such as which keys match a certain prefix. This is sometimes useful in file systems and routers.

(We would like to thank Matt Austern for the suggestion to include tries.)

Element-Access Traits

A trie inherently views its keys as sequences of elements. For example, a trie can view a string as a sequence of characters. A trie needs to map each of n elements to a number in {0, n - 1}. For example, a trie can map a character c to static_cast<size_t>(c).

Seemingly, then, a trie can assume that its keys support (const) iterators, and that the value_type of this iterator can be cast to a size_t. There are several reasons, though, to decouple the mechanism by which the trie accesses its keys' elements from the trie:

  1. In some cases, the numerical value of an element is inappropriate. Consider a trie storing DNA strings. It is logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C', 'G', 'T'}|. This requires mapping 'T' to 3, though.
  2. In some cases the keys' iterators are different than what is needed. For example, a trie can be used to search for common suffixes, by using strings' reverse_iterator. As another example, a trie mapping UNICODE strings would have a huge fan-out if each node would branch on a UNICODE character; instead, one can define an iterator iterating over 8-bit (or less) groups.

trie is, consequently, parametrized by E_Access_Traits - traits which instruct how to access sequences' elements. string_trie_e_access_traits is a traits class for strings. Each such traits define some types, e.g.,

typename E_Access_Traits::const_iterator

is a const iterator iterating over a key's elements. The traits class must also define methods for obtaining an iterator to the first and last element of a key.

Figure A PATRICIA trie shows a (PATRICIA) trie resulting from inserting the words: "I wish that I could ever see a poem lovely as a trie" (which, unfortunately, does not rhyme).

The leaf nodes contain values; each internal node contains two typename E_Access_Traits::const_iterator objects, indicating the maximal common prefix of all keys in the sub-tree. For example, the shaded internal node roots a sub-tree with leafs "a" and "as". The maximal common prefix is "a". The internal node contains, consequently, to const iterators, one pointing to 'a', and the other to 's'.

no image
A PATRICIA trie.

Node Invariants

Trie-based containers support node invariants, as do tree-based containers (see Tree-Based Containers::Node Invariants). There are two minor differences, though, which, unfortunately, thwart sharing them sharing the same node-updating policies:

  1. A trie's Node_Update template-template parameter is parametrized by E_Access_Traits, while a tree's Node_Update template-template parameter is parametrized by Cmp_Fn.
  2. Tree-based containers store values in all nodes, while trie-based containers (at least in this implementation) store values in leafs.

Figure A trie and its update policy shows the scheme, as well as some predefined policies (which are explained below).

no image
A trie and its update policy.

pb_ds offers the following pre-defined trie node updating policies:

  1. trie_order_statistics_node_update supports order statistics.
  2. trie_prefix_search_node_update supports searching for ranges that match a given prefix. See trie_prefix_search.cc.
  3. null_trie_node_update is the null node updater.

Additional Methods

Trie-based containers support split and join methods; the rationale is equal to that of tree-based containers supporting these methods (see Tree-Based Containers::Additional Methods).