Short Tutorial

Following is a short tutorial illustrating the main points of pb_ds. Concepts describes and summarizes some concepts.

For the most part, pb_ds's containers have the same interface as the STL's, except for the names used for the container classes themselves. For example, this shows basic operations on a collision-chaining hash-based container:

cc_hash_table<int, char> c;

c[2] = 'b';

assert(c.find(1) == c.end());

The container is called cc_hash_table as opposed to unordered_map, since "unordered map" does not necessarily mean a hash-based map (as the STL implicitly implies). For example, list-based associative containers, which are very useful for the construction of "multimaps" (see Associative-Container Performance Tests::Observations::Mapping-Semantics Considerations), are also unordered. It is also not called hash_map since there are more ways than one to implement hash tables.

This snippet shows a red-black tree based container:

tree<int, char> c;

c[2] = 'b';

assert(c.find(2) != c.end());

The container is called tree as opposed to map, since "map" doesn't say that much.

Most of the STL's familiar methods are unchanged. E.g., being, end, size, empty, and clear, do just the same as is customary. Associative-Container Examples::Basic use, and especially basic_map.cc, show examples of this.

This isn't to say that things are exactly as one would expect, given the container requirments and interfaces in the C++ standard.

The names of containers' policies and policy accessors are different than those of the STL. For example, if C is some type of hash-based container, then

C::hash_fn

gives the type of its hash functor, and if c is some hash-based container object, then

c.get_hash_fn()

will return a reference to its hash-functor object.

Similarly, if C is some type of tree-based container, then

C::cmp_fn

gives the type of its comparison functor, and if c is some tree-based container object, then

c.get_cmp_fn()

will return a reference to its comparison-functor object.

It would be nice to give names consistent with those in the existing C++ standard (inclusive of TR1). Unfortunately, these standard containers don't consistently name types and methods. For example, std::tr1::unordered_map uses hasher for the hash functor, but std::map uses key_compare for the comparison functor. Also, we could not find an accessor for std::tr1::unordered_map's hash functor, but std::map uses compare for accessing the comparison functor.

Instead, pb_ds attempts to be internally consistent, and uses standard-derived terminology if possible.

Another source of difference is in scope: pb_ds contains more types of associative containers than the STL, and more opportunities to configure these new containers, since different types of associative containers are useful in different settings (see Associative-Container Performance Tests::Observations::Underlying Data-Structure Families).

pb_ds contains different classes for hash-based containers, tree-based containers, trie-based containers, and list-based containers. Inteface::Containers::Associative Containers lists the containers. Design::Associative Containers::Hash-Based Containers, Design::Associative Containers::Tree-Based Containers, Design::Associative Containers::Trie-Based Containers, and Design::Associative Containers::List-Based Containers, explain some more about these types of containers, respectively.

Since associative containers share parts of their interface, they are organized as a class hierarchy; it is shown in Figure Class hierarchy.

Class hierarchy.

Each type or method is defined in the most-common ancestor in which it makes sense: basic_map.cc shows an example of most of the associative-container types.

For example, all associative containers support iteration. Consequently, container_base has the interface:

template<...>
class container_base
{
    ...
    
public:
    ...
    
    const_iterator
    begin() const;
    
    iterator
    begin();

    const_iterator
    end() const;
    
    iterator
    end();
        
    ...
};

and so all associative containers inherent this method. Conversely, both collision-chaining and (general) probing hash-based associative containers have a hash functor, so basic_hash_table has the interface:

template<...>
class basic_hash_table : public container_base
{
    ...
    
public:
    ...
    
    const hash_fn&
    get_hash_fn() const;
        
    hash_fn&
    get_hash_fn();
    ...
};

and so all hash-based associative containers inherit the same hash-functor accessor methods.

This is discussed further in Design::Associative Containers::Data-Structure Genericity.

Configuring Associative Containers

In general, each of pb_ds's containers is parametrized by more policies than those of the STL's. For example, the STL's hash-based container is parametrized as follows:

template<
    typename Key,
    typename Mapped,
    typename Hash,
    typename Pred,
    typename Allocator,
    bool Cache_Hashe_Code>
class unordered_map;

and so can be configured by key type, mapped type, a functor that translates keys to unsigned integral types, an equivalence predicate, an allocator, and an indicator whether to store hash values with each entry. pb_ds's collision-chaining hash-based container is parametrized as

template<
    typename Key,
    typename Mapped,
    typename Hash_Fn,
    typename Eq_Fn,
    typename Comb_Hash_Fn,
    typename Resize_Policy
    bool Store_Hash
    typename Allocator>
class cc_hash_table;

and so can be configured by the first four types of std::tr1::unordered_map, then a policy for translating the key-hash result into a position within the table, then a policy by which the table resizes, an indicator whether to store hash values with each entry, and an allocator (which is typically the last template parameter in STL containers).

Nearly all policy parameters have default values, so this need not be considered for casual use. It is important to note, however, that hash-based containers' policies can dramatically alter their performance in different settings, and that tree-based containers' policies can make them useful for other purposes than just look-up.

Design::Associative Containers::Hash-Based Containers, Design::Associative Containers::Tree-Based Containers, Design::Associative Containers::Trie-Based Containers, and Design::Associative Containers::List-Based Containers, explain some more about configuring hash based, tree based, trie based, and list base containers, respectively. Interface::Container Policy Classes shows the different policy classes for configuring associative containers. Examples::Hash-Based Containers, Examples::Tree-Like-Based Containers, and Examples::Trie-Based Containers show examples for this.

Determining Containers' Attributes

Associative-containers' underlying data structures obviously affect their performance; Unfortunately, they can also affect their interface. When manipulating generically associative containers, it is often useful to be able to statically determine what they can support and what the cannot. (This was discussed in Motivation::Associative Containers::Data-Structure Genericity.)

Happily, the STL provides a good solution to a similar problem - that of the different behavior of iterators. If It is an iterator, then

typename std::iterator_traits<It>::iterator_category

is one of a small number of pre-defined structs, and,

typename std::iterator_traits<It>::value_type

is the value type to which the iterator "points".

Similarly, in pb_ds, if C is an associative container, then

typename container_traits<C>::container_category

is one of a small number of pre-defined structs, each one corresponding to a class in Figure Class hierarchy. These tags are listed in Interface::Associative Containers::Data-Structure Tags and Traits::Data-Structure Tags::Associative-Containers; Design::Associative Containers::Data-Structure Tags and Traits explains this further; Design::Associative Containers::Data-Structure Tags and Traits::Data-structure tag class hierarchy shows a class diagram.

In most cases, however, the exact underlying data structure is not really important, but only one of its attributes: whether it guarantees storing elements by key order, for example. For this one can use

typename container_traits<C>::order_preserving

This is described further in Design::Data-Structure Genericity; assoc_container_traits.cc shows an example of querying containers' attributes.

Point-Type and Range-Type Methods and Iterators

(This subsection addresses points from Motivation::Associative Containers::Differentiating between Iterator Types.)

pb_ds differentiates between two types of methods and iterators: point-type, and range-type. For example, find and insert are point-type methods, since they each deal with a specific element; their returned iterators are point-type iterators. begin and end are range-type methods, since they are not used to find a specific element, but rather to go over all elements in a container object; their returned iterators are range-type iterators.

Most containers store elements in an order that is determined by their interface. Correspondingly, it is fine that their point-type iterators are synonymous with their range-type iterators. For example, in the following snippet

std::for_each(c.find(1), c.find(5), foo);

two point-type iterators (returned by find) are used for a range-type purpose - going over all elements whose key is between 1 and 5.

Conversely, the above snippet makes no sense for self-organizing containers - ones that order (and reorder) their elements by implementation. It would be nice to have a uniform iterator system that would allow the above snippet to compile only if it made sense.

This could trivially be done by specializing std::for_each for the case of iterators returned by std::tr1::unordered_map, but this would only solve the problem for one algorithm and one container. Fundamentally, the problem is that one can loop using a self-organizing container's point-type iterators.

pb_ds's containers define two families of iterators: const_point_iterator and point_iterator are the iterator types returned by point-type methods; const_iterator and iterator are the iterator types returned by range-type methods.

class <- some container ->
{
public:
    ...

    typedef <- something -> const_iterator;

    typedef <- something -> iterator;

    typedef <- something -> const_point_iterator;

    typedef <- something -> point_iterator;
 
    ...

public:
    ...

    const_iterator begin () const;

    iterator begin();

    const_point_iterator find(...) const;

    point_iterator find(...);
};

Design::Associative Containers::Data-Structure Genericity::Point-Type and Range-Type Methods and Iterators discusses the relationship between point-type and range-type iterators in general; for containers whose interface defines sequence order, however, it is very simple: point-type and range-type iterators are exactly the same, which means that the above snippet will compile if it is used for an order-preserving associative container.

For self-organizing containers, however, (hash-based containers as a special example), the preceding snippet will not compile, because their point-type iterators do not support operator++.

In any case, both for order-preserving and self-organizing containers, the following snippet will compile:

typename Cntnr::point_iterator it = c.find(2);

because a range-type iterator can always be converted to a point-type iterator.

Design::Associative Containers::Data-Structure Genericity::Point-Type and Range-Type Methods and Iterators discusses this further.

Motivation::Associative Containers::Differentiating between Iterator Types also raised the point that a container's iterators might have different invalidation rules concerning their de-referencing abilities and movement abilities. This now corresponds exactly to the question of whether point-type and range-type iterators are valid. As explained in Determining Containers' Attributes, container_traits allows querying a container for its data structure attributes. The iterator-invalidation guarantees are certainly a property of the underlying data structure, and so

container_traits<C>::invalidation_guarantee

gives one of three pre-determined types that answer this query. This is explained further in Design::Associative Containers::Data-Structure Genericity::Point-Type and Range-Type Methods and Iterators.

Distinguishing between Maps and Sets

Anyone familiar with the STL knows that there are four kinds of associative containers: maps, sets, multimaps, and multisets. Basic Use discussed how to use maps, i.e. containers that associate each key to some data.

Sets are associative containers that simply store keys - they do not map them to anything. In the STL, each map class has a corresponding set class. E.g., std::map<int, char> maps each int to a char, but std::set<int, char> simply stores ints. In pb_ds, however, there are no distinct classes for maps and sets. Instead, an associative container's Mapped template parameter is a policy: if it is instantiated by null_mapped_type, then it is a "set"; otherwise, it is a "map". E.g.,

cc_hash_table<int, char>

is a "map" mapping each int value to a char, but

cc_hash_table<int, null_mapped_type>

is a type that uniquely stores int values.

Once the Mapped template parameter is instantiated by null_mapped_type, then the "set" acts very similarly to the STL's sets - it does not map each key to a distinct null_mapped_type object. Also, , the container's value_type is essentially its key_type - just as with the STL's sets. For a simple example, see basic_set.cc .

The STL's multimaps and multisets allow, respectively, non-uniquely mapping keys and non-uniquely storing keys. As discussed in Motivation::Associative Containers::Alternative to Multiple Equivalent Keys, the reasons why this might be necessary are 1) that a key might be decomposed into a primary key and a secondary key, 2) that a key might appear more than once, or 3) any arbitrary combination of 1)s and 2)s. Correspondingly, one should use 1) "maps" mapping primary keys to secondary keys, 2) "maps" mapping keys to size types, or 3) any arbitrary combination of 1)s and 2)s. Thus, for example, an std::multiset<int> might be used to store multiple instances of integers, but using pb_ds's containers, one might use

tree<int, size_t>

i.e., a "map" of ints to size_ts.

Associative-Container Examples::"Multimaps" and "Multisets" shows some simple examples.

These "multimaps" and "multisets" might be confusing to anyone familiar with the STL's std::multimap and std::multiset, because there is no clear correspondence between the two. For example, in some cases where one uses std::multiset in the STL, one might use in pb_ds a "multimap" of "multisets" - i.e., a container that maps primary keys each to an associative container that maps each secondary key to the number of times it occurs.

When one uses a "multimap," one should choose with care the type of container used for secondary keys. This is further explained in Associative-Container Performance Tests::Observations::Mapping-Semantics Considerations.

Priority Queues

Basic Use

pb_ds's priority_queue container is similar to the STL's in interface. For example:

priority_queue<int> p;

p.push(2);
p.push(4);
p.push(1);

assert(p.top() == 4);

p.pop();

assert(p.top() == 2);

assert(p.size() == 2);
assert(!p.empty());

Configuring Priority Queues

As opposed to associative containers, priority queues have relatively few configuration options. The priority queue is parametrized as follows:

template<
    typename Value_Type,
    typename Cmp_Fn,
    typename Tag,
    typename Allocator>
class priority_queue;

The Value_Type, Cmp_Fn, and Allocator parameters are the container's value type, comparison-functor type, and allocator type, respectively; these are very similar to the STL's priority queue. The Tag parameter is different: there are a number of pre-defined tag types corresponding to binary heaps, binomial heaps, etc., and Tag should be instantiated by one of them. Interface::Data-Structure Tags and Traits::Data Structure Tags::Priority-Queues lists the possible types, Priority-Queue Design explains this further, and basic_priority_queue.cc shows an example.

Note that as opposed to the STL's priority queue, priority_queue is not a sequence-adapter; it is a regular container.

Supporting More Operations

priority_queue's push method returns a point-type iterator, which can be used for modifying or erasing arbitrary values. For example:

priority_queue<int> p;

priority_queue<int>::point_iterator it = p.push(3);

p.modify(it, 4);

These types of operations are necessary for making priority queues useful for different applications, especially graph applications. Priority-Queue Examples::Cross-Referencing gives some examples.

Determining Container Attributes

Similarly to container_traits (described in Associative Containers::Determining Containers' Attributes), container_traits can be used to statically determine priority-queues' attributes:

container_traits<C>::container_category

is one of a small number of predefined tag structures that identifies the underlying data structure, and

container_traits<C>::invalidation_guarantee

is its invalidation guarantee. Invalidation guarantees are especially important regarding priority queues, since in pb_ds's design, iterators are practically the only way to manipulate them.

Design::Priority Queues::Traits discusses this further. Priority-Queue Examples::Generics shows an example.