Priority-Queue Design

Overview

The priority-queue container has the following declaration:

template<
    typename Value_Type,
    typename Cmp_Fn = std::less<Value_Type>,
    typename Tag = pairing_heap_tag,
    typename Allocator = std::allocator<char> >
class priority_queue;

The parameters have the following meaning:

Value_Type is the value type.
Cmp_Fn is a value comparison functor
Tag specifies which underlying data structure to use.
Allocator is an allocator type.

The Tag parameter specifies which underlying data structure to use. Instantiating it by pairing_heap_tag, binary_heap_tag, binomial_heap_tag, rc_binomial_heap_tag, or thin_heap_tag, specifies, respectively, an underlying pairing heap [fredman86pairing], binary heap [clrs2001], binomial heap [clrs2001], a binomial heap with a redundant binary counter [maverik_lowerbounds], or a thin heap [kt99fat_heas]. These are explained further in Implementations.

As mentioned in Tutorial::Priority Queues, __gnu_pbds::priority_queue shares most of the same interface with std::priority_queue. E.g. if q is a priority queue of type Q, then q.top() will return the "largest" value in the container (according to typename Q::cmp_fn). __gnu_pbds::priority_queue has a larger (and very slightly different) interface than std::priority_queue, however, since typically push and pop are deemed insufficient for manipulating priority-queues.

Different settings require different priority-queue implementations which are described in Implementations; Traits discusses ways to differentiate between the different traits of different implementations.

Iterators

There are many different underlying-data structures for implementing priority queues. Unfortunately, most such structures are oriented towards making push and top efficient, and consequently don't allow efficient access of other elements: for instance, they cannot support an efficient find method. In the use case where it is important to both access and "do something with" an arbitrary value, one would be out of luck. For example, many graph algorithms require modifying a value (typically increasing it in the sense of the priority queue's comparison functor).

In order to access and manipulate an arbitrary value in a priority queue, one needs to reference the internals of the priority queue from some form of an associative container - this is unavoidable. Of course, in order to maintain the encapsulation of the priority queue, this needs to be done in a way that minimizes exposure to implementation internals.

In pb_ds the priority queue's insert method returns an iterator, which if valid can be used for subsequent modify and erase operations. This both preserves the priority queue's encapsulation, and allows accessing arbitrary values (since the returned iterators from the push operation can be stored in some form of associative container).

Priority queues' iterators present a problem regarding their invalidation guarantees. One assumes that calling operator++ on an iterator will associate it with the "next" value. Priority-queues are self-organizing: each operation changes what the "next" value means. Consequently, it does not make sense that push will return an iterator that can be incremented - this can have no possible use. Also, as in the case of hash-based containers, it is awkward to define if a subsequent push operation invalidates a prior returned iterator: it invalidates it in the sense that its "next" value is not related to what it previously considered to be its "next" value. However, it might not invalidate it, in the sense that it can be de-referenced and used for modify and erase operations.

Similarly to the case of the other unordered associative containers, pb_ds uses a distinction between point-type and range type iterators. A priority queue's iterator can always be converted to a point_iterator, and a const_iterator can always be converted to a const_point_iterator.

The following snippet demonstrates manipulating an arbitrary value:

// A priority queue of integers.
priority_queue<int> p;

// Insert some values into the priority queue.
priority_queue<int>::point_iterator it = p.push(0);

p.push(1);
p.push(2);

// Now modify a value.
p.modify(it, 3);

assert(p.top() == 3);

(Priority Queue Examples::Cross-Referencing shows a more detailed example.)

It should be noted that an alternative design could embed an associative container in a priority queue. Could, but most probably should not. To begin with, it should be noted that one could always encapsulate a priority queue and an associative container mapping values to priority queue iterators with no performance loss. One cannot, however, "un-encapsulate" a priority queue embedding an associative container, which might lead to performance loss. Assume, that one needs to associate each value with some data unrelated to priority queues. Then using pb_ds's design, one could use an associative container mapping each value to a pair consisting of this data and a priority queue's iterator. Using the embedded method would need to use two associative containers. Similar problems might arise in cases where a value can reside simultaneously in many priority queues.

Implementations

There are three main implementations of priority queues: the first employs a binary heap, typically one which uses a sequence; the second uses a tree (or forest of trees), which is typically less structured than an associative container's tree; the third simply uses an associative container. These are shown, respectively, in Figures Underlying Priority-Queue Data-Structures A1 and A2, Figure Underlying Priority-Queue Data-Structures B, and Figures Underlying Priority-Queue Data-Structures C.

Underlying Priority-Queue Data-Structures.

Roughly speaking, any value that is both pushed and popped from a priority queue must incur a logarithmic expense (in the amortized sense). Any priority queue implementation that would avoid this, would violate known bounds on comparison-based sorting (see, e.g., [clrs2001] and brodal96priority]).

Most implementations do not differ in the asymptotic amortized complexity of push and pop operations, but they differ in the constants involved, in the complexity of other operations (e.g., modify), and in the worst-case complexity of single operations. In general, the more "structured" an implementation (i.e., the more internal invariants it possesses) - the higher its amortized complexity of push and pop operations.

pb_ds implements different algorithms using a single class: priority_queue. Instantiating the Tag template parameter, "selects" the implementation:

Instantiating Tag = binary_heap_tag creates a binary heap of the form in Figures Underlying Priority-Queue Data-Structures A1 or A2. The former is internally selected by priority_queue if Value_Type is instantiated by a primitive type (e.g., an int); the latter is internally selected for all other types (e.g., std::string). This implementations is relatively unstructured, and so has good push and pop performance; it is the "best-in-kind" for primitive types, e.g., ints. Conversely, it has high worst-case performance, and can support only linear-time modify and erase operations; this is explained further in Traits.
Instantiating Tag = pairing_heap_tag creates a pairing heap of the form in Figure Underlying Priority-Queue Data-Structures B. This implementations too is relatively unstructured, and so has good push and pop performance; it is the "best-in-kind" for non-primitive types, e.g., std:strings. It also has very good worst-case push and join performance (O(1)), but has high worst-case pop complexity.
Instantiating Tag = binomial_heap_tag creates a binomial heap of the form in Figure Underlying Priority-Queue Data-Structures B. This implementations is more structured than a pairing heap, and so has worse push and pop performance. Conversely, it has sub-linear worst-case bounds for pop, e.g., and so it might be preferred in cases where responsiveness is important.
Instantiating Tag = rc_binomial_heap_tag creates a binomial heap of the form in Figure Underlying Priority-Queue Data-Structures B, accompanied by a redundant counter which governs the trees. This implementations is therefore more structured than a binomial heap, and so has worse push and pop performance. Conversely, it guarantees O(1) push complexity, and so it might be preferred in cases where the responsiveness of a binomial heap is insufficient.
Instantiating Tag = thin_heap_tag creates a thin heap of the form in Figure Underlying Priority-Queue Data-Structures B. This implementations too is more structured than a pairing heap, and so has worse push and pop performance. Conversely, it has better worst-case and identical amortized complexities than a Fibonacci heap, and so might be more appropriate for some graph algorithms.

Priority-Queue Performance Tests shows some results for the above, and discusses these points further.

Of course, one can use any order-preserving associative container as a priority queue, as in Figure Underlying Priority-Queue Data-Structures C, possibly by creating an adapter class over the associative container (much as std::priority_queue can adapt std::vector). This has the advantage that no cross-referencing is necessary at all; the priority queue itself is an associative container. Most associative containers are too structured to compete with priority queues in terms of push and pop performance.

Traits

It would be nice if all priority queues could share exactly the same behavior regardless of implementation. Sadly, this is not possible. Just one for instance is in join operations: joining two binary heaps might throw an exception (not corrupt any of the heaps on which it operates), but joining two pairing heaps is exception free.

Tags and traits are very useful for manipulating generic types. __gnu_pbds::priority_queue publicly defines container_category as one of the tags discussed in Implementations. Given any container Cntnr, the tag of the underlying data structure can be found via typename Cntnr::container_category; this is one of the types shown in Figure Data-structure tag class hierarchy.

Data-structure tag class hierarchy.

Additionally, a traits mechanism can be used to query a container type for its attributes. Given any container Cntnr, then __gnu_pbds::container_traits<Cntnr> is a traits class identifying the properties of the container.

To find if a container might throw if two of its objects are joined, one can use container_traits<Cntnr>::split_join_can_throw, for example.

Different priority-queue implementations have different invalidation guarantees. This is especially important, since as explained in Iterators, there is no way to access an arbitrary value of priority queues except for iterators. Similarly to associative containers, one can use container_traits<Cntnr>::invalidation_guarantee to get the invalidation guarantee type of a priority queue.

It is easy to understand from Figure Underlying Priority-Queue Data-Structures, what container_traits<Cntnr>::invalidation_guarantee will be for different implementations. All implementations of type Underlying Priority-Queue Data-Structures B have point_invalidation_guarantee: the container can freely internally reorganize the nodes - range-type iterators are invalidated, but point-type iterators are always valid. Implementations of type Underlying Priority-Queue Data-Structures A1 and A2 have basic_invalidation_guarantee: the container can freely internally reallocate the array - both point-type and range-type iterators might be invalidated.

This has major implications, and constitutes a good reason to avoid using binary heaps. A binary heap can perform modify or erase efficiently given a valid point-type iterator. However, inn order to supply it with a valid point-type iterator, one needs to iterate (linearly) over all values, then supply the relevant iterator (recall that a range-type iterator can always be converted to a point-type iterator). This means that if the number of modify or erase operations is non-negligible (say super-logarithmic in the total sequence of operations) - binary heaps will perform badly.