Copyright Donovan Rebbechi 2001. This tutorial may be distributed freely if this copyright notice is preserved. Changes require authors permission. The code is no-strings attached free software, it may be redistributed, modified, and used.
A topic that is often addressed in introductory computer science courses is the design and implementation of a linked list. This seemingly simple task turns out to be surprisingly difficult. Unfortunately, most textbooks make a terrible mess of it. We examine some of the design issues that surface in the process of designing and implementing such a class, and offer some implementations.
A linked list is a data structure that consists of a sequence of nodes, x1, ..., xn where each node xi contains the following information:
A C++ implementation of a node is quite simple:
template <class T> struct Node { Node (const T& x, Node* y):data(x),next(y) {} T data; Node* next; };
The main desirable property of the linked list is that the ordering
is a property of the Node*
data members of the list.
This means that a lot of algorithms only require pointer manipulations.
For example, to delete a node x2 from a linked list,
the following code suffices:
x1->next = x2->next; delete x2;
Notice that this operation did not require a change in the memory address of
other nodes in the list, and it did not require us to move data around (deleting
the second element of an array on the other hand would require this.)
In fact, even operations such as sorting a linked list do not invalidate
pointers to elements in the list, because the ordering is a
property of the values of the member variable next
in each node.
This is often a very useful property. For
example, it is possible to maintain a linked list in a sorted state, and
hold pointers to elements in the list, while adding data to the list. The
pointers do not need to be updated when new elements are added.
On the other hand, linked lists do have some disadvantages. For example, they support sequential access but not random access. To locate the nth element, one must dereference n pointers. This is in contrast to the constant time element access in an array.
Some good advice found in one textbook (which unfortunately does not appear to heed its own advice) is to pay careful attention to how an interface will be actually used, when designing it. It's instructive to write some code that uses your class before you implement it. This may seem perverse to the uninitiated, but it is in fact very useful. First, it has the benefit of making sure that the class is usable, and secondly, it serves as a kind of instant test suite.
The most difficult aspect of designing a list class is element access. While this seems very simple, it's surprisingly subtle, and textbooks do a surprisingly good job of botching it. The following are key questions:
find, insert
and erase
require a means of representing position.
Addressing the former issue immediately addresses the latter, because if we have a way to represent a position in the list, then we can use an operation that moves from one position to the next, to iterate.
First, we look at the first consideration.
Clearly, the best way to flexibly address the former concern would be to provide a way to address the latter. There are a few ways to access elements in a list:
The first approach is not acceptable. It suffers from the substantial problem that simply accessing data modifies the state of the list. Here is an example of code that will break if we use the bad approach:
find (mylist,3); // sets current element print (mylist); // print() modifies current element cout << mylist.current();
The problem with the above code is that the print() statement needs to modify the "current" element. This has a few consequences: it means that we cannot pass a list by const reference to any useful function (since most useful functions will modify the "current" element), and the deeper problem is that seemingly innocent statements, even debugging statements, modify the state of the list. The prospect of debugging statements changing program behaviour should send shivers down the spine of any programmer. Such behaviour results in infuriating Heisenbugs -- bugs whose visibility depends on whether or not debugging is enabled.
This raises an issue -- change in the state of the "current" element should not be considered a change in the list. In other words, there is a need for a representation of "current element" in the list that is not part of the list itself. Having found our first approach unworkable, we move on to the other two alternatives.
An approach that appeals to beginners, and even some misguided textbook authors, is to use array notation to access elements of linked lists. This is tempting, because programmers become acquainted with arrays at an early level, and from this stems a natural desire to treat any sequence container as an array. Resist this temptation at all costs!!! There are several problems with this, but they all boil down to a very simple fact:
a linked list is not an array, and is not a random access data structureAnd hence attempts to treat it as one inevitably lead to disaster. Consider the following code:
for (int i = 0; i < mylist.size(); ++i ) cout << mylist[i] << endl;
Accessing the nth node takes n operations where n is the size of the list, so the loop takes 1 + 2 + 3 + ... + n = n(n+1)/2 operations. This is disasterous for large values of n, considering that we can do the same in linear time if we use the pointer based or iterator based approach.
Another fundamental problem with integer offsets is that they are invalidated by insertions and deletions. For example, consider this code:
int pos = mylist.find ( "Thomas" ); mylist.insert ( "William" ); cout << mylist[pos]; // pos no longer refers to "Thomas"This is a weakness that we are stuck with when we work with arrays, but there is no reason why we should suffer the same for lists, because inserting in a list does not change the mewmory address of the nodes.
Suffice it to say that we will not seriously consider implementing a linked list that uses this misguided approach as the primary means of element access.
The second approach is the one that is usually used in C implementations. In a C++ implementation, there are two different sub-categories of this approach.
void insert_front ( Node*& head, const element_type & x) { Node* tmp = new Node(head, x); head = tmp; } // example client code insert_front(mylist, x); Node* insert_front ( Node* head, const element_type & x) { Node* tmp = new Node(head, x); return tmp; } // example client code head = insert_front(mylist, x);The linked-list-is-a-node approach is problematic in that calling them on a Node that is not at the head of the list will result in disaster (it will "unlink" the list, making access to elements beyond the operand inaccesible, and it will leave the pointer to the operand pointing at garbage)
The linked-list-is-not-a-node approach has the advantage that encapsulation makes it possible to ensure that client code does not corrupt the list. An example of code using this approach:
void insert_front ( const element_type & x) { Node* tmp = new Node(head, x); head = tmp; } // example client code mylist.insert_front(x);Notice that client code does not have the opportunity to corrupt the list, because the client code does not have to know which Node is at the front of the list.
This approach uses a special iterator class that encapsulates a node pointer. One of the key advantages of this is that we can use this to provide consistent semantics to different container types. We defer further discussion of iterators.
Recall our previous Node
implementation:
struct Node { Node(const T& x, Node* y):data(x),next(y) {} T data; Node* next; };
Notice the constructor. This makes it simpler to write code that creates list nodes. Even if we were implementing the list in C, it would be desirable to write a function that creates a node. Notice also the fact that the node is a struct. This is intentional -- the node is a simple data aggregate.
Now we need to design the interface for the list. The first thing to consider is what operations should a linked list support. Here is a start:
In addition, we also need to be able to create lists. This gives rise to the following creational operations:
There are other operations that may be required, and have consequences for our design. For example, fast removal at the back of the list, and concatenation of two lists. These require us to keep track of the last node of the list. Depending on how important these operations are, it may be useful for the list class to hold a pointer to the last element of the list.
Another possible design consideration is using a sentinel node at the beginning of the list. This has the drawback of adding overhead, but is advantageous in that it makes it possible to avoid "special-casing".
In our examples, we will decline both the "sentinel" and the "tail pointer" design enhancements, while observing that these are actually quite useful in certain situations, and could be introduced to our class via a wrapper class.
So we make a first attempt at writing an interface. For now, we omit the destructor. Strictly speaking, it isn't part of the interface, it's an implementation detail specific to the C++ language.
template <class T> class List { Node* head; public: struct Node { Node(const T& x, Node* y):data(x),next(y) {} T data; Node* next; }; void push_front(const T& x) { Node* tmp = new Node(x, head); head = tmp; } List(); List(const List&); List& operator=(const List&); bool empty(); void pop_front(); T& front(); };
We get stuck on the last two items. To "insert anywhere in the list", we need a way to represent a place in the list. Fortunately, our previous discussion has prepared us for this.
Another problem with singly linked lists is that given a node in the list, we can not erase that node in constant time, and we can't insert a node before it in constant time. This is because these operations require manipulation of the preceding node. This is a problem with singly linked lists that detracts from their usability, and the simplest solution is to use a doubly linked list instead if you want a more flexible, general purpose class (one could write iterators that contain ordered pairs of succesive node pointers, but given this overhead, one might as well use a doubly linked list)
We address this problem by using erase_after
and
insert_after
methods, that operate on the successor to the
function argument. The ugliness of this is noted.
We first look into implementing a list, using the pointer based approach, with the linked-list-is-not-a-node model. We previously raised the importance of considering how the class is actually going to be used prior to implementing it. In this example, we could iterate over the list by using node pointers. For example, we could write a loop, and an insertion like this:
for ( Node* i = L.begin(); i!=L.end(); i=i-> next ) cout << i->data << endl; Node* x = L.find (3); L.insert_after(x,4);
The notation is not that elegant, but it works.
There are a few issues that surface during implementation. For example,
it proves useful to implement a swap()
function that swaps
the data of two lists. This function makes the assignment operator easier
to implement (it can reuse the copy constructor) and it also offers a
cheap destructive copy operation.
Another implementation issue is that copying a singly linked list requires a function to reverse the list, because the obvious strategy of iterating over the nodes and inserting them gives us a list in the wrong order. So a method that reverses the order of the list by pointer manipulation proves desirable.
Another subtle issue is const
correctness. To declare a method
that gives out pointers to data in the list const
is dangerous,
because it would allow functions that accepted a const
reference
to a list to use these pointers to modify the list. So we provide
const
versions of the methods begin()
and front()
that return a const
pointer/reference.
Here is what the code would look like:, and here is a simple driver program
template <class T> class List { public: struct Node { Node(const T& data, Node* next=0):data(data),next(next) {} Node* next; T data; }; List() : head(0) {} List(const List& L) : head(0) { // copy in reverse order for ( const Node* i = L.begin(); i!= L.end(); i=i->next ) push_front(i->data); reverse(); } void reverse() { Node* p = 0; Node* i = begin(); Node* n; while (i) { n = i->next; i->next = p; p = i; i = n; } head = p; } void swap(List& x) { Node* tmp = head; head = x.head; x.head = tmp; } List& operator=(const List& x) { List tmp(x); swap(tmp); return *this; } ~List() { clear(); } void clear() { while (!empty()) pop_front(); } bool empty() { return ! head; } void push_front(const T& x) { Node* tmp = new Node(x,head); head = tmp; } void pop_front() { if (head) { Node* tmp = head; head=head->next; delete tmp; } } void insert_after(Node* x, const T& data) { Node* tmp = new Node(data, x->next); x->next = tmp; } void erase_after(Node* x) { Node* tmp = x->next; if (tmp) { x->next = x->next->next; delete tmp; } } T& front() { return head->data; } const T& front() const { return head->data; } Node* begin() { return head; } Node* end() { return 0; } const Node* begin() const { return head; } const Node* end() const { return 0; } private: Node* head; };
#include "simple_list.h" #include <iostream> int main() { List<int> X; X.push_front(3); X.push_front(2); X.push_front(1); for (List<int>::Node* it = X.begin(); it; it = it->next ) cout << it->data << endl; X.reverse(); for (List<int>::Node* it = X.begin(); it; it = it->next ) cout << it->data << endl; }
Node
is straightforward, and has been
covered.
List()The default constructor initialises
head
to a
NULL
pointer. This is important, because of the class invariant that the element
one past the end of the list (or, in the case of an empty list, the pointer to
the "first Node") is signified by a NULL
pointer.
List(const List& L)The copy constructor traverses its argument, but the problem is that the data is inserted at the front. So the elements at the front of the list are inserted first, and hence end up at the end of the list. So we need to reverse the list to rectify the situation. This requires a reverse() member function.
void reverse()Implementing this function is delicate. We need to traverse the list, and modify the
next
pointer of each node, so it points to
the previous node. This requires that we maintain the following information
in our loop:
i
that we are changing.
next
pointer
in the current node to this addres, and it is not possible to
backtrack in a singly linked list.
next
pointer in the
current node, we have no way of accessing the next node.
So we need to save the address of the next node before
modifying the current node. Note the code:
while(i) { n = i->next;This saves the address of the next node, so we can continue moving through the list, after we modify the
next
pointer:
i->next = p; // make the next pointer point to previous node. p = i; // update previous node to point to "current" i = n; // recall the address of the node after i
void swap(List& x)The swap function is a very useful idiom in C++. It offers a means to transfer ownership:
List tmp; tmp.swap(x); // tmp "steals" the data from xAnd in circumstances where it's desirable to swap data, it makes it possible to do this in constant time (that is, the time does not depend on the length of the list, since we only swap head pointers) We will later see that this also provides us with a means to implement the assignment operator in terms of the copy constructor.
List& operator=(const List& );This is a very easy function to implement, with the help of a copy constructor.
~List()The destructor removes all the nodes from the list. Since this functionality is useful to client code, we implement this in a separate function,
clear()
.
void clear()
The implementation of this is straightforward. It deletes the first node in the list until the list is empty.
bool empty()The implementation of this is straightforward. Note that there is an implicit coversion from pointer to
bool
, we needn't provide an explicit
conversion.
void push_front(const T& x)This inserts a node. This is a very straightforward task -- create the new node, and update the list so that the head node points to the address of our new node.
void pop_front()This removes the first node. Observant readers may notice that we don't return the value of the deleted node in this function. There is an important reason why we don't do this -- exception safety. A detailed discussion of this topic is outside the scope of this discussion. For further reading, I recommend Exceptional C++ by Herb Sutter.
void insert_after(Node* x, const T& data); void erase_after(Node* x);We previously discussed the reasons for inserting or erasing after the current node -- it's because we can't backtrack in a singly linked list. Note that we check the special case where
erase_after
is called
on the last node of the list. In this case, erase_after
has
no effect.
T& front(); const T& front() const ;We provide two versions of
front()
.
The compiler automatically decides which version of the function gets called.
One of these is automatically
called when we have a const
list (for example, if a list is
passed to a function by const
reference), the other is used on
non-const lists. front()
can be used in conjunction with
pop_front
.
Foo x = mylist.front(); mylist.pop_front();The exception safety benefit of this approach is that we know that we have succesfully copied the element at the front of the list before it is removed.
Node* begin(); Node* end(); const Node* begin() const; const Node* end() const;
The end() function is not strictly necessary, it is included for the
benefit of programmers who are used to STL conventions (checking
i != mylist.end()
as opposed to i!= NULL
)
Like the front()
member function, we need a const
and non-const
version. We discuss the reasoning behind
this further on.
This list is still somewhat incomplete. It would be nice to have methods that perform splice and merge operations, as well as sorting and searching. But the code we have is a good start.
The simple implementation has some positive traits that set it apart from many textbook examples. It has the following properties:
However, there is one key drawback:
Node
is an implementation detail, which
exposes the structure of the list. This violates encapsulation.
A more abstract and ideal representation would be to have a class
to represent position, and give the class similar semantics to the
familiar C++ pointer: the ++
operator to advance, and
the *
operator to get at the data in the node.
In the case of the list, it happens that Node
pointers
do not fare too badly as a means of representing position, because their
structure closely resembles the interface we want to expose to client code.
However, more advanced examples do not share this property. A good example
of this is a sorted list implemented as a binary search tree. A node in
such a class looks like this:
template <class T> struct Node { Node* parent; Node* left; Node* right; T data; };The structure of such a node is not very helpful to client code. It is true that we could turn
Node
into a full-fledged class,
with private data and public member functions. But we would still be stuck with
explicit pointers, and we'd also be in the awkward position of using a class
that serves two separate purposes (representing a node in a data structure,
and a bookmark for client code)
There are classes that are designed especially for abstracting the notion of position in a container. These classes are called iterators, and not only are they very useful, they are the focus of the next section.
An iterator is an object that represents a position. It makes it possible to move around in a container, without exposing the structure of that container. All container iterators support a "fetch" operation that retrieves the data at a given position, as well as comparison and assignment operations. But different iterators support different operations. For example, an iterator for a singly linked list supports "forward" movement, but not "backward" movement. An iterator for an array class supports movement in both directions, and supports constant time addition (random access). There are different types of iterators, with different capabilities:
General operations for iterators:
*it; // fetch data referenced by an iterator; ++it; it++; // move a forward iterator forward. --it; it--; // move bidirectional iterator backward. it += n; it -=n; // increment/decrement random access iterator i1 == i2; j1 == j2; // comparison i1 = x.begin(); // assignment
Iterators make for containers that are very consistent and easy to use. To get some idea of the power of iterators, here is some example code that uses iterators for vastly different STL containers:
// print out the elements of an STL set (implemented as a binary // search tree) for (set<double>::iterator it = x.begin(); it != x.end(); ++it ) cout << *it << endl; // print out the elements of an STL list (doubly linked list) for (list<double>::iterator it = y.begin(); it != y.end(); ++it ) cout << *it << endl; // print out the elements of an STL vector (dynamic array) for (vector<double>::iterator it = z.begin(); it != z.end(); ++it ) cout << *it << endl;
As you can see, the iterator lets the user get at the data, and keeps
the details of the underlying data structure out of the way. While
it's useful to know that the set is implemented as a binary tree, and
has those performance characteristics (namely, it's maintained in
sorted order, and searching takes O(log(n)) ), the complexity of the
underlying data structure does not interfere with accessing data in
the container. There is a good reason that the STL set is called
set
and not binary_search_tree
-- the STL
has diffused the complexity to the point that it really does behave
like a "set", and iterators deserve a substantial amount of the credit
for the success of this abstraction.
Iterators also have the advantage that one can use them to write generic functions that operate on pairs of iterators (ranges), and are entirely container independent (though they may be dependent on capaibilities of the iterator). For example, we could use the copy function to write some code that prints the contents of the above containers as follows:
// print out the elements of an STL set (implemented as a binary // search tree) copy (x.begin(),x.end(),ostream_iterator<double>(cout,"\n")); // print out the elements of an STL list (doubly linked list) copy (y.begin(),y.end(),ostream_iterator<double>(cout,"\n")); // print out the elements of an STL vector (dynamic array) copy (z.begin(),z.end(),ostream_iterator<double>(cout,"\n"));
An iterator that refers to a place in the list should support the following operations:
T& operator*();overloading the operator gives us pointer-like semantics for lists.
it++; ++it
it1 == it2; it1 != it2
iterator i; iterator j(it); i = j;
Some things we can't do --
( it +5 )
in constant time, because
this is a linear time operation in a linked list.
( it1 < it2 )
in constant time.
T& operator*(); iterator& operator++(); // ++it iterator operator++(int); // it++ bool operator==(const iterator&); // i == j bool operator!=(const iterator&); // i != j iterator(); // iterator i; iterator(Node*); iterator(const iterator&); // iterator j(i) iterator& operator=(const iterator&); // iterator j; j = i;
Now we can add operations that depend on position to the list:
iterator begin(); // return iterator at front of list iterator end(); // return iterator at end of list insert_after(iterator,const T&); // insert after iterator erase_after(iterator); // erase after iterator
We need to insert after the given iterator, because there is no way to backtrack in a singly linked list. We have the same issue with erase -- to erase a given node, we need to modify the node that precedes it.
You may have observed that the methods begin()
and
end()
were not declared const
. There
is a reason for this -- suppose we pass our list to a function by
const
reference. If we declared begin()
const, then this code would be legal:
void do_something(const list<int> & a ) { *a.begin() = 3; // modify list }
This is obviously a problem -- the function prototype claims to guarantee
that the state of the list will remain unchanged, but does not honor this
contract. To deal with this, we need a separate type for
constant iterators, const_iterator
. This supports the same
operations as iterator, with the following differences:
const_iterator
from an
iterator
.
operator*()
returns a const reference.
Since there is an implicit conversion from iterator to const_iterator, there is no need to provide more comparison operators to handle mixed comparisons.
So this results in the addition of the following methods:
const_iterator begin() const; const_iterator end() const; const T& front() const;
The standard template library (STL) offers a number of algorithms that we can use on our list for free, provided that we make our iterators STL compatible. The main thing this requires is several typedefs. in particular, the following information is expected:
typedef T value_type; // type of element typedef T& reference; // return type of operator*() typedef T* pointer; // return type of operator->() difference_type; // type used to measure distance between iterators typedef std::forward_iterator_tag iterator_category;
The simplest way to do this is derive iterator
from
std::iterator
. The standard iterator class is a template
that takes two parameters -- an iterator tag, and a type.
For example, for a list iterator
, we want to derive from
std::iterator<std::forward_iterator_tag, T >and for a
const_iterator
, we derive from
std::iterator<std::forward_iterator_tag, const T >
The reason we use const T is that this makes the pointer and reference types const *T and const T&.
It should be noted that there is no requirement that iterators be derived from std::iterator. However, it is a useful technique.
The iterator category tells algorithms about the capabilities of the
iterator. For example, this tells the compiler that our iterators cannot
be used with an algorithm that requires bidierctional or random access
iteration. The typedefs for const_iterator
are similar,
just replace references and pointers with const
versions.
Finally, we have our list class and driver program
#include <iterator> template <class T> class List { struct Node { Node(const T& x,Node* y = 0):m_data(x),m_next(y){} T m_data; Node* m_next; }; Node* m_head; public: class iterator : public std::iterator<std::forward_iterator_tag, T> { Node* m_rep; public: friend class const_iterator; friend class List; inline iterator(Node* x=0):m_rep(x){} inline iterator(const iterator& x):m_rep(x.m_rep) {} inline iterator& operator=(const iterator& x) { m_rep=x.m_rep; return *this; } inline iterator& operator++() { m_rep = m_rep->m_next; return *this; } inline iterator operator++(int) { iterator tmp(*this); m_rep = m_rep->m_next; return tmp; } inline reference operator*() const { return m_rep->m_data; } inline pointer operator->() const { return m_rep; } inline bool operator==(const iterator& x) const { return m_rep == x.m_rep; } inline bool operator!=(const iterator& x) const { return m_rep != x.m_rep; } }; class const_iterator : public std::iterator<std::forward_iterator_tag, const T> { const Node* m_rep; public: friend class iterator; friend class List; inline const_iterator(const Node* x=0):m_rep(x){} inline const_iterator(const const_iterator& x):m_rep(x.m_rep) {} inline const_iterator(const iterator& x):m_rep(x.m_rep){} inline const_iterator& operator=(const const_iterator& x) { m_rep=x.m_rep; return *this; } inline const_iterator& operator=(const iterator& x) { m_rep=x.m_rep; return *this; } inline const_iterator& operator++() { m_rep = m_rep->m_next; return *this; } inline const_iterator operator++(int) { const_iterator tmp(*this); m_rep = m_rep->m_next; return tmp; } inline reference operator*() const { return m_rep->m_data; } inline pointer operator->() const { return m_rep; } inline bool operator==(const const_iterator& x) const { return m_rep == x.m_rep; } inline bool operator!=(const const_iterator& x) const { return m_rep != x.m_rep; } }; List() : m_head(0) {} List(const List& L) : m_head(0) { for ( const_iterator i = L.begin(); i!=L.end(); ++i ) push_front(*i); reverse(); } void reverse() { Node* p = 0; Node* i = m_head; Node* n; while (i) { n = i->m_next; i->m_next = p; p = i; i = n; } m_head = p; } void swap(List& x) { Node* tmp = m_head; m_head = x.m_head; x.m_head = tmp; } List& operator=(const List& x) { List tmp(x); swap(tmp); return *this; } ~List() { clear(); } void clear() { while (!empty()) pop_front(); } inline void push_front(const T&x) { Node* tmp = new Node(x); tmp->m_next = m_head; m_head = tmp; } inline void pop_front() { if (m_head) { Node* newhead = m_head->m_next; delete m_head; m_head = newhead; } } inline bool empty() { return m_head; } inline T& front() { return *begin(); } inline const T& front() const { return *begin(); } inline iterator begin() { return iterator(m_head); } inline iterator end() { return iterator(); } inline const_iterator begin() const { return m_head; } inline const_iterator end() const { return const_iterator(); } void erase_after (iterator& x) { Node* tmp = x.m_rep->m_next; if (x.m_rep->m_next) x.m_rep->m_next = x.m_rep->m_next->m_next; delete tmp; } void insert_after (iterator& x, const T& y) { Node* tmp = new Node(y,x.m_rep->m_next); x.m_rep->m_next = tmp; } };
// linked list driver program #include "linked_list.h" #include <algorithm> #include <numeric> #include <iostream> template <class T> void print(const List<T> & x) { std::copy(x.begin(), x.end(), std::ostream_iterator<T>(std::cout,"\n")); } int main() { List<int> x; x.push_front(3); x.push_front(4); x.push_front(9); print(x); List<int>::iterator it = std::find(x.begin(),x.end(),9); if (it != x.end()) std::cout << *it << std::endl; x.insert_after (it,11); std::cout << "The list is" << std::endl; print (x); int sum = std::accumulate ( x.begin(),x.end(),0); int sumsq = std::inner_product( x.begin(),x.end(),x.begin(),0); std::cout << "sum " << sum << std::endl; std::cout << "sum sq " << sumsq << std::endl; std::cout << "Copying ... " << std::endl; List<int> y(x); print (y); std::cout << "Done with copy" << std::endl; List<int>::const_iterator cit; std::cout << "** "; for (cit = y.begin(); cit != y.end(); ++cit) std::cout << *cit << ' '; std::cout << std::endl; }
There are several other possible variations on the linked list theme. We discuss some of them:
template < class T> struct Node { Node( const T& data, Node* prev, Node* next):data(data),prev(prev),next(next) {} T data; Node* next; Node* prev; };An iterator in a doubly linked list is bidirectional -- we can move backwards or forwards in a doubly linked list. Doubly linked lists do not suffer from the awkward special-casing issues associated with singly linked lists. As such, they are useful as general purpose classes, which is why the STL includes a class with similar performance characteristics (the C++ standards document tends to beat around the bush -- it doesn't specifically mandate a linked list, but rather, enumerates the properties of a linked list, and mandates a class with those properties)
(x0 , x2 , ... , xn ) -> (xi mod n , x(i+2) mod n , ... , x(i-1+n) mode n )This becomes a constant time operation.
std::list
class uses a sentinel node and a circular list.
Note that they do not use the same trick for their singly linked
slist
class (which is very similar to the previous example), because
while list
is a general use class, slist
is intended
to provide maximum speed, and minimum size overhead.
The main possible advantage of exposing the circular structure to client
code is that it could make use ranges of iterators that spanned the
"end" of the list. But there is also an obvious problem -- if we use
the obvious implementation, we have begin() == end()
.
We can circumvent this problem by using a sentinel, but using a sentinel
is a problem, in that the sentinel will get in the way if we want to use
ranges that span the "end" of the list. Another alternative is to
linearise iterators by storing state information -- an initial node, and
a count of how many "laps" the iterator has done. This results in iterators
that are larger, and more fragile (since the state depends on two node pointers,
deleting either pointer will invalidate the iterator)
But it offers the following benefits:
We implement a circular linked list, with the following properties:
The bottom line is that we trade a slight amount of overhead (a sentinel node) for a fair degree of flexibility. At a first glance, it doesn't appear obvious that this design helps us write code that inserts before a given iterator. The trick here is to use "off-by-one" iterators. In particular, dereferencing an iterator produces the element at the next node:
inline T& operator*() { return m_rep->m_next->m_data; }The iterator has the effect of abstracting an "always look ahead by one position" strategy. Instead of providing messy operations such as find the iterator before the element whose value is 9, or erase the iterator after this one, which amounts to forcing the client code to look ahead, the iterators automatically do this work. This is possible because of the fact that we have a sentinel node, hence every node in the list has a predecessor. In fact, even the sentinel node has a predecessor, because the list is circular.
The main change in this list with respect to the non-circular list is
the fact that we don't keep track of the head node, we keep track of a
pointer m_node
that points to the last element in the list.
Hence m_node->next
points to the dummy sentinel node,
and m_node->next->next
points to the first node
in the list.
We dicuss the important changes:
void reverse()This function is tricky to implement, because looping in a circular list is not that easy. The problem is that when iterating over a circular list, the termination condition is the same as the initial condition. One gets around this problem by terminating on an empty list, then using a do-while loop to perform the first step without a check. There is another subtle issue -- recall that the data member
m_node
points to the last member of the list. So we need to reassign this to
a pointer to the first node, which is
m_node->m_next>m_next
. (Note that m_node->m_next
is the dummy sentinel node).
inline void push_front(const T&x); { insert (begin(),x); } inline void push_back(const T& x) { insert (end(),x); } inline void pop_front() { erase(begin()); } inline bool empty() { return m_node == m_node->m_next; }Push and pop functions admit trivial implementations, because of the lack of a need for special-casing in our new setting. Note that even though
end()
is "one past the end of the list", it is still a valid
iterator, and we can use it in a call to insert()
.
Note also that empty()
checks that the sentinel is its own
successor.
void erase (iterator x)The implementation of the erase() method suffers some awkwardness, but it's not that much worse than the dreaded "erase_after" method. First, it checks that we're not trying to erase the sentinel node.
if (x==end()) return;Then it checks to see if the last node on the list is being erased.
if (x.m_rep->m_next == m_node) m_node = x.m_rep;Since
m_node
needs to point to the last node on the list,
this variable needs to be reassigned if the last element of the list
is removed. If the iterator does reference the last element on the
list, its corresponding pointer, m_rep
points to the
element before that, thanks to our off-by-one iterator implementation.
So in this case, we assign the value of x.m_rep
to
m_node
.
The last few lines simply cut out the node referenced by the iterator :
Node* tmp = x.m_rep->m_next; x.m_rep->m_next = x.m_rep->m_next->m_next; delete tmp;
void insert (iterator x, const T& y)This is similar to our old
insert_after
function, except it
requires a check for the case where we insert before end()
.
In the case where we insert before end()
, the
newly inserted node becomes the last node in the list, so it's necessary to
assign the value m_node
accordingly.
void rotate(iterator)This rotates the list so that the iterator argument is at the front of the list. Rotation may seem like an operation that is "natural" for a circular list, but the only requirement for a constant time implementation is that both ends of the list be quickly accesible.
Without further ado, we present the full listing. The source can be downloaded here, as can the driver program
#include <iterator> template <class T> class List { struct Node { Node(const T& x,Node* y = 0):m_data(x),m_next(y){} T m_data; Node* m_next; }; Node* m_node; public: class iterator : public std::iterator<std::forward_iterator_tag, T> { Node* m_rep; public: friend class List; friend class const_iterator; typedef T value_type; typedef T& reference; typedef T* pointer; typedef int difference_type; typedef std::forward_iterator_tag iterator_category; inline iterator(Node* x=0):m_rep(x){} inline iterator(const iterator& x):m_rep(x.m_rep) {} inline iterator& operator=(const iterator& x) { m_rep=x.m_rep; return *this; } inline iterator& operator++() { m_rep = m_rep->m_next; return *this; } inline iterator operator++(int) { iterator tmp(*this); m_rep = m_rep->m_next; return tmp; } inline reference operator*() const { return m_rep->m_next->m_data; } inline pointer operator->() const { return m_rep->m_next; } inline bool operator==(const iterator& x) const { return m_rep == x.m_rep; } inline bool operator!=(const iterator& x) const { return m_rep != x.m_rep; } }; class const_iterator : public std::iterator<std::forward_iterator_tag, const T> { const Node* m_rep; public: friend class List; friend class iterator; inline const_iterator(const Node* x=0):m_rep(x){} inline const_iterator(const const_iterator& x):m_rep(x.m_rep) {} inline const_iterator(const iterator& x):m_rep(x.m_rep){} inline const_iterator& operator=(const const_iterator& x) { m_rep=x.m_rep; return *this; } inline const_iterator& operator=(const iterator& x) { m_rep=x.m_rep; return *this; } inline const_iterator& operator++() { m_rep = m_rep->m_next; return *this; } inline const_iterator operator++(int) { const_iterator tmp(*this); m_rep = m_rep->m_next; return tmp; } inline reference operator*() const { return m_rep->m_next->m_data; } inline pointer operator->() const { return m_rep->m_next; } inline bool operator==(const const_iterator& x) const { return m_rep == x.m_rep; } inline bool operator!=(const const_iterator& x) const { return m_rep != x.m_rep; } }; List() : m_node(new Node(T())) { m_node->m_next = m_node; } List(const List& L) : m_node(new Node(T())) { m_node->m_next=m_node; for ( const_iterator i = L.begin(); i!= L.end(); ++i ) push_front(*i); reverse(); } void reverse() { if (empty()) return; Node* new_m_node = m_node->m_next->m_next; Node* p = m_node; Node* i = m_node->m_next; Node* n; do { n = i->m_next; i->m_next = p; p = i; i = n; } while (p!=m_node); m_node = new_m_node; } void swap(List& x) { Node* tmp = m_node; m_node = x.m_node; x.m_node = tmp; } List& operator=(const List& x) { List tmp(x); swap(tmp); return *this; } ~List() { clear(); delete m_node; } void clear() { while (!empty()) pop_front(); } inline void push_front(const T&x) { insert (begin(),x); } inline void push_back(const T& x) { insert (end(),x); } inline void pop_front() { erase(begin()); } inline bool empty() { return m_node == m_node->m_next; } inline T& front() { return *begin(); } inline const T& front() const { return *begin(); } inline T& back() { return m_node->data; } inline const T& back() const { return m_node->data; } inline iterator begin() { return iterator(m_node->m_next); } inline iterator end() { return iterator(m_node); } inline const_iterator begin() const { return const_iterator(m_node->m_next); } inline const_iterator end() const { return const_iterator(m_node); } void erase (iterator x) { if (x==end()) return; if (x.m_rep->m_next == m_node) m_node = x.m_rep; Node* tmp = x.m_rep->m_next; x.m_rep->m_next = x.m_rep->m_next->m_next; delete tmp; } void insert (iterator x, const T& y) { Node* tmp = new Node(y,x.m_rep->m_next); if (x.m_rep == m_node) m_node = tmp; x.m_rep->m_next = tmp; } // rotate x to beginning void rotate (iterator x) { if (x == end()) return; Node* sentinel = m_node->m_next; m_node->m_next = m_node->m_next->m_next; sentinel->m_next = x.m_rep->m_next; x.m_rep->m_next = sentinel; m_node = x.m_rep; } };
We present example code for
a circular list without a sentinel and a
driver program.
The main changes are that we need a lot of special-case code in methods
such as push_front(), push_back(),pop_front()
, and some
extra baggage for the iterators. The up()
method is provided
to offer a means of handing the entire list as a range to an STL function.
For example, the range it, it.up()
will include the entire
list, while it, it
would obviously be an empty list.
On the other hand, the rotate()
function is somewhat simpler
in this version. The driver
code for this list can transparently handle ranges that straddle the end.
The main penalty is not so much in the increased iterator sizes (we
don't need to construct iterators very often anyway), but in the
increased bookkeeping overhead. The relative slowness of comparing and
incrementing iterators is a liability in cases where a lot of simple operations
need to be performed on large lists. On the other hand, if iteration speed
is not a potential performance bottleneck, these lists could prove useful.
They have the versatility of the sentinel based linked list, and lessmemory overhead.
Here, we compare three list implementations -- the circular, and basic
lists implemented above, and the SGI implementation of the STL
list
class.
The SGI list is a doubly linked list. This list has a characteristic
that sets it apart from singly linked lists -- it supports bidirectional
iteration.
Support for bidirectional iteration is an STL requirement, so a linked
list that meets the requirements for the STL list must be
doubly linked. As we will see, there is a pattern emerging in the
comparison -- there is an escalating tradeoff between minimising overhead
and maximising flexibility. The STL class aims to offer as much flexibility
as possible.
Operation/Feature | Simple list | Our Circular list | SGI STL list |
---|---|---|---|
Circular | No | Yes | Yes |
Doubly linked | No | No | Yes |
Sentinel | No | Yes | Yes |
Empty list overhead | One pointer | One pointer and one node | One pointer and one node |
Overhead per node | One pointer and data | One pointer and data | Two pointers and data |
Constant time insert at front | Yes | Yes | Yes |
Constant time append at back | No | Yes | Yes |
Constant time remove at front | Yes | Yes | Yes |
Constant time remove at back | No | No | Yes |
Access first element | Constant time | Constant time | Constant time |
Access last element | Linear time | Constant time | Constant time |
insert() | insert_after only | Yes | Yes |
erase() | erase_after only | Yes | Yes |
forward iteration | Yes | Yes | Yes |
reverse iteration | No | No | Yes |
Constant time rotate | No | Yes | Yes (using splice()) |
While we've covered a lot of issues, there are more issues that the reader should be aware of. Think of this section as an invitation to further exploration.
is an important consideration in library design, especially where templates are involved. There are three possible guarantees a class can make for a given method:
Exception safety is one of the most essential topics for library designers, especially when template code is involved. Further reading is highly recommended.
is a classical problem with implementing template classes. The problem
is that templates generate a certain amount of object code per template instance. So for example, if you create a
list<int>
,
list<float>
,
list<string>
, and a
list<double>
, then the amount of object code generated
is approximately 4 times as much as what you'd have if you only created
one of the types of lists.
The solution to this problem is to pull as much code as possible out of
the templates. It is possible to do this for containers of pointers, by
using specialisations:
template <class T> class List<T*> : private List<void*> { public: typedef List<void*> super; inline void push_back(const T* & x) { super::push_back(x); } inline T* front() { return (T*) super::front(); } ... };
This way, the overhead per class amounts to a small type-safety wrapper,
but the bulk of the implementation need only be compiled once.
The approach I would recommend to this is to use an STL list as a basis,
and use private inheritance.
One of the nice things about this approach is that you can choose to
only re-implement the member functions that you need, to minimise
the overhead. In this example, I only implement push_front
,
pop_front
and non-const iterators (conventional, as
well as back/front inserters).
Most of the member functions are very simple to implement, one simply
forwards the call to the base class.
This section is not intended to be a tutorial so much as it is an
introduction to this template bloat issue (even so, it's as detailed
as anything you're going to find in a book).
However, in this instance, an example is in itself very instructive.
Here is some example code (download) that
illustrates the basic approach.
One can test the effectiveness of the example, by comparing the
size of object
files generated by creating several instances of PtrList
parametrised by different types, and do the same for the
STL list class. One can explicitly request instances with the
following syntax:
template class list<int>
The only compiler I know of that uses the
derive-from-void*
idiom presented here is Metrowerks Code
Warrior.
template <class T> class PtrList : private std::list<void*> { typedef std::list<void*> base_class; public: inline void push_back (T* x) { base_class::push_back(x); } inline void push_front (T* x) { base_class::push_front(x); } friend class back_insert_iterator<PtrList>; friend class front_insert_iterator<PtrList>; class iterator : private base_class::iterator { public: friend class PtrList; inline iterator& operator++() { base_class::iterator::operator++(); return *this; } inline iterator operator++(int x) { return base_class::iterator::operator++(x); } inline iterator& operator--() { base_class::iterator::operator--(); return *this; } inline iterator operator--(int x) { return base_class::iterator::operator--(x); } inline bool operator==(const iterator& right) const { return base_class::iterator::operator==(right); } inline T* operator*() { return (T*)base_class::iterator::operator*(); } private: inline iterator ( const base_class::iterator & x) : base_class::iterator(x) {} }; iterator begin() { return iterator(base_class::begin()); } iterator end() { return iterator(base_class::end()); } }; template<class T> class back_insert_iterator<PtrList<T> > : private back_insert_iterator<std::list<void*> > { typedef back_insert_iterator<std::list<void*> > back_inserter_base; public: back_insert_iterator(PtrList<T> & x) : back_inserter_base(x) {} back_insert_iterator& operator=(T* x) { back_inserter_base::operator=(x); return *this; } back_insert_iterator& operator*(){ return *this; } back_insert_iterator& operator++(){ return *this; } back_insert_iterator operator++(int){ return *this; } }; template<class T> class front_insert_iterator<PtrList<T> > : private front_insert_iterator<std::list<void*> > { typedef front_insert_iterator<std::list<void*> > front_inserter_base; public: front_insert_iterator(PtrList<T> & x) : front_inserter_base(x) {} front_insert_iterator& operator=(T* x) { front_inserter_base::operator=(x); return *this; } front_insert_iterator& operator*(){ return *this; } front_insert_iterator& operator++(){ return *this; } front_insert_iterator operator++(int){ return *this; } };
There are many textbooks that cover linked lists, and they reflect different approaches. Unfortunately, the vast majority of them do a fairly poor job at it. Of these titles, the only one that provides much insight to beginners on the topic of implementing data structures is the Weiss book. The Koenig/Moo book is also good, but it addresses a more advanced audience.
Of these books, I'd enthusiastically recommend Cormen et al as a pure theory book for Data Structures. I believe theory should be studied separately from implementation, it is complex and rich enough in its own right to deserve special attention. On the other hand, sticky implementation details are important, and this is where the books by Weiss prove valuable. The Koenig and Moo book also does a good job at addressing many library implementation and generic programming issues, but it is not a data structures book (the main theme is the related topic of library design)
std::cout