Lesson: Introduction To Collections: What Is A Collections Framework?
Lesson: Introduction To Collections: What Is A Collections Framework?
Lesson: Introduction To Collections: What Is A Collections Framework?
If you've used the Java programming language — or just about any other programming language
— you're already familiar with collections. Collection implementations in earlier (pre-1.2)
versions of the Java platform included Vector, Hashtable, and array. However, those earlier
versions did not contain a collections framework.
Interfaces: These are abstract data types that represent collections. Interfaces allow
collections to be manipulated independently of the details of their representation. In
object-oriented languages, interfaces generally form a hierarchy.
Implementations: These are the concrete implementations of the collection interfaces. In
essence, they are reusable data structures.
Algorithms: These are the methods that perform useful computations, such as searching
and sorting, on objects that implement collection interfaces. The algorithms are said to be
polymorphic: that is, the same method can be used on many different implementations of
the appropriate collection interface. In essence, algorithms are reusable functionality.
Apart from the Java Collections Framework, the best-known examples of collections frameworks
are the C++ Standard Template Library (STL) and Smalltalk's collection hierarchy. Historically,
collections frameworks have been quite complex, which gave them a reputation for having a
steep learning curve. We believe that the Java Collections Framework breaks with this tradition,
as you will learn for yourself in this chapter.
Reduces programming effort: By providing useful data structures and algorithms, the
Collections Framework frees you to concentrate on the important parts of your program
rather than on the low-level "plumbing" required to make it work. By facilitating
interoperability among unrelated APIs, the Java Collections Framework frees you from
writing adapter objects or conversion code to connect APIs.
Increases program speed and quality: This Collections Framework provides high-
performance, high-quality implementations of useful data structures and algorithms. The
various implementations of each interface are interchangeable, so programs can be easily
tuned by switching collection implementations. Because you're freed from the drudgery
of writing your own data structures, you'll have more time to devote to improving
programs' quality and performance.
Allows interoperability among unrelated APIs: The collection interfaces are the
vernacular by which APIs pass collections back and forth. If my network administration
API furnishes a collection of node names and if your GUI toolkit expects a collection of
column headings, our APIs will interoperate seamlessly, even though they were written
independently.
Reduces effort to learn and to use new APIs: Many APIs naturally take collections on
input and furnish them as output. In the past, each such API had a small sub-API devoted
to manipulating its collections. There was little consistency among these ad hoc
collections sub-APIs, so you had to learn each one from scratch, and it was easy to make
mistakes when using them. With the advent of standard collection interfaces, the problem
went away.
Reduces effort to design new APIs: This is the flip side of the previous advantage.
Designers and implementers don't have to reinvent the wheel each time they create an
API that relies on collections; instead, they can use standard collection interfaces.
Fosters software reuse: New data structures that conform to the standard collection
interfaces are by nature reusable. The same goes for new algorithms that operate on
objects that implement these interfaces.
Interfaces
Object Ordering
Summary of Interfaces
A Set is a special kind of Collection, a SortedSet is a special kind of Set, and so forth. Note also
that the hierarchy consists of two distinct trees — a Map is not a true Collection.
Note that all the core collection interfaces are generic. For example, this is the declaration of the
Collection interface.
The <E> syntax tells you that the interface is generic. When you declare a Collection instance
you can and should specify the type of object contained in the collection. Specifying the type
allows the compiler to verify (at compile-time) that the type of object you put into the collection
is correct, thus reducing errors at runtime. For information on generic types, see the Generics
lesson.
When you understand how to use these interfaces, you will know most of what there is to know
about the Java Collections Framework. This chapter discusses general guidelines for effective
use of the interfaces, including when to use which interface. You'll also learn programming
idioms for each interface to help you get the most out of it.
To keep the number of core collection interfaces manageable, the Java platform doesn't provide
separate interfaces for each variant of each collection type. (Such variants might include
immutable, fixed-size, and append-only.) Instead, the modification operations in each interface
are designated optional — a given implementation may elect not to support all operations. If an
unsupported operation is invoked, a collection throws an UnsupportedOperationException.
Implementations are responsible for documenting which of the optional operations they support.
All of the Java platform's general-purpose implementations support all of the optional operations.
Queues typically, but do not necessarily, order elements in a FIFO (first-in, first-out)
manner. Among the exceptions are priority queues, which order elements according to a
supplied comparator or the elements' natural ordering. Whatever the ordering used, the
head of the queue is the element that would be removed by a call to remove or poll. In a
FIFO queue, all new elements are inserted at the tail of the queue. Other kinds of queues
may use different placement rules. Every Queue implementation must specify its ordering
properties. Also see The Queue Interface section.
Map — an object that maps keys to values. A Map cannot contain duplicate keys; each key can
map to at most one value. If you've used Hashtable, you're already familiar with the basics of
Map. Also see The Map Interface section.
The last two core collection interfaces are merely sorted versions of Set and Map:
SortedSet — a Set that maintains its elements in ascending order. Several additional
operations are provided to take advantage of the ordering. Sorted sets are used for naturally
ordered sets, such as word lists and membership rolls. Also see The SortedSet Interface section.
SortedMap — a Map that maintains its mappings in ascending key order. This is the Map analog
of SortedSet. Sorted maps are used for naturally ordered collections of key/value pairs, such
as dictionaries and telephone directories. Also see The SortedMap Interface section.
To understand how the sorted interfaces maintain the order of their elements, see the Object Ordering
section.
Implementations
Set Implementations
List Implementations
Map Implementations
Queue Implementations
Wrapper Implementations
Convenience Implementations
Summary of Implementations
Trail: Collections
« Previous • Trail • Next »
Lesson: Implementations
Implementations are the data objects used to store collections, which implement the interfaces
described in the Interfaces section.. This lesson describes the following kinds of implementations:
General-purpose implementations are the most commonly used implementations, designed for
everyday use. They are summarized in the table below.
Special-purpose implementations are designed for use in special situations and display
nonstandard performance characteristics, usage restrictions, or behavior.
Concurrent implementations are designed to support high concurrency, typically at the expense
of single-threaded performance. These implementations are part of the
java.util.concurrent package.
Wrapper implementations are used in combination with other types of implementations, often
the general-purpose ones, to provide added or restricted functionality.
Convenience implementations are mini-implementations, typically made available via static
factory methods, that provide convenient, efficient alternatives to general-purpose
implementations for special collections (for example, singleton sets).
Abstract implementations are skeletal implementations that facilitate the construction of
custom implementations — described later in the Custom Collection Implementations section.
An advanced topic, it's not particularly difficult, but relatively few people will need to do it.
Hash table Resizable array Tree Linked list Hash table + Linked list
Queue
As you can see from the table, the Java Collections Framework provides several general-purpose
implementations of the Set, List , and Map interfaces. In each case, one implementation —
HashSet, ArrayList, and HashMap — is clearly the one to use for most applications, all other
things being equal. Note that the SortedSet and the SortedMap interfaces do not have rows in
the table. Each of those interfaces has one implementation ( TreeSet and TreeMap) and is listed
in the Set and the Map rows. There are two general-purpose Queue implementations —
LinkedList, which is also a List implementation, and PriorityQueue, which is omitted from
the table. These two implementations provide very different semantics: LinkedList provides
FIFO semantics, while PriorityQueue orders its elements according to their values.
Each of the general-purpose implementations provides all optional operations contained in its
interface. All permit null elements, keys, and values. None are synchronized (thread-safe). All
have fail-fast iterators, which detect illegal concurrent modification during iteration and fail
quickly and cleanly rather than risking arbitrary, nondeterministic behavior at an undetermined
time in the future. All are Serializable and all support a public clone method.
The fact that these implementations are unsynchronized represents a break with the past: The
legacy collections Vector and Hashtable are synchronized. The present approach was taken
because collections are frequently used when the synchronization is of no benefit. Such uses
include single-threaded use, read-only use, and use as part of a larger data object that does its
own synchronization. In general, it is good API design practice not to make users pay for a
feature they don't use. Furthermore, unnecessary synchronization can result in deadlock under
certain circumstances.
If you need thread-safe collections, the synchronization wrappers, described in the Wrapper
Implementations section, allow any collection to be transformed into a synchronized collection.
Thus, synchronization is optional for general-purpose implementations, whereas it is mandatory
for legacy implementations. Moreover, the java.util.concurrent package provides
concurrent implementations of the BlockingQueue interface, which extends Queue, and of the
ConcurrentMap interface, which extends Map. These implementations offer much higher
concurrency than mere synchronized implementations.
As a rule, you should be thinking about the interfaces, not the implementations. That is why
there are no programming examples in this section. For the most part, the choice of
implementation affects only performance. The preferred style, as mentioned in the Interfaces
section, is to choose an implementation when a Collection is created and to immediately assign
the new collection to a variable of the corresponding interface type (or to pass the collection to a
method expecting an argument of the interface type). In this way, the program does not become
dependent on any added methods in a given implementation, leaving the programmer free to
change implementations anytime that it is warranted by performance concerns or behavioral
details.
The sections that follow briefly discuss the implementations. The performance of the
implementations is described using words such as constant-time, log, linear, n log(n), and
quadratic to refer to the asymptotic upper-bound on the time complexity of performing the
operation. All this is quite a mouthful, and it doesn't matter much if you don't know what it
means. If you're interested in knowing more, refer to any good algorithms textbook. One thing to
keep in mind is that this sort of performance metric has its limitations. Sometimes, the nominally
slower implementation may be faster. When in doubt, measure the performance!
Lesson: Algorithms
The polymorphic algorithms described here are pieces of reusable functionality provided by the Java
platform. All of them come from the Collections class, and all take the form of static methods whose
first argument is the collection on which the operation is to be performed. The great majority of the
algorithms provided by the Java platform operate on List instances, but a few of them operate on
arbitrary Collection instances. This section briefly describes the following algorithms:
Sorting
Shuffling
Routine Data Manipulation
Searching
Composition
Finding Extreme Values
Sorting
The sort algorithm reorders a List so that its elements are in ascending order according to an
ordering relationship. Two forms of the operation are provided. The simple form takes a List and sorts
it according to its elements' natural ordering. If you're unfamiliar with the concept of natural ordering,
read the Object Ordering section.
The sort operation uses a slightly optimized merge sort algorithm that is fast and stable:
Fast: It is guaranteed to run in n log(n) time and runs substantially faster on nearly sorted
lists. Empirical tests showed it to be as fast as a highly optimized quicksort. A quicksort is
generally considered to be faster than a merge sort but isn't stable and doesn't guarantee n
log(n) performance.
Stable: It doesn't reorder equal elements. This is important if you sort the same list repeatedly
on different attributes. If a user of a mail program sorts the inbox by mailing date and then sorts
it by sender, the user naturally expects that the now-contiguous list of messages from a given
sender will (still) be sorted by mailing date. This is guaranteed only if the second sort was stable.
The following trivial program prints out its arguments in lexicographic (alphabetical) order.
import java.util.*;
The second form of sort takes a Comparator in addition to a List and sorts the elements with
the Comparator. Suppose you want to print out the anagram groups from our earlier example in
reverse order of size — largest anagram group first. The example that follows shows you how to
achieve this with the help of the second form of the sort method.
Recall that the anagram groups are stored as values in a Map, in the form of List instances. The
revised printing code iterates through the Map's values view, putting every List that passes the
minimum-size test into a List of Lists. Then the code sorts this List, using a Comparator that
expects List instances, and implements reverse size-ordering. Finally, the code iterates through
the sorted List, printing its elements (the anagram groups). The following code replaces the
printing code at the end of the main method in the Anagrams example.
Shuffling
The shuffle algorithm does the opposite of what sort does, destroying any trace of order that may
have been present in a List. That is, this algorithm reorders the List based on input from a source of
randomness such that all possible permutations occur with equal likelihood, assuming a fair source of
randomness. This algorithm is useful in implementing games of chance. For example, it could be used to
shuffle a List of Card objects representing a deck. Also, it's useful for generating test cases.
This operation has two forms: one takes a List and uses a default source of randomness, and the
other requires the caller to provide a Random object to use as a source of randomness. The code
for this algorithm is used as an example in the List section.
Searching
The binarySearch algorithm searches for a specified element in a sorted List. This algorithm has two
forms. The first takes a List and an element to search for (the "search key"). This form assumes that
the List is sorted in ascending order according to the natural ordering of its elements. The second form
takes a Comparator in addition to the List and the search key, and assumes that the List is sorted
into ascending order according to the specified Comparator. The sort algorithm can be used to sort
the List prior to calling binarySearch.
The return value is the same for both forms. If the List contains the search key, its index is
returned. If not, the return value is (-(insertion point) - 1), where the insertion point is the
point at which the value would be inserted into the List, or the index of the first element greater
than the value or list.size() if all elements in the List are less than the specified value. This
admittedly ugly formula guarantees that the return value will be >= 0 if and only if the search
key is found. It's basically a hack to combine a boolean (found) and an integer (index) into a
single int return value.
The following idiom, usable with both forms of the binarySearch operation, looks for the
specified search key and inserts it at the appropriate position if it's not already present.
Composition
The frequency and disjoint algorithms test some aspect of the composition of one or more
Collections:
frequency — counts the number of times the specified element occurs in the specified
collection
disjoint — determines whether two Collections are disjoint; that is, whether they contain
no elements in common
The min and the max algorithms return, respectively, the minimum and maximum element contained in
a specified Collection. Both of these operations come in two forms. The simple form takes only a
Collection and returns the minimum (or maximum) element according to the elements' natural
ordering. The second form takes a Comparator in addition to the Collection and returns the
minimum (or maximum) element according to the specified Comparator.
The following list illustrates the sort of custom Collections you might want to implement. It is not
intended to be exhaustive:
Persistent: All of the built-in Collection implementations reside in main memory and vanish
when the program exits. If you want a collection that will still be present the next time the
program starts, you can implement it by building a veneer over an external database. Such a
collection might be concurrently accessible by multiple programs.
Application-specific: This is a very broad category. One example is an unmodifiable Map
containing real-time telemetry data. The keys could represent locations, and the values could be
read from sensors at these locations in response to the get operation.
High-performance, special-purpose: Many data structures take advantage of restricted usage to
offer better performance than is possible with general-purpose implementations. For instance,
consider a List containing long runs of identical element values. Such lists, which occur
frequently in text processing, can be run-length encoded — runs can be represented as a single
object containing the repeated element and the number of consecutive repetitions. This
example is interesting because it trades off two aspects of performance: It requires less space
but more time than an ArrayList.
High-performance, general-purpose: The Java Collections Framework's designers tried to
provide the best general-purpose implementations for each interface, but many, many data
structures could have been used, and new ones are invented every day. Maybe you can come
up with something faster!
Enhanced functionality: Suppose you need an efficient bag implementation (also known as a
multiset): a Collection that offers constant-time containment checks while allowing duplicate
elements. It's reasonably straightforward to implement such a collection atop a HashMap.
Convenience: You may want additional implementations that offer conveniences beyond those
offered by the Java platform. For instance, you may frequently need List instances
representing a contiguous range of Integers.
Adapter: Suppose you are using a legacy API that has its own ad hoc collections' API. You can
write an adapter implementation that permits these collections to operate in the Java
Collections Framework. An adapter implementation is a thin veneer that wraps objects of one
type and makes them behave like objects of another type by translating operations on the latter
type into operations on the former.
Writing a custom implementation is surprisingly easy. The Java Collections Framework provides abstract
implementations designed expressly to facilitate custom implementations. We'll start with the following
example of an implementation of Arrays.asList.
MyArrayList(T[] array) {
a = array;
}
Suppose you want to make the implementation a bit faster. The API documentation for abstract
implementations describes precisely how each method is implemented, so you'll know which
methods to override to get the performance you want. The preceding implementation's
performance is fine, but it can be improved a bit. In particular, the toArray method iterates over
the List, copying one element at a time. Given the internal representation, it's a lot faster and
more sensible just to clone the array.
1. Choose the appropriate abstract implementation class from the preceding list.
2. Provide implementations for all the class's abstract methods. If your custom collection is to be
modifiable, you'll have to override one or more of the concrete methods as well. The API
documentation for the abstract implementation class will tell you which methods to override.
3. Test and, if necessary, debug the implementation. You now have a working custom collection
implementation.
4. If you're concerned about performance, read the abstract implementation class's API
documentation for all the methods whose implementations you're inheriting. If any seem too
slow, override them. If you override any methods, be sure to measure the performance of the
method before and after the override. How much effort you put into tweaking performance
should be a function of how much use the implementation will get and how critical to
performance its use is. (Often this step is best omitted.)
Compatibility
The Java Collections Framework was designed to ensure complete interoperability between the core
collection interfaces and the types that were used to represent collections in the early versions of the
Java platform: Vector, Hashtable, array, and Enumeration. In this section, you'll learn how to
transform old collections to the Java Collections Framework collections and vice versa.
Upward Compatibility
Suppose that you're using an API that returns legacy collections in tandem with another API that
requires objects implementing the collection interfaces. To make the two APIs interoperate smoothly,
you'll have to transform the legacy collections into modern collections. Luckily, the Java Collections
Framework makes this easy.
Suppose the old API returns an array of objects and the new API requires a Collection. The
Collections Framework has a convenience implementation that allows an array of objects to be
viewed as a List. You use Arrays.asList to pass an array to any method requiring a
Collection or a List.
Enumeration e = oldMethod(arg);
newMethod(Collections.list(e));
Backward Compatibility
Suppose you're using an API that returns modern collections in tandem with another API that requires
you to pass in legacy collections. To make the two APIs interoperate smoothly, you have to transform
modern collections into old collections. Again, the Java Collections Framework makes this easy.
Suppose the new API returns a Collection, and the old API requires an array of Object. As
you're probably aware, the Collection interface contains a toArray method designed expressly
for this situation.
Collection c = newMethod();
oldMethod(c.toArray());
What if the old API requires an array of String (or another type) instead of an array of Object? You
just use the other form of toArray — the one that takes an array on input.
Collection c = newMethod();
oldMethod((String[]) c.toArray(new String[0]));
If the old API requires a Vector, the standard collection constructor comes in handy.
Collection c = newMethod();
oldMethod(new Vector(c));
The case where the old API requires a Hashtable is handled analogously.
Map m = newMethod();
oldMethod(new Hashtable(m));
Finally, what do you do if the old API requires an Enumeration? This case isn't common, but it does
happen from time to time, and the Collections.enumeration method was provided to handle it.
This is a static factory method that takes a Collection and returns an Enumeration over the
elements of the Collection.
Collection c = newMethod();
oldMethod(Collections.enumeration(c));
API Design
In this short but important section, you'll learn a few simple guidelines that will allow your API to
interoperate seamlessly with all other APIs that follow these guidelines. In essence, these rules define
what it takes to be a good "citizen" in the world of collections.
Parameters
If your API contains a method that requires a collection on input, it is of paramount importance that you
declare the relevant parameter type to be one of the collection interface types. Never use an
implementation type because this defeats the purpose of an interface-based Collections Framework,
which is to allow collections to be manipulated without regard to implementation details.
Further, you should always use the least-specific type that makes sense. For example, don't
require a List or a Set if a Collection would do. It's not that you should never require a List
or a Set on input; it is correct to do so if a method depends on a property of one of these
interfaces. For example, many of the algorithms provided by the Java platform require a List on
input because they depend on the fact that lists are ordered. As a general rule, however, the best
types to use on input are the most general: Collection and Map.
Caution: Never define your own ad hoc collection class and require objects of this class on input. By
doing this, you'd lose all the benefits provided by the Java Collections Framework.
Return Values
You can afford to be much more flexible with return values than with input parameters. It's fine to
return an object of any type that implements or extends one of the collection interfaces. This can be one
of the interfaces or a special-purpose type that extends or implements one of these interfaces.
For example, one could imagine an image-processing package, called ImageList, that returned
objects of a new class that implements List. In addition to the List operations, ImageList
could support any application-specific operations that seemed desirable. For example, it might
provide an indexImage operation that returned an image containing thumbnail images of each
graphic in the ImageList. It's critical to note that even if the API furnishes ImageList instances
on output, it should accept arbitrary Collection (or perhaps List) instances on input.
In one sense, return values should have the opposite behavior of input parameters: It's best to
return the most specific applicable collection interface rather than the most general. For example,
if you're sure that you'll always return a SortedMap, you should give the relevant method the
return type of SortedMap rather than Map. SortedMap instances are more time-consuming to
build than ordinary Map instances and are also more powerful. Given that your module has
already invested the time to build a SortedMap, it makes good sense to give the user access to its
increased power. Furthermore, the user will be able to pass the returned object to methods that
demand a SortedMap, as well as those that accept any Map.
Legacy APIs
There are currently plenty of APIs out there that define their own ad hoc collection types. While this is
unfortunate, it's a fact of life, given that there was no Collections Framework in the first two major
releases of the Java platform. Suppose you own one of these APIs; here's what you can do about it.
If possible, retrofit your legacy collection type to implement one of the standard collection
interfaces. Then all the collections you return will interoperate smoothly with other collection-
based APIs. If this is impossible (for example, because one or more of the preexisting type
signatures conflict with the standard collection interfaces), define an adapter class that wraps
one of your legacy collections objects, allowing it to function as a standard collection. (The
Adapter class is an example of a custom implementation.)
Retrofit your API with new calls that follow the input guidelines to accept objects of a standard
collection interface, if possible. Such calls can coexist with the calls that take the legacy
collection type. If this is impossible, provide a constructor or static factory for your legacy type
that takes an object of one of the standard interfaces and returns a legacy collection containing
the same elements (or mappings). Either of these approaches will allow users to pass arbitrary
collections into your API.