Start of Tutorial > Start of Trail > Start of Lesson |
Search
Feedback Form |
ASet
is aCollection
that cannot contain duplicate elements.Set
models the mathematical set abstraction. TheSet
interface extendsCollection
and contains no methods other than those inherited fromCollection
. It adds the restriction that duplicate elements are prohibited.Set
also adds a stronger contract on the behavior of theequals
andhashCode
operations, allowingSet
objects with different implementation types to be compared meaningfully. TwoSet
objects are equal if they contain the same elements.The
Set
interface is shown below:The JDK contains two general-purposepublic interface Set { // Basic Operations int size(); boolean isEmpty(); boolean contains(Object element); boolean add(Object element); // Optional boolean remove(Object element); // Optional Iterator iterator(); // Bulk Operations boolean containsAll(Collection c); boolean addAll(Collection c); // Optional boolean removeAll(Collection c); // Optional boolean retainAll(Collection c); // Optional void clear(); // Optional // Array Operations Object[] toArray(); Object[] toArray(Object a[]); }Set
implementations.HashSet
, which stores its elements in a hash table, is the best-performing implementation.TreeSet
, which stores its elements in a red-black tree, guarantees the order of iteration. For more information on implementations, see the Implementations lesson.Here's a simple but useful
Set
idiom. Suppose you have aCollection
,c
, and you want to create anotherCollection
containing the same elements, but with all duplicates eliminated. The following one-liner does the trick:It works by creating aCollection noDups = new HashSet(c);Set
(which, by definition, cannot contain duplicates) initially containing all the elements inc
. It uses the "standardCollection
constructor" described in theCollection
interface lesson.
Thesize
operation returns the number of elements in theSet
(its cardinality). TheisEmpty
method does exactly what you think it does. Theadd
method adds the specified element to theSet
if it's not already present, and returns a boolean indicating whether the element was added. Similarly, theremove
method removes the specified element from theSet
if it's present, and returns a boolean indicating whether the element was present. Theiterator
method returns anIterator
over theSet
.Here's
a little program
that takes the words in its argument list and prints out any duplicate words, the number of distinct words, and a list of the words with duplicates eliminated:Now let's run the program:import java.util.*; public class FindDups { public static void main(String args[]) { Set s = new HashSet(); for (int i=0; i<args.length; i++) if (!s.add(args[i])) System.out.println("Duplicate detected: "+args[i]); System.out.println(s.size()+" distinct words detected: "+s); } }Note that the example code always refers to the collection by its interface type (% java FindDups i came i saw i left Duplicate detected: i Duplicate detected: i 4 distinct words detected: [came, left, saw, i]Set
), rather than by its implementation type (HashSet
). This is a strongly recommended programming practice, as it gives you the flexibility to change implementations merely by changing the constructor. If the variables used to store a collection, or the parameters used to pass it around, are declared to be of the collection's implementation type rather than its interface type, then all such variables and parameters must be changed to change the collection's implementation type. Furthermore, there's no guarantee that the resulting program will work; if the program uses any non-standard operations that are present in the original implementation type but not the new one, the program will fail. Referring to collections only by their interface keeps you honest, in the sense that it prevents you from using any non-standard operations.The implementation type of the
Set
in the example above isHashSet
, which makes no guarantees as to the order of the elements in theSet
. If you want the program to print the word list in alphabetical order, all you have to do is to change the set's implementation type fromHashSet
toTreeSet
. Making this trivial one-line change causes the command line in the previous example to generate the following output:% java FindDups i came i saw i left Duplicate word detected: i Duplicate word detected: i 4 distinct words detected: [came, i, left, saw]
The bulk operations are particularly well suited toSets
: they perform standard set-algebraic operations. Supposes1
ands2
areSets
. Here's what the bulk operations do:To calculate the union, intersection, or set difference of two sets non-destructively (without modifying either set), the caller must copy one set before calling the appropriate bulk operation. The resulting idioms are shown below:
s1.containsAll(s2)
: Returnstrue
ifs2
is a subset ofs1
. (For example, sets1
is a subset ofs2
if sets2
contains all the elements ins1
.)s1.addAll(s2)
: Transformss1
into the union ofs1
ands2
. (The union of two sets is the set containing all the elements contained in either set.)s1.retainAll(s2)
: Transformss1
into the intersection ofs1
ands2
. (The intersection of two sets is the set containing only the elements that are common in both sets.)s1.removeAll(s2)
: Transformss1
into the (asymmetric) set difference ofs1
ands2
. (For example, the set difference ofs1
-s2
is the set containing all the elements found ins1
but not ins2
.)The implementation type of the resultSet union = new HashSet(s1); union.addAll(s2); Set intersection = new HashSet(s1); intersection.retainAll(s2); Set difference = new HashSet(s1); difference.removeAll(s2);Set
in the above idioms isHashSet
, which is, as mentioned above, the best all-aroundSet
implementation in the JDK. However, any general-purposeSet
implementation could be substituted.Let's revisit the
FindDups
example program above. Suppose you want to know which words in the argument list occur only once and which occur more than once, but you do not want any duplicates printed out repeatedly. This effect can be achieved by generating two sets, one containing every word in the argument list, and the other containing only the duplicates. The words that occur only once are the set difference of these two sets, which we know how to compute. Here's howthe resulting program
looks:Now let's run the revised program with the same same argument list we used before:import java.util.*; public class FindDups2 { public static void main(String args[]) { Set uniques = new HashSet(); Set dups = new HashSet(); for (int i=0; i<args.length; i++) if (!uniques.add(args[i])) dups.add(args[i]); uniques.removeAll(dups); // Destructive set-difference System.out.println("Unique words: " + uniques); System.out.println("Duplicate words: " + dups); } }A less common set-algebraic operation is the symmetric set difference: the set of elements contained in either of two specified sets, but not contained in both of them. The following code calculates the symmetric set difference of two sets non-destructively:% java FindDups2 i came i saw i left Unique words: [came, left, saw] Duplicate words: [i]Set symmetricDiff = new HashSet(s1); symmetricDiff.addAll(s2); Set tmp = new HashSet(s1); tmp.retainAll(s2); symmetricDiff.removeAll(tmp);
The array operations don't do anything special forSets
beyond what they do for any otherCollection
. They are described in theCollection
interface lesson.
Start of Tutorial > Start of Trail > Start of Lesson |
Search
Feedback Form |