Overview

Sets are a collection of unique objects. In mathematics sets are do not change in size (static sets). However in computer science sets can change in size (dynamic sets).

Dynamic Sets

Dynamic sets support operations to add and remove items, among other common set operations such as unions, intersections, difference, checking for subsets, checking for disjoints.

Disjoint Sets

Disjoint sets (aka. union-find) are a specific type of dynamic set. Two sets are disjoint if they do not have any elements in common. Each set in a disjoint set is identified by a representative. This representative can, usually, be any member of the set.

e.g: Two disjoint sets

S1={2,7,9}S2={5,11,4,10}

Linked List Representation

The linked list representation uses nodes that have three attributes, a head pointer, a next pointer, and a value. Each value points to the head (representative) node, the next node, and the value of the current node, respectively. The last node in the linked list will be a null pointer.

The set itself contains an attribute that points to the head and tail elements. Again these are represented using pointers.

e.g. in C

struct Node {
  int value;
  Node *head; // Point to the representative
  Node *next; // Point to the next node (null if last node)
};

struct Set {
  Node *head; // Point to the representative
  Node *tail; // Point to the last node in the list
};

struct DisjointSet {
  Set *sets[]; // Collection of sets
};

Tree Representation

Here each node has one parent pointer which points to the parent node. If the node is the root, it will point to itself. Thus the root of the tree is the representative of the set.

e.g. in C

struct Node {
  int value;
  Node *parent; // Point to the representative
};

struct DisjointSet {
  Node *sets[]; // Collection of sets
};

Disjoint Set Operations

The following operations are commonly preformed to manipulate disjoint sets:

  1. MakeSet(x)
  2. Union(x, y)
  3. FindSet(x)

MakeSet

Linked List Running Time: Θ(1)
Tree Running Time: Θ(1)

MakeSet accepts an element (x) and creates a new set that contains only that element (x), as long as (x) does not exist in another set. This makes (x) the representative of the set.

In a linked list a new list is created with just one element.

FindSet

Linked List Running Time: Θ(1)
Tree Running Time: Θ(d) (d is the depth of the tree)

FindSet returns the representative of the set containing the element (x).

In a linked list it follows the pointer to the head node, Node headNode = *node.head.

In a tree we just follow the parent pointer until the parent is equal to itself. We can also augment find-set in a tree with path-compression. This updates the parent pointer such that we don't have to traverse the entire tree every time we search for the representative.

Union

Linked List Running Time: Θ(|S2|)

Union joins two sets (x) and (y). There are two types of unions, an simple and weighted union.

There are two types of unions for the linked list representation. The simple union appends the second element's (y) list to the first element's (x) list. A weighted union appends the shorter list to the longer list, which shortens the runtime as less pointers have to be updated.

For the tree representation there are also two types of unions. The simple union always points the root of the second tree (y) to the root of the first tree (x). A union by rank on the other hand makes the root of the tree with the smaller rank point to the root of the tree with large rank. The rank is defined as the upper bound of the height of a tree.

Reference

  1. Sun, Hongyang Disjoint set The University of Kansas 2024

Related