Javier Casas

A random walk through computer science

Understanding balanced binary trees

The balanced binary tree is one of these data structures that deserve some time for unpacking and understanding. It is often not properly explained, even though it's a very interesting construct. Especially the "balanced" part. Yes, if you have a properly balanced tree, finding elements is a guaranteed O(n * log n). But how do we end with a properly balanced tree?

All this comes from the paper "Functional Pearls Efficient sets — a balancing act":

Adams, S. (1993). Functional Pearls Efficient sets—a balancing act. Journal of Functional Programming, 3(4), 553-561. doi:10.1017/S0956796800000885

But first, the basics.

The binary tree

There are many styles of trees, but we will focus on a tree with these characteristics:

  • Binary: Each branch will have two sub-trees, unlike quadtrees, octrees, B-trees and others.
  • Store the elements in the branches, not the leaves: The leaves will be empty, and contain no elements. Instead, the elements go in the branches.
  • Ordering: we will store elements that can be properly ordered, so there will be a lt(a, b) (less than) predicate that represents some type of well order.

    • Every pair of elements can be compared with the predicate
    • If a < b, then it's not the case that b < a
    • If a < b and b < c, then a < c
    • Finally, if not a < b and not b < a, then a = b
    • It doesn't have to be a meaningful order, it just has to be a well order.
  • Ordered binary tree invariant: Given a branch with stored element a, all the elements in the left sub-branch will be smaller (I/E they will satisfy lt(X, a)), and all the elements in the right sub-branch will be greater (I/E they will satisfy lt(a, X)).

    • There is no place for repeated elements. Each element can appear up to 1 time.
  • Balanced binary tree invariant: Annotate each branch with the amount of elements the subtree contains: A tree is unbalanced when one of the sub-branches has significantly more elements than the other. We will figure out when it happens because each branch will also store an integer with the amount of elements contained in that branch.

    • We don't really want a perfectly balanced tree (with exactly O(n log2 n) access costs) because the balancing may be very expensive, but instead we want a tree that is not very unbalanced. So instead we will do so that the amount of elements in a branch can't be greater than K times the amount of elements in the other branch.

Finding elements

Finding an element in a binary tree is easy, as it's just traversing down starting from the root, on each branch taking the left or right sub-branches guided by the lt(a, b) comparison function. We reach a branch with the searched element, or an empty leaf, in at most steps as the height of the tree. And the height of the tree is some factor of the logarithm of the number of elements in the tree, because we have been balancing the tree so that no sub-branch has too many elements in comparison to the other sub-branch.

Inserting elements

If the element doesn't exist in the tree, we insert it by replacing a leaf with a sub-branch, and rebalancing as required. The cost is the same as finding an element (O(n log n)) plus required rebalancing on the elements we traverse tryint to find the element (O(n log n) again).

Rebalancing

We have talked about the mythical balanced tree, but we haven't explained how do we balance a tree. The literature talks about rotations, but that, in my opinion, is a bad name. A rotated tree becomes a sideways tree, not a balanced tree. Instead, let's talk about equivalent trees. First, two trees are equivalent if:

  • They contain the same elements.
  • But the elements are organized in different sub-branches. Otherwise, they would be equal
  • Given an unbalanced tree, we can transition it through multiple steps, on each step going to an equivalent tree, until we reach a balanced tree. We can also go backwards to get an unbalanced tree, the equivalence relation allows us. But we don't want that, unbalanced trees are slow.

So, what we do is simple:

  • We start with an empty tree. It is balanced by default because it has no elements yet.
  • When we insert or remove elements, we also balance the tree as we go, so as to never reach the situation where the tree is too unbalanced

With this in place, we can start understanding tree rotations.

Single rotations

Single rotations are designing to shift one element from a sub-branch to the other sub-branch.

Single rotation

I don't know if you are able to see a rotation there, but I definitely can't. But, we can explain what's happening via tree equivalences. Let's look at the left tree. First we mark the b branch , and write down the tree invariant:

Right-biased tree

All the elements to the left of a branch are smaller, all the elements to the right are greater. Therefore:

X < a < bbranch

And now we do the same for the b branch:

Y < b < Z

Finally, we substitute in the general tree:

X < a < Y < b < Z

Let's do the same for the right tree:

Right-biased tree

All the elements to the left of a branch are smaller, all the elements to the right are greater. Therefore:

abranch < b < Z

And now we do the same for the a branch:

X < a < Y

Finally, we substitute in the general tree:

X < a < Y < b < Z

We arrived to the same conclusion. These two trees are equivalent. They hold the same elements, but in different positions in the tree. So we can go from one to the other without losing or adding elements, and while maintaining the tree ordering invariant.

All this becomes more clear if we imagine it as if we are pulling up from an element of the less-than relation:

Dragging up elements of the tree

Double rotations

Double rotations reduce the depth of the tree.

Double rotation

If you think single rotations don't make sense because of not-sideway-trees, double rotations should look like crazy hallucinations. But it all starts to make sense if you view it as tree equivalences. The left tree is:

X < a < cbranch

The c branch is:

bbranch < c < Z

The b branch is:

Y1 < b < Y2

And substituting upwards:

X < a < Y1 < b < Y2 < C < Z

And, the right tree has exactly the same composition (what a surprise!).

In fact, if we go to the "pulling up" ideas, we find there are 6 equivalent trees, and we should be able to rotate between them at leisure:

6 Double rotation equivalences

Two of them are equivalent (the middle ones), but the others are a mix of single-rotations with extra sub-trees, and proper double rotations.

Implementation

Going back to the basics, we have a data structure with two possible values: leaf or branch.

data Tree a = Leaf
            | Branch
              { element :: a
              , height :: Int
              , left :: (Tree a)
              , right :: (Tree a)
              }

If your programming language has pattern matching, rotations become trivial. Just pattern match on the tree structure, and reconstruct with a smart constructor:

-- smart constructor, automatically calculates the resulting height
mkBranch :: Tree a -> a -> Tree a -> Tree a
mkBranch l a r = Branch a (height l + height r + 1) l r

-- Single rotation to the right
singleR :: Tree a -> Tree a
singleR (Branch a _ x (Branch b _ y z)) = mkBranch (mkBranch x a y) b z

-- Single rotation to the left
singleL :: Tree a -> Tree a
singleL (Branch a _ (Branch b _ x y) z) = mkBranch x b (mkBranch y a z)

-- Double rotation to the right
doubleR :: Tree a -> Tree a
doubleR (Branch a _ x (Branch c _ (Branch b _ y1 y2)) z) = mkBranch (mkBranch x a y1) b (mkBranch y2 z)

-- Double rotation to the left
doubleL :: Tree a -> Tree a
doubleL (Branch a _ (Branch c _ x (Branch b _ y1 y2)) z) = mkBranch (mkBranch x a y1) b (mkBranch y2 z)

Now draw the rest of the owl

Rotations is just a part of the data structure. There are many more parts. You can check a basic implementation in ML in the paper (with some empty bits left as homework). Adams, S. (1993). Functional Pearls Efficient sets—a balancing act. Journal of Functional Programming, 3(4), 553-561. doi:10.1017/S0956796800000885

You can also check my implementation in Prolog at https://github.com/javcasas/balanced-binary-tree.

Back to index
Copyright © 2018-2023 Javier Casas - All rights reserved.