Trees are strings

Often, a great way to solve a problem is to transform it into something else. Natural language processing can be seen as a category theory problem. You can turn graph problems into linear algebra problems. You can look at chemistry through a group theory lens and at electrical circuits through a graph theory lens. Et cetera.

This post is about an elegant problem space transformation in Leetcode-style problems: tree comparison problems can be seen as string comparison problems.

Consider the problem of determining if two trees are the same (i.e., are structurally identical and their nodes have the same value), given their root nodes.

Here is the standard, depth first search way to do it in Python:

class Solution:
    def isSameTree(self, p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
        if p is None and q is None:
            return True

        if p is None or q is None:
            return False

        return (
            p.val == q.val and
            self.isSameTree(p.left, q.left) and
            self.isSameTree(p.right, q.right)
        )

This is a recursive solution. Here’s what’s going on:

If both root nodes are None, return True. Two trees that are both None are trivially identical.
If both root nodes are not simultaneously None but one of them is None (the condition p is None or q is None, which we evaluate only if p is None and q is None evaluates to False), return False. This is because if one of the trees is None but the other tree has actual stuff in it, the two trees are obviously not identical.
Okay, say neither of the root nodes is None. Then, return True if all of the following hold, else return False:
- the root nodes have the same value (p.val == q.val),
- the root nodes have identical left subtrees (self.isSameTree(p.left, q.left)), and
- the root nodes have identical right subtrees (self.isSameTree(p.right, q.right)).

However, it’s possible to transform the problem space, moving from the realm of tree comparison to that of string comparison. This will let us write a solution that is easier to reason about and therefore more maintainable. Dare I say, it’s also more elegant:

class Solution:
    def isSameTree(self, p: Optional[TreeNode], q: Optional[TreeNode]) -> bool:
        def serialize(root):
            if root is None:
                return "#"
            return f"{root.val} {serialize(root.left)} {serialize(root.right)}"
        return serialize(p) == serialize(q)

Here’s what’s going on:

The key idea is to convert a tree into a string representation that’s unique to it. Non-identical trees cannot have the same string representation. (In other words, we want a one-to-one mapping between trees and their string representations.) Then, instead of comparing trees, we compare their string representations.

The serialize() function needs to recursively convert each tree into a string representing its pre-order or post-order traversal. (For this example, we do pre-order.) We use whitespace to delimit nodes, and # to represent null nodes. So, the tree

   1
  /
 2

becomes 1 2 # # #, while the tree

becomes 1 2 4 # # 5 # # 3 # #.

Note that it has to be pre-order or post-order; in-order leaves the structure of the tree ambiguous. To see why, consider the in-order string # 2 # 1 #. It could represent both

   1
  /
 2

and

 2
  \
   1

The intuition is that in-order traversal flattens the tree, removing any information about the hierarchy of the nodes.

Where the serialization approach really shines is in the problem where you have to determine that a tree is a subtree of another.

Here’s the straightforward node-by-node comparison solution:

class Solution:
    def isSubtree(self, root: Optional[TreeNode], subRoot: Optional[TreeNode]) -> bool:
        if subRoot is None:
            return True
        if root is None:
            return False
        return (
            self.sameTree(root, subRoot) or
            self.isSubtree(root.left, subRoot) or
            self.isSubtree(root.right, subRoot)
        )

    def sameTree(self, root: Optional[TreeNode], subRoot: Optional[TreeNode]) -> bool:
        if root is None and subRoot is None:
            return True

        if root is None or subRoot is None:
            return False

        return (
            root.val == subRoot.val and
            self.sameTree(root.left, subRoot.left) and
            self.sameTree(root.right, subRoot.right)
        )

This is a long piece of code! Here’s what’s going on:

If the candidate subtree is None, immediately return True, because any tree (including a null one) contains a null subtree.
If the candidate subtree is not None but the candidate parent tree is, return False, because a null tree cannot contain a non-null subtree.
If both candidate subtree and parent tree are not None, return True if any of these conditions apply, else return False:
- the subtree is identical to the parent tree (evaluated using a sameTree helper function, inherited from the previous problem),
- the candidate subtree is a subtree of the left subtree of the candidate parent tree, or
- the candidate subtree is a subtree of the right subtree of the candidate parent tree.

There is an inelegant wastefulness in this. For each node of the parent tree, we must make as many comparisons as there are nodes in the subtree (because we call sameTree at every node of the parent tree).

On the other hand, behold the serialization solution:

class Solution:
    def isSubtree(self, root: Optional[TreeNode], subRoot: Optional[TreeNode]) -> bool:
        def serialize(root):
            if root is None:
                return "#"
            return f"{root.val} {serialize(root.left)} {serialize(root.right)}"
        return serialize(subRoot) in serialize(root)

Notice how short it is. It’s as many lines of code as the solution to the “same tree” problem from earlier!

I will note, though, that serialization is more elegant but is also asymptotically slower. The node-by-node comparison solution to the subtree problem is of time complexity $\mathcal{O}(m \cdot n)$, where $m$ and $n$ are the number of nodes in the parent tree and subtree respectively. In principle, the serialization solution is $\mathcal{O}(m + n)$ (faster than $\mathcal{O}(m \cdot n)$!): serializing the parent tree and subtree is $\mathcal{O}(m)$ and $\mathcal{O}(n)$ respectively (since each node is visited exactly once), which is dominated by the string search serialize(subRoot) in serialize(root) that’s $\mathcal{O}(m + n)$ (see the KMP algorithm). However, notice that we build a new string each time we call serialize(). Building a string is linear in the number of characters in the string. We build a new string for each node of the parent tree, giving us a time complexity of $\mathcal{O}(n^2)$. But, hey, what’s a factor of $\frac{n}{m}$ between friends.

In general, problems that involve coming up with a one-to-one mapping between certain objects and their string representations are fun to reason about. For example, the nontrivial solution to the (non-tree) problem Encode and Decode Strings is lovely. (The trivial solution, which Matthew Khoriaty pointed out to me, is amusing. Behold.)

class Solution:
    def encode(self, strs: List[str]) -> str:
        return f"""{strs}"""

    def decode(self, s: str) -> List[str]:
        return eval(s)

These tree-to-string problems are also worth checking out: Symmetric Tree, Path Sum, and, of course, Serialize and Deserialize N-ary Tree.