Serialize and deserialize different data structures

7 min readJun 17, 2024

Most of the data structures we study are in-memory construct i.e. how to represent in memory data. These data are not persistent i.e. on machine restart we lose in-memory data. Therefore we need to write these data into persistent storage for example disk to retrieve it.

In this post, we will go through how to serialize different data structures so that they can be written on disk. How to read data from disk to build the data structure i.e. how to deserialize.

There are various formats to write data on disk

string
binary
proto buffer

For simplicity, we will use string format to serialize data on disk.

Serialize and deserialize string array

https://leetcode.com/problems/encode-and-decode-strings/

Approach #1: Encode strings using a separator

Given an array of strings for example ["abcd","pqrs","uvw"]
We can choose a delimiter ,to convert into a single string as "abcd,pqrs,uvw
While decoding the same string "abcd,pqrs,uvw" we can split based on delimiter , and convert into ["abcd","pqrs","uvw"]

Pros

Easy to implement

Cons

The algorithm breaks if the input array has the delimiter as well.
For example ["apple, orange and mango", "are","healthy", "food"] . It will be encoded as "apple, orange and mango,are,healthy,food" but decoded as ["apple"," orange and mango", "are","healthy", "food"] which is incorrect.

Optimization

Instead of an ASCII separator, we use non-ASCII characters as a delimiter.
For example, we can choose π

Drawback

While the non-ASCII delimiter approach can work well for many applications, it assumes that the delimiter character will not appear in the strings to be encoded.
However, in many practical situations, we cannot make this assumption.

Approach #2: Escape characters

Choose /: as delimiter i.e. special character.
Choose / as an escape character.
We can denote that any delimiter following the escape character should be treated as a normal character instead of a delimiter.
For example Wor/:ld , after escaping using the escape character / it would be converted into Wor//:ld .
It means the encoder/decoder will treat /: in this context as a normal character instead of a delimiter.

Let’s understand by example.

For example [“Hello”, “World/:”, “How/are you?”], after escaping with /will be first converted into [“Hello”, “World//:”, “How//are you?”] .
Then encoded using delimiter /: , it would be converted into “Hello/:World//:/:How//are you?"
To decode the string, we process each char and do the following.

a. Initialize ans=[]

b. Initialize token=[] to hold the processed character.

c. If a character is / then read one more character.

d. If the next character is also / it means we found // which means the original string had / which was escaped into // . We will simply add / to the token and move to the next char.

e. But if the next character is : , it means we found /: which is a delimiter. It means we reached to end of the current token. Convert the token into a string and add it into ans and also empty the token to hold the next word.

f. This will give back the original array [“Hello”, “World/:”, “How/are you?”]

Approach #3: Chunked transfer encoding

Instead of just joining all the strings together with a delimiter, we would precede each string with its length, followed by a delimiter, and then the string itself.
This way, even if our string contains the delimiter, we can correctly identify the string boundaries.
When we decode our encoded string, we know that the first item before the delimiter is the length of the string.

Let’s understand it by example [“Hello”, “World”, “/:Example/:”] . Let’s use /: as delimiter. Applying the above logic, we can encode it as 5/:Hello5/:World11/:/:Example/:

To decode it, we will start scanning serialized strings from left to right.

We read until we encounter /:, which gives us 5 . This tells us that the length of our first string is 5. So, we read the next 5 characters to get “Hello”
Next, we again read until /: to get 5, indicating that our next string is of length 5. Reading the next 5 characters give us “World”.
Finally, reading until the next /: gives us 11. Reading the next 11 characters give us “/:Example/:”.
This way we get the original array [“Hello”, “World”, “/:Example/:”]

297. Serialize and Deserialize Binary Tree

https://leetcode.com/problems/serialize-and-deserialize-binary-tree/description/

There are four standard ways to traverse a binary tree.

BFS
DFS preorder
DFS inorder
DFS postorder

DFS preorder can intuitively store the structure of a tree structure as below.

We can write code as below using # to represent null node. Also, use ,to separate the nodes.


/**
 * Encodes a tree to a single string.
 *
 * @param {TreeNode} root
 * @return {string}
 */
var serialize = function (root) {
    const ans = [];
    // preorder dfs
    const dfs = node => {
        if (!node) {
            // reresent null node as #
            ans.push('#');
            return;
        }
        ans.push(node.val); // pre-order processing of node
        dfs(node.left);
        dfs(node.right);

    }
    dfs(root);
    return ans.join(',');
};

/**
 * Decodes your encoded data to tree.
 *
 * @param {string} data
 * @return {TreeNode}
 */
var deserialize = function (data) {
    if (!data) {
        return null;
    }
    const preorder = data.split(',');
    const recursion = (nodes) => {
        // if empty preorder
        if (nodes.length === 0) {
            return null;
        }
        const val = nodes.shift(); // Use linked list for O(1) operation

        // if # then return null
        if (val === '#') {
            return null;
        }
        const node = new TreeNode(Number.parseInt(val, 10));
        node.left = recursion(nodes);
        node.right = recursion(nodes);
        return node;
    }
    return recursion(preorder);
};

Serialize and Deserialize n-ary Tree

Given an N-ary tree where every node has the most N children. How to serialize and deserialize it?

We can apply a similar approach as a binary tree here.

In an N-ary tree, there are no designated left and right children. An N-ary tree is represented by storing an array or list of child pointers with every node.

The idea is to apply preorder traversal and store an ‘end of children’ marker with every node. The following diagram shows serialization where ‘#’ is used as the end of the children’s marker. Use , to separate node value.

It would be serialized as A,B,E,#,F,K,#,#,#,C,#,D,G,#,H,#,I,#,J,#,#,# . Following is the code to implement it.

const serialize = root => {
  const ans = [];
  
  const dfs = node => {
    // Base case: null node
    if(!root) {
      return '';
    }
    // process node
    ans.push(root.val);
    // scan each child
    for (const child of root.children) {
      dfs(child);
    }
    // Marker at the end of child
    ans.push('#');
  }
  dfs(root);
  return ans.join(',');
}

const deserialize = serialized => {
  const preorder = serialized.split(',');
  const recursion = nodes => {
    // Base case: empty nodes return null
    if(nodes.length ==0) {
      return null;
    }
    const val = nodes.shift();
    // Marker # represent end of child
    if(val === '#') {
      return null;
    }
    // Create the node with this item 
    const node = new TreeNode(val);
    // Recurse for child until end marker # return null
    while(true) {
      const child = recursion(nodes);
      if(!child) {
        break;
      }
      node.children.push(child);
    }
    return node;
  }
  return recursion(preorder);
}

449. Serialize and Deserialize BST

https://leetcode.com/problems/serialize-and-deserialize-bst/description/

Approach #1: Postorder traversal

If we ignore the BST nature then we can apply the above-discussed approach to serialize and deserialize trees.

However, we can utilize the following nature of BST to optimize the overall algorithm.

BST can be constructed based on either preorder or postorder (as inorder can be found by sorting preorder/postorder).
So the question now becomes a simple preorder/postorder traversal and BST tree construction.

/**
 * Encodes a tree to a single string.
 *
 * @param {TreeNode} root
 * @return {string}
 */
var serialize = function (root) {
    if (!root) {
        return '';
    }
    const ans = [];
    const dfs = node => {
        if (!node) {
            return;
        }
        // preorder traversal
        ans.push(node.val);
        dfs(node.left);
        dfs(node.right);
    }
    dfs(root);
    return ans.join(' '); // white space takes lesser space compare to comma. 
};

/**
 * Decodes your encoded data to tree.
 *
 * @param {string} data
 * @return {TreeNode}
 */
var deserialize = function (data) {
    if (!data) {
        return null;
    }
    const preorder = data.split(' ').map(a => Number.parseInt(a, 10));
    const inorder = [...preorder];
    inorder.sort((a, b) => a - b);
    const inorderMap = inorder.reduce((map, val, i) => { map.set(val, i); return map; }, new Map())
    const bstConstruction = (pLeft, pRight, iLeft, iRight) => {
        // if empty node
        if (pLeft > pRight) {
            return null;
        }

        const val = preorder[pLeft];
        // first element of preorder is root
        const root = new TreeNode(val);

        // get index of root in inorder
        const rIndex = inorderMap.get(val);
        const leftDistance = rIndex - iLeft;
        const rightDistance = iRight - rIndex;
        // process left
        root.left = bstConstruction(pLeft + 1, pLeft + leftDistance, iLeft, rIndex - 1);
        // process right
        root.right = bstConstruction(pLeft + leftDistance + 1, pRight, rIndex + 1, iRight);

        return root;
    }
    // reconstruct bst from preorder and inorder
    const n = preorder.length;
    return bstConstruction(0, n - 1, 0, n - 1);

};

/**
 * Your functions will be called as such:
 * deserialize(serialize(root));
 */

Trie Serialization

One simple approach would be to use a modified n-array approach to treat null values as special characters.

Another approach is to use serialized index#char .

Happy coding :-)