Serialize and deserialize different data structures

Dilip Kumar
7 min readJun 17, 2024

--

Most of the data structures we study are in-memory construct i.e. how to represent in memory data. These data are not persistent i.e. on machine restart we lose in-memory data. Therefore we need to write these data into persistent storage for example disk to retrieve it.

In this post, we will go through how to serialize different data structures so that they can be written on disk. How to read data from disk to build the data structure i.e. how to deserialize.

There are various formats to write data on disk

  1. string
  2. binary
  3. proto buffer

For simplicity, we will use string format to serialize data on disk.

Serialize and deserialize string array

https://leetcode.com/problems/encode-and-decode-strings/

Approach #1: Encode strings using a separator

  1. Given an array of strings for example ["abcd","pqrs","uvw"]
  2. We can choose a delimiter ,to convert into a single string as "abcd,pqrs,uvw
  3. While decoding the same string "abcd,pqrs,uvw" we can split based on delimiter , and convert into ["abcd","pqrs","uvw"]

Pros

  1. Easy to implement

Cons

  1. The algorithm breaks if the input array has the delimiter as well.
  2. For example ["apple, orange and mango", "are","healthy", "food"] . It will be encoded as "apple, orange and mango,are,healthy,food" but decoded as ["apple"," orange and mango", "are","healthy", "food"] which is incorrect.

Optimization

  1. Instead of an ASCII separator, we use non-ASCII characters as a delimiter.
  2. For example, we can choose π

Drawback

  1. While the non-ASCII delimiter approach can work well for many applications, it assumes that the delimiter character will not appear in the strings to be encoded.
  2. However, in many practical situations, we cannot make this assumption.

Approach #2: Escape characters

  1. Choose /: as delimiter i.e. special character.
  2. Choose / as an escape character.
  3. We can denote that any delimiter following the escape character should be treated as a normal character instead of a delimiter.
  4. For example Wor/:ld , after escaping using the escape character / it would be converted into Wor//:ld .
  5. It means the encoder/decoder will treat /: in this context as a normal character instead of a delimiter.

Let’s understand by example.

  1. For example [“Hello”, “World/:”, “How/are you?”], after escaping with /will be first converted into [“Hello”, “World//:”, “How//are you?”] .
  2. Then encoded using delimiter /: , it would be converted into “Hello/:World//:/:How//are you?"
  3. To decode the string, we process each char and do the following.

a. Initialize ans=[]

b. Initialize token=[] to hold the processed character.

c. If a character is / then read one more character.

d. If the next character is also / it means we found // which means the original string had / which was escaped into // . We will simply add / to the token and move to the next char.

e. But if the next character is : , it means we found /: which is a delimiter. It means we reached to end of the current token. Convert the token into a string and add it into ans and also empty the token to hold the next word.

f. This will give back the original array [“Hello”, “World/:”, “How/are you?”]

Approach #3: Chunked transfer encoding

  1. Instead of just joining all the strings together with a delimiter, we would precede each string with its length, followed by a delimiter, and then the string itself.
  2. This way, even if our string contains the delimiter, we can correctly identify the string boundaries.
  3. When we decode our encoded string, we know that the first item before the delimiter is the length of the string.

Let’s understand it by example [“Hello”, “World”, “/:Example/:”] . Let’s use /: as delimiter. Applying the above logic, we can encode it as 5/:Hello5/:World11/:/:Example/:

To decode it, we will start scanning serialized strings from left to right.

  1. We read until we encounter /:, which gives us 5 . This tells us that the length of our first string is 5. So, we read the next 5 characters to get “Hello”
  2. Next, we again read until /: to get 5, indicating that our next string is of length 5. Reading the next 5 characters give us “World”.
  3. Finally, reading until the next /: gives us 11. Reading the next 11 characters give us “/:Example/:”.
  4. This way we get the original array [“Hello”, “World”, “/:Example/:”]

297. Serialize and Deserialize Binary Tree

https://leetcode.com/problems/serialize-and-deserialize-binary-tree/description/

There are four standard ways to traverse a binary tree.

  1. BFS
  2. DFS preorder
  3. DFS inorder
  4. DFS postorder

DFS preorder can intuitively store the structure of a tree structure as below.

We can write code as below using # to represent null node. Also, use ,to separate the nodes.


/**
* Encodes a tree to a single string.
*
* @param {TreeNode} root
* @return {string}
*/
var serialize = function (root) {
const ans = [];
// preorder dfs
const dfs = node => {
if (!node) {
// reresent null node as #
ans.push('#');
return;
}
ans.push(node.val); // pre-order processing of node
dfs(node.left);
dfs(node.right);

}
dfs(root);
return ans.join(',');
};

/**
* Decodes your encoded data to tree.
*
* @param {string} data
* @return {TreeNode}
*/
var deserialize = function (data) {
if (!data) {
return null;
}
const preorder = data.split(',');
const recursion = (nodes) => {
// if empty preorder
if (nodes.length === 0) {
return null;
}
const val = nodes.shift(); // Use linked list for O(1) operation

// if # then return null
if (val === '#') {
return null;
}
const node = new TreeNode(Number.parseInt(val, 10));
node.left = recursion(nodes);
node.right = recursion(nodes);
return node;
}
return recursion(preorder);
};

Serialize and Deserialize n-ary Tree

Given an N-ary tree where every node has the most N children. How to serialize and deserialize it?

We can apply a similar approach as a binary tree here.

In an N-ary tree, there are no designated left and right children. An N-ary tree is represented by storing an array or list of child pointers with every node.

The idea is to apply preorder traversal and store an ‘end of children’ marker with every node. The following diagram shows serialization where ‘#’ is used as the end of the children’s marker. Use , to separate node value.

It would be serialized as A,B,E,#,F,K,#,#,#,C,#,D,G,#,H,#,I,#,J,#,#,# . Following is the code to implement it.

const serialize = root => {
const ans = [];

const dfs = node => {
// Base case: null node
if(!root) {
return '';
}
// process node
ans.push(root.val);
// scan each child
for (const child of root.children) {
dfs(child);
}
// Marker at the end of child
ans.push('#');
}
dfs(root);
return ans.join(',');
}

const deserialize = serialized => {
const preorder = serialized.split(',');
const recursion = nodes => {
// Base case: empty nodes return null
if(nodes.length ==0) {
return null;
}
const val = nodes.shift();
// Marker # represent end of child
if(val === '#') {
return null;
}
// Create the node with this item
const node = new TreeNode(val);
// Recurse for child until end marker # return null
while(true) {
const child = recursion(nodes);
if(!child) {
break;
}
node.children.push(child);
}
return node;
}
return recursion(preorder);
}

449. Serialize and Deserialize BST

https://leetcode.com/problems/serialize-and-deserialize-bst/description/

Approach #1: Postorder traversal

If we ignore the BST nature then we can apply the above-discussed approach to serialize and deserialize trees.

However, we can utilize the following nature of BST to optimize the overall algorithm.

  1. BST can be constructed based on either preorder or postorder (as inorder can be found by sorting preorder/postorder).
  2. So the question now becomes a simple preorder/postorder traversal and BST tree construction.
/**
* Encodes a tree to a single string.
*
* @param {TreeNode} root
* @return {string}
*/
var serialize = function (root) {
if (!root) {
return '';
}
const ans = [];
const dfs = node => {
if (!node) {
return;
}
// preorder traversal
ans.push(node.val);
dfs(node.left);
dfs(node.right);
}
dfs(root);
return ans.join(' '); // white space takes lesser space compare to comma.
};

/**
* Decodes your encoded data to tree.
*
* @param {string} data
* @return {TreeNode}
*/
var deserialize = function (data) {
if (!data) {
return null;
}
const preorder = data.split(' ').map(a => Number.parseInt(a, 10));
const inorder = [...preorder];
inorder.sort((a, b) => a - b);
const inorderMap = inorder.reduce((map, val, i) => { map.set(val, i); return map; }, new Map())
const bstConstruction = (pLeft, pRight, iLeft, iRight) => {
// if empty node
if (pLeft > pRight) {
return null;
}

const val = preorder[pLeft];
// first element of preorder is root
const root = new TreeNode(val);

// get index of root in inorder
const rIndex = inorderMap.get(val);
const leftDistance = rIndex - iLeft;
const rightDistance = iRight - rIndex;
// process left
root.left = bstConstruction(pLeft + 1, pLeft + leftDistance, iLeft, rIndex - 1);
// process right
root.right = bstConstruction(pLeft + leftDistance + 1, pRight, rIndex + 1, iRight);

return root;
}
// reconstruct bst from preorder and inorder
const n = preorder.length;
return bstConstruction(0, n - 1, 0, n - 1);

};

/**
* Your functions will be called as such:
* deserialize(serialize(root));
*/

Trie Serialization

One simple approach would be to use a modified n-array approach to treat null values as special characters.

Another approach is to use serialized index#char .

Happy coding :-)

--

--

Dilip Kumar
Dilip Kumar

Written by Dilip Kumar

With 18+ years of experience as a software engineer. Enjoy teaching, writing, leading team. Last 4+ years, working at Google as a backend Software Engineer.

No responses yet