Serialize and deserialize different data structures
Most of the data structures we study are in-memory construct i.e. how to represent in memory data. These data are not persistent i.e. on machine restart we lose in-memory data. Therefore we need to write these data into persistent storage for example disk to retrieve it.
In this post, we will go through how to serialize different data structures so that they can be written on disk. How to read data from disk to build the data structure i.e. how to deserialize.
There are various formats to write data on disk
- string
- binary
- proto buffer
For simplicity, we will use string
format to serialize data on disk.
Serialize and deserialize string array
https://leetcode.com/problems/encode-and-decode-strings/
Approach #1: Encode strings using a separator
- Given an array of strings for example
["abcd","pqrs","uvw"]
- We can choose a delimiter
,
to convert into a single string as"abcd,pqrs,uvw
- While decoding the same string
"abcd,pqrs,uvw"
we can split based on delimiter,
and convert into["abcd","pqrs","uvw"]
Pros
- Easy to implement
Cons
- The algorithm breaks if the input array has the delimiter as well.
- For example
["apple, orange and mango", "are","healthy", "food"]
. It will be encoded as"apple, orange and mango,are,healthy,food"
but decoded as["apple"," orange and mango", "are","healthy", "food"]
which is incorrect.
Optimization
- Instead of an ASCII separator, we use non-ASCII characters as a delimiter.
- For example, we can choose π
Drawback
- While the non-ASCII delimiter approach can work well for many applications, it assumes that the delimiter character will not appear in the strings to be encoded.
- However, in many practical situations, we cannot make this assumption.
Approach #2: Escape characters
- Choose
/:
as delimiter i.e. special character. - Choose
/
as an escape character. - We can denote that any delimiter following the escape character should be treated as a normal character instead of a delimiter.
- For example
Wor/:ld
, after escaping using the escape character/
it would be converted intoWor//:ld
. - It means the encoder/decoder will treat
/:
in this context as a normal character instead of a delimiter.
Let’s understand by example.
- For example
[“Hello”, “World/:”, “How/are you?”]
, after escaping with/
will be first converted into[“Hello”, “World//:”, “How//are you?”]
. - Then encoded using delimiter
/:
, it would be converted into“Hello/:World//:/:How//are you?"
- To decode the string, we process each char and do the following.
a. Initialize ans=[]
b. Initialize token=[]
to hold the processed character.
c. If a character is /
then read one more character.
d. If the next character is also /
it means we found //
which means the original string had /
which was escaped into //
. We will simply add /
to the token and move to the next char.
e. But if the next character is :
, it means we found /:
which is a delimiter. It means we reached to end of the current token. Convert the token into a string and add it into ans
and also empty the token to hold the next word.
f. This will give back the original array [“Hello”, “World/:”, “How/are you?”]
Approach #3: Chunked transfer encoding
- Instead of just joining all the strings together with a delimiter, we would precede each string with its length, followed by a delimiter, and then the string itself.
- This way, even if our string contains the delimiter, we can correctly identify the string boundaries.
- When we decode our encoded string, we know that the first item before the delimiter is the length of the string.
Let’s understand it by example [“Hello”, “World”, “/:Example/:”]
. Let’s use /:
as delimiter. Applying the above logic, we can encode it as 5/:Hello5/:World11/:/:Example/:
To decode it, we will start scanning serialized strings from left to right.
- We read until we encounter
/:
, which gives us5
. This tells us that the length of our first string is5
. So, we read the next5
characters to get“Hello”
- Next, we again read until
/:
to get5
, indicating that our next string is of length5
. Reading the next5
characters give us“World”
. - Finally, reading until the next
/:
gives us11
. Reading the next11
characters give us“/:Example/:”
. - This way we get the original array
[“Hello”, “World”, “/:Example/:”]
297. Serialize and Deserialize Binary Tree
https://leetcode.com/problems/serialize-and-deserialize-binary-tree/description/
There are four standard ways to traverse a binary tree.
- BFS
- DFS preorder
- DFS inorder
- DFS postorder
DFS preorder can intuitively store the structure of a tree structure as below.
We can write code as below using #
to represent null
node. Also, use ,
to separate the nodes.
/**
* Encodes a tree to a single string.
*
* @param {TreeNode} root
* @return {string}
*/
var serialize = function (root) {
const ans = [];
// preorder dfs
const dfs = node => {
if (!node) {
// reresent null node as #
ans.push('#');
return;
}
ans.push(node.val); // pre-order processing of node
dfs(node.left);
dfs(node.right);
}
dfs(root);
return ans.join(',');
};
/**
* Decodes your encoded data to tree.
*
* @param {string} data
* @return {TreeNode}
*/
var deserialize = function (data) {
if (!data) {
return null;
}
const preorder = data.split(',');
const recursion = (nodes) => {
// if empty preorder
if (nodes.length === 0) {
return null;
}
const val = nodes.shift(); // Use linked list for O(1) operation
// if # then return null
if (val === '#') {
return null;
}
const node = new TreeNode(Number.parseInt(val, 10));
node.left = recursion(nodes);
node.right = recursion(nodes);
return node;
}
return recursion(preorder);
};
Serialize and Deserialize n-ary Tree
Given an N-ary tree where every node has the most N children. How to serialize and deserialize it?
We can apply a similar approach as a binary tree here.
In an N-ary tree, there are no designated left and right children. An N-ary tree is represented by storing an array or list of child pointers with every node.
The idea is to apply preorder traversal and store an ‘end of children’ marker with every node. The following diagram shows serialization where ‘#’ is used as the end of the children’s marker. Use ,
to separate node value.
It would be serialized as A,B,E,#,F,K,#,#,#,C,#,D,G,#,H,#,I,#,J,#,#,#
. Following is the code to implement it.
const serialize = root => {
const ans = [];
const dfs = node => {
// Base case: null node
if(!root) {
return '';
}
// process node
ans.push(root.val);
// scan each child
for (const child of root.children) {
dfs(child);
}
// Marker at the end of child
ans.push('#');
}
dfs(root);
return ans.join(',');
}
const deserialize = serialized => {
const preorder = serialized.split(',');
const recursion = nodes => {
// Base case: empty nodes return null
if(nodes.length ==0) {
return null;
}
const val = nodes.shift();
// Marker # represent end of child
if(val === '#') {
return null;
}
// Create the node with this item
const node = new TreeNode(val);
// Recurse for child until end marker # return null
while(true) {
const child = recursion(nodes);
if(!child) {
break;
}
node.children.push(child);
}
return node;
}
return recursion(preorder);
}
449. Serialize and Deserialize BST
https://leetcode.com/problems/serialize-and-deserialize-bst/description/
Approach #1: Postorder traversal
If we ignore the BST nature then we can apply the above-discussed approach to serialize and deserialize trees.
However, we can utilize the following nature of BST to optimize the overall algorithm.
- BST can be constructed based on either preorder or postorder (as inorder can be found by sorting preorder/postorder).
- So the question now becomes a simple preorder/postorder traversal and BST tree construction.
/**
* Encodes a tree to a single string.
*
* @param {TreeNode} root
* @return {string}
*/
var serialize = function (root) {
if (!root) {
return '';
}
const ans = [];
const dfs = node => {
if (!node) {
return;
}
// preorder traversal
ans.push(node.val);
dfs(node.left);
dfs(node.right);
}
dfs(root);
return ans.join(' '); // white space takes lesser space compare to comma.
};
/**
* Decodes your encoded data to tree.
*
* @param {string} data
* @return {TreeNode}
*/
var deserialize = function (data) {
if (!data) {
return null;
}
const preorder = data.split(' ').map(a => Number.parseInt(a, 10));
const inorder = [...preorder];
inorder.sort((a, b) => a - b);
const inorderMap = inorder.reduce((map, val, i) => { map.set(val, i); return map; }, new Map())
const bstConstruction = (pLeft, pRight, iLeft, iRight) => {
// if empty node
if (pLeft > pRight) {
return null;
}
const val = preorder[pLeft];
// first element of preorder is root
const root = new TreeNode(val);
// get index of root in inorder
const rIndex = inorderMap.get(val);
const leftDistance = rIndex - iLeft;
const rightDistance = iRight - rIndex;
// process left
root.left = bstConstruction(pLeft + 1, pLeft + leftDistance, iLeft, rIndex - 1);
// process right
root.right = bstConstruction(pLeft + leftDistance + 1, pRight, rIndex + 1, iRight);
return root;
}
// reconstruct bst from preorder and inorder
const n = preorder.length;
return bstConstruction(0, n - 1, 0, n - 1);
};
/**
* Your functions will be called as such:
* deserialize(serialize(root));
*/
Trie Serialization
One simple approach would be to use a modified n-array approach to treat null values as special characters.
Another approach is to use serialized index#char
.
Happy coding :-)