Comment reply system design
Designing a comment system, is a feature integrated into websites that allows users to post comments on published content, interact with other users through replies, upvote or downvote comments, and sort or filter comments.
Schema design to store threaded reply
Approach 1: Use parent child relationship
Following can be schema for Posts
table.
PostId Text CreateTimestamp UserId
1 Party? 1 1
2 Tennis? 2 2
Following can be schema for Comments
table with CommentId
as primary key.
CommentId PostId ParentCommentId Text AuthorUserId CreateTimestamp
1 1 NULL Hello 1 1
2 1 1 Hi 2 2
3 1 2 Meeting? 1 3
4 1 3 No 2 4
Following can be query to read all the comments of level1 and level2 on post #1.
SELECT
PostId,
parent.Text as Level1,
child.Text as Level2,
FROM Comment AS parent
LEFT JOIN
Comment as child on child.ParentCommentId = parent.CommentId
WHERE PostId = 1;
But query become complex if need to find N level comments. Following is example of repeated self join query.
SELECT
PostId,
p1.Text as Level1,
p2.Text as Level2,
p3.Text as Level3,
p4.Text as Level4,
p5.Text as Level5,
p6.Text as Level6,
FROM Comment AS p1
LEFT JOIN
Comment as p2 on p2.ParentCommentId = p1.CommentId
LEFT JOIN
Comment as p3 on p2.ParentCommentId = p2.CommentId
LEFT JOIN
Comment as p4 on p2.ParentCommentId = p3.CommentId
LEFT JOIN
Comment as p5 on p2.ParentCommentId = p4.CommentId
LEFT JOIN
Comment as p6 on p2.ParentCommentId = p5.CommentId
WHERE PostId = 1;
Obviously, this approach goes out of control quickly.
Approach 2: Parent path-style Identifiers
We can remove the ParentCommentId
column from Comments
table and add Path
column to store the path hierarchies of comment.
CommentId PostId Path Text AuthorUserId CreateTimestamp
1 1 NULL Hello 1 1
2 1 1/ Hi 2 2
3 1 1/2/ Meeting? 1 3
4 1 1/2/3/ No 2 4
5 1 1/ How are you? 3 5
We can now query parent comment by comparing the current row’s path to a pattern formed from path of another row.
For example, to find out ancestor of comment #4 whose path is 1/2/3/
we can run following query.
SELECT CommentId, Text
FROM Comments AS c
WHERE (Select Path from Comments Where CommentId=4) LIKE c.Path || '%'
This matches the patterns formed from paths of ancestor 1/2/3/%
, 1/2/%
and 1/%
.
Drawback of this schema is that Database can’t perform any schema validation on Path
column.
Approach 3: Tree style table
We can introduce Relationship
table to store the comments relationship instead of using the Comments
table.
Following will be schema for Comments
table.
CommentId PostId Text AuthorUserId CreateTimestamp
1 1 Hello 1 1
2 1 Hi 2 2
3 1 Meeting? 1 3
4 1 No 2 4
Following will be the schema for Relationship
table.
ParentCommentId ChildCommentId
1 2
2 3
1 4
To get the all the child comments for #1 will be as below.
SELECT *
FROM Comments AS C
JOIN
Relationship p ON c.CommentId = p.ChildCommentId
WHERE
C.CommentId = 1
Similarly the query to print all the parents will be as below.
SELECT c.*
FROM Comment AS c
JOIN
Relationship AS p ON p.ParentCommentId = c.CommentId
WHERE
c.CommentId = 3
Approach 4: XML/JSON based schema
We can leverage either XML/JSON based NoSQL storage system and write entire comments thread into a single XML/JSON object. Everytime a reply is made will be modifying this XML/JSON file.
High level System design
On high level, we can have microservice for each components as below.
Enjoy system designing :-)