Design an online judge like LeetCode
This problem has two versions
Programming Contest
This one is similar to a LeetCode contest. Users participate within a limited time window, which produces very high traffic.
Programming practice
This one is the regular version of LeetCode: traffic is spread throughout the day, and we need to support many different questions.
In this post, we are going to discuss the programming contest system design with the following features:
- Contests last 2 hours.
- Take submissions from contestants
- Evaluate each submission as successful or failed
- Rank the contestants
- Support multiple programming languages.
Complexities
Scalability
Assume 10 million users participate in the contest and each user makes 5 attempts. So in two hours we have 50M / 7200 ~= 7K writes per second.
Assume each user also refreshes the leaderboard 5 times, which gives about 7K reads per second as well.
Consistency
It’s reasonable to have eventual consistency on the leaderboard, i.e. after a user finishes a question, it may take up to a few seconds to see it updated in the leaderboard.
Latency
Getting the leaderboard should be very fast (100ms p99).
Getting a submission result will take longer because we need to compile and run the code.
Storage
Assume we need to store the submission for every attempt. At about 1 KB per submission, that is 10M * 5 * 1 KB ~= 50 GB of data per contest.
Let’s say the number of contests per week = 1
Number of contests in 5 years = 52 * 5 = 260
Total storage in 5 years = 260 * 50 GB = 13,000 GB = 13 TB
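The arithmetic above can be double-checked with a few lines; the numbers simply mirror the stated assumptions:

```python
# Back-of-envelope check of the traffic and storage estimates above.
users = 10_000_000
attempts_per_user = 5
contest_seconds = 2 * 60 * 60          # 2-hour contest = 7200 s

writes_per_sec = users * attempts_per_user / contest_seconds
reads_per_sec = users * 5 / contest_seconds   # 5 leaderboard refreshes each

submission_bytes = 1_000               # ~1 KB per submission
per_contest_gb = users * attempts_per_user * submission_bytes / 1e9
five_year_tb = per_contest_gb * 52 * 5 / 1_000  # 1 contest/week for 5 years

print(round(writes_per_sec), per_contest_gb, five_year_tb)  # 6944 50.0 13.0
```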
High level system design
Following is the high-level system design for the online contest judge system.
Contests Service
When the contest starts, the user will see the list of questions for the contest.
Following is the API to start the contest for a given user.
POST /contest/<contest_id>/participants/<user_id>
ContestParticipantResponse {
contest_id = xxx
user_id = xxx
}
Contest participants will be stored in the ContestParticipants table. Following is the schema for this table.
ContestParticipantId | ContestId | UserId | StartTime | SubmissionTime | SubmissionId
xxx | xxxx | xxxx | xxx | xxxx | xxxx
ContestParticipants will be sharded based on UserId, with ContestParticipantId as the primary key.
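A minimal sketch of the UserId-based sharding, assuming a stable (non-randomized) hash so every row for a given user lands on the same shard; the shard count of 64 is a made-up illustration:

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count

def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    # md5 gives a stable hash across processes, unlike Python's built-in
    # hash(), which is randomized per interpreter run.
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

shard = shard_for_user("user-123")  # same input always maps to the same shard
```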
Following is the API to read the list of questions for the contest.
GET /contest/<contest_id>/questions
ContestResponse:
{
contest_id = xxx
duration = xxx
questions = [
{
question_id= xxx
title = xxx
content_url = xxx
}
]
}
It will read data from the Contests and Questions tables to return the result. Following is the schema for the Questions table.
QuestionId | QuestionUrl | Level
xxxx | xxxx | xxx
Following is the Contests table schema.
ContestId | QuestionIds | Duration | StartTime | EndTime
xxx | [xxx,xxx,xx,xx] | 2h | xxx | xxx
Following is a sample pseudo-query for this API, assuming a SQL dialect with array support (BigQuery-style ARRAY_AGG/UNNEST).
SELECT c.ContestId,
       ARRAY_AGG(STRUCT(q.QuestionId, q.QuestionUrl)) AS questions
FROM Contests c
JOIN Questions q ON q.QuestionId IN UNNEST(c.QuestionIds)
WHERE c.ContestId = xxx
GROUP BY c.ContestId
Submission Service
Once the user is ready, they can submit their code using this service. Code submission is done using two APIs: the first uploads the code, and the second writes to the submission queue for asynchronous processing.
Upload Code API
Following will be the API to upload the code submission.
POST /contest/<contest_id>/participants/<user_id>/upload
Request {
// Multipart files
}
Response {
submission_url= xxx
}
The code content will be uploaded to blob storage, and the API will return a URL for reference.
Submission request API
Following will be the API to create a submission request.
POST /contest/<contest_id>/participants/<user_id>/submission
Request {
contest_id = xxx
submission_url = xxx
user_id = xxx
}
Response {
submission_id= xxx
}
This API will write the following payload to the SubmissionQueue.
{
userId = xxx
eventTimestamp=xxxxx // For stream processing
payload = {
submission_id = xxx
user_id = xxx
submission_url = xxx
language= xxxx
}
}
This queue will be sharded based on user_id to keep processing evenly spread across consumers.
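A sketch of how the submission API might assemble this payload before enqueueing it; build_submission_event and the uuid4-based submission_id scheme are illustrative assumptions, not part of the design above:

```python
import time
import uuid

def build_submission_event(user_id: str, submission_url: str,
                           language: str) -> dict:
    """Assemble the SubmissionQueue payload described above."""
    return {
        "userId": user_id,
        "eventTimestamp": int(time.time() * 1000),  # for stream processing
        "payload": {
            "submission_id": str(uuid.uuid4()),  # assumed ID scheme
            "user_id": user_id,
            "submission_url": submission_url,
            "language": language,
        },
    }

event = build_submission_event("u42", "blob://submissions/abc", "python3")
```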
View Submission Result API
A third API is also needed to view the result of a submission. Following are the details of this API.
GET /contest/<contest_id>/participants/<user_id>/submission/<submission_id>
Response{
submission_id = xxx
status = xxx
}
Submission Evaluation Handler
A background task is needed to process submissions from the queue. It will do the following:
- Pick a task from the queue.
- Based on the submitted code's language, choose the right sandbox environment to execute the code.
- Read the test cases for each problem and attach the inputs and expected outputs to the sandbox.
- Get the execution result from the sandbox environment and update the SubmissionResults table.
Following is the schema for the SubmissionResults table.
SubmissionId | Status | TotalTime | TotalMemory | TotalCPU | Score
xxx | xxx | xxxx | xxxxx | xxx | xxx
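The evaluation loop itself can be sketched as below; run_code stands in for a call into the sandbox and is a hypothetical callable, as are the test-case field names:

```python
def evaluate(run_code, test_cases):
    """Judge one submission: run it on every test case and compare
    its stdout to the expected output."""
    for case in test_cases:
        actual = run_code(case["input"]).strip()
        if actual != case["expected"].strip():
            return "FAILED"
    return "SUCCESSFUL"

# Toy usage: a "submission" that adds two numbers from its input line.
cases = [{"input": "2 3", "expected": "5"}, {"input": "0 0", "expected": "0"}]
adder = lambda s: str(sum(map(int, s.split())))
print(evaluate(adder, cases))  # SUCCESSFUL
```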
Isolated Sandbox Environment
A sandbox environment is a secure, isolated space where user code can be executed without affecting the host system or other running processes. It’s crucial for online judges to protect against malicious code and ensure fair evaluation of submissions.
Key Features of a Sandbox
- Isolation: Code runs in a separate process or container, preventing access to the host system’s files, network, or other resources.
- Resource Limits: Enforces strict time, memory, and CPU usage limits to prevent resource exhaustion and denial-of-service attacks.
- Input/Output Redirection: Controls the code’s interaction with the outside world, providing standard input and capturing standard output for comparison.
- Security: Implements measures to prevent code injection, buffer overflows, and other vulnerabilities.
Common Sandbox Technologies
1. Virtual Machines (VMs): Provide complete isolation but can be resource-heavy. Examples: VirtualBox, VMware.
2. Containers: Lightweight and efficient, offering good isolation. Example: Docker, typically orchestrated with Kubernetes.
We can use Google Cloud Run or AWS Lambda as the sandbox environment to execute the code with the right input and expected output.
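As a concrete illustration of resource limits, here is a minimal POSIX-only sketch that runs untrusted code in a child process with CPU-time and memory caps via setrlimit. A production judge would layer container or VM isolation on top of this; run_sandboxed and its limit values are assumptions for illustration, not a real service API:

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, stdin_data: str, time_limit_s: int = 2,
                  mem_limit_mb: int = 256):
    """Run untrusted Python code in a child process with CPU-time and
    address-space limits applied before exec (POSIX only)."""
    def set_limits():
        resource.setrlimit(resource.RLIMIT_CPU, (time_limit_s, time_limit_s))
        mem_bytes = mem_limit_mb * 1024 * 1024
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_data, capture_output=True, text=True,
        preexec_fn=set_limits,       # applies the rlimits in the child
        timeout=time_limit_s + 1,    # wall-clock backstop
    )
    return proc.returncode, proc.stdout

rc, out = run_sandboxed("print(int(input()) * 2)", "21\n")
```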
Ranking Stream job
Ranking can be calculated based on the following:
- Status of each submitted question
- Points for each successfully solved question
- Number of attempts to solve the problem
- Time/memory/CPU consumed during code execution
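One way to combine these signals is score-then-penalty ordering; the exact tie-breaking below (higher score first, then lower penalty time) is an ICPC-style assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class UserProgress:
    score: int = 0      # points from successfully solved questions
    penalty_s: int = 0  # submission time plus wrong-attempt penalties

def rank(users: dict) -> list:
    # Higher score wins; ties are broken by lower penalty time.
    ordered = sorted(users.items(),
                     key=lambda kv: (-kv[1].score, kv[1].penalty_s))
    return [user_id for user_id, _ in ordered]

standings = rank({
    "alice": UserProgress(score=300, penalty_s=5400),
    "bob":   UserProgress(score=300, penalty_s=4900),
    "carol": UserProgress(score=200, penalty_s=3000),
})
print(standings)  # ['bob', 'alice', 'carol']
```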
We have two approaches to implement the ranking system:
- Batch processing: works on bounded data, i.e. once all code is submitted, run a batch job to calculate the final ranking for each user.
- Stream processing: works on unbounded data, i.e. can provide real-time rankings.
For our problem, we will use stream processing.
We can use either Apache Flink or Google Dataflow to read the stream of code submission results from the Results Kafka topic and compute each user's rank. Following will be the schema of the payload in this topic.
{
EventOriginTimestamp: xxx
Payload: {
UserId: xxx
QuestionId: xxx
SubmissionId: xxx
Result: {
status: xxx
time: xxx
memory: xx
cpu: xxx
}
}
}
Stream processing can use the code submission time as EventOriginTimestamp and leverage persistent storage to calculate rankings for all users by processing windows of the submission stream.
It will write the final result into the Rankings table with the following schema.
SubmissionId | Rank | Score
xxxx | x | x
xxxx | x | x
Leaderboard Service
The following leaderboard API will be used to show the rankings for each user.
GET /contest/<contest_id>/leaderboard
Response {
contest_id = xxx
leaderboard = [
{
user_id = xxx
rank = xxx
score = xxx
}
]
}
This API will first look into the RankingsCache. On a cache hit it returns the rankings data; otherwise it reads from the Rankings table, updates the cache, and returns the data. The cache can also be updated by the ranking stream processors.
Hope you learnt something. Happy system designing :-)