Ticketmaster system design
Requirements
Design an online booking system for purchasing tickets to various events. Users should be able to search and browse events, and to view and select seats. A user should be able to claim a ticket and hold it for 5 minutes; the hold ends when it expires or when the user completes the purchase.
Scale
- A popular event can attract more than 1 million users.
- About 100 new events are posted for ticket sale every day.
- Events can also be watched online, so users are geographically distributed.
- Tickets sell fast, so the client must be kept up to date as ticket availability changes.
- Users are allowed to select their seats.
Complexity of system
- If multiple users try to claim the same seat, only one claim should succeed; the rest should be rejected.
- There should be a mechanism to notify rejected users when tickets become available.
- The app should keep updating ticket availability when seats become available again after a cancellation or rejection.
- The system should be fair when reserving tickets.
High level system design
Following is high level system design.
At a high level, the ticket booking flow is as follows.
- Search: The user searches for events. The search returns a list of EventIds with the minimal data needed to show the results as a list.
- Avail: The Search service can make a one-time call to the backend to get seat availability to show on the search results page.
- Details: The user clicks on an event to learn more about it. This calls the Events service to get the data. The Events service also makes a one-time (static) call to the backend to get the seat availability to show on the event details page.
- Seat Selection: The Seats service is used to show the seats. At this point the client establishes a streaming connection to the Session server, which is needed to push seat status updates to connected clients in real time. We also capture the User and Event mapping at this point so that the server can fan out seat status changes.
- Payment: Up to payment, the flow is called shopping; from payment onwards it is called booking. Before payment, we start the price lock (this could be an external or an internal call). Generally a third-party API is called for payment processing. The third party either provides a poll API that we call at a fixed interval, or a callback mechanism where it calls our API once payment processing has completed or failed.
- Booking is a two-step process: HOLD and BOOK. During the price check, inventory is put on HOLD. If the price changes, show the new price to the user and ask them to accept it. Once payment completes successfully, the BOOK step is started to complete the booking. Generally, the system is designed with a longer HOLD window and a much shorter payment window, so that payment always completes before the HOLD expires. The entire booking stays on hold if the payment confirmation has not been returned from the external payment system.
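The HOLD and BOOK steps can be read as a small state machine per seat. The following is a minimal sketch, not the actual implementation: the names Seat, hold, and book are hypothetical, and time is passed in explicitly so the expiry rule is easy to follow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

HOLD_SECONDS = 5 * 60  # hold window taken from the requirements


class Status(Enum):
    FREE = "FREE"
    HOLD = "HOLD"
    BOOKED = "BOOKED"


@dataclass
class Seat:
    seat_id: int
    status: Status = Status.FREE
    user_id: Optional[int] = None
    hold_expires_at: Optional[float] = None


def hold(seat: Seat, user_id: int, now: float) -> bool:
    """FREE -> HOLD. Returns False if the seat is not free."""
    if seat.status is not Status.FREE:
        return False
    seat.status = Status.HOLD
    seat.user_id = user_id
    seat.hold_expires_at = now + HOLD_SECONDS
    return True


def book(seat: Seat, user_id: int, now: float) -> bool:
    """HOLD -> BOOKED, only for the holder and only before the hold expires."""
    if seat.status is not Status.HOLD or seat.user_id != user_id:
        return False
    if now >= seat.hold_expires_at:
        return False
    seat.status = Status.BOOKED
    seat.hold_expires_at = None
    return True
```

In this sketch a booking attempt after the hold window simply fails; releasing the seat back to FREE is left to a separate expiry step.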
Search Service
To support full-text search, we will maintain a separate search index. The following fields will be indexed for a better search experience.
- Event title
- Event description
- City
- Celebrity
Users will also be able to apply a date filter to narrow the results to their desired timeline.
At a high level, the API for searching events is as follows.
GET /events/search
Request {
Title: xx
Description: xxx
City: xxx
DateFrom: xxx
DateTo: xxx
}
Response {
data: [
{
EventId: xxx
EventImage: xxx
Title: xxx
Description:xxx
City: xxx
ShowTime: xxx
SeatsAvailable: xxx
}
],
nextPage: {
size: xxx
start:xxxx
}
}
To return the latest seat inventory status, rating, and similar details, the Search API will also query other systems.
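To make the request and response shape concrete, here is a rough sketch of the filtering and paging behaviour. It is only an illustration: it scans an in-memory list instead of a real search index, and the function search_events and its parameters (text, start, size) are assumptions made for this example.

```python
from datetime import date
from typing import List, Optional


def search_events(
    events: List[dict],
    text: Optional[str] = None,
    city: Optional[str] = None,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    start: int = 0,
    size: int = 10,
) -> dict:
    """Filter events and return one page in the Response shape shown above."""

    def matches(event: dict) -> bool:
        haystack = (event["Title"] + " " + event["Description"]).lower()
        if text and text.lower() not in haystack:
            return False
        if city and event["City"].lower() != city.lower():
            return False
        if date_from and event["ShowTime"] < date_from:
            return False
        if date_to and event["ShowTime"] > date_to:
            return False
        return True

    hits = [event for event in events if matches(event)]
    next_start = start + size
    # nextPage is None once the last page has been returned.
    next_page = {"size": size, "start": next_start} if next_start < len(hits) else None
    return {"data": hits[start:next_start], "nextPage": next_page}
```

The caller passes nextPage.start back as start to fetch the following page.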
Events Service
Once the user sees the list of events, they click on one of them to see more details. The Events service is used to show more details about the selected event. This service reads the Events database table. Following are the details of the API needed by this service.
GET /events/details/<event_id>
Response {
EventId: xxx
EventImage: xxx
Title: xxx
Description:xxx
City: xxx
ShowTime: xxx
SeatsAvailable: xxx
Rating: xxx
ConcurrentUsers: xxxx
}
Following is the schema for the Events table.
EventId Title Description City Image ShowTime Rating
xx xx xx xx xx xxx xx
The Events table has EventId as its primary key. We can use City as the shard key to keep all events belonging to one city on the same shard server.
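One simple way to turn City into a shard number is to hash it. The sketch below is an assumption about how the mapping could look, not part of the original design; shard_for_city and num_shards are hypothetical names.

```python
import hashlib


def shard_for_city(city: str, num_shards: int) -> int:
    """Map a city name to a shard index in [0, num_shards)."""
    # Normalize so that "Paris" and " paris " land on the same shard.
    normalized = city.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing moves most cities to a different shard when num_shards changes, so a real deployment would more likely use a lookup table or consistent hashing. A very popular city can also become a hot shard.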
To return the latest seat inventory status, rating, and similar details, the EventDetails API will also query other systems.
Seats Service
The Seats service exposes an API that lets the user view the seating arrangement and choose seats before starting the reservation process. Following are the API details.
GET /events/<event_id>/seats
Response {
data: [
{
SeatId: xxx
EventId: xxx
Row: xx
Col: xx
Label: xxx
Status: FREE/HOLD/BOOKED
}
]
nextPage: {
size: xxx
start: xxx
}
}
This service uses the Seats table to get the seat details for the event. The Seats table has SeatId as its primary key. EventId is used as the shard key to keep all seats for the same event on the same shard. Following is the schema for this table.
SeatId EventId RowId ColId Label
xx xx x x xx
Seat availability is based on the Reservations table. When an event is launched for sale, this table is populated from the Seats table with Status set to FREE.
SeatId EventId Status HoldExpirationTime UserId
1 1 FREE
2 1 FREE
3 1 FREE
4 1 FREE
Following will be the enums for Status.
FREE
HOLD
BOOKED
Following is query to get the seat details.
SELECT S.SeatId, S.RowId, S.ColId, S.Label, R.Status
FROM Seats S, Reservations R
WHERE S.SeatId = R.SeatId AND S.EventId = xxxx
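To check that the seat query behaves as described, it can be run against an in-memory SQLite database. This is only a sketch: SQLite stands in for whatever database is actually used, the column types are guesses, and the helpers make_demo_db and seat_map are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create Seats and Reservations with a few demo rows (types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Seats (
            SeatId INTEGER PRIMARY KEY, EventId INTEGER,
            RowId INTEGER, ColId INTEGER, Label TEXT);
        CREATE TABLE Reservations (
            SeatId INTEGER PRIMARY KEY, EventId INTEGER, Status TEXT,
            HoldExpirationTime REAL, UserId INTEGER);
        INSERT INTO Seats VALUES
            (1, 1, 1, 1, 'A1'), (2, 1, 1, 2, 'A2'), (3, 2, 1, 1, 'A1');
        -- Launching the sale: every seat starts as FREE.
        INSERT INTO Reservations (SeatId, EventId, Status)
            SELECT SeatId, EventId, 'FREE' FROM Seats;
        UPDATE Reservations SET Status = 'HOLD', UserId = 9 WHERE SeatId = 2;
        """
    )
    return conn


def seat_map(conn: sqlite3.Connection, event_id: int) -> List[Tuple]:
    """Return (SeatId, RowId, ColId, Label, Status) for one event."""
    return conn.execute(
        """
        SELECT S.SeatId, S.RowId, S.ColId, S.Label, R.Status
        FROM Seats S
        JOIN Reservations R ON S.SeatId = R.SeatId
        WHERE S.EventId = ?
        ORDER BY S.SeatId
        """,
        (event_id,),
    ).fetchall()
```

The explicit JOIN is equivalent to the comma-separated form used in the query; the event id is bound as a parameter instead of being spliced into the SQL string.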
At this point the client establishes a streaming connection with the SessionServer. To manage the fanout, we maintain an EventSessions table with the following schema.
EventId UserId Heartbeat ExpiryTime(TTL)
xx xxx xxx
The client keeps sending heartbeats; each heartbeat refreshes the ExpiryTime column, which serves as the row retention policy.
Please refer to SessionServer for more details.
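The heartbeat and TTL behaviour of EventSessions can be sketched with an in-memory map. This is an illustration under assumptions: the 30-second TTL, the class name, and its methods are made up for the example, and a real store would normally expire rows itself.

```python
from typing import Dict, List, Tuple

SESSION_TTL_SECONDS = 30.0  # assumed value, not from the original design


class EventSessions:
    """In-memory stand-in for the EventSessions table."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS) -> None:
        self._ttl = ttl
        # (event_id, user_id) -> expiry time
        self._expiry: Dict[Tuple[int, int], float] = {}

    def heartbeat(self, event_id: int, user_id: int, now: float) -> None:
        """Insert or refresh the row; each heartbeat pushes the expiry out."""
        self._expiry[(event_id, user_id)] = now + self._ttl

    def active_users(self, event_id: int, now: float) -> List[int]:
        """Users to fan out to: those whose row has not expired yet."""
        return sorted(
            user
            for (event, user), expires in self._expiry.items()
            if event == event_id and expires > now
        )

    def purge_expired(self, now: float) -> int:
        """Drop expired rows and return how many were removed."""
        stale = [key for key, expires in self._expiry.items() if expires <= now]
        for key in stale:
            del self._expiry[key]
        return len(stale)
```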
Reservation Service
The Reservation service is used to book tickets. A user can claim multiple seats in one request.
Hold claim
Each request is processed in a database transaction. If multiple users make a request for the same seat, only one will succeed and the others will fail. A successful transaction updates the Status, HoldExpiration, and UserId columns.
SeatId EventId Status HoldExpiration UserId
1 1 HOLD t1+5mins 1
2 1 HOLD t1+5mins 1
3 1 FREE
4 1 FREE
Following is query to perform write
UPDATE Reservations
SET UserId = xxxx, Status = 'HOLD', HoldExpiration = xxxx
WHERE SeatId = xxx AND Status = 'FREE'
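To make the above query idempotent, we have to combine it with a read-then-modify step in the same transaction. For example, if the conditional update matches no row, read the row and treat the request as successful when the seat is already held by the same user.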
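The conditional update and the idempotent retry can be sketched as follows. This is a minimal example using SQLite as a stand-in database; the function claim_seats and the all-or-nothing handling of a multi-seat request are assumptions for illustration.

```python
import sqlite3
from typing import Iterable

HOLD_SECONDS = 5 * 60  # hold window taken from the requirements


def claim_seats(
    conn: sqlite3.Connection, user_id: int, seat_ids: Iterable[int], now: float
) -> bool:
    """Hold every requested seat in one transaction, or none of them."""
    expires = now + HOLD_SECONDS
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for seat_id in seat_ids:
                cur = conn.execute(
                    """
                    UPDATE Reservations
                    SET UserId = ?, Status = 'HOLD', HoldExpiration = ?
                    WHERE SeatId = ? AND Status = 'FREE'
                    """,
                    (user_id, expires, seat_id),
                )
                if cur.rowcount == 1:
                    continue  # this request won the seat
                # Idempotency: a retry by the current holder counts as success.
                row = conn.execute(
                    "SELECT UserId, Status FROM Reservations WHERE SeatId = ?",
                    (seat_id,),
                ).fetchone()
                if row != (user_id, "HOLD"):
                    raise LookupError(f"seat {seat_id} is not available")
        return True
    except LookupError:
        return False
```

Because the WHERE clause checks Status = 'FREE', only one of several concurrent claims can match the row. A retry does not extend the hold in this sketch, and expiring stale holds is left to the expiration handler.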
Notify Seat status
On a successful transaction, our goal is to notify the other active clients interested in that event so that they can show the seat map with the updated status.
We have two approaches:
- As part of the Reservations transaction, we can leverage a distributed transaction tech stack to also publish a message into NotificationQueue as well as HoldExpirationQueue. The entry in HoldExpirationQueue is made with 5 minutes of delayed delivery so that the handler receives this message only after 5 minutes.
- We can use a changelog stream on the Reservations table and publish a message into NotificationQueue whenever the Status column changes. We also publish to HoldExpirationQueue, with 5 minutes of delayed delivery, but only if the status changed from FREE to HOLD.
For #1, we need a specialized database like Google Spanner. Otherwise we can go with #2, which allows a separate database stack for the Reservations table and a separate messaging queue.
We will go with #1, because the status change and both messages are then committed together.
NotificationQueue Handler
This handler takes care of fanning out seat status based on EventSessions. It makes an rpc call to the SessionServer, which takes care of sending the seat status to the recipient users.
HoldExpirationQueue Handler
After ~5 minutes, the message is delivered to the handler. Once the message is received, the handler does the following.
- Check the Reservations table to see if the seat is still on hold.
- If the seat is still on hold, mark it as FREE and unset UserId.
- If the seat is no longer on hold, ignore the message.
Note: We can also prompt the user to ask whether they want to extend the hold. If they decline, perform the cleanup.
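The delayed delivery and the handler steps above can be sketched with an in-memory queue. This is an illustration only: DelayedQueue stands in for a real message queue with delayed delivery, the message fields are hypothetical, and comparing user_id and hold_expiration is an extra guard (an assumption, not in the original) against a stale message releasing a newer hold.

```python
import heapq
from typing import Dict, List, Tuple


class DelayedQueue:
    """In-memory stand-in for a queue with delayed delivery."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, dict]] = []
        self._seq = 0  # tie-breaker so two dicts are never compared

    def publish(self, message: dict, deliver_at: float) -> None:
        heapq.heappush(self._heap, (deliver_at, self._seq, message))
        self._seq += 1

    def poll(self, now: float) -> List[dict]:
        """Return every message whose delivery time has been reached."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def handle_hold_expiration(reservations: Dict[int, dict], message: dict) -> bool:
    """Free the seat if the same hold is still in place; return True if freed."""
    row = reservations[message["seat_id"]]
    still_held = (
        row["status"] == "HOLD"
        and row["user_id"] == message["user_id"]
        and row["hold_expiration"] == message["hold_expiration"]
    )
    if not still_held:
        return False  # booked, freed, or re-held since: ignore the message
    row.update(status="FREE", user_id=None, hold_expiration=None)
    return True
```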
How to handle rejected claim
If a claim is rejected because another user has already successfully held the seat, the user receives the new seat status as a streaming event.
Once the user sees that the seats they picked are no longer free, they can either wait or try other seats.
There is an alternate way to handle booking where we do not allow the user to choose seats; instead the system decides the seats.
Alternate booking: Doesn’t allow user to select seats
We can simplify the claim system if we do not allow the user to select seats, and instead only allow them to request a number of seats with preferences and let the server allocate the seats. This approach is used in train booking systems, for example.
In this approach, the booking service simply enqueues the request into AsyncBookingQueue. No claim request is rejected at submission time. All requests are written concurrently into the queue. The system processes the requests one by one (or in batches) and allocates the next available seats based on the user's choices.
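A rough sketch of the queue consumer is shown below. It is deliberately simplified: requests are (user_id, seat_count) pairs, seat preferences are ignored, and the name allocate is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def allocate(
    requests: Deque[Tuple[int, int]], free_seats: List[int]
) -> Dict[int, List[int]]:
    """Process (user_id, seat_count) requests in arrival order."""
    remaining = deque(free_seats)
    allocation: Dict[int, List[int]] = {}
    while requests:
        user_id, count = requests.popleft()
        if count <= len(remaining):
            # Hand out the next available seats, all or nothing per request.
            allocation[user_id] = [remaining.popleft() for _ in range(count)]
        else:
            allocation[user_id] = []  # not enough seats left for this request
    return allocation
```

Because a single consumer drains the queue in order, two requests can never be given the same seat, and earlier requests are served first, which also covers the fairness requirement.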
Session Server
The Session Server is used by the client and the server to establish a bi-directional streaming connection.
The client uses the Registration rpc to find an available target. The server allocates an available session server and returns its details to the client. The client uses those details to establish a streaming connection to the target server.
It also exposes a DispatchEvent rpc to send events to connected clients.
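The two rpcs can be sketched with plain Python objects in place of real network calls. Everything here is an assumption for illustration: the class names, the least-loaded allocation policy, and modelling a client stream as a callable.

```python
from typing import Callable, Dict, List


class SessionServer:
    """One streaming server holding open connections to clients."""

    def __init__(self, name: str) -> None:
        self.name = name
        # user_id -> callable that pushes an event down that user's stream
        self.connections: Dict[int, Callable[[dict], None]] = {}

    def connect(self, user_id: int, send: Callable[[dict], None]) -> None:
        self.connections[user_id] = send

    def dispatch_event(self, user_id: int, event: dict) -> bool:
        """DispatchEvent rpc: deliver an event if the user is connected here."""
        send = self.connections.get(user_id)
        if send is None:
            return False
        send(event)
        return True


class Registry:
    """Answers the Registration rpc by picking a target server."""

    def __init__(self, servers: List[SessionServer]) -> None:
        self.servers = servers

    def register(self, user_id: int) -> SessionServer:
        # Keep an already connected user on the same server.
        for server in self.servers:
            if user_id in server.connections:
                return server
        # Otherwise pick the least-loaded server (assumed policy).
        return min(self.servers, key=lambda s: len(s.connections))
```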
Payments Service
Once seats are held by the user, the user uses the Payments Service to start the payment flow. This service does two things:
- Calls the external payment system to begin the payment.
- Publishes a message into PaymentTrackerQueue, with 2 minutes of delayed delivery, to track the payment.
If the third-party payment system provides a callback mechanism, we register our /payment/callback API. Once the third party invokes this API, we update the Reservation status based on success or failure.
If the third-party payment system only provides an API that must be polled, the client-side code calls it at an interval to check the status. At the end it calls the /payment/callback API to update the payment status.
The PaymentTrackerQueue handler looks at the Reservations table. If the payment is not yet completed, it asks the user whether they want more time. If they do, the client makes a /payment/extend rpc call, which simply enqueues a new message into PaymentTrackerQueue with a further two minutes of delay. If the payment was completed, the handler marks the reservation as completed.
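The callback and tracker logic can be sketched as three small functions. This is a simplified illustration: reservations are a plain dict, the tracker queue is a list of (deliver_at, seat_id) pairs, the function names and return values are hypothetical, and here the callback (not the tracker) is what moves the seat to BOOKED.

```python
from typing import Dict, List, Tuple

TRACK_DELAY_SECONDS = 2 * 60  # tracker delay taken from the text above


def payment_callback(
    reservations: Dict[int, dict], seat_id: int, succeeded: bool
) -> str:
    """/payment/callback: move a held seat to BOOKED or back to FREE."""
    row = reservations[seat_id]
    if row["status"] != "HOLD":
        return row["status"]  # duplicate or late callback: change nothing
    if succeeded:
        row["status"] = "BOOKED"
    else:
        row.update(status="FREE", user_id=None)
    return row["status"]


def track_payment(reservations: Dict[int, dict], seat_id: int) -> str:
    """PaymentTrackerQueue handler: runs when the delayed message arrives."""
    if reservations[seat_id]["status"] == "HOLD":
        return "ASK_USER_TO_EXTEND"  # payment still pending
    return "DONE"


def extend_payment(
    tracker_queue: List[Tuple[float, int]], seat_id: int, now: float
) -> None:
    """/payment/extend: enqueue another tracking message two minutes out."""
    tracker_queue.append((now + TRACK_DELAY_SECONDS, seat_id))
```

Ignoring callbacks for seats that are no longer on HOLD makes repeated callbacks from the payment provider harmless.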
Scale system to match the requirement
To support globally distributed users, we need to make sure that every request is served by co-located servers.
We can achieve this by partitioning servers and mapping users to partitions. We can come up with the following partitions.
1. Compute layer partitions: us-01, us-02, …, us-0n; eu-01, …, eu-0n; asia-01, …, asia-0n.
2. Database layer partitions: corresponding shards are created for the storage layer: us-01, us-02, …, us-0n.
3. A separate Homemap database is used to maintain each user and their partition, based on geo location, load, and other factors.
UserId AppPartition DbPartition
xx xxx xxxx
4. A smart router is used that is aware of the UserId, an entity present in every request. Note: This forces users to log in before they start interacting with the application. An alternative approach is to generate a unique session_id and use it as the user's ID for partitioning.
5. Once the router allocates the compute layer shard, all requests for that user go to the same app layer partition.
6. The database sharding router also uses the same Homemap database to find the partitions for the data.
This gives us an opportunity to scale with geo-location-based traffic. Based on API latency, we can add replicas in a partition to handle the desired traffic.
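The Homemap lookup used by the router can be sketched as below. This is an illustration under assumptions: the Homemap is an in-memory dict, new users are assigned by region only (load is ignored), and the names Home, Router, and route are hypothetical.

```python
from typing import Dict, NamedTuple


class Home(NamedTuple):
    app_partition: str
    db_partition: str


class Router:
    """Routes a request to the partition recorded in the Homemap."""

    def __init__(self, default_by_region: Dict[str, Home]) -> None:
        self.default_by_region = default_by_region
        self.homemap: Dict[str, Home] = {}  # UserId -> (AppPartition, DbPartition)

    def route(self, user_id: str, region: str) -> Home:
        home = self.homemap.get(user_id)
        if home is None:
            # First request: assign a home from the caller's region and keep it.
            home = self.default_by_region[region]
            self.homemap[user_id] = home
        return home
```

Once a user has a home, later requests go to the same partitions even if they arrive from another region, which matches step 5 above.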
Do we need to match any global qps?
Scaling based on global qps is a wasted exercise. Because of different timezones, users' online presence differs across geographies. Therefore, instead of trying to scale our system globally, we will scale within the geo boundaries discussed in the partitioning section above.
Hope you have enjoyed this blog :-)