Google Drive system design

Dilip Kumar

Jul 20, 2024

Design a Google Drive-scale application that allows:

Users to upload and download their files from any device.
Users should be able to share files or folders with other users.
It should also support automatic synchronization between devices, i.e., after a file is updated on one device, the change should be synchronized to all devices.
The system should support storing large files of up to X GB.

High-level system design

At a high level, we can use the following services to implement the Google Drive application.

We have the following problems to handle:

On an upload error, retrying the entire file upload is a very bad user experience, because repeating the whole upload multiplies the time it takes to upload the file.
According to some research, more than 60% of the content on the internet is duplicated. Therefore, the design should optimize storage for duplicate content.
When a new (or modified) file is uploaded, its content must be synchronized to the rest of that user's clients.

Optimize upload service

Instead of uploading one large file in a single request, we can split it into small chunks of 10 MB and upload each chunk separately. If a chunk upload fails, only that chunk needs to be retried, not the whole file.
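A minimal sketch of chunked upload with per-chunk retry. The send_chunk callable is a hypothetical stand-in for the network call to the Upload Service and is assumed to raise OSError on failure; the retry count is likewise an assumed value.

```python
from typing import BinaryIO, Callable, Iterator

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, the chunk size suggested above


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive chunks of at most `chunk_size` bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def upload_in_chunks(stream: BinaryIO,
                     send_chunk: Callable[[int, bytes], None],
                     chunk_size: int = CHUNK_SIZE,
                     max_attempts: int = 3) -> int:
    """Upload a stream chunk by chunk; a failure retries only the failing chunk.

    `send_chunk(index, data)` is hypothetical and assumed to raise OSError on
    a network error. Returns the number of chunks uploaded.
    """
    uploaded = 0
    for index, chunk in enumerate(iter_chunks(stream, chunk_size)):
        for attempt in range(1, max_attempts + 1):
            try:
                send_chunk(index, chunk)
                break
            except OSError:
                if attempt == max_attempts:
                    raise  # give up; chunks before this one are already uploaded
        uploaded += 1
    return uploaded
```

Because earlier chunks are never resent, a failure near the end of a large file costs one extra 10 MB transfer rather than a full re-upload.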

Client-side design

At a high level, the client can be designed as follows.

Watcher

It monitors the local workspace folders and notifies the Chunker of any action performed by the user, e.g. when the user creates, deletes, or updates files or folders.
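Production clients usually rely on OS change notifications (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows). To keep the idea portable, the minimal sketch below assumes a polling Watcher instead: it snapshots file modification times and diffs two snapshots into create, update and delete events for the Chunker. The function names are hypothetical, and for brevity it tracks files only, not empty folders.

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, float]  # relative path -> last modification time


def take_snapshot(root: str) -> Snapshot:
    """Record the modification time of every file under `root`."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[os.path.relpath(path, root)] = os.stat(path).st_mtime
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Turn two snapshots into (event, path) pairs to hand to the Chunker."""
    events = []
    for path in sorted(new.keys() - old.keys()):
        events.append(("CREATE", path))
    for path in sorted(old.keys() - new.keys()):
        events.append(("DELETE", path))
    for path in sorted(old.keys() & new.keys()):
        if old[path] != new[path]:
            events.append(("UPDATE", path))
    return events
```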

Chunker

The Chunker component splits large files into small chunks on the client. It also produces metadata that is used by the storage component.

It is also responsible for reconstructing a file from its chunks.
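A minimal sketch of reconstruction, assuming the file's metadata is an ordered list of SHA-256 chunk hashes and that downloaded chunks are available in a hash-to-bytes map (both shapes and the function name are hypothetical). Each chunk is re-hashed before it is written, and the file is assembled under a temporary name first so a half-built file never replaces a good one.

```python
import hashlib
import os
from typing import Dict, List


def reconstruct_file(chunk_hashes: List[str], chunk_store: Dict[str, bytes],
                     out_path: str) -> None:
    """Rebuild a file by writing its chunks back in their recorded order."""
    tmp_path = out_path + ".part"
    with open(tmp_path, "wb") as out:
        for expected in chunk_hashes:
            data = chunk_store[expected]
            # The content hash doubles as an integrity check.
            if hashlib.sha256(data).hexdigest() != expected:
                raise ValueError(f"chunk {expected[:8]} is corrupted")
            out.write(data)
    # Swap the finished file into place in a single rename.
    os.replace(tmp_path, out_path)
```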

It also detects which parts of a file the user has modified and uploads only those parts to cloud storage; this saves bandwidth and synchronization time.

It also writes events into the Upload Chunks Queue to notify the Uploader.
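Detecting modified parts can be sketched by comparing per-chunk hashes of the new content with the hashes recorded at the last sync, then writing one event per changed chunk into the Upload Chunks Queue. This assumes fixed-size chunks, 0-based chunk orders, and a message layout (file_id, chunk_order, hash, data) invented for illustration; Python's queue.Queue stands in for the real queue.

```python
import hashlib
import queue
from typing import List


def chunk_hashes(data: bytes, chunk_size: int) -> List[str]:
    """SHA-256 hex digest of each fixed-size chunk of `data`."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def enqueue_changed_chunks(file_id: int, old_hashes: List[str], new_data: bytes,
                           chunk_size: int, upload_queue: queue.Queue) -> List[int]:
    """Enqueue only the chunks whose hash differs from the last synced version.

    Returns the (0-based) orders of the changed chunks.
    """
    new_hashes = chunk_hashes(new_data, chunk_size)
    changed = [order for order, h in enumerate(new_hashes)
               if order >= len(old_hashes) or old_hashes[order] != h]
    for order in changed:
        start = order * chunk_size
        upload_queue.put({
            "file_id": file_id,
            "chunk_order": order,
            "hash": new_hashes[order],
            "data": new_data[start:start + chunk_size],
        })
    return changed
```

An in-place edit inside one 10 MB region therefore costs one chunk upload instead of a whole-file upload. A real client would also record the new chunk count so that trailing chunks can be dropped when a file shrinks.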

Chunking algorithm

The following are different chunking algorithms that the Chunker can use to split files.

Fixed-size chunking

We choose a fixed chunk size and simply split the file into pieces of that size. We then compute a hash for each chunk, which is later used to compare two chunks and decide whether one is a duplicate of the other.

Fixed-size chunking works fine if files are not edited. If a file is edited near its beginning, even by inserting a single character, every later chunk boundary shifts by one byte. Two otherwise identical files then produce different chunks, which means we have to treat them as separate files.

Despite this limitation, the approach is still used in practice because it is simple and needs little CPU.
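The sketch below illustrates fixed-size chunking with SHA-256 chunk hashes, and the boundary-shift problem described above. The helper names and the tiny chunk size are assumptions chosen for illustration.

```python
import hashlib
from typing import List


def fixed_size_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split `data` into consecutive `chunk_size`-byte pieces (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int) -> List[str]:
    """Hash each chunk; equal hashes mean the chunk is a duplicate."""
    return [hashlib.sha256(c).hexdigest() for c in fixed_size_chunks(data, chunk_size)]


def shared_chunks(a: bytes, b: bytes, chunk_size: int) -> int:
    """Count how many chunks of `b` already exist among the chunks of `a`."""
    known = set(chunk_hashes(a, chunk_size))
    return sum(1 for h in chunk_hashes(b, chunk_size) if h in known)
```

For example, with 64 bytes of data and 16-byte chunks, appending one byte leaves all four original chunks reusable, while prepending one byte shifts every boundary so that none of them match.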

Content-defined chunking

In this approach, instead of a fixed chunk size, we choose a separator and use it to split the file.

If the second copy of a file is edited near its beginning, the chunk boundaries after the edit stay where they were, so typically only the first chunk's hash changes and all other chunk hashes are preserved.

This approach is CPU-intensive, and choosing the separator brings additional complexity.
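The article describes content-defined chunking in terms of a separator. One common way to find such content-based separators, swapped in here for illustration, is a rolling hash: a cut is made wherever the hash of the last few bytes matches a bit pattern. The sketch uses a Gear-style rolling hash (the kind used by FastCDC); MIN_CHUNK, MAX_CHUNK, MASK and the table seed are assumed values, far smaller than a real client would use, so the effect is easy to observe.

```python
import random
from typing import List

# Assumed parameters for illustration only.
MIN_CHUNK = 32
MAX_CHUNK = 256
MASK = 0xFC000000  # cut when the top 6 bits are zero: ~1/64 chance per byte

# One pseudo-random 32-bit value per byte value. A fixed seed keeps boundaries
# identical on every client (in practice this would be a hard-coded table).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes) -> List[bytes]:
    """Split `data` wherever the rolling hash of recent bytes matches MASK.

    Each step shifts older bytes left until they fall off the 32-bit hash, so
    the cut decision depends only on the last 32 bytes of content, not on the
    absolute offset. That keeps boundaries stable after an earlier insertion.
    """
    chunks: List[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Prepending a byte to a large random input and re-chunking it leaves nearly all chunks identical, in contrast to the fixed-size case; the cost is a hash update and comparison for every byte.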

Chunks Database

The client needs to maintain its own storage for chunk metadata and upload status. The following is the schema of the Chunks table.

ChunkId  FileId  ChunkOrder  ChunkSize  UploadStatus  ModifiedTimestamp
1        1       1           10 MB      UPLOADED      xxx
2        1       2           10 MB      INPROGRESS    xxx
3        1       3           10 MB      NOT_UPLOADED  xxx
4        1       4           10 MB      NOT_UPLOADED  xxx
5        1       5           10 MB      DOWNLOADED    xxx
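A minimal sketch of the Chunks table as a local SQLite database, using the column names above. The status values come from the table; the UNIQUE constraint, the helper functions, and the rule that INPROGRESS chunks are re-uploaded after a restart are assumptions.

```python
import sqlite3
import time
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS Chunks (
    ChunkId           INTEGER PRIMARY KEY,
    FileId            INTEGER NOT NULL,
    ChunkOrder        INTEGER NOT NULL,
    ChunkSize         INTEGER NOT NULL,  -- bytes
    UploadStatus      TEXT    NOT NULL CHECK (UploadStatus IN
                          ('NOT_UPLOADED', 'INPROGRESS', 'UPLOADED', 'DOWNLOADED')),
    ModifiedTimestamp REAL    NOT NULL,
    UNIQUE (FileId, ChunkOrder)
);
"""


def open_chunk_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_chunk(conn: sqlite3.Connection, file_id: int, chunk_order: int,
              chunk_size: int, status: str = "NOT_UPLOADED") -> None:
    conn.execute(
        "INSERT INTO Chunks (FileId, ChunkOrder, ChunkSize, UploadStatus, ModifiedTimestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (file_id, chunk_order, chunk_size, status, time.time()),
    )
    conn.commit()


def set_status(conn: sqlite3.Connection, file_id: int, chunk_order: int, status: str) -> None:
    conn.execute(
        "UPDATE Chunks SET UploadStatus = ?, ModifiedTimestamp = ? "
        "WHERE FileId = ? AND ChunkOrder = ?",
        (status, time.time(), file_id, chunk_order),
    )
    conn.commit()


def pending_chunks(conn: sqlite3.Connection, file_id: int) -> List[int]:
    """Chunks that still need uploading, e.g. when resuming after a crash."""
    rows = conn.execute(
        "SELECT ChunkOrder FROM Chunks WHERE FileId = ? "
        "AND UploadStatus IN ('NOT_UPLOADED', 'INPROGRESS') ORDER BY ChunkOrder",
        (file_id,),
    )
    return [row[0] for row in rows]
```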

Uploader

It processes the pending messages in the Upload Chunks Queue and calls the server's Upload Service to upload the chunk content along with its metadata.

Once a chunk is uploaded successfully, it updates the Chunks table to record the chunk's upload status.
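The Uploader loop can be sketched as follows, assuming a message layout with file_id and chunk_order fields and a hypothetical upload_chunk callable standing in for the server's Upload Service (assumed to raise OSError on failure). A plain dict keyed by (FileId, ChunkOrder) stands in for the Chunks table so the example stays self-contained.

```python
import queue
from typing import Callable, Dict, List, Tuple

ChunkKey = Tuple[int, int]  # (FileId, ChunkOrder)


def run_uploader(upload_queue: queue.Queue,
                 upload_chunk: Callable[[dict], None],
                 chunk_status: Dict[ChunkKey, str]) -> List[ChunkKey]:
    """Drain the Upload Chunks Queue, uploading each chunk and recording its status.

    Returns the chunks that failed so they can be retried on a later pass.
    """
    failed: List[ChunkKey] = []
    while True:
        try:
            msg = upload_queue.get_nowait()
        except queue.Empty:
            break
        key = (msg["file_id"], msg["chunk_order"])
        chunk_status[key] = "INPROGRESS"
        try:
            upload_chunk(msg)
            chunk_status[key] = "UPLOADED"
        except OSError:
            chunk_status[key] = "NOT_UPLOADED"  # eligible for the next retry pass
            failed.append(key)
        finally:
            upload_queue.task_done()
    return failed
```

A long-running client would instead block on upload_queue.get() in a background thread rather than draining the queue once.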

Synchronizer

The server notifies this component when chunks are uploaded by another client. The Synchronizer simply enqueues a message into the Download Chunks Queue for processing.

Downloader

The Downloader processes messages from the Download Chunks Queue, calls the server's Download Service to download each chunk, and updates the local file system.
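A minimal sketch of the Synchronizer and Downloader pair, assuming fixed-size chunks with 0-based chunk orders, an event layout (path, chunk_order, chunk_hash) invented for illustration, and a hypothetical download_chunk callable standing in for the server's Download Service. Each chunk is written at its own offset, so chunks may arrive in any order.

```python
import os
import queue
from typing import Callable


def on_server_notification(event: dict, download_queue: queue.Queue) -> None:
    """Synchronizer: the server reports a chunk uploaded by another client."""
    download_queue.put(event)


def run_downloader(download_queue: queue.Queue,
                   download_chunk: Callable[[str], bytes],
                   root: str, chunk_size: int) -> int:
    """Downloader: fetch each chunk and write it at its offset in the local file.

    Returns the number of chunks applied.
    """
    applied = 0
    while True:
        try:
            event = download_queue.get_nowait()
        except queue.Empty:
            return applied
        data = download_chunk(event["chunk_hash"])
        path = os.path.join(root, event["path"])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Open for in-place update if the file exists, otherwise create it.
        mode = "r+b" if os.path.exists(path) else "wb"
        with open(path, mode) as f:
            f.seek(event["chunk_order"] * chunk_size)
            f.write(data)
        applied += 1
        download_queue.task_done()
```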

Server-side design

The following is the high-level server-side design.

Upload Service

The Upload Service uploads file chunks to object storage and updates the Metadata table. It also publishes a message to the Chunk Uploaded Queue for synchronization.
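The Upload Service can be sketched with in-memory stand-ins for object storage, the Metadata table and the Chunk Uploaded Queue. Storing each chunk under its SHA-256 hash (content-addressed storage, an assumption not stated in the design) is one way to address the duplicate-content problem: identical chunks from any file are stored once, and only a metadata entry is added. Class, method and field names are hypothetical.

```python
import hashlib
import queue
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class UploadService:
    object_store: Dict[str, bytes] = field(default_factory=dict)  # hash -> bytes
    metadata: Dict[Tuple[str, int], str] = field(default_factory=dict)  # (file, order) -> hash
    chunk_uploaded_queue: queue.Queue = field(default_factory=queue.Queue)

    def upload_chunk(self, user_id: str, client_id: str, file_id: str,
                     chunk_order: int, data: bytes) -> bool:
        """Store a chunk, update metadata, and publish a sync event.

        Returns True if the bytes were new, False if they were deduplicated.
        """
        chunk_hash = hashlib.sha256(data).hexdigest()
        is_new = chunk_hash not in self.object_store
        if is_new:
            self.object_store[chunk_hash] = data
        self.metadata[(file_id, chunk_order)] = chunk_hash
        # Publish even for duplicates: other clients still need the new metadata.
        self.chunk_uploaded_queue.put({
            "user_id": user_id,
            "client_id": client_id,
            "file_id": file_id,
            "chunk_order": chunk_order,
            "chunk_hash": chunk_hash,
        })
        return is_new
```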

Sync Service

The Sync Service processes messages from the Chunk Uploaded Queue. It looks up the UserClients table to find the list of connected clients for the given user, then fans out by calling the Session server for each client to notify it.
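The Sync Service fanout can be sketched as follows. UserClients is modeled as a user-to-clients map and the current client-to-session-server assignment as a second map; dispatch_event is a hypothetical stand-in for the Session server's DispatchEvent RPC. Skipping the uploading client and skipping clients with no session are assumptions, the latter on the idea that an offline client catches up when it reconnects.

```python
from typing import Callable, Dict, List


def fan_out(event: dict,
            user_clients: Dict[str, List[str]],
            client_session: Dict[str, str],
            dispatch_event: Callable[[str, str, dict], None]) -> List[str]:
    """Notify every other connected client of the user; return who was notified."""
    notified = []
    for client_id in user_clients.get(event["user_id"], []):
        if client_id == event["client_id"]:
            continue  # the uploading client already has the change (assumption)
        server = client_session.get(client_id)
        if server is None:
            continue  # no open session: client syncs on reconnect (assumption)
        dispatch_event(server, client_id, event)
        notified.append(client_id)
    return notified
```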

Session Server

Every client first makes a Registration RPC call to the Session server to find its target session server. If no session server has been assigned yet, this RPC allocates a new one. The client then establishes a bidirectional connection to that session server.

The Session server also exposes a DispatchEvent RPC, which the Sync Service invokes to perform the fanout.
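A minimal sketch of the Registration and DispatchEvent behavior, assuming a least-loaded allocation policy (the design does not specify one). The bidirectional connection is simulated with a per-client outbox list; the class and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class SessionRegistry:
    def __init__(self, servers: List[str]):
        self.servers = servers
        self.client_server: Dict[str, str] = {}
        self.load: Dict[str, int] = {s: 0 for s in servers}
        self.outbox: Dict[str, List[dict]] = defaultdict(list)

    def register(self, client_id: str) -> str:
        """Registration: return the client's session server, allocating one if needed."""
        if client_id not in self.client_server:
            server = min(self.servers, key=lambda s: self.load[s])  # least loaded
            self.client_server[client_id] = server
            self.load[server] += 1
        return self.client_server[client_id]

    def dispatch_event(self, client_id: str, event: dict) -> bool:
        """DispatchEvent: push an event down the client's open connection."""
        if client_id not in self.client_server:
            return False  # client never registered, nothing to push to
        self.outbox[client_id].append(event)
        return True
```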

To learn more about the Session server concept, please see the following design.

Chat system design

Design a Chat system to support one to one chat and chat in groups. Also support the online status to other users.

dilipkumar.medium.com

File Sharing service

Please see the following design to understand how privacy is implemented as a shared service.

Design the Facebook post privacy functionality

Once user makes a post on Facebook, they can decide the restriction of post that can be made available to other users…