Image upload service at the scale of Facebook and Instagram

Dilip Kumar
6 min read · Aug 30, 2024


Design a system to store images for Facebook and Instagram as a common infrastructure that can handle 1,000 uploads per second and deduplicate images.

Following is the scale we should consider while designing the system:

  1. 3+ billion monthly active users on Facebook
  2. 2+ billion monthly active users on Instagram
  3. That means 5+ billion monthly active users combined
  4. Each user posts 1 image per week
  5. Files are small, in the range of ~10 MB
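
A quick back-of-envelope check on storage: 5+ billion users × 1 image per week × ~10 MB per image is roughly 50 PB of raw uploads per week, which is why deduplication and tiered storage (covered later) matter so much.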

High level system design

At a high level, we can come up with the following system design.

Upload Service

Users interact with the upload service to upload their files. Since these are small files in the range of ~10 MB, the client doesn't need to implement chunked uploads to make the transfer efficient.

Following are the API details of this service.

POST /upload
=======
multi-part file-upload
XXXXXXX
XXXXXXX

This service uploads the file content to raw blob storage and also writes metadata about the image into the Metadata table. It also publishes an event to a queue for further image processing. Following is the schema for the Metadata table.

ImageId | FilePath | StoragePath | ReadyToUse
xxx     | xxx      | xxxx        | Boolean

Primary Key: ImageId

ShardKey: ImageId

Note: We are not going to choose a chunked design, as file sizes are small.
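
To make the flow concrete, here is a minimal sketch of the upload handler in Python. The blob_store, metadata_db, and queue clients are hypothetical interfaces standing in for whatever blob storage, database, and queue are actually chosen:

import uuid

def handle_upload(file_bytes, file_name):
    # Generate a unique id for the image.
    image_id = str(uuid.uuid4())
    # Write the raw bytes to raw blob storage.
    storage_path = f"raw/{image_id}/{file_name}"
    blob_store.put(storage_path, file_bytes)
    # Record metadata; the image is not servable until processing finishes.
    metadata_db.insert(
        table="Metadata",
        row={"ImageId": image_id, "FilePath": file_name,
             "StoragePath": storage_path, "ReadyToUse": False},
    )
    # Publish an event so downstream handlers (preprocessing,
    # dedup check, virus scan) can pick the image up.
    queue.publish("image-uploaded", {"image_id": image_id,
                                     "storage_path": storage_path})
    return image_id

Note that ReadyToUse starts as false; the image only becomes servable once the processing handlers described below finish.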

Image Preprocessing

There are two main reasons we need to pre-process images:

  1. Device type: LCD, OLED, Retina, etc. are different display types. Each needs images at a different resolution to render well on screen.
  2. Network speed: 5G, 4G, Wi-Fi, etc. determine a device's network speed. On slow networks, it is better to serve smaller images than on fast networks.

Based on these needs, the raw image must be processed to generate several variants. We can apply the following kinds of processing:

  1. Compression: We can produce JPEG (for photographs) or PNG (for images with transparent backgrounds) to reduce file size.
  2. Resolutions: The same image might need to be rendered at different quality levels. For example: 640x480 pixels, 1280x720 (HD), 1920x1080 (Full HD), 2560x1440 (Quad HD), 3840x2160 (4K), etc.

Progressive image rendering is an approach that prioritizes the rendering of lower-resolution or placeholder images before gradually replacing them with higher-resolution versions. This technique aims to improve the perceived loading time for users, providing a more engaging and seamless visual experience.

Keeping this in mind, we can add multiple handlers to produce the image variants we target.

These handlers take the raw image, upload the processed output into storage, update the Metadata table with the corresponding path, and mark the image ready to use. A minimal sketch of one such handler follows.
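
For example, a resize-and-compress handler might look like this sketch using Pillow; the target widths are illustrative, and blob_store/metadata_db are the same hypothetical clients as before:

from io import BytesIO
from PIL import Image

# Illustrative target widths; heights are derived to keep aspect ratio.
TARGET_WIDTHS = [640, 1280, 1920, 2560, 3840]

def process_image(image_id, raw_bytes):
    original = Image.open(BytesIO(raw_bytes))
    variant_paths = []
    for width in TARGET_WIDTHS:
        if width >= original.width:
            continue  # never upscale
        height = round(original.height * width / original.width)
        buf = BytesIO()
        # Progressive JPEG lets clients render a coarse preview first
        # and refine it as more bytes arrive.
        original.resize((width, height)).convert("RGB").save(
            buf, "JPEG", quality=85, progressive=True)
        path = f"processed/{image_id}/{width}w.jpg"
        blob_store.put(path, buf.getvalue())
        variant_paths.append(path)
    # A real schema would track one path per variant; this sketch
    # simply marks the image servable once all variants exist.
    metadata_db.update("Metadata", image_id,
                       {"StoragePath": variant_paths, "ReadyToUse": True})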

Duplicate check Handler

This service computes a hash of the uploaded image and checks the Duplicates table for an existing file with the same hash. If one exists, the newly uploaded file is removed from raw blob storage and the Metadata table is updated to point at the existing file's path.

ImageHash | ImageId | CreateTimestamp
xxxx      | yyyy    | xxxx

PrimaryKey = ImageHash

ShardKey = ImageHash (lookups are by hash, so sharding by hash keeps them on a single shard)

The Metadata table is modified to add an IsDuplicate column. In the example below, image 2 is a duplicate of image 1, so both rows point at the same file path:

ImageId | FilePath | StoragePath | ReadyToUse | IsDuplicate
1       | 1        | 1           | true       | false
2       | 1        | 1           | true       | true
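
A minimal sketch of the duplicate-check handler, assuming a content hash such as SHA-256 and the same hypothetical storage/database clients:

import hashlib

def check_duplicate(image_id, raw_bytes, storage_path):
    image_hash = hashlib.sha256(raw_bytes).hexdigest()
    existing = duplicates_db.get("Duplicates", key=image_hash)
    if existing is None:
        # First time we see this content; record the hash.
        duplicates_db.insert("Duplicates",
                             {"ImageHash": image_hash, "ImageId": image_id})
        return
    # Duplicate: drop the redundant copy and point metadata at the original.
    blob_store.delete(storage_path)
    original = metadata_db.get("Metadata", existing["ImageId"])
    metadata_db.update("Metadata", image_id,
                       {"FilePath": original["FilePath"],
                        "StoragePath": original["StoragePath"],
                        "IsDuplicate": True})

In practice the insert needs to be conditional (insert-if-absent) so that two concurrent uploads of the same content don't both register as originals.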

Virus Scan Handler

Uploaded files in raw blob storage are kept in an isolated environment. This service performs a virus scan to detect any virus in the uploaded image.

If a virus is found, the file is immediately removed from storage and marked as not usable.

ImageId | FilePath | StoragePath | ReadyToUse | IsDuplicate | VirusInfected
xxx     | xxx      | xxxx        | Boolean    | Boolean     | Boolean

Privacy Scan Handler

Many users upload a file while falsely claiming to be its author, or post sensitive content.

This service scans the file for privacy violations. If the image violates privacy, its status is updated to not usable.

Storage infrastructure

Reportedly, image usage drops off steeply after a week, so it doesn't make sense to use the same infrastructure to maintain images for their entire lifecycle.

Based on the cost of managing infrastructure, we can come up with the following tiers:

  1. Hot Storage Infrastructure: High read/write performance with low latency and high availability. Solid-State Drives (SSDs), All-Flash Arrays (AFAs), or cloud object storage (e.g., Amazon S3, Google Cloud Storage) are used to build this tier, so it is very expensive.
  2. Warm Storage Infrastructure: Lower read/write performance with higher latency and lower availability. Hard Disk Drives (HDDs) or infrequent-access cloud storage classes (e.g., Amazon S3 Standard-IA, Google Cloud Storage Nearline) are used.
  3. Cold Storage Infrastructure: Lowest read/write performance with the highest latency. Generally tape storage or cheap archive cloud storage (e.g., Amazon S3 Glacier, Google Cloud Storage Archive) is used.

Keeping this in mind, we need a handler that looks at an image's usage pattern as well as the time since it was first made available. Based on that, it decides when to move an image from hot storage to warm (and eventually cold) storage; a sketch of this tiering policy follows.
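
Here is a minimal sketch of such a tiering decision, assuming we track first-available and last-access timestamps per image; the thresholds (one week for warm, one year for cold) come from the usage pattern described above:

from datetime import datetime, timedelta

WARM_AFTER = timedelta(weeks=1)   # usage drops off after a week
COLD_AFTER = timedelta(days=365)  # very low usage after a year

def pick_tier(first_available_at, last_accessed_at, now=None):
    now = now or datetime.utcnow()
    age = now - first_available_at
    idle = now - last_accessed_at
    if age > COLD_AFTER:
        return "cold"
    if age > WARM_AFTER and idle > WARM_AFTER:
        return "warm"
    return "hot"

def move_if_needed(image_id, record):
    target = pick_tier(record.first_available_at, record.last_accessed_at)
    if target != record.current_tier:
        storage_mover.move(image_id, record.current_tier, target)
        # Keep the router's view consistent (the ImageHomemap table
        # is described below).
        imagehomemap_db.update("ImageHomemap", image_id, {"Tier": target})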

Hot storage writer

Image processing uses hot storage to store processed, ready-to-use images. It also publishes a message into the post-image-processing queue so the system can detect the right time to move the image to another storage tier, and it updates the mapping in the ImageHomemap table.

Warm storage mover handler

The purpose of this handler is to move the least-used images to warm storage to keep hot storage scalable. It also updates the mapping in the ImageHomemap table.

Cold Storage Handler

Since image usage is very low after a year, images are moved to cold storage to keep them on very cheap infrastructure. It also updates the mapping in the ImageHomemap table.

Storage Router

Requests for images go through the storage router, which first reads the ImageHomemap table to find out whether the image is stored on hot, warm, or cold storage, and then redirects the request to the corresponding infrastructure. A sketch follows.
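
A minimal sketch of the router lookup, with hypothetical per-tier storage backends:

# Hypothetical per-tier storage backends.
TIER_BACKENDS = {
    "hot": hot_storage_client,
    "warm": warm_storage_client,
    "cold": cold_storage_client,
}

def fetch_image(image_id):
    record = imagehomemap_db.get("ImageHomemap", image_id)
    backend = TIER_BACKENDS[record["Tier"]]
    # Cold reads may be slow (minutes for tape/archive classes), so a
    # real router would respond asynchronously for the cold tier.
    return backend.get(record["StoragePath"])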

Optimize file size for hot storage

Single file storage

Since Facebook/Instagram image files are small, storing each one as a separate physical file increases I/O operations under the very high read traffic on hot storage. Following are the reasons for the I/O problem:

  1. Seek operations: To retrieve a specific file, the storage system must locate its exact position on disk. This involves seeking to the file's starting address, which can be time-consuming, especially in large storage systems.
  2. Storage fragmentation: Over time, as files are deleted and new ones are created, storage becomes fragmented, meaning that free space is scattered across the disk. This can increase file access time, as the system may need to read multiple blocks to retrieve a single file.

Merge small files into larger files

We can use a file consolidation technique to merge small files into larger files. This reduces the number of individual files to manage, which improves storage efficiency and reduces I/O overhead.

Due to the very high read traffic, the storage router processes requests in batches and leverages that to read the disk once per batch.

The mapping from each image to its merged file and block offset is stored in a separate database table for fast lookup before the disk read; a sketch follows the note below.

Note: Facebook's Haystack system uses this approach for storage optimization.
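
A minimal, Haystack-inspired sketch of the idea: append each small image to a large volume file, remember (volume, offset, size) in an index table, and serve reads with a single seek. The index_db client and open_volume helper are hypothetical stand-ins:

import os

class Volume:
    """One large append-only file holding many small images."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab+")

    def append(self, image_id, data):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(data)
        self.f.flush()
        # Persist the image's location so a read needs one seek.
        index_db.insert("BlockIndex", {"ImageId": image_id,
                                       "Volume": self.path,
                                       "Offset": offset,
                                       "Size": len(data)})

    def read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)

def fetch(image_id):
    entry = index_db.get("BlockIndex", image_id)
    volume = open_volume(entry["Volume"])  # hypothetical volume cache
    return volume.read(entry["Offset"], entry["Size"])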

CDN Service

Once images are available to use, they need to be served to users across geographic locations, so going to the origin server (the storage router) for every request is not efficient.

Instead, we can leverage a CDN service, a distributed network of servers that delivers content to users based on their geographic location.

CDNs improve performance, reduce latency, and enhance user experience by caching content closer to the end user.

Reference

Read Facebook's original paper on f4 ("f4: Facebook's Warm BLOB Storage System") for more details.

Happy learning :-)
