URL Shortener System Design

Dilip Kumar
8 min readJun 30, 2024

--

Given a URL, design a system to generate a shortened URL. When users access a short link, our service should redirect them to the original link.

Services to implement URL shortener

We need to two microservices

  1. Shortener service: Create a short URL for given long URL.
  2. Lookup service: Return the long URL for given short URL with 302 http code to support self redirection.

A high traffic system will have 500M create request per month. 100:1 will be read and write ratio.

Main critical component of this design is to implement shortener service with low latency.

URL Shortener service

Approach # 1: Hash based base62 encoded shortened URL

A URL can have any length. Therefore to generate the fixed-length output, first we will hash the URL using sha256 which will generate 256 bit i.e. 64 hex string.

sha256(https://this-is-a-long-url.com/desing/url/shortner) = 
2553a57edb1b87b60ac40d12ba491560fdf1cc886b6f9171b2e42b2caf4eede5

To shorten the generated hash, we can use base62 encoding (a-z, A-Z, 0–9, exclude + and / ).

base62(2553a57edb1b87b60ac40d12ba491560fdf1cc886b6f9171b2e42b2caf4eede5) =
8qmTY1gUvtjJA6mmUKYvRswXGxwdujJ5JW12xxiPXfR

The input of 64 hex string, it generated 43 chars long encoded string. To shorten it further, we can get the random 7characters from the encoded string.

random_7chars(8qmTY1gUvtjJA6mmUKYvRswXGxwdujJ5JW12xxiPXfR) = UKmTY1g

Why 7chars long?
6 letter long key = 62⁶ = ~ 56.8 billion
7 letter long key = 62⁷ = ~ 3,5 trillion
8 letter long key = 62⁸ = ~ 218 trillion

Based on our application needs, we can choose the length of the shortened URL. 7chars length will be enough for most of the applications therefore we will go with 7chars.

How to handle collision?

Extracting random 7chars will lead to a collision. To check the used keys we will have to store them in the database.

Shortend_Url     Url                      Primary_Key = <Shorend_Url>
8qmTY1g https://xxx/yyy/zzza
1qmTY1g https://xxx/yyy/zzzb
2qmTY1g https://xxx/yyy/zzzc
3qmTY1g https://xxx/yyy/zzzd
4qmTY1g https://xxx/yyy/zzze

Now every time the algorithm generates a shortened URL, it will look into the database to check if it is already used or not. If not used then we can use the generated shortened URL. If it is already used then we will create a new one.

How to make sure two users using the same URL should generate different shortened url?

Since we choose 7chars randomly therefore chances are low that same URL will result into same output.

To optimize it further, we can add randomization to the input. This is also called salt. Following would be the steps.

  1. Create a random string of fixed length (let’s say 10 chars).
  2. Concatenate the random string to the original URL. Use this as an input.
salt = random_chars(size=10) = 0niQ7jXL012

sha256(https://this-is-a-long-url.com/desing/url/shortner+0niQ7jXL012) =
ce634cc672c1d50f7d5708d19dd1c418cadc7680a02b846988b98ac39fcd0cd0

base62(ce634cc672c1d50f7d5708d19dd1c418cadc7680a02b846988b98ac39fcd0cd0) =
mwKusUKWKD9SUL8dHyueEqBAD3N3lzs3c8JjNyPFM6i

random_7chars(mwKusUKWKD9SUL8dHyueEqBAD3N3lzs3c8JjNyPFM6i) = 3DKusUK

3. For the above example, it generates a new shortened URL as 3DKusUK .

4. We will check the database, if already in use then simply retry.

Overall design to create the shortened url

Problem with this design

  1. URL shorter makes a network call to database which will cause additional latency.
  2. Duplicate check will lead to O(n) database call which will make it worst.

Overall this design has scalability and latency issue therefore need to explore alternate design.

Approach # 2: Block based UUID Sequence generation logic to create shortened URL offline

As discussed in UUID Generation design, we can utilize the block based ID generation approach to generate the shortened URL offline.

Base 10 number system counter

In base10 number system, we can keep counter as a local variable and we simply increment it counter++ to get the new ID.

Use base62 number system

Similarly, we can keep counter as a base62 number system and simply increment it counter++ to get the new ID.

Base62 number system consist of digits {0-9, a-z, A-Z} . A shortened URL of 7 size length can begin with 1000000 and ends to ZZZZZZZ . In between we have almost ~3.5 trillian numbers. Following are few examples.

Decimal          Base62
182619516112 3DKusUK
182619516113 3DKusUL
182619516114 3DKusUM
182619516115 3DKusUN
182619516116 3DKusUO
182619516117 3DKusUP
2079276347712 abcdefg
2020598544545 ZZZZZZZ

As we see, we can simply increment the counter and get the new 7 chars long base62 encoded characters.

Based on this observation, we can come up with blocks range in base62 number system to implement sequence generation which is nothing but a key for shorten URL.

Block_Start        Block_end (range 1 million)
abcdefg aciJYLw
aciJYLx adnzS2D
adnzS2E aetfLiU
aetfLiV afzLFOl
....... .......
....... .......

You can use https://jalu.ch/coding/base_converter.php to run the conversion for quick testing.

Once App server receives the block then it will simply run the counter in base62.

To handle the fault tolerance, we can have two databases, one for odd blocks and second for even blocks.

Bit Reversal for primary key

A monotonic increasing primary key creates hotspot on database write during high traffic. To handle it, we can simply reverse the bits. Following will be modified steps.

  1. Generate monotonic increasing shortened URL, let’s call it X (preferably odd sequence)
  2. Reverse the bit and get Y (one option is to reverse the order of the bits in the 2’s complement binary representation of X).
  3. Use Y as a primary key.

Overall system design

  1. On restarting Shortener server, it will query database system to get the next available sequence block.
  2. Every request to generate the shorten the URL, now it will simply increment the counter and use to generate the shortened URL.
  3. For persistent storage, it will also write into Shortened database.
  4. Time complexity to generate the URL shortener is O(1) .
  5. Write shortened URL into database is still same i.e. O(1) .
  6. If counter exhausted then it will again ask the database for next sequence.

Does this design solves all the bottlenecks in the previous design? Let’s go through each.

Two users using same URL

Since every request is using new counter value which gives new shortened URL therefore two users with same URL will result into different shortened value.

Maintain user specific shortened URL

We can capture the signed in user’s ID to store it in the database as below.

Shortend_Url  UserId    Url      Primary_Key = <Shorend_Url,UserId>
8qmTY1g 123 https://xxx/yyy/zzza
1qmTY1g 124 https://xxx/yyy/zzzb
2qmTY1g 125 https://xxx/yyy/zzzc
3qmTY1g 126 https://xxx/yyy/zzzd
4qmTY1g 127 https://xxx/yyy/zzze

Waste for Key blocks

If there is a bad server which keeps crashing then it will lead to waste for key blocks range.

This can be optimize to make sure service ask for new block from database only after all health check is passed.

Still there is chances that we will waste key blocks for example routine rollout etc.

Since 7 chars long shortened URL can generate 3000 blocks with 1million size. We can take following approaches to improve it further.

  1. Reduce the size to 10,000. That will give us 300,000 blocks.
  2. Increase the shortened URL from 7 chars to 8 chars. It will increase from 3.5 trillion to 218 trillion. It would be enough for any application if we also reduce the block size as per #1.

Prevent malicious user using api_dev_key

A malicious user can put us out of business by consuming all URL keys in the current design.

To prevent abuse, we can limit users via their api_dev_key. Each api_dev_key can be limited to a certain number of URL creations and redirection per some time period (which may be set to a different duration per developer key).

How to handle if user uses shortened url to short it?

We can apply pattern detection and reject the request if shortened URL is used to short it again. This needed to avoid the circular redirection.

URL Lookup Service

Lookup service will read original URL form the database for the given shortened URL. Since shortened URL is a primary key therefore read will be based on key based and no extra index is required.

Read vs write traffic is 100:1 as well as since shortened URL is immutable therefore we can simply add an in-memory cache to speedup the lookup.

If it’s present in the DB, issue an “HTTP 302 Redirect” status back to the browser, passing the stored URL in the “Location” field of the request.

If that key is not present in our system, issue an “HTTP 404 Not Found” status or redirect the user back to the homepage.

Support custom shortened key

Supporting custom shortened key will be complex with Approach#2 design. Main reason is that the shortener service uses the counter to create next key. A user given custom key might conflict with other key blocks.

If we want to support custom key then we will have to go with hybrid approach #1.

Every custom shortened key, we can write alias in the separate table.

Alias_Url     Shortend_Url   Primary_Key = <Alias_Url, Shorend_Url>
sample_1 8qmTY1g
sample_2 1qmTY1g
sample_3 2qmTY1g
sample_4 3qmTY1g
sample_5 4qmTY1g

During the read, now we will have to first query the alias table, if exist then user the shortened_url to query the ShortenedUrl table otherwise use the same for lookup.

Note: There should be validation on the size of custom key to avoid misuse.

Storage Capacity Estimation and Constraints

Following are size for each fields.

  1. OriginalUrl: string 256 chars ~ 512 bytes
  2. ShortenedUrl: string 7 chars ~ 14 bytes
  3. UserId: int 4 bytes
  4. CreateTimestamp: datetime 8 bytes
  5. UpdateTimestamp: datetime 8 bytes

Size for each document: 512 + 14+ 4+ 8+ 8 ~ 546 bytes ~ 1000bytes ~ 1kb (consider index and other storage level encoding).

URL shortenings request per month: 500M

Total number of objects in 5 years: 500 million * 5 years * 12 months = 30 billion

Total storage: 30 billion * 1 kb ~ 30Tb

Number of shards: 300 (With 100GB per shard)

Number of replicas per shard for failover: 3

Total storage: 900TB

Traffic estimate

  1. 500M new URL shortenings per month
  2. Query Per seconds for write: 500 million / (30 days * 24 hours * 3600 seconds) ~ ~200 URLs/s
  3. Avg server can handle 100 thread per second. With 60% CPU utilization, we can keep total ~4 Shortener servers. We should also add +2 for failover.
  4. Considering 100:1 read/write ratio, URLs redirection per second will be: 100 * 200 URLs/s = 20K/s
  5. Avg server can handle 100 thread per second. With 60% CPU utilization
    Number of app server: 20000/60 = 333 servers ~ 350 lookup servers

Cache memory estimate

If we want to cache some of the hot URLs that are frequently accessed, how much memory will we need to store them?

  1. If we follow the 80–20 rule, meaning 20% of URLs generate 80% of traffic, we would like to cache these 20% hot URLs.
  2. We have 20K requests per second.
  3. Requests per day: 20K * 3600 seconds * 24 hours = ~1.7 billion
  4. To cache 20% of these requests: 0.2 * 1.7 billion * 500 bytes = ~170GB

Metrics for analytics

We can add following metrics.

  1. How many times short URL was used?
  2. What was the location of user?

Happy learning :-)

--

--

Dilip Kumar
Dilip Kumar

Written by Dilip Kumar

With 18+ years of experience as a software engineer. Enjoy teaching, writing, leading team. Last 4+ years, working at Google as a backend Software Engineer.

Responses (1)