Robin Karp Pattern matching algorithm (Fixed length Sliding Window)

Dilip Kumar
8 min readAug 29, 2024

--

A string-searching algorithm that efficiently finds occurrences of a pattern string within a larger text string.

It uses hashing to quickly identify potential matches, then performs full comparisons only for those candidates.

It slides a window of the same size as the pattern across the text. At each position, calculate the hash value of the current substring and compare it to the pattern’s hash value.

Let’s solve few problem using this algorithm.

28. Find the Index of the First Occurrence in a String

https://leetcode.com/problems/find-the-index-of-the-first-occurrence-in-a-string/description/

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.

Approach

We can apply fixed sliding window to search for needle in the haystack as below.

const slidingWindow = (haystack, needle) => {
// Fixed sliding window
const m = haystack.length;
const n = needle.length;
let left = 0;
const str = [];
for (let right = 0; right < m; right++) {
str.push(haystack[right]);
// Handle violation
if (right - left + 1 > n) {
str.shift();
left++;
}
// ans zone
if (right - left + 1 === n) {
if (str.join('') === needle) {
return left;
}
}
}
return -1;
}

What is problem in above logic?

str.join(‘’) === needle operation is O(n) . Both .join() and string === is O(n) operation. It makes overall time complexity as O(m*n) .

Use Robin-Karp to optimize the time complexity

We can optimize overall algorithm to run it in O(m) using Robin-Karp algorithm. We can take following approach.

  1. Calculate hash value for needle , this become search key.
  2. Since hash may cause collision therefore on matching the two hash value, perform extra string comparison to validate the match.

String slice at constant time?

s.substring(i,j) this is a linear time complexity operation. Therefore time complexity of above algorithm is O(n*size) . This code leads to TLE.

Robin-Karp is optimized algorithm to find out the string slice at constant time using hash. It uses rolling hash to generate hash at constant time as below.

  1. Treat string as 26 based number system with a-z as a digit.
  2. To add a new char at the right end, simply multiply hash*26 followed by add the value of new char as hash=hash*26+value
  3. To remove the new char from the left end use hash=hash — value*(26^size)

How to handle overflow?

26^size may lead to overflow. To handle it, we can keep taking module with the prime number.

How Robin Karp helps to fix the TLE?

With Robin-Karp, we only do the substring matching on finding hash duplicate. It skips running inner O(needle) every time.

Following is modified Sliding window code using Robin-Karp algorithm.

const PRIME = Math.pow(10, 6) + 1; // Take prime to avoid overflow
const getChrValue = chr => chr.charCodeAt(0) - 'a'.charCodeAt(0);
const matching = (haystack, left, needle) => {
const n = needle.length;
for (let i = left; i < left + n; i++) {
if (haystack[i] !== needle[i - left]) {
return false;
}
}
return true;
}
const getHash = s => {
let hash = 0;
const n = s.length;
for (let i = 0; i < n; i++) {
const chr = s[i];
const value = getChrValue(chr);
hash = ((hash * 26) % PRIME + value) % PRIME;
}
return hash;
}
const getRoll = n => {
const roll = new Array(n + 1);
roll[0] = 1;
for (let i = 1; i < n; i++) {
roll[i] = (roll[i - 1] * 26) % PRIME
}
return roll;
}
const slidingWindowUsingRobinKarp = (haystack, needle) => {
// Robin karp algo
const m = haystack.length;
const n = needle.length;
const ROLL = getRoll(n + 1);
const target = getHash(needle);
let left = 0;
let hash = 0;
for (let right = 0; right < m; right++) {
const chr = haystack[right];
const value = getChrValue(chr);
// shift to left and add new value
hash = ((hash * 26) % PRIME + value) % PRIME;
// Handle violation
if (right - left >= n) {
// slide window i.e. drop the left char
const chr = haystack[left];
const value = getChrValue(chr);
const contribution = (ROLL[n] * value) % PRIME
hash = (hash - contribution + PRIME) % PRIME;
left++;
}
// Ans zone
if (right - left === n - 1) {
if (hash === target) {
// handle hash collision
if (matching(haystack, left, needle)) {
return left;
}
}
}
}
return -1;
}
/**
* @param {string} haystack
* @param {string} needle
* @return {number}
*/
var strStr = function (haystack, needle) {
return slidingWindowUsingRobinKarp(haystack, needle);
};

1044. Longest Duplicate Substring

https://leetcode.com/problems/longest-duplicate-substring/description/

Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.

Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".

Example 1:

Input: s = "banana"
Output: "ana"

Approach

We have two sub problem here

  1. Find all the possible substring size k : We can use binary search to find out the possible window size.
  2. For each k size substring, find out if there is any duplicate: We can use Robin Karp to find out the duplicate.

What is problem with sliding window to use Set<string> to capture the seen substring before as below (instead of using Robin-Karp)?

const isDuplicate = (s, size) => {
const n = s.length;
let left = 0;
const set = new Set();
for (let right = 0; right < n; right++) {
// violation zone
if (right - left >= size) {
left++;
}
// ans zone
if (right - left === size - 1) {
const sub = s.substring(left, right + 1);
if (set.has(sub)) { // Duplicate check
return sub;
}
set.add(sub); // Update set
}
}
return '';
}

How to store seen substring in map?

Instead of string entire substring in the map , we can store the beginning index. Since we know the size of the substring therefore only on matching the hash we need to calculate the collision.

Why there is need to cache the roll calculation?

Everytime we run the sliding window on size , we need to calculate 26^size . Instead of keep calculating it, we can cache it and reuse.

Following is code for reference.

const PRIME = Math.pow(10, 6) + 1;
const getCharIndex = chr => chr.charCodeAt(0) - 'a'.charCodeAt(0);
let ROLL = null;
const getRoll = n => {
const roll = new Array(n);
roll[0] = 1;
for (let i = 1; i < n; i++) {
roll[i] = (roll[i - 1] * 26) % PRIME;
}
return roll;
}
const isDuplicateRobinKarp = (s, size) => {
const n = s.length;
let left = 0;
const map = new Map();
let hash = 0;
for (let right = 0; right < n; right++) {
// Process char to add into hash
const chr = s[right];
const value = getCharIndex(chr);
// Shift left then add the current value (use 26 base number system)
hash = ((hash * 26) % PRIME + value) % PRIME;
// violation
if (right - left >= size) {
const chr = s[left];
const value = getCharIndex(chr);
// find contribution of left char and discard it
const contribution = (value * ROLL[size]) % PRIME;
hash = (hash - contribution + PRIME) % PRIME;
left++;
}
// ans zone
if (right - left === size - 1) {
// If hash for current window found in map then it would be a duplicate
if (map.has(hash)) {
// Due to hash collision, two different substring might has same hash
// Therefore need to compare each to re-confirm the equality
const sub = s.substring(left, left + size);
for (const begin of map.get(hash)) {
const candidate = s.substring(begin, begin + size);
if (candidate === sub) {
return sub;
}
}
} else {
map.set(hash, []);
}
// Update hash and substring mapping
map.get(hash).push(left);
}
}
return '';
}
/**
* @param {string} s
* @return {string}
*/
var longestDupSubstring = function (s) {
const n = s.length;
let start = 1;
let end = n;
let ans = '';
ROLL = getRoll(n);
// Binary search to try all possible window size
while (start <= end) {
const mid = start + Math.floor((end - start) / 2);
const duplicate = isDuplicateRobinKarp(s, mid);
if (duplicate) {
ans = duplicate;
// move right to try higher size
start = mid + 1;
} else {
// move to left to try lower size
end = mid - 1;
}
}
return ans;
};

1062. Longest Repeating Substring

https://leetcode.com/problems/longest-repeating-substring/description/

Given a string s, return the length of the longest repeating substrings. If no repeating substring exists, return 0.

Example 1:

Input: s = "aabcaabdaab"
Output: 3
Explanation: The longest repeating substring is "aab", which occurs 3 times.

Approach

This problem is exactly same as previous. We can simply reuse same approach to return the length (instead of string value) as below.

const PRIME = Math.pow(10, 6) + 1;
const getCharIndex = chr => chr.charCodeAt(0) - 'a'.charCodeAt(0);
let ROLL = null;
const getRoll = n => {
const roll = new Array(n);
roll[0] = 1;
for (let i = 1; i < n; i++) {
roll[i] = (roll[i - 1] * 26) % PRIME;
}
return roll;
}
const isDuplicateRobinKarp = (s, size) => {
const n = s.length;
let left = 0;
const map = new Map();
let hash = 0;
for (let right = 0; right < n; right++) {
// Process char to add into hash
const chr = s[right];
const value = getCharIndex(chr);
// Shift left then add the current value (use 26 base number system)
hash = ((hash * 26) % PRIME + value) % PRIME;

// violation
if (right - left >= size) {
const chr = s[left];
const value = getCharIndex(chr);
// find contribution of left char and discard it
const contribution = (value * ROLL[size]) % PRIME;
hash = (hash - contribution + PRIME) % PRIME;
left++;
}
// ans zone
if (right - left === size - 1) {
// If hash for current window found in map then it would be a duplicate
if (map.has(hash)) {
// Due to hash collision, two different substring might has same hash
// Therefore need to compare each to re-confirm the equality
const sub = s.substring(left, left + size);
for (const begin of map.get(hash)) {
const candidate = s.substring(begin, begin + size);
if (candidate === sub) {
return size;
}
}
} else {
map.set(hash, []);
}
// Update hash and substring mapping
map.get(hash).push(left);
}
}
return -1;
}
/**
* @param {string} S
* @return {string}
*/
var longestRepeatingSubstring = function(S) {
const n = S.length;
ROLL = getRoll(n);
let start = 1;
let end = n;
while(start <= end) {
const mid = start + Math.floor((end - start)/2);
// check if substring of size mid exist as duplicate
const index = isDuplicateRobinKarp(S, mid)
if(index !== -1) {
// try next higher size
start = mid + 1;
} else {
// try lower size
end = mid - 1;
}
}
// get index
return end;
};

Happy learning :-)

--

--

Dilip Kumar
Dilip Kumar

Written by Dilip Kumar

With 18+ years of experience as a software engineer. Enjoy teaching, writing, leading team. Last 4+ years, working at Google as a backend Software Engineer.

No responses yet