KMP Algorithm :Detailed Explanation

KMP Algorithm :Detailed Explanation

Understanding the KMP Algorithm: Efficient Pattern Matching

The Knuth-Morris-Pratt (KMP) algorithm is a powerful string matching technique that optimizes the search process by avoiding redundant comparisons. It is particularly useful when you need to find all occurrences of a "pattern" string in a "text" string, especially for large inputs.

In this blog, we’ll dive deep into how the KMP algorithm works, walk through its implementation, and explore practical use cases.


The Problem: Naive Pattern Matching

A naive approach to pattern matching involves comparing each substring of the text with the pattern. While simple, it can be inefficient, particularly when repeated comparisons occur after mismatches. For example:

Text:    ababababc
Pattern:    ababc

After failing at position 3, the naive algorithm restarts comparisons from the next character, even though we already know part of the substring matches.

This redundancy is what KMP avoids.


How the KMP Algorithm Solves This

The KMP algorithm eliminates redundant comparisons by precomputing a Longest Prefix Suffix (LPS) table. This table determines how far to "jump" in the pattern when a mismatch occurs.


The LPS Table: Key to Efficiency

The Longest Prefix Suffix (LPS) table tells us:

  • The length of the longest prefix of the pattern that is also a suffix for each substring of the pattern.

For example, consider the pattern "abcabcd":

Index iiSubstringLongest Prefix SuffixLPS Value
0aNone0
1abNone0
2abcNone0
3abcaa1
4abcabab2
5abcabcabc3
6abcabcdNone0

The LPS table for this pattern is: [0, 0, 0, 1, 2, 3, 0].


How KMP Works

  1. Preprocessing Phase:

    • Build the LPS array for the pattern.
  2. Matching Phase:

    • Compare the text and pattern.

    • On a mismatch, use the LPS array to determine how far to shift the pattern without rechecking already matched characters.


KMP Algorithm: Java Implementation

Here’s how you can implement the KMP algorithm:

Step 1: Build the LPS Table

private static int[] computeLPS(String pattern) {
    int m = pattern.length();
    int[] lps = new int[m];
    int length = 0; // Longest prefix suffix
    int i = 1;

    lps[0] = 0; // First element is always 0

    while (i < m) {
        if (pattern.charAt(i) == pattern.charAt(length)) {
            length++;
            lps[i] = length;
            i++;
        } else {
            if (length != 0) {
                length = lps[length - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }
    return lps;
}

Step 2: Search for the Pattern

public static void KMPSearch(String text, String pattern) {
    int n = text.length();
    int m = pattern.length();

    // Step 1: Compute the LPS Array
    int[] lps = computeLPS(pattern);

    int i = 0; // Pointer for text
    int j = 0; // Pointer for pattern

    while (i < n) {
        if (text.charAt(i) == pattern.charAt(j)) {
            i++;
            j++;
        }

        if (j == m) { // Pattern match found
            System.out.println("Pattern found at index: " + (i - j));
            j = lps[j - 1]; // Continue searching for further matches
        } else if (i < n && text.charAt(i) != pattern.charAt(j)) {
            if (j != 0) {
                j = lps[j - 1]; // Use LPS to skip comparisons
            } else {
                i++;
            }
        }
    }
}

Step 3: Run the Algorithm

public static void main(String[] args) {
    String text = "aaabcabcdabcabcabcd";
    String pattern = "abcabcd";

    KMPSearch(text, pattern);
}

Example Walkthrough

Input:

  • Text: "aaabcabcdabcabcabcd"

  • Pattern: "abcabcd"

Execution:

  1. Compute the LPS table for the pattern:

    • LPS: [0, 0, 0, 1, 2, 3, 0]
  2. Match the pattern in the text:

    • Start matching from index 3: "abcabcd" matches.

    • Skip redundant comparisons and find another match at index 12.

Output:

Pattern found at index: 3
Pattern found at index: 12

Complexity Analysis

  1. Preprocessing Phase:

    • Time Complexity: O(m)O(m), where mm is the length of the pattern.

    • Space Complexity: O(m)O(m), for the LPS table.

  2. Matching Phase:

    • Time Complexity: O(n)O(n), where nn is the length of the text.

Overall Time Complexity: O(n+m)O(n+m)


When to Use KMP

  • Text Processing: Searching for a substring in documents or logs.

  • Bioinformatics: Pattern searching in DNA sequences.

  • Search Engines: Finding keywords in large texts.

  • Code Editors: Implementing "Find" or "Search" functionality.


Why KMP is Better

The naive algorithm can have a worst-case complexity of O(n×m)O(n×m), while KMP guarantees linear performance relative to the size of the input. It is especially effective when repeated patterns or large texts are involved.


Conclusion

The KMP algorithm is a cornerstone of efficient string matching. By leveraging the LPS table, it eliminates unnecessary comparisons and optimizes performance. Whether you’re building search features in applications or analyzing large datasets, KMP is an indispensable tool.

Try implementing this in your projects and see its magic in action! 🚀