0
\$\begingroup\$

Below is the Rabin Karp implementation. I am not so sure about it, and there is no where that I can check whether this code really works or not. It will most helpful for extra set of eyes if whether this gives correct output or not. The code finds the substring (in code it is, pattern) from the string given (in code it is, text). If the substring is found it will return true, otherwise false.

public class RabinKarpSubstring {

    // Check if text contains pattern.
    public boolean isSubstring (String pattern, String text) {
        if (pattern.length() > text.length()) return false;

        int prime = 31;
        int MOD = 1000000009;
        long hashPattern = 0;
        long[] primePower = new long[text.length()];
        long[] hashText = new long[text.length() + 1];
        
        primePower[0] = 1;
        for (int i = 1; i < text.length(); i++) {
            primePower[i] = (primePower[i - 1] * prime) % MOD;
        }

        for (int i = 0; i < text.length(); i++) {
            hashText[i + 1] = (hashText[i] + ((text.charAt(i) - 'a' + 1) * primePower[i])) % MOD;
        }

        for (int i = 0; i < pattern.length(); i++) {
            hashPattern = (hashPattern + ((pattern.charAt(i) - 'a' + 1) * primePower[i])) % MOD;
        }

        for (int i = 0; i < (text.length() - pattern.length() + 1); i++) {
            long currHash = (hashText[i + pattern.length()] + MOD - hashText[i]) % MOD;
            if (currHash == (hashPattern * primePower[i] % MOD)) {
                return true;
            }
        }

        return false;
    }

    public static void main(String[] args) {
        System.out.println(new RabinKarpSubstring().isSubstring("aab", "aaaab"));
    }
}

Please do share if anymore information is to added. Thank you.

\$\endgroup\$
2
  • 1
    \$\begingroup\$ Just a suggestion, add additional test to main that both pass and fail have expected values for the pass and fail and test some edge cases. \$\endgroup\$
    – pacmaninbw
    Commented Jul 11, 2021 at 14:59
  • \$\begingroup\$ @pacmaninbw Sure, I will do that. \$\endgroup\$
    – Leprachon
    Commented Jul 11, 2021 at 15:09

1 Answer 1

0
\$\begingroup\$

An important omission

Consider this snippet:

if (currHash == (hashPattern * primePower[i] % MOD)) {
    return true;
}

It returns true when the hashes match. But hashes are not unique, they cannot be, there are fewer of them than there are strings. So there should be a proper string comparison here, to distinguish false positives from true positives.

False positives are not only a problem for huge strings. As a specific example, using the hash you're using, "zdxudvc" and "gulbina" collide. Finding that was a nice puzzle.

To put it another way: Rabin-Karp reduces the expected number of string comparisons (relative to the naive substring search), but does not remove the need for them entirely.

Inefficient design

While you didn't put a tag, I will comment on this anyway, after all the point of Rabin-Karp is to be fast (otherwise you could use a naive substring search).

Building a large array of prime powers, as well as a large array of hashes, is unnecessary and inefficient. These can be computed on the fly. That saves not only the large allocations (larger than the input string!) but also, if a match is found, it saves computing these things for the part of the string after the match.

Using a prime for MOD is common, but not necessary. A sufficient condition is that MOD and prime are relatively prime. Not all such choices are good choices, for example prime has better not be 1, but there are many reasonable choices, some of which have advantages. For example, choosing MOD equal to 232 means that the modulo step is implicit and (for any power of two MOD) you can stop using long because premature wrapping is no longer a problem. There are various analyses of whether a prime MOD is good or not. A popular argument is that it causes fewer hash collisions. That may be true, but that's not really the question: the question is whether that's worth it compared to the cost of having non-trivial modulo reductions. I don't think it is, but try it, experiment.

\$\endgroup\$
1
  • \$\begingroup\$ Thank you so much, @harold. This answer really solved many of my doubts and I welcome the performance suggestion and will surely experiment. \$\endgroup\$
    – Leprachon
    Commented Jul 11, 2021 at 16:43

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.