Scala algorithm: Length of the longest common substring
Algorithm goal
The longest common substring is the longest contiguous run of characters shared between two Strings. For example, 'XYZzz' and 'ddXYZdd' have the common substring 'XYZ', which is of length 3.
Test cases in Scala
assert(longestCommonSubstringLength("XYZ", "XYZ") == 3)
assert(longestCommonSubstringLength("XYZd", "XYZ") == 3)
assert(longestCommonSubstringLength("XYZdd", "XYZ") == 3)
assert(longestCommonSubstringLength("ddXYZdd", "XYZ") == 3)
assert(longestCommonSubstringLength("zzXYZzz", "ddXYZ") == 3)
assert(longestCommonSubstringLength("zzXYZzz", "ddXYZdd") == 3)
assert(longestCommonSubstringLength("XYZzz", "ddXYZdd") == 3)
assert(longestCommonSubstringLength("XYZ", "ddXYZdd") == 3)
assert(longestCommonSubstringLength("zzXYZdd", "ddXYZ") == 3)
Algorithm in Scala
15 lines of Scala (compatible versions 2.13 & 3.0), showing how concise Scala can be!
Explanation
The reasoning here is inductive, similar to that used in LongestIncreasingSubSequenceLength:
Let \(l(f, s)\) be the length of the longest common sub-string ending at position \(f\) of the first string and position \(s\) of the second string.
Then \(l(f + 1, s + 1) = l(f, s) + 1\) if character \(f + 1\) of the first string and character \(s + 1\) of the second are equal.
If they are not equal, then \(l(f + 1, s + 1)\) is \(0\). The answer is the maximum of \(l(f, s)\) over all pairs of positions.
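The recurrence can be sketched in code as follows. This is only an illustration under stated assumptions, not the site's own 15-line solution: the wrapper object `LongestCommonSubstringSketch` and the row-by-row layout are choices made here. Each row holds the values of \(l\) for one character of the first string against every position of the second.

```scala
object LongestCommonSubstringSketch {
  // Illustrative sketch: builds the table l(f, s) one row at a time.
  def longestCommonSubstringLength(first: String, second: String): Int = {
    // Row before any character of `first` is consumed: every value is 0.
    val emptyRow = Vector.fill(second.length + 1)(0)
    first.iterator
      .scanLeft(emptyRow) { (previousRow, firstChar) =>
        // newRow(s + 1) = previousRow(s) + 1 when the characters match, else 0.
        0 +: second.indices.map { s =>
          if (second(s) == firstChar) previousRow(s) + 1 else 0
        }.toVector
      }
      .map(_.max) // best value within each row
      .max        // best value over the whole table
  }
}
```

Because only the previous row is needed to compute the next one, the sketch keeps a single row in memory at a time, and it uses no recursion, so it cannot overflow the stack.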
Scala concepts & Hints
Pattern Matching
Pattern matching in Scala lets you quickly identify what you are looking for in data, and also extract it.
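As a small illustration, the sketch below matches on a pair of characters, extracting both parts and testing them with a guard. The names `PatternMatchingExample` and `describePair` are hypothetical and exist only for this example.

```scala
object PatternMatchingExample {
  // Hypothetical helper: destructures a tuple and branches on its contents.
  def describePair(pair: (Char, Char)): String = pair match {
    case (a, b) if a == b => s"both are '$a'"          // guard: the two parts are equal
    case (a, b)           => s"'$a' differs from '$b'" // any other pair
  }
}
```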
scanLeft and scanRight
Scala's `scan` functions let you perform folds like foldLeft and foldRight while collecting the intermediate results.
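A minimal sketch of the difference, using a hypothetical `ScanExample` object: the fold returns only the final value, while the scans also return every intermediate accumulator, including the initial one.

```scala
object ScanExample {
  val numbers = List(1, 2, 3, 4)

  // foldLeft keeps only the final accumulated value.
  val total: Int = numbers.foldLeft(0)(_ + _) // 10

  // scanLeft keeps every intermediate value, starting with the initial 0.
  val runningTotals: List[Int] = numbers.scanLeft(0)(_ + _) // List(0, 1, 3, 6, 10)

  // scanRight accumulates from the right, so the initial 0 comes last.
  val suffixTotals: List[Int] = numbers.scanRight(0)(_ + _) // List(10, 9, 7, 4, 0)
}
```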
Stack Safety
A function is stack-safe when it cannot crash by exceeding the limit on the depth of recursive calls.
For example, a naively recursive function may work for n = 5 but fail for n = 2000 (crashing with java.lang.StackOverflowError); however, there is a way to fix it :-)
In Scala Algorithms, we try to write the algorithms in a stack-safe way, where possible, so that when you use the algorithms, they will not crash on large inputs. However, stack-safe implementations are often more complex, and in some cases, overly complex, for the task at hand.
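To illustrate the idea, here is a hedged sketch with two hypothetical functions, `sumTo` and `sumToSafe`; neither comes from the original page. The first uses one stack frame per call, so the depth at which it overflows depends on the JVM's stack size. The second is tail-recursive, which the compiler checks via `@tailrec` and compiles into a loop.

```scala
import scala.annotation.tailrec

object StackSafetyExample {
  // Not stack-safe: the addition happens after the recursive call returns,
  // so every call needs its own stack frame.
  def sumTo(n: Int): Long =
    if (n <= 0) 0L else n + sumTo(n - 1)

  // Stack-safe: the recursive call is the last action, so it runs as a loop.
  @tailrec
  def sumToSafe(n: Int, accumulated: Long = 0L): Long =
    if (n <= 0) accumulated else sumToSafe(n - 1, accumulated + n)
}
```

The accumulator parameter is the extra complexity mentioned above: the running result has to be passed along explicitly instead of being built up as the calls return.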
View
The .view syntax creates a structure that mirrors another structure, until "forced" by an eager operation like .toList, .foreach, .forall, .count.
Zip
'zip' allows you to combine two lists pair-wise (meaning it turns a pair of lists into a list of pairs).
It can be used over Arrays, Lists, Views, Iterators and other collections.
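The short sketch below shows `zip` on plain lists and on a `.view`. The object and value names are hypothetical. On the view, the zipped and mapped structure is only a description of the work until an eager operation such as `.toList` or `.count` forces it.

```scala
object ViewAndZipExample {
  val letters = List('X', 'Y', 'Z')
  val positions = List(0, 1, 2)

  // Eager zip: a pair of lists becomes a list of pairs.
  val pairs: List[(Char, Int)] = letters.zip(positions)

  // zip stops at the end of the shorter collection.
  val truncated: List[(Int, String)] = List(1, 2, 3).zip(List("a", "b"))

  // On a view, zip and map are deferred; nothing is computed on this line.
  val lazyLabels = letters.view.zip(positions).map { case (c, i) => s"$c$i" }

  // .toList forces the view and materialises the results.
  val labels: List[String] = lazyLabels.toList

  // .count also forces the view, without building an intermediate list of pairs.
  val matching: Int =
    letters.view.zip(List('X', 'd', 'Z')).count { case (a, b) => a == b }
}
```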