Scala algorithm: Run-length encoding (RLE) Decoder
Published
Algorithm goal
Run-length encoding is one of the most basic compression methods, which is especially useful where there long runs of a particular character or a group of characters. It could be particularly useful for example in column-based databases which have potentially many repeating values.
Run-length turns a sequence of characters into effectively a sequence of count-character pairs. For example, WWWAAW
becomes 3W2A1W
.
See RunLengthEncoding for the Encoder.
Test cases in Scala
assert(decodeString("1W") == "W")
assert(decodeString("3W") == "WWW")
assert(decodeString("3W1A1W") == "WWWAW")
assert(decodeString("3W1A2B2W") == "WWWABBWW")
assert(
decodeString("x3W1A2B2W") == "WWWABBWW",
"Invalid character in front is ignored"
)
assert(
decodeString("3W1A2B2") == "WWWABB",
"Invalid character at the back is ignored"
)
assert(
decodeString("13Y") == "YYYYYYYYYYYYY",
">10-strong RLE is emmitted correctly"
)
Algorithm in Scala
18 lines of Scala (compatible versions 2.13 & 3.0), showing how concise Scala can be!
Explanation
While this can be implemented in Tail Recursion, we will implement in a streaming style because:
- It is more complex to read and reason about
- It it is not possible to make it work in a streaming fashion (because who wants to run out of memory?).
In a functional/immutable style approach, we must think in terms of transformations , and in this particular algorithm, because we need to parse the input, we need to consider that either an input is a digit or a non-digit, then fuse any digits into numbers of occurrences (because we do wish to be able to handle inputs like 13Y
); and finally, combine digit-char pairs and emit them as a String
. (this is © from www.scala-algorithms.com)
Scala concepts & Hints
Collect
'collect' allows you to use Pattern Matching, to filter and map items.
Pattern Matching
Pattern matching in Scala lets you quickly identify what you are looking for in a data, and also extract it.
scanLeft and scanRight
Scala's `scan` functions enable you to do folds like foldLeft and foldRight, while collecting the intermediate results
Sliding / Sliding Window
Get fixed-length sliding sub-sequences (sliding windows) from another sequence
Stack Safety
Stack safety is present where a function cannot crash due to overflowing the limit of number of recursive calls.
This function will work for n = 5, but will not work for n = 2000 (crash with java.lang.StackOverflowError) - however there is a way to fix it :-)
In Scala Algorithms, we try to write the algorithms in a stack-safe way, where possible, so that when you use the algorithms, they will not crash on large inputs. However, stack-safe implementations are often more complex, and in some cases, overly complex, for the task at hand.
View
The
.view
syntax creates a structure that mirrors another structure, until "forced" by an eager operation like .toList, .foreach, .forall, .count.