Scala algorithm: Remove duplicates from an unsorted List
Algorithm goal
Remove duplicates from an unsorted list, keeping the first occurrence of each element and preserving their order; for example, [1,2,1,2,3] becomes [1,2,3], because each of 1, 2 and 3 may appear only once in the output. This is especially useful in streaming and in processing idempotent data, such as messages on unreliable networks, where the same packet or message may be delivered multiple times.
Test cases in Scala
assert(removeDuplicatesUnsortedList(Nil) == Nil)
assert(removeDuplicatesUnsortedList(List(1)) == List(1))
assert(removeDuplicatesUnsortedList(List(1, 2)) == List(1, 2))
assert(removeDuplicatesUnsortedList(List(1, 2, 1)) == List(1, 2))
assert(removeDuplicatesUnsortedList(List(1, 2, 1, 2)) == List(1, 2))
assert(removeDuplicatesUnsortedList(List(1, 2, 1, 2, 3)) == List(1, 2, 3))
assert(
  removeDuplicatesUnsortedList(List(1, 2, 1, 2, 3, 1)) == List(1, 2, 3)
)
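Every expected output in these tests keeps the first occurrence of each element, in its original order. That is also what the standard library's `distinct` method on `List` does, so a quick way to sanity-check the test cases (this is only a baseline, not the article's implementation; the object name is hypothetical) is to run them against `distinct`:

```scala
object DistinctBaseline {
  // Baseline only: delegates to the standard library, which keeps the
  // first occurrence of each element and preserves the original order.
  def removeDuplicatesUnsortedList[T](list: List[T]): List[T] =
    list.distinct
}
```

For example, `DistinctBaseline.removeDuplicatesUnsortedList(List(1, 2, 1, 2, 3, 1))` gives `List(1, 2, 3)`, matching the last test case.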
Algorithm in Scala
17 lines of Scala (compatible with Scala 2.13 and 3.0), showing how concise Scala can be!
Explanation
Using a scan (scanLeft or scanRight), we can express what would typically be a while-loop in a declarative way: for each element we check whether it has already been seen; if not, we emit it as a Some(...), otherwise we emit a None to signal that no value was emitted at that step.
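A minimal sketch of this scan-based idea, assuming an immutable Set is used to remember the elements already seen. It is an illustrative reconstruction, not the article's own 17-line implementation, and the object name is hypothetical. Each scan step carries the pair (seen elements, element emitted at this step), and `collect` keeps only the emitted values:

```scala
object RemoveDuplicatesScanSketch {
  def removeDuplicatesUnsortedList[T](list: List[T]): List[T] =
    list
      // State per step: (elements seen so far, element emitted at this step, if any)
      .scanLeft((Set.empty[T], Option.empty[T])) {
        case ((seen, _), elem) if seen.contains(elem) =>
          (seen, None) // already seen: emit nothing
        case ((seen, _), elem) =>
          (seen + elem, Some(elem)) // first occurrence: remember it and emit it
      }
      // Drop the None steps (including the initial state) and unwrap the rest.
      .collect { case (_, Some(elem)) => elem }
}
```

Set lookups and insertions are effectively constant time, so the pass is roughly linear in the length of the list. Because the iteration happens inside scanLeft rather than through recursion, the call stack does not grow with the input.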
This is very close to a state machine, and in fact it can be extracted out as one. See: RemoveDuplicatesFromSortedListStateMachine, ParenthesesFoldingStateMachine.
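To make the state-machine view concrete, the state and its transition function can be given names. In this sketch the names `DedupState`, `next`, `emit` and `initial` are made up for illustration and are not taken from the referenced articles:

```scala
object DedupStateMachineSketch {
  // Hypothetical state: what has been seen, and what the latest step emitted.
  final case class DedupState[T](seen: Set[T], emit: Option[T]) {
    // Transition function: consume one input element, produce the next state.
    def next(elem: T): DedupState[T] =
      if (seen.contains(elem)) DedupState(seen, None)
      else DedupState(seen + elem, Some(elem))
  }

  object DedupState {
    def initial[T]: DedupState[T] = DedupState(Set.empty[T], None)
  }

  def removeDuplicatesUnsortedList[T](list: List[T]): List[T] =
    list
      .scanLeft(DedupState.initial[T])(_.next(_)) // run the machine over the input
      .flatMap(_.emit)                            // an Option acts as 0 or 1 outputs
}
```

Separating the transition (`next`) from the driver (`scanLeft`) means the same state could be fed one element at a time, for example from a message consumer, without changing the logic.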
Scala concepts & Hints
Collect
'collect' lets you use pattern matching to filter and map items in a single step.
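For example, a single `collect` can drop the `None` steps of a scan and unwrap the `Some` values at the same time. The data below is made up for illustration:

```scala
object CollectExample {
  val emitted: List[Option[Int]] = List(Some(1), Some(2), None, None, Some(3))

  // The partial function is only defined for Some(...), so None values are
  // filtered out, and each matching value is unwrapped (mapped) in one pass.
  val values: List[Int] = emitted.collect { case Some(value) => value }

  // Filtering and mapping at once: keep even numbers and square them.
  val evenSquares: List[Int] = List(1, 2, 3, 4).collect { case n if n % 2 == 0 => n * n }
}
```

Here `values` is `List(1, 2, 3)` and `evenSquares` is `List(4, 16)`.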
Lazy List
The 'LazyList' type (which replaced 'Stream' in Scala 2.13) describes a potentially infinite list whose elements are evaluated only when needed ('lazily').
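Since `scanLeft` and `collect` on a LazyList are themselves lazy, the scan-with-a-seen-set idea can also be applied to a potentially infinite stream of incoming messages, as long as only a finite prefix of the output is requested. A sketch under that assumption; the `incoming` stream is hypothetical data that keeps repeating the same messages:

```scala
object LazyDedupSketch {
  // Lazily drops repeats: nothing is computed until elements are requested.
  def dedup[T](stream: LazyList[T]): LazyList[T] =
    stream
      .scanLeft((Set.empty[T], Option.empty[T])) { case ((seen, _), elem) =>
        if (seen.contains(elem)) (seen, None) else (seen + elem, Some(elem))
      }
      .collect { case (_, Some(elem)) => elem }

  // Hypothetical infinite input: 1, 2, 1, 2, 3, 1, 2, 1, 2, 3, ...
  val incoming: LazyList[Int] = LazyList.continually(List(1, 2, 1, 2, 3)).flatten
}
```

`LazyDedupSketch.dedup(LazyDedupSketch.incoming).take(3).toList` yields `List(1, 2, 3)`. Asking for a fourth element would never terminate, because no new distinct value ever arrives, and the seen set grows with the number of distinct elements, so a real stream would typically bound it.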
Option Type
The 'Option' type describes a computation that either has a result or does not. In Scala, you can 'chain' Option processing and combine it with lists and other data structures. For example, you can turn a pattern match into a function that returns an Option, and vice versa!
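A small illustration of both directions using the standard `PartialFunction#lift` and `Function.unlift`; the `halveEven` function is a made-up example:

```scala
object OptionExample {
  // A pattern match written as a partial function: only defined for even numbers.
  val halveEven: PartialFunction[Int, Int] = { case n if n % 2 == 0 => n / 2 }

  // Pattern match -> Option: `lift` returns Some(result) or None.
  val halveEvenOption: Int => Option[Int] = halveEven.lift

  // Option -> pattern match: `Function.unlift` turns it back into a partial function.
  val halveEvenAgain: PartialFunction[Int, Int] = Function.unlift(halveEvenOption)

  // Chaining: a None at any step short-circuits the rest of the chain.
  def quarter(n: Int): Option[Int] = halveEvenOption(n).flatMap(halveEvenOption)

  // Combining with lists: flatMap keeps only the Some results.
  val halvedEvens: List[Int] = List(1, 2, 3, 4).flatMap(halveEvenOption)
}
```

For instance, `quarter(8)` is `Some(2)`, while `quarter(6)` is `None` because 6 halves to the odd number 3.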
Pattern Matching
Pattern matching in Scala lets you quickly identify the shape of the data you are looking for, and extract the parts you need from it.
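For example, matching on a list can recognise its shape and bind its parts in the same step. The `describe` function is a hypothetical illustration:

```scala
object PatternMatchingExample {
  // Identify the shape of a list and extract its parts in one step.
  def describe(list: List[Int]): String = list match {
    case Nil                                     => "empty"
    case single :: Nil                           => s"one element: $single"
    case first :: second :: _ if first == second => s"starts with a repeated $first"
    case first :: rest                           => s"starts with $first, then ${rest.size} more"
  }
}
```

Cases are tried top to bottom, so `describe(List(2, 2, 5))` matches the guarded third case, while `describe(List(1, 2, 1))` falls through to the last one.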
scanLeft and scanRight
Scala's `scan` functions (`scanLeft` and `scanRight`) work like the folds `foldLeft` and `foldRight`, but also collect every intermediate result.
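A side-by-side comparison on a small made-up list shows the difference: the fold keeps only the final value, while each scan keeps the whole history, including the initial value:

```scala
object ScanExample {
  val numbers: List[Int] = List(1, 2, 3)

  // foldLeft returns only the final accumulated value.
  val total: Int = numbers.foldLeft(0)(_ + _) // 6

  // scanLeft returns the initial value followed by every intermediate result.
  val runningTotals: List[Int] = numbers.scanLeft(0)(_ + _) // List(0, 1, 3, 6)

  // scanRight accumulates from the right; the initial value comes last.
  val suffixTotals: List[Int] = numbers.scanRight(0)(_ + _) // List(6, 5, 3, 0)
}
```

The last element of `scanLeft` always equals the `foldLeft` result, which is why a scan can replace a loop when the intermediate states (such as "emit this element or not") matter.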
Stack Safety
A function is stack-safe when it cannot crash by exceeding the limit on the depth of nested (recursive) calls, i.e. by overflowing the call stack.
For example, a naively recursive function may work for n = 5 but fail for n = 2000, crashing with java.lang.StackOverflowError; however, there is a way to fix it :-)
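As an illustration of the problem and one common fix, consider a hypothetical sum from 1 to n (not the article's own function). The first version keeps one stack frame per number, so its stack depth grows with n. The second moves the recursive call into tail position, and the `@tailrec` annotation makes the compiler verify that it can turn the recursion into a loop (compilation fails if it cannot):

```scala
import scala.annotation.tailrec

object StackSafetyExample {
  // Not stack-safe: `n + ...` still has work to do after the recursive call
  // returns, so every call keeps its stack frame alive until the end.
  def sumNaive(n: Int): Long =
    if (n <= 0) 0L else n + sumNaive(n - 1)

  // Stack-safe: the recursive call is the last thing `loop` does, so the
  // compiler rewrites it as a loop that uses constant stack space.
  def sumTailRec(n: Int): Long = {
    @tailrec
    def loop(i: Int, acc: Long): Long =
      if (i <= 0) acc else loop(i - 1, acc + i)
    loop(n, 0L)
  }
}
```

`sumTailRec(1000000)` returns `500000500000` without growing the stack. The input size at which `sumNaive` overflows depends on the JVM thread's stack size, so no exact threshold is claimed here.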
In Scala Algorithms, we try to write the algorithms in a stack-safe way where possible, so that they do not crash on large inputs. However, stack-safe implementations are often more complex, and sometimes overly complex for the task at hand.