Scala algorithm: Remove duplicates from a sorted list (Sliding)
Published
Algorithm goal
In Streaming data, data may come in duplicated; it could be due to various factors such as duplicated data from sources and idempotency for redundancy; for consumption though we may need to deduplicate the data for at-most-once processing. Some deduplicators retain state of long-gone elements (in which case .distinct will suffice, but have a memory cost), but in this case we are looking at only consecutive duplicate elements.
This is an alternative approach to RemoveDuplicatesFromSortedListStateMachine
Test cases in Scala
assert(removeDuplicates(List.empty[Int]) == List.empty[Int])
assert(removeDuplicates(List(1)) == List(1))
assert(removeDuplicates(List(1, 1)) == List(1))
assert(removeDuplicates(List(1, 2)) == List(1, 2))
assert(removeDuplicates(List(1, 1, 2)) == List(1, 2))
assert(removeDuplicates(List(1, 1, 1, 2)) == List(1, 2))
assert(removeDuplicates(List(1, 1, 2, 3, 3)) == List(1, 2, 3))
assert(removeDuplicatesWithMarker(List(1, 1, 2, 3, 3)) == List(1, 2, 3))
Algorithm in Scala
23 lines of Scala (compatible versions 2.13 & 3.0), showing how concise Scala can be!
Get the full algorithm !
or
'Unlimited Scala Algorithms' gives you access to all the 100 published Scala Algorithms!
Upon purchase, you will be able to Register an account to access all the algorithms on multiple devices.
Explanation
The approach here is a little different to the State Machine approach:
The idea is that by running a sliding window, we can check pairs of elements for their equality, and if they are not equal, emit the newly found one. (this is © from www.scala-algorithms.com)
Full explanation is available to subscribers
Scala concepts & Hints
Collect
'collect' allows you to use Pattern Matching, to filter and map items.
Option Type
The 'Option' type is used to describe a computation that either has a result or does not. In Scala, you can 'chain' Option processing, combine with lists and other data structures. For example, you can also turn a pattern-match into a function that return an Option, and vice-versa!
Pattern Matching
Pattern matching in Scala lets you quickly identify what you are looking for in a data, and also extract it.
Sliding / Sliding Window
Get fixed-length sliding sub-sequences (sliding windows) from another sequence