Spelling Suggestion on the JVM #3: Implementing Norvig’s Algo in Kotlin
This tutorial project implements a basic spelling suggestion service. The motivation is didactic: becoming familiar with “better java” languages around a simple but instructional example. Ease of implementation outranks optimal performance so readers can focus on the JVM languages as such. Examples and new concepts are introduced in Java 9 first so they’re immediately understandable to the experienced Java programmer. Crucially, the Java implementation is accompanied by equivalent (but idiomatic) implementations in today’s most relevant alternative JVM languages: Kotlin, Scala and Xtend. Code is available at Spellbound_’s Github repo.
This third post presents an idiomatic Kotlin implementation of Norvig’s spelling corrector.
The next post (Implementing Norvig’s Algo in Scala, in the works) presents an idiomatic Scala implementation of Norvig’s spelling corrector.
The previous post (Implementing Norvig’s Algo in Java) illustrates how to implement Norvig’s spelling corrector in Java 9 following a functional style.
The Dictionary
Everything having to do with spelling correction revolves around a curated list of valid words we refer to as the the dictionary.
In our dictionary implementation we associate every word with an integer rank indicating how frequently the word is used in relation to all other words. The lower the rank, the more frequently used the word is. Thus, for instance, the has rank 1
while triose has rank 106295
.
The dictionary declaration is:
Here, we make use of Kotlin’s primary constructor where a class instance is created from one or more values. Kotlin encourages using immutable values; this is reflected above by declaring the dictionary
to be an immutable val
.
In addition to declaring the dictionary
field, this primary constructor also makes it inaccessible from the outside world by means of private
. Unlike Java, Kotlin's general default visibility modifier is public
; this is very convenient since we strive to make all values immutable.
Transforming the typo into a set of word splits
In addition to good ole’ classes (like the above SpellingCorrector
) Kotlin also supports the notion of object
: a singleton-like instance that may or may not be associated with a corresponding class or interface. In other words, objects may implement interfaces or extend classes as needed but they can also stand on their own.
Importantly, objects can declare their own inner classes, values, variables, functions, etc.
In our case, we create an object called Edits
that contains:
- A data class
WordSplit
for storing left/right word splits - A function to generate word splits given a (typo) word
- A list of lambdas transforming a set of word splits into a set of candidate words
On the surface our Edits
Kotlin object looks like:
Word Splits
In order to hold word splits we define an inner data class WordSplit as:
Kotlin data classes don’t require getters/setters for their properties (which must all be declared in the data class’s primary constructor as immutable values). A data class also implements sensible equals
, hashCode
and toString
implementations for free!
Armed with this definition we can now formulate the rather trivial implementation of wordSplits()
:
Here we have an instance of an expression function using no curly braces to surround the function body. Instead, we use the =
sign to separate the function's declaration from its one-expression body. This body creates a collection from the range 0
to word.length
(inclusive). Note, en passant, that we don't need to use empty parens after length
in the expression word.length
; in Kotlin, it is a property rather than a function.
The range is, in effect, a collection of integers that can be mapped over to apply a lambda. In Kotlin, lambdas must be surrounded by curly braces (unlike Java, parenthesis just don’t cut it).
One-argument lambdas can use the implicitly defined value it
to refer to their sole argument. Thus, above we use it
to refer to each subsequent value in the range. In this usage it
has type Int
and is used as an index to the substring
function.
The ALL_EDITS
list
Even though we could use vanilla functions for each of Norvig’s four edits we purposefully choose to implement them as elements of a list of anonymous functions:
Note the list’s generic type: (Iterable<WordSplit>) -> Iterable<String>
which can be read roughly as: given a sequence of WordSplit
s generate a sequence of String
s.
By storing functions as data in a list we can apply them in a generic fashion to any given collection of word splits:
Here, we actually invoke each edit function by passing the wordSplits
argument to it
, cool!
The split()
method takes a string (the typo) and returns all possible pairs of word splits. Thus, for our beloved but misspelt hero dlbert we get the equivalent of:
By the way, all four edit implementations take an Iterable<WordSplit>
argument while returning an Iterable<String>
.
IntRange(startInclusive: Int, endInclusive: Int)
returns a subtype of Iterable<Int>
. In our example above this means all integers between 0
and 6
.`
Subsequently, each successive integer value is used as a substring index for splitting the word into a pair of left
and right
fragments.
Edits #1: deletes
Generating all possible deletes for a word is quite simple:
Here, we traverse an input Iterable<WordPair>
selecting only the pairs where right
is not empty (which is to say to all splits but the very first one). Next, we concatenate each word split into a string formed by the left
part followed by the "tail" of the right
one.

Given the typo waly this deletes()
anonymous implementation yields the equivalent of:
Edits #2: inserts
Inserts are the opposite of deletes and while their implementation is correspondingly simple:
This inserts()
edit uses a LETTERS
value of type Iterable<String>
. Again for the sake of simplicity we deal only with lowercase ASCII characters. There is no support for diacritics (whose implementation is not so complicated anyways). LETTERS
is defined in the Edits
object as:
CharRange
is a subtype ofIterable<String>
.

For our canine friend dgbert the above anonymous implementation of inserts()
yields the equivalent of:
Edits #3: transposes
Implementing transposition requires the right
fragment to have more than one character in length, which makes sense as transposing involves not one but two letters:

For our heroine alice the above anonymous implementation of transposes
yields the equivalent of:
Edits #4: replaces
The replace
edit is very prolific requiring the substitution of every letter in the typo for each letter in the alphabet. It also requires the right segment not to be empty (as something needs to be replaced). Because we're nesting the processing of each letter inside the processing of every typo character we need to apply flatMap
rathen than map
:

For our little friend asok the above replaces()
implementation yields the equivalent of:
edit1
: Applying All Edits Hoping for Valid Words
To cover all our bases we need to run and then concatenate all four edits. For this we define, in the Edits
object, an Iterable
as follows:
Note the type of ALL_EDITS
: it's a List<...>
rather than an Iterable<...>
. This is done so as to enable us to reference each edit by index on ALL_EDITS
:
The grouping of edits in a List<(Iterable<WordSplit>) -> Iterable<String>>
is an instance of a common functional pattern where code is treated as data. Here, we have a list of functions mapping from Iterable<WordSplit>
to Iterable<String>
(which is, precisely, the type of transformation all of our four edits implement). In this context Kotlin anonymous functions shine as they, effectively, provide us with a data-centric view of executable code.
We make use of this list of functions to implement edit1
as follows:
Because each edit returns a nested
Iterable<String>
we need to useflatMap
instead ofmap
.
Because the four edits are independent of one another we can (and, indeed, should) parallelize their execution.
On the bright side, Kotlin doesn’t requires us to “take the stream()
" of a collection as Java does. Functional functions such as filter
, map
and flatMap
can be invoked directly on all subtypes of Iterable
.
On the not-so-bright side, Kotlin doesn’t directly provide the equivalent of Java’s Stream.parallelStream()
method.
We can, however, take advantage of Kotlin’s (still experimental) coroutines to enable concurrent execution of lambdas passed to flatMap
(and potentially any other functional operation including map
and filter
):
Note the heading
Iterable<A>
type prepended topFlatMap
. This is an instance of extension functions described below.
Here, each edit (represented here by the mapper
function) is run on a separate thread and the overall pFlatMap()
function waits for all of them to complete before returning itself.
Given this function we can now parallelize edit execution as in:
What is this trailing pack()
function at the end? It doesn’t appear on Kotlin’s Iterable
documentation…
As it happens, edit1
is not yet done when all edits have been concatenated.
Once edit1
generates all possible edits we need to screen them for duplicates as well as for actual appearance in the dictionary. We also need to order the resulting valid words so the most frequently used ones are shown first. This is all achieved by function pack()
as follows:
Note here, again, the pack()
function is preprended by Iterable<String>.
What does this mean?
Kotlin provides a very powerful construct called extension functions. Extension functions enable developers to augment the repertoire of functions exposed by any type as if they had been declared in their original implementation.
This is why we can append a trailing pack(dictionary)
to the collection transformation embodied in edit1
(and edit2
below). Note the body of pack()
above uses this
to refer to its target augmented type.
Coincidentally, pFlatMap
is also an extension function used to type-safely augment the list of functions exposed by Iterable
.
edit2
: When edit1
Falters
Despite generating lots of possible suggestions, edit1
can fail to yield a dictionary word In this case we assume our typo stems not from one but two transcription mistakes.
edit2
is quite simply defined in terms of edit1
as follows:
Here we see our friend pack()
in action once again to ensure word uniqueness, validity and order.
Again, we use flatMap
(rather than map
) because we generate candidate suggestions from the nested edit1
results which are themselves of type Iterable<String>
.
Putting it All Together
We’re almost there now. Our workhorse method getCorrections()
method looks as follows:
Note the return type is Iterable<String>?
. What on Earth is that trailing ?
sign?
Unlike other JVM languages like Java itself, Scala or Xtend, Kotlin does not try to “fix” null
(Hoare's billion-dollar mistake).
Rather, Kotlin embraces the fact that null
is a non-replaceable part of the JVM definition and provides what it calls null
safety: a controlled way to make use of null
to indicate the absence of value that is simpler and more intuitive than Java's (and Guava's) Optional<>
or Scala's Option[]
.
Rules are very simple: by default you cannot return null with impunity. If you do intend to return null
as an indication of value absence you must explicitly annotate your function's signature with a trailing question mark sign (Iterable<String>?
above).
In our case, we need to express the absence of suggestions not because the typo doesn’t match anything known but, on the contrary, because it was handed a valid dictionary word for which the notion of suggestion simply doesn’t make sense. In this scenario we return null
.
It could be argued that returning emptyList()
would be appropriate to indicate such inapplicability.
In reality, however, it is perfectly possible for a word absent from the dictionary (.i.e, a typo) not to resemble any known word. In this case we still need to make it explicit we’re dealing with a typo. Thus, emptyList()
is not semantically equivalent to null
.
We there yet?
Well, almost ;-)
One last interesting bit is Kotlin’s companion object concept where an object implicitly named as its containing class “accompanies” the class in a similar fashion to Java’s static
fields and methods. In Kotlin, by the way, there is no concept of static
class members. If you need to achieve static
functionality then the class's companion object is the way to go as both the class and the companion object's internals are visible to each other.
For our SpellingCorrector
class its companion object looks like:
Final words about Kotlin
The SpellingCorrector
and Edits
Kotlin implementations add up to 200 lines including comments and empty lines. Their neutron star-dense incarnation comes down to only 75 lines!
Compare that to Norvig’s feat of packing the whole thing in 44 lines of (commented) Python code and the horror of Java taking up over 300 lines of code (120 in neutron star form).
Java’s verbosity is precisely one of the most important reason alternative JVM languages have emerged. In the next 2 posts we’ll see how the Scala and Xtend fare in terms of economy of expression and clarity of intent. So far Kotlin is an uncontested winner over Java.
Next in our plate: Implementing Norvig’s Algo in Scala (in the works).