Spelling Suggestion on the JVM #3: Implementing Norvig’s Algo in Kotlin

9 min readMar 5, 2018

This tutorial project implements a basic spelling suggestion service. The motivation is didactic: becoming familiar with “better java” languages around a simple but instructional example. Ease of implementation outranks optimal performance so readers can focus on the JVM languages as such. Examples and new concepts are introduced in Java 9 first so they’re immediately understandable to the experienced Java programmer. Crucially, the Java implementation is accompanied by equivalent (but idiomatic) implementations in today’s most relevant alternative JVM languages: Kotlin, Scala and Xtend. Code is available at Spellbound_’s Github repo.

This third post presents an idiomatic Kotlin implementation of Norvig’s spelling corrector.

The next post (Implementing Norvig’s Algo in Scala, in the works) presents an idiomatic Scala implementation of Norvig’s spelling corrector.

The previous post (Implementing Norvig’s Algo in Java) illustrates how to implement Norvig’s spelling corrector in Java 9 following a functional style.

The Dictionary

Everything having to do with spelling correction revolves around a curated list of valid words we refer to as the the dictionary.

In our dictionary implementation we associate every word with an integer rank indicating how frequently the word is used in relation to all other words. The lower the rank, the more frequently used the word is. Thus, for instance, the has rank 1 while triose has rank 106295.

The dictionary declaration is:

Here, we make use of Kotlin’s primary constructor where a class instance is created from one or more values. Kotlin encourages using immutable values; this is reflected above by declaring the dictionary to be an immutable val.

In addition to declaring the dictionary field, this primary constructor also makes it inaccessible from the outside world by means of private. Unlike Java, Kotlin's general default visibility modifier is public; this is very convenient since we strive to make all values immutable.

Transforming the typo into a set of word splits

In addition to good ole’ classes (like the above SpellingCorrector) Kotlin also supports the notion of object: a singleton-like instance that may or may not be associated with a corresponding class or interface. In other words, objects may implement interfaces or extend classes as needed but they can also stand on their own.

Importantly, objects can declare their own inner classes, values, variables, functions, etc.

In our case, we create an object called Edits that contains:

A data class WordSplit for storing left/right word splits
A function to generate word splits given a (typo) word
A list of lambdas transforming a set of word splits into a set of candidate words

On the surface our Edits Kotlin object looks like:

Word Splits

In order to hold word splits we define an inner data class WordSplit as:

Kotlin data classes don’t require getters/setters for their properties (which must all be declared in the data class’s primary constructor as immutable values). A data class also implements sensible equals, hashCode and toString implementations for free!

Armed with this definition we can now formulate the rather trivial implementation of wordSplits():

Here we have an instance of an expression function using no curly braces to surround the function body. Instead, we use the = sign to separate the function's declaration from its one-expression body. This body creates a collection from the range 0 to word.length (inclusive). Note, en passant, that we don't need to use empty parens after length in the expression word.length; in Kotlin, it is a property rather than a function.

The range is, in effect, a collection of integers that can be mapped over to apply a lambda. In Kotlin, lambdas must be surrounded by curly braces (unlike Java, parenthesis just don’t cut it).

One-argument lambdas can use the implicitly defined value it to refer to their sole argument. Thus, above we use it to refer to each subsequent value in the range. In this usage it has type Int and is used as an index to the substring function.

The `ALL_EDITS` list

Even though we could use vanilla functions for each of Norvig’s four edits we purposefully choose to implement them as elements of a list of anonymous functions:

Note the list’s generic type: (Iterable<WordSplit>) -> Iterable<String> which can be read roughly as: given a sequence of WordSplits generate a sequence of Strings.

By storing functions as data in a list we can apply them in a generic fashion to any given collection of word splits:

Here, we actually invoke each edit function by passing the wordSplits argument to it, cool!

The split() method takes a string (the typo) and returns all possible pairs of word splits. Thus, for our beloved but misspelt hero dlbert we get the equivalent of:

By the way, all four edit implementations take an Iterable<WordSplit> argument while returning an Iterable<String>.

IntRange(startInclusive: Int, endInclusive: Int) returns a subtype of Iterable<Int>. In our example above this means all integers between 0 and 6.`

Subsequently, each successive integer value is used as a substring index for splitting the word into a pair of left and right fragments.

Edits #1: deletes

Generating all possible deletes for a word is quite simple:

Here, we traverse an input Iterable<WordPair> selecting only the pairs where right is not empty (which is to say to all splits but the very first one). Next, we concatenate each word split into a string formed by the left part followed by the "tail" of the right one.

Given the typo waly this deletes() anonymous implementation yields the equivalent of:

Edits #2: inserts

Inserts are the opposite of deletes and while their implementation is correspondingly simple:

This inserts() edit uses a LETTERS value of type Iterable<String>. Again for the sake of simplicity we deal only with lowercase ASCII characters. There is no support for diacritics (whose implementation is not so complicated anyways). LETTERS is defined in the Edits object as:

CharRange is a subtype of Iterable<String>.

For our canine friend dgbert the above anonymous implementation of inserts() yields the equivalent of:

Edits #3: transposes

Implementing transposition requires the right fragment to have more than one character in length, which makes sense as transposing involves not one but two letters:

For our heroine alice the above anonymous implementation of transposes yields the equivalent of:

Edits #4: replaces

The replace edit is very prolific requiring the substitution of every letter in the typo for each letter in the alphabet. It also requires the right segment not to be empty (as something needs to be replaced). Because we're nesting the processing of each letter inside the processing of every typo character we need to apply flatMap rathen than map:

For our little friend asok the above replaces() implementation yields the equivalent of:

`edit1`: Applying All Edits Hoping for Valid Words

To cover all our bases we need to run and then concatenate all four edits. For this we define, in the Edits object, an Iterable as follows:

Note the type of ALL_EDITS: it's a List<...> rather than an Iterable<...>. This is done so as to enable us to reference each edit by index on ALL_EDITS:

The grouping of edits in a List<(Iterable<WordSplit>) -> Iterable<String>> is an instance of a common functional pattern where code is treated as data. Here, we have a list of functions mapping from Iterable<WordSplit> to Iterable<String> (which is, precisely, the type of transformation all of our four edits implement). In this context Kotlin anonymous functions shine as they, effectively, provide us with a data-centric view of executable code.

We make use of this list of functions to implement edit1 as follows:

Because each edit returns a nested Iterable<String> we need to use flatMap instead of map.

Because the four edits are independent of one another we can (and, indeed, should) parallelize their execution.

On the bright side, Kotlin doesn’t requires us to “take the stream()" of a collection as Java does. Functional functions such as filter, map and flatMap can be invoked directly on all subtypes of Iterable.

On the not-so-bright side, Kotlin doesn’t directly provide the equivalent of Java’s Stream.parallelStream() method.

We can, however, take advantage of Kotlin’s (still experimental) coroutines to enable concurrent execution of lambdas passed to flatMap (and potentially any other functional operation including map and filter):

Note the heading Iterable<A> type prepended to pFlatMap. This is an instance of extension functions described below.

Here, each edit (represented here by the mapper function) is run on a separate thread and the overall pFlatMap() function waits for all of them to complete before returning itself.

Given this function we can now parallelize edit execution as in:

What is this trailing pack() function at the end? It doesn’t appear on Kotlin’s Iterable documentation…

As it happens, edit1 is not yet done when all edits have been concatenated.

Once edit1 generates all possible edits we need to screen them for duplicates as well as for actual appearance in the dictionary. We also need to order the resulting valid words so the most frequently used ones are shown first. This is all achieved by function pack() as follows:

Note here, again, the pack() function is preprended by Iterable<String>. What does this mean?

Kotlin provides a very powerful construct called extension functions. Extension functions enable developers to augment the repertoire of functions exposed by any type as if they had been declared in their original implementation.

This is why we can append a trailing pack(dictionary) to the collection transformation embodied in edit1 (and edit2 below). Note the body of pack() above uses this to refer to its target augmented type.

Coincidentally, pFlatMap is also an extension function used to type-safely augment the list of functions exposed by Iterable.

`edit2`: When `edit1` Falters

Despite generating lots of possible suggestions, edit1 can fail to yield a dictionary word In this case we assume our typo stems not from one but two transcription mistakes.

edit2 is quite simply defined in terms of edit1 as follows:

Here we see our friend pack() in action once again to ensure word uniqueness, validity and order.

Again, we use flatMap (rather than map) because we generate candidate suggestions from the nested edit1 results which are themselves of type Iterable<String>.

Putting it All Together

We’re almost there now. Our workhorse method getCorrections() method looks as follows:

Note the return type is Iterable<String>?. What on Earth is that trailing ? sign?

Unlike other JVM languages like Java itself, Scala or Xtend, Kotlin does not try to “fix” null (Hoare's billion-dollar mistake).

Rather, Kotlin embraces the fact that null is a non-replaceable part of the JVM definition and provides what it calls null safety: a controlled way to make use of null to indicate the absence of value that is simpler and more intuitive than Java's (and Guava's) Optional<> or Scala's Option[].

Rules are very simple: by default you cannot return null with impunity. If you do intend to return null as an indication of value absence you must explicitly annotate your function's signature with a trailing question mark sign (Iterable<String>? above).

In our case, we need to express the absence of suggestions not because the typo doesn’t match anything known but, on the contrary, because it was handed a valid dictionary word for which the notion of suggestion simply doesn’t make sense. In this scenario we return null.

It could be argued that returning emptyList() would be appropriate to indicate such inapplicability.

In reality, however, it is perfectly possible for a word absent from the dictionary (.i.e, a typo) not to resemble any known word. In this case we still need to make it explicit we’re dealing with a typo. Thus, emptyList() is not semantically equivalent to null.

We there yet?

Well, almost ;-)

One last interesting bit is Kotlin’s companion object concept where an object implicitly named as its containing class “accompanies” the class in a similar fashion to Java’s static fields and methods. In Kotlin, by the way, there is no concept of static class members. If you need to achieve static functionality then the class's companion object is the way to go as both the class and the companion object's internals are visible to each other.

For our SpellingCorrector class its companion object looks like:

Final words about Kotlin

The SpellingCorrector and Edits Kotlin implementations add up to 200 lines including comments and empty lines. Their neutron star-dense incarnation comes down to only 75 lines!

Compare that to Norvig’s feat of packing the whole thing in 44 lines of (commented) Python code and the horror of Java taking up over 300 lines of code (120 in neutron star form).

Java’s verbosity is precisely one of the most important reason alternative JVM languages have emerged. In the next 2 posts we’ll see how the Scala and Xtend fare in terms of economy of expression and clarity of intent. So far Kotlin is an uncontested winner over Java.

Next in our plate: Implementing Norvig’s Algo in Scala (in the works).

Spelling Suggestion on the JVM #3: Implementing Norvig’s Algo in Kotlin

The Dictionary

Transforming the typo into a set of word splits

Word Splits

The `ALL_EDITS` list

Edits #1: deletes

Edits #2: inserts

Edits #3: transposes

Edits #4: replaces

`edit1`: Applying All Edits Hoping for Valid Words

`edit2`: When `edit1` Falters

Putting it All Together

We there yet?

Final words about Kotlin

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Ricardo Rocha

No responses yet

Spelling Suggestion on the JVM #3: Implementing Norvig’s Algo in Kotlin

The Dictionary

Transforming the typo into a set of word splits

Word Splits

The ALL_EDITS list

Edits #1: deletes

Edits #2: inserts

Edits #3: transposes

Edits #4: replaces

edit1: Applying All Edits Hoping for Valid Words

edit2: When edit1 Falters

Putting it All Together

We there yet?

Final words about Kotlin

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Ricardo Rocha

No responses yet

The `ALL_EDITS` list

`edit1`: Applying All Edits Hoping for Valid Words

`edit2`: When `edit1` Falters