This week’s featured Interview Practice challenge was groupsOfAnagrams. It’s a Twitter interview question, and they’re known for their difficult technical interviews! If you haven’t solved it yet, go ahead and do that first, then head back here and we’ll discuss how to get to a solution with a good runtime.
…Done? Okay, let’s dig into this coding challenge!
The technical interview challenge
You are given a list of n words. Two words are grouped together if they are anagrams of each other. For this interview question, our task is to determine how many groups of anagrams there are.
For example, if the list is

words = [listen, admirer, tea, eta, eat, ate, silent, tinsel, enlist, codesignal, married]

then groupsOfAnagrams(words) should return 4, because there are 4 groups:

- listen, silent, tinsel, and enlist are all anagrams of each other.
- tea, eta, eat, and ate are all anagrams of each other.
- married and admirer are anagrams of each other.
- codesignal is not an anagram of anything in words except itself.
In the challenge, we are guaranteed that each word consists only of lowercase letters, so we don't need to worry about the rules regarding spaces and punctuation. Each word is guaranteed to be at most 10 letters. The hard part is that we have to do this on lists containing up to 10^5 words!
First attempt: Determining if two words are anagrams
There are a few standard ways to determine whether or not two words are anagrams of each other. One way is to go through each letter of the alphabet and check that it appears the same number of times in each word.
import string

alphabet = string.ascii_lowercase

def areAnagrams(word1, word2):
    for char in alphabet:
        if word1.count(char) != word2.count(char):
            return False
    return True
What is the runtime of this algorithm? Suppose a typical word has L characters. Calling word.count(char) asks the computer to scan word and count how many times the character char appears, which is an O(L) operation. Technically, this makes areAnagrams an O(L) algorithm, because we repeat the count once per letter of the alphabet, and the alphabet size doesn't depend on the size of the word. However, big O notation suppresses the really bad constants hiding in the runtime! Our words have at most 10 letters, and some of those may be repeats, so at least 16 of our 26 loop iterations are wasted looking for letters that don't appear at all. If we call the number of letters in the alphabet A, then the runtime of this algorithm is O(AL).
Asymptotic, or "big O", notation hides constants. Treating the size of the alphabet as a parameter is a useful way to estimate how large those constants are. That is particularly true in this case, where L is guaranteed to be less than the alphabet size A.
If the interviewer questions why you're doing this (after all, English is almost certainly going to have 26 lowercase letters for our lifetimes!), you have the chance to impress them by saying that you're considering what happens if punctuation is allowed, or if the company decides to support large Unicode character sets. This shows that you're forward thinking and have design on your mind, two things it's important to emphasize in technical interviews.
Basically, if using a constraint in the problem helps you significantly simplify it, feel free to use that constraint. And if you need time while working on your solution, or are doing something odd like treating the number of letters in the alphabet as a variable, you can always defend your move as considering the design implications of the constraint!
We can significantly reduce the runtime of our solution by going through each letter in word1 one at a time: that's at most 10 letters, rather than 26. Here's one way of implementing it:
# A faster anagram checker
def areAnagrams(word1, word2):
    cntLetters = {}
    # Count occurrences in word1 as positive
    for char in word1:
        cntLetters[char] = cntLetters.get(char, 0) + 1
    # Remove occurrences in word2
    for char in word2:
        if char not in cntLetters:
            return False
        cntLetters[char] = cntLetters[char] - 1
    # cntLetters should only contain 0s, because
    # each letter should occur the same number of
    # times.
    for diff_in_count in cntLetters.values():
        if diff_in_count != 0:
            return False
    # All the letters occurred the same number of times
    return True
While the code is longer, and it’s still O(L), it is no longer O(AL). Note that we are no longer assuming the strings are only made of lowercase letters!
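As an aside, Python's standard library can do this counting for us. Here is a minimal sketch using collections.Counter (an alternative, not the implementation used in the rest of this post):

```python
from collections import Counter

def areAnagrams(word1, word2):
    # Two words are anagrams exactly when their letter counts match;
    # Counter equality checks that every key has the same count.
    return Counter(word1) == Counter(word2)
```

This has the same O(L) behavior as the hand-rolled version, with the counting done in optimized library code.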
Approach 1: Works, but too slow…
We have an anagram checker that will tell us if any pair of words are anagrams.
# This is too slow :(
# METHOD A
def groupsOfAnagrams(words):
    # Make a copy of words, so the caller doesn't
    # lose their list
    tmp_words = words[:]
    groups = 0
    while len(tmp_words) > 0:
        # If the word survived this far, it isn't part
        # of any existing group
        new_word = tmp_words.pop()
        groups += 1
        # Remove all words that are anagrams of the new word
        tmp_words = [w for w in tmp_words if not areAnagrams(w, new_word)]
    return groups
What is the runtime here? We run through the loop O(N) times, where N is the number of words. If we are lucky, we remove a lot of words early. However, if each word is not an anagram of any other word, the while loop runs exactly N times. Inside the loop, we scan the remaining words for anagrams, so this is an O(N^2) algorithm. Since we can have up to 10^5 words in a list, O(N^2) probably won't work. Sure enough, this code doesn't pass the tests on CodeSignal, and it's not an answer that will knock the interviewer's socks off during a technical interview. We can do better!
Approach 2: Precomputing invariants
We are going to find an invariant that all the words in an anagram group share with each other, but don't share with any word outside the group. We will use these invariants as keys in a hash table with dummy values. At the end, we will simply count the number of keys in the hash table.
What invariants can we use? A simple one is the number of a's, the number of b's, and so on. Any two words that are anagrams will have the same invariant, and any two words that aren't anagrams will have different invariants.
# METHOD B
import string

alphabet = string.ascii_lowercase

# Our first invariant is a tuple of tuples with the number
# of times each letter that appears occurs, e.g.
# word = "worddd" -> (('d', 3), ('o', 1), ('r', 1), ('w', 1))
def calcInvariant(word):
    cntLetter = [0] * len(alphabet)
    for letter in word:
        # Add one more occurrence of this letter
        cntLetter[ord(letter) - ord('a')] += 1
    invariant = tuple((letter, cntLetter[index])
                      for index, letter in enumerate(alphabet)
                      if cntLetter[index] > 0)
    return invariant
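To see the invariant in action, here is a self-contained sanity check (with the Python 3 fixes applied) showing that anagrams map to the same tuple:

```python
import string

alphabet = string.ascii_lowercase

def calcInvariant(word):
    # Count occurrences of each lowercase letter
    cntLetter = [0] * len(alphabet)
    for letter in word:
        cntLetter[ord(letter) - ord('a')] += 1
    # Keep only the letters that actually appear
    return tuple((letter, cntLetter[index])
                 for index, letter in enumerate(alphabet)
                 if cntLetter[index] > 0)

print(calcInvariant("eat"))     # (('a', 1), ('e', 1), ('t', 1))
print(calcInvariant("ate"))     # (('a', 1), ('e', 1), ('t', 1))
print(calcInvariant("worddd"))  # (('d', 3), ('o', 1), ('r', 1), ('w', 1))
```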
For each word, this is an O(A + L) process, as we iterate once through the letters of the word and once through the alphabet. The gain is going to occur because we only calculate it once per word.
Our algorithm is then:
# METHOD B & C
def groupsOfAnagrams(words):
    hash_of_invariants = {}
    for w in words:
        invariant = calcInvariant(w)
        # We don't actually care about the value;
        # we just care how many distinct
        # invariants there are.
        hash_of_invariants[invariant] = 1
    # Return the number of distinct invariants
    return len(hash_of_invariants)
Because calcInvariant is O(A + L), where L is the number of letters in the word, this algorithm has time complexity O(N(A + L)). Note that we never compare invariants to each other. Instead, we use the magic of hash tables to assign the value 1 to the appropriate entry: if the key already exists, we overwrite its value; if it doesn't, a new entry is created. It was the pairwise comparison between words that gave us our problematic O(N^2).
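The overwrite behavior is easy to verify in isolation. In this toy sketch (the keys are stand-in sorted-letter tuples, not the Method B invariants), inserting the same key twice leaves exactly one entry:

```python
# Each assignment is an O(1) expected-time hash-table insert;
# reusing a key overwrites the old entry instead of adding a new one.
hash_of_invariants = {}
hash_of_invariants[('a', 'e', 't')] = 1  # invariant of 'eat'
hash_of_invariants[('a', 'e', 't')] = 1  # 'ate': same key, overwritten
hash_of_invariants[('a', 'n', 't')] = 1  # 'tan': new key
print(len(hash_of_invariants))  # 2
```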
A cleaner, faster, invariant
There is another natural invariant for the words: sorting the letters that occur in the words. We could use:
# METHOD C
# calcInvariant('eat')   = 'aet'
# calcInvariant('ate')   = 'aet'
# calcInvariant('apple') = 'aelpp'
def calcInvariant(word):
    return "".join(sorted(word))
This has the advantage of being easy to understand, and it makes no assumptions about the underlying alphabet. It relies on sorting, so it is O(L log L) instead of O(A + L). (Technically we can't compare algorithms using big O notation like this, because we don't know the constants involved.) The sorted string can then be used as the invariant. In fact, the code for groupsOfAnagrams doesn't have to change at all!
This is faster because sorting our short words is much quicker than building the count-based invariant. If the words were long (e.g. we were comparing novels), then compressing the string down to character counts would win. When accessing the hash table hash_of_invariants, we also need to compute the hash of each key, which depends on the key's length. Since L ≤ 10, each key is at most 10 characters long, so the sorting invariant is both elegant and understandable.
Dictionaries where we don’t care about the value: Sets
In our hash table, we didn’t really care about the values assigned to the keys. We were only interested in the number of distinct keys we had, as that told us how many groups of anagrams we had.
There is nothing wrong with this, except that someone reading your code may wonder what the significance of the 1 is when you wrote:
...
hash_of_invariants[invariant] = 1
...
The comments certainly help. But Python has a built-in data structure for exactly this case, called a set. If we made a set of invariants instead, we would simply have:
...
set_of_invariants.add(invariant)
...
There is no visible dummy value that might confuse future maintainers of your code. Under the hood, the Python set is implemented as a dictionary (hash table) with hidden dummy values and a few optimizations, so the real reason for the switch is readability. Because hash tables are used more frequently than sets, there is even a valid argument that the hash table version is more readable for more programmers.
The final solution
Here is our final, compact solution! Any interviewer would be happy to see this solution during a technical interview.
# METHOD D: using sets
def groupsOfAnagrams(words):
    invariants = ["".join(sorted(w)) for w in words]
    # Make a set out of invariants
    # (i.e. a hash table whose keys are the entries in
    # invariants, with hidden dummy values)
    unique_invariants = set(invariants)
    return len(unique_invariants)
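Running this final version on the example from the top of the post confirms the expected answer:

```python
def groupsOfAnagrams(words):
    # The invariant of a word is its letters in sorted order
    invariants = ["".join(sorted(w)) for w in words]
    return len(set(invariants))

words = ["listen", "admirer", "tea", "eta", "eat", "ate",
         "silent", "tinsel", "enlist", "codesignal", "married"]
print(groupsOfAnagrams(words))  # 4
```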
Timing all the methods
In an interview, you always have to consider the runtime. As you could see, a lot of the effort in creating a solution for this interview question was finding an algorithm that was quick enough! I timed the algorithms for two cases:
1. Our first example, words = [listen, admirer, tea, eta, eat, ate, silent, tinsel, enlist, codesignal, married]
2. A pathological example, where each word was its own group of 10 letters, starting with abcdefghij, abcdefghik, …, to abgilnoz (this is a total of 10^5 words)
I didn't include the raw times, as that tells you more about my computer than the algorithm. Instead I normalized to the time for Method D on the list words (i.e. Method D has a time of 1 by definition).
| Method | Description | Time for words (relative to D on words) | Time for 10^5 words (relative to D on words) |
| --- | --- | --- | --- |
| A | O(N^2) method | 5.4 | Too long to time |
| B | Precalculated tuple invariant, hash table | 6.4 | 70300 |
| C | Sorted words, hash table | 1.2 | 10450 |
| D | Sorted words, set | 1.0 | 7000 |
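A normalized comparison along these lines can be produced with the standard timeit module. This is a sketch of the methodology, not the exact harness used for the table:

```python
import timeit

# Method D, the baseline everything else is normalized against
def groupsOfAnagrams(words):
    return len({"".join(sorted(w)) for w in words})

words = ["listen", "admirer", "tea", "eta", "eat", "ate",
         "silent", "tinsel", "enlist", "codesignal", "married"]

# Time Method D on the small list; each other method would be timed
# the same way and divided by this baseline to normalize.
baseline = timeit.timeit(lambda: groupsOfAnagrams(words), number=1000)
```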
Tell us…
Did your solution for this Interview Practice challenge look different than mine? How did you approach the problem? Let us know over on the CodeSignal forum!