Scoring Word Guesses with Semantic Similarity and spaCy

Introduction: Making Guesses Smarter

Welcome to the first lesson of our course, “Enhancing the Word Play Game with new functionalities.” In this lesson, we will make our word prediction game more interactive and fair by adding a way to score player guesses. Instead of only rewarding exact matches, we want to give points for guesses that are close in meaning to the correct answer. This will make the game more fun and challenging, and it will feel more like how people actually use language.

By the end of this lesson, you’ll know how to compare two words for similarity and assign a score based on how close they are in meaning. This is a key step in building a smarter, more engaging game.

What Is Semantic Similarity?

Semantic similarity is a way to measure how close two words are in meaning. For example, car and automobile mean almost the same thing, so they are semantically similar. On the other hand, car and banana are not similar at all.

Here’s a simple table to show some examples:

Word 1	Word 2	Are they similar?
car	automobile	Yes
cat	kitten	Somewhat
car	banana	No

In our game, we want to reward players for making guesses that are close in meaning, not just exact matches. This makes the game fairer and more fun.

How Computers Compare Word Meanings

To compare word meanings, computers use something called word vectors or embeddings. You can think of a word vector as a list of numbers that represents the meaning of a word. Words with similar meanings have vectors that are close together.

For example, the word cat might be represented by a vector like [0.2, 0.5, 0.1, ...], and kitten might have a vector that is very close to it. The word banana would have a very different vector.

We can measure how close two vectors are using a mathematical formula called cosine similarity:

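$\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \|\mathbf{B}\|}$

Here $\mathbf{A}$ and $\mathbf{B}$ are the two word vectors, $\mathbf{A} \cdot \mathbf{B}$ is their dot product, and $\|\mathbf{A}\|$ is the length of $\mathbf{A}$. The result ranges from -1 to 1; the closer it is to 1, the more similar the two words are.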
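To make the formula concrete, here is a minimal sketch in plain Python. The function name cosine_similarity and the three-number vectors are made up for illustration; real embeddings have hundreds of dimensions, and returning 0 for an all-zero vector is an assumption of this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: an all-zero vector carries no meaning, so score it as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up toy vectors for illustration only; they are not real embeddings.
cat = [0.2, 0.5, 0.1]
kitten = [0.25, 0.45, 0.1]
banana = [0.9, -0.3, 0.4]

print(cosine_similarity(cat, kitten))  # close to 1
print(cosine_similarity(cat, banana))  # much closer to 0
```

The toy numbers were chosen so that cat and kitten point in nearly the same direction, while banana points elsewhere.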

Building the Guess Scorer Function

Let’s build our guess scorer step by step.

Step 1: Import spaCy and Load the Model

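First, we need to import spaCy and load a model that has word vectors. We’ll use the en_core_web_md model, a medium-sized English pipeline that ships with word vectors. The smaller en_core_web_sm model does not include them, so it is not a good fit for similarity scoring.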

import spacy brings the spaCy library into our code.
nlp = spacy.load('en_core_web_md') loads the English model with word vectors.
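The two lines described above can be sketched as follows. The wrapper function load_vector_model is a hypothetical helper, not part of spaCy; it only wraps the import and the spacy.load call so that a missing model produces a readable hint.

```python
def load_vector_model(name="en_core_web_md"):
    """Load a spaCy pipeline that includes word vectors."""
    # Imported inside the function (a choice of this sketch) so the helper
    # can be defined even before spaCy is installed.
    import spacy

    try:
        return spacy.load(name)
    except OSError as err:
        # spacy.load raises OSError when the named model is not installed.
        raise RuntimeError(
            f"Model {name!r} is not installed; "
            f"try: python -m spacy download {name}"
        ) from err
```

Calling nlp = load_vector_model() once at startup gives the same nlp object the lesson uses. Loading takes a moment, so it is worth doing only once.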

Step 2: Process the Words

Next, we need to process the user’s guess and the correct word using spaCy. This turns each word into a Doc object, the class that spaCy uses to encode a sequence of tokens.

nlp(user_guess.lower()) lowercases the guess (to avoid case mismatches) and then processes it.
nlp(correct_word.lower()) does the same for the correct word.
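As a rough sketch, the two calls might sit together like this. The helper name make_docs is hypothetical, and nlp is assumed to be the pipeline loaded in Step 1, passed in as an argument.

```python
def make_docs(nlp, user_guess, correct_word):
    """Lowercase both words, then run each through the pipeline."""
    guess_doc = nlp(user_guess.lower())
    correct_doc = nlp(correct_word.lower())
    return guess_doc, correct_doc
```

Passing nlp in as a parameter is a design choice of this sketch; it lets the helper be tried out with any callable that accepts a string.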

Step 3: Calculate the Similarity

Now, we can use spaCy’s .similarity() method to compare the two objects.

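This gives us a number that usually falls between 0 and 1, where 1 means the words are identical in meaning and values near 0 mean they are not similar at all. Because the score is a cosine similarity, it can occasionally be slightly negative for unrelated words.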
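One way this step could look is sketched below. The function name raw_similarity is hypothetical. The has_vector check is an added assumption of this sketch: it treats a word the model has no vector for as "no match" instead of asking spaCy to compare it.

```python
def raw_similarity(guess_doc, correct_doc):
    """Return the similarity reported by spaCy for two Doc objects."""
    # Assumed policy: a word without a vector cannot be compared
    # meaningfully, so it scores 0.
    if not (guess_doc.has_vector and correct_doc.has_vector):
        return 0.0
    return float(guess_doc.similarity(correct_doc))
```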

Step 4: Convert the Score to a Percentage

To make the score easier to understand, we multiply it by 100 to get a value between 0 and 100.
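A small sketch of the conversion follows. The name to_percentage is hypothetical, and both the clamping to the 0 to 1 range and the rounding to two decimals are assumptions added here so the result always lands between 0 and 100.

```python
def to_percentage(similarity):
    """Turn a similarity value into a score from 0 to 100."""
    # Assumption: clamp first, so an out-of-range similarity
    # cannot produce a score below 0 or above 100.
    clamped = max(0.0, min(1.0, similarity))
    # Assumption: two decimal places are enough for a game score.
    return round(clamped * 100, 2)
```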

Step 5: Put It All Together

Here is the complete function:
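The original listing is not reproduced here, so what follows is a minimal sketch that combines Steps 2 through 4. The name score_guess, the nlp parameter, the clamping, and the rounding are all assumptions of this sketch.

```python
def score_guess(user_guess, correct_word, nlp):
    """Score a guess from 0 to 100 by its similarity to the correct word."""
    # Step 2: lowercase and process both words.
    guess_doc = nlp(user_guess.lower())
    correct_doc = nlp(correct_word.lower())
    # Step 3: compare the two processed words.
    similarity = guess_doc.similarity(correct_doc)
    # Step 4: clamp to 0..1 (assumption), scale to 0..100, and round.
    return round(max(0.0, min(1.0, similarity)) * 100, 2)
```

Here nlp is expected to be the pipeline loaded in Step 1. Taking it as an argument keeps the function free of global state.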

Let’s see an example:
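The original example and its output are not reproduced here. As a stand-in, this sketch shows a small hypothetical helper, print_scores, for trying several guesses against one target word. Its score_fn argument is assumed to be any function that takes a guess and the correct word and returns a score from 0 to 100.

```python
def print_scores(correct_word, guesses, score_fn):
    """Print each guess with its score, highest first, and return the list."""
    ranked = sorted(
        ((score_fn(guess, correct_word), guess) for guess in guesses),
        reverse=True,
    )
    for score, guess in ranked:
        print(f"{guess!r} vs {correct_word!r}: {score:.2f}")
    return ranked
```

With a spaCy-backed scorer, guesses such as automobile, vehicle, and banana could be compared against car in one call. No scores are shown here because they depend on the model.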

The function gives a higher score for words that are close in meaning.
The output values are just examples; your results may be slightly different depending on the spaCy model.

Summary And Practice Preview

In this lesson, you learned how to make your word prediction game smarter by scoring guesses based on their meaning, not just their spelling. We talked about semantic similarity, word vectors (embeddings), and how to use spaCy to compare words. You also saw how to build a function that gives a score from 0 to 100 for any two words.

This new scoring system will make your game more fun and fair, rewarding players for close guesses. In the next practice exercises, you’ll get hands-on experience using and testing this function. Get ready to see how your game can understand language just a little bit more like a human!

Next Lesson: Creating a Leaderboard for Your Word Prediction Game
