Comparing Model Predictions

Introduction

Welcome back to Building Probability Models! This is the fifth and final lesson of the course, and you have come a long way. You can now define valid probability models, build both uniform and non-uniform versions, and predict expected frequencies for any number of trials. Those skills let you say what a model expects to happen — but how do you know if the model is actually right? That is the question we tackle here. In this lesson, you will learn to place model predictions next to real-world data, decide whether the two are close enough to trust the model, and pinpoint why they might disagree. By the end, you will have a complete four-step evaluation workflow — compute, compare, judge, diagnose — that turns a probability model into a genuine decision-making tool.

Why Predictions and Data Will Never Match Exactly

100

Setting Up the Comparison

$\text{Expected frequency} = P(\text{outcome}) \times n$

Outcome	Model Probability	Expected Frequency	Observed Frequency
Red	$0.333$	$0.333 \times 150 \approx 50$	$54$
Blue	$0.333$	$0.333 \times 150 \approx 50$	$47$
Green	$0.333$	$0.333 \times 150 \approx 50$	$49$
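
The Expected Frequency column can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the original lesson: the function name expected_frequencies and the variable names are hypothetical, and it assumes each color's model probability is exactly 1/3 (the table shows it rounded to 0.333). Using Fraction keeps the arithmetic exact, so each expected count comes out as 50 rather than 49.95.

```python
from fractions import Fraction


def expected_frequencies(model, n_trials):
    """Multiply each outcome's probability by the number of trials."""
    return {outcome: p * n_trials for outcome, p in model.items()}


# Assumption: each color has probability exactly 1/3 (shown as 0.333 in the table).
color_model = {
    "Red": Fraction(1, 3),
    "Blue": Fraction(1, 3),
    "Green": Fraction(1, 3),
}
observed = {"Red": 54, "Blue": 47, "Green": 49}

expected = expected_frequencies(color_model, n_trials=150)
for outcome in color_model:
    print(f"{outcome}: expected {expected[outcome]}, observed {observed[outcome]}")
```

Passing the rounded value 0.333 instead would give 49.95 per color, which is why the table writes the product with an approximately-equal sign.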

Judging Whether the Data Fits the Model

50

Outcome	Expected	Observed	Difference
Red	$50$	$54$	$+4$
Blue	$50$	$47$	$-3$
Green	$50$	$49$	$-1$
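
The Difference column is simply observed minus expected. The sketch below recomputes it from the table values; the dictionary names are illustrative only.

```python
expected = {"Red": 50, "Blue": 50, "Green": 50}
observed = {"Red": 54, "Blue": 47, "Green": 49}

# Difference = observed - expected, so a positive value means
# the outcome occurred more often than the model predicted.
differences = {
    outcome: observed[outcome] - expected[outcome] for outcome in expected
}

for outcome, diff in differences.items():
    print(f"{outcome}: {diff:+d}")

# Both columns total 150, so the signed differences cancel out.
total_difference = sum(differences.values())
print("Sum of differences:", total_difference)
```

A zero total is a quick sanity check on the arithmetic: if the expected and observed columns cover the same number of trials, the signed differences must sum to zero.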

When Discrepancies Are Large: A Contrasting Example

$P(\text{each slot}) = \frac{1}{3}$

Slot	Expected Frequency	Observed Frequency	Difference
Slot A	$100$	$135$	$+35$
Slot B	$100$	$102$	$+2$
Slot C	$100$	$63$	$-37$
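
One possible way to put this table and the earlier color table on the same scale is to express each difference as a percentage of its expected count. This is an assumption made for illustration, not a rule from the lesson, and the helper name percent_differences is hypothetical. The lesson does not prescribe a numeric cutoff, so the sketch only reports the percentages.

```python
def percent_differences(expected, observed):
    """Return each difference as a percentage of the expected count."""
    return {
        outcome: 100 * (observed[outcome] - expected[outcome]) / expected[outcome]
        for outcome in expected
    }


colors = percent_differences(
    {"Red": 50, "Blue": 50, "Green": 50},
    {"Red": 54, "Blue": 47, "Green": 49},
)
slots = percent_differences(
    {"Slot A": 100, "Slot B": 100, "Slot C": 100},
    {"Slot A": 135, "Slot B": 102, "Slot C": 63},
)

for name, table in (("Colors", colors), ("Slots", slots)):
    for outcome, pct in table.items():
        print(f"{name} / {outcome}: {pct:+.0f}% of expected")
```

On this scale the color differences stay within single digits, while Slot A and Slot C are each more than a third away from their expected counts.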

Sources of Discrepancy

When predictions and observations disagree, the natural follow-up is: why? There are four common explanations, and learning to distinguish among them is essential for evaluating any model.

Natural random variation. Even a perfectly accurate model will not predict exact counts. Small differences are expected every time and do not indicate a problem.

Too few trials. With a small sample, results can swing far from the model's predictions simply because the data has not had enough trials to stabilize. Collecting more data often resolves this kind of gap.

Outdated or incorrect model assumptions. The model may have been built on old data or assumptions that no longer hold. In the snack machine example, perhaps Slot A was recently restocked with a trendy new item, breaking the "equally likely" assumption.

Biased data collection. If data is gathered in a way that favors certain outcomes, the observed frequencies will not reflect the true process. For instance, recording snack purchases only during the morning shift might skew results if preferences change throughout the day.

Recognizing which source is most plausible in a given situation is what turns a numerical comparison into a meaningful insight.
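
The first two sources (random variation and too few trials) can be explored with a small simulation. The sketch below is an illustration under stated assumptions: fair_model is a hypothetical three-outcome model that is correct by construction, the function names are made up for this example, and the seed is fixed only so a run is repeatable. It draws outcomes from the model itself and reports the largest gap as a percentage of the expected count for several sample sizes.

```python
import random


def simulate_counts(model, n_trials, seed):
    """Draw n_trials outcomes from the model and count each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outcomes = list(model)
    weights = [model[o] for o in outcomes]
    draws = rng.choices(outcomes, weights=weights, k=n_trials)
    return {o: draws.count(o) for o in outcomes}


def largest_gap_percent(model, counts, n_trials):
    """Largest |observed - expected| as a percentage of the expected count."""
    return max(
        abs(counts[o] - model[o] * n_trials) / (model[o] * n_trials) * 100
        for o in model
    )


# Hypothetical model that is correct by construction.
fair_model = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}

for n in (30, 300, 30000):
    counts = simulate_counts(fair_model, n, seed=1)
    gap = largest_gap_percent(fair_model, counts, n)
    print(n, counts, f"largest gap: {gap:.1f}% of expected")
```

Because the data here comes from the model itself, any gap is pure random variation. Larger samples tend to show smaller percentage gaps, although a single run is not guaranteed to shrink at every step.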

Evaluating a Bus Route Model: A Complete Example

0.10

Outcome	Probability
Early	$0.10$
On Time	$0.65$
Late	$0.25$

Outcome	Probability	Expected Frequency	Observed Frequency	Difference
Early	$0.10$	$0.10 \times 200 = 20$	$18$	$-2$
On Time	$0.65$	$0.65 \times 200 = 130$	$112$	$-18$
Late	$0.25$	$0.25 \times 200 = 50$	$70$	$+20$
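
The bus route table can be rebuilt end to end as a check on the arithmetic. This is a sketch under assumptions, not the lesson's own method: the variable names are hypothetical, the probabilities are parsed as exact fractions to avoid floating-point drift, and the total of 200 is taken from the observed counts. The code covers the compute and compare steps and then surfaces the largest gap. Judging and diagnosing still require human reasoning.

```python
from fractions import Fraction

bus_model = {
    "Early": Fraction("0.10"),
    "On Time": Fraction("0.65"),
    "Late": Fraction("0.25"),
}
observed = {"Early": 18, "On Time": 112, "Late": 70}
n_observations = sum(observed.values())  # 200, matching the table

# A valid probability model must total exactly 1.
assert sum(bus_model.values()) == 1

# Step 1 (compute expected counts) and step 2 (compare with observed).
rows = {}
for outcome, p in bus_model.items():
    expected = p * n_observations
    rows[outcome] = (expected, observed[outcome], observed[outcome] - expected)

# Surface the outcome with the largest absolute gap for closer inspection.
worst = max(rows, key=lambda o: abs(rows[o][2]))

for outcome, (exp, obs, diff) in rows.items():
    print(f"{outcome}: expected {exp}, observed {obs}, difference {int(diff):+d}")
print("Largest gap:", worst)
```

The sketch flags Late, with On Time close behind in the opposite direction, which matches the pattern in the table: fewer on-time outcomes and more late ones than the model predicts.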

Conclusion and Next Steps

In this final lesson of Building Probability Models, you learned a four-step workflow for evaluating a probability model against real data: compute expected frequencies, compare them to observed counts, judge whether the gaps are small enough to chalk up to randomness, and diagnose the most likely source when they are not. The four common sources of discrepancy — natural random variation, too few trials, flawed model assumptions, and biased data collection — give you a practical toolkit for explaining any mismatch. In more advanced statistics, formal tools such as hypothesis tests and confidence intervals can make these judgments more precise, but in this course we focus on clear comparisons between expected and observed counts. Up next, the practice exercises will walk you through every stage of this process. You will calculate expected frequencies, judge whether data fits a model, match scenarios to their most plausible source of discrepancy, and write a complete evaluation of a real-world model — so jump in and put your new skills to work!

