Judge Whether the Data Can Be Trusted

Every business decision relies on data, but not all data is created equal. Sometimes, new data sources can offer game-changing insights, but it’s not always clear if they can be trusted. The most effective approach is to be neither blindly skeptical nor overly trusting—instead, evaluate each data set on its own merits. Some data is clearly reliable, some is clearly not, and much of it falls somewhere in between, usable with caution if you understand its limitations. In this unit, you’ll learn practical skills outlined in the HBR Guide to Analytics Basics for Managers to assess whether the data in front of you is trustworthy enough to inform your decisions.

Four Essential Checks for Data Quality

To determine if data is reliable, focus on four key aspects: completeness, consistency, timeliness, and coverage.

  • Completeness means checking for missing values or records—if half your customer entries lack email addresses, for example, you can’t use email as a reliable contact method.
  • Consistency is about making sure numbers add up across different sources and time periods; if "total sales" in one report doesn’t match another, that’s a red flag.
  • Timeliness asks whether the data is current enough to inform your decision—using last year’s numbers for this quarter’s choices can lead you astray.
  • Coverage ensures the data represents the full group you care about; if your "customer satisfaction" survey only includes your most loyal users, your results will be skewed.

A quick way to spot issues is to run a simple summary, such as "How many unique users do we have this month?" If the answer seems off, pause and investigate before moving forward.
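As a concrete sketch, the four checks and the quick sanity summary can be run in a few lines of plain Python. The records, field names, and dates below are invented for illustration; in practice you would point the same logic at your own exported data:

```python
from datetime import date, timedelta

# Toy customer records; field names and values are purely illustrative.
records = [
    {"id": 1, "email": "a@example.com", "last_updated": date(2024, 3, 28)},
    {"id": 2, "email": None,            "last_updated": date(2024, 3, 29)},
    {"id": 3, "email": "c@example.com", "last_updated": date(2023, 6, 1)},
    {"id": 3, "email": "c@example.com", "last_updated": date(2023, 6, 1)},  # duplicate row
]

# Completeness: what fraction of records is missing an email address?
missing_email = sum(1 for r in records if not r["email"]) / len(records)

# Consistency: do two "sources" agree on the total?
report_a_total = len(records)
report_b_total = 3  # e.g. a count pulled from a second system
consistent = report_a_total == report_b_total

# Timeliness: how many records were updated in the last 90 days?
today = date(2024, 3, 30)
fresh = sum(1 for r in records
            if today - r["last_updated"] <= timedelta(days=90))

# Quick sanity summary: "How many unique users do we have?"
unique_users = len({r["id"] for r in records})

print(f"missing email: {missing_email:.0%}")
print(f"totals consistent: {consistent}")
print(f"fresh records: {fresh}/{len(records)}")
print(f"unique users: {unique_users}")
```

Here the duplicate row makes the two totals disagree and the unique-user count differ from the raw row count, which is exactly the kind of "answer seems off" signal worth pausing on.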

Spotting Bias, Gaps, and Collection Issues

Even when data looks “clean,” it can still mislead if it’s biased or incomplete. Ask yourself whether certain groups are missing; maybe your mobile app data only covers iOS users, leaving out Android behavior. Consider whether any data was lost or never collected, such as a system outage that left a week’s gap in your records. A helpful habit is to ask two questions: "Who or what might be missing from this data?" and "Could the way we collected this data have shaped the results?" They will help you avoid drawing the wrong conclusions from incomplete or biased information.
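One way to surface a collection gap like a system outage is to compare the dates you actually have against the dates you expect to have. This sketch uses a hypothetical week-long March outage; the dates are invented for illustration:

```python
from datetime import date, timedelta

# Days on which we actually received tracking events; a hypothetical
# outage from March 11-17 left a week-long hole in the data.
days_seen = {date(2024, 3, d) for d in range(1, 32)} \
            - {date(2024, 3, d) for d in range(11, 18)}

# Every day we *expected* data for, inclusive of both endpoints.
start, end = date(2024, 3, 1), date(2024, 3, 31)
expected = {start + timedelta(days=i) for i in range((end - start).days + 1)}

# The difference is the gap to flag when sharing the numbers.
missing_days = sorted(expected - days_seen)
print(f"{len(missing_days)} missing days, "
      f"from {missing_days[0]} to {missing_days[-1]}")
```

A check like this turns "I think we lost some data in March" into a precise statement you can attach to the report, which is the transparency Victoria models in the dialogue below.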

It’s common to encounter situations where the trustworthiness of your data is unclear, especially when new data sources are involved or when unexpected results appear. In these moments, it’s important to have open conversations with colleagues to probe for possible issues, clarify what’s missing, and decide how to proceed. The following example shows how a team might work through these questions together:

  • Chris: I pulled the churn numbers for last quarter, but the total seems lower than what we expected.
  • Victoria: Did you check if all customer segments are included? Sometimes our enterprise clients get left out of the main report.
  • Chris: Good point. I just realized the export only covers self-serve accounts.
  • Victoria: That could really skew the results. Also, do we have data for the entire quarter?
  • Chris: There was a week in March when the tracking system was down, so we’re missing some data there too.
  • Victoria: Let’s flag those gaps when we share the numbers. We can still show the trend, but we should be clear about what’s missing.

In this exchange, Victoria demonstrates how to probe for missing segments and data gaps, and models transparency about limitations. Notice how she asks specific questions and encourages Chris to communicate the data’s constraints, rather than ignoring them.

Deciding When Data Is “Good Enough” to Use

Before trusting new data, consider where it came from and how it was created. Data produced under a strong data quality program—with clear accountabilities, input controls, and error correction—deserves more trust. If you’re unsure, do some research: find out which team or organization created the data, and check their reputation for quality. Ask colleagues for their experiences with this data source, and look for any documented data quality statistics or known issues.

Perfect data is rare, so you’ll need to judge when it’s fit for purpose. If the stakes are low or the decision is easily reversible, you might accept some gaps—"We’re missing 5% of records, but the trend is clear." For high-impact or hard-to-reverse decisions, push for better data or use proxies, such as supplementing missing churn data with support ticket trends. Always be transparent about limitations; for example, "This analysis covers 90% of users; results may not reflect the remaining 10%." Being upfront about what the data can and cannot support builds trust and helps others make informed choices.
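The "fit for purpose" judgment can even be written down as a crude, explicit rule: tolerate larger gaps when the decision is low-stakes or reversible, demand better data otherwise. The thresholds and labels below are invented placeholders, not a standard; the point is to make the judgment visible rather than implicit:

```python
def assess_fitness(missing_fraction: float, reversible: bool,
                   lax_limit: float = 0.10, strict_limit: float = 0.02) -> str:
    """Illustrative decision rule only: how much missing data is tolerable
    depends on how easily the decision can be reversed."""
    limit = lax_limit if reversible else strict_limit
    if missing_fraction <= limit:
        # Still be transparent about any gap, however small.
        return "use with caveats" if missing_fraction > 0 else "use"
    return "improve data or find a proxy"
```

For example, with 5% of records missing, a reversible decision might proceed with caveats, while a hard-to-reverse one would send you back for better data or a proxy source.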

When you encounter intriguing results from new or partially trusted data, isolate those findings and repeat your quality checks. Sometimes, even flawed data can yield valuable insights if you understand and work around its weaknesses. The key is to know where the flaws are, clean what you can, and be ready to back off if the data simply isn’t good enough. In the upcoming roleplay, you’ll get a chance to practice explaining data limitations and suggesting practical next steps to a concerned stakeholder.
