Now that you understand the analytics stack, let's dive into the critical first step: getting data into your systems. This is called data ingestion: the process of collecting raw data from various sources and moving it to where it can be stored and analyzed.
Think of it as the loading dock of your data warehouse. Everything starts here.
Engagement Message
What's one challenge you'd expect when collecting data from multiple different sources?
There are four main approaches to data ingestion, each with different strengths and weaknesses: batch transfers, API polling, event streaming, and database replication.
You'll see these everywhere, from banking apps to social media feeds. There's no one-size-fits-all—each approach is suited to different business needs and technical constraints.
Engagement Message
Can you think of a system you use every day that probably relies on one of these methods?
Batch file transfers collect data in scheduled chunks, like receiving daily shipments.
API polling actively requests data from live systems, like checking your email every few minutes.
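To make the contrast concrete, here is a minimal sketch of both approaches. Everything in it is hypothetical: `DAILY_FILE` stands in for a file delivered by a scheduled job, and `fetch_since` stands in for a call to a live system's API. A real pipeline would read from actual storage and make real network requests.

```python
import csv
import io

# Hypothetical daily export, standing in for a file dropped by a scheduled job.
DAILY_FILE = "order_id,amount\n1,20.00\n2,35.50\n3,12.25\n"

def ingest_batch(file_text):
    """Batch: load every row of a delivered file in one pass."""
    return list(csv.DictReader(io.StringIO(file_text)))

# Hypothetical live system, standing in for a remote API.
LIVE_ORDERS = [
    {"order_id": 1, "amount": 20.00},
    {"order_id": 2, "amount": 35.50},
]

def fetch_since(last_seen_id):
    """Stand-in for an API call returning records newer than last_seen_id."""
    return [o for o in LIVE_ORDERS if o["order_id"] > last_seen_id]

def poll_once(last_seen_id):
    """Polling: request only new records, then advance the cursor."""
    new_records = fetch_since(last_seen_id)
    if new_records:
        last_seen_id = max(o["order_id"] for o in new_records)
    return new_records, last_seen_id

# Batch: one scheduled load picks up the whole file.
batch_rows = ingest_batch(DAILY_FILE)

# Polling: repeated requests, each picking up only what is new.
first_poll, cursor = poll_once(0)
LIVE_ORDERS.append({"order_id": 3, "amount": 12.25})  # a new order arrives
second_poll, cursor = poll_once(cursor)
```

The batch load sees all rows at once, while each poll returns only the records created since the previous request. In practice `poll_once` would run on a timer, such as every few minutes.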
Engagement Message
Daily sales reports: batch transfers or API polling?
Event streaming captures data the moment it happens, like notifications that ping your phone the instant something occurs.
Database replication automatically copies changes from one database to another, keeping the two synchronized in near real time.
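The sketch below illustrates both ideas in a simplified form. It is an assumption-laden toy, not how any particular product works: an in-memory `Queue` stands in for a message broker, and a plain list named `change_log` stands in for the change feed a real database would expose. All function names are hypothetical.

```python
from queue import Queue

# --- Event streaming: each event is published as it happens ---
events = Queue()          # stand-in for a message broker
received = []             # what the analytics side has seen so far

def publish(event):
    """Producer side: emit an event the moment something occurs."""
    events.put(event)

def consume_available():
    """Consumer side: handle every event currently waiting."""
    while not events.empty():
        received.append(events.get())

publish({"type": "click", "user": "a"})
publish({"type": "purchase", "user": "b"})
consume_available()

# --- Database replication: changes on the primary are replayed on a copy ---
primary = {}
replica = {}
change_log = []           # stand-in for a database's change feed

def write_primary(key, value):
    """Apply a write to the primary and record it for replication."""
    primary[key] = value
    change_log.append((key, value))

def replicate():
    """Replay pending changes, in order, onto the replica."""
    while change_log:
        key, value = change_log.pop(0)
        replica[key] = value

write_primary("customer:1", "Ada")
write_primary("customer:1", "Ada Lovelace")  # later update to the same row
replicate()
```

After `replicate()` runs, the replica matches the primary. In a real system the consumer and the replication process run continuously rather than being called by hand.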
Engagement Message
What is one business scenario where you'd want instant data rather than daily batches?
Latency is the time between when data is created and when it's available for analysis. Batch transfers typically have hours of latency, while streaming typically has seconds or less.
But lower latency usually means higher complexity and cost.
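As a small worked example, latency can be computed by subtracting the creation time from the time the data becomes available. The timestamps here are made up for illustration: a record created at 9:00 AM that is loaded by a hypothetical nightly batch at 2:00 AM the next day, compared with the same record arriving through a stream three seconds later.

```python
from datetime import datetime, timedelta

def latency(created_at, available_at):
    """Time between data creation and availability for analysis."""
    return available_at - created_at

# Illustrative timestamps only.
created = datetime(2024, 1, 1, 9, 0, 0)

# Batch: the record waits for a nightly load at 2:00 AM the next day.
batch_latency = latency(created, datetime(2024, 1, 2, 2, 0, 0))

# Streaming: the record is available three seconds after creation.
stream_latency = latency(created, created + timedelta(seconds=3))

print(f"Batch latency:  {batch_latency}")
print(f"Stream latency: {stream_latency}")
```

With these example timestamps the batch record waits 17 hours, while the streamed record is ready in 3 seconds.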
