Parsing Data with LLMs

Introduction: Moving to Real-World Data Tasks

Welcome! Up to this point, you have been learning how to use large language models (LLMs) for basic coding tasks. In this unit, we will take things a step further and explore how LLMs can help with more complex, professional tasks — specifically, parsing and analyzing structured data.

You do not need to be a programmer to follow along. This course is designed to be approachable for everyone, including those without a technical background. However, as we start working with more complex data, the code the LLM generates will become more involved, and the model may not get things right on the first try. That’s normal! You will learn how to refine your prompts and regenerate responses as needed. We will also cover debugging in future lessons.

Let’s get started by understanding what data parsing is and why it’s useful.

What Is Data Parsing and Why Use LLMs?

Data parsing is the process of taking structured data — like a table, a CSV file, or an XML document — and extracting useful information from it. This is a common task in many jobs, from business analysis to software development.

For example, you might have a CSV file with employee records and want to find the average salary, the most common job title, or the country with the most employees. LLMs can help automate these tasks by generating code or even providing direct answers, saving you time and effort.
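For example, the script an LLM returns for these employee questions might look something like the minimal Python sketch below. It is an illustration under assumptions: the column names (name, job_title, country, salary) are hypothetical, and a few made-up rows are embedded as a string so the script runs on its own.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Made-up sample rows; the column names are assumptions for illustration only.
SAMPLE_CSV = """name,job_title,country,salary
Ana,Engineer,Spain,72000
Ben,Analyst,Canada,58000
Chloe,Engineer,Canada,81000
Dev,Manager,India,95000
"""

def summarize(csv_text):
    """Return (average salary, most common job title, country with most employees)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # CSV values arrive as strings, so convert salary before averaging
    average_salary = mean(float(row["salary"]) for row in rows)
    # most_common(1) returns a list like [("Engineer", 2)]; ties go to the value seen first
    top_title = Counter(row["job_title"] for row in rows).most_common(1)[0][0]
    top_country = Counter(row["country"] for row in rows).most_common(1)[0][0]
    return average_salary, top_title, top_country

avg, title, country = summarize(SAMPLE_CSV)
print(f"Average salary: {avg:.2f}")
print(f"Most common job title: {title}")
print(f"Country with most employees: {country}")
```

For the sample rows, this prints an average salary of 76500.00, Engineer as the most common job title, and Canada as the country with the most employees. With a real file, you would read its text (for example, from a hypothetical employees.csv) and pass that to summarize instead.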

Here are some common data formats you might encounter:

CSV (Comma-Separated Values): Used for spreadsheets and simple databases.
XML (eXtensible Markup Language): Used for structured documents and data exchange.
JSON (JavaScript Object Notation): Used for APIs and web data.
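To see how the three formats differ in practice, here is a small Python sketch that reads the same made-up employee record from each one using only the standard library (csv, xml.etree.ElementTree, and json). The record, its field names, and the XML element names are invented for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same made-up employee record expressed in each format.
CSV_TEXT = "name,country,salary\nAna,Spain,72000\n"
XML_TEXT = "<employees><employee><name>Ana</name><country>Spain</country><salary>72000</salary></employee></employees>"
JSON_TEXT = '[{"name": "Ana", "country": "Spain", "salary": 72000}]'

def from_csv(text):
    # DictReader uses the header row as keys; every value comes back as a string
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def from_xml(text):
    root = ET.fromstring(text)
    # Each <employee> element becomes a dict mapping child tag -> child text
    return [{child.tag: child.text for child in emp} for emp in root.findall("employee")]

def from_json(text):
    # json.loads keeps numbers as numbers, unlike CSV and XML
    return json.loads(text)

for parser, text in [(from_csv, CSV_TEXT), (from_xml, XML_TEXT), (from_json, JSON_TEXT)]:
    print(parser.__name__, parser(text))
```

CSV and XML hand back every value as text, so a script would need int() or float() before doing arithmetic on salary; JSON keeps the number as a number.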

LLMs are helpful here because they can quickly generate code to parse these formats, even if you are not a coding expert.

How to Provide Data Context in Prompts

When asking an LLM to help parse data, you do not need to provide the entire dataset. Instead, you can include a small sample of the data. This gives the model enough context to understand the structure and content while keeping your prompt clear and focused.

It’s also important to format your prompt so the LLM can easily tell which part is the data, which part is your instructions, and which part holds your constraints. XML-style formatting is a good option: wrap each part in its own pair of tags, such as <data> and </data>, so it is clear where each part begins and ends.

Let’s look at how you might present a data sample in your prompt:
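A data sample is usually just the header and the first few rows of the file, placed between tags. The Python sketch below builds one from CSV text; the helper name take_sample, the three-row default, the <data> tag name, and the sample rows are illustrative choices rather than requirements.

```python
import csv
import io
from itertools import islice

def take_sample(csv_text, n_rows=3):
    """Return the header plus the first n_rows data rows as CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(islice(reader, n_rows))  # stops after n_rows; later rows are never parsed
    out = io.StringIO()
    # lineterminator="\n" replaces the csv module's default "\r\n" line endings
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

# Made-up rows standing in for a larger employee file.
FULL_CSV = """name,job_title,country,salary
Ana,Engineer,Spain,72000
Ben,Analyst,Canada,58000
Chloe,Engineer,Canada,81000
Dev,Manager,India,95000
"""

# Wrap the sample in tags so the model can tell the data apart from the instructions
print("<data>\n" + take_sample(FULL_CSV) + "</data>")
```

Reading through csv.reader instead of splitting on newlines keeps quoted fields that contain line breaks in one piece, so the sample stays valid CSV.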

By including just a few rows, you give the LLM enough information to understand the data’s structure and the types of values it contains.

Designing Prompts for Data Parsing Tasks: Context

Now, let’s build a step-by-step prompt to help the LLM generate a script for parsing data.

Step 1: Start with the Data Sample

First, include a snippet of your data inside clear tags. Important: mention the source of the data, such as the file name, so the model knows what input the generated code should read.
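In Python terms, this step amounts to putting a line that names the source in front of the tagged sample. The helper context_section, the sentence wording, and the description of employees.csv below are hypothetical; any phrasing that tells the model where the data comes from serves the same purpose.

```python
def context_section(sample, source):
    """Wrap a data sample in <data> tags, preceded by a line naming its source."""
    return (
        f"The data below is a sample from {source}.\n"
        "<data>\n"
        f"{sample.rstrip()}\n"  # drop trailing newlines so the closing tag sits right after the data
        "</data>"
    )

# Hypothetical sample and source description, for illustration only.
sample = "name,job_title,country,salary\nAna,Engineer,Spain,72000\n"
print(context_section(sample, "employees.csv, an HR export with one row per employee"))
```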

Designing Prompts for Data Parsing Tasks: Instructions

Next, tell the LLM exactly what you want to do with the data. For example:

Designing Prompts for Data Parsing Tasks: Constraints

Consider including the following constraints in your prompt:

Provide me with fully runnable code – asks the model for a ready-to-run solution that you shouldn't need to modify yourself.
Only create a simple python script, nothing extra – some models, especially Claude, tend to overcomplicate the requested code, and you might get a solution that is hard to use because of unnecessary extra functions. This constraint keeps the solution focused on your request.
Explain the algorithm in simple words – useful if you want to understand the code the model generated. Models usually do an excellent job of explaining their own code.

Full Prompt Example

Here is a complete example of a well-structured prompt for parsing data with an LLM:
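As a sketch of how the three parts fit together, the Python function below assembles one prompt for the employee example. The tag names, the build_prompt helper, the file name employees.csv, the sample rows, and the instruction wording are all hypothetical; the constraints are the three listed above. Printing prompt shows the text you would paste into the chat.

```python
def build_prompt(source, sample, instructions, constraints):
    """Assemble a prompt with separate <data>, <instructions>, and <constraints> sections."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Here is a sample from {source}:\n"
        f"<data>\n{sample.rstrip()}\n</data>\n\n"
        f"<instructions>\n{instructions.strip()}\n</instructions>\n\n"
        f"<constraints>\n{constraint_lines}\n</constraints>"
    )

prompt = build_prompt(
    source="employees.csv",  # hypothetical file name
    sample=(
        "name,job_title,country,salary\n"
        "Ana,Engineer,Spain,72000\n"
        "Ben,Analyst,Canada,58000\n"
    ),
    instructions=(
        "Write a Python script that reads the full file and prints the average salary, "
        "the most common job title, and the country with the most employees."
    ),
    constraints=[
        "Provide me with fully runnable code.",
        "Only create a simple python script, nothing extra.",
        "Explain the algorithm in simple words.",
    ],
)
print(prompt)
```

Keeping the data, instructions, and constraints as separate arguments makes it easy to swap in a different sample or add a constraint without rewriting the rest of the prompt.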

This prompt clearly defines the data sample, instructions, and constraints, making it easy for the LLM to understand and respond accurately.

Summary and What’s Next

In this lesson, you learned:

What data parsing is and why it is useful.
How to provide a clear data sample to an LLM.
How to structure your prompt with data, instructions, and constraints to get accurate, valuable results.

You are now ready to practice designing your own prompts for data parsing tasks. In the next section, you will get hands-on experience by working through exercises that build on what you learned here. Remember, it’s normal if the LLM doesn’t get it right the first time — refining your prompt is part of the process. Good luck, and have fun experimenting!
