Welcome to the second lesson of the "Evaluation in DSPy" course. In this lesson, we will explore the foundational concept of data handling using `Example` objects in DSPy. Data handling is a crucial step in any machine learning workflow, and DSPy provides a powerful yet simple way to manage your data through `Example` objects. These objects are used to represent data items in both training and test sets, allowing you to efficiently organize and manipulate your data. By the end of this lesson, you will be equipped with the skills to create and manage `Example` objects, setting a strong foundation for your journey in DSPy.
Let's begin by introducing the `dspy.Example` class, which is central to data handling in DSPy. An `Example` object is similar to a Python dictionary but comes with additional utilities that make it particularly useful for machine learning tasks. To create a basic example object, you can use the following code:
In this example, we create an `Example` object named `qa_pair` with two fields: `question` and `answer`. The `print` statements let us view the entire object as well as access individual fields: printing `qa_pair` shows the whole object with its stored fields, while `qa_pair.question` and `qa_pair.answer` return the individual values.
This demonstrates how `Example` objects can be used to encapsulate data in a structured manner, making it easy to access and manipulate.
`Example` objects are highly flexible and can accommodate multiple fields, allowing you to represent complex data structures. Consider the following example:
Here, we create an `Example` object with multiple fields: `field1`, `field2`, and `field3`. This flexibility is particularly useful when dealing with datasets that have various attributes. Additionally, you can create a list of `Example` objects, as shown with the `trainset`, to represent a collection of data items.
In DSPy, it is important to distinguish between input fields and other types of data. The `with_inputs()` method allows you to specify which fields should be treated as inputs. Let's look at an example:
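In the first call, we mark the `question` field as an input. In the second call, both `question` and `answer` are marked as inputs. It is crucial to identify input fields correctly: fields marked as inputs are what `inputs()` returns, while all remaining fields are treated as labels or metadata. Marking a label such as `answer` as an input is therefore something to do only when you really mean it.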
Accessing and manipulating data within an `Example` object is straightforward. You can use the dot operator to access fields directly. Additionally, the `inputs()` and `labels()` methods allow you to retrieve input and non-input fields separately. Consider the following example:
In this example, we access the `name` field of the `example` object using the dot operator. We then create an `article_summary` object and use the `inputs()` and `labels()` methods to separate input and non-input fields: `inputs()` returns a new `Example` containing only the fields marked as inputs, and `labels()` returns one containing the remaining fields.
This demonstrates the utility of these methods in organizing and preparing data for machine learning models.
In this lesson, we covered the basics of data handling in DSPy using `Example` objects. You learned how to create basic example objects, work with multiple fields, define input fields, and access and manipulate example data. These skills are essential for effectively managing data in DSPy and will serve as a foundation for more advanced topics in the course. As you move on to the practice exercises, I encourage you to experiment with creating and manipulating your own example objects in the CodeSignal IDE. This hands-on practice will reinforce your understanding and prepare you for the next steps in your DSPy journey.
