Lesson 5
Practical Data Manipulation Techniques

Welcome to Practical Data Manipulation Techniques! In this unit, we’ll combine and build upon everything you've learned about data transformation in Ruby. You’ll work through techniques for filtering, projecting, and aggregating data, using methods like map, select, sum, and reduce. By the end, you’ll know how to harness these methods to analyze and summarize data effectively.

Let’s dive in!

Setting Up Our Dataset

Throughout this unit, we’ll work with a structured dataset to apply and combine the techniques you’ve learned. Here’s an array of hashes representing individuals with different attributes:

Ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

This dataset will be the foundation as we explore data manipulation techniques.
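As a quick orientation, here is a minimal sketch of how individual fields can be read from this structure. The dataset is repeated so the snippet runs on its own. The variable names and the 'email' key are illustrative only; 'email' is deliberately a key the dataset does not contain.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

first_person = data.first

# Hash#[] returns the value for a key, or nil when the key is absent.
first_name = first_person['name']
missing_field = first_person['email'] # 'email' is not in the dataset

# Hash#fetch raises KeyError for a missing key instead of returning nil.
first_salary = first_person.fetch('salary')

# map collects one field from every entry.
names = data.map { |entry| entry['name'] }
```

Because the keys are strings, they must be looked up as strings: entry[:name] would return nil for this dataset.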

Selecting Specific Fields (Data Projection)

Data projection is used to select specific fields from each entry in a dataset. Let’s say we only want to see each person's name and profession:

Ruby
projected_data = data.map do |entry|
  # Keep only the key/value pairs whose key is in the wanted list.
  entry.select { |key, _| ['name', 'profession'].include?(key) }
end

puts projected_data
# Output (one hash per line; Ruby 3.4+ adds spaces around =>):
# {"name"=>"Alice", "profession"=>"Engineer"}
# {"name"=>"Bob", "profession"=>"Doctor"}
# {"name"=>"Carol", "profession"=>"Artist"}
# {"name"=>"David", "profession"=>"Engineer"}

In this example:

  1. map iterates through each person in the dataset.
  2. select extracts only the name and profession fields.

The result is an array of hashes containing only the projected fields.
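The same projection can also be sketched with Hash#slice, which is available from Ruby 2.5 onward. This is an alternative to the select-based version above, not a replacement for it; the dataset is repeated so the snippet is self-contained.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# slice returns a new hash containing only the listed keys (Ruby 2.5+).
projected_with_slice = data.map { |entry| entry.slice('name', 'profession') }

# The original entries are not modified.
original_keys = data.first.keys
```

slice silently skips keys that an entry does not have, so it behaves the same as the select version when a field is missing.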

Filtering Data Based on Conditions

Filtering allows you to keep only the data that matches specific conditions. Let’s select only the individuals who are aged 30 or older:

Ruby
filtered_data = data.select { |entry| entry['age'] >= 30 }

puts filtered_data
# Output (one hash per line; Ruby 3.4+ adds spaces around =>):
# {"name"=>"Bob", "age"=>30, "profession"=>"Doctor", "salary"=>120000}
# {"name"=>"Carol", "age"=>35, "profession"=>"Artist", "salary"=>50000}
# {"name"=>"David", "age"=>40, "profession"=>"Engineer", "salary"=>90000}

Here:

  1. select filters entries where the age is 30 or above.
  2. The result contains only entries matching this age criterion.
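When both the matching and the non-matching entries are needed, reject and partition are related options. The sketch below applies them to the same age condition; the variable names are illustrative, and the dataset is repeated so the snippet runs on its own.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# reject is the complement of select: it keeps entries where the block is falsy.
under_thirty = data.reject { |entry| entry['age'] >= 30 }

# partition returns two arrays in one pass: [matching, non_matching].
thirty_plus, younger = data.partition { |entry| entry['age'] >= 30 }

thirty_plus_names = thirty_plus.map { |entry| entry['name'] }
younger_names = younger.map { |entry| entry['name'] }
```

Here thirty_plus_names holds Bob, Carol, and David, while younger_names holds only Alice.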

Combining Projection and Filtering

By combining projection and filtering, we can create a more refined view of our data. For instance, let’s retrieve only the name and salary of people who are engineers and over the age of 30:

Ruby
projected_filtered_data = data
  .select { |entry| entry['profession'] == 'Engineer' && entry['age'] > 30 }
  .map { |entry| entry.select { |key, _| ['name', 'salary'].include?(key) } }

puts projected_filtered_data
# Output (Ruby 3.4+ adds spaces around =>):
# {"name"=>"David", "salary"=>90000}

In this example:

  1. select filters for people who are engineers and over 30.
  2. map then projects only their name and salary.
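The filter-then-project steps above can also be sketched as a single pass with filter_map, assuming Ruby 2.7 or later. filter_map keeps only the truthy block results, so returning nil for a non-matching entry drops it. Hash#slice (Ruby 2.5+) is used here for the projection as a stand-in for the nested select.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

senior_engineers = data.filter_map do |entry|
  # `next` with no value yields nil, which filter_map discards.
  next unless entry['profession'] == 'Engineer' && entry['age'] > 30

  # Matching entries are projected down to name and salary.
  entry.slice('name', 'salary')
end
```

The two-step select and map version is often easier to read; filter_map mainly avoids building the intermediate array.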

Aggregating Data with Basic Calculations

Aggregation techniques allow us to summarize data by calculating sums, averages, counts, and more. Let’s explore a few common aggregation tasks.

Finding the Total Salary

We can calculate the total salary by extracting the salary field for each person and summing it:

Ruby
total_salary = data.map { |entry| entry['salary'] }.sum

puts "Total Salary: $#{total_salary}"
# Output: Total Salary: $330000

Here:

  • map extracts each person’s salary.
  • sum calculates the total of all salaries.
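As a variation on the two steps above, sum also accepts a block, which folds the extraction and the addition into one call. The sketch below shows this form and combines it with a filter; engineer_total is an illustrative name, and the dataset is repeated so the snippet is self-contained.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# sum with a block adds up the block's return value for each entry.
total_salary = data.sum { |entry| entry['salary'] }

# Filtering first restricts the total to one profession.
engineer_total = data
  .select { |entry| entry['profession'] == 'Engineer' }
  .sum { |entry| entry['salary'] }
```

With this dataset, total_salary is 330000 and engineer_total is 160000 (70000 + 90000).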

Calculating the Average Age

To find the average age, sum all the ages and divide by the total count:

Ruby
average_age = data.map { |entry| entry['age'] }.sum.to_f / data.size

puts "Average Age: #{average_age}"
# Output: Average Age: 32.5

This code:

  • Maps the dataset to ages.
  • Sums the ages and divides by the number of people to find the average.
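The calculation above can be wrapped in a small helper. The sketch below uses a hypothetical method named average (not part of Ruby's standard library) and assumes that returning nil for an empty list is acceptable; without the guard, dividing 0.0 by 0 would produce NaN.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# Hypothetical helper: returns the mean as a Float, or nil for an empty list.
def average(values)
  return nil if values.empty?

  # to_f forces floating-point division; 130 / 4 would otherwise give 32.
  values.sum.to_f / values.size
end

average_age = average(data.map { |entry| entry['age'] })
average_of_nothing = average([])
```

The to_f call matters: with integer operands, Ruby's / performs integer division and would truncate 32.5 to 32.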

Maximum and Minimum Salaries

Using max and min, we can find the highest and lowest salaries:

Ruby
max_salary = data.map { |entry| entry['salary'] }.max
min_salary = data.map { |entry| entry['salary'] }.min

puts "Highest Salary: $#{max_salary}"
puts "Lowest Salary: $#{min_salary}"
# Output:
# Highest Salary: $120000
# Lowest Salary: $50000

This example uses map to get all salaries, then max and min to find the highest and lowest values.
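When the whole record is wanted rather than just the number, max_by and min_by return the entry itself, and minmax returns both extremes in one call. The sketch below illustrates these related methods on the same dataset; the variable names are illustrative.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# max_by / min_by compare entries by the block's value and return the entry.
highest_paid = data.max_by { |entry| entry['salary'] }
lowest_paid = data.min_by { |entry| entry['salary'] }

# minmax returns [minimum, maximum] as a two-element array.
min_salary, max_salary = data.map { |entry| entry['salary'] }.minmax
```

Here highest_paid is Bob's entry and lowest_paid is Carol's, matching the 120000 and 50000 values found above.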

Advanced Aggregation with reduce

The reduce method (also called inject) is useful for custom aggregations. Let’s use it to count the number of people who are 21 or older:

Ruby
1adult_count = data.map { |entry| entry['age'] }.reduce(0) do |count, age| 2 count + (age >= 21 ? 1 : 0) 3end 4 5puts "Number of adults: #{adult_count}" 6# Output: Number of adults: 4

Here:

  1. We first map the ages.
  2. reduce accumulates a count of people aged 21 or over.
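For a simple count like the one above, count with a block gives the same result more directly. reduce becomes more useful when the accumulator is something other than a single number. The sketch below shows both: the count shortcut, and a reduce that totals salaries per profession. The grouping by profession is an illustrative extension, not part of the lesson's original example.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# count with a block counts the entries for which the block is truthy.
adult_count = data.count { |entry| entry['age'] >= 21 }

# reduce with a hash accumulator; Hash.new(0) makes missing keys start at 0.
salary_by_profession = data.reduce(Hash.new(0)) do |totals, entry|
  totals[entry['profession']] += entry['salary']
  totals # the block must return the accumulator for the next iteration
end
```

Forgetting to return the accumulator at the end of the block is a common mistake with reduce, because the block's last value becomes the next iteration's accumulator.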

Chaining with `then` for Clarity

Ruby’s then method can help in chaining operations, making code easier to read. Here’s an example where we chain selection, projection, and averaging of salaries for engineers over 25:

Ruby
average_salary_engineers = data
  .select { |entry| entry['profession'] == 'Engineer' && entry['age'] > 25 }
  .map { |entry| entry['salary'] }
  .then { |salaries| salaries.sum / salaries.size.to_f if salaries.any? }

# Only David matches: Alice is exactly 25, so age > 25 excludes her.
puts "Average Salary for Engineers over 25: $#{average_salary_engineers}"
# Output: Average Salary for Engineers over 25: $90000.0

This example:

  1. Filters for engineers older than 25.
  2. Projects only the salary field.
  3. Uses then to calculate the average salary, adding readability by separating the final calculation step.
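The same pipeline can be sketched as a reusable method. average_salary_for is a hypothetical helper name, and 'Pilot' is a made-up profession used only to show the empty case. Note that then requires Ruby 2.6 or later.

```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# Hypothetical helper: average salary for one profession, or nil if nobody matches.
def average_salary_for(entries, profession)
  entries
    .select { |entry| entry['profession'] == profession }
    .map { |entry| entry['salary'] }
    # then passes the whole array to the block and returns the block's result.
    .then { |salaries| salaries.empty? ? nil : salaries.sum.to_f / salaries.size }
end

engineer_average = average_salary_for(data, 'Engineer')
pilot_average = average_salary_for(data, 'Pilot') # no pilots in the dataset
```

Unlike map, which calls its block once per element, then calls its block once with the entire receiver, which is why it suits a final summarizing step.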

Lesson Summary

In Practical Data Manipulation Techniques, we’ve brought together essential methods for transforming, filtering, and aggregating data in Ruby. You’ve learned to:

  • Project specific fields with map and select.
  • Filter data to include only entries meeting certain criteria.
  • Aggregate data using sum, reduce, max, and min, and chain a final calculation step with then.

With these combined techniques, you’re well-prepared to process, analyze, and summarize data efficiently in Ruby. Dive into the exercises to reinforce your skills—happy coding!
