Welcome to Practical Data Manipulation Techniques! In this unit, we’ll combine and build upon everything you've learned about data transformation in Ruby. You’ll work through techniques for filtering, projecting, and aggregating data, using methods like `map`, `select`, `sum`, and `reduce`. By the end, you’ll know how to harness these methods to analyze and summarize data effectively.
Let’s dive in!
Throughout this unit, we’ll work with a structured dataset to apply and combine the techniques you’ve learned. Here’s an array of hashes representing individuals with different attributes:
```ruby
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]
```
This dataset will be the foundation as we explore data manipulation techniques.
Data projection is used to select specific fields from each entry in a dataset. Let’s say we only want to see each person's `name` and `profession`:
```ruby
projected_data = data.map do |entry|
  # Hash#select yields each key and value; keep only the keys we want.
  entry.select { |key, _| ['name', 'profession'].include?(key) }
end

# p prints the array in its inspect form, brackets included.
p projected_data
# Output (wrapped here for readability):
# [
#   {"name"=>"Alice", "profession"=>"Engineer"},
#   {"name"=>"Bob", "profession"=>"Doctor"},
#   {"name"=>"Carol", "profession"=>"Artist"},
#   {"name"=>"David", "profession"=>"Engineer"}
# ]
```
In this example:
- `map` iterates through each person in the dataset.
- `select` extracts only the `name` and `profession` fields.
The result is an array of hashes containing only the projected fields.
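As a possible shorthand for the same projection, `Hash#slice` (available since Ruby 2.5) returns a new hash with only the listed keys. The sketch below uses a two-row excerpt of the sample dataset and compares both approaches; the variable names `sliced` and `selected` are illustrative only.

```ruby
# Two-row excerpt of the lesson's dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 }
]

# Hash#slice keeps only the given keys.
sliced = data.map { |entry| entry.slice('name', 'profession') }

# The select-based projection, for comparison.
selected = data.map do |entry|
  entry.select { |key, _| ['name', 'profession'].include?(key) }
end

puts sliced == selected # prints true
```

Either form works; `slice` tends to read more directly when the list of keys is fixed and known in advance.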
Filtering allows you to keep only the data that matches specific conditions. Let’s select only the individuals who are 30 years or older:
```ruby
filtered_data = data.select { |entry| entry['age'] >= 30 }

# p prints the array in its inspect form, brackets included.
p filtered_data
# Output (wrapped here for readability):
# [
#   {"name"=>"Bob", "age"=>30, "profession"=>"Doctor", "salary"=>120000},
#   {"name"=>"Carol", "age"=>35, "profession"=>"Artist", "salary"=>50000},
#   {"name"=>"David", "age"=>40, "profession"=>"Engineer", "salary"=>90000}
# ]
```
Here:
- `select` filters entries where the `age` is 30 or above.
- The result contains only entries matching this age criterion.
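When the entries that fail the condition are also of interest, `reject` and `partition` are closely related to `select`. The following is a small sketch using the same sample dataset; the variable names are illustrative.

```ruby
# The lesson's sample dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# partition splits the array in one pass: [matching, non-matching].
thirty_plus, under_thirty = data.partition { |entry| entry['age'] >= 30 }

# reject is the complement of select: it drops the matching entries.
also_under_thirty = data.reject { |entry| entry['age'] >= 30 }

p thirty_plus.map { |entry| entry['name'] }  # ["Bob", "Carol", "David"]
p under_thirty.map { |entry| entry['name'] } # ["Alice"]
puts under_thirty == also_under_thirty       # true
```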
By combining projection and filtering, we can create a more refined view of our data. For instance, let’s retrieve only the `name` and `salary` of people who are engineers and over the age of 30:
```ruby
projected_filtered_data = data
  .select { |entry| entry['profession'] == 'Engineer' && entry['age'] > 30 }
  .map { |entry| entry.select { |key, _| ['name', 'salary'].include?(key) } }

# p prints the array in its inspect form, brackets included.
p projected_filtered_data
# Output: [{"name"=>"David", "salary"=>90000}]
```
In this example:
- `select` filters for people who are engineers and over 30.
- `map` then projects only their `name` and `salary`.
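On Ruby 2.7 or newer, the filter and the projection can also be expressed in a single pass with `filter_map`, which keeps only the truthy results of the block. This is an alternative sketch, not a replacement for the chained version; `engineer_summaries` is an illustrative name.

```ruby
# The lesson's sample dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# The block returns the projected hash for matches and nil otherwise;
# filter_map discards the nil results.
engineer_summaries = data.filter_map do |entry|
  entry.slice('name', 'salary') if entry['profession'] == 'Engineer' && entry['age'] > 30
end

p engineer_summaries.map { |entry| entry['name'] } # ["David"]
```

The chained `select` and `map` version keeps the two steps visibly separate, which can be easier to follow when the condition or the projection grows.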
Aggregation techniques allow us to summarize data by calculating sums, averages, counts, and more. Let’s explore a few common aggregation tasks.
We can calculate the total salary by extracting the `salary` field for each person and summing it:
```ruby
total_salary = data.map { |entry| entry['salary'] }.sum

puts "Total Salary: $#{total_salary}"
# Output: Total Salary: $330000
```
Here:
- `map` extracts each person’s `salary`.
- `sum` calculates the total of all salaries.
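`sum` also accepts a block, so the extraction and the addition can happen in one step without building an intermediate array. A minimal sketch with the same sample dataset:

```ruby
# The lesson's sample dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# The block picks the value to add for each entry.
total_salary = data.sum { |entry| entry['salary'] }

puts "Total Salary: $#{total_salary}" # Total Salary: $330000

# Summing an empty collection returns 0 rather than raising.
puts [].sum # 0
```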
To find the average age, sum all the ages and divide by the total count:
```ruby
average_age = data.map { |entry| entry['age'] }.sum.to_f / data.size

puts "Average Age: #{average_age}"
# Output: Average Age: 32.5
```
This code:
- Maps the dataset to ages.
- Sums the ages and divides by the number of people to find the average.
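One edge case worth keeping in mind: with an empty dataset, the calculation above divides `0.0` by `0` and yields `NaN`. The sketch below wraps the average in a hypothetical helper, `average_of` (not part of Ruby or of this lesson), that returns `nil` instead.

```ruby
# The lesson's sample dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# Hypothetical helper: average of a numeric field, or nil when there are no records.
def average_of(records, field)
  return nil if records.empty?

  records.sum { |record| record[field] }.to_f / records.size
end

puts average_of(data, 'age')       # 32.5
puts average_of([], 'age').inspect # nil
```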
Using `max` and `min`, we can find the highest and lowest salaries:
```ruby
max_salary = data.map { |entry| entry['salary'] }.max
min_salary = data.map { |entry| entry['salary'] }.min

puts "Highest Salary: $#{max_salary}"
puts "Lowest Salary: $#{min_salary}"
# Output:
# Highest Salary: $120000
# Lowest Salary: $50000
```
This example uses `map` to get all salaries, then `max` and `min` to find the highest and lowest values.
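If the goal is to know who earns the highest or lowest salary rather than just the amount, `max_by` and `min_by` return the whole entry. The sketch below also shows `minmax`, which returns both extremes from a single call; the variable names are illustrative.

```ruby
# The lesson's sample dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# max_by / min_by compare entries by the block's value and return the entry itself.
top_earner = data.max_by { |entry| entry['salary'] }
lowest_earner = data.min_by { |entry| entry['salary'] }

# minmax returns [smallest, largest] in one call.
min_salary, max_salary = data.map { |entry| entry['salary'] }.minmax

puts "Highest Salary: #{top_earner['name']} ($#{top_earner['salary']})"      # Highest Salary: Bob ($120000)
puts "Lowest Salary: #{lowest_earner['name']} ($#{lowest_earner['salary']})" # Lowest Salary: Carol ($50000)
puts "Range: $#{min_salary} to $#{max_salary}"                               # Range: $50000 to $120000
```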
The `reduce` method (also called `inject`) is useful for custom aggregations. Let’s use it to count the number of people who are 21 or older:
```ruby
adult_count = data.map { |entry| entry['age'] }.reduce(0) do |count, age|
  count + (age >= 21 ? 1 : 0)
end

puts "Number of adults: #{adult_count}"
# Output: Number of adults: 4
```
Here:
- We first map the ages.
- `reduce` accumulates a count of people aged 21 or over.
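For a plain count, `count` with a block gives the same answer more directly. `reduce` becomes more useful when the accumulator is something richer than a number. The sketch below builds per-profession salary totals as one such case; `salary_by_profession` is an illustrative name.

```ruby
# The lesson's sample dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# count with a block tallies the entries for which the block is truthy.
adult_count = data.count { |entry| entry['age'] >= 21 }
puts "Number of adults: #{adult_count}" # Number of adults: 4

# reduce with a hash accumulator; Hash.new(0) makes missing keys start at 0.
salary_by_profession = data.reduce(Hash.new(0)) do |totals, entry|
  totals[entry['profession']] += entry['salary']
  totals # the block's return value becomes the next accumulator
end

salary_by_profession.each do |profession, total|
  puts "#{profession}: $#{total}"
end
# Engineer: $160000
# Doctor: $120000
# Artist: $50000
```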
Ruby’s `then` method can help in chaining operations, making code easier to read. Here’s an example where we chain selection, projection, and averaging of salaries for engineers over 25:
```ruby
average_salary_engineers = data
  .select { |entry| entry['profession'] == 'Engineer' && entry['age'] > 25 }
  .map { |entry| entry['salary'] }
  # Returns nil when no salaries match, avoiding a division by zero.
  .then { |salaries| salaries.sum / salaries.size.to_f if salaries.any? }

puts "Average Salary for Engineers over 25: $#{average_salary_engineers}"
# Output: Average Salary for Engineers over 25: $90000.0
# (Alice is exactly 25, so only David matches; to_f makes the result a Float.)
```
This example:
- Filters for engineers older than 25.
- Projects only the `salary` field.
- Uses `then` to calculate the average salary, adding readability by separating the final calculation step.
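Because the `then` block returns `nil` when nothing matches, callers may want to handle that case explicitly. The sketch below wraps the chain in a hypothetical helper, `average_salary`, whose name and keyword arguments are assumptions made for illustration. It relies on `then`, which is available from Ruby 2.6.

```ruby
# The lesson's sample dataset, repeated so the snippet runs on its own.
data = [
  { 'name' => 'Alice', 'age' => 25, 'profession' => 'Engineer', 'salary' => 70000 },
  { 'name' => 'Bob', 'age' => 30, 'profession' => 'Doctor', 'salary' => 120000 },
  { 'name' => 'Carol', 'age' => 35, 'profession' => 'Artist', 'salary' => 50000 },
  { 'name' => 'David', 'age' => 40, 'profession' => 'Engineer', 'salary' => 90000 }
]

# Hypothetical helper: average salary for a profession above a minimum age,
# or nil when no entry matches.
def average_salary(records, profession:, min_age:)
  records
    .select { |entry| entry['profession'] == profession && entry['age'] > min_age }
    .map { |entry| entry['salary'] }
    .then { |salaries| salaries.sum / salaries.size.to_f if salaries.any? }
end

engineers = average_salary(data, profession: 'Engineer', min_age: 25)
pilots = average_salary(data, profession: 'Pilot', min_age: 25)

# Fall back to a placeholder when there is no average to show.
puts "Engineers: #{engineers ? format('$%.2f', engineers) : 'n/a'}" # Engineers: $90000.00
puts "Pilots: #{pilots ? format('$%.2f', pilots) : 'n/a'}"          # Pilots: n/a
```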
In Practical Data Manipulation Techniques, we’ve brought together essential methods for transforming, filtering, and aggregating data in Ruby. You’ve learned to:
- Project specific fields with `map` and `select`.
- Filter data to include only entries meeting certain criteria.
- Aggregate data using `sum`, `reduce`, `max`, and `min`, and chain a final calculation with `then`.
With these combined techniques, you’re well-prepared to process, analyze, and summarize data efficiently in Ruby. Dive into the exercises to reinforce your skills—happy coding!