Lesson 5
Practical Data Manipulation Techniques with Kotlin
Topic Overview and Importance

Hello and welcome! Today, we're exploring practical data manipulation techniques in Kotlin. We'll use Kotlin List collections to represent our data stream and perform projection, filtering, and aggregation. And here's the star of the show: our operations will be neatly packaged within a Kotlin class! No mess, all clean code.

Introduction to Data Manipulation

Data manipulation is akin to being a sculptor but for data. We chisel and shape our data to get the desired structure. Kotlin lists are perfect for this, and our operations will be conveniently bundled inside a Kotlin class. So, let's get our toolbox ready! Here's a simple Kotlin class, DataStream, that will serve as our toolbox:

Kotlin
class DataStream(private val data: List<Map<String, Any?>>)
Data Projection in Practice

Our first stop is data projection. Think of it as capturing a photo of our desired features. Suppose we have data about people, and we're only interested in their names and ages; we project our data to include just these details. We'll extend our DataStream class with a projectData method for this:

Kotlin
class DataStream(private val data: List<Map<String, Any?>>) {

    fun projectData(keys: List<String>): DataStream {
        val projectedData = data.map { entry ->
            entry.filterKeys { it in keys }
        }
        return DataStream(projectedData)
    }

    fun dataToString(): String {
        return data.joinToString(separator = ", ", prefix = "[", postfix = "]") {
            it.entries.joinToString(separator = ", ", prefix = "{", postfix = "}") { entry ->
                "${entry.key}=${entry.value}"
            }
        }
    }
}

fun main() {
    // Let's use it!
    val ds = DataStream(
        listOf(
            mapOf("name" to "Alice", "age" to 25, "profession" to "Engineer"),
            mapOf("name" to "Bob", "age" to 30, "profession" to "Doctor")
        )
    )
    val projectedDs = ds.projectData(listOf("name", "age"))
    println(projectedDs.dataToString()) // Outputs: [{name=Alice, age=25}, {name=Bob, age=30}]
}

As you can see, projectData gives us a new DataStream whose entries hold just the names and ages!
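One detail worth seeing in isolation is what projection does when a record lacks a requested key. The sketch below is a minimal illustration, assuming a standalone helper named project and sample data in which Bob has no age; neither is part of the lesson's class. It uses a Set for the key lookup, which is a design choice of this sketch rather than a requirement.

```kotlin
// Hypothetical standalone helper that mirrors projectData; not part of the lesson's class.
fun project(
    records: List<Map<String, Any?>>,
    keys: Set<String>
): List<Map<String, Any?>> =
    // filterKeys keeps only the entries whose key is in `keys`.
    // A requested key that a record lacks is simply absent from the result.
    records.map { record -> record.filterKeys { it in keys } }

// Assumed sample data: Bob's record has no "age" entry.
val samplePeople: List<Map<String, Any?>> = listOf(
    mapOf("name" to "Alice", "age" to 25, "profession" to "Engineer"),
    mapOf("name" to "Bob", "profession" to "Doctor")
)

// Alice keeps name and age; Bob keeps only name.
val projectedPeople = project(samplePeople, setOf("name", "age"))
```

Because missing keys are dropped silently, the projected entries may not all have the same shape, which is worth keeping in mind before filtering or aggregating on those keys.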

Data Filtering in Practice

Next, we have data filtering, which is like cherry-picking our preferred data entries. We'll extend our DataStream class with a filterData method that uses a "predicate" function to filter data:

Kotlin
class DataStream(private val data: List<Map<String, Any?>>) {

    // ... other methods ...

    fun filterData(predicate: (Map<String, Any?>) -> Boolean): DataStream {
        val filteredData = data.filter(predicate)
        return DataStream(filteredData)
    }
}

fun main() {
    // Applying it:
    val ds = DataStream(
        listOf(
            mapOf("name" to "Alice", "age" to 25, "profession" to "Engineer"),
            mapOf("name" to "Bob", "age" to 30, "profession" to "Doctor")
        )
    )
    val ageTest: (Map<String, Any?>) -> Boolean = { it["age"] as Int > 26 }
    val filteredDs = ds.filterData(ageTest)
    println(filteredDs.dataToString()) // Outputs: [{name=Bob, age=30, profession=Doctor}]
}

With the filterData method, our output contains only Bob's data, as he's the only one who passes the "age over 26" test.
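The predicate above uses an unchecked cast (as Int), which throws if a record has no age or stores it as something other than an Int. Here is a small sketch of a more forgiving predicate, assuming a hypothetical helper named isIntAbove and made-up sample records with missing or malformed ages. It treats such records as failing the test, which is one possible policy among several.

```kotlin
// Hypothetical helper: true only when `key` holds an Int greater than `threshold`.
fun isIntAbove(record: Map<String, Any?>, key: String, threshold: Int): Boolean {
    // `as? Int` yields null instead of throwing for a missing key or a non-Int value.
    val value = record[key] as? Int ?: return false
    return value > threshold
}

// Assumed sample data with a missing age and a non-Int age.
val mixedPeople: List<Map<String, Any?>> = listOf(
    mapOf("name" to "Alice", "age" to 25),
    mapOf("name" to "Bob", "age" to 30),
    mapOf("name" to "Carol"),                    // no "age" key
    mapOf("name" to "Dave", "age" to "unknown")  // "age" is a String
)

// Only Bob has an Int age above 26; Carol and Dave are skipped rather than crashing.
val olderThan26 = mixedPeople.filter { isIntAbove(it, "age", 26) }
```

The same helper could be passed to the lesson's class as ds.filterData { isIntAbove(it, "age", 26) }.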

Data Aggregation in Practice

Last is data aggregation, where we condense our data into a summary. We will add an aggregateData method to our DataStream class for this:

Kotlin
class DataStream(private val data: List<Map<String, Any?>>) {

    // ... other methods ...

    fun aggregateData(key: String, aggFunc: (List<Int>) -> Double): Double {
        val values = data.mapNotNull { it[key] as? Int }
        return aggFunc(values)
    }
}

fun main() {
    // Let's put it to use
    val ds = DataStream(
        listOf(
            mapOf("name" to "Alice", "age" to 25, "profession" to "Engineer"),
            mapOf("name" to "Bob", "age" to 30, "profession" to "Doctor")
        )
    )
    val averageAge = ds.aggregateData("age") { ages -> ages.average() }
    println(averageAge) // Outputs: 27.5
}

With this script, we get the average age of Alice and Bob, which is 27.5.
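Since the aggregation function is just a parameter, the same method shape works for summaries other than the average. The sketch below is an illustration under assumptions: aggregate is a hypothetical standalone version of aggregateData, and the sample data is made up. It also shows an edge case: when no Int values are found for the key, the list passed to the function is empty, and average() on an empty list returns NaN.

```kotlin
// Hypothetical standalone version of aggregateData that works on a plain list.
fun aggregate(
    records: List<Map<String, Any?>>,
    key: String,
    aggFunc: (List<Int>) -> Double
): Double {
    // Keep only values that are actually Ints; missing or non-Int values are skipped.
    val values = records.mapNotNull { it[key] as? Int }
    return aggFunc(values)
}

// Assumed sample data.
val agePeople: List<Map<String, Any?>> = listOf(
    mapOf("name" to "Alice", "age" to 25),
    mapOf("name" to "Bob", "age" to 30)
)

// Sum of ages: 25 + 30.
val totalAge = aggregate(agePeople, "age") { it.sum().toDouble() }

// Largest age; falls back to 0 if the list is empty (a choice made for this sketch).
val maxAge = aggregate(agePeople, "age") { (it.maxOrNull() ?: 0).toDouble() }

// No record has a "salary" key, so the value list is empty and average() is NaN.
val missingAverage = aggregate(agePeople, "salary") { it.average() }
```

If NaN is not a useful result for your case, the lambda can check for an empty list first and return a default of your choosing.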

Combining Projection, Filtering, and Aggregation

Now, let's combine projection, filtering, and aggregation to see the collective power of these techniques. We'll extend our example to demonstrate this flow:

  1. Data Projection: Choose only the desired fields.
  2. Data Filtering: Filter the data based on certain conditions.
  3. Data Aggregation: Summarize the filtered data.

We'll put all three methods into our DataStream class and then use them together in a workflow. As in the earlier sections, projectData and filterData each return a new DataStream instance rather than a raw list, so the result of one step can be passed straight to the next, while aggregateData ends the flow by returning a single Double:

Kotlin
class DataStream(private val data: List<Map<String, Any?>>) {

    fun projectData(keys: List<String>): DataStream {
        val projectedData = data.map { entry ->
            entry.filterKeys { it in keys }
        }
        return DataStream(projectedData)
    }

    fun filterData(predicate: (Map<String, Any?>) -> Boolean): DataStream {
        val filteredData = data.filter(predicate)
        return DataStream(filteredData)
    }

    fun aggregateData(key: String, aggFunc: (List<Int>) -> Double): Double {
        val values = data.mapNotNull { it[key] as? Int }
        return aggFunc(values)
    }
}

fun main() {
    // Example usage
    val ds = DataStream(
        listOf(
            mapOf("name" to "Alice", "age" to 25, "profession" to "Engineer", "salary" to 70000),
            mapOf("name" to "Bob", "age" to 30, "profession" to "Doctor", "salary" to 120000),
            mapOf("name" to "Carol", "age" to 35, "profession" to "Artist", "salary" to 50000),
            mapOf("name" to "David", "age" to 40, "profession" to "Engineer", "salary" to 90000)
        )
    )

    // Step 1: Project the data to include only 'name', 'age', and 'salary'
    val projectedDs = ds.projectData(listOf("name", "age", "salary"))

    // Step 2: Filter the projected data to include only those with age > 30
    val filteredDs = projectedDs.filterData { it["age"] as Int > 30 }

    // Step 3: Aggregate the filtered data to compute the average salary
    val averageSalary = filteredDs.aggregateData("salary") { salaries -> salaries.average() }
    println(averageSalary) // Outputs: 70000.0
}

Here:

  • Projection: We choose only the name, age, and salary fields from our data. The projectData method returns a DataStream object, allowing us to chain multiple operations.
  • Filtering: We filter the projected data to include only those persons whose age is greater than 30, which leaves Carol and David. The filterData method also returns a DataStream object for chaining.
  • Aggregation: We calculate the average salary of the filtered data. The final output shows the average salary for those aged over 30, which is 70,000.

By combining these methods, our data manipulation becomes both powerful and concise. Try experimenting and see what you can create!
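Because projectData and filterData each return a DataStream, the three steps can also be written as one chained expression with no intermediate variables. The sketch below repeats a compact copy of the class so it stands alone; the names staff and chainedAverageSalary are illustrative, and the filter uses a safe cast with a default of 0, which is an assumption of this sketch rather than the lesson's code.

```kotlin
// Compact copy of the lesson's DataStream so this sketch is self-contained.
class DataStream(private val data: List<Map<String, Any?>>) {
    fun projectData(keys: List<String>): DataStream =
        DataStream(data.map { entry -> entry.filterKeys { it in keys } })

    fun filterData(predicate: (Map<String, Any?>) -> Boolean): DataStream =
        DataStream(data.filter(predicate))

    fun aggregateData(key: String, aggFunc: (List<Int>) -> Double): Double =
        aggFunc(data.mapNotNull { it[key] as? Int })
}

// Same four records as in the lesson's combined example.
val staff = DataStream(
    listOf(
        mapOf("name" to "Alice", "age" to 25, "profession" to "Engineer", "salary" to 70000),
        mapOf("name" to "Bob", "age" to 30, "profession" to "Doctor", "salary" to 120000),
        mapOf("name" to "Carol", "age" to 35, "profession" to "Artist", "salary" to 50000),
        mapOf("name" to "David", "age" to 40, "profession" to "Engineer", "salary" to 90000)
    )
)

// One chained expression: project, then filter, then aggregate.
// Carol and David pass the age filter, so the average is (50000 + 90000) / 2.
val chainedAverageSalary = staff
    .projectData(listOf("name", "age", "salary"))
    .filterData { (it["age"] as? Int ?: 0) > 30 }
    .aggregateData("salary") { it.average() }
```

Whether to chain or keep named intermediate values is mostly a readability choice: chaining is shorter, while named steps are easier to inspect with dataToString while debugging.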

Lesson Summary

Brilliant job! You've now grasped the basics of data projection, filtering, and aggregation on Kotlin List collections. Plus, you've learned to package these operations in a Kotlin class — a neat bundle of reusable code magic!

Now, why not try applying these fresh skills with some practice exercises? They're just around the corner. Ready? Let's dive into more fun with data manipulation!
