Welcome to the first lesson of our course on managing data from different datasets. In today's digital world, it's common to encounter large volumes of data. Understanding how to efficiently manage this data, especially when it's compressed, is crucial. This lesson will focus on handling JSON files contained within a zip archive using TypeScript. By the end, you'll be able to extract, read, and process data stored in compressed formats, building a strong foundation for handling real-world datasets.
Before we dive into zip files, let's briefly recall some essentials about JSON and file I/O operations. JSON, or JavaScript Object Notation, is a lightweight data interchange format. It's easy for humans to read and write and easy for machines to parse and generate. In TypeScript, we interact with JSON data using two built-in methods: `JSON.parse()` converts a JSON string into a value, and `JSON.stringify()` converts a value into a JSON string.
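As a quick refresher, the round trip between an object and its JSON text can be sketched as follows. The `Planet` interface and its values are hypothetical and exist only for this illustration.

```typescript
// A hypothetical record type, used only for illustration.
interface Planet {
  name: string;
  moons: number;
}

const original: Planet = { name: "Mars", moons: 2 };

// Object -> JSON string
const jsonText: string = JSON.stringify(original);

// JSON string -> object. JSON.parse returns `any`, so the annotation
// states the shape we expect; it is not checked at runtime.
const restored: Planet = JSON.parse(jsonText);

console.log(jsonText);       // {"name":"Mars","moons":2}
console.log(restored.moons); // 2
```

Note that the type annotation on `restored` is a promise to the compiler, not a validation step, which is why checking the parsed data matters when the file comes from an external source.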
Zip files are a type of compressed file format that allows you to bundle many files into one. In TypeScript, you work with zip files using the `adm-zip` library. This library provides tools to handle zip files without extracting them to a directory.
Here's how you open a zip file using the `adm-zip` library:
```typescript
import AdmZip from 'adm-zip';

const zipFileName: string = 'universe_data.zip';
const zip = new AdmZip(zipFileName);

console.log("Zip File Opened:", zipFileName);
```
In this example:
- `new AdmZip(zipFileName)` is used to open the zip file.
- It allows you to perform various operations on the zip file, such as extracting or reading its contents.
Once the zip file is open, you can list its contents using the `getEntries()` method from `adm-zip`:
```typescript
const zipEntries = zip.getEntries();
const fileList: string[] = zipEntries.map(entry => entry.entryName);
console.log("Files in the archive:", fileList);
```
Here, `fileList` will contain a list of the names of the files within the zip archive.
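Once you have a list of entry names, a common next step is to narrow it down to the files you care about. The sketch below filters for JSON files by extension. The `entryNames` array is made-up sample data standing in for a real archive listing, and `jsonEntries` is a hypothetical helper name.

```typescript
import * as path from "path";

// Hypothetical entry names, standing in for a real archive listing.
const entryNames: string[] = [
  "stars.json",
  "notes/readme.txt",
  "data.csv",
  "extra/PLANETS.JSON",
];

// Keep only entries whose extension is .json, ignoring case.
function jsonEntries(names: string[]): string[] {
  return names.filter((name) => path.extname(name).toLowerCase() === ".json");
}

console.log(jsonEntries(entryNames)); // [ 'stars.json', 'extra/PLANETS.JSON' ]
```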
Before we proceed with parsing data from the universe dataset, let's discuss the data itself. The dataset contains information about various stars and is provided in JSON format. Each entry in the array corresponds to a star with details like its name, type, and mass. For instance, the mass field may look like `"90.45 × 10^30 kg"`, indicating the mass in scientific notation. Understanding the structure will help us process and analyze the data efficiently.
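To make that structure concrete, here is a minimal sketch of how a single record could be typed and checked at runtime. The field names follow the description above, but treating every field as a string is an assumption, and the sample values are invented for the example.

```typescript
// Assumed shape of one record. The field names come from the description
// above; treating every field as a string is an assumption.
interface Star {
  name: string;
  type: string;
  mass: string; // e.g. "90.45 × 10^30 kg"
}

// Runtime check that an unknown value has the assumed Star shape.
function isStar(value: unknown): value is Star {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["name"] === "string" &&
    typeof record["type"] === "string" &&
    typeof record["mass"] === "string"
  );
}

// Made-up sample values, used only to exercise the check.
const sample: unknown = JSON.parse(
  '{"name": "Example Star", "type": "Supergiant", "mass": "90.45 × 10^30 kg"}'
);

console.log(isStar(sample));              // true
console.log(isStar({ name: "No mass" })); // false
```

A type guard like this lets the compiler treat a value as a `Star` only after the check has passed, which is safer than annotating parsed data as `any[]`.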
Now, let's move on to reading JSON files stored in the zip archive. We begin by reading a specific file from the archive as text and then parsing that text with `JSON.parse()`.
Here's how you access a JSON file within the zip:
```typescript
const starsJson: string = zip.readAsText('stars.json');
const stars: any[] = JSON.parse(starsJson);
```
In this code:
- `zip.readAsText('stars.json')` gives us the text contents of the `stars.json` file in the archive.
- `JSON.parse()` converts the JSON document into a TypeScript object or array we can work with.
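`JSON.parse()` throws a `SyntaxError` on empty or malformed input, so it can help to wrap the parsing step in a small guard. The following is a sketch only: `parseJsonArray` is a hypothetical helper, and the idea that an empty string signals a missing entry is an assumption about `readAsText()` that is worth checking against the adm-zip documentation.

```typescript
// Parse text that is expected to hold a JSON array, failing with a
// descriptive message otherwise. `label` is only used in error messages.
function parseJsonArray(text: string, label: string): unknown[] {
  if (text.trim() === "") {
    // Assumption: empty text usually means the entry was missing or empty.
    throw new Error(`${label} is empty or missing`);
  }
  const parsed: unknown = JSON.parse(text); // throws SyntaxError on malformed JSON
  if (!Array.isArray(parsed)) {
    throw new Error(`${label} does not contain a JSON array`);
  }
  return parsed;
}

// Usage with inline sample text instead of a real archive entry.
const records = parseJsonArray('[{"name": "Example Star"}]', "stars.json");
console.log(records.length); // 1
```

In the lesson's flow, the string returned by `zip.readAsText('stars.json')` would be passed in place of the inline sample text.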
Once we've loaded our JSON data, we can analyze it. Let's sort the stars by their mass to find the top 5 most massive ones.
We'll use the `sort()` method along with a comparison function to sort by mass:
```typescript
// Copy the array first so the original order of `stars` is preserved
const sortedStars = [...stars].sort((a, b) => {
  // Compare the numeric coefficient that precedes the first space
  const massA: number = parseFloat(a.mass.split(' ')[0]);
  const massB: number = parseFloat(b.mass.split(' ')[0]);
  return massB - massA; // descending order
});

const mostMassiveStars = sortedStars.slice(0, 5);
```
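Explanation:
- `sort((a, b) => {...})` orders the stars by the `mass` field in descending order. Each mass string is split on spaces and its leading coefficient is converted to a float, so the comparison is only meaningful when every mass uses the same power of ten.
- `const mostMassiveStars = sortedStars.slice(0, 5)` extracts the top 5 stars by mass.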
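If the masses in a dataset do not all share the same exponent, comparing only the leading number gives the wrong order. The sketch below is one possible way to handle that case, not the lesson's method: it parses the full value, including the power of ten, before sorting. The mass format is assumed from the single example shown earlier, and `parseMassKg`, `mostMassive`, and the sample records are hypothetical.

```typescript
// Convert a string such as "90.45 × 10^30 kg" into a number of kilograms.
// The format is assumed from the one example given in the lesson.
function parseMassKg(mass: string): number {
  const match = mass.match(/^\s*([\d.]+)\s*×\s*10\^(-?\d+)\s*kg\s*$/);
  if (match === null) {
    throw new Error(`Unrecognized mass format: ${mass}`);
  }
  return parseFloat(match[1]) * Math.pow(10, parseInt(match[2], 10));
}

// Return the `count` heaviest items without mutating the input array.
function mostMassive<T extends { mass: string }>(items: T[], count: number): T[] {
  return items
    .slice() // copy, because sort() works in place
    .sort((a, b) => parseMassKg(b.mass) - parseMassKg(a.mass))
    .slice(0, count);
}

// Made-up records with different exponents.
const sampleStars = [
  { name: "A", mass: "90.45 × 10^30 kg" },
  { name: "B", mass: "1.20 × 10^31 kg" },
  { name: "C", mass: "5.00 × 10^29 kg" },
];

console.log(mostMassive(sampleStars, 2).map((s) => s.name)); // [ 'A', 'B' ]
```

Comparing only the coefficients (90.45, 1.20, 5.00) would have ranked C above B here, even though B is the heavier of the two.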
Finally, we'll display the top 5 massive stars:
```typescript
console.log("Top 5 Most Massive Stars:");
mostMassiveStars.forEach((star, index) => {
  console.log(`${index + 1}. ${star.name} - Mass: ${star.mass}`);
});
```
Using a loop, we log each star's name and mass from our sorted list. The output will be formatted in the following manner:
```text
Top 5 Most Massive Stars:
1. Star Name - Mass: 90.45 × 10^30 kg
2. Star Name - Mass: 89.70 × 10^30 kg
3. Star Name - Mass: 88.35 × 10^30 kg
4. Star Name - Mass: 87.90 × 10^30 kg
5. Star Name - Mass: 86.10 × 10^30 kg
```
In addition to JSON files, you might encounter other types of files within a zip archive. Reading text files can be done using the `adm-zip` library as well. The content is returned as a decoded string rather than a raw `Buffer`.
Here's an example of how to read a text file from a zip archive:
```typescript
const textContent: string = zip.readAsText('data.txt');
console.log(textContent);
```
Explanation:
- `zip.readAsText('data.txt')` gives us the decoded string contents of the `data.txt` file.
In addition to JSON and text files, you might encounter CSV files within a zip archive. You can use a CSV parsing library like `csv-parser` with `adm-zip` to handle these files.
Here's how to read a CSV file from a zip archive:
```typescript
import csv from 'csv-parser';
import { Readable } from 'stream';

const csvData: any[] = [];
// readFile returns null when the entry is missing, so guard before use
const csvBuffer: Buffer | null = zip.readFile('data.csv');
if (csvBuffer === null) {
  throw new Error('data.csv not found in the archive');
}

// Wrap the CSV text in a readable stream so csv-parser can consume it
Readable.from(csvBuffer.toString())
  .pipe(csv())
  .on('data', (row) => {
    csvData.push(row); // one object per CSV row, keyed by column header
  })
  .on('end', () => {
    console.log(csvData);
  });
```
Explanation:
- `zip.readFile('data.csv')` gives us the buffer of the `data.csv` file.
- We create a readable stream from the CSV data buffer.
- `csv-parser` is used to parse each row, pushing it to `csvData` for further processing or simply printing them, as demonstrated.
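Because the rows arrive through stream events, `csvData` is only complete once the `'end'` event has fired. One way to make that explicit is to wrap the event handling in a Promise. This is a sketch using only Node's built-in `stream` module: `collectRows` is a hypothetical helper, and the sample stream stands in for the output of `.pipe(csv())`.

```typescript
import { Readable } from "stream";

// Collect every 'data' chunk of a stream into an array and resolve once
// the stream ends. Rejects if the stream emits an error.
function collectRows<T>(stream: Readable): Promise<T[]> {
  return new Promise((resolve, reject) => {
    const rows: T[] = [];
    stream
      .on("data", (row: T) => {
        rows.push(row);
      })
      .on("end", () => resolve(rows))
      .on("error", reject);
  });
}

// Stand-in for a parsed CSV stream: Readable.from emits each array
// element as one chunk.
const sampleRows = Readable.from([{ id: "1" }, { id: "2" }]);

collectRows<{ id: string }>(sampleRows).then((rows) => {
  console.log(rows.length); // 2
});
```

With a helper like this, code that depends on the parsed rows can wait on the returned Promise instead of living inside the `'end'` callback.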
In this lesson, we covered how to use the `adm-zip` library and TypeScript's built-in JSON methods to manage data stored in a compressed format. You learned to open a zip archive, list its contents, read JSON, text, and CSV files from it, and process the data by sorting it based on specific criteria. The skills you've gained here will be invaluable as you tackle more complex data-handling tasks. Now, you're ready to apply these concepts in the upcoming practice exercises, where you'll get hands-on experience with data extraction and analysis from compressed datasets using TypeScript. Let's move forward with confidence.