Welcome to Parallel Merge Sort with Phaser

Building on your understanding of concurrency, today we’ll implement a parallel merge sort algorithm using Java's Phaser class. In previous lessons, you explored the nuances of thread-safe data structures like an LRU cache. Now, we’ll take that knowledge further to tackle sorting algorithms in a concurrent environment, optimizing performance on multi-core systems.

What You'll Learn

By the end of this lesson, you will understand:

  • The mechanics of parallel merge sort and how it divides and conquers large datasets.
  • How the Phaser class helps synchronize multiple threads.
  • How to implement an efficient parallel merge sort using thread pools and phased synchronization.

This lesson will equip you with the tools to handle large datasets efficiently using parallelism and phased synchronization in Java.

Understanding Parallel Merge Sort with Phaser

Merge sort is a divide-and-conquer algorithm. It breaks down a dataset into smaller subarrays, sorts them, and then merges them back together. The parallel merge sort enhances this process by assigning different subarrays to different threads, allowing multiple parts of the array to be sorted simultaneously.

To coordinate these threads, we use Phaser, a synchronization aid that enables threads to wait for each other at specific points, called phases. This is essential in the parallel merge sort, where threads must wait until all chunks are sorted before merging them.

ParallelMergeSort Class Initialization

Here, we define the ParallelMergeSort class. This class manages the overall sorting process, coordinating multiple threads using an ExecutorService for parallel execution and a Phaser for synchronizing the threads.

In this code, ParallelMergeSort takes three parameters: an ExecutorService, which manages the thread pool, a Phaser for synchronization, and the number of threads. The constructor initializes these fields, preparing the class for sorting. The executor allows tasks to run concurrently across multiple threads, while the phaser ensures that the threads wait for each other at specific points, such as after sorting or merging phases.

Sorting Chunks in Parallel

The first step in parallel merge sort is dividing the array into smaller chunks and sorting each chunk in parallel. This is done using an executor service that distributes tasks among multiple threads. The Phaser class ensures that threads synchronize during different phases, such as sorting and merging.

In this part of the code, the array is divided into smaller chunks, and each chunk is assigned to a thread for sorting. The bulkRegister() method is used to register all threads with the Phaser so that they synchronize during each phase of the algorithm. Each chunk is sorted by submitting a SortTask to the ExecutorService, allowing the tasks to run in parallel. The arriveAndAwaitAdvance() method ensures that no thread moves to the next phase until all sorting is completed.

SortTask Class

Below is the SortTask, which handles the sorting of individual chunks.

Each SortTask is responsible for sorting a specific chunk of the array, as determined by the left and right indices. The task uses Arrays.sort() to perform the sort on that portion of the array. Once the sorting is complete, the task calls arriveAndDeregister() to signal the Phaser that the sorting phase for this thread is finished.

Merging Sorted Chunks

Once all chunks are sorted, the next phase begins: merging the sorted chunks. This merging process occurs in several steps, starting with small chunks and progressively merging larger ones, until the entire array is sorted.

In this section, the sorted chunks are merged together in a phased manner. The process starts with the smallest chunks and proceeds to merge larger and larger ones. The Phaser ensures that no thread moves to the next phase until all merges in the current phase are completed. The executor.submit() method runs each merge in parallel, and the process continues until the entire array is sorted.

MergeTask Class

Here is the MergeTask that handles merging two sorted subarrays.

Each MergeTask merges two sorted subarrays. The merge happens between indices left, mid, and right. This task ensures that two sorted sections of the array are combined into one larger sorted section. After the merge is completed, the phaser.arriveAndDeregister() method signals the completion of this merging task.

Merge Method

The merge() method performs the merging operation.

The merge() method compares elements from two sorted subarrays and merges them into a single sorted array. The method copies the elements from the left and right subarrays and places the smaller element into the original array. After one of the subarrays is fully merged, the remaining elements from the other subarray are added to the array.

Running the Program

Here’s how the program is executed from the Main class:

In this Main class, we create a random array of integers and print it before and after sorting. The ParallelMergeSort class is instantiated with a fixed thread pool and a Phaser instance. The startParallelMergeSort() method initiates the parallel merge sort process.

Why Parallel Merge Sort with Phaser Matters

Understanding and implementing parallel merge sort with Phaser is essential because:

  • Efficiency: Parallelizing this process enables faster sorting of large datasets, taking full advantage of modern multi-core CPUs.
  • Synchronization: Phaser helps coordinate multiple threads without complex locking mechanisms, providing a cleaner approach to concurrency.
  • Scalable Solutions: The approach allows you to handle increased data loads efficiently by scaling the number of threads.

By mastering parallel merge sort, you’ll be able to write scalable, efficient sorting algorithms that perform well on large datasets in real-world applications. Now, you're ready to apply this knowledge in the upcoming practice section!

Sign up
Join the 1M+ learners on CodeSignal
Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal