Prefix sums, or cumulative sums, allow us to quickly find the sum of any contiguous slice of an array.
As a quick example, suppose you copy down some driving directions. They tell you to drive down Penny Lane for 3 miles, then make a left onto Abbey Road, which we travel down for 5 miles before continuing on Lake Shore Drive for 4 miles, and so on. We convert these directions into the number of miles driven on each road.
# 3 miles on Penny Lane # 5 miles on Abbey Road # 4 miles on Lake Shore drive # 6 miles on Sunset Boulevard # 2 miles where the streets have no name distances = [3,5,4,6,2]
The prefix sum for the distances is:
prefix_distances = [3,8,12,18,20]
This tells us that it took 3 miles to get to the end of the first direction (Penny Lane), and 8 miles in total to get to the end of the second direction (Abby Road). If we want to know how long it took to get from the end of Abbey Road (mile 8 of our trip) to the end of Sunset Blvd (mile 18), we do one subtraction on
prefix_distances rather than two additions on
# distance between end of Sunset Blvd (mile 18) and Abbey Rd $ print prefix_distances - prefix_distances 10
A useful example of prefix sums would be calculating a moving average of an array, which is designed to remove periodic fluctuations in data. For example, if we knew the amount of money brought in by sales of cookies at the CodeSignal Cafe per day, we might get something like:
num_cookies_sold = [4, 5, 8, 10, 12, 0, 0, 5, 5, 10, 12, 18, 0, 0, ...]
We might guess there is fluctuation based on the day of the week. (Maybe it’s in an area that slows down over the weekend.) The seven day moving average would contain the average of the first seven values (4,5,8,10,12,0,0), while the next value would be the overlapping average of the next seven values (5,8,10,12,0,0,5). The moving average is:
moving_ave = [5.571, 5.714, 5.714, 6.0, 6.286, 7.143, 7.143]
In order to calculate the
moving_ave we can start with the prefix sums:
prefix_cookies = [4, 9, 17, 27, 39, 39, 39, 44, 49, 59, 71, 89, 89, 89]
To get the cookies sold for the first seven days, we can look at
prefix_cookies. To get the total number of cookies sold from day 2 to day 8 we can calculate
prefix_cookies-prefix_cookies. Once we have the prefix sums, calculating the total number of cookies sold in any seven day period becomes trivial. Once we know the number of cookies sold in a week, we can divide by seven to find the average.
The nice thing about the prefix sums approach to moving averages is that the method is agnostic about the period used to average over. If CodeSignal Cafe found that there was a monthly cycle to their cookie sales, it would be easy to use the same array
prefix_cookies to average over 30 days instead.
Other applications of prefix sums are:
- Calculating the average value of blocks of pixels (useful in noise removal).
Calculating whether one point is visible from another, given an array of heights (called the line of sight problem).
Finding cumulative distributions (for example, working out what percentage of income the top 1% of earners make from an array of incomes).
Maybe more impressive is the fact that prefix sums can be performed in parallel. This leads to some really useful algorithmic tricks.
Parallelizing prefix sums
If we are trying to calculate the prefix sum of an array
L , we can give a linear time algorithm. To calculate the
prefix[n], simply take:
prefix[n] = prefix[n-1] + L[n]
This algorithm seems simple and fast, but it is also clear that because
prefix[n] depends on
prefix[n-1] the problem seems embarrassingly serial. That is, it doesn’t seem like we can take a huge array and split it in two, and have each computer calculate half the prefix sum, and then easily join the results together in a way that saves time.
The amazing thing is that there is a parallel algorithm for prefix sum! The method involves passing through the array twice:
Takes the array and builds a binary tree from the array, making pairwise sums along the way.
Takes the array of sums, and determines the prefix sums.
Let’s look at how this works with our array of
First we build a tree, where each element is a leaf at the bottom. Each node keeps track of the index range it comes from (to help us put results from different processes together), the sum of all the elements in index range, and one other element that we’ll ignore for right now.
Notice that the sum stored in each node can be obtained by adding the sum of its two children. This means that we can split the job off to different machines, and then combine them at the end. What we end up with at the top node (or root node) is the sum of all the elements in the array.
To get the prefix sums, we will define left for a node with an index range
[a,b) to be the sum of all the elements of the array with an index of less than
a. In other words, this is the sum of all the elements that appear to the left of the first element included in this node. We start at the root node and make our way down the tree.
To convince ourselves that we can really construct the
left attribute this way, we will concentrate on the red boxed square. Moving from the parent (
index = [0,4)) to the left child is easy: Since the left child has the same lower limit, it just copies the left value (
0 in this case, because these are the elements that start at the beginning of the array). Moving to the right child (
index = [2,4)) takes a little more thought. The right child knows that indices from
[2,4) add up to 10 by looking at its own
sum attribute. By knowing the sum in the parent element is
18, we can deduce that the sum of all elements with indices less than
2 must be
18 - 10 = 8.
To get the prefix sum requires taking the
left element of all the leaves (except the first one, which is trivially zero) and the sum of the entire array.
prefix_distances are found to be:
prefix_distances = [3, 8, 12, 18, 20]
- Prefix sums are interesting in their own right, in terms of just precomputing results and then allowing you to rapidly calculate any contiguous slice of an array.
Demonstrates a pattern in computer science of breaking a seemingly serial task into one that can be parallelized. Can be generalized: build up a binary tree, then move down the tree from the top to separate off the contribution from the “left hand side” of the tree.
Have you ever encountered a problem in an interview that you solved using prefix sums? Or better yet, encountered one in real life? Let us know over on the CodeSignal forum!