There is something called the maximum subarray problem, and it goes something like this.
You are given an array of numbers: something like 3,5,-2,-1,5,3,1,-4,-6,5. Your job is to come up with the subarray with the highest sum.
For this example it would be the subarray [3,5,-2,-1,5,3,1], which has a sum of 14. No other subarray gives a higher sum.
Let’s look at a few more:
- The sequence -3,-3,-2,-4,-2,-5,-4 has the highest sum being -2, given by the subarray [-2].
- The sequence 1,2,-1,-2,1 has the highest sum being 3, given by the subarray [1,2].
- The sequence 3,2,1,4 has the highest sum being 10, given by the subarray [3,2,1,4].
It’s an interesting problem with many applications. How do we solve it?
Solving the problem
While there are many ways to solve this problem, the best solution is Kadane’s Algorithm. It’s a deceptively simple algorithm, yet also deceptively difficult to understand.
Here’s the Python algorithm given on Wikipedia:
def max_subarray(A):
max_ending_here = max_so_far = A[0]
for x in A[1:]:
max_ending_here = max(x, max_ending_here + x)
max_so_far = max(max_so_far, max_ending_here)
return max_so_far
Here’s an R implementation, with a trace added to help figure out what’s going on.
find_max_subarray = function(arr){
max_ending_here = arr[1]
max_so_far = arr[1]
for(i in 2:length(arr)){
max_ending_here = max(arr[i], max_ending_here + arr[i])
max_so_far = max(max_so_far, max_ending_here)
print(paste( max_ending_here,max_so_far,arr[i])) #for tracing
}
return(max_so_far)
}
Try running the algorithm yourself. Feed it different subsequences and look at the trace to see if you can figure out how it works.
Understanding this algorithm is not a simple task. Here’s how I understand it.
Understanding Kadane’s Algorithm
For each number in the sequence, we do this:
Each number “decides” if it wants to use the previous subarray sum, or if it wants to start a new subarray from scratch. At every comparison, it also checks to see if its subarray sum is higher than any previous one.
Let’s look at an example – the sequence 1,-2,3,-1,2.
We iterate through the numbers, starting at the second number in the sequence (-2).
At this point, the maximum subarray sum seen so far is 1 - the sum of the first element in the sequence. This is also the previous subarray sum.
The new subarray sum of -1 is passed to the next number. This is the sum of the current subarray [1,-2].
The next number (3) gets the previous subarray sum of-1:
We start a new sequence at this point. The current subarray is [3] and the current subarray sum is 3. The maximum subarray sum is also updated, since 3 is higher than everything we had before.
The next number (-1) is passed the previous subarray sum of 3:
Now the subarray is [3,-1] and the subarray sum is 2. Do you see how this goes now?
The last number is passed the subarray sum of 2:
The subarray at this point is [3,-1,2] and the subarray sum is 4. The max_so_far variable is updated to read 4.
At this point the sequence has ended and max_so_far is returned - which is 4. This is the highest subarray sum of the sequence, and the subarray was [3,-1,2].
And that’s it!