In a recent discussion in the German CoCreate user forum, a customer was looking for ways to solve a variation of the subset sum problem (which is a special case of the knapsack problem). This was needed to find the right combination of tool parts to manufacture a screw. Those tool parts are available in a wide variety of diameters, and the task at hand is to find a set of these tools which, when combined, can be used to manufacture a screw of a given size.

Abstracting from the manufacturing problem, the problem can be stated like this:
Given `n`

items of lengths `l _{1}`,

`t`

, find all subsets of items
which add up to `t`

.
Even though the description sounds fairly simple, the problem is
surprisingly tough. In fact, it is NP-complete,
which is computer science gobbledigook for "d*mn tough, man!".
In practice, it means that for small values of `n`

, it's easy to find
an algorithm which tests all permutations and lists all subsets in
a reasonable amount of time. However, as the size of the array of item lengths
grows, the computation time may grow exponentially, and chances are you'll
never see the day when it ends
(cf. Deep Thought).

The following simple backtracking algorithm is of that nature: It performs reasonably well for small arrays, but will quickly wear your patience for larger arrays. To see how the algorithm behaves, I created a version of the code which only counts the number of found solutions; then I ran it in CLISP, compiled and interpreted:

Array size | Run time (compiled) | Run time (interpreted) |
---|---|---|

10 | 0.002s | 0.02s |

15 | 0.05s | 0.35s |

20 | 0.7s | 6s |

25 | 18s | 210s |

27 | 80s | 780s |

29 | 270s | |

30 | 570s |

Take those results with arbitrary amounts of salt; the code (see below) initializes the test array with random numbers and will therefore always vary a little. But the results are reliable enough to show that you don't really want to use this algorithm for large arrays...

(let((solutions 0) flags numbers) (defunfound-solution() "Called whenever the algorithm has found a solution" (let((total 0)) (formatt" ") (dotimes(i (lengthnumbers)) (when(arefflags i) (incftotal (arefnumbers i)) (formatt"~A " (arefnumbers i))) ) (formatt" => ~A~%" total) (incfsolutions))) (defunfind-solutions(k target-sum callback) "Core backtracking algorithm" (when(zeroptarget-sum) (funcallcallback) (return-fromfind-solutions)) (unless(=k (lengthnumbers)) (let((nk (arefnumbers k))) (when(>=target-sum nk) ;; try subtracting numbers[k] from target-sum (setf(arefflags k)t) (find-solutions (+1 k) (-target-sum nk) callback) (setf(arefflags k)nil))) ;; recurse without subtracting first (find-solutions (+1 k) target-sum callback))) (defunfind-subset-sum(target-sum) "Set up and run backtracking algorithm based on 'numbers' array" (setfflags (make-array(list(lengthnumbers)))) (setfsolutions 0) (find-solutions 0 target-sum#'found-solution) (formatt"Found ~A different solutions.~%" solutions)) (defunsubset-sum-test(size) "Test subset sum algorithm using random numbers" (let*((total 0) target-sum) ;; init numbers array with random values up to 1000 (setfnumbers (make-array(listsize))) (dotimes(i size) (setf(arefnumbers i) (random1000)) (incftotal (arefnumbers i))) (setftarget-sum (floor(/total 2))) ;; random target sum (formatt"Now listing all subsets which sum up to ~A:~%" target-sum) (find-subset-sum target-sum))) )

The core backtracking algorithm is in `find-solutions`

. It will recursively
exhaust all subsets. When it finds a subset which adds up to `target-sum`

,
it will call the `callback`

function - this function can either simply
increase a solution counter, report the current solution to the user, or
store it somewhere for later retrieval.

In the test example above, the callback function is `print-solution`

which
increments a solution counter and prints the current solution.

To test the code, run `subset-sum-test`

, providing an array size. This function
will create an array of `numbers`

of that size and initialize it with random
values; it will also pick a random `target-sum`

. In a real application,
you would replace `subset-sum-test`

with a function which gets the array
data from somewhere (for example, from a tools database as in the customer's
case), and lets the user pick a `target-sum`

.

-- ClausBrod - 01 Mar 2006

The aforementioned customer would actually have preferred a solution in CoCreate Drafting's macro language. However, this macro language isn't really a full-blown programming language (even though it is perfectly adequate for almost all customization purposes). For instance, its macros cannot return values, the language doesn't have an array type, and the macro expansion stack (i.e. the depth of the macro call tree) has a fixed limit - which pretty much rules out non-trivial amounts of recursion.

While I was considering my options, I also fooled around with VBA, which resulted in the code presented below. I'm not at all proficient in VBA, so I'm sure the implementation is lacking, but anyway - maybe someone out there finds it useful nonetheless

Dim solutions As Long Dim flags() As Boolean Dim numbers() As Long Sub findSolutions(k As Long, targetSum As Long) If targetSum = 0 Then ' we found a solution solutions = solutions + 1 Exit Sub End If If k <= UBound(numbers) Then If (targetSum >= numbers(k)) Then flags(k) = True ' try first by subtracting numbers[k] from targetSum Call findSolutions(k + 1, targetSum - numbers(k)) flags(k) = False End If ' now try without subtracting Call findSolutions(k + 1, targetSum) End If End Sub Sub subsetsum() Dim targetSum As Long Dim i As Long Dim arraySize As Long arraySize = 25 ReDim numbers(0 To arraySize - 1) ReDim flags(0 To arraySize - 1) ' initialize numbers array with random entries Randomize For i = 0 To arraySize - 1 numbers(i) = Int(1000 * Rnd + 1) flags(i) = False targetSum = targetSum + numbers(i) Next targetSum = Int(targetSum / 2) solutions = 0 Call findSolutions(0, targetSum) MsgBox "Found " + Str(solutions) + " solutions." End Sub

Let's see - we recurse one level for each entry in the array, so with a maximum array size of 30 (the customer said he was considering a table of 20-30 values), the recursion depth should never exceed 30. That's not a lot, in fact, so would this still exceed the macro stack thresholds in CoCreate Drafting?

Oh, and by the way, the recursive function doesn't even try to return values, so the lack of return values in the macro language isn't a real obstacle in this case! I couldn't resist and just had to try to translate the algorithm into CoCreate Drafting macro language:

{ Description: Subset sum algorithm } { Author: Claus Brod } { Language: CoCreate Drafting macros } { (C) Copyright 2006 Claus Brod, all rights reserved } DEFINE Found_solution LOCAL I LOCAL Solution LOCAL Total LOCAL Nk { display current solution } LET Subset_sum_solutions (Subset_sum_solutions+1) LET I 1 LET Solution '' LET Total 0 WHILE (I <= Subset_sum_arraysize) IF (READ_LTAB 'Flags' I 1) LET Nk (READ_LTAB 'Numbers' I 1) LET Total (Total + Nk) LET Solution (Solution + ' ' + STR(Nk)) END_IF LET I (I+1) END_WHILE DISPLAY_NO_WAIT (Solution + ' sum up to ' + STR(Total)) END_DEFINE DEFINE Find_solutions PARAMETER K PARAMETER Target_sum LOCAL Nk IF (Target_sum = 0) { we found a solution, display it } Found_solution ELSE_IF (K <= Subset_sum_arraysize) LET Nk (READ_LTAB 'Numbers' K 1) { The following optimization only works if we can assume a sorted array } IF ((Nk * (Subset_sum_arraysize-K+1)) >= Target_sum) IF (Target_sum >= Nk) { try first by subtracting Numbers[k] from Target } WRITE_LTAB 'Flags' K 1 1 Find_solutions (K+1) (Target_sum-Nk) WRITE_LTAB 'Flags' K 1 0 END_IF { now try without subtracting } Find_solutions (K+1) Target_sum END_IF END_IF END_DEFINE DEFINE Subset_sum PARAMETER Subset_sum_arraysize LOCAL Target LOCAL Random LOCAL I LOCAL Subset_sum_solutions LOCAL Start_time { Allocate Numbers and Flags arrays } CREATE_LTAB Subset_sum_arraysize 1 'Numbers' CREATE_LTAB Subset_sum_arraysize 1 'Flags' LET Target 0 LET I 1 WHILE (I <= Subset_sum_arraysize) LET Random (INT(1000 * RND + 1)) LET Target (Target + Random) WRITE_LTAB 'Numbers' I 1 Random WRITE_LTAB 'Flags' I 1 0 LET I (I+1) END_WHILE LET Target (INT (Target/2)) DISPLAY ('Array size is ' + STR(Subset_sum_arraysize) + ', target sum is ' + STR(Target)) { Sorting in reverse order speeds up the recursion } SORT_LTAB 'Numbers' REVERSE_SORT 1 CONFIRM LET Start_time (TIME) LET Subset_sum_solutions 0 Find_solutions 1 Target DISPLAY ('Found ' + STR(Subset_sum_solutions) + ' solutions in ' + STR(TIME-Start_time) + ' seconds.') END_DEFINE

Because CoCreate Drafting's macro language doesn't have arrays, they have to be emulated
using *logical tables*, or *ltabs*. The solutions which are found are displayed
in CoCreate Drafting's prompt line, which is certainly not the most ideal place, but it's
sufficient to verify the algorithm is actually doing anything .-)

What you see above, is a tuned version of the macro code I came up with initially. The "literal" translation from the original Lisp version took ages to complete even for small arrays; for instance, searching for solutions in an array of 20 numbers took 95 seconds (using OSDD 2005). However, there are two simple optimizations which can be applied to the algorithm.

First, the input array can be sorted in reverse order, i.e. largest numbers first. This makes it more likely that we can prune the recursion tree early. This optimization itself improved runtimes by only 5% or so, but more importantly, it paved the way for another optimization.

Since we know that the numbers in the array are monotonically decreasing, we can now predict in many cases that there is no chance of possibly reaching the target sum anyway, and therefore abort the recursion early. Example for an array of size 20:

- We are in the midst of the recursion, say, at recursion level 15. The array entry at index 15 contains the value 42.
- Let us further assume that
`Target_sum`

has already been reduced to 500 earlier in the recursion, i.e. the remaining entries in the array somehow have to sum up to 500 to meet the required subset sum. - Because the values in the array are monotonically
decreasing, we know that the entries 15 through 20 have a
value of 42 at most. Assuming that
*all*remaining entries are 42, we get a maximum remaining sum of 42*6=252. - This means we can prune the recursion tree at this point because there's no way that this recursion line will ever find a solution.

This second optimization improved runtimes tremendously; with an array size of 20, we're now down to 15 seconds (originally 95 seconds); however, it still takes more than 3000 seconds to find all solutions in an array of size 30.

By the way, all performance measurements mentioned here were taken on the same system, a laptop with a 1.7 GHz Celeron CPU.

In real life, the `Numbers`

array will in fact contain floating-point
values rather than integers. The algorithm doesn't change, but
whenever you work with floating-point values, it's good to follow
a few basic guidelines like the ones outlined here.

In the case of the above macro code, instead of a comparison like
`IF (Target = 0)`, you'd probably want to write something
like `IF (ABS(Target) < Epsilon)`

where `Epsilon`

is a small value
chosen to meet a user-defined tolerance (for example 0.001).

-- ClausBrod - 16 Mar 2006

This is taking me to places I didn't anticipate. Over at codecomments.com, they are discussing solutions to the subset sum problem in Haskell, Prolog and Caml, if anyone is interested. (I sure was, and even read a tutorial on Haskell to help me understand what these guys are talking about )

-- ClausBrod - 01 Apr 2006

I started to learn some Ruby, so here's a naïve implementation of the algorithm in yet another language (See also this blog entry.)

$solutions = 0 $numbers = [] $flags = [] def find_solutions(k, target_sum)iftarget_sum == 0 # found a solution! (0..$numbers.length).each { |i|if($flags[i])thenprint $numbers[i], " ";end} print "\n" $solutions = $solutions + 1elseifk < $numbers.lengthiftarget_sum >= $numbers[k] $flags[k] = true find_solutions k+1, target_sum-$numbers[k] $flags[k] = falseendfind_solutions k+1, target_sumendendend def find_subset_sum(target_sum) print "\nNow listing all subsets which sum up to ", target_sum, ":\n" $solutions = 0 (0..$numbers.length()).each { |i| $flags[i] = false } find_solutions 0, target_sum print "Found ", $solutions, " different solutions.\n" end def subset_sum_test(size) total = 0 target_sum = 0 (0..size).each { |i| $numbers[i] = rand(1000); total += $numbers[i]; print $numbers[i], " " } target_sum = total/2 find_subset_sum target_sum end subset_sum_test 25

-- ClausBrod - 17 Apr 2006

The other day, I experimented with Python and thought I'd start with a quasi-verbatim translation of the subset sum code. Here is the result. Apologies for the non-idiomatic and naïve implementation.

import random import array import sys numbers = array.array('i') flags = array.array('c') solutions = 0 def find_solutions(k, target_sum): global solutions if target_sum == 0: print " Solution:", for i in range(0, len(numbers)): if flags[i] != 0: print numbers[i], print solutions = solutions + 1 else: if k < len(numbers): if (numbers[k] * (len(numbers)-k+1)) >= target_sum: if target_sum >= numbers[k]: flags[k] = 1 find_solutions(k+1, target_sum - numbers[k]) flags[k] = 0 find_solutions(k+1, target_sum) def find_subset_sum(target_sum): global solutions global flags print "Subsets which sum up to %s:" % target_sum flags = [0] * len(numbers) find_solutions(0, target_sum) print "Found", solutions, "different solutions" def subset_sum_test(size): global numbers total = 0 print "Random values:\n ", for i in range(0, size): numbers.append(random.randint(0, 1000)) total = total + numbers[i] print numbers[i], print numbers = sorted(numbers, reverse = True) target_sum = total/2 find_subset_sum(target_sum) subset_sum_test(15 if len(sys.argv) < 2 else int(sys.argv[1]))

See also A Subset Of Python.

-- ClausBrod - 30 Jun 2013

And here is an implementation in C#:

using System; namespace SubsetSum { class SubsetSum { private int[] numbers; private bool[] flags; private int findSolutions(int k, int targetSum, int solutions=0) { if (targetSum == 0) { Console.Write(" Solution: "); for (int i=0; i<numbers.Length; i++) { if (flags[i]) { Console.Write("{0} ", numbers[i]); } } Console.WriteLine(); solutions++; } else { if (k < numbers.Length) { if ((numbers[k] * (numbers.Length - k + 1)) >= targetSum) { if (targetSum >= numbers[k]) { flags[k] = true; solutions = findSolutions(k + 1, targetSum - numbers[k], solutions); flags[k] = false; } solutions = findSolutions(k + 1, targetSum, solutions); } } } return solutions; } public void solve() { Array.Sort(numbers, (x, y) => y - x); // sort in reverse order Array.Clear(flags, 0, flags.Length); int total = 0; Array.ForEach(numbers, (int n) => total += n); int solutions = findSolutions(0, total / 2); Console.WriteLine("Found {0} different solutions.", solutions); } SubsetSum(int size) { numbers = new int[size]; Random r = new Random(); for (int i = 0; i < size; i++) { numbers[i] = r.Next(1000); Console.Write("{0} ", numbers[i]); } Console.WriteLine(); flags = new bool[size]; } public static void Main(string[] args) { int size = args.Length > 1 ? int.Parse(args[1]) : 15; new SubsetSum(size).solve(); } } }

See also http://stackoverflow.com/questions/2353497/the-subsets-sum-problem-and-the-solvability-of-np-complete-problems

Revisions: | r1.23 | > | r1.22 | > | r1.21 | Total page history | Backlinks

Copyright © 1999-2019 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback