Wednesday, March 23, 2011

Algorithm to find subset within two sets of integers whose sums match

Hi,

I'm looking for an algorithm which can take two sets of integers (both positive and negative) and find subsets within each that have the same sum.

The problem is similar to the subset sum problem: http://en.wikipedia.org/wiki/Subset-sum_problem except that I'm looking for subsets on both sides.

Here's an example:

List A {4, 5, 9, 10, 1}

List B {21, 7, -4, 180}

So the only match here is: {10, 1, 4, 9} <=> {21, 7, -4}

Does anyone know if there are existing algorithms for this kinda problems?

So far, the only solution I have is a brute force approach which tries every combination but it performs in Exponential time and I've had to put a hard limit on the number of elements to consider to avoid it from taking too long.

The only other solution I can think of is to run a factorial on both lists and look for equalities there but that is still not very efficient and takes exponentially longer as the lists get bigger..

Any help will be much appreciated!

Thanks,

Yan

From stackoverflow
  • subset sum is Np-complete and you can polynomially reduce your problem to it, so your problem is NP-complete too.

    A. Rex : Perhaps you'd like to mention the reduction: if A and B are your sets for this problem, take A union (-B) in usual subset-sum and you're looking for sum 0.
  • Like the subset sum problem, this problem is weakly NP-complete, so it has a solution that runs in time polynomial(M), where M is the sum of all numbers appearing in the problem instance. You can achieve that with dynamic programming. For each set you can generate all possible sums by filling a 2-dimensional binary table, where "true" at (k,m) means that a subset sum m can be achieved by picking some elements from the first k elements of the set.

    You fill it iteratively - you set (k,m) to "true" if (k-1,m) is set to "true" (obviously, if you can get m from k-1 elements, you can get it from k elements by not picking the k-th) or if (k-1,m-d) is set to "true" where d is the value of k-th element in the set (the case where you pick the k-th element).

    Filling the table gets you all the possible sums in the last column (the one representing the whole set). Do this for both sets and find common sums. You can backtrack the actual subsets representing the solutions by reversing the process which you used to fill the tables.

    Binary Worrier : +1 mate. Bugger me, I had no idea. I have deleted my answer. Nice one :)
  • What others have said is true:

    1. This problem is NP-complete. An easy reduction is to usual subset-sum. You can show this by noting that a subset of A sums to a subset of B (not both empty) only if a non-empty subset of A union (-B) sums to zero.

    2. This problem is only weakly NP-complete, in that it's polynomial in the size of the numbers involved, but is conjectured to be exponential in their logarithms. This means that the problem is easier than the moniker "NP-complete" might suggest.

    3. You should use dynamic programming.

    So what am I contributing to this discussion? Well, code (in Perl):

    @a = qw(4 5 9 10 1);
    @b = qw(21 7 -4 180);
    %a = sums( @a );
    %b = sums( @b );
    for $m ( keys %a ) {
        print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n" if exists $b{$m};
    }
    
    sub sums {
        my( @a ) = @_;
        return () unless @a;
        my $a = shift @a;
        my %a = ( $a => [$a] );
        while( @a ) {
         $a = shift @a;
         for my $m ( keys %a ) {
          $a{$m+$a} = [@{$a{$m}},$a];
         }
        }
        return %a;
    }
    

    It prints

    sum(4 5 9 10) = sum(21 7)
    sum(4 9 10 1) = sum(21 7 -4)
    

    so, notably, there is more than one solution that works in your original problem!

  • Hi guys,

    Thanks a lot for all the quick responses!

    The dynamic programming solution is not really different to the exhaustive approch we have right now and I guess if we need the optimal solution we do need to consider every possible combination but the time it takes to generate this exhaustive list of sums are too long.. Did a quick test and the time it takes to generate all possible sums for x number of elements very quickly go over 1 min:

    11 elements took - 0.015625 seconds
    12 elements took - 0.015625 seconds
    13 elements took - 0.046875 seconds
    14 elements took - 0.109375 seconds
    15 elements took - 0.171875 seconds
    16 elements took - 0.359375 seconds
    17 elements took - 0.765625 seconds
    18 elements took - 1.609375 seconds
    19 elements took - 3.40625 seconds
    20 elements took - 7.15625 seconds
    21 elements took - 14.96875 seconds
    22 elements took - 31.40625 seconds
    23 elements took - 65.875 seconds
    24 elements took - 135.953125 seconds
    25 elements took - 282.015625 seconds
    26 elements took - 586.140625 seconds
    27 elements took - 1250.421875 seconds
    28 elements took - 2552.53125 seconds
    29 elements took - 5264.34375 seconds
    

    which for the business problem we're trying to solve is not really acceptable.. I'm gonna go back to the drawing board and see whether we do indeed need to know all the solutions or can we just do with one (the smallest/largest subset, e.g.) instead and hopefully that can help simply the problem and make my algorithm perform to expectaion.

    Thanks all the same!

  • How the problem is solved. i too face the time issue when i use the perl code.

0 comments:

Post a Comment