Comparing iterables for same content, but not regarding order

I'm attempting to compare two Iterables in Java of same size. I only need to know that the contents are the same. However, something like [1, 2] and [1, 2, 2] should not be equal, while [1, 2, 2, 4] should equal [1, 2, 4, 2].

boolean functionName() {
    boolean pvk;
    ... setup ...
    for(Edge e : pMST.edges()) {
      pvk = false;
      for(Edge f : kMST.edges()) {
        if(e == f) {
          pvk = true;
          System.out.println("True.");
        }
      }
      if(!pvk) return false;
    }
return true;
}

There's my initial lousy attempt, but not only does this always return false, it doesn't account for duplicates properly.

4 answers

  • answered 2019-06-11 23:26 sprinter

    The most straightforward solution is to just add the entries to a sorted list and compare them. Unless you have a particular reason to be concerned about performance I'd suggest just using that:

    boolean hasSameEdges(MST mst1, MST mst2) {
        final Comparator<MST> edgeComparator = Comparator.comparing(mst ->
            mst.edges().stream().sorted().collect(Collectors.toList()));
        return edgeComparator.compare(mst1, mst2) == 0;
    }
    

  • answered 2019-06-12 01:09 WJS

    I would sort them. But first I would compare the sizes before doing the sort. You would need to provide a Comparator<T> to be used by the sort method. If you're sorting Integers, you could use:

          List<Integer> a = new ArrayList<>(List.of(1, 2, 3, 3, 3, 3, 4, 5, 6));
          List<Integer> b = new ArrayList<>(List.of(2, 3, 1, 3, 4, 5, 6, 3, 3));
          System.out.println(compareLists(a, b, Comparator.naturalOrder()));
    
       public static <T> boolean compareList(List<T> list1, List<T> list2,
             Comparator<T> comp) {
    
          if (list1 == list2) {
              return true;
          }
          if (list1.size() != list2.size()) {
             return false;
          }
          Collections.sort(list1, comp);
          Collections.sort(list2, comp);
    
          return list1.equals(list2);
       }
    

  • answered 2019-06-12 06:23 Stuart Marks

    You could sort the items and compare the resulting lists, but this is potentially slow O(n lg n) and it relies on the items either being Comparable or having a total order imposed on them by a Comparator. This might be infeasible.

    This other answer suggests using a Guava Multiset. This makes sense, as it keeps track of the elements and the count of occurrences, which is significant for your question. It should be O(n) for reasonable implementations such as a HashMultiset. Other libraries such as Apache Commons (MultiSet) and Eclipse Collections (Bag) have collection implementations that are functionally equivalent to Guava’s Multiset.

    If you don't want to include a dependency on any of these libraries, you can do this in the JDK by itself. Unfortunately Java doesn't have a Bag implementation, but for this purpose it's easy to emulate it using a Map from your item type to a count, either Integer or Long.

    If you have Lists, you can do this:

    boolean unorderedEquals(List<Item> list1, List<Item> list2) {
        Map<Item, Long> freq1 = list1.stream().collect(groupingBy(i -> i, counting()));
        Map<Item, Long> freq2 = list2.stream().collect(groupingBy(i -> i, counting()));
        return freq1.equals(freq2);
    }
    

    If you have Iterables, you need to build up the maps using forEach instead:

    boolean unorderedEquals(Iterable<Item> iter1, Iterable<Item> iter2) {
        Map<Item, Integer> freq1 = new HashMap<>();
        iter1.forEach(it -> freq1.merge(it, 1, (a, b) -> a + b));
        Map<Item, Integer> freq2 = new HashMap<>();
        iter2.forEach(it -> freq2.merge(it, 1, (a, b) -> a + b));
        return freq1.equals(freq2);
    }
    

  • answered 2019-06-12 11:38 Holger

    Combining this answer with ideas from this thread, notably this answer to create an efficient but readable solution, you may use

    static boolean unorderedEquals(Collection<?> coll1, Collection<?> coll2) {
        if(coll1.size() != coll2.size()) return false;
        Map<Object, Integer> freq = new HashMap<>();
        for(Object o: coll1) freq.merge(o, 1, Integer::sum);
        for(Object o: coll2)
            if(freq.merge(o, -1, Integer::sum) < 0) return false;
        return true;
    }
    

    The first loop creates a frequency map like in the linked answer, but instead of building a second map, to perform an expensive comparison, the second loop decreases the counts on each occurrence, returning immediately, if a count became negative. The merge method smoothly handles the case of absent keys.

    Since it has been checked right at the beginning of the method that both lists have the same size, after increasing and decreasing, the total count must be zero. Since we have proven that there are no negative numbers, as we returned immediately for them, there can’t be positive non-zero values either. So we can return true after the second loop without further checks.

    Supporting arbitrary Iterables, which differ from Collection in not necessarily having a size() method, is a bit trickier, as we can’t do the pre-check then and hence, have to maintain the count:

    static boolean unorderedEquals(Iterable<?> iter1, Iterable<?> iter2) {
        Map<Object, Integer> freq = new HashMap<>();
        int size = 0;
        for(Object o: iter1) {
            freq.merge(o, 1, Integer::sum);
            size++;
        }
        for(Object o: iter2)
            if(--size < 0 || freq.merge(o, -1, Integer::sum) < 0) return false;
        return size == 0;
    }
    

    If we want avoid the boxing overhead, we have to resort to a mutable value for the map, e.g.

    static boolean unorderedEquals(Collection<?> coll1, Collection<?> coll2) {
        if(coll1.size() != coll2.size()) return false;
        Map<Object, int[]> freq = new HashMap<>();
        for(Object o: coll1) freq.computeIfAbsent(o, x -> new int[1])[0]++;
        int[] absent = { 0 };
        for(Object o: coll2) if(freq.getOrDefault(o, absent)[0]-- == 0) return false;
        return true;
    }
    

    But I don’t think that his will pay off. For small numbers of occurrences, boxing will reuse the Integer instances whereas we need a distinct int[] object for each distinct element when using mutable values.

    But using compute might be interesting for the Iterable solution, when using it like

    static boolean unorderedEquals(Iterable<?> coll1, Iterable<?> coll2) {
        Map<Object, int[]> freq = new HashMap<>();
        for(Object o: coll1) freq.computeIfAbsent(o, x -> new int[1])[0]++;
        int[] absent = {};
        for(Object o: coll2)
            if(freq.compute(o, (key,c) -> c == null || c[0] == 0? absent:
                                          --c[0] == 0? null: c) == absent) return false;
        return freq.isEmpty();
    }
    

    which removes entries from the map when their count reaches zero, so we only have to check the map for emptiness at the end.