Monday, April 25, 2011

Java: What is the best way to find elements in a sorted List?

I have a List<Cat> sorted by the cats' birthdays. Is there an efficient Java Collections way of finding all the cats that were born on January 24th, 1983? Or, what is a good approach in general?

From stackoverflow
  • Collections.binarySearch().

    Assuming the cats are sorted by birthday, this will give the index of one of the cats with the correct birthday. From there, you can iterate backwards and forwards until you hit one with a different birthday.

    If the list is long and/or not many cats share a birthday, this should be a significant win over straight iteration.

    Here's the sort of code I'm thinking of. Note that I'm assuming a random-access list; for a linked list, you're pretty much stuck with iteration. (Thanks to fred-o for pointing this out in the comments.)

    List<Cat> cats = ...; // sorted by birthday
    List<Cat> catsWithSameBirthday = new ArrayList<Cat>();
    Cat key = new Cat();
    key.setBirthday(...);
    // Cat must implement Comparable<Cat> (ordering by birthday) for this overload;
    // otherwise pass a Comparator as a third argument.
    final int index = Collections.binarySearch(cats, key);
    if (index < 0)
        return catsWithSameBirthday; // no cat has this birthday
    catsWithSameBirthday.add(cats.get(index));
    // go backwards (down to and including index 0)
    for (int i = index-1; i >= 0; i--) {
        if (cats.get(i).getBirthday().equals(key.getBirthday()))
            catsWithSameBirthday.add(cats.get(i));
        else
            break;
    }
    // go forwards
    for (int i = index+1; i < cats.size(); i++) {
        if (cats.get(i).getBirthday().equals(key.getBirthday()))
            catsWithSameBirthday.add(cats.get(i));
        else
            break;
    }
    return catsWithSameBirthday;
    
    Jake : Collections.binarySearch() returns the index of a single element and makes no guarantee about which one when several elements are considered identical.
    Michael Myers : Maybe that will teach me to read the question before answering. :)
    fred-o : Also, Collections.binarySearch() is only efficient for random access lists.
    Jake : Must have index < 0, but yeah: this is the general idea.
    Michael Myers : @Jake: Good point, that wouldn't have worked very well. I fixed the code.
  • Binary search is the classic way to go.

    Clarification: I said to use binary search as an algorithm, not one specific library method. The algorithm is:

    //pseudocode:
    
    index = binarySearchToFindTheIndex(date);
    if (index < 0) 
      return empty; // not found
    
    // walk back to the first match
    start = index;
    for (; start > 0 && cats[start - 1].date == date; --start);
    // walk forward to one past the last match
    end = index;
    for (; end < cats.length && cats[end].date == date; ++end);
    
    return cats[ start .. end ]; // start inclusive, end exclusive
    
    Jake : Collections.binarySearch() returns the index of a single element and makes no guarantee about which one when several elements are considered identical.
    Mehrdad Afshari : I didn't say you should use `Collections.binarySearch` method. Binary search to find the index of a single element. All other elements with equal birthdays are beside the element found. You can get all of them with a single loop. It's a classic.
    slim : @Mehrdad - update the answer to reflect this and you'll earn an upvote.
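
The pseudocode in the answer above could be written in Java roughly as follows. This is only a sketch under stated assumptions: the nested Cat class, the BY_BIRTHDAY comparator, the bornOn method name, and the use of java.time.LocalDate for birthdays are all illustrative and do not come from the original question. It also assumes a random-access list such as ArrayList.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class CatSearch {
    // Hypothetical Cat type; the question does not show its definition.
    static final class Cat {
        final String name;
        final LocalDate birthday;

        Cat(String name, LocalDate birthday) {
            this.name = name;
            this.birthday = birthday;
        }
    }

    // The same ordering the list is assumed to be sorted by.
    static final Comparator<Cat> BY_BIRTHDAY = Comparator.comparing((Cat c) -> c.birthday);

    /** Returns a view of all cats born on the given day; cats must be sorted by birthday. */
    static List<Cat> bornOn(List<Cat> cats, LocalDate day) {
        // Finds the index of some cat with this birthday, not necessarily the first one.
        int index = Collections.binarySearch(cats, new Cat("probe", day), BY_BIRTHDAY);
        if (index < 0) {
            return Collections.emptyList(); // no cat has this birthday
        }
        // Walk back to the first match.
        int start = index;
        while (start > 0 && cats.get(start - 1).birthday.equals(day)) {
            start--;
        }
        // Walk forward to one past the last match.
        int end = index + 1;
        while (end < cats.size() && cats.get(end).birthday.equals(day)) {
            end++;
        }
        return cats.subList(start, end); // start inclusive, end exclusive
    }
}
```

Calling CatSearch.bornOn(cats, LocalDate.of(1983, 1, 24)) returns the matching cats in list order. Because subList returns a view, copy the result into a new ArrayList if the original list may change afterwards.
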
  • Unless you somehow indexed the collection by date, the only way would be to iterate over all of them.

    Mehrdad Afshari : It's sorted by date. What else would you possibly call an index!?
    Jake : I doubt that is the *only* way. Certainly you can imagine a "low level" algorithm that does this very efficiently by finding the first occurrence of Cats with a given birthday and proceeding linearly from there.
    Mehrdad Afshari : Jake: whatever algorithm you use to find the first element will be O(n) unless you're doing some binary search. The asymptotically fastest algorithm is to binary search and then linearly find the start and end after that.
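
Jake's idea of locating the first occurrence directly can itself be done with binary search. The sketch below is one possible variation, not something proposed in the thread: it binary searches for both ends of the run, so the cost stays logarithmic even when many cats share a birthday. To keep it short it works on a sorted list of birthdays rather than Cat objects; the BirthdayRange class and its method names are hypothetical, and java.time.LocalDate is an assumed date type.

```java
import java.time.LocalDate;
import java.util.List;

final class BirthdayRange {
    /** Index of the first birthday that is not before day, or size() if there is none. */
    static int lowerBound(List<LocalDate> sortedBirthdays, LocalDate day) {
        int lo = 0;
        int hi = sortedBirthdays.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedBirthdays.get(mid).isBefore(day)) {
                lo = mid + 1; // everything up to mid is too early
            } else {
                hi = mid;     // mid could be the first match
            }
        }
        return lo;
    }

    /** Index of the first birthday that is after day, or size() if there is none. */
    static int upperBound(List<LocalDate> sortedBirthdays, LocalDate day) {
        int lo = 0;
        int hi = sortedBirthdays.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedBirthdays.get(mid).isAfter(day)) {
                hi = mid;     // mid is already past the run
            } else {
                lo = mid + 1; // mid is at or before the end of the run
            }
        }
        return lo;
    }

    /** View of all entries equal to day; empty if there are none. */
    static List<LocalDate> range(List<LocalDate> sortedBirthdays, LocalDate day) {
        return sortedBirthdays.subList(lowerBound(sortedBirthdays, day),
                                       upperBound(sortedBirthdays, day));
    }
}
```

As with Collections.binarySearch(), this is only worthwhile on a random-access list, since each get(mid) on a linked list is itself a linear walk.
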
  • If you need a really fast search use a HashMap with the birthday as a key. If you need to have the keys sorted use a TreeMap.

    Because you want to allow multiple cats to have the same birthday, you need to use a Collection as a value in the Hash/TreeMap, e.g.

          Map<Date,Collection<Cat>>
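
A small sketch of the map-based index suggested in the answer above, under a few assumptions: the CatIndex class, its nested Cat type, and the method names are hypothetical, and java.time.LocalDate is used as the key instead of Date so that two birthdays on the same day always compare as equal. A TreeMap keeps the keys sorted; swapping in a HashMap would give up the ordering in exchange for constant-time lookups on average.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CatIndex {
    // Hypothetical Cat type; the question does not show its definition.
    static final class Cat {
        final String name;
        final LocalDate birthday;

        Cat(String name, LocalDate birthday) {
            this.name = name;
            this.birthday = birthday;
        }
    }

    // One list of cats per birthday, with keys kept in date order.
    private final Map<LocalDate, List<Cat>> byBirthday = new TreeMap<>();

    /** Adds a cat, creating the list for its birthday on first use. */
    void add(Cat cat) {
        byBirthday.computeIfAbsent(cat.birthday, day -> new ArrayList<>()).add(cat);
    }

    /** Returns the cats born on the given day, or an empty list if there are none. */
    List<Cat> bornOn(LocalDate day) {
        return byBirthday.getOrDefault(day, Collections.emptyList());
    }
}
```

The trade-off compared with searching the sorted list is that the index has to be built and kept up to date, which only pays off when lookups are frequent.
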
    
  • Google Collections can do what you want by using a Predicate and creating a filtered collection where the predicate matches dates.

    Jake : Does it filter the Collection in O(n)?
    basszero : I assume it would have to since it would apply the predicate to each element. A binary search (like the upvoted answer) is best if the list is sorted.
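
For comparison, here is what predicate-based filtering looks like. This sketch does not use Google Collections; the standard library's java.util.function.Predicate and streams stand in for it, and the CatFilter class, its nested Cat type, and java.time.LocalDate are assumptions made for illustration. As basszero notes, the predicate is applied to every element, so the cost is O(n) whether or not the list is sorted.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class CatFilter {
    // Hypothetical Cat type; the question does not show its definition.
    static final class Cat {
        final String name;
        final LocalDate birthday;

        Cat(String name, LocalDate birthday) {
            this.name = name;
            this.birthday = birthday;
        }
    }

    /** Returns a new list of the cats born on the given day, visiting every cat once. */
    static List<Cat> bornOn(List<Cat> cats, LocalDate day) {
        Predicate<Cat> sameBirthday = cat -> cat.birthday.equals(day);
        return cats.stream().filter(sameBirthday).collect(Collectors.toList());
    }
}
```

This version does not require the list to be sorted, which makes it a reasonable fallback for small or unsorted collections.
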
