I'm trying to design a data structure that allows efficient extraction of entries from a part of their content.
Let's say I am looking for an entry that matches this: [ x 2 3 x x ]
If [ 0 2 3 4 5 ] or [ 3 2 3 7 8 ] are in my data structure, they should be returned by my find function.
I wrote such a data structure where I compare the "pattern" against all entries of the data structure, but of course it takes way too much time. I have few ideas about how to do this a faster way but they are quite heavy to implement. Does something like this already exists? If not, how would you do this?
-
What you're trying to implement looks a lot like tuple space.
Ben : Yes it is actually what I want to do but I need I wanted a hint for implementing it. I guess I'll to look at the code! Thanksvartec : Let me guess. Distributed programming class homework? ;-)Ben : Nop ! that is real work :/vartec : Oh, then why not use already available solutions?Ben : Well I have specific needs but I simplified a bit the problem to ask my question. Available solutions are not exactly what I want, but they are a good start for implementation, I am on it. -
Well, a Suffix tree could do it in O(1), but it would take a LOT of memory.
j_random_hacker : +1. Assuming you have multiple entries, what you really want is called a generalised suffix tree, which is formed by concatenating all the entries together, using distinct (unique) separator symbols between each. Suffix arrays are a (theoretically) slower but much more memory-efficient alternative.j_random_hacker : P.S.: Suffix trees and arrays rock, but they are *hard* to implement. If you can't find an existing implementation, give yourself a solid 2 weeks to try to understand Ukkonnen's or McCreight's original papers on linear-time construction algorithms.Darius Bacon : How specifically does it do it in O(1)? You can retrieve all the entries containing a particular substring in O(length(substring)), if that's what you mean. I think the given problem does not restrict the known bits to be a single contiguous substring, either.Vilx- : I think it does restrict it to a single contiguous substring, but yes, it's not very clear on this. As for O(1) - well, it depends on how you look at it. The Big O in this case depends only on the length of the substring, not the amount of entries in the dictionary.Vilx- : Clasically when one talks about searching/sorting algorithms he measures the Big O in respect to the number of entries, not the length of them. From that point of view it's O(1). -
The first thing that comes to mind is to have a hash table for each position in the tuple. For searching you intersect the results for all the positions with a specified value.
dmeister : To hash each item is a starting point, but for large answer sets the intersection takes time and consumes memory. Consider also to hash each subset of a specific window size. -
You might take a look at the RETE algorithm. This general problem came up in the AI systems of the 70s; Chapter 14 in Paradigms of AI Programming covers it. (IIRC there's mainly elaborations of starblue's answer and decision trees.)
0 comments:
Post a Comment