1

Topic: Fast search in the list wildcard-corrected

Greetings! Prompt, please, good (i.e. effective in respect of speed) algorithm for the following task. On a function input the text line moves. It is necessary to check up, whether it coincides on a mask with at least single line from the list of rules. Each rule is a text line which can contain a wildcard-symbol *. I Will give an example: Input: /path/hello.txt Rules: *.pdf; *.txt; *someapp *;*path*/*ello* (here there will be coincidence on *.txt and on *path*/*ello *) we Assume that such checks are fulfilled many once a second, and the list corrected contains thousand and even millions elements. It would be desirable not to transit under all list, and to use something like binary search. The list corrected is formed dynamic, in a program operating time, therefore to build of it any precompiled model it is impossible. Also it is impossible to use means with dynamic code generation (JIT, etc.) - Such restrictions of the environment. It would be desirable to find not so much implementation (a C/C ++), how many the description of the algorithm. Thanks!

2

Re: Fast search in the list wildcard-corrected

Hello, okman, you wrote: O> the List corrected is formed dynamic, in a program operating time, therefore to build of it O> any precompiled model it is impossible. Also it is impossible to use means with dynamic O> code generation (JIT, etc.) - such restrictions of the environment. At such restrictions quickly will not be . It is possible to support three structures: a prefix tree,  a tree and the list of templates of a type *someapp*. We check at first on prefixes/suffixes, then by a remaining part of a template.

3

Re: Fast search in the list wildcard-corrected

Hello, , you wrote: > From the DB theory it is known, what expression like ' %- % ' does at once indexes ineffective and leads to full  tables And what, even  search does not help?

4

Re: Fast search in the list wildcard-corrected

Hello, , you wrote: > From the DB theory it is known that expression like ' %- % ' does at once indexes ineffective It looking what indexes. Another matter that in the given task an index not to construct, since the dial-up of the input data is not known in advance.

5

Re: Fast search in the list wildcard-corrected

Hello, okman, you wrote: O> Prompt, please, good (i.e. effective in respect of speed) algorithm for the following task. O> on a function input the text line moves. It is necessary to check up, whether it coincides on a mask with O> at least single line from the list of rules. Each rule is a text line which can O> contain a wildcard-symbol *. I Will give an example: O> Input: /path/hello.txt O> Rules: *.pdf; *.txt; *someapp *;*path*/*ello* O> (here there will be coincidence on *.txt and on *path*/*ello *) O> we Assume that such checks are fulfilled many once a second, and the list corrected O> contains thousand and even millions elements. It would be desirable not to transit under all list, and O> to use something like binary search. Never tried it and the more so I do not know, how it will work on millions rules. But as, judging by the task, you all the same which works, important only worked or not from all it is possible to make the big-big finite state machine of them, and it processes a line for the linear time and tells, approached or did not approach. Linear concerning length of a line, instead of an amount of rules, how many there rules absolutely not important. But whether I turns out to make its made is not assured. I once wrote article about such automatic machines http://rsdn.org/article/alg/nka.xml the Author (): Sergey Holodilov Data: 7/30/2007 In article the interesting slice of the theory of calculations is described small, but all the same. The mission of article - to serve ,  which, readers continue learning of this theory.