1

Topic: Spell checker (spelling)

Greetings to all! There is a text with wrong words which should be corrected according to words in the dictionary. And: - for word change 2 operations are admitted only: an insertion and-or character removal; - if it is two insertions or two removals these two characters should not be nearby; If under correction some words from the dictionary approach to display them in the text in curly brackets. An example. The dictionary: rain the his main mainly plain the Text: hte mainy lain As a result should be: the {main mainly} plain At first thought that will enough use algorithm of a distance of Levenshtejna. For example, if to compare "hte" with "the" it produces length 2, for "his" too. And for the algorithm it is correct (since works as changeovers), but under the job approaches only "the", since two actions: at first removal in the beginning of a word of "h character and then adding"h"between"t"and"e"." his "does not approach, since for word coercion" hte "to" his "actions (removals-inserts, instead of changeovers) will be more. Who can faced already, in what side will correctly think?

2

Re: Spell checker (spelling)

Hello, serghd, you wrote: S> Greetings to all! S> who can faced already, in what side will correctly think? To me Rabinovich by phone sang at me the friend works over spellchecker. Told that except distance of Levenshtajna they still hang weight factor on each change. For example on a misprint z/x there should be a coefficient less, than on a misprint q/m. Well and the distance of Levenshtajna does not provide simple swap of two characters, as in your example hte instead of the. It should be separate change with low enough coefficient.

3

Re: Spell checker (spelling)

Hello, serghd, you wrote: S> who Can faced already, in what side will correctly think? Is better towards change of a trade And if it is serious, in what a problem? You take from friends the abstract of lectures, you find a subject "dynamic programming" you read and you do If shortly you get set of triples of numbers: (a position in the sample, a position in a word from the dictionary, number already introduced errors). On the first to a step to set you put all one triple from three 0. Then in a cycle while in set there is at least one triple do the following: on current set of triples, you build the following. Each triple of numbers can be "continued" in several ways. 1) if characters in leaking a position of a word from the dictionary and the sample, coincide, it is possible on 1 to increase both positions. 2) if the number of errors is less than limit and the sample yet did not end, it is possible to increase by 1 position in the sample and number of errors 3) if the number of errors of less limit and a word from the dictionary yet did not end, it is possible to increase by 1 position in a word from the dictionary and number of errors 4) if reach the end of both words (and the sample and a word from the dictionary), and the limit of errors is not exceeded yet we found that we search. Accordingly you continue (you put in the set, corresponding a trace. To an algorithm step) all variants of continuation. Also you twist a cycle, yet you will not find or set a trace. The step will not be empty. As we at least one of positions increase each step, steps will be no more, than length of the sample and a word from the dictionary in the total. As we have a limit on number of errors "width" of search will be restricted by a constant depending, as O (limit). So algorithm linear on a limit and lengths. Yes, that I described above does not superimpose restriction That insertions cannot go successively. Itself invent, as it there to add. Creative Uzbeks and so on. p. s. Do not thank, in sense for "thanks" here there are buttons

4

Re: Spell checker (spelling)

Hello, Erop, you wrote: E> Is better towards change  "Than question layout in Russian and not Russian forums differs? In the first case what you , and then, perhaps, as a matter of fact" you at first tell seriously or simply hereditary?

5

Re: Spell checker (spelling)

Hello, serghd, you wrote: E>> Is better towards change  S> "Than question layout in Russian and not Russian forums differs? In the first case at first tell what you , and then, perhaps, as a matter of fact" I about what someone , wrote nothing, in difference from S> you seriously or simply hereditary? About trade change, yes, seriously it is necessary to think, as you were interested by the algorithm not described in my answer, and council to replace a trade...

6

Re: Spell checker (spelling)

Hello, Erop, you wrote: E> About trade change, yes, seriously it is necessary to think, as you were interested by the algorithm not described in my answer, and council to replace a trade... The Algorithm has been implemented by means of intersections of possible variants from the dictionary and the text long before your answer, thanks.