1

Topic: Algorithm for estimating word coincidence

Greetings! There is a table in a DB in which "words" are stored in eight columns, associated with a particular user. Each "word" has a "weight", and the weights of all eight "words" sum to 100. The task is to implement an algorithm that estimates how well the "words" of different users coincide, for  . I have some thoughts, and there is a "" solution (wildly non-optimal algorithmically)... How would you solve this task?

The DB is remote. Loading the table into memory for subsequent processing is a bad option: there are a lot of records, so both the memory consumption and the load time would be very large.

2

Re: Algorithm for estimating word coincidence

Hello, niXman, you wrote:
X> The task is to implement an algorithm that estimates how well the "words" of different users coincide, for  .
What do you mean by coincidence? And what do you want to get as output?

3

Re: Algorithm for estimating word coincidence

Hello, Lexey, you wrote:
L> What do you mean by coincidence?
Well... the number of identical "words" across different users...
L> And what do you want to get as output?
A score from 0 to 100, where 0 means that no "word" coincided and 100 means that all words coincided.

4

Re: Algorithm for estimating word coincidence

Hello, niXman, you wrote:
L>> What do you mean by coincidence?
X> Well... the number of identical "words" across different users...
Between pairs of users? Or across arbitrary subsets of users?
L>> And what do you want to get as output?
X> A score from 0 to 100, where 0 means that no "word" coincided and 100 means that all words coincided.
One number for everything, or one number per pair (group) of users? How should this number be calculated?
P.S. It is hard when the meaning of a question has to be dragged out of the person asking.

5

Re: Algorithm for estimating word coincidence

Hello, Lexey, you wrote:
L> Between pairs of users? Or across arbitrary subsets of users?
Across all users in the table.
L> One number for everything, or one number per pair (group) of users?
One number per user. It is in the opening post: each "word" has a "weight", and the weights of all eight "words" sum to 100.
L> How should this number be calculated?
Each "word" is assigned a "weight". 100 means that all eight "words" of one user coincided with all eight words of other users.
L> P.S. It is hard when the meaning of a question has to be dragged out of the person asking.
You either simply don't understand how  work, or don't want to think about it...
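One way to write that scoring rule down, assuming the weights belong to the eight column positions and words are compared column by column. Turning the pairwise scores into "one number per user" by taking the maximum over the other users is also an assumption; the thread does not say how the pairwise scores should be combined:

```latex
\[
S(u, v) = \sum_{i=1}^{8} w_i \,[\, x_{u,i} = x_{v,i} \,], \qquad \sum_{i=1}^{8} w_i = 100, \qquad 0 \le S(u, v) \le 100
\]
\[
\operatorname{score}(u) = \max_{v \neq u} S(u, v)
\]
```

Here x_{u,i} is the word of user u in column i, w_i is the weight of column i, and [P] is 1 if P holds and 0 otherwise, so a user whose eight words all match some other user's eight words gets 100. Averaging S(u, v) over all other users instead of taking the maximum would be an equally plausible reading.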

6

Re: Algorithm for estimating word coincidence

Hello, niXman, you wrote:
L>> P.S. It is hard when the meaning of a question has to be dragged out of the person asking.
X> You either simply don't understand how  work, or don't want to think about it...
Just give an example of the data. Don't overcomplicate it, just show it plainly.

7

Re: Algorithm for estimating word coincidence

Hello, kov_serg, you wrote:
_> Just give an example of the data.
Well, it's trivial:
user1: a, b, c, d, e, f, g, h
user2: a, 1, c, 1, 1, 2, g, h
user3: a, b, c, 1, 1, f, g, h
I.e., every user shares some of the "words" with the others; some users have more coincidences, some fewer.
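A brute-force sketch of a weighted score on the three users above. The per-column weights are hypothetical (the thread only says they sum to 100), words are compared column by column, and a user's score is assumed to be the best match against any other user. It compares every pair, so it is O(N²) in the number of users and needs all rows in memory, which is exactly what the opening post wants to avoid on a large table:

```python
# Hypothetical per-column weights; the thread only says they sum to 100.
WEIGHTS = [20, 15, 15, 10, 10, 10, 10, 10]

# Example data from the post above, one list of eight words per user.
USERS = {
    "user1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "user2": ["a", "1", "c", "1", "1", "2", "g", "h"],
    "user3": ["a", "b", "c", "1", "1", "f", "g", "h"],
}


def pair_score(u, v, weights=WEIGHTS):
    """0..100: sum of the weights of the columns where both users have the same word."""
    return sum(w for a, b, w in zip(u, v, weights) if a == b)


def best_match_scores(users, weights=WEIGHTS):
    """For each user, the highest pair score against any other user (all pairs, O(N^2))."""
    return {
        name: max(
            (pair_score(words, other, weights)
             for other_name, other in users.items() if other_name != name),
            default=0,
        )
        for name, words in users.items()
    }


print(best_match_scores(USERS))  # {'user1': 80, 'user2': 75, 'user3': 80}
```

With these weights user1 and user2 match only in columns 1, 3, 7 and 8 (55), while user1 and user3 differ only in columns 4 and 5 (80).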

8

Re: Algorithm for estimating word coincidence

Hello, niXman, you wrote:
X> There is a table in a DB in which "words" are stored in eight columns, associated with a particular user.
X> Each "word" has a "weight", and the weights of all eight "words" sum to 100.
You need cluster detection. As for how to do it, there are plenty of methods.
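One simple member of that family of methods is single-linkage clustering over a similarity threshold: link two users whenever their weighted score reaches the threshold and take the connected components, here with a union-find. A minimal sketch; the weights, the thresholds and `user4` are hypothetical, words are compared column by column, and it still scores every pair of users:

```python
from itertools import combinations

# Hypothetical per-column weights; the thread only says they sum to 100.
WEIGHTS = [20, 15, 15, 10, 10, 10, 10, 10]

USERS = {
    "user1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "user2": ["a", "1", "c", "1", "1", "2", "g", "h"],
    "user3": ["a", "b", "c", "1", "1", "f", "g", "h"],
    "user4": ["x", "y", "z", "q", "r", "s", "t", "u"],  # hypothetical user with no matches
}


def pair_score(u, v, weights=WEIGHTS):
    """Sum of the weights of the columns where both users have the same word."""
    return sum(w for a, b, w in zip(u, v, weights) if a == b)


def find(parent, x):
    """Root of x's set, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def threshold_clusters(users, threshold, weights=WEIGHTS):
    """Single-linkage clusters: two users are linked whenever their pair score >= threshold."""
    parent = {name: name for name in users}
    for a, b in combinations(users, 2):
        if pair_score(users[a], users[b], weights) >= threshold:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra
    clusters = {}
    for name in users:
        clusters.setdefault(find(parent, name), []).append(name)
    return sorted(clusters.values())


print(threshold_clusters(USERS, 70))  # [['user1', 'user2', 'user3'], ['user4']]
print(threshold_clusters(USERS, 80))  # [['user1', 'user3'], ['user2'], ['user4']]
```

At threshold 70, user1 and user2 score only 55 against each other but still land in one cluster through user3; that chaining is characteristic of single linkage.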

9

Re: Algorithm for estimating word coincidence

Hello, Glory, you wrote:
G> You need cluster detection. As for how to do it, there are plenty of methods.
Reading up on it. Thanks.

10

Re: Algorithm for estimating word coincidence

Hello, niXman, you wrote:
X> Well, it's trivial:
X> user1: a, b, c, d, e, f, g, h
X> user2: a, 1, c, 1, 1, 2, g, h
X> user3: a, b, c, 1, 1, f, g, h
X> I.e., every user shares some of the "words" with the others; some users have more coincidences, some fewer.
So all 8 words are the same for every user, and the columns hold the share of each word?

    name   topic1  topic2  topic3  ...  topic8
    user1  10      20      0       ...  60
    user2  0       0       0       ...  100
    user3  11      19      0       ...  50
    ...
    userN  x1      x2      x3      ...  x8

Eight-dimensional vectors. Right? Then it's cluster analysis.
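If the data really are eight-dimensional share vectors like the table above, k-means is one standard choice. The sketch below is plain k-means with squared Euclidean distance and a deterministic farthest-first initialization; the vectors and k = 2 are made up for illustration, since the columns elided in the table are not given:

```python
# Made-up 8-dimensional share vectors (each sums to 100); not data from the thread.
POINTS = [
    [10, 20, 0, 0, 10, 0, 0, 60],
    [0, 0, 0, 0, 0, 0, 0, 100],
    [11, 19, 0, 0, 10, 0, 10, 50],
    [50, 30, 20, 0, 0, 0, 0, 0],
    [45, 35, 20, 0, 0, 0, 0, 0],
]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def farthest_first(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign to nearest centroid, recompute centroids, stop when labels settle."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return labels, centroids


labels, centroids = kmeans(POINTS, 2)
print(labels)  # [0, 0, 0, 1, 1]
```

k has to be chosen up front; if the number of groups is unknown, a threshold-based or hierarchical method may suit the task better.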

11

Re: Algorithm for estimating word coincidence

Hello, niXman, you wrote:
L>> P.S. It is hard when the meaning of a question has to be dragged out of the person asking.
X> You either simply don't understand how  work, or don't want to think about it...
Or you are not capable of formulating a task properly, and you are asking others to finish thinking it through for you. Only the probability of guessing what you actually need that way is rather low.

12

Re: Algorithm for estimating word coincidence

Hello, Lexey, you wrote:
L> Or you are not capable of formulating a task properly...
Or you are not capable of understanding it. Glory and kov_serg somehow understood. Thanks, everyone; the question is closed.

13

Re: Algorithm for estimating word coincidence

Hello, niXman, you wrote:
X> Greetings!
X> There is a table in a DB in which "words" are stored in eight columns, associated with a particular user.
X> Each "word" has a "weight", and the weights of all eight "words" sum to 100.
X> The task is to implement an algorithm that estimates how well the "words" of different users coincide, for  .
X> I have some thoughts, and there is a "" solution (wildly non-optimal algorithmically)...
X> How would you solve this task?
X>
X> The DB is remote.
X> Loading the table into memory for subsequent processing is a bad option: there are a lot of records, so both the memory consumption and the load time would be very large.
Such tasks are, as a rule, solved much more easily "in the specific case" than "in general". Specify how the weights of the individual table fields are distributed and what the detection threshold is. It may turn out that the threshold can be exceeded in only two or three ways, or that a coincidence on one of two or three "principal" fields is unavoidable. Then you can get by with a small number of indexes on the table. As a last resort, create eight indexes, one per field.
Also look at the coincidence statistics. What is the expected size of the database, and how many unique values does each field have? What is the probability that a field coincides for two random records? What is the largest cluster of "identical" values in each field? Can the DB return the number of records with a given field value in O(1) (via an index)?