1

Topic: Generating a unique hash for several string fields

In a data array (not a DB) that can be pictured as a table, there are about 5-7 key columns whose combination uniquely identifies a record. This array has to be matched against another, similar one and, rather than dragging all 7 columns along, I would like to use a single unique hash instead of them. Please suggest an algorithm or library that can quickly compute a unique hash from several strings. Thanks.

2

Re: Generating a unique hash for several string fields

Hello, Antei, you wrote:
A> In a data array (not a DB) that can be pictured as a table, there are about 5-7 key columns whose combination uniquely identifies a record. This array has to be matched against another, similar one and, rather than dragging all 7 columns along, I would like to use a single unique hash instead of them.
A> Please suggest an algorithm or library that can quickly compute a unique hash from several strings.
A> Thanks.

ColumnValue_1 + ColumnValue_2 + ... + ColumnValue_N; this gives a unique value exactly when the set of values in the row is unique ... Admittedly, it is hard to say anything in advance about the "speed" of generating such a "hash".
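A minimal sketch of the concatenation idea, written in Python for brevity. Plain concatenation is only unique if field boundaries cannot shift, so this sketch prefixes each field with its length; the function name `composite_key` and the `length:value` layout are assumptions made for illustration, not something proposed in the thread.

```python
def composite_key(fields):
    """Join string fields into one string that differs for every distinct tuple."""
    # Each field is written as "<length>:<value>". Because the length is known
    # before the value is read, ("ab", "c") and ("a", "bc") cannot collide,
    # whatever characters the fields contain.
    return "".join(f"{len(field)}:{field}" for field in fields)


print(composite_key(["ab", "c"]))
print(composite_key(["a", "bc"]))
```

The resulting key is as long as the data itself, which is the price of a guarantee without any hashing.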

3

Re: Generating a unique hash for several string fields

Hello, Antei, you wrote:
A> Please suggest an algorithm or library that can quickly compute a unique hash from several strings.
A> Thanks.

I suggest looking at the definition of a hash function and at Dirichlet's principle (the pigeonhole principle). The definition says that a hash function computes a value of fixed length for an input sequence of arbitrary length. And Dirichlet's principle: if rabbits are placed in cages and there are more rabbits than cages, then at least one cage holds more than one rabbit. It follows directly that the values of a hash function can be unique only if the hash size exceeds any input sequence. And that contradicts the definition of a hash function.
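The pigeonhole argument can be tried out directly. The sketch below is only an illustration under assumed names: `tiny_hash` keeps a single byte of SHA-256, so there are 256 "cages", and 257 distinct inputs are then certain to produce at least one collision.

```python
import hashlib


def tiny_hash(text):
    """An 8-bit hash: the first byte of SHA-256, so only 256 possible values."""
    return hashlib.sha256(text.encode("utf-8")).digest()[0]


def first_collision(inputs):
    """Return the first pair of distinct inputs that share a hash value, or None."""
    seen = {}
    for item in inputs:
        h = tiny_hash(item)
        if h in seen:
            return seen[h], item
        seen[h] = item
    return None


# 257 distinct inputs, 256 possible hash values: a collision must exist.
pair = first_collision(f"row-{i}" for i in range(257))
print(pair)
```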

4

Re: Generating a unique hash for several string fields

Hello, Antei, you wrote:
A> In a data array (not a DB) that can be pictured as a table, there are about 5-7 key columns whose combination uniquely identifies a record. This array has to be matched against another, similar one and, rather than dragging all 7 columns along, I would like to use a single unique hash instead of them.
A> Please suggest an algorithm or library that can quickly compute a unique hash from several strings.

Guid.NewID(). But the proper name for it is a surrogate (primary) key.
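A surrogate key is not computed from the data, so something has to remember which natural key received which identifier. The sketch below assumes an in-memory mapping and uses Python's `uuid.uuid4()` in place of the GUID call named in the post; the class `SurrogateKeys` and its method are hypothetical names.

```python
import uuid


class SurrogateKeys:
    """Hands out one random identifier per distinct natural key."""

    def __init__(self):
        self._ids = {}  # natural key (tuple of fields) -> surrogate id

    def id_for(self, fields):
        key = tuple(fields)
        # The first request for a key creates its id; later requests reuse it.
        if key not in self._ids:
            self._ids[key] = uuid.uuid4()
        return self._ids[key]


keys = SurrogateKeys()
first = keys.id_for(["Smith", "London", "1980"])
again = keys.id_for(["Smith", "London", "1980"])
other = keys.id_for(["Smith", "Paris", "1980"])
```

Note that looking up the identifier still requires the full natural key, so this only helps once the mapping has been stored alongside the first array.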

5

Re: Generating a unique hash for several string fields

6

Re: Generating a unique hash for several string fields

Hello, samius, you wrote:
S> If rabbits are placed in cages and there are more rabbits than cages, then at least one cage holds more than one rabbit.
S> It follows directly that the values of a hash function can be unique only if the hash size exceeds any input sequence. And that contradicts the definition of a hash function.

On the other hand, if there are many cages, the real (as opposed to theoretically possible) rabbits are not that numerous, and they are spread over the cages uniformly, then although one cage can in theory hold two rabbits, the practical probability of that event is negligible (say, qualitatively smaller than the probability that you are simply seeing double rather than two rabbits really having ended up in the cage). Therefore programs that rely on hashes differing whenever the input data differ have a right to exist. But it is important to use a good hash, not just any hash.
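How negligible the probability is can be estimated with the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2·2^b)) for n rows and a b-bit hash. The sketch assumes the hash values are uniformly distributed; the function name is made up for the example.

```python
import math


def collision_probability(num_items, hash_bits):
    """Approximate chance that at least two of num_items share a hash value."""
    space = 2.0 ** hash_bits
    exponent = num_items * (num_items - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


for bits in (32, 64, 128, 256):
    print(bits, collision_probability(10**9, bits))
```

With a billion rows a 32-bit hash is essentially certain to collide, while for 256 bits the estimate is far below any realistic hardware failure rate.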

7

Re: Generating a unique hash for several string fields

8

Re: Generating a unique hash for several string fields

Hello, Pzz, you wrote:
Pzz> Hello, samius, you wrote:
S>> It follows directly that the values of a hash function can be unique only if the hash size exceeds any input sequence. And that contradicts the definition of a hash function.
Pzz> On the other hand, if there are many cages, the real (as opposed to theoretically possible) rabbits are not that numerous, and they are spread over the cages uniformly, then although one cage can in theory hold two rabbits, the practical probability of that event is negligible (say, qualitatively smaller than the probability that you are simply seeing double rather than two rabbits really having ended up in the cage).
Pzz> Therefore programs that rely on hashes differing whenever the input data differ have a right to exist. But it is important to use a good hash, not just any hash.

But the topic starter wanted a guarantee of uniqueness, to the point of emphasizing the word "". That is, any solution that does not guarantee the absence of collisions does not suit him.

9

Re: Generating a unique hash for several string fields

Hello, samius, you wrote:

I agree, samius, and thanks for such a clear explanation with the rabbits.
I also have to agree with Carc: a unique value that can be built on the fly by concatenation, given a unique separator, can certainly do the job. Since such a key can be large, I think a composite index can be built:
column 1) some hash, say murmur64
column 2) the key concatenated through a separator
Column 1 is small and lets the search in the index be narrowed quickly.
Column 2 provides uniqueness.
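The two-column idea could be sketched as follows. This is an in-memory illustration, not a database index: `zlib.crc32` stands in for murmur64 (which is not in the Python standard library), the full key uses a length-prefixed layout instead of a separator, and all names are assumptions.

```python
import zlib
from collections import defaultdict


def full_key(fields):
    """Column 2: the complete key, length-prefixed so it is unambiguous."""
    return "".join(f"{len(f)}:{f}" for f in fields)


def short_hash(key):
    """Column 1: a small hash used only to narrow the search."""
    return zlib.crc32(key.encode("utf-8"))


class TwoPartIndex:
    def __init__(self):
        self._buckets = defaultdict(list)  # short hash -> [(full key, row), ...]

    def add(self, fields, row):
        key = full_key(fields)
        self._buckets[short_hash(key)].append((key, row))

    def find(self, fields):
        key = full_key(fields)
        # The short hash narrows the candidates; the full key decides.
        for stored_key, row in self._buckets.get(short_hash(key), []):
            if stored_key == key:
                return row
        return None


index = TwoPartIndex()
index.add(["ab", "c"], "row 1")
index.add(["a", "bc"], "row 2")
```

A collision in the short hash costs one extra string comparison but never returns the wrong row.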

10

Re: Generating a unique hash for several string fields

Hello, samius, you wrote:
Pzz>> Therefore programs that rely on hashes differing whenever the input data differ have a right to exist. But it is important to use a good hash, not just any hash.
S> But the topic starter wanted a guarantee of uniqueness, to the point of emphasizing the word "". That is, any solution that does not guarantee the absence of collisions does not suit him.

What is a guarantee? For example, if the probability of a collision is a million times smaller than the probability of a hardware failure, is that a guarantee? And a billion times smaller? And a million billion times smaller?

11

Re: Generating a unique hash for several string fields

Hello, Pzz, you wrote:
Pzz> Hello, samius, you wrote:
S>> But the topic starter wanted a guarantee of uniqueness, to the point of emphasizing the word "". That is, any solution that does not guarantee the absence of collisions does not suit him.
Pzz> What is a guarantee?

"Guarantee" is rhetoric. It is like "absolutely exactly" or "I swear on my tooth". That is, applied to the entirely unambiguous requirement that the function values be unique (read what the definition says), the word "" does not weaken the definition but, on the contrary, strengthens it (if there were anywhere further to go).

Pzz> For example, if the probability of a collision is a million times smaller than the probability of a hardware failure, is that a guarantee? And a billion times smaller? And a million billion times smaller?

That is a guarantee of exactly what it states. A guarantee of uniqueness is when the probability of a collision = 0, if probabilities are the more convenient way to put it. Of course, this is only my understanding of a uniqueness guarantee. If there is another one, please present it.

12

Re: Generating a unique hash for several string fields

13

Re: Generating a unique hash for several string fields

Hello, samius, you wrote:
S> It does contradict it. For any output bit string size fixed in advance there will be a set of input data longer than the output strings. That is, the rabbits do not fit into the cages. The presence of collisions is a direct consequence of the definition.

All the same, the part about rabbits is not the definition. And in the definition the bit string is not limited: it can be comparable in size to the largest possible input data (for example, a hash function that returns the input data "padded" with zeros) or even larger. In that case uniqueness is guaranteed, but the usefulness is, of course, highly questionable.

14

Re: Generating a unique hash for several string fields

Hello, itslave, you wrote:
I> Hello, samius, you wrote:
S>> It does contradict it. For any output bit string size fixed in advance there will be a set of input data longer than the output strings. That is, the rabbits do not fit into the cages. The presence of collisions is a direct consequence of the definition.
I> All the same, the part about rabbits is not the definition.

Of course not. It is a statement that can be formulated in different ways.

I> And in the definition the bit string is not limited: it can be comparable in size to the largest possible input data (for example, a hash function that returns the input data "padded" with zeros) or even larger. In that case uniqueness is guaranteed, but the usefulness is, of course, highly questionable.

In the definition of a hash function the input data have arbitrary length. That means there is no element of maximum length. There will always be an element one unit longer. Yes, in practice hashes of excessive length are used. But the guarantee of uniqueness ends where the size of the input set exceeds 2^n - 1, where n is the size of the hash bit string. Thus even ideal hash functions (perfect hash) are not hash functions by the definition.

15

Re: Generating a unique hash for several string fields

Hello, samius, you wrote:
S> In the definition of a hash function the input data have arbitrary length. That means there is no element of maximum length. There will always be an element one unit longer. Yes, in practice hashes of excessive length are used. But the guarantee of uniqueness ends where the size of the input set exceeds 2^n - 1, where n is the size of the hash bit string. Thus even ideal hash functions (perfect hash) are not hash functions by the definition.

Excellent, I agree. In the real world there is always a maximum length of the input data. And with that assumption, which always holds in this Universe, we can guarantee that a unique "hash" exists for all kinds of input data.

16

Re: Generating a unique hash for several string fields

Hello, itslave, you wrote:
I> Excellent, I agree. In the real world there is always a maximum length of the input data. And with that assumption, which always holds in this Universe, we can guarantee that a unique "hash" exists for all kinds of input data.

It is hard to disagree with that. In a world where files are never larger than 100, the function id with an output string of size 100 fully satisfies us as far as the uniqueness of hashes of file contents is concerned.

17

Re: Generating a unique hash for several string fields

Hello, samius, you wrote:
S> It is hard to disagree with that. In a world where files are never larger than 100, the function id with an output string of size 100 fully satisfies us as far as the uniqueness of hashes of file contents is concerned.

Yes, I have already noted several times that from a practical point of view it is hard to come up with a scenario in which "unique hashes" help solve a problem. In the topic starter's specific case (a relationship between two tables with a huge natural primary key) what he needs is a surrogate unique key, which is what I suggested right away, but for some reason I got a downvote from the topic starter for it.

18

Re: Generating a unique hash for several string fields

Hello, itslave, you wrote:
I> Guid.NewID(). But the proper name for it is a surrogate (primary) key.

How many times has this pattern repeated itself: generating GUIDs at a high enough frequency sooner or later yields two identical values in a row.

19

Re: Generating a unique hash for several string fields

SHA-256. Nobody gives any guarantees, but in practice there will be no collisions.
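A minimal sketch of hashing several string fields with SHA-256 using Python's `hashlib`. Each field is fed in with a length prefix so that different splits of the same characters do not hash alike; the 8-byte prefix and the name `row_digest` are assumptions of the sketch.

```python
import hashlib


def row_digest(fields):
    """Return a 64-character hex SHA-256 digest for a tuple of string fields."""
    h = hashlib.sha256()
    for field in fields:
        data = field.encode("utf-8")
        # Length prefix keeps ("ab", "c") and ("a", "bc") apart.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


print(row_digest(["Smith", "London", "1980"]))
```

The digest has a fixed size whatever the field lengths, and it can be computed on the fly on both sides of the match.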

20

Re: Generating a unique hash for several string fields

Hello, Mr. Delphist, you wrote:
I>> Guid.NewID(). But the proper name for it is a surrogate (primary) key.
MD> How many times has this pattern repeated itself: generating GUIDs at a high enough frequency sooner or later yields two identical values in a row.

Where does that information come from?

21

Re: Generating a unique hash for several string fields

Hello, Mr. Delphist, you wrote:
MD> How many times has this pattern repeated itself: generating GUIDs at a high enough frequency sooner or later yields two identical values in a row.

Thanks, I have not laughed that hard in a long time.

22

Re: Generating a unique hash for several string fields

Thinking aloud...
What if we take two hash algorithms and:
- compute two hashes from the 5-7 input string columns, and
- concatenate them, i.e. as a result we get
HashFunction1(col1, col2, ..., col7) + delimiter + HashFunction2(col1, col2, ..., col7)
That should reduce the probability of a collision to practically 0.
What do you think (practically, not theoretically)?
A reminder of the original problem:
- from one array (1) I do a lookup into another (2)
- in 1 I can create a column/field for the surrogate and compute it on insertion
- from 2 the lookup into 1 has to be done "on the fly"
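The formula above could look like this in Python. SHA-256 and BLAKE2b are picked here purely as two unrelated algorithms available in `hashlib`; the thread does not name specific functions, and `double_hash` is a hypothetical name.

```python
import hashlib


def double_hash(fields, delimiter="-"):
    """HashFunction1(...) + delimiter + HashFunction2(...) over the same fields."""
    data = "".join(f"{len(f)}:{f}" for f in fields).encode("utf-8")
    first = hashlib.sha256(data).hexdigest()                     # 64 hex chars
    second = hashlib.blake2b(data, digest_size=16).hexdigest()   # 32 hex chars
    return first + delimiter + second


print(double_hash(["Smith", "London", "1980"]))
```

In effect this is just a longer hash: two rows collide only if both functions collide on them at once. Collisions remain possible in principle.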

23

Re: Generating a unique hash for several string fields

Hello, Antei, you wrote:
A> Thinking aloud...
A> What if we take two hash algorithms and:
A> - compute two hashes from the 5-7 input string columns, and
A> - concatenate them, i.e. as a result we get
A> HashFunction1(col1, col2, ..., col7) + delimiter + HashFunction2(col1, col2, ..., col7)
A> That should reduce the probability of a collision to practically 0.
A> What do you think (practically, not theoretically)?
A> A reminder of the original problem:
A> - from one array (1) I do a lookup into another (2)
A> - in 1 I can create a column/field for the surrogate and compute it on insertion
A> - from 2 the lookup into 1 has to be done "on the fly"

"If you bring 9 women together, they will still give birth only after 9 months, and certainly not after one."

And what if the starting question were reformulated after all? Define the epithet "" more precisely, because, as already explained, hash functions do not and cannot offer an absolute guarantee. But there is an acceptable one. Spell out the word "quickly". How fast? Data volumes ... ... Do not confuse the concepts "" and ""; they are different things. Examples of the data are needed.

PS: it is easier to take a couple of hash functions and run a test on them. Simply push a batch of data tuples into the test, let it compute, and have it write collisions to a log. Meanwhile go and drink beer. Then look at the logs, and it becomes clear at once which variants are acceptable both in collisions and in speed. And there are plenty of variants: MD5, CRCxx, and a cartload more ...
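The test described in the PS could be sketched like this. The candidate list, the synthetic `rows`, and the helper names are all assumptions; real data should be substituted, since collision behaviour and timing depend on it.

```python
import hashlib
import time
import zlib


def encode(fields):
    """Serialize a tuple of strings unambiguously before hashing."""
    return "".join(f"{len(f)}:{f}" for f in fields).encode("utf-8")


def count_collisions(hash_func, rows):
    """Return (rows whose hash was already taken by a different row, seconds)."""
    seen = {}
    collisions = 0
    start = time.perf_counter()
    for row in rows:
        data = encode(row)
        # setdefault stores the first row seen for a hash and returns it later.
        if seen.setdefault(hash_func(data), data) != data:
            collisions += 1
    return collisions, time.perf_counter() - start


candidates = {
    "crc32": zlib.crc32,
    "md5": lambda data: hashlib.md5(data).digest(),
    "sha256": lambda data: hashlib.sha256(data).digest(),
}

# Synthetic stand-in for real tuples; every row here is distinct.
rows = [(f"name{i}", f"city{i % 97}", str(i * 7)) for i in range(20_000)]

for name, func in candidates.items():
    collisions, seconds = count_collisions(func, rows)
    print(f"{name}: {collisions} collisions in {seconds:.3f} s")
```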

24

Re: Generating a unique hash for several string fields

Hello, Antei, you wrote:
A> Thinking aloud...

Honestly, I do not understand where such persistence comes from??? Comrades Codd and Date solved this problem fundamentally some 40-50 years ago; one could say they closed the topic. They provided uniqueness and logarithmic algorithmic complexity, and their followers even invented a convenient declarative language. Take it and use it: there are a million implementations, for any data, taste and budget. But no, one has to bang one's forehead and invent one's own hash function, which suddenly, once a statistically significant data sample is reached, starts hitting back painfully and regularly. I flatly do not understand the motivation; please explain.

25

Re: Generating a unique hash for several string fields

Hello, Antei, you wrote:
A> Thinking aloud...
A> What if we take two hash algorithms and:
A> - compute two hashes from the 5-7 input string columns, and
A> - concatenate them, i.e. as a result we get
A> HashFunction1(col1, col2, ..., col7) + delimiter + HashFunction2(col1, col2, ..., col7)

I like this train of thought. Yes, by increasing the number of different functions until the size of the resulting hash exceeds, with a margin, the upper bound on the size of the input data, we get practically a guarantee of uniqueness. But there is a slightly simpler way:
ToStream(col1, col2, ..., col7).Zip().ToBaseXX()
where Zip is replaced with your favourite or available lossless compression algorithm. If the size of the surrogate key is not important, the compression can be omitted. XX in ToBaseXX is your favourite way of representing a byte array as text for convenience.

A> That should reduce the probability of a collision to practically 0.
A> What do you think (practically, not theoretically)?

In practice as in theory, a nonzero percentage of collisions does not lead to serious problems until there are too many collisions. Even the operating principle of a hash table allows for collisions, unless an ideal hash function (with guaranteed unique results) is chosen. A hash table spreads the hashes it receives over an array of buckets whose length is chosen based on the size of the collection. Performance problems with a hash table begin when the hash function is chosen poorly and each access has to go through too many collisions and compare many keys before returning the answer to the request. And even then, comparing keys is normally not a problem unless they turn out to be too "close". For example, suppose the field col1 is a long string and, to see that two keys differ, the comparison has to run to the middle of the two strings. If the colliding keys are not too close, checking a couple of characters or the string lengths normally already shows that the keys are unequal.
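A rough Python counterpart of the `ToStream(...).Zip().ToBaseXX()` chain, assuming `zlib` for the lossless compression step and URL-safe Base64 for the text form. The names `pack_key` and `unpack_raw` and the length-prefixed serialization are assumptions of the sketch.

```python
import base64
import zlib


def pack_key(fields):
    """Serialize, compress and text-encode the fields; nothing is lost."""
    raw = "".join(f"{len(f)}:{f}" for f in fields).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def unpack_raw(key):
    """Reverse pack_key back to the serialized form, showing it is lossless."""
    return zlib.decompress(base64.urlsafe_b64decode(key)).decode("utf-8")


key = pack_key(["Smith", "London", "1980"])
print(key, unpack_raw(key))
```

Because every step is reversible, two different tuples can never produce the same key; the cost is that the key is not of fixed length, and for short inputs it may come out longer than the original data.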
A> A reminder of the original problem:
A> - from one array (1) I do a lookup into another (2)
A> - in 1 I can create a column/field for the surrogate and compute it on insertion
A> - from 2 the lookup into 1 has to be done "on the fly"

Now I understand where the uniqueness requirement comes from: it is for inserting a surrogate key. Then a clarifying question right away. Does the lookup have to be done in memory? Or will the modified 1 be saved and the lookup done through DB mechanisms? If it is in memory, the surrogate is not needed. It is enough to build a dictionary over 1 where the key is Tuple<T1, ..., T7> and the value is a row of 1. The dictionary itself computes the combined hash and itself deals with collisions. First I want to understand whether such a solution is suitable in principle; after that it can be tuned for memory and speed. In the limiting case it can even be a HashSet<DataRow> with a special comparer, i.e. it needs no extra memory for storing keys.
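The in-memory variant can be sketched with a plain Python `dict` keyed by a tuple, which plays the role of the dictionary over `Tuple<T1, ..., T7>` described above. Rows are assumed to be dicts and the column names are invented for the example.

```python
def join_on_key(table1, table2, key_columns):
    """Match rows of table2 to rows of table1 by the combined key columns."""
    # Index table 1 once. The dict hashes the tuple and resolves any
    # hash collisions itself by comparing the tuples for equality.
    index = {tuple(row[c] for c in key_columns): row for row in table1}

    matches = []
    for row in table2:
        match = index.get(tuple(row[c] for c in key_columns))
        if match is not None:
            matches.append((match, row))
    return matches


table1 = [
    {"name": "Smith", "city": "London", "year": "1980", "value": 1},
    {"name": "Smith", "city": "Paris", "year": "1980", "value": 2},
]
table2 = [
    {"name": "Smith", "city": "Paris", "year": "1980", "note": "found"},
    {"name": "Jones", "city": "Paris", "year": "1980", "note": "missing"},
]
result = join_on_key(table1, table2, ["name", "city", "year"])
```

No surrogate is computed or stored; the tuple of key fields is the key, and uniqueness follows from exact comparison rather than from the hash.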