Hello, Antei, you wrote:

I>>> Hello, samius, you wrote:

A> Thinking aloud...
A> What if we take two hash algorithms and:
A> - compute two hashes from the 5-7 input string columns, and
A> - concatenate them, i.e. as a result we get
A> HashFunction1(col1, col2, ..., col7) + delimiter + HashFunction2(col1, col2, ..., col7)

I like this train of thought. Yes, by increasing the number of different hash functions until the size of the result approaches the upper bound of the input data size (with some headroom), we get practically a guarantee of uniqueness. But there is a slightly simpler way: ToStream(col1, col2, ..., col7).Zip().ToBaseXX(), where Zip is replaced with your favourite or available lossless compression algorithm. If the size of the surrogate key does not matter, the compression can be omitted entirely. The XX in ToBaseXX is your favourite way of representing a byte array as text for convenient handling.

A> This should reduce the probability of a collision practically to 0.
A> What do you think (practically, not theoretically)?

Practically speaking, a nonzero percentage of collisions does not lead to real problems until there are too many of them. Even a hash table, by its very principle of operation, tolerates collisions unless you pick ideal hash functions (ones with unique results): it spreads the computed hashes over an array whose length is chosen based on the size of the collection. Performance problems begin when the hash function is chosen badly and a lookup has to walk through too many collisions and compare too many keys before returning an answer. And even then, comparing keys is normally not a problem until the keys become too "close". For example, suppose the field col1 is a long string: to establish that two keys differ, the comparison may have to walk to the middle of both strings. If the keys of two colliding entries are not too close, checking a couple of characters or the string lengths is normally already enough to show that the keys are unequal.
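The compress-then-encode idea above can be sketched as follows. The post's ToStream(...).Zip().ToBaseXX() is .NET-flavored pseudocode; this Python sketch substitutes zlib for the lossless compressor and Base64 for the "BaseXX" text encoding (both are illustrative choices, not the poster's exact API), and assumes the chosen separator never occurs inside column values:

```python
import base64
import zlib

# Unit-separator character, assumed not to occur in any column value.
SEP = "\x1f"

def surrogate_key(*cols: str) -> str:
    """Concatenate the columns, compress losslessly, encode as text."""
    payload = SEP.join(cols).encode("utf-8")
    return base64.b64encode(zlib.compress(payload)).decode("ascii")

def restore_columns(key: str) -> list[str]:
    """Invert surrogate_key: decode, decompress, split back into columns."""
    payload = zlib.decompress(base64.b64decode(key))
    return payload.decode("utf-8").split(SEP)
```

Note why this sidesteps the collision discussion: lossless compression is reversible, so two different column tuples can never produce the same key, unlike a fixed-size hash; the price is a key whose length grows with the input.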
A> Let me restate the original problem:
A> - from one (1) I produce another (2)
A> - in 1 I can create a column/field for the surrogate key and compute it on insertion
A> - from 2 to 1 the lookup has to be done "on the fly"

Ah, now I understand where the uniqueness requirement comes from: it is for inserting the surrogate key. Then a clarifying question right away: does the lookup have to be done in memory, or will the modified (1) be saved and then queried through the DB's own mechanisms?

If in memory, the surrogate is not needed at all. It is enough to build a dictionary over (1) whose key is Tuple<T1, ..., T7> and whose value is the row of (1). The dictionary itself computes the hash of the combination and deals with collisions. First I want to understand whether such a solution works in principle; after that it can be optimized for memory and speed of operation. In the limiting case it can simply be a HashSet<DataRow> with a special equality comparer, i.e. it requires no extra memory to store the keys.
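The in-memory variant above can be sketched as follows. The post describes a .NET Dictionary keyed by a column tuple and a HashSet&lt;DataRow&gt; with a custom comparer; this Python sketch mirrors both with a plain dict and a thin wrapper class (the table data and the choice of three key columns are made up for illustration):

```python
# Toy table (1): the first three fields play the role of the 5-7 key
# columns, the last field is the rest of the row.
rows_1 = [
    ("alice", "nyc", "2020", "row-1-payload"),
    ("bob", "sfo", "2021", "row-2-payload"),
]

# Dictionary over (1): key = the column tuple, value = the row itself.
# Like .NET's Dictionary, Python's dict hashes the tuple and resolves
# collisions internally, so no explicit surrogate key is stored.
index = {row[:3]: row for row in rows_1}

def lookup(col1: str, col2: str, col3: str):
    """'On the fly' lookup from table (2) into table (1)."""
    return index.get((col1, col2, col3))

class KeyedRow:
    """Analog of HashSet<DataRow> with a special comparer: hash and
    equality are derived from the key columns of the row itself, so no
    separate key tuple has to be materialized and stored."""
    __slots__ = ("row",)

    def __init__(self, row):
        self.row = row

    def __hash__(self):
        return hash(self.row[:3])

    def __eq__(self, other):
        return self.row[:3] == other.row[:3]

row_set = {KeyedRow(r) for r in rows_1}
```

The KeyedRow set is the "limiting case" from the post: membership tests hash the row in place, trading a little comparison work per probe for zero key storage.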