Topic: Extracting unique values
I am trying to build a database from a thick doc file that contains tabular data, to make my work easier. The trouble is that the tables were created by different people, and each filled them in at their own discretion. Some of it obviously cannot be unified at this point. I will explain with cats and other simple animals: suppose the cells contain records like "a cat, a dog, a parrot" or "a dog, a hamster, a canary". These values can repeat across many rows. By selecting the rows and splitting each one into substrings on the comma, I get the unique values ("cat", "dog", "parrot" and so on). But some colleagues filled in the table like this: "cats," or, even worse: "Dogs domestic, thoroughbred, a cat". As a result, the table of unique animals ends up with both "a dog" and "dogs". I could somehow live with that, but when the last record is split on the comma as a separator, a certain "thoroughbred" appears among the animals, which is completely unacceptable. I cannot figure out how to teach the algorithm to understand where "dogs domestic, thoroughbred" ends and "cat" begins. I would be grateful for any help.
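For illustration, here is a minimal Python sketch of the splitting approach described above. It assumes the cell texts have already been extracted from the document into a list of strings. The function name `unique_values`, the sample data, and the stripping of a leading "a"/"an" are illustrative assumptions, not the actual code in use.

```python
def unique_values(cells):
    """Split each cell on commas and collect the distinct, trimmed pieces."""
    seen = set()
    for cell in cells:
        for part in cell.split(","):
            value = part.strip().lower()
            # Assumption: drop a leading English article ("a cat" -> "cat").
            for article in ("a ", "an "):
                if value.startswith(article):
                    value = value[len(article):]
                    break
            if value:  # skip empty pieces left by trailing commas ("cats,")
                seen.add(value)
    return sorted(seen)


# Hypothetical sample cells modeled on the examples in the post.
cells = [
    "a cat, a dog, a parrot",
    "a dog, a hamster, a canary",
    "cats,",
    "Dogs domestic, thoroughbred, a cat",
]

result = unique_values(cells)
print(result)
```

With this sample input the result contains both "cat" and "cats", and "thoroughbred" shows up as if it were a separate animal. A plain comma split cannot tell a comma that separates animals from a comma that separates attributes of one animal.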