1

Topic: The formula for utl_match.edit_distance_similarity

Hello everyone!
A Java method receives two words and the already computed Levenshtein distance between them.
From this data I need to compute the similarity as a percentage (as utl_match.edit_distance_similarity does).
I cannot work out the formula; has anyone run into this?
Or is this input data not enough?

with q as (
select '  ' s1, '  ' s2 from dual union all
select ' 11 ', '  ' from dual union all
select ' 22 ', ' 3333 ' from dual
)
select
utl_match.edit_distance (s1, s2) distance,
utl_match.edit_distance_similarity (s1, s2) similarity_etalon,
round (100-utl_match.edit_distance (s1, s2) *100 / greatest (length (s1), length (s2)) / 2, 2) formula1,
round (100-utl_match.edit_distance (s1, s2) *100 / least (length (s1), length (s2)) / 2, 2) formula2,
round (100-utl_match.edit_distance (s1, s2) *100 / length (s1 || s2), 2) formula3
from q
/
DISTANCE SIMILARITY_ETALON   FORMULA1   FORMULA2   FORMULA3
-------- ----------------- ---------- ---------- ----------
       4                72      71.43      71.43      71.43
       6                63      66.67      57.14       62.5
       8                56      63.64      55.56         60

Thanks in advance.

2

Re: The formula for utl_match.edit_distance_similarity

Come on, isn't it just this?

ceil (100 - (utl_match.edit_distance (s1, s2)/greatest (length (s1), length (s2)) *100))

3

Re: The formula for utl_match.edit_distance_similarity

K790 wrote:

isn't it just this?

ceil (100 - (utl_match.edit_distance (s1, s2)/greatest (length (s1), length (s2)) *100))

No.

FORMULA4
--------
43
34
28

4

Re: The formula for utl_match.edit_distance_similarity

For multibyte characters, some of the bytes match implicitly. In UTF-8, comparing two identical letters yields 2 matching units, and most pairs of different letters still match on one of their two bytes, namely the first. For some letter pairs the match is only on the second byte.
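To make the byte-level picture concrete, here is a minimal Java sketch that prints the UTF-8 bytes of a few letters. It assumes the database character set is UTF-8 (for example AL32UTF8), which the thread implies but does not state; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Bytes {
    // Returns the UTF-8 encoding of s as upper-case hex bytes separated by spaces.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Cyrillic letters are written as Unicode escapes to keep the source ASCII-only.
        System.out.println(hex("\u0430")); // Cyrillic small a  -> D0 B0
        System.out.println(hex("\u0431")); // Cyrillic small be -> D0 B1 (same first byte as above)
        System.out.println(hex("\u0440")); // Cyrillic small er -> D1 80 (different first byte)
        System.out.println(hex("a"));      // Latin small a     -> 61 (a single byte)
    }
}
```

Two different Cyrillic letters that share the first byte differ in only one byte out of two, so a byte-wise comparison treats them as half matching.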

5

Re: The formula for utl_match.edit_distance_similarity

In short, the issue was indeed the bytes.
As far as I could establish, the native formula takes the string lengths in bytes:

ceil (100 - utl_match.edit_distance (s1, s2) / greatest (lengthb (s1), lengthb (s2)) *100) formula

As a consequence, a difference of 4 in Russian letters weighs the same in percent as a difference of 2 in English words of equal length, which is not ideal.
[spoiler]

with q as (
select '  ' s1, '  ' s2 from dual union all
select ' 11 ', '  ' from dual union all
select ' 22 ', ' 3333 ' from dual union all
select 'Steven', 'Stephen' from dual
)
select
s1 || '. ' || s2 words,
length (s1) len1,
length (s2) len2,
lengthb (s1) lenb1,
lengthb (s2) lenb2,
utl_match.edit_distance (s1, s2) distance,
utl_match.edit_distance_similarity (s1, s2) etalon,
ceil (100 - utl_match.edit_distance (s1, s2) / greatest (lengthb (s1), lengthb (s2)) *100) formula
from q
/
WORDS                         LEN1       LEN2      LENB1      LENB2   DISTANCE     ETALON    FORMULA
----------------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
.                                7          7         14         14          4         72         72
11.                              9          7         16         14          6         63         63
22. 3333                         9         11         16         18          8         56         56
Steven. Stephen                  6          7          6          7          2         72         72

[/spoiler]
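
For the Java side of the original question, the formula above could be sketched as follows. This is only an illustration under assumptions: the database character set is taken to be UTF-8, and the distance passed in is taken to be the one utl_match.edit_distance reports (which, judging by the output above, counts bytes rather than characters). A distance computed in Java over chars would give different percentages for non-ASCII words. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class EditDistanceSimilarity {
    // Similarity in percent from a distance and the two byte lengths.
    // For non-negative integers, 100 - floor(100 * d / m) equals
    // ceil(100 - d / m * 100), so integer division avoids rounding surprises.
    static int fromByteLengths(int distance, int lenb1, int lenb2) {
        int maxLen = Math.max(lenb1, lenb2);
        if (maxLen == 0) {
            return 100; // assumption: two empty strings count as identical
        }
        return 100 - (100 * distance) / maxLen;
    }

    // Convenience overload: byte lengths are taken from the UTF-8 encoding
    // (assumption: this matches what lengthb returns in the database).
    static int fromWords(String s1, String s2, int distance) {
        int lenb1 = s1.getBytes(StandardCharsets.UTF_8).length;
        int lenb2 = s2.getBytes(StandardCharsets.UTF_8).length;
        return fromByteLengths(distance, lenb1, lenb2);
    }

    public static void main(String[] args) {
        // Rows from the output above: (distance, lenb1, lenb2).
        System.out.println(fromByteLengths(4, 14, 14));       // 72
        System.out.println(fromByteLengths(6, 16, 14));       // 63
        System.out.println(fromByteLengths(8, 16, 18));       // 56
        System.out.println(fromWords("Steven", "Stephen", 2)); // 72
    }
}
```

If the percentage should not depend on the encoding, the same method can be fed character lengths and a character-level distance instead, at the cost of no longer matching utl_match.edit_distance_similarity for multibyte text.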