#### Re: About one method data compressions

Barlone;

Yes, the length of each replaced piece will need to be inscribed in a resultant file. That lowers a compression level a little.

You are not logged in. Please login or register.

Barlone;

Yes, the length of each replaced piece will need to be inscribed in a resultant file. That lowers a compression level a little.

Well you though tried to do any estimations? Here what probability to find at least one conterminous bit sequence of length M in two sequences of length N, one of which casual? Approximately (N-M) / (2^M). It will not be compressed at you anything.

Yes, from the first attempt it will hardly be compressed. Therefore in title the word "" also is carried out.

The formula which you wrote are not "at least one conterminous sequence", and "SPECIFIC conterminous sequence (at least once)".

FXS wrote:

1. We find bit sequence of the minimum length M, which never meets in D.Nazovyom its "flag" (.

2. In a cycle we sort out values seed (s), used for initialization : (s).

3. By means of (s) it is generated casual sequence (joint venture) of the same length N.

4. We compare and the joint venture. All coinciding - in and the joint venture - bit sequences, length of more M is replaced (in !) on F.Pri's flag it we track, that "on joints" an interposed flag and adjacent initial bits did not arise "stray flags".

...

I here that thought - the text as one of types of the compressed data, not so well approaches for such compression:

1. tries sequence of different numbers, i.e. repetitions are rare enough. And we have repetitions of letters and syllables.

2. produces values from all given range, and in the text the part is used only.

such algorithm is better white noise it will be compressed.

Dima T;

So, therefore and http://www.sql.ru/forum/1282108/ob-odno … y-entropii

Barlone wrote:

Well you though any estimations tried to do? Here what probability to find at least one conterminous bit sequence of length M in two sequences of length N, one of which casual? Approximately (N-M) / (2^M). It will not be compressed at you anything.

+1

A little in another way I argue: should sequence of all possible variants of combinations on M byte or M*8 bit, I.e. only 2 ^ (M*8). Thus for initialization value in the size M*8 bit is required. Plus it is necessary to specify the size how many byte to read. As a result we have "archive" in the size of more original.

I understand that sequences more long M can get, but it is a rare occurence and he can be neglected. Therefore I consider that "archive" almost always will be in the size of more original.

Dima T;

I can not understand that you wrote. What for generally to skip - in a reasoning and presentation - from bits on bytes and it is reverse?

" should "... (Pseudo-) casual bit sequence of the specified length. Everything, is more than anything from it it is not required.

(Yes, seed it will be necessary to write down too in created archive, I know.)

FXS wrote:

Dima T;

I can not understand that you wrote. What for generally to skip - in a reasoning and presentation - from bits on bytes and it is reverse?

There is no need, thought flight so went, but it does not change an essence

FXS wrote:

" should "... (Pseudo-) casual bit sequence of the specified length. Everything, is more than anything from it it is not required.

I to that that for initialization which can produce the necessary sequence in M-bit, is required not less than M-bit. It at the best.

Once again, short, in bits:

We should M find bit. That precisely was sequence should contain 2^M combinations, i.e. all combinations from M bit.

Hence to save the address of the necessary combination in sequence (or to initialize ) to us the minimum of M of bits is required.

Output: it will be compressed nothing.

Dima T;

You state, what the offered method does not allow to compress never an input line on one while we do not sort out all set of bit lines of length of M?

I am ready to accept this statement as a hypothesis, but the proof it yet I do not see... (Thus that the offered method of compression too is not proved, that is ...)

Let's begin that among "all set of bit lines of length of M" there is simply line , and on what step it gets to us - is unpredictable.

Dima T;

And, , I here was precisely tangled in designations of lengths (N and)... But also you, apparently, too: that you means, for example,

Dima T wrote:

We should M find bit

?

We compare two lines and if we find local coincidence it has a length (which I designated).

At lines

"11111111111111111111111111111111" and

"00011111000000000000000000000000" - is with =5 since 4th position.

(Oh, something I absolutely forgot initial designations... Far already left.)

FXS wrote:

Yes, from the first attempt it will hardly be compressed. Therefore in title the word "" also is carried out.

well. If under "" to understand "for trillion years something we find" then it is finite.

That I wrote today *, results from this (described in this topic) problems as follows.

1. We need to compress a line S (lengths N).

2. We take a pseudorandom line Z (the same length N) and we check, whether ** it helps ** to us to make it (item 1).

3. In this topic I described a specific method of such compression.

4. And today (in *) I speak: let's recode simply a line S in a line Z at line R. And ** we try ** to compress the received line R ** any ** the archiver.

5. If it turned out - ! If is not present - we are returned in item 2.

_____________________

* http://www.sql.ru/forum/1282108/ob-odno … y-entropii

FXS wrote:

Dima T;

And, , I here was precisely tangled in designations of lengths (N and)... But also you, apparently, too:

Is such, but I understand you

FXS wrote:

that you means, for example,

it is passed...

?

We compare two lines and if we find local coincidence it has a length (which I designated).

At lines

"11111111111111111111111111111111" and

"00011111000000000000000000000000" - is with =5 since 4th position.

You consider a rare special case when there is a coincidence. I consider a variant that in sequence there are all combinations for =5. Since if they not everything record will be even more ineffective since it is necessary to use shorter substrings.

Give hardly we simplify the task: there is a line of M of bits and any sequence. Without a difference whence it undertakes (, an array etc.).

That the casual line in the size of M was present at sequence, it is necessary sequence in the size 2^ lines since all it is possible 2^ combinations on M of bits.

Hence our sequence should contain all possible combinations. For instructions of the necessary combination it is necessary a variable in the size of M of bits.

Or in other words: to identify the necessary combination from M of bits it is necessary to store M of bits.

And that that you think that in sequence there will be a wonderful place where after required line in M of bits there will be a line which you will search on a following step then you strongly you are mistaken since probability of such event almost zero.

Output: though how many searches do, but as a result "archive" will be more source file. Except for some special cases.

Dima T wrote:

Give hardly we simplify the task

it to me it seems, or you really (while) consider only one (auxiliary) point - "a flag" choice (for a specific input line)?

Simply I re-read (at last) own designations in the task. The m is a length of a flag.

Barlone wrote:

it is passed...

Well. If under "" to understand "for trillion years something we find" then it is finite.

"trillion", poorly, .

Only 10 in a level of 12 zero.

FXS wrote:

it is passed...

It seems to me, or you really (while) consider only one (auxiliary) point - "a flag" choice (for a specific input line)?

Simply I re-read (at last) own designations in the task. The m is a length of a flag.

It only seems. At you on each iteration any one specific line (or a flag) specific length of M of bits.

On my reasonings on each iteration it will be written down not less than M of bits. Also does not influence that on different iterations at M different values.

Algorithm not the worker. It will not press.

Dima T wrote:

on each iteration any one specific... The flag)

is not present, the flag is selected once - for an initial line. If it: "101010111001110101011111100111001111111111", a flag "000" and a point.

FXS wrote:

it is passed...

No, the flag is selected once - for an initial line. If it: "101010111001110101011111100111001111111111", a flag "000" and a point.

I not absolutely understood a flag essence. It for what?

I explain as understood: is and the data, i.e. that that is compressed.

As a result the data it is necessary to lead to such type:

` seed [sub] 0 [/sub], len [sub] 0 [/sub], seed [sub] 1 [/sub], len [sub] 2 [/sub]... seed [sub] K [/sub], len [sub] K [/sub] `

Where

seed [sub] i [/sub] initialization

len [sub] i [/sub] how many bit to read from

As a result we search for the result having the least length. Correctly?

Understood about a flag. You a little some other way went.

We have the Data. We invent a flag that was not in the data.

We select: substring given (M of bits) and seed so that (seed) produced these M of bits.

We replace substring on {a flag, seed, the M} in that case if they is less M, i.e. is obtained the shaken Data.

Correctly?

No.

If an initial line -

` "10101011100111010101 [b] 1111 [/b] 1001111001 [b] 1111111111 [/b]" `

, That a flag "000", as well as is described in item 1

If the casual line, suddenly, turned out

` "11111111111111111111111111111111111111111111" `

, That oblate will be:

` "10101011100111010101 [b] 000 [/b] 100111001 [b] 000 [/b]" `

(Separately it will be necessary to transfer plus of length of two "" segments.

Dima T wrote:

we Replace substring on {a flag, seed, the M} in that case if they is less M, i.e. is obtained the shaken Data.

If so there will be no such combinations since for storage seed it is required places as much how many initial substring occupies, I above proved it. Except for rare special cases.

**Random topics**