1

Topic: grep a large log by date range

From the "I made this" series.

Sometimes I need to extract a time range from the middle of a log, or the last N minutes of the current log. The complicating factor is that my logs are very large (hundreds of GB). Possible solutions:

- Various tricks with awk/perl and the like. The main drawback of this class of solutions is that the whole log has to be read, which takes a very long time at large volumes.
- The dategrep utility (from dateutils), which is in the standard repositories and works considerably faster than awk/perl, but still reads the entire file.
- The Perl utility dategrep, or its C rewrite c-dategrep. These work great (they use binary search internally) as long as only a small amount of data falls within the date range. Outputting large ranges is rather slow with them (they use plain file I/O with a hard-coded 8 KB block size).

At some point I got tired of all this and finished building my own reinvented wheel called timegrep; perhaps someone else will find it useful too.

… on Yandex.Disk
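To make the binary-search approach concrete: on a sorted, seekable file you can bisect over byte offsets to find the first line at or after a timestamp, do the same for the end of the range, and then copy the bytes in between in large chunks instead of parsing each line. The sketch below is not timegrep's code; it assumes every line starts with a fixed-width `%Y-%m-%d %H:%M:%S` timestamp, and the names `find_offset`, `grep_range` and the 1 MiB chunk size are hypothetical choices for illustration.

```python
import io
import os
import sys
from datetime import datetime, timedelta

# Assumption: every line begins with a fixed-width timestamp in this format.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LEN = 19


def parse_ts(line: bytes) -> datetime:
    """Parse the timestamp at the start of a log line (assumed layout)."""
    return datetime.strptime(line[:TS_LEN].decode("ascii"), TS_FORMAT)


def line_at(f, pos: int):
    """Return (offset, line) of the first line starting at or after pos."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()  # finish the line containing pos - 1
    start = f.tell()
    return start, f.readline()  # b"" means end of file


def find_offset(f, target: datetime) -> int:
    """Byte offset of the first line with timestamp >= target (or EOF)."""
    f.seek(0, os.SEEK_END)
    lo, hi = 0, f.tell()
    while lo < hi:  # binary search over byte positions
        mid = (lo + hi) // 2
        _, line = line_at(f, mid)
        if line and parse_ts(line) < target:
            lo = mid + 1
        else:
            hi = mid
    start, _ = line_at(f, lo)
    return start


def copy_range(f, out, start: int, stop: int, chunk: int = 1 << 20) -> None:
    """Copy bytes [start, stop) using large reads, without parsing lines."""
    f.seek(start)
    remaining = stop - start
    while remaining > 0:
        buf = f.read(min(chunk, remaining))
        if not buf:
            break
        out.write(buf)
        remaining -= len(buf)


def grep_range(f, out, start_ts: datetime, end_ts: datetime) -> None:
    """Write every line with start_ts <= timestamp < end_ts; f must be sorted."""
    lo = find_offset(f, start_ts)
    hi = find_offset(f, end_ts)
    copy_range(f, out, lo, max(lo, hi))


if __name__ == "__main__":
    log = io.BytesIO(
        b"2024-01-01 00:00:00 a\n"
        b"2024-01-01 00:05:00 b\n"
        b"2024-01-01 00:10:00 c\n"
    )
    now = datetime(2024, 1, 1, 0, 10, 0)
    # "Last N minutes" with N = 5: start at now - 5 min, end just after now.
    grep_range(log, sys.stdout.buffer,
               now - timedelta(minutes=5), now + timedelta(seconds=1))
```

Only the two searches touch individual lines (O(log size) reads each); the bulk of a large range is moved by `copy_range`, which is where a bigger block size than 8 KB pays off. This requires a seekable file, so input piped through stdin would need a linear scan instead.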

2

Re: grep a large log by date range

Hello, Anton Batenev, you wrote:
AB> From the "I made this" series.
1. Have you tried splitting the logs?
2. In your example, where the log is read from stdin, is there any gain over the alternatives as well? Where does it come from?

3

Re: grep a large log by date range

Hello, Anton Batenev, you wrote:
AB> Sometimes I need to extract a time range from the middle of a log, or the last N minutes of the current log. The complicating factor is that my logs are very large (hundreds of GB). Possible solutions:
Have you considered Graylog?

4

Re: grep a large log by date range

Hello, wildwind, you wrote:
w> AB> From the "I made this" series.
w> 1. Have you tried splitting the logs?
They are already split into ranges (days, hours), but they are still large.
w> 2. In your example, where the log is read from stdin, is there any gain over the alternatives as well? Where does it come from?
There are subtleties in comparing with the alternatives. Below I compare against dategrep from dateutils, as the main "competitor".
- dategrep converts a string to a date faster: its authors went to the trouble of writing their own strptime implementation for exactly this reason.
- In my implementation the search stops as soon as it reaches a date greater than or equal to the end date, whereas dategrep keeps searching to the end of the file. So for a fair comparison the dates in the file must be sorted (as, in theory, they should be in a log). If the dates are not sorted, my implementation produces an incomplete result.
Beyond that, it depends on where in the file the requested range is and how large it is. If it is near the beginning of the file and/or small, my implementation can win thanks to the early stop. Otherwise dategrep will most likely win thanks to its fast strptime implementation, since that is where most of the program's time is spent.
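Both effects described above, the early stop on sorted input and the cost of turning each line into a date, can be seen in a streaming filter like the one below. It is a minimal sketch assuming fixed-width `YYYY-MM-DD HH:MM:SS` timestamps at the start of each line; `fast_ts` and `filter_stream` are hypothetical names, and slicing fixed byte positions is only an illustration of a specialized parser in place of a general strptime, not the code of dategrep or timegrep.

```python
from typing import Iterable, Iterator, Tuple

Stamp = Tuple[int, int, int, int, int, int]


def fast_ts(line: bytes) -> Stamp:
    """Parse b"YYYY-MM-DD HH:MM:SS..." by fixed slices (assumed layout).

    No format string is interpreted at run time, which is the kind of
    shortcut a hand-written strptime replacement can take.
    """
    return (int(line[0:4]), int(line[5:7]), int(line[8:10]),
            int(line[11:13]), int(line[14:16]), int(line[17:19]))


def filter_stream(lines: Iterable[bytes], start: Stamp, end: Stamp) -> Iterator[bytes]:
    """Yield lines with start <= timestamp < end from a sorted stream.

    Stops at the first timestamp >= end, so the rest of the input is never
    read. On unsorted input this early stop gives an incomplete result.
    """
    for line in lines:
        ts = fast_ts(line)
        if ts >= end:
            break  # early stop
        if ts >= start:
            yield line
```

Reading from stdin would look like `for line in filter_stream(sys.stdin.buffer, start, end): sys.stdout.buffer.write(line)`. When the range sits near the start of the input the `break` saves most of the work; when it sits near the end, almost every line still goes through `fast_ts`, so parsing speed dominates, which matches the trade-off described in the post.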

5

Re: grep a large log by date range

Hello, GarryIV, you wrote:
GIV> AB> Sometimes I need to extract a time range from the middle of a log, or the last N minutes of the current log. The complicating factor is that my logs are very large (hundreds of GB). Possible solutions:
GIV> Have you considered Graylog?
I have looked at various options: the ELK Stack, Graylog, Splunk. If you feed them raw logs without any preprocessing, then at large volumes (we are talking about hundreds of TB a month) the cost of the solution (an index needs at least 3x the disk space, and Splunk's pricing alone could ruin you) and of supporting it (everything has to be configured in advance, etc.) turns out unreasonably high compared with the benefit (99% of the information will never be requested). If you feed them already processed logs, you can lose the part of the information that is occasionally needed in various investigations. In other words, one does not preclude the other, but the primary source in the form of raw logs has to exist, and that is what I need most often.

6

Re: grep a large log by date range

Hello, Anton Batenev, you wrote:
AB> 99% of the information will never be requested
AB> the primary source in the form of raw logs has to exist, and that is what I need most often.
There is clearly room for optimization here.