1

Topic: How is CPU load measured at a low level?

There are utilities, such as the system monitor, that show how heavily each processor core is loaded. I would like to understand how they work at a low level. Let me clarify what I am after. I have a computation that runs in memory. I need it to finish as fast as possible, which means using the available resources as fully as possible. Right now each of the 16 cores is loaded at about 95 percent. The data for the computation takes up around 100 [...], I need to churn through all of it, and the access pattern is random. In theory I could hit the speed limit of the system bus and simply fail to fetch data from memory fast enough. So: if the processor is stalled because the system bus cannot keep up, is it true that the reported processor load will be well below 100 percent? And if the processor can be loaded to almost 100 percent, then most likely it is the bottleneck, and I can still gain a lot through low-level optimization of the code. To answer this question I would like to understand how this load percentage is calculated.

2

Re: How is CPU load measured at a low level?

Hello, elmal, you wrote:
E> There are utilities, such as the system monitor, that show how heavily each processor core is loaded. I would like to understand how they work at a low level.
These simple utilities most likely show the state of the OS scheduler rather than information from the processor. For example, if the scheduler ran 100 times during the last second and found a non-empty queue of tasks/processes 78 of those times, then the load is approximately 78% :) In short, this metric is not suitable at all for finding bottlenecks in a program. Random disk seeks can still be made out somehow (they trigger the scheduler and a switch of the active process), but random access to memory definitely cannot be seen this way.
E> In theory I could hit the speed limit of the system bus and simply fail to fetch data from memory fast enough. So: if the processor is stalled because the system bus cannot keep up, is it true that the reported processor load will be well below 100 percent?
No. 100% means that the processor is doing something. If it is waiting for data from the bus, that of course also goes into the statistics as activity: it is actively waiting! :)
E> And if the processor can be loaded to almost 100 percent, then most likely it is the bottleneck, and I can still gain a lot through low-level optimization of the code. To answer this question I would like to understand how this load percentage is calculated.
There is a wonderful class of programs: profilers. And many processors have a wonderful built-in unit: CPU performance counters. And most importantly, many profilers know how to work with it :) In short, the processor can give you detailed statistics on how much time it spent actually computing, how much time it spent waiting for data to be loaded from cache or memory, how often it mispredicted branches, and in which places of the program all of this happened. You can get this information in a convenient and readable form by running the program under a profiler, instead of using indirect methods such as the "system performance monitor".
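To make the "percentage of busy samples" idea concrete: on Linux, a monitor can derive the load figure from the tick counters in /proc/stat, taking the difference between two snapshots. The sketch below assumes the standard field order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration, and this is not necessarily what any particular system monitor does.

```python
import time


def parse_cpu_line(stat_text):
    """Return (busy_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # First 8 counters only: guest time is already included in user/nice.
            values = [int(v) for v in fields[1:9]]
            total = sum(values)
            idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def utilization_percent(before, after):
    """CPU load between two /proc/stat snapshots, as a percentage."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed


def sample_utilization(interval_seconds=1.0):
    """Read /proc/stat twice (Linux only) and report the load in between."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_seconds)
    with open("/proc/stat") as f:
        after = f.read()
    return utilization_percent(before, after)
```

Note that a core stalled on a memory load is still counted as busy here: the kernel only sees that a task was running on the core, not what the pipeline was doing.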

3

Re: How is CPU load measured at a low level?

Hello, elmal, you wrote:
E> There are utilities, such as the system monitor, that show how heavily each processor core is loaded. I would like to understand how they work at a low level.
https://github.com/opcm/pcm
E> Let me clarify what I am after. I have a computation that runs in memory. I need it to finish as fast as possible, which means using the available resources as fully as possible.
Then count how much useful work each thread has done and compare that across the different implementation variants. The ratio of the number of useful operations to the theoretical maximum gives an efficiency estimate.
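The efficiency estimate suggested here can be written down in a few lines. This is only a sketch: the function names, the variant names, and every number in the usage are hypothetical, and the theoretical peak has to come from your own hardware figures.

```python
def efficiency(useful_ops, elapsed_seconds, peak_ops_per_second):
    """Fraction of the theoretical maximum throughput that was actually achieved."""
    if elapsed_seconds <= 0 or peak_ops_per_second <= 0:
        raise ValueError("elapsed time and peak rate must be positive")
    achieved_ops_per_second = useful_ops / elapsed_seconds
    return achieved_ops_per_second / peak_ops_per_second


def compare_variants(measurements, peak_ops_per_second):
    """measurements maps a variant name to (useful_ops, elapsed_seconds)."""
    return {
        name: efficiency(ops, seconds, peak_ops_per_second)
        for name, (ops, seconds) in measurements.items()
    }


# Hypothetical numbers, for illustration only.
results = compare_variants(
    {"baseline": (2_000_000, 4.0), "blocked": (2_000_000, 2.5)},
    peak_ops_per_second=1_000_000.0,
)
```

With these made-up inputs, "baseline" reaches 0.5 of the peak and "blocked" reaches 0.8, which tells you how much headroom is left regardless of what the system monitor shows.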

4

Re: How is CPU load measured at a low level?

Hello, watchmaker, you wrote:
W> These simple utilities most likely show the state of the OS scheduler rather than information from the processor. For example, if the scheduler ran 100 times during the last second and found a non-empty queue of tasks/processes 78 of those times, then the load is approximately 78%
There is simply an interesting correlation. When I am actively optimizing the code with a profiler, to the point where it is no longer clear what to optimize next and almost any change produces a slowdown instead of a speedup, the system monitor shows after all those manipulations that all processor cores are loaded at (almost) 100 percent. When no optimization has been done at all, the system monitor shows that the processor is barely loaded. So I would like to understand what exactly is being measured there.
W> In short, the processor can give you detailed statistics on how much time it spent actually computing, how much time it spent waiting for data to be loaded from cache or memory, how often it mispredicted branches, and in which places of the program all of this happened. You can get this information in a convenient and readable form by running the program under a profiler, instead of using indirect methods such as the "system performance monitor".
I have been hunting bottlenecks with a profiler for a long time already. I am on the JVM, by the way. The profilers I use (jvisualvm and async-profiler) show bottlenecks in the code only with a certain error, and I have now reached the point where there are no obvious hot spots left. So now I would like to understand how heavily the memory bus is utilized, whether I am far from the limit or not, and by what factor things could theoretically be sped up if everything were written ideally. Can anyone recommend a tool for collecting these statistics?
On Linux, preferably a command-line utility that I can give a process ID and that tells me how much of the time the processor really sits idle waiting for data from memory.

5

Re: How is CPU load measured at a low level?

Hello, elmal, you wrote:
E> Can anyone recommend a tool for collecting these statistics? On Linux, preferably a command-line utility that I can give a process ID and that tells me how much of the time the processor really sits idle waiting for data from memory.
It seems perf can show this out of the box:
    $ perf stat --pid=...
    ...
           2267.013044      task-clock (msec)         #    0.999 CPUs utilized
                    38      context-switches          #    0.017 K/sec
                     1      cpu-migrations            #    0.000 K/sec
               168,120      page-faults               #    0.074 M/sec
         6,736,989,754      cycles                    #    2.972 GHz                      (83.26%)
         2,649,490,951      stalled-cycles-frontend   #   39.33% frontend cycles idle     (83.25%)
         1,161,767,865      stalled-cycles-backend    #   17.24% backend cycles idle      (66.82%)
        13,341,391,976      instructions              #    1.98  insns per cycle
                                                      #    0.20  stalled cycles per insn  (83.42%)
         2,613,352,023      branches                  # 1152.773 M/sec                    (83.42%)
             1,043,243      branch-misses             #    0.04% of all branches          (83.30%)
And then there are other counters you can enable and look at.
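For reference, the figures in the right-hand column of this output can be recomputed from the raw counters. The sketch below does that for the numbers shown above; the function name is made up, and the "stalled cycles per insn" line assumes perf divides the larger of the two stall counters by the instruction count, which is consistent with the 0.20 printed here.

```python
def derived_metrics(counters):
    """Recompute the ratios perf stat prints next to the raw counters."""
    cycles = counters["cycles"]
    instructions = counters["instructions"]
    frontend = counters["stalled-cycles-frontend"]
    backend = counters["stalled-cycles-backend"]
    return {
        "insns_per_cycle": instructions / cycles,
        "frontend_idle_pct": 100.0 * frontend / cycles,
        "backend_idle_pct": 100.0 * backend / cycles,
        # Assumption: the larger of the two stall counters is used here.
        "stalled_cycles_per_insn": max(frontend, backend) / instructions,
        "branch_miss_pct": 100.0 * counters["branch-misses"] / counters["branches"],
    }


# Raw counter values copied from the perf stat output in this post.
sample = {
    "cycles": 6_736_989_754,
    "stalled-cycles-frontend": 2_649_490_951,
    "stalled-cycles-backend": 1_161_767_865,
    "instructions": 13_341_391_976,
    "branches": 2_613_352_023,
    "branch-misses": 1_043_243,
}
metrics = derived_metrics(sample)
```

Rounded to two decimals, these reproduce the 1.98, 39.33, 17.24, 0.20 and 0.04 from the listing. The percentages in parentheses in the listing are not ratios of this kind; they show the share of the run during which each counter was actually being measured.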

6

Re: How is CPU load measured at a low level?

E> Let me clarify what I am after. I have a computation that runs in memory. I need it to finish as fast as possible, which means using the available resources as fully as possible. Right now each of the 16 cores is loaded at about 95 percent.
Most likely this is inter-thread synchronization. If those 5% really are critical, try to reduce the duration and the number of contended accesses to shared variables.
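One common way to cut down contended accesses is to let each thread accumulate into a private variable and touch the shared one only once at the end. The sketch below shows the pattern in Python purely as an illustration; the function name is hypothetical, and in CPython the interpreter lock means this demonstrates the structure rather than a real speedup. The same shape carries over to JVM threads.

```python
import threading


def parallel_sum(chunks):
    """Sum several chunks in parallel, taking the shared lock once per thread."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        local = 0
        for value in chunk:
            local += value      # hot loop touches only thread-private state
        with lock:
            total += local      # shared variable is updated once per thread

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total
```

The contended alternative would take the lock inside the loop for every element, which is exactly the kind of access pattern that keeps cores from reaching full load.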

7

Re: How is CPU load measured at a low level?

8

Re: How is CPU load measured at a low level?

Hello, elmal, you wrote:
E> stalled-cycles is not shown. Or is the processor the wrong kind? The processor is not exactly ancient, it is an i7-4770K.
Or, for example, the perf version is archaic. After all, the addresses and meanings of the counters can change with literally every chip. perf may simply not know where to get the values it needs for its calculations or how to interpret them. In that case (apart from updating perf, of course) you can simply pass it the required parameters (taken from the processor documentation) directly in the --event= option.
E> So if stalled-cycles were shown, do I understand correctly that it is almost exactly what I need?
Well, more or less. It is roughly a summary of how long the processor waited for data. Maybe it was waiting for data to arrive from memory, or maybe there was a dependency on the result of some other instruction. Detailed statistics on those events are in other counters. Also, a superscalar processor can be partially idle: some units may be waiting while others are working. So the single stalled-cycles number is not enough to interpret this situation correctly. Let me repeat: for speeding up the program it is still more useful to look not at this aggregate statistic but at values such as the number of cache misses attributed to instructions, which lets you find the place in the source code that is slow (although with the JVM there is of course probably a certain difficulty in mapping instructions to bytecode; there is probably a known way to do it, but I have never worked with those tools).
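As a sketch of asking perf for specific counters through --event=, the helper below only assembles the command line for an already running process; it does not run anything. The function name, the PID, and the duration are placeholders, and cache-references / cache-misses are used as example generic event names. Raw event codes from the processor documentation would go into the same list, but none are invented here.

```python
def build_perf_stat_command(pid, events, seconds):
    """Build a perf stat invocation that watches one process for a fixed time."""
    if not events:
        raise ValueError("at least one event name is required")
    return [
        "perf", "stat",
        "--event=" + ",".join(events),   # comma-separated list of events
        "--pid=" + str(pid),             # attach to an existing process
        "--", "sleep", str(seconds),     # measure until sleep exits
    ]


# Hypothetical PID and duration, for illustration only.
command = build_perf_stat_command(1234, ["cache-references", "cache-misses"], 10)
```

The resulting list can be handed to `subprocess.run(command)` on a machine where perf is installed and the user has permission to read the counters.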