[Toaster] [PATCH 0/5] Fix task buildstats gathering
Barros Pena, Belen
belen.barros.pena at intel.com
Mon Feb 22 02:25:11 PST 2016
On 21/02/2016 12:04, "Richard Purdie" <richard.purdie at linuxfoundation.org>
wrote:
>On Sat, 2016-02-20 at 18:51 +0000, Barros Pena, Belen wrote:
>> So I ran a build following the steps above, and had a look at the
>> data. The information now shows up, which is definitely an improvement :)
>>
>> * Time and Disk I/O show for all executed tasks, which is what's
>> supposed to happen.
>> * CPU usage is missing from some executed tasks. This used to happen
>> before as well, but we never actually worked out why.
>>
>> Regarding the numbers, I am not sure how useful the Disk I/O figure
>> is when expressed in bytes. Should we convert it to something else?
>> And then, our big problem is definitely the CPU usage, which shows
>> pretty crazy numbers. The highest one in my build was for linux-yocto
>> do_compile_kernelmodules, at a whopping 2455.15%.
>>
>> So, Richard Purdie pointed out to us that % over 100 are related to
>> task parallelism. And in fact, if I divide the CPU usage value of the
>> compile tasks by the value of PARALLEL_MAKE (36), I do get
>> percentages below 100 for all of them. In the example of linux-yocto
>> do_compile_kernelmodules, we get 68.20%.
>>
>> If I divide the CPU usage value of all the install tasks by the
>> value of PARALLEL_MAKEINST (36), the same happens: % below 100.
>>
>> However, we do get % over 100 for tasks that we have been told have
>> no parallelism at all. I see such values for unpack, patch,
>> configure, package and populate_sysroot tasks.
>
>FWIW, do_package does contain parallelism.
>For unpack/patch/configure/populate_sysroot, there is some parallelism
>too, in that the parent logging runs in parallel with the child
>execution. Where these tasks run quickly, I think this could account
>for the 'parallelism' we see there. Was it only for short running
>tasks?
I am not sure what you'd consider a 'short running' task, so here are a
few examples:
Recipe                   Task                  Time (secs)   CPU usage
db                       do_unpack             1.29          144.34%
linux-libc-headers       do_unpack             9.81          134.64%
linux-yocto              do_patch              16.40         141.12%
rpm-native               do_patch              4.84          117.13%
gcc-cross-i586           do_configure          3.25          125.24%
gcc-cross-initial-i586   do_configure          3.27          120.50%
glibc-locale             do_populate_sysroot   2.80          192.74%
flex                     do_populate_sysroot   1.98          157.74%
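For context, my understanding is that the percentage is derived from the
rusage figures in the per-task buildstats file: total CPU time (user +
system, for the task and its children) divided by the task's wall-clock
time. If that's the case, then for very short tasks even a small amount of
CPU time spent in the parent (logging and so on) is enough to push the
figure over 100%. A rough Python sketch of that calculation (the
'rusage'/'Child rusage' field names follow the buildstats output, but the
helper and the example values are made up, not the actual code):

    def cpu_usage_percent(stats):
        """stats: buildstats fields for one task, values in seconds."""
        # Assumed formula: (own + child user/system CPU time) / wall-clock time
        cpu_seconds = (stats.get("rusage ru_utime", 0.0)
                       + stats.get("rusage ru_stime", 0.0)
                       + stats.get("Child rusage ru_utime", 0.0)
                       + stats.get("Child rusage ru_stime", 0.0))
        return 100.0 * cpu_seconds / stats["Elapsed time"]

    # db do_unpack above ran for 1.29s of wall-clock time. If the task and
    # its children accumulated ~1.86s of CPU time between them, the figure
    # comes out well above 100% without any real build parallelism:
    print(cpu_usage_percent({"Elapsed time": 1.29,
                             "Child rusage ru_utime": 1.20,
                             "Child rusage ru_stime": 0.66}))  # ~144%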
>
>> So, the question is, why are those
>> happening? Because for tasks that we know have parallelism we might
>> be able to divide the value by the parallelism set, as I did for
>> compile and install tasks. But for the others, I genuinely have no
>> answer, other than that there must be some kind of bug in the data
>> collection.
>
>FWIW I'm very strongly against doing any kind of division by
>PARALLEL_MAKE or similar numbers as it makes the resulting number
>much less meaningful. The idea behind showing these numbers in the UI
>is to allow people to make decisions based on the numbers. If you do
>that division, I can't think of useful decisions/actions I could then
>make on it.
>
>For example, "2455%" above tells me that we have a parallelism factor
>of about 24 in the kernel build. From that I can conclude that whilst
>the kernel is good, it's not making full use of the hardware, which
>would have been a factor of 36 (although the system was likely also
>busy doing other things unless the task was run in isolation).
And what if we look at the other side of the data? What does a value of
7.32% mean, like the one we get for netbase do_compile?
>
>The only proposal I have is to simply display these as parallelism
>factors rather than percentage usages.
So, in the example of linux-yocto do_compile_kernelmodules, would we show
'24'? And for netbase do_compile? '0.07'?
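(Just to make the conversion explicit, the factor would simply be the
percentage divided by 100, i.e. the average number of cores the task kept
busy. A trivial sketch with the two example values from above:)

    def parallelism_factor(cpu_usage_percent):
        # Average number of cores kept busy over the task's wall-clock time
        return cpu_usage_percent / 100.0

    parallelism_factor(2455.15)  # ~24.6  linux-yocto do_compile_kernelmodules
    parallelism_factor(7.32)     # ~0.07  netbase do_compile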
Would it just be easier to show resource usage times, since they are a
standard measure that users might be more likely to recognise?
So we could show either:
* Child rusage ru_utime + Child rusage ru_stime
* or we split CPU usage into 2 columns, one for ru_utime and the other
  for ru_stime
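Something along the lines of the sketch below, using the Child rusage
fields we already collect (the helper itself is hypothetical, just to show
the two presentation options side by side):

    def cpu_time_columns(stats, split=False):
        utime = stats["Child rusage ru_utime"]
        stime = stats["Child rusage ru_stime"]
        if split:
            # Option 2: two columns, user and system CPU time in seconds
            return {"User CPU time": utime, "System CPU time": stime}
        # Option 1: a single column with the total CPU time in seconds
        return {"CPU time": utime + stime}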
> I don't think the data collected
>is wrong; it does require a certain amount of thinking to interpret it
>though.
>We're pulling this data directly from the kernel so it's unlikely
>we can get any "better" (easier to interpret?) data either.
>
>Cheers,
>
>Richard
>
>