[poky] [PATCH 1/1] bitbake: optimize file parsing speed

Fri Nov 19 05:14:13 PST 2010

Richard Purdie wrote:
> On Wed, 2010-11-17 at 12:10 +0800, Dongxiao Xu wrote:
>> build some data cache for generate_dependencies() on hand, and later
>> each time when parsing the bb file, we do not need to build them
>> again and again.
>> 
>> This optimization could get about 50% speed gain when parsing all
>> ~800 bb files.
>> 
>> Signed-off-by: Dongxiao Xu <dongxiao.xu at intel.com>
>> ---
>>  bitbake/lib/bb/cooker.py |    2 ++
>>  bitbake/lib/bb/data.py   |   11 ++++++-----
>>  2 files changed, 8 insertions(+), 5 deletions(-)
>> 
>> diff --git a/bitbake/lib/bb/cooker.py b/bitbake/lib/bb/cooker.py
>> index 33eb65e..05e6c16 100644
>> --- a/bitbake/lib/bb/cooker.py
>> +++ b/bitbake/lib/bb/cooker.py
>> @@ -76,6 +76,8 @@ class BBCooker:
>> 
>>          self.configuration.data = bb.data.init()
>> 
>> +        bb.data.init_data_cache(self.configuration.data)
>> +
>>          if not server:
>>              bb.data.setVar("BB_WORKERCONTEXT", "1", self.configuration.data)
>> 
>> diff --git a/bitbake/lib/bb/data.py b/bitbake/lib/bb/data.py
>> index fee10cc..a9e539f 100644
>> --- a/bitbake/lib/bb/data.py
>> +++ b/bitbake/lib/bb/data.py
>> @@ -296,17 +296,18 @@ def build_dependencies(key, keys, shelldeps, d):
>>      #bb.note("Variable %s references %s and calls %s" % (key, str(deps), str(execs)))
>>      #d.setVarFlag(key, "vardeps", deps)
>> 
>> -def generate_dependencies(d):
>> +def init_data_cache(d):
>> +    bb.data.keylist = set(key for key in d.keys() if not key.startswith("__"))
>> +    bb.data.shelldeps = set(key for key in bb.data.keylist if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
>> 
>> -    keys = set(key for key in d.keys() if not key.startswith("__"))
>> -    shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
>> +def generate_dependencies(d):
>> 
>>      deps = {}
>>      taskdeps = {}
>> 
>>      tasklist = bb.data.getVar('__BBTASKS', d) or []
>>      for task in tasklist:
>> -        deps[task] = build_dependencies(task, keys, shelldeps, d)
>> +        deps[task] = build_dependencies(task, bb.data.keylist, bb.data.shelldeps, d)
>> 
>>          newdeps = deps[task]
>>          seen = set()
>> @@ -316,7 +317,7 @@ def generate_dependencies(d):
>>              newdeps = set()
>>              for dep in nextdeps:
>>                  if dep not in deps:
>> -                    deps[dep] = build_dependencies(dep, keys, shelldeps, d)
>> +                    deps[dep] = build_dependencies(dep, bb.data.keylist, bb.data.shelldeps, d)
>>                  newdeps |=  deps[dep]
>>              newdeps -= seen
>>          taskdeps[task] = seen | newdeps
> 
> 
> I'm afraid this isn't going to be quite this simple although this
> does prove those lines of code are a big hotspot in parsing. 
> 
> Why? You're creating the key and export lists for the base
> configuration data whereas the original code creates these lists for
> the *total* parsed metadata. There will therefore be differences in
> the values held by the two caches :(.   
> 
> As an example, if you set:
> 
> FOO = "bar" in a .bb file, 'FOO' will not appear in your keywords
> cache. 
> 
> Cheers,
> 
> Richard

Richard,

Yes, you are right, thanks for pointing it out.
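
To make the FOO example concrete, here is a minimal sketch of the problem. It uses a toy dict wrapper as a stand-in for the datastore; ToyData and visible_keys are hypothetical names for illustration only, not the real bb.data API. A key set computed once from the base configuration misses a variable that a recipe sets afterwards:

```python
# Toy stand-in for the datastore; NOT the real bb.data API.
class ToyData:
    def __init__(self, parent=None):
        # A per-recipe datastore starts as a copy of its parent's variables.
        self._vars = dict(parent._vars) if parent else {}

    def setVar(self, key, value):
        self._vars[key] = value

    def keys(self):
        return self._vars.keys()


def visible_keys(d):
    # Same filter as the first hot line in generate_dependencies().
    return set(key for key in d.keys() if not key.startswith("__"))


base = ToyData()
base.setVar("DISTRO", "poky")
base.setVar("__BBTASKS", [])

# What the patch does: compute the key set once, before any recipe is parsed.
cached_keys = visible_keys(base)

# Parsing a recipe adds variables on top of the base configuration.
recipe = ToyData(parent=base)
recipe.setVar("FOO", "bar")

# What the original code does: compute the key set from the parsed recipe data.
fresh_keys = visible_keys(recipe)

print(sorted(cached_keys))
print(sorted(fresh_keys))
print(sorted(fresh_keys - cached_keys))  # keys the cache is missing
```

Any dependency on FOO would be invisible to build_dependencies() if it were handed cached_keys instead of fresh_keys.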

Now I am trying to solve this parse time issue in another way. 

We saw that the following two lines cost a lot of cycles.

    keys = set(key for key in d.keys() if not key.startswith("__"))
    shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
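
Both lines walk every key in d, so the work grows with the number of variables and is repeated for every recipe. A rough sketch of that scaling, assuming a toy datastore that only counts flag lookups (CountingData, build_key_sets and make_data are hypothetical names, not the bb.data API, and the variable counts are arbitrary):

```python
class CountingData:
    """Toy datastore that counts getVarFlag() lookups; not the real bb.data API."""

    def __init__(self, flags):
        self._flags = flags          # {variable name: {flag name: value}}
        self.flag_lookups = 0

    def keys(self):
        return self._flags.keys()

    def getVarFlag(self, key, flag):
        self.flag_lookups += 1
        return self._flags[key].get(flag)


def build_key_sets(d):
    # The two hot lines from generate_dependencies().
    keys = set(key for key in d.keys() if not key.startswith("__"))
    shelldeps = set(key for key in keys if d.getVarFlag(key, "export") and not d.getVarFlag(key, "unexport"))
    return keys, shelldeps


def make_data(n_plain):
    # One exported variable, one internal "__" variable, and n_plain ordinary ones.
    flags = {"PATH": {"export": "1"}, "__BBTASKS": {}}
    for i in range(n_plain):
        flags["VAR_%d" % i] = {}
    return CountingData(flags)


small = make_data(100)
large = make_data(1000)
build_key_sets(small)
build_key_sets(large)

# Lookups grow roughly in proportion to the number of variables.
print(small.flag_lookups, large.flag_lookups)
```

In this toy model, removing variables that are never needed shrinks the cost of every call, without changing which keys are seen for the variables that remain.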

After dumping d.keys(), I found that most of the items (>90%) are variables from distro_tracking_fields.inc, which are not actually used in the normal build process.

I checked the code: some functions in utility-tasks.bbclass (related to the upstream version check) need information from distro_tracking_fields.inc, which is why poky.conf includes this file. utility-tasks.bbclass is in turn inherited by base.bbclass, which is fundamental to poky.

I am thinking of moving the distro-checking code from utility-tasks.bbclass to distrodata.bbclass, so that such a big database (distro_tracking_fields.inc) is not pulled into the normal parsing process. Does that make sense?

I did some simple tests, and this could save about 20% of the file parsing time.

For the repeated parsing hot spot, I will continue to investigate whether there is anything else to optimize.

Thanks,
Dongxiao