I just realized that sstate setscene tasks are also launched through the same interface as normal tasks, i.e. via 'exec' through fork_off_task. So the majority of the slowness there comes from the overhead of 'exec' as well. Disk I/O contention also contributes a bit, but in my calculation its effect is less pronounced than the exec overhead.

Thanks,
Kevin