[linux-yocto] [PATCH] x86_64: expand kernel stack to 16K

Bruce Ashfield bruce.ashfield at windriver.com
Thu Jul 17 12:14:48 PDT 2014


On 14-07-17 12:34 AM, zhe.he at windriver.com wrote:
> From: Minchan Kim <minchan at kernel.org>

I've grabbed this and staged it in the 3.14 yocto kernel tree.

Bruce

>
> commit 6538b8ea886e472f4431db8ca1d60478f838d14b upstream
>
> While I play inhouse patches with much memory pressure on qemu-kvm,
> 3.14 kernel was randomly crashed. The reason was kernel stack overflow.
>
> When I investigated the problem, the callstack was a little bit deeper
> by involve with reclaim functions but not direct reclaim path.
>
> I tried to diet stack size of some functions related with alloc/reclaim
> so did a hundred of byte but overflow was't disappeard so that I encounter
> overflow by another deeper callstack on reclaim/allocator path.
>
> Of course, we might sweep every sites we have found for reducing
> stack usage but I'm not sure how long it saves the world(surely,
> lots of developer start to add nice features which will use stack
> agains) and if we consider another more complex feature in I/O layer
> and/or reclaim path, it might be better to increase stack size(
> meanwhile, stack usage on 64bit machine was doubled compared to 32bit
> while it have sticked to 8K. Hmm, it's not a fair to me and arm64
> already expaned to 16K. )
>
> So, my stupid idea is just let's expand stack size and keep an eye
> toward stack consumption on each kernel functions via stacktrace of ftrace.
> For example, we can have a bar like that each funcion shouldn't exceed 200K
> and emit the warning when some function consumes more in runtime.
> Of course, it could make false positive but at least, it could make a
> chance to think over it.
>
> I guess this topic was discussed several time so there might be
> strong reason not to increase kernel stack size on x86_64, for me not
> knowing so Ccing x86_64 maintainers, other MM guys and virtio
> maintainers.
>
> Here's an example call trace using up the kernel stack:
>
>           Depth    Size   Location    (51 entries)
>           -----    ----   --------
>     0)     7696      16   lookup_address
>     1)     7680      16   _lookup_address_cpa.isra.3
>     2)     7664      24   __change_page_attr_set_clr
>     3)     7640     392   kernel_map_pages
>     4)     7248     256   get_page_from_freelist
>     5)     6992     352   __alloc_pages_nodemask
>     6)     6640       8   alloc_pages_current
>     7)     6632     168   new_slab
>     8)     6464       8   __slab_alloc
>     9)     6456      80   __kmalloc
>    10)     6376     376   vring_add_indirect
>    11)     6000     144   virtqueue_add_sgs
>    12)     5856     288   __virtblk_add_req
>    13)     5568      96   virtio_queue_rq
>    14)     5472     128   __blk_mq_run_hw_queue
>    15)     5344      16   blk_mq_run_hw_queue
>    16)     5328      96   blk_mq_insert_requests
>    17)     5232     112   blk_mq_flush_plug_list
>    18)     5120     112   blk_flush_plug_list
>    19)     5008      64   io_schedule_timeout
>    20)     4944     128   mempool_alloc
>    21)     4816      96   bio_alloc_bioset
>    22)     4720      48   get_swap_bio
>    23)     4672     160   __swap_writepage
>    24)     4512      32   swap_writepage
>    25)     4480     320   shrink_page_list
>    26)     4160     208   shrink_inactive_list
>    27)     3952     304   shrink_lruvec
>    28)     3648      80   shrink_zone
>    29)     3568     128   do_try_to_free_pages
>    30)     3440     208   try_to_free_pages
>    31)     3232     352   __alloc_pages_nodemask
>    32)     2880       8   alloc_pages_current
>    33)     2872     200   __page_cache_alloc
>    34)     2672      80   find_or_create_page
>    35)     2592      80   ext4_mb_load_buddy
>    36)     2512     176   ext4_mb_regular_allocator
>    37)     2336     128   ext4_mb_new_blocks
>    38)     2208     256   ext4_ext_map_blocks
>    39)     1952     160   ext4_map_blocks
>    40)     1792     384   ext4_writepages
>    41)     1408      16   do_writepages
>    42)     1392      96   __writeback_single_inode
>    43)     1296     176   writeback_sb_inodes
>    44)     1120      80   __writeback_inodes_wb
>    45)     1040     160   wb_writeback
>    46)      880     208   bdi_writeback_workfn
>    47)      672     144   process_one_work
>    48)      528     112   worker_thread
>    49)      416     240   kthread
>    50)      176     176   ret_from_fork
>
> [ Note: the problem is exacerbated by certain gcc versions that seem to
>    generate much bigger stack frames due to apparently bad coalescing of
>    temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
>    35% more stack on the virtio path than 4.8.2 does, for example.
>
>    Minchan not only uses such a bad gcc version (4.6.3 in his case), but
>    some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
>    what causes that kernel_map_pages() frame, for example). But we're
>    clearly getting too close.
>
>    The VM code also seems to have excessive stack frames partly for the
>    same compiler reason, triggered by excessive inlining and lots of
>    function arguments.
>
>    We need to improve on our stack use, but in the meantime let's do this
>    simple stack increase too.  Unlike most earlier reports, there is
>    nothing simple that stands out as being really horribly wrong here,
>    apart from the fact that the stack frames are just bigger than they
>    should need to be.        - Linus ]
>
> Signed-off-by: Minchan Kim <minchan at kernel.org>
> Cc: Peter Anvin <hpa at zytor.com>
> Cc: Dave Chinner <david at fromorbit.com>
> Cc: Dave Jones <davej at redhat.com>
> Cc: Jens Axboe <axboe at kernel.dk>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Ingo Molnar <mingo at kernel.org>
> Cc: Peter Zijlstra <a.p.zijlstra at chello.nl>
> Cc: Mel Gorman <mgorman at suse.de>
> Cc: Rik van Riel <riel at redhat.com>
> Cc: Johannes Weiner <hannes at cmpxchg.org>
> Cc: Hugh Dickins <hughd at google.com>
> Cc: Rusty Russell <rusty at rustcorp.com.au>
> Cc: Michael S Tsirkin <mst at redhat.com>
> Cc: Dave Hansen <dave.hansen at intel.com>
> Cc: Steven Rostedt <rostedt at goodmis.org>
> Cc: PJ Waskiewicz <pjwaskiewicz at gmail.com>
> Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
> Signed-off-by: He Zhe <zhe.he at windriver.com>
> ---
>   arch/x86/include/asm/page_64_types.h |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
> index 8de6d9c..6782051 100644
> --- a/arch/x86/include/asm/page_64_types.h
> +++ b/arch/x86/include/asm/page_64_types.h
> @@ -1,7 +1,7 @@
>   #ifndef _ASM_X86_PAGE_64_DEFS_H
>   #define _ASM_X86_PAGE_64_DEFS_H
>
> -#define THREAD_SIZE_ORDER	1
> +#define THREAD_SIZE_ORDER	2
>   #define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)
>   #define CURRENT_MASK (~(THREAD_SIZE - 1))
>
>



More information about the linux-yocto mailing list