[linux-yocto] [PATCH 1/1] xilinx-zyqn: Move disable_nonboot_cpus() in front of local_irq_disable()
Bruce Ashfield
bruce.ashfield at gmail.com
Tue Oct 22 19:46:17 PDT 2019
On Mon, Oct 21, 2019 at 11:27 PM Quanyang Wang
<quanyang.wang at windriver.com> wrote:
>
>
> On 10/21/19 8:05 PM, Bruce Ashfield wrote:
> > On Mon, Oct 21, 2019 at 6:45 AM Michal Simek <michal.simek at xilinx.com> wrote:
> >> On 21. 10. 19 10:45, Quanyang Wang wrote:
> >>> Hi Michal,
> >>>
> >>> On 10/21/19 4:16 PM, Michal Simek wrote:
> >>>> On 21. 10. 19 7:50, quanyang.wang at windriver.com wrote:
> >>>>> From: Quanyang Wang <quanyang.wang at windriver.com>
> >>>>>
> >>>>> When running kdump with CONFIG_DEBUG_PREEMPT enabled, the following call
> >>>>> trace appears:
> >>>>>
> >>>>> BUG: using smp_processor_id() in preemptible [00000000] code: sh/303
> >>>>> caller is machine_crash_shutdown+0x2c/0xe8
> >>>>> CPU: 0 PID: 303 Comm: sh Kdump: loaded Not tainted 5.2.20-yocto-standard #1
> >>>>> Hardware name: Xilinx Zynq Platform
> >>>>> [<80112ff4>] (unwind_backtrace) from [<8010ca4c>] (show_stack+0x18/0x1c)
> >>>>> [<8010ca4c>] (show_stack) from [<809b000c>] (dump_stack+0x70/0x8c)
> >>>>> [<809b000c>] (dump_stack) from [<80549a14>] (debug_smp_processor_id+0xd4/0x118)
> >>>>> [<80549a14>] (debug_smp_processor_id) from [<80111428>] (machine_crash_shutdown+0x2c/0xe8)
> >>>>> [<80111428>] (machine_crash_shutdown) from [<801afe24>] (__crash_kexec+0x70/0xd0)
> >>>>> [<801afe24>] (__crash_kexec) from [<801259b4>] (panic+0x110/0x324)
> >>>>> [<801259b4>] (panic) from [<805f7018>] (sysrq_handle_crash+0x18/0x1c)
> >>>>> [<805f7018>] (sysrq_handle_crash) from [<805f7584>] (__handle_sysrq+0x9c/0x14c)
> >>>>> [<805f7584>] (__handle_sysrq) from [<805f79e8>] (write_sysrq_trigger+0x5c/0x6c)
> >>>>> [<805f79e8>] (write_sysrq_trigger) from [<8031e850>] (proc_reg_write+0x78/0x8c)
> >>>>> [<8031e850>] (proc_reg_write) from [<802b1b28>] (vfs_write+0xc0/0x154)
> >>>>> [<802b1b28>] (vfs_write) from [<802b2a64>] (ksys_write+0x6c/0xd4)
> >>>>> [<802b2a64>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54)
> >>>>> Exception stack(0xba157fa8 to 0xba157ff0)
> >>>>> 7fa0: 00000002 005ab930 00000001 005ab930 00000002 00000000
> >>>>> 7fc0: 00000002 005ab930 76fa2290 00000004 76f3d124 76f3cc8c 00000000 00000000
> >>>>> 7fe0: 00000004 7edec940 76edbfff 76e67d16
> >>>>>
> >>>>> This happens because disable_nonboot_cpus() is called to make sure
> >>>>> that the crash kernel runs on the boot CPU (cpu0), and it re-enables
> >>>>> local IRQs through the following call chain:
> >>>>>
> >>>>> disable_nonboot_cpus
> >>>>> -> freeze_secondary_cpus
> >>>>> -> _cpu_down
> >>>>> -> percpu_down_write
> >>>>> -> rcu_sync_enter
> >>>>> -> spin_unlock_irq(&rsp->rss_lock)
> >>>>> -> local_irq_enable()
> >>>>>
> >>>>> As a result, the code that follows disable_nonboot_cpus(), including
> >>>>> smp_processor_id(), runs with IRQs enabled, which triggers the call
> >>>>> trace.
> >>>>>
> >>>>> So move disable_nonboot_cpus() in front of local_irq_disable() to avoid
> >>>>> this, since disable_nonboot_cpus() does not need to run in atomic context.
> >>>>>
> >>>>> Signed-off-by: Quanyang Wang <quanyang.wang at windriver.com>
> >>>>> ---
> >>>>> arch/arm/kernel/machine_kexec.c | 3 ++-
> >>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> >>>>> index 654f2b1f9ac0..83d2025a4ab1 100644
> >>>>> --- a/arch/arm/kernel/machine_kexec.c
> >>>>> +++ b/arch/arm/kernel/machine_kexec.c
> >>>>> @@ -145,9 +145,10 @@ static void machine_kexec_mask_interrupts(void)
> >>>>> void machine_crash_shutdown(struct pt_regs *regs)
> >>>>> {
> >>>>> - local_irq_disable();
> >>>>> disable_nonboot_cpus();
> >>>>> + local_irq_disable();
> >>>>> +
> >>>>> crash_smp_send_stop();
> >>>>> crash_save_cpu(regs, smp_processor_id());
> >>>>>
> >>>> OK. Before this, can you please check whether your use cases work without
> >>>> disable_nonboot_cpus()? That patch was written a long time ago, when
> >>>> there was an issue with kexec. I discussed it with the arm-soc
> >>>> maintainers back then, and they told me that mainline code should work
> >>>> fine without any need to call disable_nonboot_cpus().
> >>>> That means that if kexec works fine, we can revert the original patch and
> >>>> use what mainline is using.
> >>> It seems that the issue is still there. When the crash happens on cpu1 and
> >>> the crash kernel runs on cpu1, it hangs. The log is below:
> >>>
> >>> root at xilinx-zynq:~# sh 1.sh
> >>> syscall kexec_file_load not available.
> >>> sysrq: Trigger a crash
> >>> Kernel panic - not syncing: sysrq triggered crash
> >>> CPU: 1 PID: 308 Comm: sh Kdump: loaded Not tainted 5.2.20-yocto-standard #4
> >>> Hardware name: Xilinx Zynq Platform
> >>> [<80112eb0>] (unwind_backtrace) from [<8010cc04>] (show_stack+0x18/0x1c)
> >>> [<8010cc04>] (show_stack) from [<8094f8f4>] (dump_stack+0x70/0x8c)
> >>> [<8094f8f4>] (dump_stack) from [<801256f4>] (panic+0xf8/0x320)
> >>> [<801256f4>] (panic) from [<805dbeb0>] (sysrq_handle_crash+0x18/0x1c)
> >>> [<805dbeb0>] (sysrq_handle_crash) from [<805dc3b8>] (__handle_sysrq+0x9c/0x148)
> >>> [<805dc3b8>] (__handle_sysrq) from [<805dc804>] (write_sysrq_trigger+0x5c/0x6c)
> >>> [<805dc804>] (write_sysrq_trigger) from [<8031b040>] (proc_reg_write+0x78/0x8c)
> >>> [<8031b040>] (proc_reg_write) from [<802aeec4>] (vfs_write+0xc0/0x154)
> >>> [<802aeec4>] (vfs_write) from [<802afd18>] (ksys_write+0x64/0xc8)
> >>> [<802afd18>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54)
> >>> Exception stack(0xb905bfa8 to 0xb905bff0)
> >>> bfa0: 00000002 0059afa0 00000001 0059afa0 00000002 00000000
> >>> bfc0: 00000002 0059afa0 76f8e290 00000004 76f29124 76f28c8c 00000000 00000000
> >>> bfe0: 00000004 7eb858c0 76ec7fff 76e53d16
> >>> CPU 0 will stop doing anything useful since another CPU has crashed
> >>> Loading crashdump kernel...
> >>> Bye!
> >>> Booting Linux on physical CPU 0x1
> >>> Linux version 5.2.20-yocto-standard (oe-user at oe-host) (gcc version 9.2.0
> >>> (GCC)) #1 SMP PREEMPT Thu Oct 17 08:15:14 UTC 2019
> >>> CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d
> >>> CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> >>> OF: fdt: Machine model: Xilinx ZC706 board
> >>> OF: fdt: Ignoring memory range 0x0 - 0x8000000
> >>> printk: debug: ignoring loglevel setting.
> >>> printk: bootconsole [earlycon0] enabled
> >>> Memory policy: Data cache writealloc
> >>> cma: Reserved 16 MiB at 0x16c00000
> >>> On node 0 totalpages: 65280
> >>> Normal zone: 574 pages used for memmap
> >>> Normal zone: 0 pages reserved
> >>> Normal zone: 65280 pages, LIFO batch:15
> >>> percpu: Embedded 19 pages/cpu s47756 r8192 d21876 u77824
> >>> pcpu-alloc: s47756 r8192 d21876 u77824 alloc=19*4096
> >>> pcpu-alloc: [0] 0 [0] 1
> >>> Built 1 zonelists, mobility grouping on. Total pages: 64706
> >>> Kernel command line: console=ttyPS0,115200n8 root=/dev/nfs rw
> >>> nfsroot=128.224.165.20:/export/pxeboot/vlm-boards/22009/rootfs,v3,tcp
> >>> ip=128.224.179.217:128.224.165.20:128.224.178.1:255.255.254.0:zc702:eth0:off
> >>> ignore_loglevel earlyprintk noinitrd selinux=0 enforcing=0 kmemleak=on
> >>> elfcorehdr=0x17f00000 mem=261120K
> >>> Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> >>> Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
> >>> Memory: 227332K/261120K available (9216K kernel code, 725K rwdata, 2284K
> >>> rodata, 1024K init, 567K bss, 17404K reserved, 16384K cma-reserved)
> >>> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> >>> ftrace: allocating 35203 entries in 69 pages
> >>> rcu: Preemptible hierarchical RCU implementation.
> >>> rcu: RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
> >>> Tasks RCU enabled.
> >>> rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
> >>> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> >>> NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
> >>> efuse mapped to (ptrval)
> >>> slcr mapped to (ptrval)
> >>> L2C: platform provided aux values match the hardware, so have no
> >>> effect. Please remove them.
> >>> L2C-310 erratum 769419 enabled
> >>> L2C-310 enabling early BRESP for Cortex-A9
> >>> L2C-310: enabling full line of zeros but not enabled in Cortex-A9
> >>> L2C-310 ID prefetch enabled, offset 1 lines
> >>> L2C-310 dynamic clock gating enabled, standby mode enabled
> >>> L2C-310 cache controller enabled, 8 ways, 512 kB
> >>> L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x76760001
> >>> random: get_random_bytes called from start_kernel+0x2b0/0x4c4 with
> >>> crng_init=0
> >>> zynq_clock_init: clkc starts at (ptrval)
> >>> Zynq clock init
> >>> sched_clock: 64 bits at 333MHz, resolution 3ns, wraps every 4398046511103ns
> >>> clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles:
> >>> 0x4ce07af025, max_idle_ns: 440795209040 ns
> >>> Switching to timer-based delay loop, resolution 3ns
> >>> clocksource: ttc_clocksource: mask: 0xffff max_cycles: 0xffff,
> >>> max_idle_ns: 537538477 ns
> >>> timer #0 at (ptrval), irq=17
> >>> Console: colour dummy device 80x30
> >>> Calibrating delay loop (skipped), value calculated using timer
> >>> frequency.. 666.66 BogoMIPS (lpj=3333333)
> >>> pid_max: default: 32768 minimum: 301
> >>> LSM: Security Framework initializing
> >>> Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> >>> Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> >>> CPU: Testing write buffer coherency: ok
> >>> CPU0: Spectre v2: using BPIALL workaround
> >>> CPU0: thread -1, cpu 1, socket 0, mpidr 80000001
> >>> Setting up static identity map for 0x8100000 - 0x8100060
> >>> rcu: Hierarchical SRCU implementation.
> >>> smp: Bringing up secondary CPUs ...
> >> OK. Can you send the contents of your 1.sh script?
> >>
> >> Anyway the patch looks good to me.
> >> Bruce: Feel free to take it. I will add it to Xilinx tree too.
> > Ack'd. Will pull it into my queue.
>
> Hi Bruce,
>
> Would you please hold off on this patch? I just sent some other patches
> which conflict with this patch.
>
OK. I've dropped this patch in favour of the new two-patch series you just sent.
I'll wait to hear Acks on that new series before merging.
Bruce
>
> Thanks,
>
> Quanyang
>
> >
> > Bruce
> >
> >> Thanks,
> >> Michal
> >>
> >>
> >
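
For readers following the ordering argument in the quoted patch, here is a
minimal userspace sketch of it. It is a model only, not kernel code: every
model_* name and both crash_shutdown_*_order() helpers are hypothetical, the
IRQ state is a plain flag, and the DEBUG_PREEMPT check is reduced to "IRQs
are enabled" (preempt count is assumed to be zero).

```c
#include <stdbool.h>

/* Userspace model only: none of these are the real kernel functions. */
static bool irqs_enabled = true;  /* models the local IRQ state */
static int debug_warnings;        /* counts modelled DEBUG_PREEMPT reports */

/* Models local_irq_disable(). */
static void model_local_irq_disable(void)
{
    irqs_enabled = false;
}

/* Models spin_unlock_irq(): re-enables IRQs unconditionally,
 * regardless of the state the caller entered with. */
static void model_spin_unlock_irq(void)
{
    irqs_enabled = true;
}

/* Models disable_nonboot_cpus(): the only effect kept here is that the
 * call chain ends in spin_unlock_irq(), as described in the patch. */
static void model_disable_nonboot_cpus(void)
{
    model_spin_unlock_irq();
}

/* Models the debug variant of smp_processor_id(): report when called
 * with IRQs enabled (simplification of the real preemptibility check). */
static int model_smp_processor_id(void)
{
    if (irqs_enabled)
        debug_warnings++;
    return 0; /* always "cpu0" in this model */
}

/* Order before the patch: IRQs are disabled first, then silently
 * re-enabled inside disable_nonboot_cpus(). Returns the report count. */
static int crash_shutdown_old_order(void)
{
    irqs_enabled = true;
    debug_warnings = 0;
    model_local_irq_disable();
    model_disable_nonboot_cpus();
    model_smp_processor_id();
    return debug_warnings;
}

/* Order after the patch: IRQs are disabled after disable_nonboot_cpus(),
 * so they stay disabled for the rest of the shutdown path. */
static int crash_shutdown_new_order(void)
{
    irqs_enabled = true;
    debug_warnings = 0;
    model_disable_nonboot_cpus();
    model_local_irq_disable();
    model_smp_processor_id();
    return debug_warnings;
}
```

In this model crash_shutdown_old_order() returns 1 (one report) and
crash_shutdown_new_order() returns 0, which mirrors why the patch swaps
the two calls.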
--
- Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end
- "Use the force Harry" - Gandalf, Star Trek II