[linux-yocto] [PATCH] tracing/x86: Save CR2 before tracing irqsoff on error_entry

Bruce Ashfield bruce.ashfield at gmail.com
Mon Apr 29 12:37:32 PDT 2019


On Mon, Apr 29, 2019 at 8:34 AM Bruce Ashfield <bruce.ashfield at gmail.com>
wrote:

>
>
> On Sun, Apr 28, 2019 at 6:50 AM He Zhe <zhe.he at windriver.com> wrote:
>
>>
>>
>> On 4/26/19 11:14 PM, Bruce Ashfield wrote:
>> >
>> >
>> > On Wed, Apr 24, 2019 at 9:38 PM He Zhe <zhe.he at windriver.com <mailto:
>> zhe.he at windriver.com>> wrote:
>> >
>> >
>> >
>> >     On 4/24/19 8:34 PM, Bruce Ashfield wrote:
>> >     >
>> >     >
>> >     > On Wed, Apr 24, 2019 at 3:47 AM He Zhe <zhe.he at windriver.com
>> <mailto:zhe.he at windriver.com> <mailto:zhe.he at windriver.com <mailto:
>> zhe.he at windriver.com>>> wrote:
>> >     >
>> >     >     This is for standard/base and all sub-level branches. For
>> explanation, see the
>> >     >     bottom of the commit log I append.
>> >     >
>> >     >
>> >     > Which kernel versions ? I didn't notice a version it the shortlog
>> or temporary section, but I may have overlooked it.
>> >
>> >     >From 4.19 to 5.0
>> >
>> >
>> > Thanks, this is now merged.
>>
>> This is missing on v5.0/standard/intel-x86.
>>
>>
> Not in my tree, but I'll double check as I'm merging more changes later
> today.
>

I confirmed that this really is in the branches that I pushed, are you
really not seeing commit b470fb994ebd5b9ae8e520c85029449bf847289c in that
branch ?

https://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto/commit/?h=v5.0/standard/intel-x86&id=b470fb994ebd5b9ae8e520c85029449bf847289c


Bruce



>
> Bruce
>
>
>
>> Zhe
>>
>> >
>> > Bruce
>> >
>> >
>> >
>> >     Zhe
>> >
>> >     >
>> >     > Bruce
>> >     >
>> >     >
>> >     >
>> >     >
>> >     >     Zhe
>> >     >
>> >     >     On 4/24/19 3:42 PM, zhe.he at windriver.com <mailto:
>> zhe.he at windriver.com> <mailto:zhe.he at windriver.com <mailto:
>> zhe.he at windriver.com>> wrote:
>> >     >     > From: "Steven Rostedt (VMware)" <rostedt at goodmis.org
>> <mailto:rostedt at goodmis.org> <mailto:rostedt at goodmis.org <mailto:
>> rostedt at goodmis.org>>>
>> >     >     >
>> >     >     > He Zhe reported a crash by enabling trace events and
>> selecting
>> >     >     > "userstacktrace" which will read the stack of userspace for
>> every trace
>> >     >     > event recorded. Zhe narrowed it down to:
>> >     >     >
>> >     >     >  c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints
>> and unify their usage")
>> >     >     >
>> >     >     > With the problem config, I was able to also reproduce the
>> error. I
>> >     >     > narrowed it down to just having to do the following:
>> >     >     >
>> >     >     >  # cd /sys/kernel/tracing
>> >     >     >  # echo 1 > options/userstacktrace
>> >     >     >  # echo 1 > events/preemptirq/irq_disable/enable
>> >     >     >
>> >     >     > And sure enough, I triggered a crash. Well, it was systemd
>> crashing
>> >     >     > with a bad memory access??
>> >     >     >
>> >     >     >  systemd-journal[537]: segfault at ed8cb8 ip
>> 00007f7fffc9fef5 sp 00007ffc4062cb10 error 7
>> >     >     >
>> >     >     > And it would crash similarly each time I tried it, but
>> always at a
>> >     >     > different place. After spending the day on this, I finally
>> figured it
>> >     >     > out. The bug is happening in entry_64.S right after
>> error_entry.
>> >     >     > There's two TRACE_IRQS_OFF in that code path, which if I
>> comment out,
>> >     >     > the bug goes away. Then it dawned on me that the crash
>> always happens
>> >     >     > when systemd does a normal page fault. We had this bug
>> before, and it
>> >     >     > was with the exception trace points.
>> >     >     >
>> >     >     > The issue is that a tracepoint can fault (reading vmalloc
>> or whatever).
>> >     >     > And doing a userspace stack trace most definitely will
>> fault. But if we
>> >     >     > are coming from a legitimate page fault, the address of
>> that fault (in
>> >     >     > the CR2 register) will be lost if we fault before we get to
>> the page
>> >     >     > fault handler. That's exactly what is happening.
>> >     >     >
>> >     >     > To solve this, a TRACE_IRQS_OFF_CR2 (and ON for
>> consistency) was added
>> >     >     > that saves the CR2 register. A new
>> trace_hardirqs_off_thunk_cr2 is
>> >     >     > created that stores the cr2 register, calls the
>> >     >     > trace_hardirqs_off_caller, then on return restores the cr2
>> register if
>> >     >     > it changed, before returning.
>> >     >     >
>> >     >     > On my tests this fixes the issue. I just want to know if
>> this is a
>> >     >     > legitimate fix or if someone can come up with a better fix?
>> >     >     >
>> >     >     > Note: this also saves the exception context just like the
>> >     >     > do_page_fault() function does.
>> >     >     >
>> >     >     > Note2: This only gets enabled when lockdep or irq tracing
>> is enabled,
>> >     >     > which is not recommended for production environments.
>> >     >     >
>> >     >     > Link:
>> http://lkml.kernel.org/r/897cf5cf-fc24-8a64-cb28-847f2d2e63d2@windriver.com
>> >     >     >
>> >     >     > Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq
>> tracepoints and unify their usage")
>> >     >     > Signed-off-by: Steven Rostedt (VMware) <rostedt at goodmis.org
>> <mailto:rostedt at goodmis.org> <mailto:rostedt at goodmis.org <mailto:
>> rostedt at goodmis.org>>>
>> >     >     >
>> >     >     > Link:
>> https://lore.kernel.org/lkml/20190320221534.165ab87b@oasis.local.home/
>> >     >     >
>> >     >     > This might not be the final solution. But the upstream
>> thread has stopped over
>> >     >     > a month and there is unlikely a final solution in the near
>> future.
>> >     >     >
>> >     >     > Since the diff looks quite clear and does not affect other
>> functions. It should
>> >     >     > be worth adding this initial patch from the maintainer.
>> >     >     >
>> >     >     > Signed-off-by: He Zhe <zhe.he at windriver.com <mailto:
>> zhe.he at windriver.com> <mailto:zhe.he at windriver.com <mailto:
>> zhe.he at windriver.com>>>
>> >     >     > ---
>> >     >     >  arch/x86/entry/common.c         | 26
>> ++++++++++++++++++++++++++
>> >     >     >  arch/x86/entry/entry_64.S       |  4 ++--
>> >     >     >  arch/x86/entry/thunk_64.S       |  2 ++
>> >     >     >  arch/x86/include/asm/irqflags.h |  4 ++++
>> >     >     >  4 files changed, 34 insertions(+), 2 deletions(-)
>> >     >     >
>> >     >     > diff --git a/arch/x86/entry/common.c
>> b/arch/x86/entry/common.c
>> >     >     > index 7bc105f..7edffec 100644
>> >     >     > --- a/arch/x86/entry/common.c
>> >     >     > +++ b/arch/x86/entry/common.c
>> >     >     > @@ -292,6 +292,32 @@ __visible void do_syscall_64(unsigned
>> long nr, struct pt_regs *regs)
>> >     >     >
>> >     >     >       syscall_return_slowpath(regs);
>> >     >     >  }
>> >     >     > +
>> >     >     > +extern void trace_hardirqs_on_caller(unsigned long
>> caller_addr);
>> >     >     > +__visible void trace_hardirqs_on_caller_cr2(unsigned long
>> caller_addr)
>> >     >     > +{
>> >     >     > +     unsigned long address = read_cr2(); /* Get the
>> faulting address */
>> >     >     > +     enum ctx_state prev_state;
>> >     >     > +
>> >     >     > +     prev_state = exception_enter();
>> >     >     > +     trace_hardirqs_on_caller(caller_addr);
>> >     >     > +     if (address != read_cr2())
>> >     >     > +             write_cr2(address);
>> >     >     > +     exception_exit(prev_state);
>> >     >     > +}
>> >     >     > +
>> >     >     > +extern void trace_hardirqs_off_caller(unsigned long
>> caller_addr);
>> >     >     > +__visible void trace_hardirqs_off_caller_cr2(unsigned long
>> caller_addr)
>> >     >     > +{
>> >     >     > +     unsigned long address = read_cr2(); /* Get the
>> faulting address */
>> >     >     > +     enum ctx_state prev_state;
>> >     >     > +
>> >     >     > +     prev_state = exception_enter();
>> >     >     > +     trace_hardirqs_off_caller(caller_addr);
>> >     >     > +     if (address != read_cr2())
>> >     >     > +             write_cr2(address);
>> >     >     > +     exception_exit(prev_state);
>> >     >     > +}
>> >     >     >  #endif
>> >     >     >
>> >     >     >  #if defined(CONFIG_X86_32) ||
>> defined(CONFIG_IA32_EMULATION)
>> >     >     > diff --git a/arch/x86/entry/entry_64.S
>> b/arch/x86/entry/entry_64.S
>> >     >     > index 1f0efdb..73ddf24 100644
>> >     >     > --- a/arch/x86/entry/entry_64.S
>> >     >     > +++ b/arch/x86/entry/entry_64.S
>> >     >     > @@ -1248,12 +1248,12 @@ ENTRY(error_entry)
>> >     >     >        * we fix gsbase, and we should do it before
>> enter_from_user_mode
>> >     >     >        * (which can take locks).
>> >     >     >        */
>> >     >     > -     TRACE_IRQS_OFF
>> >     >     > +     TRACE_IRQS_OFF_CR2
>> >     >     >       CALL_enter_from_user_mode
>> >     >     >       ret
>> >     >     >
>> >     >     >  .Lerror_entry_done:
>> >     >     > -     TRACE_IRQS_OFF
>> >     >     > +     TRACE_IRQS_OFF_CR2
>> >     >     >       ret
>> >     >     >
>> >     >     >       /*
>> >     >     > diff --git a/arch/x86/entry/thunk_64.S
>> b/arch/x86/entry/thunk_64.S
>> >     >     > index be36bf4..1300b53 100644
>> >     >     > --- a/arch/x86/entry/thunk_64.S
>> >     >     > +++ b/arch/x86/entry/thunk_64.S
>> >     >     > @@ -41,6 +41,8 @@
>> >     >     >  #ifdef CONFIG_TRACE_IRQFLAGS
>> >     >     >       THUNK
>> trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
>> >     >     >       THUNK
>> trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
>> >     >     > +     THUNK
>> trace_hardirqs_on_thunk_cr2,trace_hardirqs_on_caller_cr2,1
>> >     >     > +     THUNK
>> trace_hardirqs_off_thunk_cr2,trace_hardirqs_off_caller_cr2,1
>> >     >     >  #endif
>> >     >     >
>> >     >     >  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> >     >     > diff --git a/arch/x86/include/asm/irqflags.h
>> b/arch/x86/include/asm/irqflags.h
>> >     >     > index 058e40f..dd51174 100644
>> >     >     > --- a/arch/x86/include/asm/irqflags.h
>> >     >     > +++ b/arch/x86/include/asm/irqflags.h
>> >     >     > @@ -172,9 +172,13 @@ static inline int
>> arch_irqs_disabled(void)
>> >     >     >  #ifdef CONFIG_TRACE_IRQFLAGS
>> >     >     >  #  define TRACE_IRQS_ON              call
>> trace_hardirqs_on_thunk;
>> >     >     >  #  define TRACE_IRQS_OFF     call trace_hardirqs_off_thunk;
>> >     >     > +#  define TRACE_IRQS_ON_CR2  call
>> trace_hardirqs_on_thunk_cr2;
>> >     >     > +#  define TRACE_IRQS_OFF_CR2 call
>> trace_hardirqs_off_thunk_cr2;
>> >     >     >  #else
>> >     >     >  #  define TRACE_IRQS_ON
>> >     >     >  #  define TRACE_IRQS_OFF
>> >     >     > +#  define TRACE_IRQS_ON_CR2
>> >     >     > +#  define TRACE_IRQS_OFF_CR2
>> >     >     >  #endif
>> >     >     >  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> >     >     >  #  ifdef CONFIG_X86_64
>> >     >
>> >     >
>> >     >
>> >     > --
>> >     > - Thou shalt not follow the NULL pointer, for chaos and madness
>> await thee at its end
>> >     > - "Use the force Harry" - Gandalf, Star Trek II
>> >     >
>> >
>> >
>> >
>> > --
>> > - Thou shalt not follow the NULL pointer, for chaos and madness await
>> thee at its end
>> > - "Use the force Harry" - Gandalf, Star Trek II
>> >
>>
>>
>
> --
> - Thou shalt not follow the NULL pointer, for chaos and madness await thee
> at its end
> - "Use the force Harry" - Gandalf, Star Trek II
>
>

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee
at its end
- "Use the force Harry" - Gandalf, Star Trek II
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.yoctoproject.org/pipermail/linux-yocto/attachments/20190429/1c07ee85/attachment-0001.html>


More information about the linux-yocto mailing list