[meta-amd] Random Reboot on R-Series Dual Core APU (bald-eagle)
Peter Smith
salerio at gmail.com
Wed Feb 27 22:59:44 PST 2019
Hi, no, we have not done what you suggest. The reason is that the
distribution we are running is Yocto sumo based and uses a 4.14 kernel. I
personally don't have control over the distribution layers (it's a standard
in our business). The problem has been seen on multiple CPUs but always
ends in the exact same MCE. We have verified we have the latest ucode. We
have noticed that a number of patches related to mce/amd were applied to
the snowy owl and v1000 platforms; do you know if these were all
upstreamed into 4.14.71? As far as I can tell they seem to have been.

Peter

On Thu, 28 Feb 2019 at 06:45, Awais Belal <awais_belal at mentor.com> wrote:
> Hi Peter,
>
>
> Does anyone know of any patches that should be applied that might fix this
> issue. We are currently experimenting with the latest kernel (4.14.71) as
> defined on the sumo branch of meta-amd, both with the radeon and amdgpu
> graphics drivers. We do not yet know if either of these combinations
> exhibits the same random reboot.
>
>
> The last release we did for the BaldEagle platform was back in the jethro
> time frame. Have you tried running things on this platform using the jethro
> combination just to see if you exhibit the same issue? We did not observe
> anything such during our development in the mentioned time frame.
>
> Are you experiencing this on multiple target boards or is it a specific
> one that shows such problems?
>
> BR,
> Awais
>
> On 27/02/2019 19:26, Peter Smith wrote:
>
> Hi,
>
> Any help on this point would be appreciated. We are seeing random reboots
> on a product based on the above (AMD RX-427 to be specific) which present
> as an MCE error (see below for an example).
>
> [Parsed by mcelog]
> mcelog: Unknown CPU type vendor 2 family 21 model 0
> Hardware event. This is not a software error.
> CPU 0 4 northbridge
> MISC c012000001000000
> TIME 1547772066 Fri Jan 18 00:41:06 2019
> Northbridge Link Protocol Error
> bit57 = processor context corrupt
> bit59 = misc error valid
> bit61 = error uncorrected
> bit62 = error overflow (multiple errors)
> bus error 'local node observed, request didn't time out
> generic error mem transaction
> generic access, level generic'
> STATUS fa000010000b0c0f MCGSTATUS 0
> CPUID Vendor AMD Family 21 Model 0
> SOCKET 0 APIC 0 microcode 6003106
>
>
> Does anyone know of any patches that should be applied that might fix this
> issue. We are currently experimenting with the latest kernel (4.14.71) as
> defined on the sumo branch of meta-amd, both with the radeon and amdgpu
> graphics drivers. We do not yet know if either of these combinations
> exhibits the same random reboot.
>
>
>
> Best Regards
> Peter
>
> _______________________________________________
> meta-amd mailing list
> meta-amd at yoctoproject.org
> https://lists.yoctoproject.org/listinfo/meta-amd
>
> --
Best Regards
Peter
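
For anyone comparing captures from several boards, the STATUS value in the quoted mcelog output can be cross-checked by hand. The sketch below is only an illustration, not part of mcelog or the kernel: it assumes the standard x86 machine-check status layout (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57, and the MCA error code in the low 16 bits). The names `STATUS_FLAGS` and `decode_status` are made up for this example.

```python
# Architectural flag bits of an x86 machine-check MCi_STATUS register.
# Assumption: standard layout; vendor-specific fields are not decoded here.
STATUS_FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status: int) -> dict:
    """Return the set flag bits (highest first) and the low 16-bit MCA error code."""
    set_flags = [bit for bit in sorted(STATUS_FLAGS, reverse=True)
                 if (status >> bit) & 1]
    return {"flags": set_flags, "error_code": status & 0xFFFF}


if __name__ == "__main__":
    # STATUS value copied from the mcelog output quoted in this thread.
    decoded = decode_status(0xFA000010000B0C0F)
    for bit in decoded["flags"]:
        print(f"bit{bit} = {STATUS_FLAGS[bit]}")
    print(f"MCA error code = {decoded['error_code']:#06x}")
```

For the value in this thread, bits 57, 59, 61 and 62 come out set, matching what mcelog printed, together with the valid and enabled bits (63 and 60) that mcelog does not list. Running the same decode on each board's capture is a quick way to confirm that every reboot really ends in an identical status word.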