[meta-amd] Random Reboot on R-Series Dual Core APU (bald-eagle)

Peter Smith salerio at gmail.com
Wed Feb 27 23:10:29 PST 2019


One thing I forgot to add: the problem has also been seen on the same
hardware running Ubuntu (18.04).

On Thu, 28 Feb 2019 at 06:59, Peter Smith <salerio at gmail.com> wrote:

> Hi, no, we have not done what you suggest, because the distribution we are
> running is Yocto sumo based and uses a 4.14 kernel. I personally don’t have
> control over the distribution layers (it’s a standard in our business). The
> problem has been seen on multiple CPUs but always ends in the exact same
> MCE. We have verified that we have the latest ucode. We have noticed that a
> number of patches related to mce/amd were applied to the snowy owl and
> v1000 platforms; do you know if these were all upstreamed into 4.14.71? As
> far as I can tell they seem to have been.
>
> Peter
>
> On Thu, 28 Feb 2019 at 06:45, Awais Belal <awais_belal at mentor.com> wrote:
>
>> Hi Peter,
>>
>>
>> Does anyone know of any patches that should be applied that might fix
>> this issue? We are currently experimenting with the latest kernel (4.14.71)
>> as defined on the sumo branch of meta-amd, both with the radeon and amdgpu
>> graphics drivers. We do not yet know if either of these combinations
>> exhibits the same random reboot.
>>
>>
>> The last release we did for the BaldEagle platform was back in the jethro
>> time frame. Have you tried running this platform with the jethro
>> combination, just to see whether you hit the same issue? We did not observe
>> anything like this during our development in that time frame.
>>
>> Are you experiencing this on multiple target boards or is it a specific
>> one that shows such problems?
>>
>> BR,
>> Awais
>>
>> On 27/02/2019 19:26, Peter Smith wrote:
>>
>> Hi,
>>
>> Any help on this point would be appreciated. We are seeing random reboots
>> on a product based on the above (AMD RX-427 to be specific) which present
>> as an MCE error (see below for an example).
>>
>> [Parsed by mcelog]
>> mcelog: Unknown CPU type vendor 2 family 21 model 0
>> Hardware event. This is not a software error.
>> CPU 0 4 northbridge
>> MISC c012000001000000
>> TIME 1547772066 Fri Jan 18 00:41:06 2019
>>                    Northbridge Link Protocol Error
>>        bit57 = processor context corrupt
>>        bit59 = misc error valid
>>        bit61 = error uncorrected
>>        bit62 = error overflow (multiple errors)
>>        bus error 'local node observed, request didn't time out
>>              generic error mem transaction
>>              generic access, level generic'
>> STATUS fa000010000b0c0f MCGSTATUS 0
>> CPUID Vendor AMD Family 21 Model 0
>> SOCKET 0 APIC 0 microcode 6003106
>>
>>
>> Does anyone know of any patches that should be applied that might fix
>> this issue? We are currently experimenting with the latest kernel (4.14.71)
>> as defined on the sumo branch of meta-amd, both with the radeon and amdgpu
>> graphics drivers. We do not yet know if either of these combinations
>> exhibits the same random reboot.
>>
>>
>>
>> Best Regards
>> Peter
>>
>> _______________________________________________
>> meta-amd mailing list
>> meta-amd at yoctoproject.org
>> https://lists.yoctoproject.org/listinfo/meta-amd
>>
>> --
> Best Regards
> Peter
>
-- 
Best Regards
Peter