- Hardware errors in Linux
- Linux x86_64: Detecting Hardware Errors
- Install mcelog
- Default Cronjob
- How do I view error logs?
- A Note About mcelog
- Arch Linux
- #1 2018-01-06 11:43:46
- Unexpected reboot, hardware errors (Ryzen cpu)
- #2 2018-01-06 12:27:29
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #3 2018-01-13 09:10:04
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #4 2018-01-13 11:15:08
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #5 2018-01-13 16:05:22
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #6 2018-01-13 18:46:38
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #7 2018-01-14 00:45:52
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #8 2018-01-14 02:25:36
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #9 2018-01-14 14:03:16
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #10 2018-01-14 14:38:05
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #11 2018-01-14 14:45:37
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #12 2018-01-14 15:08:54
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #13 2018-01-14 18:04:32
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #14 2018-01-14 19:39:26
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #15 2018-01-17 13:32:49
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #16 2018-01-17 14:19:38
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #17 2018-01-17 15:30:19
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #18 2018-01-18 10:59:31
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #19 2018-01-18 13:26:03
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #20 2018-01-18 13:47:21
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #21 2018-01-23 11:42:22
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #22 2018-01-25 18:12:40
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
- #23 2018-02-01 21:12:53
- Re: Unexpected reboot, hardware errors (Ryzen cpu)
Hardware errors in Linux
To find out everything about your hardware
And for AMD, additionally
since for AMD they (the exceptions) are disabled at boot
And the most interesting part: MCE first appeared on the Pentium Pro, Pentium II and AMD K6, so it is not an x86_64-specific feature at all.
1.2, pavlinux, 01:42, 06/06/2009
Also, every logical processor in the system has a directory /sys/devices/system/machinecheck/machinecheckN. These directories contain files for dynamic configuration, namely:

Contains a 64-bit mask that enables or disables specific messages,

The remaining configuration files are present in every sysfs directory, but changes

The CPU polling interval, in seconds; the default is 5 minutes.

* tolerant 0: Always panic on uncorrectable errors and write corrected errors to the log.

And the most interesting MCE setting is

A program that is run whenever an MCE event appears in sysfs. For example, it can print to the console or screen, or send mail:

# echo ‘mail -t root -s «MCE EVENT: ‘date «+%m.%d.%Y — %H:%M:%S»‘»‘ > \

Using trigger is more convenient, since there is no need to poke cron every 5 minutes. So let's put the commands that someone copy-pasted above into the trigger, to free up cron.

Linux x86_64: Detecting Hardware Errors

Microsoft Windows shows the Blue Screen of Death (BSoD) after encountering a critical system error. Linux and UNIX-like operating systems may get a kernel panic instead, which is the equivalent of a BSoD. Both a BSoD and a kernel panic can be generated by a Machine Check Exception (MCE). MCE is a feature of AMD / Intel 64-bit systems that is used to detect unrecoverable hardware problems. MCE can detect:
A program such as mcelog decodes machine check events (hardware errors) on x86-64 machines running a 64-bit Linux kernel. It should be run regularly as a cron job on any x86-64 Linux system. This is useful for predicting server hardware failure before the server actually crashes.

Install mcelog

Type the following command under RHEL / CentOS / Fedora Linux, 64-bit kernel:

Default Cronjob

mcelog should be run regularly as a cron job on any x86-64 Linux system. By default, the following cron settings are used on Debian / Ubuntu Linux in /etc/cron.d/mcelog:
CentOS / RHEL / Fedora Linux runs an hourly cron job via /etc/cron.hourly/mcelog.cron:

How do I view error logs?

Use the tail or grep command:

A Note About mcelog
Is there any similar tool for 32-bit operating systems? You mention mcelog only works

There are some other tools for other CPUs as well: Wikipedia

i can update tools linux to backtrack

Does anyone know of a working solution for 32-bit operating systems on x86_64 hardware?

If I run your script I am getting this error..

Hi Vivek! I get a lot of information through your website. Thanks very much. Please help me to decode the mcelog errors:

I forwarded this case to HP, but according to HP it is a firmware issue. What do you have to say?

2) plcg423: MCE 0

Hi

Hi Vivek, MCE 0

Arch Linux

#1 2018-01-06 11:43:46
Unexpected reboot, hardware errors (Ryzen cpu)
My PC rebooted itself unexpectedly. I was just running Firefox and some terminal emulators; no heavy jobs. journalctl shows these errors:

What could it mean? I have the Ryzen 5 1600 processor. I don't know if it is relevant, but I heard it has some problems like these. Please tell me if you need more from the log.

#2 2018-01-06 12:27:29
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Try the kernel boot parameter "processor.max_cstate=5" to disable the C6 C-state (https://wiki.gentoo.org/wiki/Ryzen#Rand … mce_events). That should help to avoid these reboots.

#3 2018-01-13 09:10:04
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Thanks, I added it. Now it remains to be seen whether it will help.

#4 2018-01-13 11:15:08
Re: Unexpected reboot, hardware errors (Ryzen cpu)
I have experienced another MCE-related reboot since then. Now I've gone into the BIOS and disabled extended power saving features during idle (and set it to normal instead). I think this feature is another name for AMD Cool'n'Quiet (https://en.wikipedia.org/wiki/Cool%27n%27Quiet), which could be the reason for the reboots according to various discussion threads I have found. Also, I am using processor.max_cstate=1 now just in case. So far I have not had any more reboots.
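Post #1 asks what the logged machine check errors mean, and post #10 further down quotes the status value bea0000000000108. As a rough aid, here is a minimal Python sketch that splits such a 64-bit status word into the flag bits shared by the x86 machine check status registers (MCi_STATUS). The function name `decode_mce_status` and the dictionary layout are illustrative assumptions, not part of mcelog or rasdaemon, and vendor-specific bits (including Ryzen-specific ones) are deliberately left out.

```python
# Flag bits common to x86 machine check status registers (MCi_STATUS).
# Vendor-specific fields are intentionally not decoded in this sketch.
STATUS_FLAGS = {
    "VAL": 63,    # the register holds a valid error record
    "OVER": 62,   # an earlier error was overwritten (overflow)
    "UC": 61,     # the error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the MISC register holds additional information
    "ADDRV": 58,  # the ADDR register holds an address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mce_status(status: int) -> dict:
    """Return the common flag bits and the low 16-bit error code."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    decoded["error_code"] = status & 0xFFFF  # bits 15:0, the MCA error code
    return decoded


if __name__ == "__main__":
    # Status value quoted in post #10 of this thread.
    sample = int("bea0000000000108", 16)
    for name, value in decode_mce_status(sample).items():
        if name == "error_code":
            print(f"{name}: {value:#06x}")
        else:
            print(f"{name}: {value}")
```

For the value from the thread this reports VAL, UC, EN, MISCV, ADDRV and PCC as set, OVER as clear, and an error code of 0x0108, i.e. an uncorrected error after which the processor context cannot be trusted. It does not explain the root cause; that needs the vendor documentation or a decoder that knows the CPU family.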
#5 2018-01-13 16:05:22
Re: Unexpected reboot, hardware errors (Ryzen cpu)
I had the same problem and it turned out to be a symptom of the Ryzen "performance marginality" (a.k.a. segfault) bug. Try running the tester script; if it fails, just RMA the processor to AMD (directly, not to the shop you bought it from) and they'll send you a good one.

#6 2018-01-13 18:46:38
Re: Unexpected reboot, hardware errors (Ryzen cpu)
This is a pre-built HP PC; I am not sure if AMD would actually exchange the CPU if you have not bought it from them directly. And Windows 10 runs perfectly on it, so I am not sure if HP would see this as a hardware issue. It's not really their problem if I (also) want to run Linux in addition to Windows. But I have also read somewhere that someone RMA'ed his Ryzen and still experienced crashes. As I have not seen any gcc crashes on this system in Linux, I am inclined to think that the "performance marginality" (crashes when compiling with gcc) and the reboots when idle are two separate issues. And frankly, if these Ryzen crashes happen only once a month or so, I think I could live with them. That would be roughly on par with AMD graphics driver-related crashes, so not a biggie. Ext4 can handle that just fine.
Last edited by Morn (2018-01-13 18:47:01)

#7 2018-01-14 00:45:52
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Tough, yet I would at least try. And maybe the OP has a BOX processor. Anyway, in my case normal compilations with make -j were perfectly OK, so this is not a good indicator; only the ryzen-test compilation load was triggering crashes.

#8 2018-01-14 02:25:36
Re: Unexpected reboot, hardware errors (Ryzen cpu)
If these reboots should remain a problem, maybe I will get myself an 1800X and put the 1700 on eBay. Supposedly the RMA process takes several weeks, which is a long time to spend without a working PC. I mean I still have the old PC, but how can you go back from 16 threads to 2 threads? It's impossible!
I have also seen a type of freeze with this machine, this time without an MCE error in the logs, so I suspect it is the graphics card. It seems to be triggered by fast scrolling, e.g. in the web browser. It might be related to rebooting directly from Windows into Linux, so now I always make sure to shut down Windows and do a Linux cold boot. I've had this happen with Windows before, where it puts hardware into a weird state that Linux cannot properly recover from. But in light of the whole Meltdown debacle I'm still pretty happy with my purchasing decision.
Last edited by Morn (2018-01-14 02:45:54)

#9 2018-01-14 14:03:16
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Did you try to decode what the MCE is about? It might give you a clue as to why the machine is crashing.

#10 2018-01-14 14:38:05
Re: Unexpected reboot, hardware errors (Ryzen cpu)
I think it's the same MCE error everyone is getting with Ryzen on Linux (bea0000000000108), but I have not looked up what it means exactly. I am not even sure where these codes are documented. Perhaps in the AMD CPU docs somewhere?

#11 2018-01-14 14:45:37
Re: Unexpected reboot, hardware errors (Ryzen cpu)
You could try using mcelog or other similar programs and see if they can decode the errors.

#12 2018-01-14 15:08:54
Re: Unexpected reboot, hardware errors (Ryzen cpu)

#13 2018-01-14 18:04:32
Re: Unexpected reboot, hardware errors (Ryzen cpu)
I'm not sure you should be telling mcelog that the CPU is a K8; you could also try the generic CPU. There are other parsers which may or may not be able to decode more.

#14 2018-01-14 19:39:26
Re: Unexpected reboot, hardware errors (Ryzen cpu)
All these programs seem to be several years old and probably cannot properly interpret Ryzen CPU errors. On Arch, it looks like mcelog has been replaced with rasdaemon, so maybe I could try that instead. P.S.
No luck with rasdaemon either:
Last edited by Morn (2018-01-14 19:46:03)

#15 2018-01-17 13:32:49
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Just for the record, it seems it happened again while I was away. (I don't have time right now, so I will look at what you have both been saying later.) From journalctl:

Edit: Btw, any idea whether the following is related? Also happens sometimes:

Seems like the OS just stops sending graphics to the screen and it turns black. At which point I can only force-reboot.
Last edited by Ploppz (2018-01-17 13:47:54)

#16 2018-01-17 14:19:38
Re: Unexpected reboot, hardware errors (Ryzen cpu)
The last two chunks of output are not related; only the first is related to the MCE. There is a kernel patch related to MCEs, but it is for the K8 Athlon64, not Ryzen, and I think it is already included in the kernel currently in [core].

commit 52994c256df36fda9a715697431cba9daecb6b11
x86/pti: Make sure the user/kernel PTEs match
Meelis reported that his K8 Athlon64 emits MCE warnings when PTI is

#17 2018-01-17 15:30:19
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Edit: Btw, any idea whether the following is related? Also happens sometimes: Seems like OS just stops sending graphics to the screen and it turns black. At which point I can only force-reboot.

This might be the same as the lockups I have experienced: screen goes black, cannot SSH in any more, have to turn the computer off. Are you dual-booting with Windows or is your system Linux-only? And which graphics card are you using? I have seen neither MCE reboots nor mystery lockups since the BIOS setting change. The lockups were pretty easy to trigger with furious scrolling, e.g. in the web browser. So I hope everything is stable now.

#18 2018-01-18 10:59:31
Re: Unexpected reboot, hardware errors (Ryzen cpu)
I'm experiencing similar "mystery lockups". The screen goes black (or simply freezes) and I have to turn the computer off. Strangely, I cannot find any error messages in the logs.
My system: Ryzen 1600, Asrock AB350M Pro4, Nvidia Geforce 1050, 16 GB RAM. For now I've disabled Cool'n'Quiet in the BIOS and I'm booting with the processor.max_cstate=1 kernel parameter, but it's too early to say whether this workaround actually works. I've also run the kill-ryzen script and it gave me a segfault within a minute. Should I RMA my CPU?

EDIT: I just had a freeze/lockup one minute after writing this post. Same symptoms as above: frozen screen, the system doesn't react anymore and I can't SSH into it. So disabling Cool'n'Quiet and booting with processor.max_cstate=1 didn't fix it for me. No obvious error messages in "journalctl --boot=-1":
Last edited by Fredo (2018-01-18 11:09:26)

#19 2018-01-18 13:26:03
Re: Unexpected reboot, hardware errors (Ryzen cpu)
I also have major issues with Ryzen that started yesterday, a 1600X here as well. I would suspect the RAM if the rig were new, but I had no problems in Windows or before yesterday.

#20 2018-01-18 13:47:21
Re: Unexpected reboot, hardware errors (Ryzen cpu)
I've also run the kill-ryzen script and it gave me a segfault within a minute. Should I RMA my CPU?

No, it didn't "fix" anything. It just shifted the brokenness one space to the right. -- jasonwryan

#21 2018-01-23 11:42:22
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Some updates from me: I spent most of the last weekend trying to find a solution for the freezes/lockups with my system. I found out that I'm most probably affected by this bug:

Bug 196683 - Random Soft Lockup on new Ryzen build

The discussion suggested many different workarounds:

Since I'm also experiencing the segfault problem, I've asked AMD for an RMA and it got accepted. I really, really, really hope that a new CPU will fix this.
Last edited by Fredo (2018-01-23 21:56:01)

#22 2018-01-25 18:12:40
Re: Unexpected reboot, hardware errors (Ryzen cpu)
Thanks for reporting back, Fredo! I will not try those things then. I will probably want an RMA myself.
I'm inexperienced in this though: how long does it usually take until you get a new one? How long after buying the CPU can I still RMA it? I am based in Belgium.

#23 2018-02-01 21:12:53
Re: Unexpected reboot, hardware errors (Ryzen cpu)
The discussion suggested many different workarounds: Since I'm also experiencing the segfault problem, I've asked AMD for an RMA and it got accepted. I really, really, really hope that a new CPU will fix this.

You can check if you compiled it correctly with

And if it was enabled, you will see this:

Another way to test:

The default 4.15 kernel (currently in testing) has CONFIG_RCU_NOCB_CPU enabled, so you would only need to pass the boot parameter if you are using that kernel.

As for the exact workarounds you need to get a stable system, there is no definite answer on that yet, sadly. For instance, in this post, the person claims to need *ALL*

I am currently testing all of those at once myself, and will report back if it survives. For 1), I had an option "Disable Global C-states" or something similar in the BIOS/UEFI. I also had something like "Deeper sleep" disabled by default. 3) was also disabled in the BIOS/UEFI. 2) I am doing through ryzen-stabilizator, which I updated to do that as well, apart from disabling C6. And 4) I am using the testing 4.15 kernel with (Ryzen 5 1600 here).

As you mentioned, you are also plagued by the segfault issue, so do the RMA first and then do the testing again with the new chip.
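The commands for these checks did not survive in the post above, so the following is only a rough Python sketch of the same idea, not what the poster ran. It checks a kernel command line for boot parameters such as processor.max_cstate or rcu_nocbs, and a kernel config text for CONFIG_RCU_NOCB_CPU. The helper names `parse_cmdline` and `config_enabled` are made up for illustration, and the sample command line, including the rcu_nocbs=0-11 value, is a hypothetical example rather than a setting taken from the thread.

```python
import re


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {parameter: value or None}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None  # bare flags such as "quiet" map to None
    return params


def config_enabled(config_text: str, option: str) -> bool:
    """Return True if the option is set to y or m in a kernel config text."""
    pattern = re.compile(rf"^{re.escape(option)}=(y|m)$", re.MULTILINE)
    return bool(pattern.search(config_text))


if __name__ == "__main__":
    # Hypothetical inputs; on a real system these would come from /proc/cmdline
    # and, if the kernel exposes it, the decompressed /proc/config.gz.
    sample_cmdline = "root=/dev/sda2 rw quiet processor.max_cstate=1 rcu_nocbs=0-11"
    sample_config = "CONFIG_RCU_NOCB_CPU=y\n# CONFIG_RCU_TRACE is not set\n"

    params = parse_cmdline(sample_cmdline)
    print("processor.max_cstate:", params.get("processor.max_cstate"))
    print("rcu_nocbs:", params.get("rcu_nocbs"))
    print("CONFIG_RCU_NOCB_CPU enabled:", config_enabled(sample_config, "CONFIG_RCU_NOCB_CPU"))
```

Keeping the parsing separate from the file reading means the same functions work on text pasted from another machine, which is handy when comparing setups in a thread like this one. Note that /proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC.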