I'm experiencing freezes with my new rig and I have no idea what could be the problem.
I suspect it's a hardware problem. I've lowered the memory frequency to rule that out, but it's a shot in the dark.
Motherboard: Asus B350-F
CPU: AMD Ryzen 5 1600
GPU: Asus GeForce GTX 1070
Mem: 2x8 GB G.Skill DDR4 F4-3000C15D
SSD: Samsung 860 EVO 500 GB
@e4rache did you update to the latest microcode and BIOS? Some issues were fixed recently for Ryzen IIRC. I assume you're on GNU/Linux? Try looking at journalctl to see if it has anything in there. I like to use journalctl piped into lnav
@Matter I have the new microcode but I didn't update the BIOS.
I've checked journalctl but there's nothing there, it really looks like a sudden hardware lockup.
The rig is on 24/7 and the freezes only happen once a week or so, and so far only when it is idle.
I'll update the BIOS, sounds like good advice, thanks.
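For anyone following along, the journal check mentioned above can be sketched roughly like this. It's only a sketch: the helper names are made up for illustration, and it assumes systemd with a persistent journal, otherwise the log of the boot that froze is already gone after the reset.

```python
import subprocess


def previous_boot_errors_cmd(boot_offset: int = -1) -> list[str]:
    """Build a journalctl command for error-level messages of an earlier boot.

    boot_offset=-1 means the boot before the current one, i.e. the one
    that ended in the freeze (hypothetical helper, not part of any tool).
    """
    return [
        "journalctl",
        f"--boot={boot_offset}",  # select the previous boot
        "--priority=err",         # only messages at priority "err" or worse
        "--no-pager",             # plain output, suitable for capturing
    ]


def last_lines(cmd: list[str], count: int = 20) -> list[str]:
    """Run the command and return the last `count` lines of its output."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.splitlines()[-count:]
```

Calling `last_lines(previous_boot_errors_cmd())` after a hard reset shows whatever the kernel managed to log before the lockup. An empty result, as in this case, points towards a hang the kernel never saw coming.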
@e4rache Yup, sounds exactly like what I have. Very annoying, have to do a hardware reset and since I'm not always there I have to annoy friends with it... take a look at this: https://bugzilla.kernel.org/show_bug.cgi?id=196683
Description: Under a highly specific and detailed set of internal timing
conditions, the MWAIT instruction may cause a thread to
hang in SMT (Simultaneous Multithreading) Mode.
@Matter It seems that's a soft lockup followed by a kernel panic.
That's not what's happening here.
I have no errors whatsoever in the logs. It just freezes.
Let's hope the BIOS update will fix that :-)
@e4rache a lot of people in that thread do experience that though, see for example https://bugzilla.kernel.org/show_bug.cgi?id=196683#c418
I, for one, also have a freeze like yours on my server every once in a while (had it three times in as many months), with never more than 2 weeks uptime. I disabled some non-essential stuff and that seems to help, maybe the timing in that specific application was triggering the bug. Didn't have the chance to update BIOS yet since I haven't been on site for a while.
@Matter Indeed that's exactly what I'm experiencing. It looks like
- Typical Current Idle
- disabling global c-states
- idle=nomwait kernel parameter
are worth a try.
Thanks a lot.
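A quick way to confirm that the idle=nomwait workaround is actually active after a reboot, as a minimal sketch: it assumes Linux, where the running kernel's command line is exposed in /proc/cmdline, and the function names are just illustrative.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    return Path(path).read_text().strip()
```

On the affected machine, `has_kernel_param(read_cmdline(), "idle=nomwait")` should return True once the parameter has been added to the bootloader config and the system has been rebooted. The two BIOS-side options can't be checked this way.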
@e4rache please tell me what you end up changing and whether it helps or not! It would help me a lot, for when I finally get physical access again
@e4rache yup that seems the only sensible one, the others would multiply my energy consumption by 3 or so if I calculated that correctly. Not ideal, obviously.
@Matter Voila, I've changed the BIOS setting from "low" to "typical" current idle.
If I don't come back to you with an update in a week or so, just ping me.
@e4rache that's great news! My server hasn't frozen again since I reduced load to only necessary stuff, but it annoys me that it's so underused right now, so I'm anxious to get to it and test out that setting
@Matter Hmm, strange, because my problem seemed to occur when the computer was idling rather than under high load.
@e4rache yes right, so maybe it isn't the same issue. I was running a Tor relay, so a lot of encryption work going on there I guess. It's still quite sporadic and inconsistent, so maybe a lot of transitions between active and idle states did it? No clue