I'm experiencing freezes with my new rig and I have no idea what the problem could be.
I suspect it's a hardware problem. I've lowered the memory frequency to rule that out, but it's a shot in the dark.
Any suggestions?

@Matter
Motherboard : Asus B350-F
CPU : AMD Ryzen 5 1600
GPU: Asus GeForce GTX 1070
Mem: 2x8 GB G.Skill DDR4 F4-3000C15D
SSD: Samsung 860 EVO 500 GB

@e4rache did you update to the latest microcode and BIOS? Some issues were fixed recently for Ryzen IIRC. I assume you're on GNU/Linux? Try looking at journalctl to see if it has anything in there. I like to use journalctl piped into lnav
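
A minimal sketch of the log check suggested above, assuming a systemd machine with a persistent journal (otherwise entries from before the freeze will not have survived the reset). The helper names `previous_boot_log_cmd` and `tail_previous_boot` are hypothetical; `-b -1`, `-p` and `--no-pager` are standard journalctl options.

```python
import shutil
import subprocess


def previous_boot_log_cmd(min_priority="warning"):
    """Build a journalctl command line for the boot before the current one."""
    # -b -1 selects the previous boot, -p keeps entries at this priority
    # or more severe, --no-pager makes the output easy to capture or pipe.
    return ["journalctl", "-b", "-1", "-p", min_priority, "--no-pager"]


def tail_previous_boot(lines=50):
    """Return the last few log lines of the previous boot, or [] if unavailable."""
    if shutil.which("journalctl") is None:
        # Assumption: no systemd journal on this machine, nothing to read.
        return []
    result = subprocess.run(
        previous_boot_log_cmd(),
        capture_output=True,
        text=True,
        check=False,  # a missing previous boot should not raise
    )
    return result.stdout.splitlines()[-lines:]
```

Calling `tail_previous_boot()` right after a hard reset shows the last messages logged before the freeze. If the list ends with ordinary, unrelated entries, that points toward a sudden hardware lockup rather than a logged kernel error.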

@Matter I have the new microcode but I didn't update the BIOS.
I've checked journalctl but there's nothing there, it really looks like a sudden hardware lockup.

The rig is on 24/7 and the freezes only happen once a week or so, and so far only when it is idle.

I'll update the BIOS, sounds like good advice, thanks.

@e4rache Yup, sounds exactly like what I have. Very annoying, have to do a hardware reset and since I'm not always there I have to annoy friends with it... take a look at this: bugzilla.kernel.org/show_bug.c

@e4rache "
Description: Under a highly specific and detailed set of internal timing
conditions, the MWAIT instruction may cause a thread to
hang in SMT (Simultaneous Multithreading) Mode.
"

@Matter It seems that's a soft lockup followed by a kernel panic.
That's not what's happening here.
I have no errors whatsoever in the logs. It just freezes.
Let's hope the BIOS update will fix that :-)

@e4rache a lot of people in that thread do experience that though, see for example bugzilla.kernel.org/show_bug.c

I, for one, also have a freeze like yours on my server every once in a while (had it three times in as many months), with never more than 2 weeks uptime. I disabled some non-essential stuff and that seems to help, maybe the timing in that specific application was triggering the bug. Didn't have the chance to update BIOS yet since I haven't been on site for a while.

@Matter Indeed that's exactly what I'm experiencing. It looks like
- setting "Typical Current Idle" in the UEFI
- disabling global C-states
- adding the idle=nomwait kernel parameter
are worth a try.
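
For the third option, a small sketch of how one might confirm after a reboot that the parameter is active. It assumes a Linux system exposing the boot command line at `/proc/cmdline`; the function names `has_kernel_param` and `read_cmdline` are hypothetical.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token on the kernel command line."""
    # Parameters are whitespace-separated, so compare whole tokens
    # rather than substrings.
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    try:
        active = has_kernel_param(read_cmdline(), "idle=nomwait")
    except OSError:
        # Assumption: not running on Linux, or /proc is not mounted.
        active = False
    print("idle=nomwait active:", active)
```

The two UEFI options cannot be checked this way, since they are firmware settings rather than kernel parameters.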

Thanks a lot.

@e4rache please tell me what you end up changing and whether it helps or not! It would help me a lot, for when I finally get physical access again

@Matter Sure. The first thing I'll try is setting the "Typical Current Idle" UEFI option.

@e4rache yup that seems the only sensible one, the others would multiply my energy consumption by 3 or so if I calculated that correctly. Not ideal, obviously.

@Matter Voilà, I've changed the BIOS setting from "low" to "typical" current idle.
*fingers crossed*
If I don't come back to you with an update in a week or so, just ping me.

@Matter No more freezes. Seems it worked for me \o/

@e4rache that's great news! My server hasn't frozen again since I reduced load to only necessary stuff, but it annoys me that it's so underused right now, so I'm anxious to get to it and test out that setting

@Matter Hmm, strange, because my problem seemed to occur when the computer was idling rather than under high load.

@e4rache yes right, so maybe it isn't the same issue. I was running a Tor relay, so a lot of encryption work going on there I guess. It's still quite sporadic and inconsistent, so maybe a lot of transitions from active to idle states did it? No clue
