Huzaifa Rasheed

Software Engineer


Debugging the Wi-Fi Crash That Froze My Homelab

November 3, 2025

I just wanted my homelab to run smoothly. But sometimes, the universe (and Linux) has other plans.

My home server runs a bunch of small services on my old hardware.

It connects via Wi-Fi, because I didn’t want to run a 20-meter Ethernet cable halfway across the house to where my router is.

So… I stuck with Wi-Fi.

Little did I know how much trouble it would cause.


⚡ The Mystery Crashes

Every once in a while, my entire system would just freeze.

No response, no errors, no log, no warning. Just fans spinning and the little lights blinking.

The only way to bring it back? Pull the plug.

At first, I thought I was overloading it. Maybe too many containers, maybe overheating, maybe the RAM.

My mind was saying “it’s probably hardware”.

It wasn’t until a few nights ago, when it crashed again mid-session, that I decided to dig in.

syslog and journalctl didn’t give me anything obvious.

That’s when I asked ChatGPT to pair-debug with me, and together we started narrowing things down.


🧩 The Clue

After a bit of back-and-forth, we found something interesting buried in dmesg:

iwlwifi 0000:02:00.0: Device error - SW reset
iwlwifi 0000:02:00.0: Start IWL Event Log Dump: nothing in log

Wait… Wi-Fi?

I thought it was supposed to “just work”.

Maybe with a little less throughput (no Ethernet, remember).

Apparently, under high system load, some Intel Wi-Fi chips can crash their firmware, and sometimes that takes the entire system down with it.
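To check whether a machine is hitting the same kind of iwlwifi failure, keep in mind that a hard freeze forces a power cycle, so the useful messages live in the previous boot's kernel log rather than the current dmesg. A minimal sketch, assuming systemd-journald with persistent storage (without it, journalctl -b -1 has nothing to show, and even with it the last seconds before a hard freeze may never reach disk). iwl_failures is a hypothetical helper, and its keyword list is an illustrative guess at what failures look like, not an official filter:

```shell
#!/usr/bin/env bash
# iwl_failures: read kernel log lines on stdin and keep only iwlwifi lines
# that look like failures (errors, resets, microcode dumps).
# The keyword list is an illustrative guess, not an official filter.
iwl_failures() {
  grep -iE 'iwlwifi.*(error|reset|microcode)'
}

# Usage, run by hand:
#   journalctl -k -b -1 --no-pager | iwl_failures   # previous boot (persistent journal needed)
#   sudo dmesg | iwl_failures                        # current boot
```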


🔧 The Fix (and the Weird Lesson)

Turns out, the issue is commonly tied to aggressive power-saving, 802.11n aggregation, and hardware crypto settings that don’t play nicely under load.

The fix was to tweak a few options of the Wi-Fi driver (iwlwifi), including moving encryption off the card and onto the CPU.

I added a few lines to /etc/modprobe.d/iwlwifi.conf:

options iwlwifi power_save=0
options iwlwifi 11n_disable=8
options iwlwifi swcrypto=1
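These options only take effect when the module loads, so after editing the file you either reboot or reload the driver. A rough sketch for checking that the running values match the file, assuming an MVM-based Intel card (iwlmvm loaded on top of iwlwifi) and a local console, since Wi-Fi drops during a reload. check_iwl_params is a hypothetical helper; /sys/module/iwlwifi/parameters is the standard sysfs location for module parameters, where booleans such as power_save read back as Y/N:

```shell
#!/usr/bin/env bash
# Sketch: confirm the iwlwifi options from /etc/modprobe.d/iwlwifi.conf are live.
#
# Reboot, or reload the driver (Wi-Fi drops for a moment, so use a local console):
#   sudo modprobe -r iwlmvm iwlwifi; sudo modprobe iwlwifi
# List every parameter your driver version accepts:
#   modinfo -p iwlwifi

# check_iwl_params [DIR]: print each running value and flag mismatches.
# DIR defaults to the sysfs parameter directory; bool parameters such as
# power_save read back as Y/N there, so power_save=0 shows up as N.
check_iwl_params() {
  local dir="${1:-/sys/module/iwlwifi/parameters}" rc=0 pair name want got
  for pair in power_save=N 11n_disable=8 swcrypto=1; do
    name="${pair%%=*}"   # text before "="
    want="${pair#*=}"    # text after "="
    got="$(cat "$dir/$name" 2>/dev/null || echo missing)"
    if [ "$got" = "$want" ]; then
      echo "ok       $name=$got"
    else
      echo "MISMATCH $name=$got (expected $want)"
      rc=1
    fi
  done
  return "$rc"
}
```

Called with no argument after the reload, it reads the live values; a "missing" entry means that kernel doesn't expose the parameter under that name, and modinfo is the authority on what your driver version accepts.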

And of course, I didn’t know what these did, so I looked them up. Here’s what each one does:

  • power_save=0: disables Wi-Fi power-saving features (like periodic sleep modes).

    • Some Intel cards crash when the radio repeatedly sleeps and wakes up.
    • Power-save transitions can trigger firmware “microcode errors.”
    • Now the Wi-Fi chip stays fully powered. It uses slightly more power, but that’s fine.
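  • 11n_disable=8: controls A-MPDU aggregation, a feature that batches multiple packets/frames into one large transmission. The parameter is a bitmask; per the driver’s own description (modinfo -p iwlwifi), 1 = disable 802.11n entirely, 2 = disable TX aggregation, 4 = disable RX aggregation, and 8 = enable TX aggregation.

    • Despite the parameter’s name, 8 turns TX aggregation on rather than off; disabling it would be 2 (TX), 4 (RX), or 6 (both).
    • Certain Intel firmware versions reportedly crash when handling large aggregated frames, which is why disabling A-MPDU is a common workaround for those crashes, at the cost of some throughput.
    • The Catch (with aggregation disabled):

      • Wi-Fi stays fast, but maximum throughput may drop a bit (e.g., 150 → 120 Mbps).
      • Stability under sustained load improves dramatically.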
  • swcrypto=1: forces encryption/decryption of Wi-Fi traffic to happen in software (on the CPU) rather than in the card’s onboard hardware.

    • Some Intel cards’ hardware crypto engines are buggy and can lock up under load, causing the “SW reset”.
    • Software crypto avoids that path entirely.
    • The Catch/Win:

      • Very minor extra CPU use (I didn’t even notice).
      • Slight latency improvement in exchange for slightly more CPU work (again, I didn’t notice the load).
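Since 11n_disable is a bitmask rather than a simple on/off switch, it helps to decode a value before relying on it. A small sketch, assuming the bit meanings listed in the driver's parameter description (1 = disable 802.11n, 2 = disable TX aggregation, 4 = disable RX aggregation, 8 = enable TX aggregation); decode_11n_disable is a hypothetical helper, so double-check the meanings against modinfo -p iwlwifi on your own kernel:

```shell
#!/usr/bin/env bash
# decode_11n_disable VALUE: print what each set bit of iwlwifi's 11n_disable
# means, using the bit meanings from the driver's parameter description.
decode_11n_disable() {
  [[ "$1" =~ ^[0-9]+$ ]] || { echo "not a number: $1" >&2; return 1; }
  local v=$((10#$1))   # force base 10 so "08" isn't read as octal
  (( v & 1 )) && echo "bit 1: disable 802.11n entirely"
  (( v & 2 )) && echo "bit 2: disable TX aggregation"
  (( v & 4 )) && echo "bit 4: disable RX aggregation"
  (( v & 8 )) && echo "bit 8: enable TX aggregation"
  return 0
}
```

For example, decode_11n_disable 6 lists the two "disable aggregation" bits, while 8 prints only the "enable TX aggregation" line.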

💪 The Stress Test

To see if it actually worked, I wanted to stress test it, which is where I found stress-ng.

I started by testing individual components (about 10 minutes each):

# cpu
sudo stress-ng --cpu 8 --timeout 600s --metrics-brief 
# ram
sudo stress-ng --vm 2 --vm-bytes 80% --timeout 600s --metrics-brief 
# disk: continually writes, reads and removes temp files in the current directory (needs free space, adds SSD wear)
sudo stress-ng --hdd 2 --hdd-opts dsync --timeout 600s --metrics-brief
# i/o
sudo stress-ng --io 2 --timeout 600s --metrics-brief

While each one ran, I also "stressed" the Wi-Fi module in parallel:
# flood ping: sends as fast as replies come back, or 100 per second, whichever is more (needs root)
sudo ping -f -s 1400 google.com 

None of these crashed my system on their own, so naturally the next step was to "stress" everything together:
sudo stress-ng --cpu 4 --vm 2 --vm-bytes 70% --hdd 1 --io 2 --timeout 600s --metrics-brief
# plus Wi-Fi module stress in parallel
sudo ping -f -s 1400 google.com 
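Running the two commands in separate terminals works, but a small wrapper can start both, let the ping stop on its own when the stress run ends, and then report any iwlwifi kernel messages that appeared during the run. A rough sketch, assuming it runs as root (flood ping and, on many distros, dmesg need it) with stress-ng installed; soak_test and new_iwl_lines are hypothetical names, and the target default just mirrors the google.com used above (pointing it at your router keeps the load on the local Wi-Fi link):

```shell
#!/usr/bin/env bash
# new_iwl_lines BEFORE AFTER: print iwlwifi lines present in the AFTER log
# snapshot but not in BEFORE. dmesg lines carry timestamps, so an exact
# whole-line comparison is enough to tell old messages from new ones.
new_iwl_lines() {
  grep -vxFf "$1" "$2" | grep -i iwlwifi
}

# soak_test [SECONDS] [TARGET]: combined CPU/RAM/disk/I/O stress plus a
# flood ping, then a report of new iwlwifi messages. Run as root.
soak_test() {
  local duration="${1:-600}" target="${2:-google.com}" before after
  before=$(mktemp)
  after=$(mktemp)
  dmesg > "$before"

  # -w makes ping exit on its own after the same duration.
  ping -f -s 1400 -w "$duration" "$target" > /dev/null &

  stress-ng --cpu 4 --vm 2 --vm-bytes 70% --hdd 1 --io 2 \
            --timeout "${duration}s" --metrics-brief
  wait  # let the background ping finish

  dmesg > "$after"
  echo "new iwlwifi kernel messages during the run:"
  new_iwl_lines "$before" "$after" || echo "(none)"
  rm -f "$before" "$after"
}
```

For example, with the functions saved in a file called soak.sh (a hypothetical name): sudo bash -c '. ./soak.sh && soak_test 600 192.168.1.1', where 192.168.1.1 stands in for your router's address. Simply reaching the report means the machine survived; "(none)" means no new iwlwifi messages were logged.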

It basically maxes out almost everything at once (CPU, memory, disk, I/O, and network), the same kind of load that used to instantly kill my system.

This time… nothing crashed.

No hangs, no lag, no freeze, just some fan noise.

I was happy: my server handled it and, most importantly, it was still working fine.

🧠 What I Learned

Before this, I never really thought about Wi-Fi drivers at the kernel level.

But this bug made me realize how deep Linux’s stability goes, and how fragile things can get when firmware starts misbehaving.

And the irony is, if I had just used Ethernet, I would’ve never even discovered this.

Avoiding “the easy fix” led me to a much better one.

✨ Takeaway

Running a homelab, among many other things, also helps you understand how these systems behave under pressure.

In my case, it helped me uncover and fix a bug that I didn’t even know existed.

Technically, all bugs are like this when they are first discovered, but this bug especially was not on my bingo cards.


All this, just because I didn’t want to run an Ethernet cable - but honestly, I’m glad I didn’t.