What your router knows but won't tell you
A while back my home network started acting strange. Not broken, exactly. Just a low-grade wrongness. Video calls would glitch for a few seconds and come back. Large file uploads would stall and resume. Speed tests mostly looked fine. Occasional ones didn’t. Every tool I pointed at the problem said “looks ok to me.”
The cable was a Cat6 run under the house. Terminated cleanly. Passed a basic continuity test. Looked for all the world like a working cable. It was, in the way that a tire with a slow leak is a working tire — every push of the pedal got you forward, until at some point it didn’t.
What the cable actually was, I found out later, was partially bitten through by a staple somewhere under the floor joists. The jacket had been compromised enough to let in just enough moisture to degrade the pairs, just enough, just sometimes. The physical layer was fine most of the time. When it wasn’t, the Ethernet frames came out garbled, failed their Frame Check Sequence, and the receiving NIC dropped them on the floor — incrementing a counter called rx_crc_errors in /sys/class/net/eth0/statistics/. Standard Ethernet has no retransmission at L2; it’s best-effort delivery. For my TCP traffic, the sender eventually noticed the missing acks and retransmitted higher up the stack. For UDP — which is most real-time video and voice — the frames were just gone, permanently, which is exactly what the video-call glitches actually were. From my laptop’s perspective: a little extra latency on downloads, a little weirdness on calls, mostly fine.
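If you want to watch that counter yourself, it takes only a few lines on any Linux host. The sysfs path is the real one; the interface name and the sampling window below are placeholders — a sketch, not a polished tool:

```python
import time
from pathlib import Path

# The sysfs location is standard; "eth0" is a placeholder interface name.
STAT = Path("/sys/class/net/eth0/statistics/rx_crc_errors")

def read_counter(path: Path) -> int:
    """Read one sysfs network statistic (a decimal integer, newline-terminated)."""
    return int(path.read_text().strip())

def crc_error_rate(path: Path, window_s: float = 60.0) -> float:
    """Errors per minute over one sampling window."""
    before = read_counter(path)
    time.sleep(window_s)
    after = read_counter(path)
    return (after - before) * 60.0 / window_s

# Usage (on the affected host):
#   crc_error_rate(STAT)  # a healthy link stays near 0/min; mine ticked up thousands
```

On a healthy link this sits at or near zero indefinitely; a link that is "mostly fine" but corrupting frames shows a steady nonzero rate long before anything user-visible breaks.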
From the router’s perspective: an integer in a kernel data structure was climbing faster than it should be. The router knew. The router knew exactly. The router’s vendor never shipped a way to tell me.
That’s the story I want to unpack, because it’s not just about me and my cable. This is the shape of almost every home network problem I’ve tried to diagnose in the last year of building a diagnostic tool. The firmware already knows. It doesn’t tell.
The funnel
Here’s what I mean, concretely. I’ve spent the last several months tearing apart about twenty consumer gateways, mesh systems, and cable modems — Netgear Orbi, TP-Link Deco, Google Nest WiFi, Eero, Starlink, UniFi, AT&T and Xfinity ISP boxes, and a long tail of others. The pattern is identical across every single one:
- The firmware measures a lot. On a Netgear Orbi WiFi 7 system, the mesh self-organizing daemon (ezmesh) sends path-quality probe packets every single second between every mesh node pair for its own steering decisions. A separate daemon (repacd) runs rolling 30-minute and 2-hour evaluation windows on the backhaul link quality. The radio driver (ath11k) tracks per-VAP CRC errors, retry percentages, airtime consumption, and noise floor. All of this is happening in the firmware right now, in your living room, continuously.
- Root-on-the-box gets less. If you jailbreak or SSH into the device, you can read some but not all of it. On the satellite mesh nodes, you get satellite_status, hop_count, and partial wireless info — but the ACL on the file read method explicitly denies reading /sys/class/net/ath*/statistics/. The error counters are there, on the device, exposed to the kernel's VFS. The management daemon won't let you read them.
- The web UI gets even less. The one page on Orbi that shows any per-interface statistics at all (RST_stattbl.htm) displays TX and RX packet counts plus collisions. No error subtypes. No retry percentages. No per-client signal in dBm — just a 0-to-100 "quality bar" that compresses the actual data (dBm signal strength, SNR, PHY rate) into a single vibes-based integer.
- The SOAP API gets a subset of the web UI. The vendor's own SOAP interface returns client signal as 0-100, satellite signal as 0-100, and WAN bytes as a 32-bit counter that overflows at 4 GB.
- Any unauthenticated standard API gets a subset of that. UPnP IGD, which is the closest thing we have to a cross-vendor diagnostic protocol, exposes WAN IP, WAN status, WAN uptime, and aggregate byte counters. That's it.
Five layers. Roughly a 10× compression from the top to the bottom. And that’s on Orbi, which is comparatively open — the direction of travel across the industry is toward less exposure, not more. Between Orbi firmware versions V10.5.10.10 and V10.5.19.5, several of the ubus endpoints I was relying on for satellite telemetry got ACL-denied in the newer build.
Why this happens
Not because exposing it is hard. Exposing the rx_crc_errors counter from sysfs is a few lines of code. The routers already run HTTP servers. Everything from the kernel to the admin web page is sitting on one small Linux or OpenWrt-derived device; nothing is physically far from anything.
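To make "a few lines of code" concrete, here is roughly what a read-only counter endpoint looks like using nothing but Python's standard library. A vendor would obviously hang this off their existing embedded HTTP server in C rather than ship Python, so treat this as a scale-of-effort sketch, not an implementation proposal:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

SYSFS = Path("/sys/class/net")

def interface_stats(iface: str, base: Path = SYSFS) -> dict:
    """Collect every counter the kernel already exports for one interface."""
    stats_dir = base / iface / "statistics"
    return {f.name: int(f.read_text()) for f in stats_dir.iterdir()}

class DiagHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /stats/eth0 -> all of eth0's kernel counters as JSON
        _, route, iface = self.path.split("/", 2)
        body = json.dumps(interface_stats(iface)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("0.0.0.0", 8080), DiagHandler).serve_forever()
```

That's the entire gap between "the kernel knows" and "a tool on your laptop knows" — minus auth and hardening, which the vendor's existing web server already has.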
It happens because no vendor has a commercial incentive to expose it, and no standard forces them to. Every vendor independently decides how much to surface, and most choose the minimum that makes their own web UI work. The feature roadmap is set by what the mobile app needs to render, not what a curious user (or a third-party diagnostic tool, or a CISO’s compliance dashboard) might want to ask.
The one exception is Ubiquiti UniFi, which offers a genuinely rich local API with typed endpoints and an event websocket. Prosumers who want real visibility usually end up there. But even UniFi is instructive: their officially-supported v1 Network API has rotted, and almost everything interesting I interrogate today is through their wholly undocumented v2. Even the industry’s best-in-class open platform couldn’t sustain openness as a first-class product commitment.
UPnP is the closest thing we have, and it’s not close
UPnP IGD (Internet Gateway Device) is the one quasi-standard local diagnostic protocol in consumer networking. It’s been around since 2001 and shipped on nearly every consumer router since — it’s the de facto path for automatic port mapping on a LAN, though not the only one (NAT-PMP, PCP, and application-level STUN/TURN/ICE cover similar ground for specific cases).
You’d think that gives us something to build on. It mostly doesn’t.
Adoption varies wildly. Comcast’s Xfinity XB7 gateways ship UPnP with only a WPS (WiFi setup) service; no IGD at all, which means the standard “what’s your WAN IP” query gets you nothing. AT&T’s BGW-series fiber gateways have no UPnP whatsoever. Tenda and TP-Link Deco ship MiniUPnPd with hardcoded defaults that are actively misleading: query GetCommonLinkProperties on a Deco connected to gigabit Sonic fiber and it happily reports Cable / 8 Mbps, because the MiniUPnPd build never got its default strings overridden by the vendor’s init script.
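For the curious, this is roughly what that query looks like on the wire. The sketch below builds the standard GetCommonLinkProperties SOAP call against the WANCommonInterfaceConfig:1 service and parses the reply; discovering the control URL via SSDP is omitted, and the URL itself varies by vendor:

```python
import urllib.request
import xml.etree.ElementTree as ET

SERVICE = "urn:schemas-upnp-org:service:WANCommonInterfaceConfig:1"
ACTION = "GetCommonLinkProperties"

def soap_envelope() -> bytes:
    """Standard SOAP body for an argument-less UPnP action invocation."""
    return (
        '<?xml version="1.0"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        f'<s:Body><u:{ACTION} xmlns:u="{SERVICE}"/></s:Body>'
        "</s:Envelope>"
    ).encode()

def parse_link_properties(xml_body: bytes) -> dict:
    """Pull the four IGD response arguments out of the SOAP reply."""
    fields = ("NewWANAccessType", "NewLayer1DownstreamMaxBitRate",
              "NewLayer1UpstreamMaxBitRate", "NewPhysicalLinkStatus")
    out = {}
    for el in ET.fromstring(xml_body).iter():
        tag = el.tag.split("}")[-1]  # strip any XML namespace prefix
        if tag in fields:
            out[tag] = el.text
    return out

def query(control_url: str) -> dict:
    # control_url comes from the device description XML (vendor-specific path).
    req = urllib.request.Request(
        control_url, data=soap_envelope(),
        headers={"Content-Type": 'text/xml; charset="utf-8"',
                 "SOAPAction": f'"{SERVICE}#{ACTION}"'})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return parse_link_properties(resp.read())
```

Run that against the Deco in question and NewWANAccessType / NewLayer1DownstreamMaxBitRate come back as the MiniUPnPd compile-time defaults, not anything the device measured.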
Even where UPnP is implemented well, it doesn’t compose. The gateway’s UPnP IGD exposes the gateway’s view of its WAN, but it does not relay any telemetry from the upstream device — the DOCSIS cable modem, the fiber ONT, the DSL modem. If you want to know whether your ISP’s problem is actually your problem, you have to interrogate the modem separately on its own IP (typically 192.168.100.1 for DOCSIS), with its own auth, its own HTML-scraping protocol, its own per-page login cycle. There is no standard query that says “tell me the L2 path quality end to end.”
And nobody is fixing UPnP itself. The UPnP Forum that owned the spec merged into the Open Connectivity Foundation in 2016 and hasn’t meaningfully evolved IGD since — IGD v2 was the last substantive update and it’s now 15 years old. A vendor who wanted to extend UPnP today would either ship a proprietary vendor-specific service under their own URN (defeating the standardization argument) or wait for a standards body that doesn’t currently exist. Neither of those is progress. Not holding my breath.
Structured gathering of L2 path quality across the full client → AP → mesh → router → modem → ISP path is therefore fragmented across partial standards, with gaps that nothing covers. No single standard protocol composes the segments, and the parts the vendor's firmware measures internally are exactly the parts that stay internal.
The real-life cost
This isn’t academic. Here are three failure modes I’ve either personally diagnosed or seen in field data, all of which are measured by the firmware today and invisible to any user-facing tool:
Mesh radio backhaul degradation. 5GHz backhaul between a router and satellite drops from 2882 Mbps to 720 Mbps because someone started running a microwave at the end of the kitchen island that lives between them. Repacd sees it immediately — its RateThresholdMin5GInPercent=40 variable is watching for exactly this. The user experiences “video calls get weird around lunchtime.” Nobody connects the two.
Electricians stapling Ethernet. What happened to me. A cable tester reports continuity, the link comes up at gigabit, speedtest looks fine, and somewhere behind drywall a galvanized staple is slowly shorting a pair. The kernel’s rx_crc_errors counter ticks up a few thousand per minute. Everything else looks ok. Days or weeks of “my internet is sometimes weird” until someone actually pulls the cable.
Channel interference from a new neighbor's AP. The noise floor climbs 8 dB on a specific channel. The iwinfo survey call would report this directly. On most consumer routers the call either fails or returns empty because the vendor hasn't hooked it up, even though the underlying driver (ath11k, mac80211) maintains the statistics internally. The neighbor starts working from home in January; your retries climb all year.
All three are diagnosable by the firmware right now. None are diagnosable by the user. The gap is the product of neglect, not difficulty.
What you can already do without the vendor’s help
Before I declare the vendor-cooperation problem insurmountable, it’s worth acknowledging what the client can actually do on its own. Two existing paths cover more ground than I’ve been giving them credit for:
Host-side statistical inference. A client with kernel visibility — or eBPF instrumentation — can see a lot about its own network behavior. Per-socket TCP retransmit counts, RTT distributions, congestion window evolution, DNS query latency histograms, and the micro-timing shape of packet loss are all observable without any cooperation from the router. A model built on those signals can distinguish “my WiFi is congested” from “my cable is corrupting frames” from “my ISP is buffer-bloated” from “my own CPU is the bottleneck” with surprising precision, purely by the shape of the loss. Network Weather already uses some of this and there’s a lot more runway.
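As one concrete instance of host-side inference: the kernel already keeps host-wide TCP counters in /proc/net/snmp, retransmits included, readable by any process. A minimal sketch — the ratio below follows the RFC 1213 definition of tcpOutSegs, which excludes retransmitted segments, so retransmits are added back to the denominator:

```python
from pathlib import Path

SNMP = Path("/proc/net/snmp")

def tcp_counters(text: str) -> dict:
    """Parse the two 'Tcp:' lines of /proc/net/snmp into {name: value}."""
    lines = [l for l in text.splitlines() if l.startswith("Tcp:")]
    names = lines[0].split()[1:]           # header line: counter names
    values = [int(v) for v in lines[1].split()[1:]]  # value line
    return dict(zip(names, values))

def retransmit_ratio(text: str) -> float:
    """Fraction of all sent segments that were retransmissions (host-wide)."""
    c = tcp_counters(text)
    sent = c["OutSegs"] + c["RetransSegs"]
    return c["RetransSegs"] / sent if sent else 0.0

# Usage: retransmit_ratio(SNMP.read_text())
# Sample two readings a minute apart and diff them for a live rate.
```

This is coarse — host-wide, not per-socket — but a sustained ratio in the percent range, with speed tests clean, is already a strong "something below L4 is eating frames" signal.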
IEEE 802.11k/v/r. I was about to complain about the lack of a standard for RF-environment telemetry. I shouldn’t — 802.11k Radio Resource Measurement has been standardized since 2008 and is widely deployed. A WiFi client can query its associated AP over the air for neighbor reports, channel-load measurements, and noise-histogram data, no web API involved. 802.11v adds BSS Transition Management; 802.11r adds fast BSS transition. Most modern consumer APs support k and v, including Orbi (their beacons advertise both).
Neither of these paths closes the full gap. Host-side inference can detect that there’s loss and hint at where, but it can’t read an interface error counter on a mesh satellite two hops away, tell you the exact PHY rate the backhaul negotiated, or see a rising noise floor on a band no client is currently associated to. 802.11k gives you a client-side view of the radio environment but doesn’t expose the vendor’s internal mesh state, wired port statistics, or anything about the upstream modem. The piece of the gap that remains — cross-segment L2 path quality visible to the vendor’s firmware and accessible through nothing — is real, and it’s what the rest of this argument is about.
What would have to change
Three plausible mechanisms, in rough order of likelihood:
Chipset manufacturers push it down. Qualcomm, Broadcom, and MediaTek ship reference firmware images that OEMs build on. If a minimum-telemetry-exposure schema were built into the reference — the way MiniUPnPd or reference WPS ships by default — every downstream vendor would inherit it more or less for free. This is probably the most practical path and the one that’s least discussed.
An IETF standard for local diagnostics. A lightweight, JSON-based, read-only local API — a /diag.json that any router could implement. This doesn’t require vendor cooperation to write; it requires vendor cooperation to adopt. Companion technology: TWAMP Light (RFC 5357, RFC 8545) is already a well-specified IETF standard for hop-by-hop active measurement. Cisco, Juniper, and Nokia enterprise gear implements it. Consumer routers don’t, but the spec is sitting there waiting.
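I won't pretend to know exactly what that schema should contain, but here is the flavor I have in mind. Every field name below is invented and every value is illustrative — this is a strawman, not a spec:

```json
{
  "schema": "diag.v1",
  "device": { "model": "RBE973", "fw": "V10.5.19.5", "uptime_s": 412355 },
  "interfaces": [
    { "name": "eth0", "link": "1000baseT/Full",
      "rx_crc_errors": 4012, "tx_errors": 0 }
  ],
  "radios": [
    { "band": "5g", "channel": 149, "noise_dbm": -92,
      "airtime_busy_pct": 31, "retry_pct": 6.4 }
  ],
  "mesh": [
    { "node": "satellite-1", "backhaul_phy_mbps": 2882, "rssi_dbm": -61 }
  ],
  "wan": { "status": "up", "uptime_s": 412201, "rx_bytes": 812334921733 }
}
```

Read-only, unauthenticated on the LAN or gated behind the admin password — either would be a transformative improvement over what ships today.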
Regulation. The FCC’s Cyber Trust Mark program is the closest thing to a regulatory forcing function. The program requires vendors to commit to security update windows and publish end-of-support dates. Extending that to a minimum diagnostic-exposure requirement — so that a Cyber Trust Mark-certified device has to expose enough telemetry for a third-party tool to verify its own security posture — would be a natural extension of the existing policy direction. Slowest of the three paths, but the stickiest once in place.
Whichever of these moves first, the underlying argument is simple: your router already knows. The data is there. The failure is the last inch, and the last inch is a policy failure, not an engineering one.
About Network Weather
I’ve been building Network Weather for about a year now, since leaving Capital One in June 2025. It’s a Mac app that tries to tell you, in plain English, why your home network is acting weird. The reason it can do more for some routers than others is the exposure gap I described above — we consume whatever the vendor will give us, and some vendors give us much more than others. The argument for openness isn’t about us. It’s about users already having the answer sitting in silicon twenty feet away, and not being able to reach it.
If you’re a router vendor and want to talk about what a minimum-useful local diagnostic surface might look like, drop me a line.