User Stories
As a daemon dev debugging a slow sync report, I want to look at
/debug/traces and see exactly which stage of discovery ate the wall clock, so I don't have to sprinkle printlns and rebuild.
Problem
Today we can see HTTP request traces in /debug/traces and grab a pprof by knowing the daemon's debug port and typing a curl command. Neither is enough.
When a user reports "opening this doc takes forever", there's no timeline anywhere showing where the time went. DiscoverObjectWithProgress increments counters (PeersFound, BlobsDownloaded) but doesn't record how long any stage took. The scheduler only keeps a last-run timestamp. So we guess.
And when someone's daemon is using 100% CPU, we can tell them to run go tool pprof http://localhost:56001/debug/pprof/profile?seconds=30. Most users won't. The data is there, the front door is missing.
Two small fixes cover both. Neither needs new storage, a new protocol, or changes to how sync actually works.
Solution
Sync tracing via eztrc
We already serve /debug/traces with eztrc: it shows per-request HTTP traces, and trcstats already computes p10/p50/p90/p99 percentiles per category. We just don't emit any sync traces into it.
The choke point is DiscoverObjectWithProgress at backend/hmnet/syncing/discovery.go:60. Wrap it with an eztrc trace keyed by the IRI, and mark the stages as they happen:
```go
func (s *Service) DiscoverObjectWithProgress(
	ctx context.Context, entityID blob.IRI, version blob.Version, recursive bool, prog *Progress,
) (blob.Version, error) {
	ctx, tr := eztrc.New(ctx, "sync.discover", string(entityID))
	defer tr.Finish()
	if version != "" {
		eztrc.Tracef(ctx, "version=%s recursive=%v", version, recursive)
	}
	// ... existing body, with Tracef calls at stage boundaries:
	eztrc.Tracef(ctx, "local cache miss, querying peers")
	// ...
	eztrc.Tracef(ctx, "local peers: %d found, %d synced ok", prog.PeersFound.Load(), prog.PeersSyncedOK.Load())
	// ...
	eztrc.Tracef(ctx, "DHT lookup start")
	// ... etc.
}
```

Per-peer sync gets its own nested trace inside syncWithPeer at backend/hmnet/syncing/syncing.go:351:
```go
func (s *Service) syncWithPeer(ctx context.Context, pid peer.ID, eids map[string]bool, ...) error {
	ctx, tr := eztrc.New(ctx, "sync.peer", pid.String())
	defer tr.Finish()
	// ... existing body, with Tracef at RBSR rounds and blob download batches.
}
```

Once this is in, /debug/traces shows sync.discover and sync.peer as new categories. trcstats gives us p50/p90 for free. No new endpoints, no new ring buffer, nothing to garbage-collect.
The only judgment call is how many stage marks to emit. Too few and we can't tell where time went; too many and the trace view gets noisy. Starting points: entry, local-cache-hit shortcut, peer list size, each peer sync start/end, DHT lookup start, DHT peer sync start, exit. Tune once we see real traces.
On-demand pprof from the Settings UI
net/http/pprof is already mounted on the daemon's debug listener (backend/cmd/seed-daemon/main.go:15, registered in backend/daemon/http.go:147). That means GET http://localhost:56001/debug/pprof/profile?seconds=30 already works today — it just has no UI.
Add a small block in the desktop app's Advanced settings (frontend/apps/desktop/src/pages/settings.tsx, inside DeveloperSettings) with three buttons:
Capture 30s CPU profile → /debug/pprof/profile?seconds=30
Heap snapshot → /debug/pprof/heap
Goroutine dump → /debug/pprof/goroutine
The click handler reuses the exact pattern in frontend/apps/desktop/src/save-cid-as-file.tsx: a tRPC mutation in app-api.ts that uses Electron's net.request to stream the response, dialog.showSaveDialog to pick a target, and fs.writeFileSync to persist. Default filename like seed-cpu-<iso-timestamp>.pprof.
Optional finisher: after save, shell.openPath the containing folder so the user can drag the file into an issue or open it with go tool pprof -http.
That's the whole feature on the frontend. Daemon gets zero new code.
Rabbit Holes
In-app flamegraph rendering. Tempting — we could show the profile right there. But it means shipping a pprof protobuf parser and a flamegraph library in the renderer. Save-to-disk works for v1; users open with go tool pprof -http if they want the UI. Revisit later.

eztrc retention at high sync volume. If someone is subscribed to a lot of accounts, sync.discover could churn a lot of traces and crowd out other categories. eztrc has per-category limits; we may need to set one explicitly. Not a blocker, but something to watch once it ships.

Which stages to mark inside syncResources(). Too few and the trace is useless; too many and the view is noise. First pass should stay conservative (entry, exit, DHT boundary, per-peer). Expand based on what we actually want to debug.

pprof label / symbol privacy. If users share a profile file, it contains function names and sometimes data addresses. Not leaking anything secret, but worth a note in the UI ("contents are technical — no document content is included") so nobody worries.

File name collisions / save cancel UX. showSaveDialog handles the cancel path; we just need to make sure the mutation doesn't toast an error when the user intentionally cancels.
No Gos
Auto-profile on saturation (CPU/RAM watchdog, red dot in settings, profile history list). This is a separate project — bigger scope, needs cross-platform process sampling, storage policy, unread state. Out of this proposal on purpose.
Remote / authenticated access to the debug surface. The debug HTTP listener stays localhost-only and unauthenticated, same as today.
Any change to sync behavior or wire protocol. Instrumentation only. If a trace reveals a bug, we fix it in a follow-up.
Replacing /debug/pprof with a custom endpoint. net/http/pprof already does the right thing. Don't reinvent.

A custom profile storage / history feature. Users save to their own disk. We don't hold profiles inside the daemon.