RustScope v0.5.0
Unified performance profiling infrastructure for Rust systems. One command, one JSON, zero configuration.
TABLE OF CONTENTS
- 1. WHAT THIS IS
- 2. IMPLEMENTED & TESTED FEATURES
- 3. ARCHITECTURE OVERVIEW
- 4. DETAILED FLOWS
- 5. DATA MODEL & SCHEMA
- 6. VISUALIZER (UI/UX)
- 7. GETTING STARTED
- 8. TESTING
- 9. OPERATIONAL RUNBOOK
- 10. SECURITY MODEL
- 11. PERFORMANCE & SCALABILITY
- 12. CONTRIBUTING
- 13. CHANGELOG & ROADMAP
1. WHAT THIS IS (Executive Summary)
- Problem: Modern Rust profiling is fragmented. Developers must juggle
perffor CPU,heaptrackfor memory, and custom instrumentation for function-level SLOs, often losing the "big picture" of how system metrics correlate with code execution. - Solution: RustScope provides a unified profiling CLI and a zero-cost instrumentation library. It captures CPU spikes, memory leaks, thread activity, function-level latencies, lock contention, and async task timing into a single, dashboard-ready JSON report.
- Non-goals: This is not a debugger or a live production monitoring agent (e.g., DataDog). It is a developer-centric tool for performance auditing, regression testing, and micro-benchmarking.
- Status:
Beta. Current version0.5.0. The JSON schema is stable for dashboard integrations. (v1.0 in progress with timeline correlation, export polish, and Linux CPU sampling).
2. IMPLEMENTED & TESTED FEATURES
Core Profiling (Library)
-
Function Instrumentation — Attribute macros for timing, memory, and stack depth.
- Status:
✅ Implemented + Tested - Coverage:
rustscope/tests/integration.rs::test_basic_profiling - Contract: Guarantees < 50ns overhead per instrumented call.
- Macros:
#[profile],#[profile_all](entire mod/impl blocks),profile_scope!,profile_block!,profile_if!
- Status:
-
Statistical Benchmarking — High-precision statistical runner with warmup.
- Status:
✅ Implemented + Tested - Coverage:
rustscope/tests/integration.rs::test_benchmark_runner
- Status:
-
Outlier Detection — Online statistical anomaly flagging using Welford's algorithm.
- Status:
✅ Implemented + Tested - Coverage:
rustscope/tests/integration.rs::test_outlier_detection - Config: Default 3σ threshold, customizable.
- Status:
-
Latency Budgets — Per-function SLO with violation callbacks.
- Status:
✅ Implemented - Usage:
#[profile(budget_ns = 1_000_000)]with optionalon_violationcallback.
- Status:
-
Timeline Events — Per-call NDJSON export with timestamps.
- Status:
✅ Implemented - Output: Every function call with start/end timestamps for temporal analysis.
- Status:
-
CPU Hardware Counters (Linux,
hw-countersfeature) — perf-event counters.- Status:
✅ Implemented - Metrics:
cpu_cycles,instructions,cache_misses,ipc,branch_miss_rate.
- Status:
-
Async Profiling (
async-profilingfeature) — Correct async timing viatracing-subscriberLayer.- Status:
✅ Implemented - Why: Traditional timing on async fns measures wall time (including yielded time), which is useless for CPU profiling.
- Status:
-
Sampling Profiler (
samplingfeature) — SIGPROF-based zero-code-change profiling.- Status:
✅ Implemented - Default: 100Hz sampling, captures backtraces automatically.
- Status:
-
Lock Profiling — Mutex/RwLock contention metrics.
- Status:
✅ Implemented - Metrics: Contention counts, wait times,
LockRecordin schema.
- Status:
-
Thread Profiling — Per-thread performance breakdown.
- Status:
✅ Implemented - API:
thread_profile::ThreadProfiler::enable().
- Status:
-
Session Metadata — Git commit, CI tags, custom key-value pairs.
- Status:
✅ Implemented - Auto-detects: GitHub Actions, GitLab CI environment variables.
- Status:
-
Export Formats — Multiple industry-standard outputs.
- Status:
✅ Implemented - Formats: pprof (protobuf/gzip), Chrome Tracing (Perfetto), SpeedScope, collapsed stacks (Brendan Gregg format), CSV.
- Status:
-
Helper Macros:
profile_scope!(name)— Inline named scope profiling.profile_block!(expr)— Profile block expression, return value.assert_perf!(expr, threshold_ns)— Panic if performance threshold violated.profile_if!(condition, expr)— Conditional profiling.
System Analysis (CLI)
-
Indefinite Process Monitoring — Continuous polling of CPU, Memory, Threads, FDs, and syscalls.
- Status:
✅ Implemented - Contract: Automatically stops and flushes data when the child process exits or on
Ctrl-C. - Default:
-d 0(indefinite until exit).
- Status:
-
Session Event Detection — Real-time detection of CPU/Memory spikes.
- Status:
✅ Implemented - Behavior: Records "spike" events in JSON when memory grows > 5MB or CPU > 80% between samples.
- Status:
-
Live Terminal Dashboard — Multi-line in-place terminal UI.
- Status:
✅ Implemented - Metrics: CPU, heap, threads, FDs, syscall rate, peak values, event pane.
- Modes:
compact,full,slow(3000ms),fast(500ms) refresh. - Controls:
qquit,ctoggle compact,sslow,ffast.
- Status:
-
macOS Function Sampling — Sidecar integration with macOS
samplecommand.- Status:
✅ Implemented - Contract: Chunked sampling with
-mayDieflag, improved parser/fallback.
- Status:
-
Linux Function Sampling —
perf record+perf scriptparsing.- Status:
✅ Implemented - Note: Linux CPU metric in process summary is placeholder (
0.0) — useperfsampling for CPU hotspots on Linux.
- Status:
-
Process Metrics — Comprehensive system metrics collection.
- Status:
✅ Implemented - Metrics: CPU%, heap MB, thread count, FD count, syscalls/sec.
- Status:
-
Crate/Module Rollups — Project-level hotspot aggregation.
- Status:
✅ Implemented - Output:
crate_rollupsandmodule_rollupsinProfileSession.
- Status:
-
Hotspot Snapshots — Time-indexed function snapshots during session.
- Status:
✅ Implemented - Use case: Correlate spikes with exact functions dominating that window.
- Status:
-
Regression Comparison — Session-to-session diff with severity levels.
- Status:
✅ Implemented - CLI:
--compare-baseline,--compare-current,--threshold,--fail-on. - Severity:
any,minor,moderate,critical.
- Status:
-
Sampling Diagnostics — Backend fidelity and symbolization quality.
- Status:
✅ Implemented - Metrics:
raw_samples,symbolized_samples,dropped_samples,backendname.
- Status:
-
Attach Mode — Profile running processes via
--pid.- Status:
✅ Implemented
- Status:
-
Cargo Integration — Build and profile with
--cargo.- Status:
✅ Implemented
- Status:
3. ARCHITECTURE OVERVIEW
3.1 Component Diagram
3.2 Boundary Definitions
| Boundary | Protocol | Auth mechanism | Failure mode | Retry strategy |
|---|---|---|---|---|
| CLI → Child Process | OS Signals (SIGINT/SIGTERM) | N/A (Process Owner) | Process Zombie | 2s Graceful Wait then SIGKILL |
| CLI → /proc (Linux) | File I/O | FS Permissions | Permission Denied | Graceful Fallback (0.0 metrics) |
| CLI → libproc (macOS) | C FFI (proc_pidinfo) | N/A | Access Restricted | Fallback to ps command |
CLI → sample (macOS) | Subprocess + stdout parse | N/A (Process Owner) | SIP/Permission Denied | Warning + continue without function samples |
CLI → perf (Linux) | Subprocess + perf script parse | N/A (Process Owner) | perf not installed | Warning + fallback to process metrics |
| Library → SIGPROF | Signal handler | N/A | Signal blocked | Disable sampling gracefully |
3.3 Architectural Decisions & Trade-offs
Decision: Welford's Online Algorithm for Outliers
- Rationale: We need to detect anomalies in real-time without storing every single call duration in memory.
- Theory cited: Welford's algorithm allows computing running mean and variance in $O(1)$ time and $O(1)$ space.
- Trade-offs: More sensitive to early-session noise; requires a "warmup" period (default 10 calls).
Decision: Indefinite duration by default (-d 0)
- Rationale: Most backend performance issues happen during specific session events (hitting a route), not fixed time windows.
- Theory cited: Event-driven monitoring — decoupling collection from time improves signal-to-noise ratio for servers.
- Anti-patterns avoided: "Blind profiling" where data collection stops before the interesting event occurs.
Decision: Async per-poll wrapping for #[profile]
- Rationale: Traditional timing on async fns measures "Wall Time" (including time spent yielded), which is useless for CPU profiling.
- Trade-offs: Slightly higher overhead due to future wrapping; requires
async-profilingfeature.
Decision: Canonical ProfileSession schema for both library and CLI
- Rationale: Previously, library and CLI emitted different JSON shapes. Unified schema simplifies the frontend and enables compare mode.
- Trade-offs: CLI adds optional process/session fields (
process_summary,process_samples, etc.) to the same schema.
4. DETAILED FLOWS
4.1 Happy Path: Binary Profiling
4.2 Indefinite Backend Monitoring
When running with -d 0 (default), the CLI monitors for Session Events.
- Scenario: User hits
/heavy-route. - Detection: Memory jumps 10MB.
- Action: CLI pushes a
MemoryEvent { type: "spike", location: "Memory spike: +10.0 MB" }. - TUI: Terminal flashes
[!] SPIKEand incrementsEVENTScount.
4.3 Error & Recovery Flows
- Process Not Found: If
proc_pidinfoor/procreads fail repeatedly, the CLI assumes the child has exited, flushes remaining data, and shuts down cleanly. - macOS Permission Denied: If
samplefails due to SIP/Permissions, the tool logs a warning but continues collecting system metrics (CPU/Mem). - Linux perf not installed: CLI warns and falls back to process metrics only (no function hotspots).
4.4 Regression Comparison Flow
5. DATA MODEL & SCHEMA
Entity-Relationship (JSON Shape)
The output is a single ProfileSession object:
Export Format Support
| Format | Schema | Compatible With |
|---|---|---|
| RustScope JSON | ProfileSession | Native UI, CI compare |
| pprof (protobuf) | pprof.proto | go tool pprof, Pyroscope, Grafana Pyroscope |
| Chrome Tracing | Trace Event Format | chrome://tracing, Perfetto UI |
| SpeedScope | speedscope.json | speedscope.app |
| Collapsed stacks | Brendan Gregg format | flamegraph.pl, Inferno |
| CSV | Tabular | Excel, pandas, data analysis |
6. VISUALIZER (UI/UX)
RustScope includes a premium, web-based visualizer built with Next.js, Tailwind CSS, and D3 for deep analysis of your performance profile sessions.
6.1 Key Features
- Instant Visualization — Simply drag and drop any
rustscope-last.jsonto generate high-resolution flamegraphs. - Multi-Format Support — Native support for RustScope JSON, plus compatibility with
inferno,samply, andpprofstack traces. - Interactive Stack Explorer — Seamlessly toggle between Flamegraphs (top-down) and Icicle Charts (bottom-up) with smooth D3 transitions.
- Search & Filtering — Instant search for function names and intelligent filtering to isolate your crate's logic from
stdor allocator overhead. - Smart Insights — Automated heuristic analysis flags critical bottlenecks, deep recursion, and memory-heavy hot paths.
- Session View — Process metrics over time, metadata, event log, and spike-to-hotspot correlation.
- Project View — Crate-level and module-level rollup analysis for whole-project profiling.
- Compare View — Load baseline and current sessions, view regression diff table with severity (Critical/Moderate/Minor), summary cards.
- Async View — Visualize async task poll times and timing breakdown.
- Locks View — Lock contention visualization with wait times and contention counts.
- Zones View — Custom profiling zones/regions for targeted analysis.
- Timeline Canvas — Time-series visualization of CPU, memory, threads, FDs, syscalls with hotspot overlays.
- Keyboard Shortcuts — Quick tab navigation (1-7), Escape to close panels, drag-to-resize sidebar.
- Multi-Export — Download sessions in pprof, Chrome Tracing, SpeedScope, CSV formats.
6.2 Running Locally
The visualizer is a dedicated application located in the flamegraph-profiler/ directory.
cd flamegraph-profiler
npm install
npm run dev
# Dashboard available at http://localhost:3000
7. GETTING STARTED
7.1 Prerequisites
- Rust:
>= 1.75.0 - OS: macOS (Intel/Apple Silicon) or Linux (x86_64).
- macOS Tools:
samplecommand (built-in). - Linux Tools:
perf(optional, for function sampling).
7.2 Environment Setup
# 1. Clone & Build CLI
git clone https://github.com/anurag/rustscope && cd rustscope
cargo build --release -p rustscope-cli
# 2. Add to PATH
alias rustscope=$(pwd)/target/release/rustscope
7.3 Install from crates.io (CLI users)
cargo install rustscope-cli
rustscope --help
7.4 Library Integration
Add to your Cargo.toml:
[dependencies]
rustscope = "0.5.0"
[features]
rustscope = ["rustscope/hw-counters", "rustscope/async-profiling", "rustscope/sampling"]
Use in your code:
use rustscope::profile;
#[profile]
fn my_expensive_function() {
// your code here
}
#[profile(budget_ns = 1_000_000)]
fn with_budget() {
// panics if exceeds 1ms (with default on_violation)
}
7.5 CLI Quick Usage
# Check backend/sampler readiness
rustscope --doctor
# Profile a cargo binary for 5 seconds
rustscope --cargo rustscope-examples --bin advanced_demo -d 5 -o rustscope-last.json
# Profile an already-built binary until exit / Ctrl-C
rustscope -- ./target/release/advanced_demo
# Attach to a running process
rustscope --pid 12345 --name api-server -d 30 -o api-profile.json
# Live dashboard with verbose output
rustscope -v --cargo rustscope-examples --bin stress_demo
# Compare two profile sessions and fail on critical regressions
rustscope --compare-baseline baseline.json --compare-current current.json --threshold 10 --fail-on critical
7.6 Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
RUSTSCOPE_ALLOC_PIPE | ❌ | — | Named pipe for LD_PRELOAD tracking (Linux only, v0.4.0 planned) |
VERBOSE | ❌ | false | Enable library-level internal logging |
8. TESTING
Test Architecture
rustscope/
├── tests/
│ └── integration.rs # Core library logic (Macros, Outliers, Stats, Timeline)
rustscope-cli/
└── src/profiler/ # Tested via demo runs (stress_demo)
Running Tests
# Core Library Tests
cargo test -p rustscope
# Run CLI Demo (End-to-End Verification)
./target/release/rustscope -v --cargo rustscope-examples --bin advanced_demo -d 5
# Run stress demo for realistic profiling
./target/release/rustscope -v --cargo rustscope-examples --bin stress_demo
# Compare two sessions (CI regression test)
./target/release/rustscope --compare-baseline base.json --compare-current curr.json --fail-on critical
9. OPERATIONAL RUNBOOK
Scenario: Metrics report 0.0 values
- Symptoms: JSON has
0.0for CPU/Heap. - Diagnosis: Check if process exited instantly or if running on an unsupported OS.
- Remediation: Use
-vto check live terminal metrics. Ensure target binary is prefixed with./.
Scenario: Linux CPU is 0.0 in process summary
- Symptoms:
process_summary.cpu_pctshows0.0. - Diagnosis: Linux CPU collection in CLI is placeholder — use
perfsampling for function hotspots. - Remediation: Install
linux-perfand ensureperf recordis available; function hotspots will still work viaperf scriptparsing.
Scenario: JSON file too large
- Symptoms: File > 100MB for long runs.
- Remediation: Reduce
--sample-rate(default 100Hz) to 10Hz, or use library instrumentation only.
Scenario: macOS sample fails with permission error
- Symptoms: No function hotspots on macOS.
- Remediation: Grant Terminal "Full Disk Access" in System Preferences → Security & Privacy, or run with
-vto see warnings.
10. SECURITY MODEL
- Auth: None. Designed for local development.
- Secrets: CLI does not capture environment variables of the target process by default.
- Data in Transit: All communication between CLI and Target is via local OS pipes/signals.
- Input Validation: JSON parser validates schema version and required fields before processing.
11. PERFORMANCE & SCALABILITY
- Throughput: Designed to handle binaries with 100,000+ function calls per second.
- Bottlenecks: High sample rates (> 1000Hz) will introduce measurable OS interrupt overhead.
- Scaling: The library scales horizontally with threads using
AtomicU64for metric aggregation. - Overhead: < 50ns per
#[profile]call (when compiled with optimizations).
12. CONTRIBUTING
- Branches:
feature/*orfix/*. - Commits: Conventional Commits preferred.
- PRs: Must include a test case in
integration.rsfor new library features. - CLI Changes: Must update both
rustscope-cliandflamegraph-profilerif schema changes.
13. CHANGELOG & ROADMAP
v0.5.0 (Current): Unified Session Profiling, Rollups, and Regression Workflow
- Canonical CLI Output:
rustscope-clinow emits the sameProfileSessionschema as the library. - Session Profiling: Attach mode (
--pid), indefinite duration, reliable flush on shutdown. - Terminal Dashboard: Multi-metric live UI with refresh modes and controls.
- Project Rollups: Crate-level and module-level hotspot aggregation.
- Frontend Views: Session, Project, Compare, Async, Locks, Zones tabs.
- Regression Workflow: Baseline vs current comparison with severity levels for CI gating.
- Sampler Improvements: macOS chunked
sample, Linuxperf record+perf scriptparsing. - Stress Demo: Realistic workload (CPU bursts, memory spikes, thread/FD/syscall churn).
v0.6.0 (Next): Timeline Correlation & Export Polish
- Time-Correlated Session Analysis: Click a spike → see exact time window → inspect dominating crate/module/function.
- True Timeline/Session View: CPU/memory/FD/thread/syscall lanes with hotspot overlays and event markers.
- Export Polish: Stabilize pprof, Chrome Tracing, SpeedScope exports with complete feature parity.
- Linux CPU Collection: Replace placeholder with real
/proc/statorperfintegration for process summary.
v1.0 (Target): Production-Ready Profiler
- Reliable Hotspot Capture: macOS symbol extraction improvements, Linux
perfsymbolization maturity. - One Stable Schema: Freeze
ProfileSessionschema version, remove legacy compatibility branches. - CI/CD Integration: GitHub Actions / GitLab CI templates for performance regression gating.
- Documentation: Complete API docs, tutorial guides, video walkthroughs.
- Release Stability: Semantic versioning, deprecation policy, migration guides.
License: MIT
Acknowledgements: Inspired by perf, dtrace, pprof, flamegraph, and the parking_lot community.