RustScope v0.5.0

Unified performance profiling infrastructure for Rust systems. One command, one JSON, zero configuration.

1. WHAT THIS IS
2. IMPLEMENTED & TESTED FEATURES
3. ARCHITECTURE OVERVIEW
4. DETAILED FLOWS
5. DATA MODEL & SCHEMA
6. VISUALIZER (UI/UX)
7. GETTING STARTED
8. TESTING
9. OPERATIONAL RUNBOOK
10. SECURITY MODEL
11. PERFORMANCE & SCALABILITY
12. CONTRIBUTING
13. CHANGELOG & ROADMAP

1. WHAT THIS IS (Executive Summary)

Problem: Modern Rust profiling is fragmented. Developers must juggle perf for CPU, heaptrack for memory, and custom instrumentation for function-level SLOs, often losing the "big picture" of how system metrics correlate with code execution.
Solution: RustScope provides a unified profiling CLI and a zero-cost instrumentation library. It captures CPU spikes, memory leaks, thread activity, function-level latencies, lock contention, and async task timing into a single, dashboard-ready JSON report.
Non-goals: This is not a debugger or a live production monitoring agent (e.g., DataDog). It is a developer-centric tool for performance auditing, regression testing, and micro-benchmarking.
Status: Beta. Current version 0.5.0. The JSON schema is stable for dashboard integrations. (v1.0 in progress with timeline correlation, export polish, and Linux CPU sampling).

2. IMPLEMENTED & TESTED FEATURES

Core Profiling (Library)

Function Instrumentation — Attribute macros for timing, memory, and stack depth.
- Status: ✅ Implemented + Tested
- Coverage: rustscope/tests/integration.rs::test_basic_profiling
- Contract: Guarantees < 50ns overhead per instrumented call.
- Macros: #[profile], #[profile_all] (entire mod/impl blocks), profile_scope!, profile_block!, profile_if!
Statistical Benchmarking — High-precision statistical runner with warmup.
- Status: ✅ Implemented + Tested
- Coverage: rustscope/tests/integration.rs::test_benchmark_runner
Outlier Detection — Online statistical anomaly flagging using Welford's algorithm.
- Status: ✅ Implemented + Tested
- Coverage: rustscope/tests/integration.rs::test_outlier_detection
- Config: Default 3σ threshold, customizable.
Latency Budgets — Per-function SLO with violation callbacks.
- Status: ✅ Implemented
- Usage: #[profile(budget_ns = 1_000_000)] with optional on_violation callback.
Timeline Events — Per-call NDJSON export with timestamps.
- Status: ✅ Implemented
- Output: Every function call with start/end timestamps for temporal analysis.
CPU Hardware Counters (Linux, hw-counters feature) — perf-event counters.
- Status: ✅ Implemented
- Metrics: cpu_cycles, instructions, cache_misses, ipc, branch_miss_rate.
Async Profiling (async-profiling feature) — Correct async timing via tracing-subscriber Layer.
- Status: ✅ Implemented
- Why: Traditional timing on async fns measures wall time (including yielded time), which is useless for CPU profiling.
Sampling Profiler (sampling feature) — SIGPROF-based zero-code-change profiling.
- Status: ✅ Implemented
- Default: 100Hz sampling, captures backtraces automatically.
Lock Profiling — Mutex/RwLock contention metrics.
- Status: ✅ Implemented
- Metrics: Contention counts, wait times, LockRecord in schema.
Thread Profiling — Per-thread performance breakdown.
- Status: ✅ Implemented
- API: thread_profile::ThreadProfiler::enable().
Session Metadata — Git commit, CI tags, custom key-value pairs.
- Status: ✅ Implemented
- Auto-detects: GitHub Actions, GitLab CI environment variables.
Export Formats — Multiple industry-standard outputs.
- Status: ✅ Implemented
- Formats: pprof (protobuf/gzip), Chrome Tracing (Perfetto), SpeedScope, collapsed stacks (Brendan Gregg format), CSV.
Helper Macros:
- profile_scope!(name) — Inline named scope profiling.
- profile_block!(expr) — Profile block expression, return value.
- assert_perf!(expr, threshold_ns) — Panic if performance threshold violated.
- profile_if!(condition, expr) — Conditional profiling.

System Analysis (CLI)

Indefinite Process Monitoring — Continuous polling of CPU, Memory, Threads, FDs, and syscalls.
- Status: ✅ Implemented
- Contract: Automatically stops and flushes data when the child process exits or on Ctrl-C.
- Default: -d 0 (indefinite until exit).
Session Event Detection — Real-time detection of CPU/Memory spikes.
- Status: ✅ Implemented
- Behavior: Records "spike" events in JSON when memory grows > 5MB or CPU > 80% between samples.
Live Terminal Dashboard — Multi-line in-place terminal UI.
- Status: ✅ Implemented
- Metrics: CPU, heap, threads, FDs, syscall rate, peak values, event pane.
- Modes: compact, full, slow (3000ms), fast (500ms) refresh.
- Controls: q quit, c toggle compact, s slow, f fast.
macOS Function Sampling — Sidecar integration with macOS sample command.
- Status: ✅ Implemented
- Contract: Chunked sampling with -mayDie flag, improved parser/fallback.
Linux Function Sampling — perf record + perf script parsing.
- Status: ✅ Implemented
- Note: Linux CPU metric in process summary is placeholder (0.0) — use perf sampling for CPU hotspots on Linux.
Process Metrics — Comprehensive system metrics collection.
- Status: ✅ Implemented
- Metrics: CPU%, heap MB, thread count, FD count, syscalls/sec.
Crate/Module Rollups — Project-level hotspot aggregation.
- Status: ✅ Implemented
- Output: crate_rollups and module_rollups in ProfileSession.
Hotspot Snapshots — Time-indexed function snapshots during session.
- Status: ✅ Implemented
- Use case: Correlate spikes with exact functions dominating that window.
Regression Comparison — Session-to-session diff with severity levels.
- Status: ✅ Implemented
- CLI: --compare-baseline, --compare-current, --threshold, --fail-on.
- Severity: any, minor, moderate, critical.
Sampling Diagnostics — Backend fidelity and symbolization quality.
- Status: ✅ Implemented
- Metrics: raw_samples, symbolized_samples, dropped_samples, backend name.
Attach Mode — Profile running processes via --pid.
- Status: ✅ Implemented
Cargo Integration — Build and profile with --cargo.
- Status: ✅ Implemented

3. ARCHITECTURE OVERVIEW

3.1 Component Diagram

3.2 Boundary Definitions

Boundary	Protocol	Auth mechanism	Failure mode	Retry strategy
CLI → Child Process	OS Signals (SIGINT/SIGTERM)	N/A (Process Owner)	Process Zombie	2s Graceful Wait then SIGKILL
CLI → /proc (Linux)	File I/O	FS Permissions	Permission Denied	Graceful Fallback (0.0 metrics)
CLI → libproc (macOS)	C FFI (`proc_pidinfo`)	N/A	Access Restricted	Fallback to `ps` command
CLI → `sample` (macOS)	Subprocess + stdout parse	N/A (Process Owner)	SIP/Permission Denied	Warning + continue without function samples
CLI → `perf` (Linux)	Subprocess + `perf script` parse	N/A (Process Owner)	perf not installed	Warning + fallback to process metrics
Library → SIGPROF	Signal handler	N/A	Signal blocked	Disable sampling gracefully

3.3 Architectural Decisions & Trade-offs

Decision: Welford's Online Algorithm for Outliers

Rationale: We need to detect anomalies in real-time without storing every single call duration in memory.
Theory cited: Welford's algorithm allows computing running mean and variance in $O(1)$ time and $O(1)$ space.
Trade-offs: More sensitive to early-session noise; requires a "warmup" period (default 10 calls).

Decision: Indefinite duration by default (-d 0)

Rationale: Most backend performance issues happen during specific session events (hitting a route), not fixed time windows.
Theory cited: Event-driven monitoring — decoupling collection from time improves signal-to-noise ratio for servers.
Anti-patterns avoided: "Blind profiling" where data collection stops before the interesting event occurs.

Decision: Async per-poll wrapping for #[profile]

Rationale: Traditional timing on async fns measures "Wall Time" (including time spent yielded), which is useless for CPU profiling.
Trade-offs: Slightly higher overhead due to future wrapping; requires async-profiling feature.

Decision: Canonical ProfileSession schema for both library and CLI

Rationale: Previously, library and CLI emitted different JSON shapes. Unified schema simplifies the frontend and enables compare mode.
Trade-offs: CLI adds optional process/session fields (process_summary, process_samples, etc.) to the same schema.

4. DETAILED FLOWS

4.1 Happy Path: Binary Profiling

4.2 Indefinite Backend Monitoring

When running with -d 0 (default), the CLI monitors for Session Events.

Scenario: User hits /heavy-route.
Detection: Memory jumps 10MB.
Action: CLI pushes a MemoryEvent { type: "spike", location: "Memory spike: +10.0 MB" }.
TUI: Terminal flashes [!] SPIKE and increments EVENTS count.

4.3 Error & Recovery Flows

Process Not Found: If proc_pidinfo or /proc reads fail repeatedly, the CLI assumes the child has exited, flushes remaining data, and shuts down cleanly.
macOS Permission Denied: If sample fails due to SIP/Permissions, the tool logs a warning but continues collecting system metrics (CPU/Mem).
Linux perf not installed: CLI warns and falls back to process metrics only (no function hotspots).

4.4 Regression Comparison Flow

5. DATA MODEL & SCHEMA

Entity-Relationship (JSON Shape)

The output is a single ProfileSession object:

Export Format Support

Format	Schema	Compatible With
RustScope JSON	`ProfileSession`	Native UI, CI compare
pprof (protobuf)	`pprof.proto`	`go tool pprof`, Pyroscope, Grafana Pyroscope
Chrome Tracing	`Trace Event Format`	`chrome://tracing`, Perfetto UI
SpeedScope	`speedscope.json`	speedscope.app
Collapsed stacks	Brendan Gregg format	`flamegraph.pl`, Inferno
CSV	Tabular	Excel, pandas, data analysis

6. VISUALIZER (UI/UX)

RustScope includes a premium, web-based visualizer built with Next.js, Tailwind CSS, and D3 for deep analysis of your performance profile sessions.

6.1 Key Features

Instant Visualization — Simply drag and drop any rustscope-last.json to generate high-resolution flamegraphs.
Multi-Format Support — Native support for RustScope JSON, plus compatibility with inferno, samply, and pprof stack traces.
Interactive Stack Explorer — Seamlessly toggle between Flamegraphs (top-down) and Icicle Charts (bottom-up) with smooth D3 transitions.
Search & Filtering — Instant search for function names and intelligent filtering to isolate your crate's logic from std or allocator overhead.
Smart Insights — Automated heuristic analysis flags critical bottlenecks, deep recursion, and memory-heavy hot paths.
Session View — Process metrics over time, metadata, event log, and spike-to-hotspot correlation.
Project View — Crate-level and module-level rollup analysis for whole-project profiling.
Compare View — Load baseline and current sessions, view regression diff table with severity (Critical/Moderate/Minor), summary cards.
Async View — Visualize async task poll times and timing breakdown.
Locks View — Lock contention visualization with wait times and contention counts.
Zones View — Custom profiling zones/regions for targeted analysis.
Timeline Canvas — Time-series visualization of CPU, memory, threads, FDs, syscalls with hotspot overlays.
Keyboard Shortcuts — Quick tab navigation (1-7), Escape to close panels, drag-to-resize sidebar.
Multi-Export — Download sessions in pprof, Chrome Tracing, SpeedScope, CSV formats.

6.2 Running Locally

The visualizer is a dedicated application located in the flamegraph-profiler/ directory.

cd flamegraph-profiler
npm install
npm run dev
# Dashboard available at http://localhost:3000

7. GETTING STARTED

7.1 Prerequisites

Rust: >= 1.75.0
OS: macOS (Intel/Apple Silicon) or Linux (x86_64).
macOS Tools: sample command (built-in).
Linux Tools: perf (optional, for function sampling).

7.2 Environment Setup

# 1. Clone & Build CLI
git clone https://github.com/anurag/rustscope && cd rustscope
cargo build --release -p rustscope-cli

# 2. Add to PATH
alias rustscope=$(pwd)/target/release/rustscope

7.3 Install from crates.io (CLI users)

cargo install rustscope-cli
rustscope --help

7.4 Library Integration

Add to your Cargo.toml:

[dependencies]
rustscope = "0.5.0"

[features]
rustscope = ["rustscope/hw-counters", "rustscope/async-profiling", "rustscope/sampling"]

Use in your code:

use rustscope::profile;

#[profile]
fn my_expensive_function() {
    // your code here
}

#[profile(budget_ns = 1_000_000)]
fn with_budget() {
    // panics if exceeds 1ms (with default on_violation)
}

7.5 CLI Quick Usage

# Check backend/sampler readiness
rustscope --doctor

# Profile a cargo binary for 5 seconds
rustscope --cargo rustscope-examples --bin advanced_demo -d 5 -o rustscope-last.json

# Profile an already-built binary until exit / Ctrl-C
rustscope -- ./target/release/advanced_demo

# Attach to a running process
rustscope --pid 12345 --name api-server -d 30 -o api-profile.json

# Live dashboard with verbose output
rustscope -v --cargo rustscope-examples --bin stress_demo

# Compare two profile sessions and fail on critical regressions
rustscope --compare-baseline baseline.json --compare-current current.json --threshold 10 --fail-on critical

7.6 Environment Variables

Variable	Required	Default	Description
`RUSTSCOPE_ALLOC_PIPE`	❌	—	Named pipe for LD_PRELOAD tracking (Linux only, v0.4.0 planned)
`VERBOSE`	❌	`false`	Enable library-level internal logging

8. TESTING

Test Architecture

rustscope/
├── tests/
│   └── integration.rs   # Core library logic (Macros, Outliers, Stats, Timeline)
rustscope-cli/
    └── src/profiler/    # Tested via demo runs (stress_demo)

Running Tests

# Core Library Tests
cargo test -p rustscope

# Run CLI Demo (End-to-End Verification)
./target/release/rustscope -v --cargo rustscope-examples --bin advanced_demo -d 5

# Run stress demo for realistic profiling
./target/release/rustscope -v --cargo rustscope-examples --bin stress_demo

# Compare two sessions (CI regression test)
./target/release/rustscope --compare-baseline base.json --compare-current curr.json --fail-on critical

9. OPERATIONAL RUNBOOK

Scenario: Metrics report 0.0 values

Symptoms: JSON has 0.0 for CPU/Heap.
Diagnosis: Check if process exited instantly or if running on an unsupported OS.
Remediation: Use -v to check live terminal metrics. Ensure target binary is prefixed with ./.

Scenario: Linux CPU is 0.0 in process summary

Symptoms: process_summary.cpu_pct shows 0.0.
Diagnosis: Linux CPU collection in CLI is placeholder — use perf sampling for function hotspots.
Remediation: Install linux-perf and ensure perf record is available; function hotspots will still work via perf script parsing.

Scenario: JSON file too large

Symptoms: File > 100MB for long runs.
Remediation: Reduce --sample-rate (default 100Hz) to 10Hz, or use library instrumentation only.

Scenario: macOS sample fails with permission error

Symptoms: No function hotspots on macOS.
Remediation: Grant Terminal "Full Disk Access" in System Preferences → Security & Privacy, or run with -v to see warnings.

10. SECURITY MODEL

Auth: None. Designed for local development.
Secrets: CLI does not capture environment variables of the target process by default.
Data in Transit: All communication between CLI and Target is via local OS pipes/signals.
Input Validation: JSON parser validates schema version and required fields before processing.

11. PERFORMANCE & SCALABILITY

Throughput: Designed to handle binaries with 100,000+ function calls per second.
Bottlenecks: High sample rates (> 1000Hz) will introduce measurable OS interrupt overhead.
Scaling: The library scales horizontally with threads using AtomicU64 for metric aggregation.
Overhead: < 50ns per #[profile] call (when compiled with optimizations).

12. CONTRIBUTING

Branches: feature/* or fix/*.
Commits: Conventional Commits preferred.
PRs: Must include a test case in integration.rs for new library features.
CLI Changes: Must update both rustscope-cli and flamegraph-profiler if schema changes.

13. CHANGELOG & ROADMAP

v0.5.0 (Current): Unified Session Profiling, Rollups, and Regression Workflow

Canonical CLI Output: rustscope-cli now emits the same ProfileSession schema as the library.
Session Profiling: Attach mode (--pid), indefinite duration, reliable flush on shutdown.
Terminal Dashboard: Multi-metric live UI with refresh modes and controls.
Project Rollups: Crate-level and module-level hotspot aggregation.
Frontend Views: Session, Project, Compare, Async, Locks, Zones tabs.
Regression Workflow: Baseline vs current comparison with severity levels for CI gating.
Sampler Improvements: macOS chunked sample, Linux perf record + perf script parsing.
Stress Demo: Realistic workload (CPU bursts, memory spikes, thread/FD/syscall churn).

v0.6.0 (Next): Timeline Correlation & Export Polish

Time-Correlated Session Analysis: Click a spike → see exact time window → inspect dominating crate/module/function.
True Timeline/Session View: CPU/memory/FD/thread/syscall lanes with hotspot overlays and event markers.
Export Polish: Stabilize pprof, Chrome Tracing, SpeedScope exports with complete feature parity.
Linux CPU Collection: Replace placeholder with real /proc/stat or perf integration for process summary.

v1.0 (Target): Production-Ready Profiler

Reliable Hotspot Capture: macOS symbol extraction improvements, Linux perf symbolization maturity.
One Stable Schema: Freeze ProfileSession schema version, remove legacy compatibility branches.
CI/CD Integration: GitHub Actions / GitLab CI templates for performance regression gating.
Documentation: Complete API docs, tutorial guides, video walkthroughs.
Release Stability: Semantic versioning, deprecation policy, migration guides.

License: MIT
Acknowledgements: Inspired by perf, dtrace, pprof, flamegraph, and the parking_lot community.