RustScope v0.5.0

Unified performance profiling infrastructure for Rust systems. One command, one JSON, zero configuration.

Build Status Coverage License Version


TABLE OF CONTENTS


1. WHAT THIS IS (Executive Summary)

  • Problem: Modern Rust profiling is fragmented. Developers must juggle perf for CPU, heaptrack for memory, and custom instrumentation for function-level SLOs, often losing the "big picture" of how system metrics correlate with code execution.
  • Solution: RustScope provides a unified profiling CLI and a zero-cost instrumentation library. It captures CPU spikes, memory leaks, thread activity, function-level latencies, lock contention, and async task timing into a single, dashboard-ready JSON report.
  • Non-goals: This is not a debugger or a live production monitoring agent (e.g., DataDog). It is a developer-centric tool for performance auditing, regression testing, and micro-benchmarking.
  • Status: Beta. Current version 0.5.0. The JSON schema is stable for dashboard integrations. (v1.0 in progress with timeline correlation, export polish, and Linux CPU sampling).

2. IMPLEMENTED & TESTED FEATURES

Core Profiling (Library)

  • Function Instrumentation — Attribute macros for timing, memory, and stack depth.

    • Status: ✅ Implemented + Tested
    • Coverage: rustscope/tests/integration.rs::test_basic_profiling
    • Contract: Guarantees < 50ns overhead per instrumented call.
    • Macros: #[profile], #[profile_all] (entire mod/impl blocks), profile_scope!, profile_block!, profile_if!
  • Statistical Benchmarking — High-precision statistical runner with warmup.

    • Status: ✅ Implemented + Tested
    • Coverage: rustscope/tests/integration.rs::test_benchmark_runner
  • Outlier Detection — Online statistical anomaly flagging using Welford's algorithm.

    • Status: ✅ Implemented + Tested
    • Coverage: rustscope/tests/integration.rs::test_outlier_detection
    • Config: Default 3σ threshold, customizable.
  • Latency Budgets — Per-function SLO with violation callbacks.

    • Status: ✅ Implemented
    • Usage: #[profile(budget_ns = 1_000_000)] with optional on_violation callback.
  • Timeline Events — Per-call NDJSON export with timestamps.

    • Status: ✅ Implemented
    • Output: Every function call with start/end timestamps for temporal analysis.
  • CPU Hardware Counters (Linux, hw-counters feature) — perf-event counters.

    • Status: ✅ Implemented
    • Metrics: cpu_cycles, instructions, cache_misses, ipc, branch_miss_rate.
  • Async Profiling (async-profiling feature) — Correct async timing via tracing-subscriber Layer.

    • Status: ✅ Implemented
    • Why: Traditional timing on async fns measures wall time (including yielded time), which is useless for CPU profiling.
  • Sampling Profiler (sampling feature) — SIGPROF-based zero-code-change profiling.

    • Status: ✅ Implemented
    • Default: 100Hz sampling, captures backtraces automatically.
  • Lock Profiling — Mutex/RwLock contention metrics.

    • Status: ✅ Implemented
    • Metrics: Contention counts, wait times, LockRecord in schema.
  • Thread Profiling — Per-thread performance breakdown.

    • Status: ✅ Implemented
    • API: thread_profile::ThreadProfiler::enable().
  • Session Metadata — Git commit, CI tags, custom key-value pairs.

    • Status: ✅ Implemented
    • Auto-detects: GitHub Actions, GitLab CI environment variables.
  • Export Formats — Multiple industry-standard outputs.

    • Status: ✅ Implemented
    • Formats: pprof (protobuf/gzip), Chrome Tracing (Perfetto), SpeedScope, collapsed stacks (Brendan Gregg format), CSV.
  • Helper Macros:

    • profile_scope!(name) — Inline named scope profiling.
    • profile_block!(expr) — Profile block expression, return value.
    • assert_perf!(expr, threshold_ns) — Panic if performance threshold violated.
    • profile_if!(condition, expr) — Conditional profiling.

System Analysis (CLI)

  • Indefinite Process Monitoring — Continuous polling of CPU, Memory, Threads, FDs, and syscalls.

    • Status: ✅ Implemented
    • Contract: Automatically stops and flushes data when the child process exits or on Ctrl-C.
    • Default: -d 0 (indefinite until exit).
  • Session Event Detection — Real-time detection of CPU/Memory spikes.

    • Status: ✅ Implemented
    • Behavior: Records "spike" events in JSON when memory grows > 5MB or CPU > 80% between samples.
  • Live Terminal Dashboard — Multi-line in-place terminal UI.

    • Status: ✅ Implemented
    • Metrics: CPU, heap, threads, FDs, syscall rate, peak values, event pane.
    • Modes: compact, full, slow (3000ms), fast (500ms) refresh.
    • Controls: q quit, c toggle compact, s slow, f fast.
  • macOS Function Sampling — Sidecar integration with macOS sample command.

    • Status: ✅ Implemented
    • Contract: Chunked sampling with -mayDie flag, improved parser/fallback.
  • Linux Function Samplingperf record + perf script parsing.

    • Status: ✅ Implemented
    • Note: Linux CPU metric in process summary is placeholder (0.0) — use perf sampling for CPU hotspots on Linux.
  • Process Metrics — Comprehensive system metrics collection.

    • Status: ✅ Implemented
    • Metrics: CPU%, heap MB, thread count, FD count, syscalls/sec.
  • Crate/Module Rollups — Project-level hotspot aggregation.

    • Status: ✅ Implemented
    • Output: crate_rollups and module_rollups in ProfileSession.
  • Hotspot Snapshots — Time-indexed function snapshots during session.

    • Status: ✅ Implemented
    • Use case: Correlate spikes with exact functions dominating that window.
  • Regression Comparison — Session-to-session diff with severity levels.

    • Status: ✅ Implemented
    • CLI: --compare-baseline, --compare-current, --threshold, --fail-on.
    • Severity: any, minor, moderate, critical.
  • Sampling Diagnostics — Backend fidelity and symbolization quality.

    • Status: ✅ Implemented
    • Metrics: raw_samples, symbolized_samples, dropped_samples, backend name.
  • Attach Mode — Profile running processes via --pid.

    • Status: ✅ Implemented
  • Cargo Integration — Build and profile with --cargo.

    • Status: ✅ Implemented

3. ARCHITECTURE OVERVIEW

3.1 Component Diagram

3.2 Boundary Definitions

BoundaryProtocolAuth mechanismFailure modeRetry strategy
CLI → Child ProcessOS Signals (SIGINT/SIGTERM)N/A (Process Owner)Process Zombie2s Graceful Wait then SIGKILL
CLI → /proc (Linux)File I/OFS PermissionsPermission DeniedGraceful Fallback (0.0 metrics)
CLI → libproc (macOS)C FFI (proc_pidinfo)N/AAccess RestrictedFallback to ps command
CLI → sample (macOS)Subprocess + stdout parseN/A (Process Owner)SIP/Permission DeniedWarning + continue without function samples
CLI → perf (Linux)Subprocess + perf script parseN/A (Process Owner)perf not installedWarning + fallback to process metrics
Library → SIGPROFSignal handlerN/ASignal blockedDisable sampling gracefully

3.3 Architectural Decisions & Trade-offs

Decision: Welford's Online Algorithm for Outliers

  • Rationale: We need to detect anomalies in real-time without storing every single call duration in memory.
  • Theory cited: Welford's algorithm allows computing running mean and variance in $O(1)$ time and $O(1)$ space.
  • Trade-offs: More sensitive to early-session noise; requires a "warmup" period (default 10 calls).

Decision: Indefinite duration by default (-d 0)

  • Rationale: Most backend performance issues happen during specific session events (hitting a route), not fixed time windows.
  • Theory cited: Event-driven monitoring — decoupling collection from time improves signal-to-noise ratio for servers.
  • Anti-patterns avoided: "Blind profiling" where data collection stops before the interesting event occurs.

Decision: Async per-poll wrapping for #[profile]

  • Rationale: Traditional timing on async fns measures "Wall Time" (including time spent yielded), which is useless for CPU profiling.
  • Trade-offs: Slightly higher overhead due to future wrapping; requires async-profiling feature.

Decision: Canonical ProfileSession schema for both library and CLI

  • Rationale: Previously, library and CLI emitted different JSON shapes. Unified schema simplifies the frontend and enables compare mode.
  • Trade-offs: CLI adds optional process/session fields (process_summary, process_samples, etc.) to the same schema.

4. DETAILED FLOWS

4.1 Happy Path: Binary Profiling

4.2 Indefinite Backend Monitoring

When running with -d 0 (default), the CLI monitors for Session Events.

  • Scenario: User hits /heavy-route.
  • Detection: Memory jumps 10MB.
  • Action: CLI pushes a MemoryEvent { type: "spike", location: "Memory spike: +10.0 MB" }.
  • TUI: Terminal flashes [!] SPIKE and increments EVENTS count.

4.3 Error & Recovery Flows

  • Process Not Found: If proc_pidinfo or /proc reads fail repeatedly, the CLI assumes the child has exited, flushes remaining data, and shuts down cleanly.
  • macOS Permission Denied: If sample fails due to SIP/Permissions, the tool logs a warning but continues collecting system metrics (CPU/Mem).
  • Linux perf not installed: CLI warns and falls back to process metrics only (no function hotspots).

4.4 Regression Comparison Flow


5. DATA MODEL & SCHEMA

Entity-Relationship (JSON Shape)

The output is a single ProfileSession object:

Export Format Support

FormatSchemaCompatible With
RustScope JSONProfileSessionNative UI, CI compare
pprof (protobuf)pprof.protogo tool pprof, Pyroscope, Grafana Pyroscope
Chrome TracingTrace Event Formatchrome://tracing, Perfetto UI
SpeedScopespeedscope.jsonspeedscope.app
Collapsed stacksBrendan Gregg formatflamegraph.pl, Inferno
CSVTabularExcel, pandas, data analysis

6. VISUALIZER (UI/UX)

RustScope includes a premium, web-based visualizer built with Next.js, Tailwind CSS, and D3 for deep analysis of your performance profile sessions.

6.1 Key Features

  • Instant Visualization — Simply drag and drop any rustscope-last.json to generate high-resolution flamegraphs.
  • Multi-Format Support — Native support for RustScope JSON, plus compatibility with inferno, samply, and pprof stack traces.
  • Interactive Stack Explorer — Seamlessly toggle between Flamegraphs (top-down) and Icicle Charts (bottom-up) with smooth D3 transitions.
  • Search & Filtering — Instant search for function names and intelligent filtering to isolate your crate's logic from std or allocator overhead.
  • Smart Insights — Automated heuristic analysis flags critical bottlenecks, deep recursion, and memory-heavy hot paths.
  • Session View — Process metrics over time, metadata, event log, and spike-to-hotspot correlation.
  • Project View — Crate-level and module-level rollup analysis for whole-project profiling.
  • Compare View — Load baseline and current sessions, view regression diff table with severity (Critical/Moderate/Minor), summary cards.
  • Async View — Visualize async task poll times and timing breakdown.
  • Locks View — Lock contention visualization with wait times and contention counts.
  • Zones View — Custom profiling zones/regions for targeted analysis.
  • Timeline Canvas — Time-series visualization of CPU, memory, threads, FDs, syscalls with hotspot overlays.
  • Keyboard Shortcuts — Quick tab navigation (1-7), Escape to close panels, drag-to-resize sidebar.
  • Multi-Export — Download sessions in pprof, Chrome Tracing, SpeedScope, CSV formats.

6.2 Running Locally

The visualizer is a dedicated application located in the flamegraph-profiler/ directory.

cd flamegraph-profiler npm install npm run dev # Dashboard available at http://localhost:3000

7. GETTING STARTED

7.1 Prerequisites

  • Rust: >= 1.75.0
  • OS: macOS (Intel/Apple Silicon) or Linux (x86_64).
  • macOS Tools: sample command (built-in).
  • Linux Tools: perf (optional, for function sampling).

7.2 Environment Setup

# 1. Clone & Build CLI git clone https://github.com/anurag/rustscope && cd rustscope cargo build --release -p rustscope-cli # 2. Add to PATH alias rustscope=$(pwd)/target/release/rustscope

7.3 Install from crates.io (CLI users)

cargo install rustscope-cli rustscope --help

7.4 Library Integration

Add to your Cargo.toml:

[dependencies] rustscope = "0.5.0" [features] rustscope = ["rustscope/hw-counters", "rustscope/async-profiling", "rustscope/sampling"]

Use in your code:

use rustscope::profile; #[profile] fn my_expensive_function() { // your code here } #[profile(budget_ns = 1_000_000)] fn with_budget() { // panics if exceeds 1ms (with default on_violation) }

7.5 CLI Quick Usage

# Check backend/sampler readiness rustscope --doctor # Profile a cargo binary for 5 seconds rustscope --cargo rustscope-examples --bin advanced_demo -d 5 -o rustscope-last.json # Profile an already-built binary until exit / Ctrl-C rustscope -- ./target/release/advanced_demo # Attach to a running process rustscope --pid 12345 --name api-server -d 30 -o api-profile.json # Live dashboard with verbose output rustscope -v --cargo rustscope-examples --bin stress_demo # Compare two profile sessions and fail on critical regressions rustscope --compare-baseline baseline.json --compare-current current.json --threshold 10 --fail-on critical

7.6 Environment Variables

VariableRequiredDefaultDescription
RUSTSCOPE_ALLOC_PIPENamed pipe for LD_PRELOAD tracking (Linux only, v0.4.0 planned)
VERBOSEfalseEnable library-level internal logging

8. TESTING

Test Architecture

rustscope/ ├── tests/ │ └── integration.rs # Core library logic (Macros, Outliers, Stats, Timeline) rustscope-cli/ └── src/profiler/ # Tested via demo runs (stress_demo)

Running Tests

# Core Library Tests cargo test -p rustscope # Run CLI Demo (End-to-End Verification) ./target/release/rustscope -v --cargo rustscope-examples --bin advanced_demo -d 5 # Run stress demo for realistic profiling ./target/release/rustscope -v --cargo rustscope-examples --bin stress_demo # Compare two sessions (CI regression test) ./target/release/rustscope --compare-baseline base.json --compare-current curr.json --fail-on critical

9. OPERATIONAL RUNBOOK

Scenario: Metrics report 0.0 values

  • Symptoms: JSON has 0.0 for CPU/Heap.
  • Diagnosis: Check if process exited instantly or if running on an unsupported OS.
  • Remediation: Use -v to check live terminal metrics. Ensure target binary is prefixed with ./.

Scenario: Linux CPU is 0.0 in process summary

  • Symptoms: process_summary.cpu_pct shows 0.0.
  • Diagnosis: Linux CPU collection in CLI is placeholder — use perf sampling for function hotspots.
  • Remediation: Install linux-perf and ensure perf record is available; function hotspots will still work via perf script parsing.

Scenario: JSON file too large

  • Symptoms: File > 100MB for long runs.
  • Remediation: Reduce --sample-rate (default 100Hz) to 10Hz, or use library instrumentation only.

Scenario: macOS sample fails with permission error

  • Symptoms: No function hotspots on macOS.
  • Remediation: Grant Terminal "Full Disk Access" in System Preferences → Security & Privacy, or run with -v to see warnings.

10. SECURITY MODEL

  • Auth: None. Designed for local development.
  • Secrets: CLI does not capture environment variables of the target process by default.
  • Data in Transit: All communication between CLI and Target is via local OS pipes/signals.
  • Input Validation: JSON parser validates schema version and required fields before processing.

11. PERFORMANCE & SCALABILITY

  • Throughput: Designed to handle binaries with 100,000+ function calls per second.
  • Bottlenecks: High sample rates (> 1000Hz) will introduce measurable OS interrupt overhead.
  • Scaling: The library scales horizontally with threads using AtomicU64 for metric aggregation.
  • Overhead: < 50ns per #[profile] call (when compiled with optimizations).

12. CONTRIBUTING

  • Branches: feature/* or fix/*.
  • Commits: Conventional Commits preferred.
  • PRs: Must include a test case in integration.rs for new library features.
  • CLI Changes: Must update both rustscope-cli and flamegraph-profiler if schema changes.

13. CHANGELOG & ROADMAP

v0.5.0 (Current): Unified Session Profiling, Rollups, and Regression Workflow

  • Canonical CLI Output: rustscope-cli now emits the same ProfileSession schema as the library.
  • Session Profiling: Attach mode (--pid), indefinite duration, reliable flush on shutdown.
  • Terminal Dashboard: Multi-metric live UI with refresh modes and controls.
  • Project Rollups: Crate-level and module-level hotspot aggregation.
  • Frontend Views: Session, Project, Compare, Async, Locks, Zones tabs.
  • Regression Workflow: Baseline vs current comparison with severity levels for CI gating.
  • Sampler Improvements: macOS chunked sample, Linux perf record + perf script parsing.
  • Stress Demo: Realistic workload (CPU bursts, memory spikes, thread/FD/syscall churn).

v0.6.0 (Next): Timeline Correlation & Export Polish

  • Time-Correlated Session Analysis: Click a spike → see exact time window → inspect dominating crate/module/function.
  • True Timeline/Session View: CPU/memory/FD/thread/syscall lanes with hotspot overlays and event markers.
  • Export Polish: Stabilize pprof, Chrome Tracing, SpeedScope exports with complete feature parity.
  • Linux CPU Collection: Replace placeholder with real /proc/stat or perf integration for process summary.

v1.0 (Target): Production-Ready Profiler

  • Reliable Hotspot Capture: macOS symbol extraction improvements, Linux perf symbolization maturity.
  • One Stable Schema: Freeze ProfileSession schema version, remove legacy compatibility branches.
  • CI/CD Integration: GitHub Actions / GitLab CI templates for performance regression gating.
  • Documentation: Complete API docs, tutorial guides, video walkthroughs.
  • Release Stability: Semantic versioning, deprecation policy, migration guides.

License: MIT
Acknowledgements: Inspired by perf, dtrace, pprof, flamegraph, and the parking_lot community.