Roofline Model Performance Profiler for ARM Chips
The fastest ARM-native roofline profiler for embedded firmware, ML inference, and HPC.
Platform Support Matrix
Phase 1 platforms are available now. Phase 2 Cortex-A platforms are coming in Q2 2026.
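| Chip Family | Platform | Status |
|---|---|---|
| ARM Cortex-M | STM32H5XX | Under development |
| ARM Cortex-M | STM32F4/H7 | Under development |
| ARM Cortex-M | nRF52840 | On request |
| ARM Cortex-A | Raspberry Pi 4 | Available |
| ARM Cortex-A | Raspberry Pi 5 | Available |
| ARM Cortex-A | NVIDIA Jetson Orin | On request |
| ARM Cortex-A | Ampere Altra | On request |
| ARM Cortex-A | AWS Graviton 3 | On request |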
Built for Your Workflow
Whether you're optimizing firmware, deploying ML models, or pushing HPC boundaries, our library integrates into your development cycle.
Firmware Performance
Profile crypto, sensor, and control-loop code on ARM Cortex-M and Cortex-A chips. Pinpoint cache bottlenecks in production firmware before shipping.
ML Inference
Measure real memory bandwidth and cache behavior when deploying to Jetson Orin, Ampere Altra, or AWS Graviton. Identify performance bottlenecks in one sprint.
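For readers new to bandwidth measurement, the classic way to estimate the memory ceiling of a roofline is a STREAM-style triad. The sketch below is not Sabueso's method; it assumes a POSIX system with `clock_gettime` (for example, Linux on a Raspberry Pi or Graviton instance), and the function name `triad_bandwidth_gbs` is hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: STREAM-style triad a[i] = b[i] + s * c[i].
 * Returns achieved bandwidth in GB/s (or -1.0 if allocation fails).
 * Counts 3 arrays x n doubles per pass (2 reads + 1 write) and ignores
 * write-allocate traffic, following the STREAM convention. */
double triad_bandwidth_gbs(size_t n, int reps) {
    assert(n > 0 && reps > 0);
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) { free(a); free(b); free(c); return -1.0; }
    for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    const double s = 3.0;
    double best = 1e30;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        for (size_t i = 0; i < n; i++) a[i] = b[i] + s * c[i];
        double dt = now_seconds() - t0;
        if (dt < best) best = dt;  /* keep the fastest pass */
    }
    /* Read back one element so the compiler cannot discard the loop. */
    volatile double sink = a[n / 2];
    (void)sink;
    free(a); free(b); free(c);

    double bytes = 3.0 * (double)n * sizeof(double);
    return bytes / best / 1e9;
}
```

Size the arrays several times larger than the last-level cache; otherwise the result reflects cache bandwidth rather than DRAM bandwidth.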
HPC Benchmarking
Validate scaling behavior and roofline performance across ARM architectures.
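Validating scaling behavior usually means comparing the runtime on p cores against the single-core runtime for a fixed problem size (strong scaling). The sketch below shows the two standard metrics, speedup S = T1 / Tp and parallel efficiency E = S / p. The function names are hypothetical, and the timings are assumed to come from your own measurements.

```c
#include <assert.h>

/* Strong-scaling speedup: single-core time divided by p-core time. */
double speedup(double t1, double tp) {
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency: speedup divided by core count (1.0 is ideal). */
double parallel_efficiency(double t1, double tp, int p) {
    assert(p > 0);
    return speedup(t1, tp) / (double)p;
}
```

For example, a kernel that takes 8.0 s on one core and 2.5 s on four cores has a speedup of 3.2 and an efficiency of 0.8.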
How It Works
Install
Add our library to your project dependencies.
Profile Code
Wrap the functions you want to measure with the SABUESO_START/STOP macros.
Analyze Bottlenecks
Review cycle counts, cache miss rates, and roofline plots to identify where performance is left on the table.
Optimize
A/B test optimizations, measure their impact, and iterate toward your performance goal.
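A/B comparisons are more trustworthy when each variant is run several times and summarized with a robust statistic. The sketch below compares the median runtime of a baseline (A) against a candidate (B); the median resists outliers such as a run interrupted by the OS. It is a hypothetical helper, not part of the Sabueso library, and the samples can be in any consistent unit (cycles or nanoseconds).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *x, const void *y) {
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Median of n samples; sorts a copy in scratch (capacity >= n)
 * so the caller's sample order is preserved. */
double median(const double *samples, size_t n, double *scratch) {
    assert(n > 0);
    memcpy(scratch, samples, n * sizeof *samples);
    qsort(scratch, n, sizeof *scratch, cmp_double);
    return (n % 2) ? scratch[n / 2]
                   : 0.5 * (scratch[n / 2 - 1] + scratch[n / 2]);
}

/* Speedup of candidate B over baseline A; values > 1 mean B is faster. */
double ab_speedup(const double *a, const double *b, size_t n, double *scratch) {
    double ma = median(a, n, scratch);
    double mb = median(b, n, scratch);
    assert(mb > 0.0);
    return ma / mb;
}
```

With baseline samples {120, 100, 110, 105, 300} and candidate samples {55, 50, 60, 52, 58}, the medians are 110 and 55, so the candidate is 2.0x faster despite the 300 outlier in the baseline.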
```c
#include "sabueso_arm_a76_rpi5.h"

// Initialize the board backend and the output sink
sabueso_backend_init(&ctx);
SabuesoOutputBuffer buf = ...;

// Profile a single function call
SABUESO_MEASURE(&buf, your_function(...));

// Alternatively, explicitly annotate the region to profile
SABUESO_START();
your_function(...);
SABUESO_STOP(&buf);
```
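To illustrate the general idea behind a START/STOP pair and a MEASURE wrapper, here is a hypothetical, clock-based stand-in. It is not the Sabueso implementation: `RegionTimer`, `REGION_START`, `REGION_STOP`, and `REGION_MEASURE` are illustrative names, it assumes POSIX `clock_gettime`, and it records only elapsed time, whereas the profiler described on this page also reports cycle counts and cache miss rates.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical region timer: start timestamp plus measured duration. */
typedef struct {
    struct timespec t0;
    uint64_t elapsed_ns;
} RegionTimer;

static uint64_t ts_to_ns(struct timespec ts) {
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Record the start of the region. */
#define REGION_START(t) clock_gettime(CLOCK_MONOTONIC, &(t)->t0)

/* Record the end of the region and store the elapsed nanoseconds. */
#define REGION_STOP(t)                                          \
    do {                                                        \
        struct timespec t1_;                                    \
        clock_gettime(CLOCK_MONOTONIC, &t1_);                   \
        assert(ts_to_ns(t1_) >= ts_to_ns((t)->t0));             \
        (t)->elapsed_ns = ts_to_ns(t1_) - ts_to_ns((t)->t0);    \
    } while (0)

/* Same shape as a MEASURE-style wrapper: time a single expression. */
#define REGION_MEASURE(t, expr) \
    do { REGION_START(t); (void)(expr); REGION_STOP(t); } while (0)

/* Example workload to profile. */
uint64_t sum_to(uint64_t n) {
    uint64_t s = 0;
    for (uint64_t i = 1; i <= n; i++) s += i;
    return s;
}
```

Wrapping the start and stop logic in `do { ... } while (0)` lets each macro be used as a single statement, for instance inside an unbraced `if`.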
Roofline Model Output
Sabueso generates a roofline plot showing where your code sits relative to compute and memory bandwidth ceilings.
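The roofline itself is a simple bound: at arithmetic intensity I (FLOPs per byte moved), attainable performance is P = min(P_peak, I × B_peak), where P_peak is the compute ceiling and B_peak the memory-bandwidth ceiling. Kernels left of the ridge point I = P_peak / B_peak are memory-bound; kernels right of it are compute-bound. The sketch below encodes that model; the struct and function names are hypothetical, and the ceiling values must come from measurement or vendor data.

```c
#include <assert.h>
#include <stddef.h>

/* Machine ceilings for one platform. */
typedef struct {
    double peak_gflops;  /* compute ceiling, GFLOP/s */
    double peak_gbs;     /* memory-bandwidth ceiling, GB/s */
} Roofline;

/* Attainable GFLOP/s at intensity ai: min(compute ceiling, ai * bandwidth). */
double roofline_attainable(const Roofline *m, double ai) {
    assert(m != NULL && ai >= 0.0);
    double mem_bound = ai * m->peak_gbs;
    return mem_bound < m->peak_gflops ? mem_bound : m->peak_gflops;
}

/* Ridge point: the intensity where the two ceilings meet (FLOP/byte). */
double roofline_ridge(const Roofline *m) {
    assert(m != NULL && m->peak_gbs > 0.0);
    return m->peak_gflops / m->peak_gbs;
}

/* 1 if a kernel at intensity ai is memory-bound, 0 if compute-bound. */
int roofline_is_memory_bound(const Roofline *m, double ai) {
    return ai < roofline_ridge(m);
}
```

With illustrative ceilings of 32 GFLOP/s and 8 GB/s, the ridge point is 4 FLOP/byte. A double-precision triad (2 FLOPs per 24 bytes, about 0.083 FLOP/byte) sits far left of it, so it is memory-bound and bandwidth optimizations matter more than extra arithmetic throughput.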