Roofline Model Performance Profiler for ARM Chips

The fastest ARM-native roofline profiler. For embedded firmware, ML inference, and HPC.

Book a Demo

Platform Support Matrix

Phase 1 platforms available now. Phase 2 Cortex-A platforms coming Q2 2026.

Chip Family     Platform             Status
ARM Cortex-M    STM32H5xx            Under development
                STM32F4/H7           Under development
                nRF52840             On request
ARM Cortex-A    Raspberry Pi 4       Available
                Raspberry Pi 5       Available
                NVIDIA Jetson Orin   On request
ARM Neoverse    Ampere Altra         On request
                AWS Graviton 3       On request

Built for Your Workflow

Whether you're optimizing firmware, deploying ML models, or pushing HPC boundaries, our library integrates into your development cycle.

Firmware Performance

Profile crypto, sensor, and control-loop code on ARM Cortex-M and Cortex-A chips. Pinpoint cache bottlenecks in production firmware before shipping.

ML Inference

Measure achieved memory bandwidth and cache behavior when deploying models to Jetson Orin, Ampere Altra, or AWS Graviton. Identify performance bottlenecks in one sprint.

HPC Benchmarking

Validate scaling behavior and roofline performance across ARM architectures.
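For HPC users, one common way to quantify scaling behavior is strong-scaling speedup and parallel efficiency: run the same problem on 1 and p cores and compare wall-clock times. The plain-C sketch below is independent of the library; the names strong_scaling and ScalingPoint are hypothetical, and it assumes you have already measured the times.

```c
#include <assert.h>

/* Strong-scaling metrics for one core count p (hypothetical helper). */
typedef struct {
    double speedup;    /* S(p) = t1 / tp */
    double efficiency; /* E(p) = S(p) / p; 1.0 means perfectly linear scaling */
} ScalingPoint;

/* t1: wall-clock time on 1 core, tp: time on p cores, same problem size. */
static ScalingPoint strong_scaling(double t1, double tp, unsigned p)
{
    assert(t1 > 0.0 && tp > 0.0 && p > 0);
    ScalingPoint sp;
    sp.speedup = t1 / tp;
    sp.efficiency = sp.speedup / (double)p;
    return sp;
}
```

For example, a job that takes 8.0 s on one core and 2.5 s on four cores has a speedup of 3.2 and an efficiency of 0.8.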

How It Works

1. Install: Add our library to your project dependencies.

2. Profile code: Wrap the code you want to measure with the SABUESO_START/SABUESO_STOP macros, or pass a single call to SABUESO_MEASURE.

3. Analyze bottlenecks: Review cycle counts, cache miss rates, and roofline plots to identify where performance is left on the table.

4. Optimize: A/B test optimizations, measure their impact, and iterate toward your performance goal.
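Single measurements are noisy (interrupts, frequency scaling, cold caches), so a common way to run the A/B step is to repeat each variant several times and compare medians. The sketch below assumes you have already collected per-run cycle counts for a baseline (A) and a candidate (B); how those counts are read out of Sabueso is not shown, and median_cycles and ab_speedup are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t cycle counts. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n cycle samples. Note: sorts the caller's array in place. */
static double median_cycles(uint64_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    if (n % 2 == 1)
        return (double)samples[n / 2];
    return ((double)samples[n / 2 - 1] + (double)samples[n / 2]) / 2.0;
}

/* Speedup of candidate B over baseline A: a value > 1.0 means B is faster. */
static double ab_speedup(uint64_t *a, size_t na, uint64_t *b, size_t nb)
{
    return median_cycles(a, na) / median_cycles(b, nb);
}
```

Medians are less sensitive than means to the occasional outlier run, which makes small improvements easier to tell apart from noise.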

// Platform header: Cortex-A76 on Raspberry Pi 5
#include "sabueso_arm_a76_rpi5.h"

// Initialize the board backend (declaration of ctx omitted here)
// and set up the output sink that receives the results
sabueso_backend_init(&ctx);
SabuesoOutputBuffer buf = ...;

// Profile a single call to the given function
SABUESO_MEASURE(&buf, your_function(...));

// Alternatively, explicitly annotate the region to profile
SABUESO_START();
your_function(...);
SABUESO_STOP(&buf);
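Conceptually, a START/STOP pair records a counter before and after the region and stores the difference. The portable sketch below illustrates only that idea; REGION_START, REGION_STOP, now_ns, and time_sum are hypothetical names, not part of Sabueso, and because the sketch uses the POSIX monotonic clock it reports wall-clock nanoseconds rather than the cycle counts and cache statistics the library collects.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds; returns 0 if the clock is unavailable. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical START/STOP pair (NOT Sabueso's macros): START records a
 * timestamp in a local variable, STOP writes the elapsed nanoseconds to
 * *out_ns. Use at most one pair per scope, since START declares a variable. */
#define REGION_START() const uint64_t region_t0_ = now_ns()
#define REGION_STOP(out_ns) (*(out_ns) = now_ns() - region_t0_)

/* Written through a volatile so the compiler cannot delete the loop. */
static volatile uint64_t sink;

/* Example target region: sum an array and return how long it took. */
static uint64_t time_sum(const uint32_t *data, size_t n)
{
    uint64_t elapsed_ns = 0;
    assert(data != NULL || n == 0);
    REGION_START();
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    sink = acc;
    REGION_STOP(&elapsed_ns);
    return elapsed_ns;
}
```

A hardware-level profiler would typically read performance-monitoring counters instead of a clock and repeat the region several times to reduce noise.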

Roofline Model Output

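Sabueso generates a roofline plot showing where your code sits relative to the compute ceiling (peak operations per second) and the memory-bandwidth ceiling, based on its arithmetic intensity (operations per byte of memory traffic).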
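The roofline model bounds attainable performance by min(peak compute, arithmetic intensity × peak memory bandwidth), where arithmetic intensity (AI) is FLOPs (or other operations) executed per byte moved to or from memory. The sketch below computes that bound and the ridge point where the two ceilings meet; the struct and function names are hypothetical, and the peak values you pass in would come from a datasheet or from measurement on your board.

```c
#include <assert.h>

/* Machine ceilings for a roofline model (hypothetical struct). */
typedef struct {
    double peak_flops; /* compute ceiling, FLOP/s */
    double peak_bw;    /* memory-bandwidth ceiling, bytes/s */
} Roofline;

/* AI = FLOPs executed / bytes moved, in FLOP/byte. */
static double arithmetic_intensity(double flops, double bytes)
{
    assert(bytes > 0.0);
    return flops / bytes;
}

/* Attainable FLOP/s = min(peak compute, AI * peak bandwidth). */
static double attainable_flops(const Roofline *r, double ai)
{
    double mem_bound = ai * r->peak_bw;
    return mem_bound < r->peak_flops ? mem_bound : r->peak_flops;
}

/* Ridge point: the AI at which the bandwidth slope meets the compute roof. */
static double ridge_point(const Roofline *r)
{
    assert(r->peak_bw > 0.0);
    return r->peak_flops / r->peak_bw;
}

/* Kernels left of the ridge point are memory-bound; right of it, compute-bound. */
static int is_memory_bound(const Roofline *r, double ai)
{
    return ai < ridge_point(r);
}
```

For example, a double-precision AXPY (y[i] = a*x[i] + y[i]) performs 2 FLOPs per 24 bytes of memory traffic when nothing is reused from cache and write-allocate traffic is ignored, an AI of about 0.083 FLOP/byte, so on most modern CPUs it sits well left of the ridge point and is memory-bound.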

[Figure: Roofline model output]
