Fluxcore

embedded edge-AI · runtime + compiler + board

Solder an AI model
onto a microcontroller.

Fluxcore takes a trained model and lowers it to bare-metal C that runs on parts you can hold under a loupe: Cortex-M0+, RISC-V, ESP32. No Linux, no accelerator, no cloud round-trip. Kilobytes of RAM. Milliwatts of power. Inference in microseconds for keyword and sensor models.

  • 11 KB peak SRAM, keyword model
  • 312 µs inference @ 64 MHz M4
  • 0.9 mW avg, duty-cycled
  • 0 deps: no RTOS, no malloc
1k0 pull-up · joint reflowed at 280°C · probe: 3.301 V
// runtime

A compiler that thinks in bytes, not GPUs.

Hand it a quantized model; get back a static .a and a header. Fluxcore plans every tensor into a fixed arena at build time — no allocator runs on the device, ever.

flux@bench — kws.tflite
$ flux compile kws.tflite --target cortex-m4f --arena 16k
 parsing graph ............ 14 ops, 9 tensors
 quantize ................. int8, per-channel
 planning arena ........... 11.0 KB / 16 KB ok
 emitting kws_model.c ..... 41 KB flash
 emitting kws_model.h
 built in 0.42s — 312 µs / inference @ 64 MHz

$ flux flash --board flx-01 --port /dev/ttyACM0
 erasing ... writing 41216 B ... verified
 running. say "fluxcore" to wake.

Static arena planner

Every buffer is sized and placed at compile time. Zero heap, zero fragmentation, deterministic worst-case RAM you can print on the datasheet.
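
To make the idea concrete, here is a minimal sketch of build-time arena planning. It is not Fluxcore's actual planner: the `tensor_plan` struct, the `plan_arena` function, and the greedy first-fit strategy are all assumptions chosen for illustration. Each tensor gets an offset such that two tensors share bytes only when their lifetimes never overlap, and the peak is the number you would compare against `--arena`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: size plus the first and last op that touch the tensor. */
typedef struct {
    uint32_t size;      /* bytes */
    uint16_t first_op;  /* index of the op that produces the tensor */
    uint16_t last_op;   /* index of the last op that reads it */
    uint32_t offset;    /* arena offset, filled in by plan_arena() */
} tensor_plan;

/* Two tensors are live at the same time if their op ranges intersect. */
static int lifetimes_overlap(const tensor_plan *a, const tensor_plan *b) {
    return a->first_op <= b->last_op && b->first_op <= a->last_op;
}

/* Greedy first-fit in array order. Returns the peak arena size in bytes. */
uint32_t plan_arena(tensor_plan *t, size_t n) {
    uint32_t peak = 0;
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t offset = 0;
        int moved;
        do {
            moved = 0;
            for (size_t j = 0; j < i; j++) {
                int bytes_clash = offset < t[j].offset + t[j].size &&
                                  t[j].offset < offset + t[i].size;
                if (bytes_clash && lifetimes_overlap(&t[i], &t[j])) {
                    /* Slide past the conflicting tensor and re-check everything. */
                    offset = t[j].offset + t[j].size;
                    moved = 1;
                }
            }
        } while (moved);
        t[i].offset = offset;
        if (offset + t[i].size > peak) {
            peak = offset + t[i].size;
        }
    }
    return peak;
}
```

For a three-tensor chain of 4096, 2048, and 4096 bytes where only neighbours are live together, this sketch places the third tensor back at offset 0 and reports a 6144-byte peak instead of the 10240-byte sum. A production planner would likely sort by size or search for a tighter packing; first-fit is used here only because it is short.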

int8 / per-channel quant

Post-training or QAT. Bring a .tflite, ONNX, or our own .flx graph. Fixed-point kernels hand-tuned for CMSIS-NN and the RISC-V P-ext.
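
As an illustration of what per-channel int8 means, the sketch below quantizes a small weight matrix with one symmetric scale per output channel. It is an assumption-laden example, not Fluxcore's quantizer: the function names are hypothetical, and the scheme shown (largest magnitude in a channel maps to 127, zero point fixed at 0) is the common symmetric convention rather than anything the toolchain is confirmed to use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric scale for one channel: the largest |w| maps to 127. */
float channel_scale(const float *w, size_t n) {
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = w[i] < 0.0f ? -w[i] : w[i];
        if (a > max_abs) {
            max_abs = a;
        }
    }
    /* An all-zero channel gets scale 1 so the division below stays defined. */
    return max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
}

/* Round to nearest (ties away from zero), then clamp to [-127, 127]. */
int8_t quantize_int8(float x, float scale) {
    assert(scale > 0.0f);
    float r = x / scale;
    int32_t q = (int32_t)(r >= 0.0f ? r + 0.5f : r - 0.5f);
    if (q > 127) q = 127;
    if (q < -127) q = -127;
    return (int8_t)q;
}

/* w is laid out as [channels][per_channel]; q and scales are caller-provided. */
void quantize_per_channel(const float *w, size_t channels, size_t per_channel,
                          int8_t *q, float *scales) {
    for (size_t c = 0; c < channels; c++) {
        const float *row = w + c * per_channel;
        scales[c] = channel_scale(row, per_channel);
        for (size_t i = 0; i < per_channel; i++) {
            q[c * per_channel + i] = quantize_int8(row[i], scales[c]);
        }
    }
}
```

The benefit shows up when channels differ in magnitude. With weights {1.0, -0.25} in one channel and {0.125, 0.03125} in another, each channel uses the full int8 range. A single per-tensor scale of 1/127 would instead squeeze 0.03125 down to the value 4, leaving the small channel with only a handful of usable levels.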

One header to ship

#include "model.h", call flx_invoke(). No RTOS dependency, MISRA-clean output, reproducible builds keyed by graph hash.
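
A rough sketch of what the application side might look like. Only the name flx_invoke() comes from the text above; its signature, the buffer-length macros, and the return convention are assumptions, and the stub body exists only so the example compiles on a host without the generated model. Check the emitted header for the real prototypes.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the generated header. ASSUMED names and signature. */
#define MODEL_INPUT_LEN  4
#define MODEL_OUTPUT_LEN 2
int flx_invoke(const int8_t *input, int8_t *output);

/* Stand-in for the generated model code: a fake two-class "model" that
 * reports class 0 when the inputs sum to a positive value. */
int flx_invoke(const int8_t *input, int8_t *output) {
    int32_t sum = 0;
    for (int i = 0; i < MODEL_INPUT_LEN; i++) {
        sum += input[i];
    }
    output[0] = (int8_t)(sum > 0 ? 127 : -128);
    output[1] = (int8_t)(sum > 0 ? -128 : 127);
    return 0; /* assumed: 0 means success */
}

/* Application code: static buffers, one call, no heap. */
static int8_t model_input[MODEL_INPUT_LEN];
static int8_t model_output[MODEL_OUTPUT_LEN];

/* Returns 1 if class 0 wins, 0 if not, -1 if the invoke call fails. */
int detect_keyword(const int8_t *features) {
    assert(features != 0);
    for (int i = 0; i < MODEL_INPUT_LEN; i++) {
        model_input[i] = features[i];
    }
    if (flx_invoke(model_input, model_output) != 0) {
        return -1;
    }
    return model_output[0] > model_output[1];
}
```

The point of the shape is that everything is statically allocated: the input and output buffers live in .bss, and the call itself needs no scheduler or allocator behind it.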

// supported silicon

If it has a few kilobytes, it can think.

Forty-plus parts across three ISAs in the support matrix. Kernels fall back to portable C so an unlisted part still runs — just slower.

MCU         Core          SRAM     Clock     KWS latency   Accel
STM32G0B1   Cortex-M0+    144 KB   64 MHz    1.9 ms
STM32U585   Cortex-M33    786 KB   160 MHz   148 µs        DSP
nRF52840    Cortex-M4F    256 KB   64 MHz    312 µs        DSP
ESP32-S3    Xtensa LX7    512 KB   240 MHz   96 µs         SIMD
CH32V307    RISC-V RV32   64 KB    144 MHz   540 µs        P-ext
RP2040      dual M0+      264 KB   133 MHz   880 µs        PIO
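
Latencies in the matrix are measured at different clocks, so it can help to convert them to cycle counts before comparing cores. The helper below is a small illustrative sketch (the function name is ours, not part of the toolchain) and relies only on the fact that 1 MHz is one cycle per microsecond.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per inference = latency in µs × clock in MHz. */
uint32_t cycles_per_inference(uint32_t latency_us, uint32_t clock_mhz) {
    assert(clock_mhz > 0);
    return latency_us * clock_mhz;
}
```

Plugging in the table: the nRF52840 spends 312 × 64 = 19,968 cycles, the ESP32-S3 spends 96 × 240 = 23,040, and the STM32G0B1 spends 1900 × 64 = 121,600. By this measure the ESP32-S3's lead in wall-clock latency comes mostly from its faster clock, while the M0+ gap reflects the missing DSP instructions.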

vision: person-detect 96×96

  • RAM 64 KB
  • Flash 213 KB
  • Latency 41 ms @ M4F

audio: keyword spotting

  • RAM 11 KB
  • Flash 41 KB
  • Latency 312 µs

sensor: fall-detect IMU

  • RAM 3.2 KB
  • Flash 9 KB
  • Latency 28 µs

industrial: bearing anomaly

  • RAM 7.5 KB
  • Flash 22 KB
  • Latency 140 µs

// hardware

FLX-01 — the bench board.

An open-hardware dev board built to be probed. Castellated edges, every rail on a labelled test point, a mic + IMU + camera header so you can flash a model and watch it fire in one sitting.

  • MCU: nRF52840 · Cortex-M4F @ 64 MHz · 1 MB flash / 256 KB RAM
  • Sensors: PDM mic · 6-axis IMU · 0.3 MP camera header
  • Power: USB-C or LiPo · 0.9 mW duty-cycled inference
  • I/O: 23 castellated GPIO · SWD · 4 labelled test points
  • On board: RGB status LED · user button · 4 MB QSPI flash
  • License: CERN-OHL-S · KiCad sources in the repo
// the bench

Shipped from the workbench.

// from the field

Makers and plant engineers, same toolchain.

“I had wake-word running on a coin cell over a weekend. The arena planner told me exactly how much RAM I had left — no guessing, no crashes at 3am.”

— Priya N., hardware hacker, solders-on-sundays.dev

“We retrofitted 400 motors with bearing-anomaly detection on M0+ parts we already stocked. No gateway, no cloud bill. Latency is 140 µs and the audit liked the determinism.”

— Marcus T., reliability lead, Kessler Drives GmbH

“The FLX-01 is the first dev board I didn't want to hide in a project box. Test points everywhere. It belongs on the bench, under the loupe.”