ELASTIC: Efficient Once For All Iterative Search for Object Detection on Microcontrollers

Tony Tran 1, Qin Lin 2,3, Bin Hu *,2,3
1 Cullen College of Engineering Research Computing, University of Houston · 2 Department of Electrical and Computer Engineering, University of Houston · 3 Department of Engineering Technology, University of Houston

Abstract

Deploying high-performance object detectors on TinyML platforms poses significant challenges due to tight hardware constraints and the modular complexity of modern detection pipelines. Neural Architecture Search (NAS) offers a path toward automation, but existing methods either restrict optimization to individual modules, sacrificing cross-module synergy, or require global searches that are computationally intractable.

We propose ELASTIC (Efficient Once For All Iterative Search for Object Detection on Microcontrollers), a unified, hardware-aware NAS framework that alternates optimization across modules (e.g., backbone, neck, and head) in a cyclic fashion. ELASTIC introduces a novel Population Passthrough mechanism in evolutionary search that retains high-quality candidates between search stages, yielding faster convergence and a final mAP gain of up to 8%, and eliminating the search instability observed when passthrough is disabled.

In a controlled comparison, ELASTIC achieves +4.74% higher mAP and 2× faster convergence than progressive NAS strategies on SVHN, and delivers a +9.09% mAP improvement on PascalVOC given the same search budget. ELASTIC reaches 72.3% mAP on the full PascalVOC dataset, outperforming MCUNet by 20.9% and TinyissimoYOLO by 16.3%. When deployed on MAX78000/MAX78002 microcontrollers, ELASTIC-derived models outperform Analog Devices’ baseline detectors, reducing energy by up to 71.6%, lowering latency by up to 2.4×, and improving mAP by up to 6.99 percentage points across multiple datasets.

Overview of ELASTIC Framework


Our method begins with a pretrained supernet and performs iterative neural architecture search by alternating optimization between the backbone and head. The Population Passthrough mechanism ensures continuity by retaining top-performing candidates across module alternations.
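
To make the loop concrete, below is a minimal, runnable Python sketch of this alternating scheme. The search spaces and the evaluate() fitness are toy stand-ins of our own (ELASTIC would instead score subnets drawn from the pretrained supernet by proxy mAP); what the sketch illustrates is that each stage mutates only the active module, and that the top candidates pass through to seed the next stage rather than restarting the population from scratch.

import random

# Toy module search spaces; in the real framework these would index
# sub-architectures of the pretrained once-for-all supernet.
SPACES = {
    "backbone": [{"depth": d, "width": w} for d in (2, 3, 4) for w in (16, 24, 32)],
    "head":     [{"layers": l, "chan": c} for l in (1, 2) for c in (32, 64, 96)],
}

def evaluate(arch):
    # Toy fitness standing in for proxy mAP measured on a validation set.
    return (arch["backbone"]["depth"] * arch["backbone"]["width"]
            + arch["head"]["layers"] * arch["head"]["chan"])

def evolve_module(population, module, pop_size=24, generations=8, n_elite=6):
    """Evolve only `module`; the other module of each candidate stays frozen."""
    while len(population) < pop_size:              # top up via random mutations
        child = dict(random.choice(population))
        child[module] = random.choice(SPACES[module])
        population.append(child)
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        elite = population[:n_elite]
        children = [dict(random.choice(elite)) for _ in range(pop_size - n_elite)]
        for child in children:                     # mutate the active module only
            child[module] = random.choice(SPACES[module])
        population = elite + children
    return sorted(population, key=evaluate, reverse=True)

def elastic_search(iterations=5, passthrough=6):
    """Cycle backbone -> head -> backbone -> ...; Population Passthrough
    seeds each stage with the previous stage's top candidates."""
    population = [{m: random.choice(s) for m, s in SPACES.items()}]
    for it in range(iterations):
        module = ("backbone", "head")[it % 2]
        population = evolve_module(population, module)[:passthrough]
    return population[0]

print(elastic_search())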

Quantitative Comparison of Search Strategies

Quantitative comparison of search strategies on SVHN and a PascalVOC subset. ELASTIC consistently achieves the highest mAP on both datasets while reducing GPU hours by 59% on SVHN relative to progressive search (12.5 vs. 30.8 hrs).
Dataset     Method           mAP      ↑mAP      Cost        MACs      Params   Latency
SVHN        Backbone-Only    75.53%   +0.18%    10.8 hrs    137.6 M   0.73 M   2.62 ms
            Head-Only        75.35%   +0.00%     4.7 hrs    475.3 M   1.01 M   2.69 ms
            Progressive      79.62%   +4.27%    30.8 hrs     78.4 M   0.52 M   2.36 ms
            ELASTIC (OURS)   80.09%   +4.74%    12.5 hrs    105.8 M   0.61 M   2.34 ms
PascalVOC   Backbone-Only    22.02%   +2.80%     9.6 hrs    436.1 M   0.58 M   2.11 ms
            Head-Only        19.22%   +0.00%     6.0 hrs    240.0 M   0.36 M   2.48 ms
            Progressive      21.74%   +2.52%    14.8 hrs    540.0 M   0.41 M   2.66 ms
            ELASTIC (OURS)   30.83%   +11.61%   14.65 hrs   642.8 M   0.57 M   2.00 ms

Notes: “MACs” denotes multiply–accumulate operations; “Cost” is the total GPU hours spent during search; “↑mAP” is the gain relative to the Head-Only baseline (+0.00%) on each dataset.

Comparison of ELASTIC Framework

Comparison of ELASTIC with TinyissimoYOLO (TY) and MCUNet on the full PascalVOC dataset, considering all object classes and counts. ELASTIC achieves a 20.9% mAP boost over MCUNet, 4.0% over MCUNetV2, and 16.3% over TinyissimoYOLO’s best-performing model, while discovering a model with significantly fewer MACs, enabling faster inference on microcontrollers.
Method           MACs    ↓MACs   Params   VOC mAP   ↑mAP
TY: 20-3-88      32 M    90.7%   0.58 M   53%       +30%
TY: 20-7-88      44 M    87.2%   0.58 M   47%       +24%
TY: 20-3-112     54 M    84.3%   0.89 M   56%       +33%
TY: 20-7-112     70 M    79.6%   0.91 M   53%       +30%
TY: 20-3-224     218 M   36.4%   3.34 M   23%       +0%
MCUNet           168 M   51.0%   1.2 M    51.4%     +28.4%
MCUNetV2-M4      172 M   49.9%   0.47 M   64.6%     +41.6%
MCUNetV2-H7      343 M   0%      0.67 M   68.3%     +45.3%
ELASTIC (OURS)   86 M    74.9%   1.36 M   72.3%     +49.3%

Notes: “MACs” denotes multiply–accumulate operations (lower is better). “↓MACs” is the reduction relative to the MCUNetV2-H7 reference (343 M, shown as 0%); “↑mAP” is the gain relative to TY: 20-3-224 (23%, shown as +0%), the weakest TinyissimoYOLO variant.
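
For concreteness, both relative columns can be reproduced from the absolute ones; a quick Python check for the ELASTIC row:

# Reproducing Table 2's relative columns for the ELASTIC row.
macs_ref = 343e6          # MCUNetV2-H7 MACs (the 0% reference for ↓MACs)
map_ref = 23.0            # TY: 20-3-224 mAP (the +0% reference for ↑mAP)
macs, map_ours = 86e6, 72.3

down_macs = (1 - macs / macs_ref) * 100   # 74.9 -> "74.9% fewer MACs"
up_map = map_ours - map_ref               # 49.3 -> "+49.3 points"
print(f"{down_macs:.1f}% fewer MACs, +{up_map:.1f} mAP points")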

Search Space Refinement & Evolution

Search space refinement through ELASTIC iteration. Distributions of 100 randomly sampled architectures from three head search spaces (joint NAS, progressive head-only, and ELASTIC-refined) on SVHN (left) and PascalVOC (right). ELASTIC produces architectures with a mean mAP improvement of +4.87% on SVHN and +0.43% on PascalVOC over progressive search. Compared to a global joint search, ELASTIC improves mean mAP by 26.07% and 23.33% on SVHN and PascalVOC, respectively.
Search Space Evolution Over Iterations. Mean accuracy and distribution of randomly sampled architectures at iterations 0, 3, and 5. On SVHN, ELASTIC continues to improve mean mAP by +3.3% while reducing variance by approximately 89.7%. On PascalVOC, mean mAP improves from 27.89% to 28.32%, with variance dropping by approximately 33%.
Mean mAP and Variance across Search Spaces. Parenthesized numbers indicate the iteration of the iterative search process. A higher mean mAP indicates better average performance, while lower variance reflects more stable and consistent architectures discovered across runs. On SVHN, the mean mAP increases from 44.61% to 70.68%, accompanied by a reduction in variance from 1.93 to 0.199. On PascalVOC, mean mAP improves from 4.99% to 28.32%, while variance decreases by over 95%.
Dataset     Search Space   Mean mAP   ↑mAP      Std. Dev.   Variance
SVHN        Joint Search   44.61%     0%        1.39        1.93
            Head Search    67.38%     +22.77%   0.37        0.136
            ELASTIC (3)    70.42%     +25.81%   0.47        0.221
            ELASTIC (5)    70.68%     +26.07%   0.45        0.199
PascalVOC   Joint Search   4.99%      0%        0.04        1.40 × 10⁻³
            Head Search    27.89%     +22.90%   0.01        9.86 × 10⁻⁵
            ELASTIC (3)    28.28%     +23.29%   0.01        7.04 × 10⁻⁵
            ELASTIC (5)    28.32%     +23.33%   0.01        6.60 × 10⁻⁵

Notes: Smaller variance values indicate greater stability and consistency in discovered architectures.
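
The statistics above are straightforward to reproduce by Monte Carlo sampling. The sketch below mimics the protocol with a Gaussian toy sampler (an assumption on our part; the real sampler would draw a subnet from the space and measure its validation mAP), and it also makes visible why the Std. Dev. and Variance columns agree, e.g. 1.39² ≈ 1.93 for the SVHN joint space.

import random
import statistics

def sample_map(mean_quality, spread):
    # Toy stand-in for: draw a random architecture from the space and
    # measure its validation mAP.
    return random.gauss(mean_quality, spread)

def space_stats(mean_quality, spread, n=100):
    scores = [sample_map(mean_quality, spread) for _ in range(n)]
    return statistics.mean(scores), statistics.pvariance(scores)

# Parameters taken from the SVHN rows above (mean mAP, std. dev.):
print(space_stats(44.61, 1.39))   # ~ Joint Search: low mean, high spread
print(space_stats(70.68, 0.45))   # ~ ELASTIC (5): tight, high-quality space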

Performance & Efficiency on MAX78000/MAX78002

Comparison of performance and efficiency metrics on the MAX78000/MAX78002 platforms. ELASTIC models achieve higher mAP, with gains of up to 6.99 percentage points, while reducing energy consumption by up to 3.5× and power usage by up to 1.7× compared to baseline models. On PascalVOC, ELASTIC also reduces latency by 2.4×.
Model                    Dataset     Device     Params   Energy (μJ)   Latency (ms)   Power (mW)   mAP (%)
ai87-fpndetector         PascalVOC   MAX78002   2.18 M   62001         122.6          445.76       50.66
ELASTIC (OURS)           PascalVOC   MAX78002   1.32 M   17581         51.1           285.02       57.65
ai85net-tinierssd-face   VGGFace2    MAX78000   0.28 M   1712          43.4           29.90        84.72
ELASTIC (OURS)           VGGFace2    MAX78000   0.22 M   1368          45.6           20.90        87.10
ai85net-tinierssd        SVHN        MAX78000   0.19 M   573           14.0           29.20        83.60
ELASTIC (OURS)           SVHN        MAX78000   0.22 M   341           13.0           16.70        88.10
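
The headline efficiency ratios quoted in the caption and the abstract follow directly from the PascalVOC rows above; a quick Python check:

# Efficiency ratios from the PascalVOC rows (MAX78002).
base_energy, base_latency = 62001, 122.6   # ai87-fpndetector
ours_energy, ours_latency = 17581, 51.1    # ELASTIC (OURS)

print(f"energy reduction: {1 - ours_energy / base_energy:.1%}")   # 71.6%
print(f"energy ratio:     {base_energy / ours_energy:.1f}x")      # 3.5x
print(f"latency speedup:  {base_latency / ours_latency:.1f}x")    # 2.4x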
Comparison of ELASTIC-derived models vs. scaled ai85net-tinierssd baselines. Marker size reflects the number of FLOPs, which is positively correlated with energy. ELASTIC-derived models reduce FLOPs by up to 66% while gaining up to 5.7% accuracy.
ELASTIC deployment on the MAX78000 using the SVHN dataset. ELASTIC produces compact, high-performing architectures suitable for ultra-low-power deployment.

Detection Demo on MAX78000

SVHN Demo on MAX78000. On-device digit detection example.
VGGFace2 Demo on MAX78000. On-device face detection example.

BibTeX

@misc{tran2025elasticefficientiterativesearch,
      title={ELASTIC: Efficient Once For All Iterative Search for Object Detection on Microcontrollers}, 
      author={Tony Tran and Qin Lin and Bin Hu},
      year={2025},
      eprint={2503.21999},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.21999},
}