Hardware Lookup Table (LUT)

A Hardware LUT is a pre-built cost model for a specific MCU target. It is built from a profiling phase that evaluates random architectures on real hardware, and then replaces real MCU evaluation during NAS.

Motivation

Real MCU evaluation (flash + measure inference time + measure energy) takes 30–60 seconds per architecture. In a 30-generation NAS run with 50 individuals per generation, this adds up to ~12–25 hours of hardware time.
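
The estimate can be reproduced with a few lines of arithmetic. This is only a quick check using the figures quoted above; the variable names are illustrative.

```python
# Back-of-the-envelope hardware time for one NAS run, using the figures quoted above.
generations = 30
individuals_per_generation = 50
seconds_per_eval = (30, 60)  # lower and upper bound per architecture

evaluations = generations * individuals_per_generation
hours = tuple(evaluations * s / 3600 for s in seconds_per_eval)

print(f"{evaluations} evaluations -> {hours[0]:.1f} to {hours[1]:.1f} hours")
# 1500 evaluations -> 12.5 to 25.0 hours
```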

The Hardware LUT addresses this in four steps:

  1. Profile once: Evaluate N random architectures on the target board.
  2. Build LUT: Train a cost model from the profiling data.
  3. Run NAS: Use the LUT instead of real hardware, so no board is needed.
  4. Reuse: The same LUT works for any future NAS run targeting the same board + search space.

Difference from the online hardware surrogate

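|                            | Online Surrogate (surrogate_hardware)         | Hardware LUT                            |
|----------------------------|-----------------------------------------------|-----------------------------------------|
| When                       | Learns during NAS run                         | Dedicated profiling phase before NAS    |
| Coverage                   | Biased by NAS selection pressure              | Uniform random sampling of search space |
| Hardware needed during NAS | Yes (exploration set always evaluated on MCU) | No                                      |
| Reusable                   | Coupled to one run                            | Portable across runs                    |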

Prediction Modes

Full-architecture mode (full)

One Random Forest per metric (energy, inference_time, ROM) trained on the full 544-dimensional architecture encoding.

  • Captures cross-layer interactions (memory layout, caching effects).
  • Works best when profiling data covers the search space well.

Layer-wise mode (layerwise)

Per-layer-type regressors where total cost = sum of per-layer costs.

  • Better generalization to unseen architecture structures.
  • Interpretable: shows which layer types dominate the cost.
  • Use predict_breakdown() to get per-layer cost contributions.
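
As a rough illustration of the interpretability point, a breakdown for one metric can be aggregated by layer type to see which type dominates. The sketch assumes breakdown keys follow an `<index>_<layer type>` pattern; the numbers and the helper `cost_by_layer_type` are made up for this example and are not part of the library.

```python
from collections import defaultdict

# Hypothetical breakdown for one metric; keys assumed to be "<index>_<layer type>".
energy_breakdown = {
    "0_STFT_2D": 5.1,
    "1_C_2D_BLOCK": 12.3,
    "2_C_2D_BLOCK": 9.4,
    "3_D": 1.2,
}


def cost_by_layer_type(breakdown):
    """Sum per-layer costs over all layers that share a layer type."""
    totals = defaultdict(float)
    for key, cost in breakdown.items():
        _, layer_type = key.split("_", 1)  # drop the leading slot index
        totals[layer_type] += cost
    return dict(totals)


by_type = cost_by_layer_type(energy_breakdown)
dominant = max(by_type, key=by_type.get)
print(dominant)  # C_2D_BLOCK
```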

Training approach:

  1. Extract per-layer features from the encoding's slot-based structure.
  2. Train one Random Forest per layer type (C_2D_BLOCK, DC_2D_BLOCK, D, etc.) per metric.
  3. Calibrate with a global scaling factor so sum of layer predictions matches measured totals.
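
The calibration step can be sketched as follows. How the scaling factor is actually fitted is not specified here, so this example assumes the simplest choice, the ratio of summed measurements to summed predictions. All numbers and function names are hypothetical.

```python
# Hypothetical per-layer predictions for two profiled architectures (made-up values).
layer_preds = [
    {"0_STFT_2D": 5.0, "1_C_2D_BLOCK": 12.0, "2_D": 3.0},
    {"0_STFT_2D": 5.0, "1_DC_2D_BLOCK": 8.0, "2_D": 2.0},
]
measured_totals = [22.0, 16.5]  # made-up measured totals for the same architectures


def global_scale(preds, totals):
    """One factor that makes the summed predictions match the summed measurements.

    Assumption: ratio of totals; a least-squares fit would be another option.
    """
    predicted = sum(sum(p.values()) for p in preds)
    return sum(totals) / predicted


def predict_total(layers, scale):
    """Total cost = calibrated sum of per-layer costs."""
    return scale * sum(layers.values())


scale = global_scale(layer_preds, measured_totals)
for layers, measured in zip(layer_preds, measured_totals):
    print(f"predicted {predict_total(layers, scale):.2f}, measured {measured:.2f}")
```

A single global factor corrects a systematic offset between the summed layer costs and the measured totals, but it cannot fix errors in the relative cost of individual layer types.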

Workflow

1. Profile

Run the profiling script from Docker (requires MCU hardware):

docker run -it --rm --privileged --gpus all \
    -v /path/to/EdgeVolution:/EdgeVolution edgevolution-embedded \
    python3 tools/profile_hardware.py \
        +hyperparameters=speech_commands +search_space=speech_commands \
        +boards=nrf52840dk \
        hardware_profile.n_samples=200 \
        hardware_profile.output=hardware_luts/nrf52840dk/ \
        hardware_profile.mode=full

This will:

  1. Generate 200 random architectures.
  2. Evaluate them through the full pipeline (translate, train, flash, measure).
  3. Build the LUT and report cross-validation R²/MAE per metric.
  4. Save the LUT to the specified output directory.
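
For readers unfamiliar with the two reported scores, the standard definitions of R² and MAE are sketched below on made-up numbers. How the script splits folds and collects held-out predictions is not shown here; the helper names are illustrative only.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual variance over total variance."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured vs. predicted values for one metric on held-out architectures.
measured = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]

print(f"R2={r2(measured, predicted):.3f}, MAE={mae(measured, predicted):.2f}")
# R2=0.968, MAE=1.00
```

MAE is in the units of the metric itself, so it is the easier of the two to compare against a hardware budget.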

2. Run NAS with LUT

No hardware needed:

python main.py \
    +hyperparameters=speech_commands +search_space=speech_commands \
    +boards=nrf52840dk search_strategy=pymoo \
    hardware_lut.enabled.value=true \
    hardware_lut.path.value=hardware_luts/nrf52840dk/

Results will contain hw_lut_predicted: true in each individual's results.json.
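
To confirm that a run really used the LUT, the flag can be checked across all result files. This is a minimal sketch that assumes `hw_lut_predicted` is a top-level key in each `results.json`; the directory names in the demo and the helper `lut_predicted_fraction` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def lut_predicted_fraction(run_dir: Path) -> float:
    """Fraction of results.json files under run_dir with hw_lut_predicted set to true."""
    files = sorted(run_dir.rglob("results.json"))
    if not files:
        raise FileNotFoundError(f"no results.json found under {run_dir}")
    flagged = sum(
        1 for f in files if json.loads(f.read_text()).get("hw_lut_predicted") is True
    )
    return flagged / len(files)


# Demo on a throwaway directory; the folder names are made up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("individual_0", "individual_1"):
        d = Path(tmp) / name
        d.mkdir()
        (d / "results.json").write_text(json.dumps({"hw_lut_predicted": True}))
    fraction = lut_predicted_fraction(Path(tmp))

print(fraction)  # 1.0
```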

3. Reuse

The same LUT works for any future NAS run targeting the same board and search space combination.

Configuration

In conf/config.yaml:

hardware_lut:
  enabled:
    value: false
  path:
    value: null

hardware_profile:
  n_samples:
    value: 200
  output:
    value: null
  mode:
    value: full   # 'full' or 'layerwise'

LUT Directory Structure

hardware_luts/nrf52840dk/
  lut_metadata.json       # board info, sample count, mode
  registry_info.json      # encoding schema
  energy/                 # per-metric model
    metadata.json         # (full mode: SurrogateModel format)
    model.joblib
    training_data.npz
  inference_time/
    ...
  rom/
    ...
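
Before pointing a NAS run at a LUT directory, it can help to check that the top-level entries listed above are present. The sketch below only checks those top-level names and makes no assumption about the contents of the per-metric folders; `missing_entries` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

EXPECTED_FILES = ("lut_metadata.json", "registry_info.json")
EXPECTED_DIRS = ("energy", "inference_time", "rom")


def missing_entries(lut_dir: Path) -> list:
    """Return the expected top-level files and folders that are absent from lut_dir."""
    missing = [name for name in EXPECTED_FILES if not (lut_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (lut_dir / name).is_dir()]
    return missing


# Demo on a deliberately incomplete throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lut_metadata.json").write_text("{}")
    (root / "energy").mkdir()
    result = missing_entries(root)

print(result)  # ['registry_info.json', 'inference_time/', 'rom/']
```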

API

from neural_architecture_search.src.hardware_lut import HardwareLUT

# Build from profiling results
lut = HardwareLUT.build_from_results(
    results_dir, board_snr, registry, mode="full"
)

# Predict
pred = lut.predict(chromosome, encoding)
# {'energy': (35.2, 2.1), 'inference_time': (2816.0, 150.3), 'rom': (489000.0, 12000.5)}

# Layer-wise breakdown (layerwise mode only)
breakdown = lut.predict_breakdown(chromosome, encoding)
# {'energy': {'0_STFT_2D': 5.1, '1_C_2D_BLOCK': 12.3, ...}}

# Cross-validation
cv = lut.cross_validate(results_dir, board_snr, registry, n_folds=5)
# {'energy': {'r2': 0.92, 'mae': 2.1}, ...}

# Save / Load
lut.save("hardware_luts/nrf52840dk/")
lut = HardwareLUT.load("hardware_luts/nrf52840dk/")
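
One way to consume the prediction dictionary is a conservative budget check. The sketch assumes each tuple is (predicted value, uncertainty estimate), which this page does not state explicitly. The budgets, their units, and the helper `within_budget` are hypothetical.

```python
# Example prediction in the shape shown above: metric -> (value, second number).
# Assumption: the second number is an uncertainty estimate such as a standard deviation.
pred = {
    "energy": (35.2, 2.1),
    "inference_time": (2816.0, 150.3),
    "rom": (489000.0, 12000.5),
}

# Hypothetical budgets, in whatever units the LUT reports.
budgets = {"energy": 40.0, "inference_time": 3000.0, "rom": 500000.0}


def within_budget(pred, budgets, k=1.0):
    """Per metric: True if value + k * uncertainty stays within the budget."""
    return {m: pred[m][0] + k * pred[m][1] <= limit for m, limit in budgets.items()}


print(within_budget(pred, budgets))
# {'energy': True, 'inference_time': True, 'rom': False}
```

With `k=0` the check uses the point prediction only; a larger `k` rejects architectures whose prediction sits close to the limit.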