Hardware Lookup Table (LUT)

A Hardware LUT is a pre-built cost model for a specific MCU target. It is built in a profiling phase that evaluates random architectures on real hardware, and it then replaces real MCU evaluation during neural architecture search (NAS).

Motivation

Real MCU evaluation (flash, measure inference time, measure energy) takes 30–60 seconds per architecture. A 30-generation NAS run with 50 individuals per generation needs 1,500 evaluations, which adds up to roughly 12.5–25 hours of hardware time.

The Hardware LUT addresses this in four steps:

1. Profile once: Evaluate N random architectures on the target board.
2. Build LUT: Train a cost model from the profiling data.
3. Run NAS: Use the LUT instead of real hardware, so no board is needed.
4. Reuse: The same LUT works for any future NAS run targeting the same board and search space.

Difference from the online hardware surrogate

	Online Surrogate (`surrogate_hardware`)	Hardware LUT
When	Learns during NAS run	Dedicated profiling phase before NAS
Coverage	Biased by NAS selection pressure	Uniform random sampling of search space
Hardware needed during NAS	Yes (exploration set always evaluated on MCU)	No
Reusable	Coupled to one run	Portable across runs

Prediction Modes

Full-architecture mode (`full`)

One Random Forest is trained per metric (energy, inference_time, rom) on the full 544-dimensional architecture encoding.

Captures cross-layer interactions (memory layout, caching effects).
Works best when profiling data covers the search space well.

Layer-wise mode (`layerwise`)

Per-layer-type regressors where total cost = sum of per-layer costs.

Better generalization to unseen architecture structures.
Interpretable: shows which layer types dominate the cost.
Use `predict_breakdown()` to get per-layer cost contributions.

Training approach:

1. Extract per-layer features from the encoding's slot-based structure.
2. Train one Random Forest per layer type (C_2D_BLOCK, DC_2D_BLOCK, D, etc.) per metric.
3. Calibrate with a global scaling factor so that the sum of layer predictions matches the measured totals.
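The additive model and the calibration step can be illustrated with a small standard-library sketch. The per-layer predictions are hard-coded stand-ins for the per-layer-type Random Forests, all numbers and the `2_D`-style keys are made up, and the least-squares formula for the scaling factor is an assumption (the text above only says that a global scaling factor is used).

```python
# Sketch of the layer-wise idea: total cost = scale * sum(per-layer costs).
# Per-layer predictions are hard-coded here; in the LUT they would come from
# the per-layer-type regressors. All values are illustrative.

def uncalibrated_total(layer_costs):
    """Sum the predicted cost of every layer in one architecture."""
    return sum(layer_costs.values())


def fit_global_scale(predicted_totals, measured_totals):
    """Scale s minimizing sum((s * p - m) ** 2) over the profiled samples.

    Assumption: a least-squares fit through the origin. The real calibration
    rule may differ.
    """
    num = sum(p * m for p, m in zip(predicted_totals, measured_totals))
    den = sum(p * p for p in predicted_totals)
    return num / den


# Hypothetical per-layer energy predictions for three profiled architectures.
profiled = [
    {"0_STFT_2D": 5.0, "1_C_2D_BLOCK": 10.0, "2_D": 1.0},
    {"0_STFT_2D": 5.0, "1_DC_2D_BLOCK": 6.0, "2_D": 1.0},
    {"0_STFT_2D": 5.0, "1_C_2D_BLOCK": 10.0, "2_C_2D_BLOCK": 12.0, "3_D": 1.0},
]
# Hypothetical measured totals, chosen to sit 10% above the per-layer sums.
measured = [17.6, 13.2, 30.8]

sums = [uncalibrated_total(arch) for arch in profiled]
scale = fit_global_scale(sums, measured)
calibrated = [scale * s for s in sums]

print(f"global scale = {scale:.3f}")
```

A single scalar keeps the per-layer breakdown interpretable: every layer's contribution is multiplied by the same factor, so the ranking of layers does not change.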

Workflow

1. Profile

Run the profiling script from Docker (requires MCU hardware):

docker run -it --rm --privileged --gpus all \
    -v /path/to/EdgeVolution:/EdgeVolution edgevolution-embedded \
    python3 tools/profile_hardware.py \
        +hyperparameters=speech_commands +search_space=speech_commands \
        +boards=nrf52840dk \
        hardware_profile.n_samples.value=200 \
        hardware_profile.output.value=hardware_luts/nrf52840dk/ \
        hardware_profile.mode.value=full

This will:

1. Generate 200 random architectures.
2. Evaluate them through the full pipeline (translate, train, flash, measure).
3. Build the LUT and report cross-validation R²/MAE per metric.
4. Save the LUT to the specified output directory.
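For readers unfamiliar with the two reported scores, the sketch below computes R² and MAE for one metric from held-out predictions. It uses only the standard library and made-up numbers, and is not taken from the profiling script.

```python
# R^2 and MAE for one metric, computed on held-out (cross-validation) predictions.

def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the metric itself."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if all true values are identical.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


measured = [30.0, 35.0, 40.0, 45.0]   # hypothetical energy measurements
predicted = [31.0, 34.0, 42.0, 44.0]  # hypothetical held-out predictions

energy_mae = mae(measured, predicted)
energy_r2 = r2(measured, predicted)
print(f"MAE = {energy_mae:.2f}, R2 = {energy_r2:.3f}")
```

An R² close to 1 means the LUT tracks the measured ranking and spread well. MAE gives the typical absolute error in the metric's own unit, which is easier to compare against a hardware constraint.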

2. Run NAS with LUT

No hardware needed:

python main.py \
    +hyperparameters=speech_commands +search_space=speech_commands \
    +boards=nrf52840dk search_strategy=pymoo \
    hardware_lut.enabled.value=true \
    hardware_lut.path.value=hardware_luts/nrf52840dk/

Results will contain `hw_lut_predicted: true` in each individual's `results.json`.
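A quick way to confirm that a run really used the LUT is to count the flag across all result files. The sketch below assumes that `hw_lut_predicted` is a top-level key and that each individual has its own `results.json` somewhere under one results directory. The `individual_<i>` folder names in the demo are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def count_lut_predicted(results_root):
    """Return (flagged, total) over all results.json files under results_root."""
    flagged = 0
    total = 0
    for path in Path(results_root).rglob("results.json"):
        with path.open() as fh:
            data = json.load(fh)
        total += 1
        # Assumption: the flag is a top-level boolean in results.json.
        if data.get("hw_lut_predicted", False):
            flagged += 1
    return flagged, total


# Self-contained demo on a throwaway directory with a hypothetical layout.
with tempfile.TemporaryDirectory() as tmp:
    for i, flag in enumerate([True, True, False]):
        ind_dir = Path(tmp) / f"individual_{i}"
        ind_dir.mkdir()
        (ind_dir / "results.json").write_text(
            json.dumps({"hw_lut_predicted": flag})
        )
    demo_counts = count_lut_predicted(tmp)

print(f"{demo_counts[0]} of {demo_counts[1]} individuals were LUT-predicted")
```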

3. Reuse

The same LUT works for any future NAS run targeting the same board and search space combination.

Configuration

In `conf/config.yaml`:

hardware_lut:
  enabled:
    value: false
  path:
    value: null

hardware_profile:
  n_samples:
    value: 200
  output:
    value: null
  mode:
    value: full   # 'full' or 'layerwise'
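Each setting is wrapped in a `value` key, so a command-line override has to address the full dotted path, as the NAS command does with `hardware_lut.enabled.value=true`. The sketch below flattens a nested dictionary into such override strings. `to_overrides` and `format_value` are illustrative helpers, not part of the project.

```python
def format_value(val):
    """Render a Python value the way it is written in the YAML/override syntax."""
    if val is None:
        return "null"
    if isinstance(val, bool):
        return "true" if val else "false"
    return str(val)


def to_overrides(cfg, prefix=""):
    """Flatten a nested dict into dotted key=value override strings."""
    out = []
    for key, val in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(val, dict):
            out.extend(to_overrides(val, path))
        else:
            out.append(f"{path}={format_value(val)}")
    return out


# Mirrors the hardware_lut block above, with the LUT switched on.
config = {
    "hardware_lut": {
        "enabled": {"value": True},
        "path": {"value": "hardware_luts/nrf52840dk/"},
    },
}

overrides = to_overrides(config)
print(" ".join(overrides))
```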

LUT Directory Structure

hardware_luts/nrf52840dk/
  lut_metadata.json       # board info, sample count, mode
  registry_info.json      # encoding schema
  energy/                 # per-metric model
    metadata.json         # (full mode: SurrogateModel format)
    model.joblib
    training_data.npz
  inference_time/
    ...
  rom/
    ...
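Before starting a long NAS run, it can be useful to check that a LUT directory has the expected top-level layout. The sketch below checks only the entries named in the tree above. It does not look inside the metric folders, and `missing_entries` is an illustrative helper rather than a project function.

```python
import tempfile
from pathlib import Path

TOP_LEVEL_FILES = ("lut_metadata.json", "registry_info.json")
METRIC_DIRS = ("energy", "inference_time", "rom")


def missing_entries(lut_dir):
    """List expected top-level files and metric folders that are absent."""
    root = Path(lut_dir)
    missing = [name for name in TOP_LEVEL_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in METRIC_DIRS if not (root / name).is_dir()]
    return missing


# Demo: build an incomplete LUT directory and report what is missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "lut_metadata.json").write_text("{}")
    (root / "energy").mkdir()
    (root / "rom").mkdir()
    demo_missing = missing_entries(root)

print(demo_missing)
```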

API

from neural_architecture_search.src.hardware_lut import HardwareLUT

# Build from profiling results
lut = HardwareLUT.build_from_results(
    results_dir, board_snr, registry, mode="full"
)

# Predict
pred = lut.predict(chromosome, encoding)
# {'energy': (35.2, 2.1), 'inference_time': (2816.0, 150.3), 'rom': (489000.0, 12000.5)}

# Layer-wise breakdown (layerwise mode only)
breakdown = lut.predict_breakdown(chromosome, encoding)
# {'energy': {'0_STFT_2D': 5.1, '1_C_2D_BLOCK': 12.3, ...}}

# Cross-validation
cv = lut.cross_validate(results_dir, board_snr, registry, n_folds=5)
# {'energy': {'r2': 0.92, 'mae': 2.1}, ...}

# Save / Load
lut.save("hardware_luts/nrf52840dk/")
lut = HardwareLUT.load("hardware_luts/nrf52840dk/")
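The dictionary returned by `predict_breakdown()` can be post-processed with plain Python. The sketch below ranks layers by their share of the predicted energy. The breakdown values are hypothetical (the `2_D` entry is invented to complete the example), and `rank_layers` is an illustrative helper rather than part of `HardwareLUT`.

```python
def rank_layers(layer_costs):
    """Return (layer, share_of_total) pairs, largest contributor first."""
    total = sum(layer_costs.values())
    shares = ((name, cost / total) for name, cost in layer_costs.items())
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical output in the shape shown for predict_breakdown() above.
breakdown = {
    "energy": {"0_STFT_2D": 5.1, "1_C_2D_BLOCK": 12.3, "2_D": 0.6},
}

ranked = rank_layers(breakdown["energy"])
for name, share in ranked:
    print(f"{name:<14} {share:6.1%}")
```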