Constrained Edge AI Deployment: Fine-Tuning vs. Distillation for LLM Compression

Jacob Sander; David Moe; Achraf Cohen; Brian Jalaian; Brent Venable; Venkat R. Dasari

doi:10.1109/MILCOM64451.2025.11310502

Conference proceeding

Peer reviewed

MILCOM 2025 - 2025 IEEE Military Communications Conference (MILCOM), pp.1500-1505

IEEE Military Communications Conference (MILCOM) (Los Angeles, California, USA, 10/06/2025–10/10/2025)

10/2025

DOI: https://doi.org/10.1109/MILCOM64451.2025.11310502

Abstract

Modern foundational models are often compressed via a combination of structured pruning and re-training to meet the strict compute, memory, and connectivity constraints of edge deployments. While state-of-the-art (SoTA) pruning schemes target the entire Transformer, we adopt simple, layer-wise L2-norm pruning on only the multi-layer perceptron (MLP) blocks as a fixed baseline. Our focus is not on achieving maximal compression, but on isolating the impact of the re-training loss function: (i) L2-norm Pruning with Cross-Entropy Fine-Tuning (L2PFT), which relies on labeled data, versus (ii) L2-norm Pruning with KL-Divergence Self-Distillation (L2PSD), which utilizes only teacher logits without requiring labeled data. We evaluate both pipelines on the OLMo2-7B-SFT model for CommonsenseQA, suitable for intermittent or denied connectivity scenarios typical of edge networks. Under identical pruning schedules, L2PSD achieves comparable or superior test accuracy to L2PFT, indicating that the choice of loss function has a significant impact on compressed model recovery in resource-constrained environments.
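The contrast drawn in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The function names, the choice of weight-matrix rows as the pruning unit, and the KL direction (teacher relative to student) are all assumptions made here for illustration; the abstract specifies only that pruning is layer-wise, L2-norm based, and restricted to MLP blocks, and that the two re-training losses are cross-entropy on labels and KL divergence on teacher logits.

```python
import math

def l2_prune_rows(weight, keep):
    """Keep the `keep` rows with the largest L2 norm (hypothetical helper).

    `weight` is a list of rows; treating each row as one MLP unit is an
    assumption of this sketch, not a detail taken from the paper.
    Returns the kept row indices (in original order) and the pruned matrix.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in weight]
    ranked = sorted(range(len(weight)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [weight[i] for i in kept]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(student_logits, label):
    """Fine-tuning style loss: requires the ground-truth label index."""
    return -log_softmax(student_logits)[label]

def kl_divergence(teacher_logits, student_logits):
    """Self-distillation style loss KL(teacher || student).

    Requires only the unpruned model's logits, no label. The direction of
    the divergence is an assumption of this sketch.
    """
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# Toy usage: prune a 3-row matrix down to 2 rows, then compare the losses.
kept, pruned = l2_prune_rows([[3.0, 4.0], [0.1, 0.2], [1.0, 0.0]], keep=2)
teacher = [2.0, 0.5, -1.0]   # logits from the unpruned model
student = [1.5, 0.7, -0.8]   # logits from the pruned model
ce_loss = cross_entropy(student, label=0)   # needs a labeled example
kd_loss = kl_divergence(teacher, student)   # needs only teacher logits
```

The practical difference shows in the function signatures: the cross-entropy path cannot be computed without `label`, whereas the distillation path needs only a forward pass of the original model on unlabeled inputs.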

Details

Title: Constrained Edge AI Deployment: Fine-Tuning vs. Distillation for LLM Compression
Publication Details: MILCOM 2025 - 2025 IEEE Military Communications Conference (MILCOM), pp.1500-1505
Resource Type: Conference proceeding
Conference: IEEE Military Communications Conference (MILCOM) (Los Angeles, California, USA, 10/06/2025–10/10/2025)
Publisher: IEEE
Grant note: Arm (10.13039/100016311)
Identifiers: 99381586970906600
Academic Unit: Institute for Human and Machine Cognition; Mathematics and Statistics; Intelligent Systems and Robotics; Hal Marcus College of Science and Engineering
Language: English
