List of works
Journal article
Published 10/22/2025
SN Computer Science, 6, 8, 921
Forest disturbance due to natural events, such as wildfires, represents an increasing global challenge that demands advanced analytical methods for effective detection and mitigation. To this end, integrating satellite imagery with deep learning (DL) has emerged as a powerful approach for forest wildfire detection; however, its practical use remains limited by the scarcity of large, well-labeled satellite imagery datasets. In this study, we address this issue by presenting the California Wildfire GeoImaging Dataset (CWGID), a high-resolution bi-temporal collection of over 100,000 labeled RGB "before" and "after" Sentinel-2 wildfire satellite image pairs. We build and label the dataset programmatically, significantly reducing the time and manual effort usually required to create labeled datasets suitable for DL applications. Our methods include data acquisition from authoritative sources, systematic preprocessing, and an initial analysis using three pre-trained Convolutional Neural Network (CNN) architectures for two classification tasks: labeling uni-temporal and bi-temporal inputs, respectively, as damaged or not damaged by fire. Our results show that using bi-temporal imagery as input during model training and testing can improve model performance, with the Early Fusion (EF) EfficientNet-B0 model achieving the highest wildfire detection accuracy of over 92%. These findings suggest that the CWGID and the streamlined programmatic methodology used to build it may help address the scarcity of labeled data for DL-based forest wildfire detection, while providing a scalable resource that could support other DL applications in environmental monitoring.
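A minimal sketch of the early-fusion idea described above, assuming a PyTorch/torchvision EfficientNet-B0 backbone: the "before" and "after" RGB tiles are stacked into a single six-channel input and classified as damaged or not damaged. The six-channel stem, input size, and two-class head are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch (not the authors' code): early fusion of a before/after
# Sentinel-2 RGB pair for binary wildfire-damage classification.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class EarlyFusionEfficientNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone = efficientnet_b0(weights=None)
        # Early fusion: the 3-channel "before" and "after" tiles are stacked
        # into one 6-channel input, so the stem convolution must accept 6 channels.
        stem = self.backbone.features[0][0]
        self.backbone.features[0][0] = nn.Conv2d(
            6, stem.out_channels,
            kernel_size=stem.kernel_size, stride=stem.stride,
            padding=stem.padding, bias=False,
        )
        # Replace the ImageNet head with a 2-way (damaged / not damaged) head.
        in_features = self.backbone.classifier[1].in_features
        self.backbone.classifier[1] = nn.Linear(in_features, num_classes)

    def forward(self, before: torch.Tensor, after: torch.Tensor) -> torch.Tensor:
        x = torch.cat([before, after], dim=1)  # (B, 6, H, W)
        return self.backbone(x)

model = EarlyFusionEfficientNet()
before = torch.randn(4, 3, 224, 224)  # placeholder "pre-fire" tiles
after = torch.randn(4, 3, 224, 224)   # placeholder "post-fire" tiles
logits = model(before, after)         # shape (4, 2)
```

A "late fusion" alternative would instead run the two images through separate (or shared) backbones and merge their features before the classifier; the abstract's results concern the early-fusion variant.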
Conference proceeding
Natural Language Interaction with Databases on Edge Devices in the Internet of Battlefield Things
Published 10/06/2025
MILCOM IEEE Military Communications Conference, 838 - 843
IEEE Military Communications Conference (MILCOM): MILCOM 2025, 10/06/2025–10/10/2025, Los Angeles, California, USA
The expansion of the Internet of Things (IoT) in the battlefield, Internet of Battlefield Things (IoBT), gives rise to new opportunities for enhancing situational awareness. To increase the potential of IoBT for situational awareness in critical decision making, the data from these devices must be processed into consumer-ready information objects and made available to consumers on demand. To address this challenge, we propose a workflow that makes use of natural language processing (NLP) to query a database and return a response in natural language. Our solution utilizes Large Language Models (LLMs) sized for edge devices to perform NLP, as well as a graph database. These types of databases are well suited for the dynamic, connected networks pervasive in the IoBT. Our architecture employs LLMs both for mapping questions in natural language to Cypher database queries and for summarizing the database output back to the user in natural language. We evaluated several medium-sized LLMs for both of these tasks on a database representing publicly available data from the US Army's Multi-Purpose Sensing Area (MSA) at the Jornada Range in Las Cruces, NM. We observe that Llama 3.1 (8 billion parameters) outperforms the other models across all considered metrics. Most importantly, we note that, unlike current methods, our two-step approach allows the relaxation of the Exact Match (EM) requirement of the produced Cypher queries with ground truth code and, in this way, it achieves a 19.4% increase in accuracy. Our workflow lays the groundwork for deploying LLMs on edge devices to enable natural language interactions with databases containing information objects for critical decision making.
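A hedged sketch of the two-step workflow summarized above: one LLM call maps the question to a Cypher query, the query runs against a graph database, and a second call turns the raw records into a natural-language answer. The `generate()` function is a placeholder for whatever edge-sized LLM runtime is used, and the Neo4j connection details are illustrative assumptions rather than the authors' setup.

```python
# Illustrative two-step text-to-Cypher-to-text workflow (not the authors' code).
from neo4j import GraphDatabase

def generate(prompt: str) -> str:
    """Hypothetical call into an edge-sized LLM (e.g. Llama 3.1 8B)."""
    raise NotImplementedError

def answer_question(question: str, schema: str, uri: str, auth: tuple) -> str:
    # Step 1: map the natural-language question to a Cypher query.
    cypher = generate(
        f"Graph schema:\n{schema}\n\n"
        f"Write a Cypher query that answers: {question}\n"
        "Return only the query."
    )
    # Execute the generated query against the graph database.
    with GraphDatabase.driver(uri, auth=auth) as driver, driver.session() as session:
        records = [r.data() for r in session.run(cypher)]
    # Step 2: summarize the raw records back into natural language.
    return generate(
        f"Question: {question}\nDatabase result: {records}\n"
        "Answer the question in one or two sentences."
    )
```

Because the final answer is judged on the executed result and its summary rather than on the query text itself, the generated Cypher need not textually match a reference query, which is the relaxation of the Exact Match requirement noted in the abstract.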
Conference proceeding
Constrained Edge AI Deployment: Fine-Tuning vs. Distillation for LLM Compression
Published 10/2025
MILCOM 2025 - 2025 IEEE Military Communications Conference (MILCOM), 1500 - 1505
IEEE Military Communications Conference (MILCOM), 10/06/2025–10/10/2025, Los Angeles, California, USA
Modern foundational models are often compressed via a combination of structured pruning and re-training to meet the strict compute, memory, and connectivity constraints of edge deployments. While state-of-the-art (SoTA) pruning schemes target the entire Transformer, we adopt a simple, layer-wise L2-norm pruning on only the multi-layer perceptron (MLP) blocks as a fixed baseline. Our focus is not on achieving maximal compression, but on isolating the impact of the re-training loss function: (i) L2-norm Pruning with Cross-Entropy Fine-Tuning (L2PFT), which relies on labeled data, versus (ii) L2-norm Pruning with KL-Divergence Self-Distillation (L2PSD), which utilizes only teacher logits without requiring labeled data. We evaluate both pipelines on the OLMo2-7B-SFT model for CommonsenseQA, a setting suitable for intermittent or denied connectivity scenarios typical of edge networks. Under identical pruning schedules, L2PSD achieves comparable or superior test accuracy to L2PFT, indicating that the choice of loss function has a significant impact on compressed model recovery in resource-constrained environments.
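The contrast between the two re-training objectives can be made concrete with a short PyTorch sketch: a generic L2-norm (magnitude) score over MLP hidden units for pruning, cross-entropy on labels for L2PFT, and temperature-scaled KL divergence against the teacher's logits for L2PSD. The layer choice, temperature, and scaling below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of the pruning criterion and the two re-training losses.
import torch
import torch.nn.functional as F

def mlp_neuron_scores(up_proj_weight: torch.Tensor) -> torch.Tensor:
    # L2 norm of each hidden unit's incoming weights (rows of the up-projection);
    # the lowest-norm units are the pruning candidates.
    return up_proj_weight.norm(p=2, dim=1)

def l2pft_loss(student_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # L2PFT: cross-entropy fine-tuning, which requires labeled data.
    return F.cross_entropy(student_logits, labels)

def l2psd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
               temperature: float = 2.0) -> torch.Tensor:
    # L2PSD: KL-divergence self-distillation against the unpruned teacher's
    # logits; no labels are needed.
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```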
Journal article
Fast, slow, and metacognitive thinking in AI
Published 10/01/2025
npj Artificial Intelligence, 1, 27
Inspired by the “thinking fast and slow” cognitive theory of human decision making, we propose a multi-agent cognitive architecture (SOFAI) that is based on “fast”/“slow” solvers and a metacognitive module. We then present experimental results on the behavior of an instance of this architecture for AI systems that make decisions about navigating in a constrained environment. We show that combining the two decision modalities through a separate metacognitive function allows for higher decision quality with less resource consumption compared to employing only one of the two modalities. Analyzing how the system achieves this, we also provide evidence for the emergence of several human-like behaviors, including skill learning, adaptability, and cognitive control.
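A toy sketch of the arbitration idea behind the architecture: the metacognitive module accepts the fast solver's proposal when its confidence clears a threshold, and otherwise, budget permitting, defers to the slow solver. The threshold and budget logic are illustrative assumptions, not the published SOFAI implementation.

```python
# Hedged illustration of fast/slow arbitration by a metacognitive module.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Proposal:
    action: Any
    confidence: float  # fast solver's self-assessed confidence in [0, 1]

def metacognitive_step(
    state: Any,
    fast_solver: Callable[[Any], Proposal],
    slow_solver: Callable[[Any], Any],
    confidence_threshold: float = 0.8,
    budget_remaining: float = 1.0,
    slow_cost: float = 0.1,
) -> Any:
    """Accept the fast proposal when it looks reliable; otherwise, if the
    resource budget allows, fall back to the deliberate (slow) solver."""
    proposal = fast_solver(state)
    if proposal.confidence >= confidence_threshold or budget_remaining < slow_cost:
        return proposal.action  # System-1-style decision
    return slow_solver(state)   # System-2-style decision
```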
Conference proceeding
Toward Human-Aligned LLM Reviews for Scientific Papers
Published 09/15/2025
Proceedings IEEE International Conference on e-Science: eScience 2025, 363 - 364
IEEE International Conference on e-Science: eScience 2025, 09/15/2025–09/18/2025, Chicago, Illinois, USA
The peer review process is strained by increasing submission volumes, reviewer fatigue, and inconsistent standards. While Large Language Models (LLMs) can aid in reviews, they are often overly optimistic and lack technical depth. We developed an innovative prompting strategy that, when applied to ChatGPT-4 on ICLR 2025 papers, reduced score inflation and generated reviews more closely aligned with human reviewer median scores.
Preprint
Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization
Posted to a preprint site 08/10/2025
In this work, we evaluate the potential of Large Language Models (LLMs) in building Bayesian Networks (BNs) by approximating domain expert priors. LLMs have demonstrated potential as factual knowledge bases; however, their capability to generate probabilistic knowledge about real-world events remains understudied. We explore utilizing the probabilistic knowledge inherent in LLMs to derive probability estimates for statements regarding events and their relationships within a BN. Using LLMs in this context allows for the parameterization of BNs, enabling probabilistic modeling within specific domains. Our experiments on eighty publicly available Bayesian Networks, from healthcare to finance, demonstrate that querying LLMs about the conditional probabilities of events provides meaningful results when compared to baselines, including random and uniform distributions, as well as approaches based on next-token generation probabilities. We explore how these LLM-derived distributions can serve as expert priors to refine distributions extracted from data, especially when data is scarce. Overall, this work introduces a promising strategy for automatically constructing Bayesian Networks by combining probabilistic knowledge extracted from LLMs with real-world data. Additionally, we establish the first comprehensive baseline for assessing LLM performance in extracting probabilistic knowledge.
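A minimal sketch of the elicitation-plus-refinement idea, assuming conditional probabilities are requested from the LLM as plain numbers and then treated as a Beta prior that is updated with observed counts. The prompt wording, `query_llm` helper, and prior strength are hypothetical choices for illustration, not the paper's protocol.

```python
# Hedged sketch: elicit a conditional probability from an LLM and refine it
# with data, Beta-prior style. `query_llm` is a placeholder for any chat API.
def query_llm(prompt: str) -> str:
    """Hypothetical LLM call returning a short text answer."""
    raise NotImplementedError

def llm_conditional_probability(event: str, condition: str) -> float:
    answer = query_llm(
        f"On a scale from 0 to 1, what is the probability of '{event}' "
        f"given '{condition}'? Reply with a single number."
    )
    return min(max(float(answer), 0.0), 1.0)

def refine_with_data(prior_p: float, successes: int, trials: int,
                     prior_strength: float = 10.0) -> float:
    # Treat the LLM estimate as a Beta(s*p, s*(1-p)) prior and update it with
    # observed counts; most useful when data is scarce.
    alpha = prior_strength * prior_p + successes
    beta = prior_strength * (1.0 - prior_p) + trials - successes
    return alpha / (alpha + beta)

# Example: the LLM suggests P(fever | flu) = 0.85, and 12 of 15 observed flu
# cases had fever; refine_with_data(0.85, 12, 15) gives a posterior mean of 0.82.
```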
Magazine article
Thinking Fast and Slow in Human and Machine Intelligence
Published 07/25/2025
Communications of the ACM, 68, 8, 72 - 79
When working to build machines that have a form of intelligence, it is natural to be inspired by human intelligence. Of course, humans are very different from machines, in their embodiment and myriad other ways. Humans exploit their bodies to experience the world, create an internal model of it, and use this model to reason, learn, and make contextual and informed decisions. Machines lack the same embodiment, but often have access to both more memory and more computing power. Despite these crucial disanalogies, it is still useful to leverage our knowledge of how the human mind reasons and makes decisions to design and build machines that demonstrate behaviors similar to that of a human. In this article, we present a novel AI architecture, Slow and Fast AI (SOFAI), that is inspired by the “thinking fast and slow” cognitive theory of human decision making. SOFAI is a multi-agent architecture that employs both “fast” and “slow” solvers underneath a metacognitive agent that is able both to choose among a set of solvers and to reflect on and learn from past experience. Experimental results on the behavior of two instances of the SOFAI architecture show that, compared to using just one of the two decision modalities, SOFAI is markedly better in terms of decision quality, resource consumption, and efficiency.
Preprint
Natural Language Interaction with Databases on Edge Devices in the Internet of Battlefield Things
Posted to a preprint site 06/05/2025
The expansion of the Internet of Things (IoT) in the battlefield, Internet of Battlefield Things (IoBT), gives rise to new opportunities for enhancing situational awareness. To increase the potential of IoBT for situational awareness in critical decision making, the data from these devices must be processed into consumer-ready information objects and made available to consumers on demand. To address this challenge, we propose a workflow that makes use of natural language processing (NLP) to query a database and return a response in natural language. Our solution utilizes Large Language Models (LLMs) sized for edge devices to perform NLP, together with a graph database, which is well suited for the dynamic, connected networks pervasive in the IoBT. Our architecture employs LLMs both for mapping questions in natural language to Cypher database queries and for summarizing the database output back to the user in natural language. We evaluate several medium-sized LLMs for both of these tasks on a database representing publicly available data from the US Army's Multi-Purpose Sensing Area (MSA) at the Jornada Range in Las Cruces, NM. We observe that Llama 3.1 (8 billion parameters) outperforms the other models across all the considered metrics. Most importantly, we note that, unlike current methods, our two-step approach allows the relaxation of the Exact Match (EM) requirement of the produced Cypher queries with ground truth code and, in this way, it achieves a 19.4% increase in accuracy. Our workflow lays the groundwork for deploying LLMs on edge devices to enable natural language interactions with databases containing information objects for critical decision making.
Preprint
Constrained Edge AI Deployment: Fine-Tuning vs Distillation for LLM Compression
Posted to a preprint site 05/13/2025
Modern foundational models are often compressed via a combination of structured pruning and re-training to meet the strict compute, memory, and connectivity constraints of edge deployments. While state-of-the-art pruning schemes target the entire Transformer, we adopt a simple, layer-wise L2-norm pruning on only the MLP blocks as a fixed baseline. Our focus is not on achieving maximal compression, but on isolating the impact of the re-training loss function: (i) fine-tuning with cross-entropy (L2PFT), which requires labeled data, versus (ii) self-distillation with KL-divergence (L2PSD), which leverages only teacher logits (no labels). We evaluate both pipelines on the OLMo2-7B-SFT model for CommonsenseQA, a setting suitable for intermittent or denied connectivity scenarios typical of edge networks. Under identical pruning schedules, KL-based distillation matches or exceeds CE fine-tuning in test accuracy, demonstrating that, even with basic MLP-only pruning, the choice of loss function materially affects compressed model recovery in resource-constrained environments.
Conference proceeding
Reasoning over Uncertain Text by Generative Large Language Models
Published 04/11/2025
Proceedings of the ... AAAI Conference on Artificial Intelligence, 39, 23, 24911 - 24920
AAAI Conference on Artificial Intelligence, 02/25/2025–03/04/2025, Philadelphia, Pennsylvania, USA
This paper considers the challenges Large Language Models (LLMs) face when reasoning over text that includes information involving uncertainty explicitly quantified via probability values. This type of reasoning is relevant to a variety of contexts ranging from everyday conversations to medical decision-making. Despite improvements in the mathematical reasoning capabilities of LLMs, they still exhibit significant difficulties when it comes to probabilistic reasoning. To deal with this problem, we introduce the Bayesian Linguistic Inference Dataset (BLInD), a new dataset specifically designed to test the probabilistic reasoning capabilities of LLMs. We use BLInD to identify the limitations of LLMs on tasks involving probabilistic reasoning. In addition, we present several prompting strategies that map the problem to different formal representations, including Python code, probabilistic algorithms, and probabilistic logical programming. We conclude by providing an evaluation of our methods on BLInD and an adaptation of a causal reasoning question-answering dataset. Our empirical results highlight the effectiveness of our proposed strategies for multiple LLMs.
Code and Dataset - https://github.com/HLR/BLInD
Extended Version - https://arxiv.org/abs/2402.09614
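A toy illustration of the code-mapping strategy described in the abstract above: the LLM is prompted to translate a textual probability problem into executable Python, whose output supplies the numeric answer. The scenario and numbers below are invented for illustration and are not drawn from BLInD.

```python
# Hedged example of the kind of code an LLM might emit for a textual problem.
#
# Text: "The probability of rain is 0.3. If it rains, the game is cancelled
#        with probability 0.8; otherwise it is cancelled with probability 0.1.
#        What is the probability that the game is cancelled?"
p_rain = 0.3
p_cancel_given_rain = 0.8
p_cancel_given_no_rain = 0.1

# Law of total probability: P(cancel) = P(c|r)P(r) + P(c|~r)P(~r)
p_cancel = p_cancel_given_rain * p_rain + p_cancel_given_no_rain * (1 - p_rain)
print(p_cancel)  # 0.8*0.3 + 0.1*0.7 = 0.31
```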