A Visual Question Answering-based Object Detection Framework using a Team of Multi-Agent UAVs
Conference paper   Open access   Peer reviewed

Stephane Lee, Nathan Rayon and Hakki Erhan Sevil
Florida Conference on Recent Advances in Robotics (FCRAR 2025), 38th (Dania Beach, Florida, USA, 05/08/2025–05/09/2025)
05/09/2025

Abstract

This paper presents an autonomous multi-agent unmanned aerial vehicle (UAV) system designed to perform object detection through Visual Question Answering (VQA) using aerial imagery. The system uses an entropy-based distributed behavior model to coordinate UAV movements toward designated waypoints. A VQA model analyzes the aerial footage to detect objects of interest. The study investigates the impact of various distributed behavior configurations, including number of UAVs, UAV formation, flight altitude, and separation distance. From this analysis, a final optimized configuration that maximizes surface area coverage and VQA model performance was identified. These findings contribute to the development of aerial systems capable of collaborative visual reasoning in complex environments.
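
To illustrate how the configuration parameters named in the abstract (number of UAVs, flight altitude, separation distance) can interact with surface area coverage, the following is a minimal geometric sketch. It is not the paper's method: it assumes flat ground, nadir-pointing cameras, and a simple line formation, and the function names and field-of-view defaults are hypothetical values chosen only for illustration.

```python
import math


def footprint(altitude_m, hfov_deg, vfov_deg):
    """Ground footprint (width, length) in metres of one camera.

    Assumption: the camera points straight down over flat ground, so each
    side of the footprint is 2 * altitude * tan(fov / 2).
    """
    width = 2 * altitude_m * math.tan(math.radians(hfov_deg) / 2)
    length = 2 * altitude_m * math.tan(math.radians(vfov_deg) / 2)
    return width, length


def line_formation_coverage(n_uavs, separation_m, altitude_m,
                            hfov_deg=90.0, vfov_deg=60.0):
    """Union area (m^2) of the footprints of UAVs flying side by side.

    Assumption: UAVs are evenly spaced along the footprint's width axis.
    Each additional UAV contributes at most one footprint width of new
    ground; when footprints overlap, it contributes only the separation.
    """
    width, length = footprint(altitude_m, hfov_deg, vfov_deg)
    total_width = width + (n_uavs - 1) * min(separation_m, width)
    return total_width * length


if __name__ == "__main__":
    # Compare a few hypothetical configurations.
    for n, sep, alt in [(2, 10, 10), (3, 10, 10), (3, 20, 15)]:
        area = line_formation_coverage(n, sep, alt)
        print(f"{n} UAVs, {sep} m apart, {alt} m altitude: {area:.1f} m^2")
```

Under these assumptions, covered area grows with altitude and with separation up to one footprint width, beyond which gaps open between footprints. Higher altitude also reduces the number of pixels per object, which may affect a vision model's detection performance, so coverage and detection quality would need to be balanced empirically.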