Text-to-Image Model-based Image Segmentation for Scene Understanding in Autonomous Robot Navigation
Conference paper   Open access   Peer reviewed

Stephane Lee and Hakki Erhan Sevil
Florida Conference on Recent Advances in Robotics (FCRAR 2025), 38th (Florida Atlantic University, Dania Beach, Florida, USA, 05/07/2025–05/08/2025)
05/08/2025

Abstract

Image segmentation is essential for navigation and scene understanding in autonomous systems, particularly in unstructured outdoor environments. This study investigates the segmentation capabilities of DALL-E 3, a generative text-to-image model that is not explicitly trained for semantic segmentation. A custom segmentation pipeline was developed to evaluate and refine DALL-E 3 outputs on outdoor images from the RELLIS-3D dataset. The post-processing workflow applies morphological operations with varied structuring elements to enhance segmentation accuracy. Accuracy was assessed using mean Intersection over Union (mIoU) across selected terrain classes. Results show that the post-processing refinement improved on the raw DALL-E 3 outputs, and that the resulting accuracy is competitive with the supervised models HRNet+OCR and GSCNN. These results demonstrate that text-to-image models, when paired with domain-aware post-processing, offer a promising alternative for flexible, rapid-deployment segmentation in universal robotics without requiring labeled training data. These efforts contribute to our research team’s broader goal of enabling intelligent mobile robots capable of autonomous perception and decision-making in complex environments.
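
To make the two technical ingredients of the abstract concrete, the following is a minimal, dependency-free sketch of a morphological closing followed by an mIoU computation. It is not the authors' pipeline: the toy 5x5 label grids, the class IDs, the choice of closing as the operation, and the names `CROSS`, `SQUARE`, `close_class`, and `mean_iou` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Labels = List[List[int]]          # 2D grid of integer class IDs
Offsets = Sequence[Tuple[int, int]]

# Two symmetric 3x3 structuring elements (hypothetical choices).
CROSS: Offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
SQUARE: Offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def close_class(labels: Labels, cls: int, offsets: Offsets) -> Labels:
    """Fill small holes in class `cls` with a binary closing
    (dilation followed by erosion), then write the result back."""
    h, w = len(labels), len(labels[0])
    pixels = {(r, c) for r in range(h) for c in range(w)
              if labels[r][c] == cls}
    # Dilation: every pixel reachable from the mask via one offset.
    # Coordinates are kept as an unbounded set, so borders need no padding.
    dilated = {(r + dr, c + dc) for r, c in pixels for dr, dc in offsets}
    # Erosion: keep a pixel only if all of its offsets stay inside the mask.
    closed = {(r, c) for r, c in dilated
              if all((r + dr, c + dc) in dilated for dr, dc in offsets)}
    out = [row[:] for row in labels]
    for r, c in closed:
        if 0 <= r < h and 0 <= c < w:   # clip to the image bounds
            out[r][c] = cls
    return out


def mean_iou(pred: Labels, truth: Labels, classes: Sequence[int]) -> float:
    """Mean Intersection over Union across `classes`.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for k in classes:
        inter = union = 0
        for prow, trow in zip(pred, truth):
            for p, t in zip(prow, trow):
                inter += int(p == k and t == k)
                union += int(p == k or t == k)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy ground truth: a 3x3 block of class 1 on a class-0 background.
truth = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
         for r in range(5)]
# Toy "raw" prediction: same block with a one-pixel hole in the center.
raw = [row[:] for row in truth]
raw[2][2] = 0

refined = close_class(raw, cls=1, offsets=CROSS)
raw_miou = mean_iou(raw, truth, classes=[0, 1])
refined_miou = mean_iou(refined, truth, classes=[0, 1])
```

In this toy case the closing fills the single-pixel hole, so the refined map matches the ground truth exactly, while the raw map scores lower on both classes. Swapping `CROSS` for `SQUARE` shows how the structuring element can be varied. Real terrain masks would normally be processed with an array library rather than Python sets.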