Text-to-Image Model-based Image Segmentation for Scene Understanding in Autonomous Robot Navigation

Stephane Lee; Hakki Erhan Sevil

Conference paper

Open access

Peer reviewed

Florida Conference on Recent Advances in Robotics (FCRAR 2025), 38th (Florida Atlantic University, Dania Beach, Florida, USA, 05/07/2025–05/08/2025)

05/08/2025

Abstract

Image segmentation is essential for navigation and scene understanding in autonomous systems, particularly in unstructured outdoor environments. This study investigates the segmentation capabilities of DALL-E 3, a generative text-to-image model that is not explicitly trained for semantic segmentation. A custom segmentation pipeline was developed to evaluate and refine DALL-E 3 outputs on outdoor images from the RELLIS-3D dataset. The post-processing workflow applies morphological operations with varied structuring elements to improve segmentation accuracy. Segmentation accuracy was assessed using mean Intersection over Union (mIoU) across selected terrain classes. Results show that the developed post-processing refinement improved on the raw DALL-E 3 outputs, and that the resulting accuracy is competitive with that of the supervised models HRNet+OCR and GSCNN. These results demonstrate that text-to-image models, when paired with domain-aware post-processing, offer a promising alternative for flexible, rapidly deployable segmentation in universal robotics without requiring labeled training data. These efforts contribute to our research team's broader goal of enabling intelligent mobile robots capable of autonomous perception and decision-making in complex environments.
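The abstract names two computational steps: morphological post-processing with varied structuring elements, and scoring with mIoU, which averages over the evaluated classes c the ratio |P_c ∩ G_c| / |P_c ∪ G_c| of predicted and ground-truth pixel sets. The NumPy sketch below illustrates both steps on a toy label map. It is not the authors' pipeline. The record does not give their structuring elements, class list, or order of operations. Every name here (square, disk, refine_labels, mean_iou), the open-then-close ordering, the per-class reassembly rule, and the border handling are hypothetical choices.

```python
import numpy as np


def square(k: int) -> np.ndarray:
    """k x k square structuring element (k odd)."""
    return np.ones((k, k), dtype=bool)


def disk(r: int) -> np.ndarray:
    """Disk-shaped structuring element of radius r, shape (2r+1, 2r+1)."""
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r


def _morph(mask: np.ndarray, se: np.ndarray, erode: bool) -> np.ndarray:
    """Binary erosion (erode=True) or dilation (erode=False) with a symmetric,
    odd-sized structuring element centred on each pixel.

    Assumption: out-of-image pixels count as foreground for erosion and as
    background for dilation, so regions touching the border are not eaten away.
    """
    ry, rx = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(mask, ((ry, ry), (rx, rx)), constant_values=erode)
    h, w = mask.shape
    out = np.full(mask.shape, erode, dtype=bool)
    for dy, dx in zip(*np.nonzero(se)):
        window = padded[dy:dy + h, dx:dx + w]  # mask shifted by (dy - ry, dx - rx)
        out = (out & window) if erode else (out | window)
    return out


def opening(mask: np.ndarray, se: np.ndarray) -> np.ndarray:
    """Erode then dilate: removes specks smaller than the structuring element."""
    return _morph(_morph(mask, se, erode=True), se, erode=False)


def closing(mask: np.ndarray, se: np.ndarray) -> np.ndarray:
    """Dilate then erode: fills holes smaller than the structuring element."""
    return _morph(_morph(mask, se, erode=False), se, erode=True)


def refine_labels(labels: np.ndarray, classes, se: np.ndarray) -> np.ndarray:
    """Hypothetical per-class refinement: open, then close each class mask.

    Later classes overwrite earlier ones where refined masks overlap; pixels
    claimed by no refined mask keep their raw label.
    """
    refined = labels.copy()
    for c in classes:
        cleaned = closing(opening(labels == c, se), se)
        refined[cleaned] = c
    return refined


def mean_iou(pred: np.ndarray, gt: np.ndarray, classes) -> float:
    """Mean IoU over `classes`; a class absent from both maps is skipped."""
    ious = []
    for c in classes:
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


if __name__ == "__main__":
    # Toy 8x8 map: class 0 on the left half, class 1 on the right half.
    gt = np.zeros((8, 8), dtype=int)
    gt[:, 4:] = 1
    raw = gt.copy()
    raw[2, 1] = 1  # isolated speck of the wrong class
    raw[5, 6] = 0  # single-pixel hole
    for name, se in [("square(3)", square(3)), ("disk(1)", disk(1))]:
        out = refine_labels(raw, [0, 1], se)
        print(name, mean_iou(raw, gt, [0, 1]), "->", mean_iou(out, gt, [0, 1]))
```

Looping over structuring elements, as in the usage block, is one way to compare "varied structuring elements." Larger or differently shaped elements remove larger specks but can also erase thin terrain features such as narrow trails. Opening is applied before closing so that isolated mislabeled pixels are dropped before holes are filled.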

Details

Title: Text-to-Image Model-based Image Segmentation for Scene Understanding in Autonomous Robot Navigation
Resource Type: Conference paper
Conference: Florida Conference on Recent Advances in Robotics (FCRAR 2025), 38th (Florida Atlantic University, Dania Beach, Florida, USA, 05/07/2025–05/08/2025)
Format: pdf
Number of pages: 5
Copyright: Permission granted to the University of West Florida Libraries by the author to digitize and/or display this information for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires the permission of the copyright holder.
Identifiers: 99381348025106600
Academic Unit: Intelligent Systems and Robotics
Language: English