Image Segmentation-Based Scene Understanding using a Text-to-Image Model for Autonomous Robot Navigation

Stephane Claude Lee

Thesis

Open access

University of West Florida Libraries

Master of Science (MS), University of West Florida

2025

Abstract

Robots require an effective system to determine whether their environment is safe to navigate. This includes identifying objects, analyzing terrain, accounting for the physical form and mode of locomotion of the robot (wheeled, legged, or flying), and finding a path to safely reach a destination. Without accurate perception and classification of their surroundings, autonomous robots are prone to misinterpret obstacles, misjudge terrain types, or become non-functional in dynamic or unstructured environments. The most common way to interpret visual data from a robot’s camera is a trained segmentation model. These models are typically trained on thousands of annotated images to recognize and label objects in new, unseen data. However, this approach has several limitations. First, trained segmentation models are computationally expensive and time-consuming to develop, requiring substantial hardware resources and manual labeling. Second, their effectiveness is often constrained to the domain they were trained on; once deployed in a new environment, they may fail to generalize without additional training or adaptation. Lastly, many of these models require fine-tuning when transitioning across sensor types, scene variations, or robot platforms. This research explores an alternative to traditional segmentation by proposing a generative AI-based approach for class identification, terrain analysis, and visual perception. Specifically, the study investigates whether a text-to-image model, DALLE-3, can be prompted to generate segmentation masks without any prior exposure to labeled training data. By eliminating the dependency on pre-trained segmentation networks, the goal is to create a universal segmentation pipeline that is adaptable across multiple robotic platforms.
A complete segmentation pipeline is developed that includes prompt formatting, segmentation generation, region recognition, and evaluation against ground truth data from the RELLIS-3D dataset. RELLIS-3D is a benchmark dataset comprising dynamic outdoor scenes with 20 defined semantic classes. The segmentation performance of DALLE-3 is evaluated using the mean Intersection over Union (mIoU) metric and compared to the outputs of state-of-the-art trained models, including HRNet+OCR and GSCNN. The findings of this research point toward a future where generative models may supplement or replace traditional segmentation pipelines in scenarios where training data is scarce or unavailable. This thesis lays the groundwork for further research in combining generative models with robotic perception systems, enabling greater flexibility, rapid deployment, and task adaptation across a wide range of autonomous platforms.

Files and links (1)

pdf

Image Segmentation-Based Scene Understanding using a Text-to-Image Model for Autonomous Robot Navigation (24.69 MB)

Preprint Thesis pdf Open Access

Details

Title: Image Segmentation-Based Scene Understanding using a Text-to-Image Model for Autonomous Robot Navigation
Resource Type: Thesis
Contributors: Hakki Erhan Sevil (Committee Chair)
Yazan Alqudah (Committee Member)
Jiaming Fu (Committee Member)
Publisher: University of West Florida Libraries
Format: pdf
Number of pages: 86
Copyright: Permission granted to the University of West Florida Libraries by the author to digitize and/or display this information for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires the permission of the copyright holder.
Identifiers: 99381469215906600
Academic Unit: Dr. Muhammad Harunur Rashid Department of Electrical and Computer Engineering
Language: English
Awarding Institution: University of West Florida; Master of Science (MS)
Theses and Dissertations: Master of Science (MS), University of West Florida
