Towards a Unified Framework for Deep Embedded Multi-View Representations
Dissertation   Open access

Don Yates
University of West Florida Libraries
Doctor of Philosophy (PHD), University of West Florida
2026

Abstract

Multi-view data is ubiquitous in modern perception systems, arising naturally from multiple sensors, viewpoints, temporal observations, or heterogeneous feature modalities. A central challenge in multi-view clustering is to discover latent structure that captures view-invariant semantics while preserving complementary, view-specific information that is often critical for discrimination. Although deep multi-view clustering methods have demonstrated promising results on simplified benchmarks, many existing approaches suffer from limited representational capacity, entangled latent factors, restrictive probabilistic assumptions, and poor scalability to complex, real-world data. This dissertation develops a unified framework for deep embedded multi-view clustering based on the explicit disentanglement of common (shared) and unique (view-specific) representations. The core premise is that robust clustering performance in realistic settings requires modeling both cross-view consistency and intra-view diversity within a learned latent space. To this end, the proposed methods employ autoencoder architectures with enhanced representational capacity, self-supervised learning objectives, and clustering-aware latent regularization. Contrastive and complementary loss formulations are introduced to align shared representations across views while preserving informative view-specific features, avoiding conditional independence assumptions that often fail in practice. The framework is validated across diverse application domains, including large-scale aerial imagery and robotic perception. In the context of aerial image analysis, deep embedded multi-view clustering with convolutional backbones and data augmentation is shown to effectively capture complex spatial patterns and environmental variability, enabling robust clustering of real-world imagery beyond canonical datasets.
In robotic localization, the dissertation reformulates loop closure detection as a latent-space clustering problem rather than a retrieval task, demonstrating that clustering-driven representations improve robustness to perceptual aliasing, appearance change, and scalability constraints in simultaneous localization and mapping (SLAM) systems. Additionally, a common and unique representation deep embedded clustering architecture is introduced to support single-sample inference, extending the applicability of multi-view clustering methods to online and real-time scenarios. Extensive experimental evaluation on benchmark multi-view datasets and real-world perception tasks demonstrates consistent improvements in clustering accuracy, normalized mutual information, robustness, and interpretability compared to state-of-the-art methods. Collectively, this work establishes deep embedded clustering with disentangled common and unique representations as a principled and scalable approach for multi-view learning, providing both theoretical insight and practical tools for high-dimensional, dynamic, and real-world data analysis.
Files

Towards a Unified Framework for Deep Embedded Multi-View Representations (PDF, 3.40 MB)
Preprint dissertation, open access

Details
