Bridging Vision, Language, and Gaze for Trustworthy Foundation Models

Wang, Xintong

DC Element	Value	Language
dc.contributor.advisor	Biemann, Chris	-
dc.contributor.author	Wang, Xintong	-
dc.date.accessioned	2026-06-29T10:11:08Z	-
dc.date.available	2026-06-29T10:11:08Z	-
dc.date.issued	2026	-
dc.identifier.uri	https://ediss.sub.uni-hamburg.de/handle/ediss/12466	-
dc.description.abstract	Foundation models have reshaped the development of artificial intelligence (AI) by introducing a paradigm in which a single pretrained system can generalize across a wide spectrum of language and multimodal tasks. Built upon large-scale data and representation learning, these models reduce the need for task-specific engineering and enable reusable semantic and perceptual knowledge. This shift from specialized models to general-purpose learning architectures has significantly expanded the scope and applicability of AI technologies. As foundation models transition from controlled benchmarks to real-world deployment, new demands arise that extend beyond raw capability. Systems are increasingly expected to behave in ways that are reliable, interpretable, and aligned with human expectations, particularly in settings involving multimodal reasoning and interaction. These requirements expose limitations that are not visible in traditional performance evaluations and motivate a shift in research focus from scaling performance to ensuring trustworthy behavior. In this dissertation, trustworthiness is analyzed as a property that emerges from the structural coupling of groundedness, alignment stability, faithfulness, and controllability. These dimensions correspond to interdependent stages of the trustworthiness pipeline, encompassing how multimodal signals anchor meaning, how pretrained representations are adapted, how generative processes reconcile internal knowledge with external evidence, and how model behavior can be guided in transparent ways. When these stages are treated in isolation, characteristic failure modes arise, including context-insensitive grounding, instability under adaptation, hallucinated outputs, and safety interventions that disrupt communicative intent. The dissertation therefore investigates trustworthiness through coordinated interventions at different interfaces of the modeling pipeline. 
It develops methods that strengthen context-sensitive multimodal grounding while preserving representational structure, examines inference-time mechanisms that regulate the interaction between prior knowledge and conditioning signals, and introduces cognitively informed analyses to identify interpretable loci for efficient behavioral steering. Rather than addressing isolated symptoms, these contributions target complementary sources of unreliability across data construction, representation maintenance, and generation dynamics. Overall, the findings indicate that trustworthy foundation modeling must be engineered as a lifecycle property rather than achieved through post hoc alignment alone. Reliability arises from the deliberate coordination of grounding, adaptation, inference, and control, suggesting a pathway toward foundation models whose general capabilities are matched by predictability, transparency, and human-centered usability.	en
dc.language.iso	en	de_DE
dc.publisher	Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky	de
dc.rights	http://purl.org/coar/access_right/c_abf2	de_DE
dc.subject.ddc	004: Informatik	de_DE
dc.title	Bridging Vision, Language, and Gaze for Trustworthy Foundation Models	en
dc.type	doctoralThesis	en
dcterms.dateAccepted	2026-06-23	-
dc.rights.cc	https://creativecommons.org/licenses/by/4.0/	de_DE
dc.rights.rs	http://rightsstatements.org/vocab/InC/1.0/	-
dc.type.casrai	Dissertation	-
dc.type.dini	doctoralThesis	-
dc.type.driver	doctoralThesis	-
dc.type.status	info:eu-repo/semantics/publishedVersion	de_DE
dc.type.thesis	doctoralThesis	de_DE
tuhh.type.opus	Dissertation	-
thesis.grantor.department	Informatik	de_DE
thesis.grantor.place	Hamburg	-
thesis.grantor.universityOrInstitution	Universität Hamburg	de_DE
dcterms.DCMIType	Text	-
dc.identifier.urn	urn:nbn:de:gbv:18-ediss-138779	-
item.grantfulltext	open	-
item.languageiso639-1	other	-
item.creatorOrcid	Wang, Xintong	-
item.advisorGND	Biemann, Chris	-
item.creatorGND	Wang, Xintong	-
item.fulltext	With Fulltext	-
Appears in Collections:	Elektronische Dissertationen und Habilitationen

Files in This Item:

File	Description	Checksum	Size	Format
2026-wang-dissertation.pdf		4e69adf381b5ee523d548521bf160750	25.46 MB	Adobe PDF
