DC Element | Value | Language
dc.contributor.advisor | Biemann, Chris | -
dc.contributor.author | Wang, Xintong | -
dc.date.accessioned | 2026-06-29T10:11:08Z | -
dc.date.available | 2026-06-29T10:11:08Z | -
dc.date.issued | 2026 | -
dc.identifier.uri | https://ediss.sub.uni-hamburg.de/handle/ediss/12466 | -
dc.description.abstract | Foundation models have reshaped the development of artificial intelligence (AI) by introducing a paradigm in which a single pretrained system can generalize across a wide spectrum of language and multimodal tasks. Built upon large-scale data and representation learning, these models reduce the need for task-specific engineering and enable reusable semantic and perceptual knowledge. This shift from specialized models to general-purpose learning architectures has significantly expanded the scope and applicability of AI technologies. As foundation models transition from controlled benchmarks to real-world deployment, new demands arise that extend beyond raw capability. Systems are increasingly expected to behave in ways that are reliable, interpretable, and aligned with human expectations, particularly in settings involving multimodal reasoning and interaction. These requirements expose limitations that are not visible in traditional performance evaluations and motivate a shift in research focus from scaling performance to ensuring trustworthy behavior. In this dissertation, trustworthiness is analyzed as a property that emerges from the structural coupling of groundedness, alignment stability, faithfulness, and controllability. These dimensions correspond to interdependent stages of the trustworthiness pipeline, encompassing how multimodal signals anchor meaning, how pretrained representations are adapted, how generative processes reconcile internal knowledge with external evidence, and how model behavior can be guided in transparent ways. When these stages are treated in isolation, characteristic failure modes arise, including context-insensitive grounding, instability under adaptation, hallucinated outputs, and safety interventions that disrupt communicative intent. The dissertation therefore investigates trustworthiness through coordinated interventions at different interfaces of the modeling pipeline. It develops methods that strengthen context-sensitive multimodal grounding while preserving representational structure, examines inference-time mechanisms that regulate the interaction between prior knowledge and conditioning signals, and introduces cognitively informed analyses to identify interpretable loci for efficient behavioral steering. Rather than addressing isolated symptoms, these contributions target complementary sources of unreliability across data construction, representation maintenance, and generation dynamics. Overall, the findings indicate that trustworthy foundation modeling must be engineered as a lifecycle property rather than achieved through post hoc alignment alone. Reliability arises from the deliberate coordination of grounding, adaptation, inference, and control, suggesting a pathway toward foundation models whose general capabilities are matched by predictability, transparency, and human-centered usability. | en
dc.language.iso | en | de_DE
dc.publisher | Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky | de
dc.rights | http://purl.org/coar/access_right/c_abf2 | de_DE
dc.subject.ddc | 004: Computer science | de_DE
dc.title | Bridging Vision, Language, and Gaze for Trustworthy Foundation Models | en
dc.type | doctoralThesis | en
dcterms.dateAccepted | 2026-06-23 | -
dc.rights.cc | https://creativecommons.org/licenses/by/4.0/ | de_DE
dc.rights.rs | http://rightsstatements.org/vocab/InC/1.0/ | -
dc.type.casrai | Dissertation | -
dc.type.dini | doctoralThesis | -
dc.type.driver | doctoralThesis | -
dc.type.status | info:eu-repo/semantics/publishedVersion | de_DE
dc.type.thesis | doctoralThesis | de_DE
tuhh.type.opus | Dissertation | -
thesis.grantor.department | Computer Science | de_DE
thesis.grantor.place | Hamburg | -
thesis.grantor.universityOrInstitution | Universität Hamburg | de_DE
dcterms.DCMIType | Text | -
dc.identifier.urn | urn:nbn:de:gbv:18-ediss-138779 | -
item.grantfulltext | open | -
item.languageiso639-1 | other | -
item.creatorOrcid | Wang, Xintong | -
item.advisorGND | Biemann, Chris | -
item.creatorGND | Wang, Xintong | -
item.fulltext | With Fulltext | -
Appears in Collections: Electronic Dissertations and Habilitations
Files in This Item:
File | Description | Checksum | Size | Format
2026-wang-dissertation.pdf | | 4e69adf381b5ee523d548521bf160750 | 25.46 MB | Adobe PDF