Full-text file(s) available
DC Element | Value | Language
dc.contributor.advisor | Menzel, Wolfgang | -
dc.contributor.advisor | Vertan, Cristina | -
dc.contributor.advisor | von Hahn, Walther | -
dc.contributor.author | Duma, Mirela-Stefania | -
dc.date.accessioned | 2022-08-04T11:43:15Z | -
dc.date.available | 2022-08-04T11:43:15Z | -
dc.date.issued | 2021 | -
dc.identifier.uri | https://ediss.sub.uni-hamburg.de/handle/ediss/9721 | -
dc.description.abstract | Machine Translation (MT) is an active research topic in the Computational Linguistics (CL) community. Training an MT model on one domain and applying it to another does not yield the expected performance because of the syntactic and semantic differences between the two domains. Domain adaptation is therefore necessary. Data selection, the topic of this thesis, is a corpus-driven domain adaptation method. Given a general domain corpus and an in-domain corpus, each sentence from the general domain corpus is scored according to its similarity to the in-domain corpus. The sentences most similar to the in-domain corpus are selected as pseudo in-domain data and later used to train domain-focused MT systems. Data selection raises two challenges: which method to use to determine the general domain sentences most similar to a given in-domain corpus, and how many of the general domain sentences to select as pseudo in-domain data. This thesis presents data selection methods that address both challenges. I developed several scoring methods and compared them with a method, also developed in this thesis, that automatically determines the ratio of sentences to select. Data selection is crucial for MT systems that aim to translate domain-specific texts. The data selection SMT models presented in this thesis trained faster than models trained on the full general domain data, were smaller, and performed on par with or better than the models trained on the full training data. | en
dc.language.iso | en | de_DE
dc.publisher | Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky | de
dc.rights | http://purl.org/coar/access_right/c_abf2 | de_DE
dc.subject | Machine Translation | en
dc.subject | Data Selection | en
dc.subject | Domain Adaptation | en
dc.subject.ddc | 004: Informatik | de_DE
dc.title | Data Selection for Statistical Machine Translation | en
dc.type | doctoralThesis | en
dcterms.dateAccepted | 2021-12-01 | -
dc.rights.cc | No license | de_DE
dc.rights.rs | http://rightsstatements.org/vocab/InC/1.0/ | -
dc.subject.gnd | Translation <Linguistik> | de_DE
dc.type.casrai | Dissertation | -
dc.type.dini | doctoralThesis | -
dc.type.driver | doctoralThesis | -
dc.type.status | info:eu-repo/semantics/publishedVersion | de_DE
dc.type.thesis | doctoralThesis | de_DE
tuhh.type.opus | Dissertation | -
thesis.grantor.department | Informatik | de_DE
thesis.grantor.place | Hamburg | -
thesis.grantor.universityOrInstitution | Universität Hamburg | de_DE
dcterms.DCMIType | Text | -
dc.identifier.urn | urn:nbn:de:gbv:18-ediss-101970 | -
datacite.relation.IsDerivedFrom | https://www.statmt.org/wmt14/medical-task/ | de_DE
datacite.relation.IsDerivedFrom | https://www.statmt.org/wmt17/biomedical-translation-task.html | de_DE
datacite.relation.IsDerivedFrom | https://www.statmt.org/wmt16/biomedical-translation-task.html | de_DE
datacite.relation.IsDerivedFrom | https://www.statmt.org/wmt18/biomedical-translation-task.html | de_DE
datacite.relation.IsDerivedFrom | https://www.statmt.org/wmt16/it-translation-task.html | de_DE
datacite.relation.IsDerivedFrom | https://opus.nlpl.eu/index.php | de_DE
datacite.relation.IsDerivedFrom | https://www.statmt.org/wmt13/translation-task.html | de_DE
datacite.relation.IsDerivedFrom | https://ufal.mff.cuni.cz/ufal_medical_corpus | de_DE
datacite.relation.IsDerivedFrom | https://www.statmt.org/wmt19/biomedical-translation-task.html | de_DE
datacite.relation.IsDerivedFrom | https://github.com/jhlau/doc2vec | de_DE
datacite.relation.IsDerivedFrom | http://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark | de_DE
datacite.relation.IsDerivedFrom | https://alt.qcri.org/semeval2017/task1/index.php?id=data-and-tools | de_DE
item.advisorGND | Menzel, Wolfgang | -
item.advisorGND | Vertan, Cristina | -
item.advisorGND | von Hahn, Walther | -
item.grantfulltext | open | -
item.languageiso639-1 | other | -
item.fulltext | With Fulltext | -
item.creatorOrcid | Duma, Mirela-Stefania | -
item.creatorGND | Duma, Mirela-Stefania | -
Appears in collections: Elektronische Dissertationen und Habilitationen
Files in this item:
File | Description | Checksum | Size | Format
Dissertation_SD.pdf | | d8161f69b7f99455391477cb4b23072a | 2.66 MB | Adobe PDF

This publication is available online in electronic form and may be read. Beyond free access, the author has granted no further rights. Acts of use (such as downloading, editing, or redistributing) are therefore permitted only within the statutory permissions of the German Copyright Act (Urheberrechtsgesetz, UrhG). This applies to the publication as a whole and to its individual components, unless otherwise indicated.

Info

Page views: 901 (checked on 25.04.2024)
Downloads: 139 (checked on 25.04.2024)