Comparison of Compiler's Intermediate Representations and Input/Output Access Patterns with String Kernels

Torres Carvajal, Raul Ernesto

Titel:	Comparison of Compiler's Intermediate Representations and Input/Output Access Patterns with String Kernels
Sonstige Titel:	Vergleich von intermediären Darstellungen eines Compilers und Eingabe-/Ausgabe-Mustern mit String-Kernels
Sprache:	Englisch
Autor*in:	Torres Carvajal, Raul Ernesto
Schlagwörter:	String Kernels; Code Comparison; I/O Access Patterns Comparison; Compiler's Intermediate Representations
Erscheinungsdatum:	2018
Tag der mündlichen Prüfung:	2018-11-09
Zusammenfassung:	Kernel methods aim for the detection of stable patterns robustly and efficiently from a finite data sample by embedding data items into a space of higher dimensionality where data points have linear relations. Strings kernels apply this methodology to find relationships between string objects by checking for the number of shared substrings and using this measure as a similarity score. Due to the low number of studies conducted in the area of code comparison and Input/Output (I/O) access pattern recognition using string kernels, the goal of this thesis is to propose a suitable, general representation, as well as the corresponding strategies of comparison based on kernel methods, such that they can be used successfully to determine a reliable similarity measure among a collection of programs. Therefore, we propose different conversion strategies from these original sources to a weighted string representation; the defined representation is a collection of tokens whose weights allow the modulation of the contribution of each token in the calculation of the overall similarity. The resulting strings are compared with a new family of kernel functions, which correspond to the major contribution of this thesis: the kastx spectrum kernel family. In order to create a similarity measure among two strings, these kernels are based on the longest common substrings; the idea behind this approach is to give more relevance to the largest common pieces of code rather than to small and disperse code instructions. The size of the valid matching substrings is controlled by the cut weight, a parameter given by the user, that specifies the minimum weight that those substrings should have. We tested our methodology in two scenarios: i) pattern recognition in I/O traces, and ii) comparison of intermediate representations of a compiler. In the first scenario, the clustering analysis showed that the proposed kernels managed to conform clusters that reflected the similarity of patterns taken from two popular I/O benchmarks. For the second scenario, a set of C functions was organized in four different classes, according to their purpose; clustering analysis here also showed a cluster organization that reflected the affinity among functions of the same class. These new kernels obtained similar, and in some cases, better results when compared to the blended spectrum kernel, a string kernel with widespread use in cheminformatics problems. The provided kernels will enrich the spectra of available string kernel functions on the literature and might be used in the future in similarity studies, not only in the field of computer science, but also in other areas like cheminformatics, bioinformatics or natural language processing.
URL:	https://ediss.sub.uni-hamburg.de/handle/ediss/7948
URN:	urn:nbn:de:gbv:18-94439
Dokumenttyp:	Dissertation
Betreuer*in:	Ludwig, Thomas (Prof. Dr.)
Enthalten in den Sammlungen:	Elektronische Dissertationen und Habilitationen

Dateien zu dieser Ressource:

Datei	Beschreibung	Prüfsumme	Größe	Format
Dissertation.pdf		d0fb0a7e16c044b20a453fd6736ced08	4.31 MB	Adobe PDF	Öffnen/Anzeigen

Zur Langanzeige

Diese Publikation steht in elektronischer Form im Internet bereit und kann gelesen werden. Über den freien Zugang hinaus wurden durch die Urheberin / den Urheber keine weiteren Rechte eingeräumt. Nutzungshandlungen (wie zum Beispiel der Download, das Bearbeiten, das Weiterverbreiten) sind daher nur im Rahmen der gesetzlichen Erlaubnisse des Urheberrechtsgesetzes (UrhG) erlaubt. Dies gilt für die Publikation sowie für ihre einzelnen Bestandteile, soweit nichts Anderes ausgewiesen ist.

Info

Seitenansichten

286

Letzte Woche

Letzten Monat

geprüft am 19.04.2024

Download(s)

211

Letzte Woche

Letzten Monat

geprüft am 19.04.2024

Werkzeuge

Google Scholar^TM

Prüfe

Dateien zu dieser Ressource:

Seitenansichten

Download(s)

Google ScholarTM

Google Scholar^TM