Global ETD Search

1	Analyse von webbasierten eGovernment-Anwendungen hinsichtlich der Optimierung von Suchmechanismen mit Methoden der Automatischen Sprachverarbeitung [Analysis of web-based eGovernment applications with regard to optimizing search mechanisms using natural language processing methods] Mairif, Patrick 20 October 2017 The website of the city of Leipzig, as a representative web-based eGovernment application, is analyzed with regard to a possible optimization of its search mechanisms. First, the initial situation is examined in terms of available data and existing problems. A hypothesis is formulated about the differing language use of the editorial staff and of the users. The analysis is based on the documents of the website and on the users' search queries. Technical terms are extracted from the documents with the ConceptComposer and compared with the search queries. Conversely, the search queries are compared with the documents of the website, and terms specific to the users' language use are identified. The analysis treats multi-word terms in depth and examines various methods of natural language processing. In direct connection with the analysis, tools were developed that make it possible to port the analysis to other environments. A simple framework for integrating methods for computing synonyms was created, which builds on the data obtained, and ways of generating an 'Amt-Bürgersprache' (administrative language to citizen language) dictionary are shown. info:eu-repo/classification/ddc/000 ddc:000
2	Time Dynamic Topic Models Jähnichen, Patrick 30 March 2016 Information extraction from large corpora can be a useful tool for many applications in industry and academia. For instance, political communication science has only recently begun to exploit the massive amounts of information available through the Internet and the computational tools that natural language processing provides. We give a linguistically motivated interpretation of topic modeling, a state-of-the-art algorithm for extracting latent semantic sets of words from large text corpora, and extend this interpretation to cover issues and issue-cycles as theoretical constructs from political communication science. We build on a dynamic topic model, a model whose semantic sets of words are allowed to evolve over time governed by a Brownian motion stochastic process, and apply a new form of analysis to its results. This analysis is based on the notion of volatility, that is, the rate of change of stocks or derivatives as known from econometrics. We claim that the rate of change of sets of semantically related words can be interpreted as issue-cycles, with the word sets describing the underlying issue. Generalizing the existing work, we introduce dynamic topic models that are driven by general Gaussian processes, a family of stochastic processes defined by the function that determines their covariance structure; Brownian motion is a special case of our model. We use the above assumption and apply a certain class of covariance functions to allow for an appropriate rate of change in word sets while preserving the semantic relatedness among words. Applying our findings to a large newspaper data set, the New York Times Annotated corpus (all articles between 1987 and 2007), we are able to identify sub-topics in time, which we call time-localized topics, and find patterns in their behavior over time.
However, we have to drop the assumption of semantic relatedness over all available time for any one topic. Time-localized topics are consistent in themselves but do not necessarily share semantic meaning with one another. They can, however, be interpreted as capturing the notion of issues, and their behavior as capturing that of issue-cycles. Topic Models, Machine Learning, Bayesian Models, Time Series Analysis, Natural Language Processing ddc:500
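The statement that Brownian motion is a special case of a Gaussian process can be illustrated with a short sketch. The code below is not the thesis's model; it only draws zero-mean sample paths from two covariance functions. The Brownian covariance k(s, t) = min(s, t) is standard. The squared-exponential kernel, its length scale, the time grid, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def brownian_kernel(s, t):
    """Covariance of standard Brownian motion: k(s, t) = min(s, t)."""
    return np.minimum(s[:, None], t[None, :])

def rbf_kernel(s, t, length_scale=1.0):
    """Squared-exponential covariance (illustrative choice, not from the source)."""
    diff = s[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def sample_gp_paths(kernel, times, n_paths, rng):
    """Draw zero-mean Gaussian process sample paths at the given time points."""
    cov = kernel(times, times)
    # A small jitter on the diagonal keeps the Cholesky factorisation stable.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(len(times)))
    noise = rng.standard_normal((len(times), n_paths))
    return (chol @ noise).T  # shape: (n_paths, len(times))

rng = np.random.default_rng(0)
times = np.linspace(0.1, 2.0, 20)  # hypothetical time grid
bm_paths = sample_gp_paths(brownian_kernel, times, 5, rng)
rbf_paths = sample_gp_paths(rbf_kernel, times, 5, rng)
```

Swapping the kernel changes how quickly a path may drift from its earlier values, which is the kind of control over the rate of change that the abstract attributes to the choice of covariance function.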
3	Time Dynamic Topic Models Jähnichen, Patrick 22 March 2016 Information extraction from large corpora can be a useful tool for many applications in industry and academia. For instance, political communication science has only recently begun to exploit the massive amounts of information available through the Internet and the computational tools that natural language processing provides. We give a linguistically motivated interpretation of topic modeling, a state-of-the-art algorithm for extracting latent semantic sets of words from large text corpora, and extend this interpretation to cover issues and issue-cycles as theoretical constructs from political communication science. We build on a dynamic topic model, a model whose semantic sets of words are allowed to evolve over time governed by a Brownian motion stochastic process, and apply a new form of analysis to its results. This analysis is based on the notion of volatility, that is, the rate of change of stocks or derivatives as known from econometrics. We claim that the rate of change of sets of semantically related words can be interpreted as issue-cycles, with the word sets describing the underlying issue. Generalizing the existing work, we introduce dynamic topic models that are driven by general Gaussian processes, a family of stochastic processes defined by the function that determines their covariance structure; Brownian motion is a special case of our model. We use the above assumption and apply a certain class of covariance functions to allow for an appropriate rate of change in word sets while preserving the semantic relatedness among words. Applying our findings to a large newspaper data set, the New York Times Annotated corpus (all articles between 1987 and 2007), we are able to identify sub-topics in time, which we call time-localized topics, and find patterns in their behavior over time.
However, we have to drop the assumption of semantic relatedness over all available time for any one topic. Time-localized topics are consistent in themselves but do not necessarily share semantic meaning with one another. They can, however, be interpreted as capturing the notion of issues, and their behavior as capturing that of issue-cycles. info:eu-repo/classification/ddc/500 ddc:500
