Global ETD Search

1	Active evaluation of predictive models Sawade, Christoph January 2012 The field of machine learning studies algorithms that infer predictive models from data. Predictive models are applicable to many practical tasks such as spam filtering, face and handwritten digit recognition, and personalized product recommendation. In general, they are used to predict a target label for a given data instance. In order to make an informed decision about the deployment of a predictive model, it is crucial to know the model’s approximate performance. To evaluate performance, a set of labeled test instances is required that is drawn from the distribution the model will be exposed to at application time. In many practical scenarios, unlabeled test instances are readily available, but the process of labeling them can be a time- and cost-intensive task and may involve a human expert. This thesis addresses the problem of evaluating a given predictive model accurately with minimal labeling effort. We study an active model evaluation process that selects certain instances of the data according to an instrumental sampling distribution and queries their labels. We derive sampling distributions that minimize estimation error with respect to different performance measures such as error rate, mean squared error, and F-measures. An analysis of the distribution that governs the estimator leads to confidence intervals, which indicate how precise the error estimation is. Labeling costs may vary across different instances depending on certain characteristics of the data. For instance, documents differ in their length, comprehensibility, and technical requirements; these attributes affect the time a human labeler needs to judge relevance or to assign topics. To address this, the sampling distribution is extended to incorporate instance-specific costs.
We empirically study conditions under which the active evaluation processes are more accurate than a standard estimate that draws equally many instances from the test distribution. We also address the problem of comparing the risks of two predictive models. The standard approach would be to draw instances according to the test distribution, label the selected instances, and apply statistical tests to identify significant differences. Drawing instances according to an instrumental distribution affects the power of a statistical test. We derive a sampling procedure that maximizes test power when used to select instances, and thereby minimizes the likelihood of choosing the inferior model. Furthermore, we investigate the task of comparing several alternative models; the objective of an evaluation could be to rank the models according to the risk that they incur or to identify the model with lowest risk. An experimental study shows that the active procedure leads to higher test power than the standard test in many application domains. Finally, we study the problem of evaluating the performance of ranking functions, which are used, for example, in web search. In practice, ranking performance is estimated by applying a given ranking model to a representative set of test queries and manually assessing the relevance of all retrieved items for each query. We apply the concepts of active evaluation and active comparison to ranking functions and derive optimal sampling distributions for the commonly used performance measures Discounted Cumulative Gain and Expected Reciprocal Rank. Experiments on web search engine data illustrate significant reductions in labeling costs. / Machine learning is concerned with algorithms for inferring predictive models from complex data. Predictive models are functions that assign an application-specific target attribute, such as "spam" or "not spam", to an input, such as the text of an email.
They are used in spam filtering, in text and face recognition, and in personalized product recommendation. To deploy a model in practice, it is necessary to estimate its predictive quality with respect to the future application. This evaluation requires instances of the input space for which the corresponding target attribute is known. Instances such as emails, images, or the logged behavior of customers are often available in large quantities. Determining the corresponding target attributes, however, is a manual process that can be costly and time-consuming and sometimes requires specialized expertise. The goal of this thesis is to estimate the predictive quality of a given model accurately with a minimal number of test instances. We study active evaluation processes that use a probability distribution to select the instances for which the target attribute is determined. Predictive quality can be measured by various criteria, such as the error rate, the mean squared loss, or the F-measure. We derive the probability distributions that minimize the estimation error with respect to a given measure. The remaining estimation error can be quantified by confidence intervals that follow from the distribution of the estimator. In many applications, individual properties of the instances determine the cost of obtaining the target attribute. Documents, for example, differ in text length and technical difficulty. These properties affect the time needed to assign possible target attributes such as topic or relevance. We derive the optimal distribution taking these instance-specific differences into account. The evaluation methods developed are studied on several data sets.
In this context, we analyze conditions under which active evaluation yields more accurate estimates than the standard approach, in which instances are drawn at random from the test distribution. A related problem is the comparison of two models. To determine which model has higher predictive quality in practice, a set of test instances is selected and the corresponding target attribute is determined. A subsequent statistical test permits statements about the significance of the observed differences. The test power depends on the distribution according to which the instances were selected. We determine the distribution that maximizes test power and thereby minimizes the probability of choosing the worse model. Furthermore, we show how the approach can be used to compare several models. We show empirically that, in many applications, the active evaluation method achieves higher test power than random selection of test instances. In the last part of the thesis, the concepts of active evaluation and active model comparison are applied to ranking problems. We derive the optimal distributions for estimating the quality measures Discounted Cumulative Gain and Expected Reciprocal Rank. An empirical study on the evaluation of search engines shows that the newly developed methods yield significantly more accurate estimates of ranking quality than the reference methods examined. Active Evaluation Predictive Models Machine Learning Error Estimation Statistical Tests Data processing Computer science
2	AI-Based Transport Mode Recognition for Transportation Planning Utilizing Smartphone Sensor Data From Crowdsensing Campaigns Grubitzsch, Philipp, Werner, Elias, Matusek, Daniel, Stojanov, Viktor, Hähnel, Markus 11 May 2023 Utilizing smartphone sensor data from crowdsensing (CS) campaigns for transportation planning (TP) requires highly reliable transport mode recognition. To address this, we present our RNN-based AI model MovDeep, which works on GPS, accelerometer, magnetometer and gyroscope data. It was trained on 92 hours of labeled data. MovDeep predicts six transportation modes (TM) on one-second time windows. A novel postprocessing step further improves the prediction results. We present a validation methodology (VM), which simulates unknown context, to obtain a more realistic estimate of the real-world performance (RWP). We explain why existing work overestimates prediction quality when applied to CS data and why its results are not comparable with each other. With the introduced VM, MovDeep still achieves a 99.3% F1-score on six TM. We confirm the very good RWP for our model on unknown context with the Sussex-Huawei Locomotion data set. For future model comparison, both publicly available data sets can be used with our VM. In the end, we compare MovDeep to a deterministic approach as a baseline for an average performing model (82-88% RWP recall) on a CS data set of 540k tracks, to show the significant negative impact of even small prediction errors on TP. info:eu-repo/classification/ddc/004 ddc:004
3	Forecasting the data cube Lehner, Wolfgang, Fischer, Ulrike, Schildt, Christopher, Hartmann, Claudio 12 January 2023 Forecasting time series data is crucial in a number of domains such as supply chain management and display advertisement. In these areas, the time series data to forecast is typically organized along multiple dimensions, leading to a high number of time series that need to be forecasted. Most current approaches focus only on selecting and optimizing a forecast model for a single time series. In this paper, we explore how we can utilize time series at different dimensions to increase forecast accuracy and, optionally, reduce model maintenance overhead. Solving this problem is challenging due to the large space of possibilities and potentially high model creation costs. We propose a model configuration advisor that automatically determines the best set of models, a model configuration, for a given multi-dimensional data set. Our approach is based on a general process that iteratively examines more and more models and simultaneously controls the search space depending on the data set, model type and available hardware. The final model configuration is integrated into F2DB, an extension of PostgreSQL, that processes forecast queries and maintains the configuration as new data arrives. We comprehensively evaluated our approach on real and synthetic data sets. The evaluation shows that our approach significantly increases forecast query accuracy while ensuring low model costs. info:eu-repo/classification/ddc/004 ddc:004
4	Exploiting big data in time series forecasting: A cross-sectional approach Lehner, Wolfgang, Hartmann, Claudio, Hahmann, Martin, Rosenthal, Frank 12 January 2023 Forecasting time series data is an integral component for management, planning and decision making. Following the Big Data trend, large amounts of time series data are available from many heterogeneous data sources in more and more application domains. The highly dynamic and often fluctuating character of these domains, in combination with the logistical problems of collecting such data from a variety of sources, imposes new challenges on forecasting. Traditional approaches rely heavily on extensive and complete historical data to build time series models and are thus no longer applicable if time series are short or, even more importantly, intermittent. In addition, large numbers of time series have to be forecasted on different aggregation levels with preferably low latency, while forecast accuracy should remain high. This is almost impossible when keeping the traditional focus on creating one forecast model for each individual time series. In this paper we tackle these challenges by presenting a novel forecasting approach called cross-sectional forecasting. This method is specifically designed for Big Data sets with a multitude of time series. Our approach breaks with existing concepts by creating only one model for a whole set of time series and requiring only a fraction of the available data to provide accurate forecasts. By utilizing available data from all time series of a data set, missing values can be compensated and accurate forecasting results can be calculated quickly on arbitrary aggregation levels. info:eu-repo/classification/ddc/005 ddc:005
5	F2DB: The Flash-Forward Database System Lehner, Wolfgang, Fischer, Ulrike, Rosenthal, Frank 29 November 2022 Forecasts are important to decision-making and risk assessment in many domains. Since current database systems do not provide integrated support for forecasting, it is usually done outside the database system by specially trained experts using forecast models. However, integrating model-based forecasting as a first-class citizen inside a DBMS speeds up the forecasting process by avoiding exporting the data and by applying database-related optimizations like reusing created forecast models. In particular, it allows subsequent processing of forecast results inside the database. In this demo, we present our prototype F2DB based on PostgreSQL, which allows for transparent processing of forecast queries. Our system automatically takes care of model maintenance when the underlying dataset changes. In addition, we offer optimizations to save maintenance costs and increase accuracy by using derivation schemes for multidimensional data. Our approach reduces the required expert knowledge by enabling arbitrary users to apply forecasting in a declarative way. info:eu-repo/classification/ddc/005 ddc:005
