Global ETD Search

41	Joint Models for the Association of Longitudinal Binary and Continuous Processes With Application to a Smoking Cessation Trial Liu, Xuefeng, Daniels, Michael J., Marcus, Bess 01 June 2009 (has links) Joint models for the association of a longitudinal binary and a longitudinal continuous process are proposed for situations in which their association is of direct interest. The models are parameterized such that the dependence between the two processes is characterized by unconstrained regression coefficients. Bayesian variable selection techniques are used to parsimoniously model these coefficients. A Markov chain Monte Carlo (MCMC) sampling algorithm is developed for sampling from the posterior distribution, using data augmentation steps to handle missing data. Several technical issues are addressed to implement the MCMC algorithm efficiently. The models are motivated by, and are used for, the analysis of a smoking cessation clinical trial in which an important question of interest was the effect of the (exercise) treatment on the relationship between smoking cessation and weight gain. calibrated posterior predictive p-value data augmentation dependence joint models Markov chain Monte Carlo parameter expansion stochastic search variable selection
42	Entwicklung eines Monte-Carlo-Verfahrens zum selbständigen Lernen von Gauß-Mischverteilungen Lauer, Martin 03 March 2005 (has links) In der Arbeit wird ein neuartiges Lernverfahren für Gauß-Mischverteilungen entwickelt. Es basiert auf der Technik der Markov-Chain Monte-Carlo Verfahren und ist in der Lage, in einem Zuge die Größe der Mischverteilung sowie deren Parameter zu bestimmen. Das Verfahren zeichnet sich sowohl durch eine gute Anpassung an die Trainingsdaten als auch durch eine gute Generalisierungsleistung aus. Ausgehend von einer Beschreibung der stochastischen Grundlagen und einer Analyse der Probleme, die beim Lernen von Gauß-Mischverteilungen auftreten, wird in der Abeit das neue Lernverfahren schrittweise entwickelt und seine Eigenschaften untersucht. Ein experimenteller Vergleich mit bekannten Lernverfahren für Gauß-Mischverteilungen weist die Eignung des neuen Verfahrens auch empirisch nach. Gaussian mixture models Markov-Chain Monte-Carlo Data Augmentation unsupervised learning 54.72 - Künstliche Intelligenz 28 - Informatik, Datenverarbeitung ddc:004
43	Statistical Inference for Multivariate Stochastic Differential Equations Liu, Ge 15 November 2019 (has links) No description available. Statistics data imputation Bayesian data augmentation method Bayesian MCMC pseudo marginal MCMC stochastic process
44	Bayesian Regression Trees for Count Data: Models and Methods Geels, Vincent M. 27 September 2022 (has links) No description available. Statistics Discrete state spaces Markov chain Monte Carlo regression trees count data data augmentation Bayesian statistics response variable transformation
45	Investigation of Green Strawberry Detection Using R-CNN with Various Architectures Rivers, Daniel W 01 March 2022 (has links) (PDF) Traditional image processing solutions have been applied in the past to detect and count strawberries. These methods typically involve feature extraction followed by object detection using one or more features. Some object detection problems can be ambiguous as to what features are relevant and the solutions to many problems are only fully realized when the modern approach has been applied and tested, such as deep learning. In this work, we investigate the use of R-CNN for green strawberry detection. The object detection involves finding regions of interest (ROIs) in field images using the selective segmentation algorithm and inputting these regions into a pre-trained deep neural network (DNN) model. The convolutional neural networks VGG, MobileNet and ResNet were implemented to detect subtle differences between green strawberries and various background elements. Downscaling factors, intersection over union (IOU) thresholds and non-maxima suppression (NMS) values can be tweaked to increase recall and reduce false positives while data augmentation and negative hardminging can be used to increase the amount of input data. The state of the art model is sufficient in locating the green strawberries with an overall model accuracy of 74%. The R-CNN model can then be used for crop yield prediction to forecast the actual red strawberry count one week in advance with a 90% accuracy. Deep Learning Image Processing Selective Segmentation Data Augmentation Crop Yield Prediction Small Fruit Detection
46	Club Head Tracking : Visualizing the Golf Swing with Machine Learning Herbai, Fredrik January 2023 (has links) During the broadcast of a golf tournament, a way to show the audience what a player's swing looks like would be to draw a trace following the movement of the club head. A computer vision model can be trained to identify the position of the club head in an image, but due to the high speed at which professional players swing their clubs coupled with the low frame rate of a typical broadcast camera, the club head is not discernible whatsoever in most frames. This means that the computer vision model is only able to deliver a few sparse detections of the club head. This thesis project aims to develop a machine learning model that can predict the complete motion of the club head, in the form of a swing trace, based on the sparse club head detections. Slow motion videos of golf swings are collected, and the club head's position is annotated manually in each frame. From these annotations, relevant data to describe the club head's motion, such as position and time parameters, is extracted and used to train the machine learning models. The dataset contains 256 annotated swings of professional and competent amateur golfers. The two models that are implemented in this project are XGBoost and a feed forward neural network. The input given to the models only contains information in specific parts of the swing to mimic the pattern of the sparse detections. Both models learned the underlying physics of the golf swing, and the quality of the predicted traces depends heavily on the amount of information provided in the input. In order to produce good predictions with only the amount of input information that can be expected from the computer vision model, a lot more training data is required. The traces predicted by the neural network are significantly smoother and thus look more realistic than the predictions made by the XGBoost model. Golf Machine learning Neural network XGBoost Interpolation Deep learning Data collection Data augmentation Computer Sciences Datavetenskap (datalogi)
47	Techniques for Multilingual Document Retrieval for Open-Domain Question Answering : Using hard negatives filtering, binary retrieval and data augmentation / Tekniker för flerspråkig dokumenthämtning för OpenQA : Använder hård negativ filtrering, binär sökning och dataförstärkning Lago Solas, Carlos January 2022 (has links) Open Domain Question Answering (OpenQA) systems find an answer to a question from a large collection of unstructured documents. In this information era, we have an immense amount of data at our disposal. However, filtering all the content and trying to find the answers to our questions can be too time-consuming and ffdiicult. In addition, in such a globalised world, the information we look for to answer a question may be in a different language. Current research is focused on improving monolingual (English) OpenQA performance. This creates a disparity between the tools accessible between English and non-English speakers. The techniques explored in this study involve the combination of different methods, such as data augmentation and hard negative filtering for performance increase, and binary embeddings for improving the efficiency, with multilingual Transformers. The downstream performance is evaluated using sentiment multilingual datasets covering Cross-Lingual Transfer (XLT), question and answer in the same language, and Generalised Cross-Lingual Transfer (G-XLT), different languages for question and answer. The results show that data augmentation increased Recall by 37.0% and Mean Average Precision (MAP) by 67.0% using languages absent from the test set for XLT. Combining binary embeddings and hard negatives can reduce inference time and index size to 12.5% and 3.1% of the original, retaining 97.1% of the original Recall and 94.8% of MAP (averages of XLT and MAP). / Open Domain Question Answering (OpenQA)-system hittar svar på frågor till stora samlingar av ostrukturerade dokument. I denna informationsepok har vi en enorm mängd kunskap till vårt förfogande. Att filtrera allt innehåll för att försöka att hitta svar på våra frågor kan dock vara mycket tidskrävande och svårt. I en globaliserad värld kan informationen vi söker för att besvara en fråga dessutom vara på ett annat språk. Nuvarande forskning är primärt inriktad på att förbättra OpenQA:s enspråkiga (engelska) prestanda. Detta skapar ett gap mellan de verktyg som är tillgängliga för engelsktalande och icke-engelsktalande personer. De tekniker som undersöks i den här studien innebär en kombination av olika metoder, t.ex. dataförstärkning och hård negativ filtrering för att öka prestandan, och binära embeddings för att förbättra effektiviteten med flerspråkiga Transformatorer. Prestandan nedströms utvärderas med hjälp av flerspråkiga dataset som omfattar Cross-Lingual Transfer (XLT), fråga och svar på samma språk, och Generalised Cross-Lingual Transfer (G-XLT), olika språk för fråga och svar. Resultaten visar att dataförstärkning ökade recall med 37.0% och 67.0% för Mean Average Precision (MAP) med hjälp av språk som inte fanns med i testuppsättningen för XLT. Genom att kombinera binära embeddings och hårda negationer kan man minska tiden för inferens och indexstorleken till 12.5% och 3.1% av originalet, samtidigt som man behåller 97.1% av ursprunglig recall samt 94.8% av MAP (medelvärden av XLT och MAP). OpenQA Multilingual Transformers Document retrieval Data augmentation. OpenQA Flerspråkiga Transformatorer Dokumenthämtning Dataförstärkning. Computer Sciences Datavetenskap (datalogi) Computer Engineering Datorteknik
48	Data augmentation for latent variables in marketing Kao, Ling-Jing 13 September 2006 (has links) No description available. Business Administration, Marketing marketing Bayesian statistics data augmentation state-space models choice models consumer preference change media effects
49	Explainable AI in Eye Tracking / Förklarbar AI inom ögonspårning Liu, Yuru January 2024 (has links) This thesis delves into eye tracking, a technique for estimating an individual’s point of gaze and understanding human interactions with the environment. A blossoming area within eye tracking is appearance-based eye tracking, which leverages deep neural networks to predict gaze positions from eye images. Despite its efficacy, the decision-making processes inherent in deep neural networks remain as ’black boxes’ to humans. This lack of transparency challenges the trust human professionals place in the predictions of appearance-based eye tracking models. To address this issue, explainable AI is introduced, aiming to unveil the decision-making processes of deep neural networks and render them comprehensible to humans. This thesis employs various post-hoc explainable AI methods, including saliency maps, gradient-weighted class activation mapping, and guided backpropagation, to generate heat maps of eye images. These heat maps reveal discriminative areas pivotal to the model’s gaze predictions, and glints emerge as of paramount importance. To explore additional features in gaze estimation, a glint-free dataset is derived from the original glint-preserved dataset by employing blob detection to eliminate glints from each eye image. A corresponding glint-free model is trained on this dataset. Cross-evaluations of the two datasets and models discover that the glint-free model extracts complementary features (pupil, iris, and eyelids) to the glint-preserved model (glints), with both feature sets exhibiting comparable intensities in heat maps. To make use of all the features, an augmented dataset is constructed, incorporating selected samples from both glint-preserved and glint-free datasets. An augmented model is then trained on this dataset, demonstrating a superior performance compared to both glint-preserved and glint-free models. The augmented model excels due to its training process on a diverse set of glint-preserved and glint-free samples: it prioritizes glints when of high quality, and adjusts the focus to the entire eye in the presence of poor glint quality. This exploration enhances the understanding of the critical factors influencing gaze prediction and contributes to the development of more robust and interpretable appearance-based eye tracking models. / Denna avhandling handlar om ögonspårning, en teknik för att uppskatta en individs blickpunkt och förstå människors interaktioner med miljön. Ett viktigt område inom ögonspårning är bildbaserad ögonspårning, som utnyttjar djupa neuronnät för att förutsäga blickpositioner från ögonbilder. Trots dess effektivitet förblir beslutsprocesserna i djupa neuronnät som ”svarta lådor” för människor. Denna brist på transparens utmanar det förtroende som yrkesverksamma sätter i förutsägelserna från bildbaserade ögonspårningsmodeller. För att ta itu med detta problem introduceras förklarbar AI, med målet att avslöja beslutsprocesserna hos djupa neuronnät och göra dem begripliga för människor. Denna avhandling använder olika efterhandsmetoder för förklarbar AI, inklusive saliency maps, gradient-weighted class activation mapping och guidad backpropagation, för att generera värmekartor av ögonbilder. Dessa värmekartor avslöjar områden som är avgörande för modellens blickförutsägelser, och ögonblänk framstår som av yttersta vikt. För att utforska ytterligare funktioner i blickuppskattning, härleds ett dataset utan ögonblänk från det ursprungliga datasetet genom att använda blobdetektering för att eliminera blänk från varje ögonbild. En motsvarande blänkfri modell tränas på detta dataset. Korsutvärderingar av de två datamängderna och modellerna visar att den blänkfria modellen tar fasta på kompletterande särdrag (pupill, iris och ögonlock) jämfört med den blänkbevarade modellen, men båda modellerna visar jämförbara intensiteter i värmekartorna. För att utnyttja all information konstrueras ett förstärkt dataset, som inkorporerar utvalda exempel från både blänkbevarade och blänkfria dataset. En förstärkt modell tränas sedan på detta dataset, och visar överlägsen prestanda jämfört med de båda andra modellerna. Den förstärkta modellen utmärker sig på grund av sin träning på en mångfaldig uppsättning av exempel med och utan blänk: den prioriterar blänk när de är av hög kvalitet och justerar fokuset till hela ögat vid dålig blänkkvalitet. Detta arbete förbättrar förståelsen för de kritiska faktorerna som påverkar blickförutsägelse och bidrar till utvecklingen av mer robusta och tolkningsbara modeller för bildbaserad ögonspårning. Eye Tracking Explainable AI Post-hoc Explanation Data Augmentation Ögonspårning Förklarbar AI Efterhandsmetoder Datatillväxt Computer and Information Sciences Data- och informationsvetenskap
50	Data Quality Evaluation and Improvement for Machine Learning Chen, Haihua 05 1900 (has links) In this research the focus is on data-centric AI with a specific concentration on data quality evaluation and improvement for machine learning. We first present a practical framework for data quality evaluation and improvement, using a legal domain as a case study and build a corpus for legal argument mining. We first created an initial corpus with 4,937 instances that were manually labeled. We define five data quality evaluation dimensions: comprehensiveness, correctness, variety, class imbalance, and duplication, and conducted a quantitative evaluation on these dimensions for the legal dataset and two existing datasets in the medical domain for medical concept normalization. The first group of experiments showed that class imbalance and insufficient training data are the two major data quality issues that negatively impacted the quality of the system that was built on the legal corpus. The second group of experiments showed that the overlap between the test datasets and the training datasets, which we defined as "duplication," is the major data quality issue for the two medical corpora. We explore several widely used machine learning methods for data quality improvement. Compared to pseudo-labeling, co-training, and expectation-maximization (EM), generative adversarial network (GAN) is more effective for automated data augmentation, especially when a small portion of labeled data and a large amount of unlabeled data is available. The data validation process, the performance improvement strategy, and the machine learning framework for data evaluation and improvement discussed in this dissertation can be used by machine learning researchers and practitioners to build high-performance machine learning systems. All the materials including the data, code, and results will be released at: https://github.com/haihua0913/dissertation-dqei. Data quality data-centric AI machine learning data augmentation transfer learning semi-supervised learning medical concept normalization legal text classification Information Science

Search results