231

Enhancement of Random Forests Using Trees with Oblique Splits

Parfionovas, Andrejus 01 May 2013 (has links)
This work presents an enhancement to the classification tree algorithm that forms the basis for Random Forests. Unlike classical tree-based methods, which focus on one variable at a time to separate the observations, the new algorithm searches for the best split in two-dimensional space using a linear combination of variables. Besides classification, the method can be used to detect variable interactions and perform feature extraction. Theoretical investigations and numerical simulations were used to analyze the properties and performance of the new approach. Comparisons with other popular classification methods were performed using simulated and real data examples. The algorithm was implemented as an extension package for the statistical computing environment R and is available for free download under the GNU General Public License.
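
The abstract describes the search idea but not its implementation, which ships as an R package; purely as an illustration, the Python sketch below scans a grid of angles for the linear combination of two variables whose best threshold minimizes weighted Gini impurity. The function names, the angle grid, and the exhaustive threshold scan are assumptions, not the package's actual code.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array (0 = pure node)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_oblique_split(x1, x2, y, n_angles=36):
    """Scan angles t; split on z = x1*cos(t) + x2*sin(t) <= threshold."""
    best = (np.inf, None, None)  # (weighted impurity, angle, threshold)
    for t in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        z = x1 * np.cos(t) + x2 * np.sin(t)
        for thr in np.unique(z):
            left, right = y[z <= thr], y[z > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, t, thr)
    return best  # axis-aligned splits are the special cases t = 0 and t = pi/2
```
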
232

Combined Optimization for Discontinuous Production with an Undefined Quality Criterion

Schulz, Thomas, Nekrasov, Ivan 27 January 2022 (has links)
This work addresses a real batch-production case from the pharmaceutical industry. The problem under investigation lies in optimizing batch quality and minimizing scrap, given that the relevant quality parameters are not measured in the enterprise control system. The technique proposed here introduces a virtual quality criterion, applied to each batch, based on the users' limited knowledge of which batch can be regarded as optimal (also called the golden batch) and can therefore serve as the reference for the batch currently in production. For this purpose we use the classical integral performance criterion, widely applied in the theory of optimal control of dynamical systems, to measure how far the current state of the system is from the 'optimal' point. Using this technique, borrowed from that neighboring discipline, we were able to quantify the quality of each batch as a continuous measure, which allowed us to apply several efficient continuous analysis techniques to this initially discrete batch-production case.
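
As a rough illustration of such an integral performance criterion (not the authors' exact formulation; the abstract does not specify how process variables are weighted), one can score a batch by the time-integrated squared deviation of its trajectory from the golden-batch reference:

```python
import numpy as np

def virtual_quality(batch, golden, dt=1.0):
    """Integral squared deviation of a batch trajectory from the golden
    batch; both inputs are arrays of shape (timesteps, n_variables).
    Lower score = closer to the 'optimal' batch. Equal sampling and
    unit weighting of all variables are simplifying assumptions."""
    deviation = batch - golden
    return np.trapz(np.sum(deviation ** 2, axis=1), dx=dt)
```

A continuous score of this kind is what allows standard continuous analysis techniques to be applied to a process whose true quality labels are never measured.
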
233

A Machine Learning Based Visible Light Communication Model Leveraging Complementary Color Channel

Jiang, Ruizhe 08 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Unobtrusive Visible Light Communication (VLC) over screen-camera channels has recently gained great popularity. It overcomes the inherent drawbacks of traditional approaches based on coded images such as barcodes. One popular unobtrusive method uses the alpha channel or color channels to encode bits into pixel translucency or color-intensity changes with off-the-shelf smart devices. In particular, Uber-in-light proved to be a successful model that encodes data into color-intensity changes and requires only off-the-shelf devices. However, Uber-in-light exploits only Multi-Frequency Shift Keying (MFSK), which limits the overall throughput of the system, since each data segment is only 3 digits long. Motivated by previous works such as Inframe++ and Uber-in-light, this thesis proposes a new VLC model that encodes data into color-intensity changes on the red and blue channels of video frames. Multi-Phase Shift Keying (MPSK) along with MFSK is used to map 4-digit and 5-digit data segments to specific transmission frequencies and phases. To ensure transmission accuracy, a modified correlation-based demodulation method and two learning-based methods using SVM and Random Forest are also developed.
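
To make the modulation scheme concrete, here is a minimal sketch of frequency/phase keying on a color channel. The frame rate, symbol alphabets, modulation depth, and all names are illustrative assumptions rather than the thesis's actual parameters:

```python
import numpy as np

FPS = 30                                        # assumed screen frame rate
FREQS = [1, 2, 3, 4]                            # assumed MFSK alphabet (Hz)
PHASES = [0, np.pi / 2, np.pi, 3 * np.pi / 2]   # assumed MPSK alphabet

def modulate_segment(freq_sym, phase_sym, n_frames=30, depth=0.02):
    """Per-frame intensity offsets for one data segment: a small sinusoid
    added to the red channel and subtracted from the complementary blue
    channel so the flicker stays unobtrusive to viewers."""
    t = np.arange(n_frames) / FPS
    return depth * np.sin(2 * np.pi * FREQS[freq_sym] * t + PHASES[phase_sym])

def demodulate(trace):
    """Correlation-based receiver: match the recovered channel trace
    against every (frequency, phase) candidate and pick the best."""
    t = np.arange(len(trace)) / FPS
    scores = [(f_i, p_i, np.dot(trace, np.sin(2 * np.pi * f * t + p)))
              for f_i, f in enumerate(FREQS)
              for p_i, p in enumerate(PHASES)]
    return max(scores, key=lambda s: s[2])[:2]  # (freq symbol, phase symbol)
```
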
234

Anthrax Event Detection: Analysis of Public Opinion Using Twitter During Anthrax Scares, The Mueller Investigation, and North Korean Threats

Miller, Michele E. January 2020 (has links)
No description available.
235

Application of pattern recognition and adaptive DSP methods for spatio-temporal analysis of satellite based hydrological datasets

Turlapaty, Anish Chand 01 May 2010 (has links)
Data assimilation of satellite-based observations of hydrological variables into full numerical physics models can be used to downscale these observations from coarse to high resolution, improving microwave sensor-based soil moisture observations. Assimilation can also be used to predict related hydrological variables; for example, precipitation products can be assimilated into a land information system to estimate soil moisture. High-quality spatio-temporal observations of these processes are vital for successful assimilation, which in turn requires their detailed analysis and improvement. In this research, pattern recognition and adaptive signal processing methods are developed for the spatio-temporal analysis and enhancement of soil moisture and precipitation datasets. These methods are applied to accomplish the following tasks: (i) a consistency analysis of level-3 soil moisture data from the Advanced Microwave Scanning Radiometer – EOS (AMSR-E) against in-situ soil moisture measurements from the USDA Soil Climate Analysis Network (SCAN). This method assesses the consistency of each entire time series in relation to the others and provides a spatial distribution of consistency levels. The methodology combines wavelet-based feature extraction with a one-class support vector machine (SVM) classifier. Spatial distributions of consistency levels are presented as consistency maps for a region including the states of Mississippi, Arkansas, and Louisiana. These results correlate well with the spatial distributions of average soil moisture and the cumulative counts of dense vegetation; (ii) a modified singular spectrum analysis based interpolation scheme is developed and validated on several geophysical data products, including GODAE's high-resolution sea surface temperature (GHRSST). This method is then employed to fill the systematic gaps in the level-3 AMSR-E soil moisture dataset; (iii) a combination of artificial neural networks and a vector-space transformation function is used to fuse several high-resolution precipitation products (HRPPs). The final merged product is statistically superior to any of the individual datasets over a seasonal period. The results were tested against ground-based measurements of rainfall over our study area; the average accuracies obtained are 85% in summer and 55% in winter 2007.
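
As an illustration of task (i), a consistency check of this kind can be sketched with wavelet sub-band energies as features and a one-class SVM trained on trusted series. The library choices (PyWavelets, scikit-learn), the wavelet, and the `nu` value are assumptions, not the dissertation's settings:

```python
import numpy as np
import pywt                                # PyWavelets
from sklearn.svm import OneClassSVM

def wavelet_features(series, wavelet="db4", level=4):
    """Energy of each wavelet sub-band as a compact descriptor of a
    1-D soil moisture time series (a NumPy array)."""
    coeffs = pywt.wavedec(series, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

def consistency_scores(reference_series, test_series):
    """Fit a one-class SVM on features of trusted (e.g., SCAN in-situ)
    series, then score satellite pixels by how inlier-like they are;
    higher decision values = more consistent with the references."""
    X_ref = np.array([wavelet_features(s) for s in reference_series])
    model = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit(X_ref)
    X_test = np.array([wavelet_features(s) for s in test_series])
    return model.decision_function(X_test)
```
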
236

A Strategy Oriented, Machine Learning Approach to Automatic Quality Assessment of Wikipedia Articles

De La Calzada, Gabriel 01 April 2009 (has links) (PDF)
This work discusses an approach to modeling and measuring the information quality of Wikipedia articles. The approach is based on the idea that the quality of Wikipedia articles with distinctly different profiles needs to be measured using different information quality models. To implement this approach, a software framework written in the Java language was developed to collect and analyze information about Wikipedia articles. We report on our initial study, which involved two categories of Wikipedia articles: "stabilized" (those whose content has not undergone major changes for a significant period of time) and "controversial" (those that have undergone vandalism, revert wars, or whose content is subject to internal discussions between Wikipedia editors). In addition, we present simple information quality models and compare their performance on a subset of Wikipedia articles against information quality evaluations provided by human users. Our experiment shows that using special-purpose models for information quality captures user sentiment about Wikipedia articles better than using a single model for both categories of articles.
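
The study's framework is written in Java; purely as a sketch of the special-purpose-model idea, the Python snippet below fits one quality model per article profile instead of a single global one. The feature representation, the choice of regressor, and all names are hypothetical:

```python
from sklearn.ensemble import RandomForestRegressor

def train_per_category_models(features, quality_scores, categories):
    """One model per article profile ('stabilized', 'controversial'),
    so each category's quality is judged by its own criteria."""
    models = {}
    for cat in set(categories):
        idx = [i for i, c in enumerate(categories) if c == cat]
        X = [features[i] for i in idx]
        y = [quality_scores[i] for i in idx]
        models[cat] = RandomForestRegressor(n_estimators=100).fit(X, y)
    return models

def predict_quality(models, article_features, category):
    """Route an article to the model matching its profile."""
    return models[category].predict([article_features])[0]
```
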
237

Towards General Mental Health Biomarkers : Machine Learning Analysis of Multi-Disorder EEG Data

Talekar, Akshay 17 April 2023 (has links)
Several studies have used EEG features to detect specific mental illnesses such as epilepsy or schizophrenia, as a supplement to the usual symptom-based diagnoses. At the same time, general mental health diagnostic tools (biomarker- or symptom-based) for identifying individuals who are manifesting early signs of mental health disorders are not commonly available. This thesis explores the potential use of EEG features as a biomarker-based tool for general mental health diagnosis. Specifically, it investigates the ability of machine learning models, applied to a general biomarker derived from EEG readings elicited by an oddball auditory experiment, to predict a person's mental health status (mentally ill or healthy). Given that mindfulness exercises are regularly provided as treatment for a wide range of mental illnesses, the features of interest seek to quantify mindfulness as a measure of mental health. The two feature sets developed and tested in this study were derived from a dataset of traumatic brain injury (TBI) patients and healthy controls. These feature sets were further tested on the Bipolar and Schizophrenia Network on Intermediate Phenotypes (BSNIP) dataset, which contains multiple mental illnesses and healthy controls, to assess their generalizability. Feature Set 1 consists of the average and variance of P300 and N200 ERP component peak amplitudes and latencies across the centro-parietal and fronto-central EEG channels, respectively. Feature Set 2 contains the average and variance of P300 and N200 ERP component mean amplitudes across the centro-parietal and fronto-central EEG channels, respectively. To test the predictive ability of these two feature sets, logistic regression, support vector machine, decision tree, random forest, and KNN classification algorithms were used, with random forest and KNN also combined with oversampling, to predict the mental health status of the subjects (whether they were cases or healthy controls). Model performance was evaluated using accuracy, precision, sensitivity, specificity, F1 score, confusion matrices, and the AUC of the ROC curve. The results of this thesis show promise for the use of EEG features as biomarkers to diagnose mental illnesses or to gain a better understanding of mental wellness. This technology opens doors to more accurate, biomarker-based diagnosis of mental health conditions, lowering the cost of mental health care and making it accessible to more people.
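
As a sketch of how such ERP features can be computed from an averaged, stimulus-locked trace (the sampling rate and the P300/N200 search windows below are illustrative assumptions, not the thesis's settings):

```python
import numpy as np

def erp_peak_features(erp, fs=256,
                      windows=((0.25, 0.50, +1),    # P300: positive peak
                               (0.18, 0.30, -1))):  # N200: negative peak
    """Peak amplitude and latency of the P300 and N200 components from a
    single-channel averaged ERP trace (a NumPy array time-locked to the
    oddball stimulus). Window bounds are in seconds after the stimulus."""
    feats = []
    for start, end, sign in windows:
        lo, hi = int(start * fs), int(end * fs)
        seg = sign * erp[lo:hi]          # flip so the peak is a maximum
        i = np.argmax(seg)
        feats += [sign * seg[i],         # amplitude at the peak
                  (lo + i) / fs]         # latency in seconds
    return np.array(feats)
```

Feature Set 1 would then be built by averaging such peak amplitudes and latencies across the relevant channel groups.
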
238

Modeling Acute Care Utilization for Insomnia Patients

Zitong Zhu (16629747) 30 August 2023 (has links)
Machine learning (ML) models can help improve health care services. However, they need to be practical to gain wide adoption. This study proposes a methodology for evaluating the utility of different data modalities and cohort segmentation strategies when designing such models. The methodology is used to compare models that predict emergency department (ED) and inpatient hospital (IH) visits. The data modalities include socio-demographics, diagnoses, and medications; cohort segmentation is based on age group and disease severity. The proposed methodology is applied to models developed for a cohort of insomnia patients and a cohort of general non-insomnia patients under different data modalities and segmentation strategies. All models are evaluated using traditional intra-cohort testing. In addition, to establish the need for disease-specific segmentation, transfer testing is recommended, in which the same insomnia test patients used for intra-cohort testing are submitted to the general-patient model. The results indicate that using both diagnoses and medications as data sources does not generally improve model performance and may increase its overhead. For insomnia patients, the best ED and IH models using both data modalities, or either one of them, achieved areas under the receiver operating characteristic curve (AUC) of 0.71 and 0.78, respectively. Our results also show that an insomnia-specific model is not necessary when predicting future ED visits but may have merit when predicting IH visits. As such, we recommend the evaluation of disease-specific models using transfer testing. Based on these initial findings, a language model was pretrained using diagnosis codes; this model can be used to predict future ED and IH visits for insomnia and non-insomnia patients.
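
A minimal sketch of the recommended transfer testing, with logistic regression standing in for the study's actual models and all names hypothetical: both models are scored on the same insomnia test set, and a small AUC gap argues against needing a disease-specific model.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def transfer_test(X_ins_train, y_ins_train, X_gen_train, y_gen_train,
                  X_ins_test, y_ins_test):
    """Compare a disease-specific model with a general-patient model on
    the SAME insomnia test set (intra-cohort AUC vs. transfer AUC)."""
    specific = LogisticRegression(max_iter=1000).fit(X_ins_train, y_ins_train)
    general = LogisticRegression(max_iter=1000).fit(X_gen_train, y_gen_train)
    auc_intra = roc_auc_score(y_ins_test,
                              specific.predict_proba(X_ins_test)[:, 1])
    auc_transfer = roc_auc_score(y_ins_test,
                                 general.predict_proba(X_ins_test)[:, 1])
    return auc_intra, auc_transfer
```
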
239

On the Use of the Kantorovich-Rubinstein Distance for Dimensionality Reduction

Giordano, Gaël 13 September 2023 (has links)
The goal of this thesis is to study the use of the Kantorovich-Rubinstein distance to build a descriptor of sample complexity in classification problems. The idea is to exploit the fact that the Kantorovich-Rubinstein distance is a metric on the space of measures that also takes into account the geometry and topology of the underlying metric space. We associate a measure to each class of points and study the geometrical information that can be obtained from the Kantorovich-Rubinstein distance between those measures. We show that a large Kantorovich-Rubinstein distance between those measures allows us to conclude that there exists a 1-Lipschitz classifier that separates the classes of points well. We also discuss the limitations of the Kantorovich-Rubinstein distance as a descriptor.
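
For intuition, the distance between two empirical class measures is straightforward to compute; below is a sketch using the POT (Python Optimal Transport) library, with uniform weights on the sample points as an assumption:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def kr_distance(class_a, class_b):
    """Kantorovich-Rubinstein (Wasserstein-1) distance between the
    empirical measures of two point classes, given as (n, d) and
    (m, d) NumPy arrays in the same metric space."""
    n, m = len(class_a), len(class_b)
    a = np.full(n, 1.0 / n)                            # uniform weights
    b = np.full(m, 1.0 / m)
    M = ot.dist(class_a, class_b, metric="euclidean")  # ground-metric costs
    return ot.emd2(a, b, M)                            # exact LP solution
```

By Kantorovich-Rubinstein duality, a large value of this distance certifies the existence of a well-separating 1-Lipschitz function, which is the property the thesis exploits.
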
240

Design and Maintenance of Event Forecasting Systems

Muthiah, Sathappan 26 March 2021 (has links)
With significant growth in modern forms of communication such as social media and microblogs, we are able to gain a real-time understanding of events happening in many parts of the world. In addition, these modern forms of communication have helped shed light on increasing instabilities across the world via the design of anticipatory intelligence systems [45, 43, 20] that can forecast population-level events like civil unrest and disease occurrences with reasonable accuracy. Event forecasting systems are generally prone to becoming outdated (model drift) as they fail to keep up with constantly changing patterns, and thus require regular re-training to sustain their accuracy and reliability. In this dissertation we address some of the issues associated with the design and maintenance of event forecasting systems in general. We propose, and show performance results for, a drift adaptation technique for event forecasting systems, and we also build a hybrid system for event coding that is cognizant of uncertain prediction contexts and seeks human intervention in them, maintaining a good balance between prediction fidelity and the cost of human effort. Specifically, we identify several micro-tasks for event coding and build separate pipelines for each, with uncertainty estimation capabilities, so that human feedback can be sought whenever required for each micro-task independently of the rest. / Doctor of Philosophy / Event forecasting systems help reduce violence and loss of or damage to people and property. They find applicability in supply chain management, prioritizing citizen grievances, designing measures to control violence and minimize disruptions, and also in applications like health and tourism by providing timely travel alerts. Several issues exist with the design and maintenance of such event forecasting systems. Predictions from such systems may drift away from ground reality over time if the systems are not adapted in real time to shifts in event occurrence patterns. A continuous source of ground-truth events is of paramount necessity for the continuous maintenance of forecasting systems. However, the ground-truth events used for training may not be reliable, and information about their uncertainty is often not reflected in the systems used to build the ground truth. This dissertation focuses on addressing such issues pertaining to the design and maintenance of event forecasting systems. We propose a framework for online drift adaptation and also build machine learning methods capable of modeling and capturing uncertainty in event detection systems. Finally, we propose and build a hybrid event coding system that captures the best of both automated and manual event coders. We break down the overall event coding pipeline into several micro-tasks and propose individual methods for each. Each method is built with the capability to know what it doesn't know and is thus able to balance quality against throughput based on available human resources.
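
A minimal sketch of the confidence gating used to decide when a micro-task should defer to a human coder; the threshold value and the scikit-learn-style classifier interface are assumptions:

```python
import numpy as np

def route_predictions(model, X, threshold=0.8):
    """Gate one micro-task's output on model confidence: accept
    high-confidence labels automatically and flag the rest for a
    human coder. Returns (labels, needs-review mask); raising the
    threshold trades throughput for quality."""
    proba = model.predict_proba(X)   # any fitted scikit-learn classifier
    labels = np.argmax(proba, axis=1)
    needs_human = np.max(proba, axis=1) < threshold
    return labels, needs_human
```
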
