Global ETD Search

31	Multi-Class Imbalanced Learning for Time Series Problem : An Industrial Case Study Andersson, Melanie January 2020 (has links) Classification problems with multiple classes and imbalanced sample sizes present a different challenge from binary classification problems. Methods have been proposed to handle imbalanced learning; however, most of them are specifically designed for binary classification problems. Multi-class imbalance imposes additional challenges when applied to time series classification problems, such as weather classification. In this thesis, we introduce, apply and evaluate a new algorithm for handling multi-class imbalanced problems involving time series data. Our proposed algorithm is designed to handle both multi-class imbalance and time series classification problems and is inspired by the Imbalanced Fuzzy-Rough Ordered Weighted Average Nearest Neighbor Classification algorithm. The feasibility of our proposed algorithm is studied through an empirical evaluation performed on a telecom use-case at Ericsson, Sweden, where data from commercial microwave links is used for weather classification. Our proposed algorithm is compared to the model currently used at Ericsson, which is a one-dimensional convolutional neural network, as well as three other deep learning models. The empirical evaluation indicates that the performance of our proposed algorithm for weather classification is comparable to that of the current solution. Our proposed algorithm and the current solution are the two best performing models of the study. Time Series Imbalanced Learning Weather Classification Microwave Link Multi-Class Classification Engineering and Technology Teknik och teknologier
32	Purchase Probability Prediction : Predicting likelihood of a new customer returning for a second purchase using machine learning methods Alstermark, Olivia, Stolt, Evangelina January 2021 (has links) When a company evaluates a customer for being a potential prospect, one of the key questions to answer is whether the customer will generate profit in the long run. A possible step to answer this question is to predict the likelihood of the customer returning to the company again after the initial purchase. The aim of this master thesis is to investigate the possibility of using machine learning techniques to predict the likelihood of a new customer returning for a second purchase within a certain time frame. To investigate to what degree machine learning techniques can be used to predict probability of return, a number of different model setups of Logistic Lasso, Support Vector Machine and Extreme Gradient Boosting are tested. Model development is performed to ensure well-calibrated probability predictions and to possibly overcome the difficulty that follows from an imbalanced ratio of returning and non-returning customers. Throughout the thesis work, a number of actions are taken in order to account for data protection. One such action is to add noise to the response feature, ensuring that the true fraction of returning and non-returning customers cannot be derived. To further guarantee data protection, axes values of evaluation plots are removed and evaluation metrics are scaled. Nevertheless, it is perfectly possible to select the superior model out of all investigated models. The results obtained show that the best performing model is a Platt calibrated Extreme Gradient Boosting model, which has much higher performance than the other models with regards to considered evaluation metrics, while also providing predicted probabilities of high quality. Further, the results indicate that the setups investigated to account for imbalanced data do not improve model performance. 
The main conclusion is that it is possible to obtain probability predictions of high quality for new customers returning to a company for a second purchase within a certain time frame, using machine learning techniques. This provides a powerful tool for a company when evaluating potential prospects. Purchase Probability Prediction Machine Learning Models Well-Calibrated Probabilities Imbalanced Data Data Protection Mathematics Matematik
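Entry 32 names Platt calibration as the step that turns classifier scores into usable probabilities. The following is only a minimal sketch of that idea, not the thesis authors' implementation: it fits a logistic curve p = sigmoid(a*s + b) to raw scores by plain gradient descent. The scores and labels are hypothetical toy values, the function names are made up for illustration, and the sketch omits the held-out calibration set and target smoothing that Platt's original procedure uses.

```python
import math

def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * s + b) to (score, label) pairs by
    gradient descent on the mean log-loss. Simplified sketch."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s   # d(log-loss)/da for one sample
            grad_b += (p - y)       # d(log-loss)/db for one sample
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def platt_prob(score, a, b):
    """Map a raw classifier score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Hypothetical raw scores (e.g. boosted-tree margins) and outcomes
# (1 = customer returned, 0 = did not return).
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [platt_prob(s, a, b) for s in scores]
```

In practice the sigmoid would be fitted on data the classifier did not see during training, so that the calibration does not inherit the classifier's overconfidence on its own training set.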
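33	Quasiparticle excitations in FeSe in the vicinity of BCS-BEC crossover studied by thermal transport measurements / FeSe単結晶における熱輸送係数の測定 Watashige, Tatsuya 23 March 2017 (has links) Kyoto University / 0048 / New-system course doctorate / Doctor of Science / Kō No. 20166 / Rihaku No. 4251 / 新制\|\|理\|\|1611 (University Library) / Division of Physics and Astronomy, Graduate School of Science, Kyoto University / (Chief examiner) Prof. Yuji Matsuda, Prof. Norio Kawakami, Prof. Yoshiteru Maeno / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Science / Kyoto University / DFAM Iron-based superconductor BCS-BEC crossover Thermal transport measurements 400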
34	Comparative Data Analytic Approach for Detection of Diabetes Sood, Radhika January 2018 (has links) No description available. Information Technology Data mining Diabetes Clustering K-fold cross-validation Imbalanced data Decision-support tool
35	Classification of imbalanced disparate medical data using ontology / Klassificering av Obalanserad Medicinsk Data med Ontologier Karlsson, Ludvig, Wilhelm Kopp Sundin, Gustav January 2023 (has links) Through the digitization of healthcare, large volumes of data are generated and stored in healthcare operations. Today, a multitude of platforms and digital infrastructures are used for storage and management of data. The systems lack a common ontology, which limits the interoperability between datasets. Limited interoperability impacts various areas of healthcare, for instance sharing of data between entities and the possibilities for aggregated machine learning research incorporating distributed data. This study examines how a random forest classifier performs on two datasets consisting of phase III clinical trial studies on small-cell lung cancer where the datasets do not share a common ontology. The performance is then compared to the same classifier’s performance on one dataset consisting of a combination of the two aforementioned sets where a common ontology is implemented. The study does not show unambiguous results indicating that a single ontology yields better performance for the random forest classifier. In addition, the conditions of entities within primary care in Sweden for undergoing a transition to a new platform for storage of data are discussed together with areas for future research. / As a result of the digitization of healthcare, large volumes of data are generated, stored, and used in operations. Today, a variety of platforms are used for storing and managing data. The systems lack a common ontology, which limits interoperability between the datasets. Poor interoperability affects various areas of healthcare, for example the sharing of data between care providers and the possibilities for research at an aggregated level where machine learning is used. 
This study examines how a random forest classifier performs on two datasets consisting of phase III clinical trials of small-cell lung cancer, where the datasets do not share a common ontology. The performance is then compared with the same classifier's performance on one dataset formed by joining the two aforementioned datasets, in which a common ontology has been implemented. The study does not show unambiguous results indicating that a common or non-common ontology yields better performance for a random forest classifier. Furthermore, the prerequisites and requirements of the change process for a transition to the platform proposed by Centrum för Datadriven Hälsa are discussed from the perspective of a primary care clinic. Ontology machine learning random forest imbalanced data oncology digital transformation Computer and Information Sciences Data- och informationsvetenskap
36	IMBALANCED TIME SERIES FORECASTING AND NEURAL TIME SERIES CLASSIFICATION Chen, Xiaoqian 01 August 2023 (has links) (PDF) This dissertation will focus on the forecasting and classification of time series. Specifically, the forecasting problem will focus on imbalanced time series (ITS) which contain a mix of low probability extreme observations and high probability normal observations. Two approaches are proposed to improve the forecasting of ITS. In the first approach proposed in chapter 2, an ITS will be modelled as a composition of normal and extreme observations, the input predictor variables and the associated forecast output will be combined into moving blocks, and the blocks will be categorized as extreme event (EE) or normal event (NE) blocks. Imbalance will be decreased by oversampling the minority EE blocks and undersampling the majority NE blocks using modifications of block bootstrapping and synthetic minority oversampling technique (SMOTE). Convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) will be selected for forecast modelling. In the second approach described in chapter 3, which focuses on improving the forecasting accuracy of LSTM models, a training strategy called Circular-Shift Circular Epoch Training (CSET) is proposed to preserve the natural ordering of observations in epochs during training without any attempt to balance the extreme and normal observations. The strategy will be universal because it could be applied to train LSTMs to forecast events in normal time series or in imbalanced time series in exactly the same manner. The CSET strategy will be formulated for both univariate and multivariate time series forecasting. The classification problem will focus on the classification of event-related potential (ERP) neural time series by exploiting information offered by the cone of influence (COI) of the continuous wavelet transform (CWT). 
The COI is a boundary that is superimposed on the wavelet scalogram to delineate the coefficients that are accurate from those that are inaccurate due to edge effects. The features derived from the inaccurate coefficients are, therefore, unreliable. It is hypothesized that the classifier performance would improve if unreliable features, which are outside the COI, are zeroed out, and the performance would improve even further if those features are cropped out completely. Two CNN multidomain models will be introduced to fuse the multichannel Z-scalograms and the V-scalograms. In the first multidomain model, referred to as the Z-CuboidNet, the input to the CNN will be generated by fusing the Z-scalograms of the multichannel ERPs into a frequency-time-spatial cuboid. In the second multidomain model, referred to as the V-MatrixNet, the CNN input will be formed by fusing the frequency-time vectors of the V-scalograms of the multichannel ERPs into a frequency-time-spatial matrix. CNN cone of influence Deep learning imbalanced time series forecasting LSTM neural time series classification
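The first approach in entry 36 (moving blocks labelled EE or NE, then rebalanced) can be illustrated with a small sketch. This is not the dissertation's method: plain random resampling stands in for its modified block bootstrapping and SMOTE, the rule that a block is "extreme" when the absolute forecast target exceeds a threshold is an assumption, and the series, window width, horizon, and function names are hypothetical.

```python
import random

def make_blocks(series, width, horizon, threshold):
    """Slide a window over the series. Each block pairs `width` inputs
    with the value `horizon` steps after the window, and is labelled
    'EE' if that target is extreme (assumed rule: |target| > threshold)."""
    blocks = []
    for start in range(len(series) - width - horizon + 1):
        inputs = series[start:start + width]
        target = series[start + width + horizon - 1]
        label = "EE" if abs(target) > threshold else "NE"
        blocks.append((inputs, target, label))
    return blocks

def rebalance(blocks, rng):
    """Oversample the EE blocks (with replacement) and undersample the
    NE blocks (without replacement) to the same size. Assumes EE is the
    minority class, as in the dissertation's setting."""
    ee = [blk for blk in blocks if blk[2] == "EE"]
    ne = [blk for blk in blocks if blk[2] == "NE"]
    if not ee or not ne:
        return list(blocks)          # nothing to balance
    size = (len(ee) + len(ne)) // 2
    ee_over = [rng.choice(ee) for _ in range(size)]
    ne_under = rng.sample(ne, min(size, len(ne)))
    return ee_over + ne_under

# Hypothetical series with two extreme spikes among normal values.
series = [0.1, 0.2, 0.1, 5.0, 0.2, 0.1, 0.3, 0.2, 6.0, 0.1, 0.2, 0.1]
blocks = make_blocks(series, width=3, horizon=1, threshold=1.0)
balanced = rebalance(blocks, random.Random(0))
```

Resampling whole blocks rather than single observations keeps each input window intact, which matters because a forecasting model consumes the window as an ordered sequence.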
37	Multi-Class Classification for Predicting Customer Satisfaction : Application of machine learning methods to predict customer satisfaction at IKEA Backerholm, Stina, Börjesjö, Malin January 2023 (has links) Gaining a comprehensive understanding of the features that contribute to customer satisfaction after contact with IKEA’s Remote Customer Meeting Points (RCMPs) is essential for implementing effective remedial measures in the future. The aim of this project is to investigate if it is possible to find key features that influence customer satisfaction and to use these to predict customer satisfaction. The task has been approached as a multi-class classification problem, with the objective of classifying the observations into five distinct levels of customer satisfaction. The study utilized three models, Multinomial Logistic Regression, Random Forest, and Extreme Gradient Boosting, to investigate these possibilities. Based on the methods used and the available data, the results indicate that it is currently not feasible to accurately identify key features or predict customer satisfaction. / Understanding which factors contribute to customer satisfaction after contact with IKEA's RCMPs is crucial for implementing effective measures in the future. The aim of this project is to investigate whether it is possible to find key factors that influence customer satisfaction and to use them to predict customer satisfaction. The task has been approached as a multi-class classification problem, with the aim of classifying the observations into five levels of customer satisfaction. The study evaluated three models, Multinomial Logistic Regression, Random Forest, and Extreme Gradient Boosting, to investigate these possibilities. Based on the methods used and the available data, the results indicate that it is currently not possible to identify key factors or predict customer satisfaction with high accuracy. 
Multi-Class Classification Imbalanced Data Machine Learning Multi-Klass Klassifisering Obalanserat Data Maskininlärning Mathematics Matematik
38	Overcoming the Curse of Missing and Noisy Data in Computational Drug Design Meng, Fanwang January 2022 (has links) Machine learning (ML) has enjoyed great success in chemistry and drug design, from designing synthetic pathways to drug screening, to biomolecular property predictions, etc. However, an ML model's generalizability and robustness require high-quality training data, which is often difficult to obtain, especially when the training data is acquired from experimental measurements. While one can always discard all data associated with noisy and/or missing values, this often results in discarding invaluable data. This thesis presents mathematical techniques to solve this problem and applies them to problems in molecular medicinal chemistry. In Chapter 1, we indicate that the missing-data problem can be expressed as a matrix completion problem, and we point out how frequently matrix completion problems arise in (bio)chemical problems. Next, we use matrix completion to impute the missing values in protein-NMR data, and use this as a stepping-stone for understanding protein allostery in Chapter 2. This chapter also uses several other techniques from statistical data analysis and machine learning, including denoising (from robust principal component analysis), latent feature identification from singular-value decomposition, and residue clustering by a Gaussian mixture model. In Chapter 3, Δ-learning is used to predict the free energies of hydration (Δ𝐺). The aim of this study is to correct estimated hydration energies from low-level quantum chemistry calculations using continuum solvation models without significant additional computation. Extensive feature engineering, 8 different regression algorithms, and Gaussian process regression (38 different kernels) were used to construct the predictive models. The optimal model gives an MAE of 0.6249 kcal/mol and an RMSE of 1.0164 kcal/mol. 
Chapter 4 provides an open-source computational tool Procrustes to find the maximum similarities between metrics. Some examples are also given to show how to use Procrustes for chemical and biological problems. Finally, in Chapters 5 and 6, a database for permeability of the blood-brain barrier (BBB) was curated, and combined with resampling strategies to form predictive models. The resulting models have promising performance and are released along with a computational tool B3clf for its evaluation. / Thesis / Doctor of Science (PhD) machine learning computational drug design matrix completion missing value Δ-Learning imbalanced learning
39	A Comparison of Rule Extraction Techniques with Emphasis on Heuristics for Imbalanced Datasets Singh, Manjeet 22 September 2010 (has links) No description available. Industrial Engineering Ecological Datasets Imbalanced dataset modeling Artificial Neural Networks Surface Generation Non-linear modeling
40	Deep Convolutional Neural Networks for Multiclassification of Imbalanced Liver MRI Sequence Dataset Trivedi, Aditya January 2020 (has links) Application of deep learning in radiology has the potential to automate workflows, support radiologists with decision support, and provide patients a logic-based algorithmic assessment. Unfortunately, medical datasets are often not uniformly distributed due to a naturally occurring imbalance. For this research, a multi-classification of liver MRI sequences for imaging of hepatocellular carcinoma (HCC) was conducted on a highly imbalanced clinical dataset using a deep convolutional neural network. We compared four multi-class classifiers: Model A and Model B (both trained using imbalanced training data), Model C (trained using augmented training images) and Model D (trained using undersampled training images). Data augmentation, such as 45-degree rotation and horizontal and vertical flips, and random undersampling were performed to tackle class imbalance. HCC, the third most common cause of cancer-related mortality [1], can be diagnosed with high specificity using Magnetic Resonance Imaging (MRI) with the Liver Imaging Reporting and Data System (LI-RADS). Each individual MRI sequence reveals different characteristics that are useful to determine likelihood of HCC. We developed a deep convolutional neural network for the multi-classification of imbalanced MRI sequences that will aid when building a model to apply LI-RADS to diagnose HCC. Radiologists use these MRI sequences to help them identify specific LI-RADS features. Classifying the sequences automatically helps automate some of the LI-RADS process, and further applications of machine learning to LI-RADS will likely depend on automatic sequence classification as a first step. 
Our study included an imbalanced dataset of 193,868 images containing 10 MRI sequences: in-phase (IP) chemical shift imaging, out-phase (OOP) chemical shift imaging, T1-weighted post contrast imaging (C+, C-, C-C+), fat suppressed T2 weighted imaging (T2FS), T2 weighted imaging, Diffusion Weighted Imaging (DWI), Apparent Diffusion Coefficient map (ADC) and In phase/Out of phase (IPOOP) imaging. Models A, B, C and D achieved macro-average F1 scores of 0.97, 0.96, 0.95 and 0.93 respectively. Model A showed higher classification scores than the models trained using data augmentation and undersampling. / Thesis / Master of Science (MSc) Imbalanced Medical Imaging Deep Learning Convolutional Neural Networks Clinical Decision Support Radiology
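Entry 40 reports macro-average F1, a metric that weights every class equally regardless of how many samples it has. The sketch below shows how that score is computed and why it is stricter than accuracy on imbalanced data. The labels are hypothetical toy values, not data from the thesis, and `macro_f1` is an illustrative helper rather than any particular library's function.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class that appears
    in either the true or the predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)

# Hypothetical imbalanced toy set: a classifier that always predicts
# the majority sequence type.
y_true = ["T2"] * 8 + ["DWI"] * 2
y_pred = ["T2"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Here the always-majority classifier reaches 80% accuracy, yet its macro-average F1 is only about 0.44 because the minority class contributes an F1 of zero with the same weight as the majority class.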
