Global ETD Search

221	Wine quality prediction model using machine learning techniques Kothawade, Rohan Dilip January 2021 The quality of a wine is important for consumers as well as for the wine industry. The traditional (expert-based) way of measuring wine quality is time-consuming. Nowadays, machine learning models are important tools for replacing human tasks. In this case, several features are available to predict wine quality, but not all of them are relevant for better prediction. Our thesis work therefore focuses on which wine features are important for obtaining a promising result. To build the classification models and evaluate the relevant features, we used three algorithms: support vector machine (SVM), naïve Bayes (NB), and artificial neural network (ANN). In this study, we used two wine quality datasets, red wine and white wine. To evaluate feature importance we used the Pearson correlation coefficient, and to compare the machine learning algorithms we used performance metrics such as accuracy, recall, precision, and F1 score. A grid search algorithm was applied to improve model accuracy. Finally, we found that the artificial neural network (ANN) gives better prediction results than the support vector machine (SVM) and naïve Bayes (NB) algorithms for both the red wine and white wine datasets. Classification support vector machine naïve Bayes artificial neural network Information Systems, Social aspects
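The feature-relevance step in entry 221 can be illustrated with a small sketch. It ranks features by the absolute Pearson correlation with the quality score. The feature names and values below are hypothetical toy data, not taken from the thesis, and the thesis may have applied the coefficient differently.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data (assumption, for illustration only).
toy_columns = {
    "alcohol": [9.0, 10.0, 11.0, 12.0],
    "volatile_acidity": [0.8, 0.6, 0.5, 0.3],
    "residual_sugar": [2.0, 1.9, 2.1, 2.0],
}
toy_quality = [4, 5, 6, 7]

ranking = rank_features(toy_columns, toy_quality)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute value keeps strongly negative correlations (here the toy acidity column) near the top, since they are as informative for a classifier as positive ones.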
222	Training Machine Learning-based QSAR models with Conformal Prediction on Experimental Data from DNA-Encoded Chemical Libraries Geylan, Gökçe January 2021 DNA-encoded chemical libraries (DELs) allow exhaustive sampling of chemical space with large-scale data consisting of compounds produced through combinatorial synthesis. This novel technology was utilized in the early drug discovery stages for robust hit identification and lead optimization. In this project, the aim was to build a machine learning-based QSAR model with conformal prediction for hit identification on the two different target proteins the DEL was assayed on. An initial investigation was conducted in a pilot project with 1000 compounds, and the analyses and conclusions drawn from this part were later applied to a larger dataset with 1.2 million compounds. With this classification model, the aim was to analyze the prediction of compound activity in the DEL as well as in an external dataset, identifying the top hits to evaluate the model’s performance and applicability. Support Vector Machine (SVM) and Random Forest (RF) models were built on both the pilot and the main datasets with different descriptor sets: Signature Fingerprints, RDKit and CDK. In addition, an autoencoder was used to supply data-driven descriptors on the pilot data. The Libsvm and Liblinear implementations were explored and compared based on the models’ performances. The comparisons considered the key concepts of conformal prediction, such as the trade-off between validity and efficiency, observed fuzziness, and calibration against a range of significance levels. The top hits were determined by two sorting methods: credibility, and the p-value difference between the binary classes.
The models were confirmed to assign correct single labels to the true actives over a wide range of significance levels, regardless of the similarity of the test compounds to the training set. Furthermore, an accumulation of these true actives in the models’ top hit selections was observed with the latter sorting method, and additional investigations of the similarity and the building block enrichments in the top 50 and 100 compounds were conducted. The Tanimoto similarity demonstrated the model’s predictive power in selecting structurally dissimilar compounds, while the building block enrichment analysis showed the selectivity of the binding pocket, where target protein B was determined to be more selective. Together, these comparison methods enabled an extensive study of model evaluation and performance. In conclusion, the Liblinear model with the Signature Fingerprints was found to give the best performance for both the pilot and the main datasets, considering both model performance and computational power requirements. However, prediction on an external set was not successful because of the low structural diversity in the DEL on which the model was trained. Machine Learning DNA-Encoded Chemical Library Support Vector Machine Random Forest Conformal Prediction QSAR Pharmaceutical Sciences Farmaceutiska vetenskaper
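The two sorting criteria named in entry 222, credibility and the p-value difference between the binary classes, can be sketched with the standard conformal p-value formula. This is a minimal illustration under stated assumptions: the nonconformity scores are hypothetical, calibration is done per class, and nothing here reproduces the thesis models or data.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the test score.

    The +1 terms count the test example itself, as in standard conformal prediction.
    """
    at_least = sum(1 for score in calibration_scores if score >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)

def prediction_set(p_values, significance):
    """Labels that cannot be rejected at the given significance level."""
    return {label for label, p in p_values.items() if p > significance}

# Hypothetical nonconformity scores per class (higher means stranger).
calibration = {
    "active": [0.1, 0.2, 0.3, 0.4],
    "inactive": [0.1, 0.2, 0.3, 0.4],
}
test_scores = {"active": 0.15, "inactive": 0.9}

p_values = {
    label: conformal_p_value(calibration[label], test_scores[label])
    for label in calibration
}
credibility = max(p_values.values())                              # first sorting criterion
p_value_difference = p_values["active"] - p_values["inactive"]    # second sorting criterion
single_label = prediction_set(p_values, significance=0.25)

print(p_values, credibility, p_value_difference, single_label)
```

A compound is a confident single-label prediction when exactly one label survives at the chosen significance level; lowering the significance level tends to enlarge the prediction set, which is the validity and efficiency trade-off the abstract mentions.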
223	Tyre sound classification with machine learning Jabali, Aghyad, Mohammedbrhan, Husein Abdelkadir January 2021 Having enough data about the usage of tyre types on the road can lead to a better understanding of the consequences of studded tyres for the environment. This paper focuses on training and testing a machine learning model that can be further integrated into a larger system for automating the data collection process. Different machine learning algorithms, namely CNN, SVM, and Random Forest, were compared in this experiment. The method used in this paper is empirical. First, sound data for studded and non-studded tyres was collected from three different locations in the city of Gävle, Sweden. A total of 760 Mel spectrograms from both classes was generated to train and test a well-known CNN model (AlexNet) in MATLAB. Sound features for both classes were extracted using JAudio to train and test models that use SVM and Random Forest classifiers in Weka. Unnecessary features were removed one by one from the list of features to improve the performance of the classifiers. The results show that CNN achieved an accuracy of 84%, SVM has the best performance both with and without removing some audio features (i.e., 94% and 92%, respectively), and Random Forest has 89% accuracy. The test data comprises 51% studded-class and 49% non-studded-class samples, and the SVM model achieved more than 94%. It can therefore be considered an acceptable result that can be used in practice. Sound Classification Machine learning Support vector machine (SVM) Convolutional Neural Network (CNN) Random Forest. Computer Sciences Datavetenskap (datalogi)
224	Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading / 拡散テンソル画像の複数パラメータを用いた神経膠腫の悪性度予測 Inano, Rika 23 March 2016 Kyoto University / 0048 / New system, course-based doctorate / Doctor of Medical Science / Kō No. 19616 / Medical Doctorate No. 4123 / 新制||医||1015 (University Library) / 32652 / Kyoto University Graduate School of Medicine, Division of Medicine / (Chief examiner) Professor 佐藤俊哉, Professor 富樫かおり, Professor 藤渕航 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Medical Science / Kyoto University / DFAM glioma grading diffusion tensor imaging voxel-based clustering self-organizing map K-means support vector machine 490
225	Klassificering av engagemangsnivå hos en samtalsdeltagare med hjälp av maskininlärning / Classification of interlocutor engagement using machine learning Ljung, Mikael, Månsson, Linnea January 2019 The work presented in this study is based on the long-term goal of developing a social robot that can take part in leading a conversation in a language café. In detail, the study investigated whether the engagement of a conversation participant can be classified from their facial expression and gaze, two factors that previous studies have shown to be central to human engagement. To perform the assessment, the software OpenFace extracted these parameters from a previous field study, and they were then processed with the machine learning model Support Vector Machine. After extensive hyperparameter tuning, the final model managed to predict engagement on a three-point scale with 54.5% accuracy. Furthermore, the study also examined the potential of the new technological paradigm that the social robot represents. The potential was analyzed on the basis of Dosi’s four dimensions: technological opportunities, appropriability of innovation, cumulativeness of technical advances, and properties of the knowledge base. The analysis clarifies that the paradigm has the potential to revolutionize a number of industries as a result of its technological opportunities and worldwide stakeholders, but also faces challenges in the form of technical and ethical difficulties. / The work presented in this study is grounded in the long-term goal of developing a social robot that can help lead conversation sessions at a language café. In detail, the study investigated whether the engagement of a conversation participant can be classified from their facial expressions and gaze direction, two factors that earlier studies have shown to be central to human engagement.
To carry out the assessment, the software OpenFace extracted these parameters from an earlier field study, and they were then processed with the machine learning model Support Vector Machine. After thorough attempts to find optimal hyperparameter values for the model, it finally managed to predict engagement on a three-point scale with 54.5% accuracy. Furthermore, the study also examined the potential of the new technological paradigm that the social robot constitutes. The potential was analyzed on the basis of Dosi’s four dimensions: technological opportunities, possible gains from innovation, cumulativeness of technological advances, and properties of the knowledge base. The analysis makes clear that the paradigm has the prerequisites to revolutionize several industries owing to its technological opportunities and worldwide stakeholders, but also faces challenges in the form of technical and ethical difficulties. Engagement Facial Action Units Machine learning OpenFace Paradigm Social robot Support Vector Machine. Computer and Information Sciences Data- och informationsvetenskap
226	Detecting Fraud in Affiliate Marketing: Comparative Analysis of Supervised Machine Learning Algorithms Ahlqvist, Oskar January 2023 Affiliate marketing has become a rapidly growing part of the digital marketing sector. However, fraud in affiliate marketing poses a serious threat to the trust and financial stability of the parties involved. This thesis investigates the performance of three supervised machine learning algorithms (random forest, logistic regression, and support vector machine) in detecting fraud in affiliate marketing. The objective is to answer the following main research question by answering two sub-questions: How much can Random Forest, Logistic Regression, and Support Vector Machine contribute to the detection of fraud in affiliate marketing? 1. How can the models be compared in an experiment? 2. How can they be optimized and applied within an affiliate marketing framework? To answer these questions, a dataset of transaction logs is analyzed in collaboration with an affiliate network company. The machine learning experiment employs k-fold cross-validation and the Area Under the ROC Curve (AUC-ROC) performance metric to evaluate the effectiveness of the classifiers in distinguishing fraudulent from non-fraudulent transactions. The results indicate that the random forest classifier performs best of the models, achieving the highest mean AUC of 0.7172. Furthermore, feature importance analysis demonstrates that each feature category had a different impact on the performance of the models. The models were found to compute different feature importances, meaning that some features had greater influence on specific models. By fine-tuning and optimizing the hyperparameters of each model, it is possible to enhance their performance. Despite certain limitations, such as time constraints, data availability, and security restrictions, this study highlights the potential of supervised machine learning algorithms.
Random forest in particular showed how it could be used to improve fraud detection capabilities in affiliate marketing. The insights contribute to closing the knowledge gap in comparing the effectiveness of various classification methods and their practical applications for fraud detection. Fraud detection Machine learning Random Forest support vector machine Logistic Regression Classification models Affiliate marketing Computer Sciences Datavetenskap (datalogi)
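The evaluation metric in entry 226, mean AUC over k folds, can be sketched as follows. AUC is computed here through its pairwise-ranking interpretation (the probability that a fraudulent example scores above a legitimate one). The fold labels and scores are hypothetical, and the thesis presumably relied on a library implementation rather than this hand-written one.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def mean_auc(folds):
    """Average AUC over the held-out predictions of each fold."""
    return sum(roc_auc(labels, scores) for labels, scores in folds) / len(folds)

# Hypothetical held-out (labels, scores) for two folds; 1 = fraudulent.
folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),
    ([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.2]),
]

fold_aucs = [roc_auc(labels, scores) for labels, scores in folds]
print(fold_aucs, mean_auc(folds))
```

Because AUC depends only on the ranking of scores, it is less sensitive to class imbalance than plain accuracy, which is a common reason to prefer it for fraud data.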
227	Predicting High-Cap Tech Stock Polarity: A Combined Approach using Support Vector Machines and Bidirectional Encoders from Transformers Grisham, Ian L 01 May 2023 The abundance, accessibility, and scale of data have engendered an era where machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One research area where many have applied these techniques is the stock market. Yet financial domains are influenced by many factors and are notoriously difficult to predict because of their volatile and multivariate behavior. However, the literature indicates that public sentiment data may exhibit significant predictive qualities and improve a model’s ability to predict intricate trends. In this study, momentum SVM classification accuracy was compared between datasets that did and did not contain features related to sentiment analysis. The results indicated that sentiment-containing datasets were typically better predictors, with improved model accuracy. However, the results did not reflect the improvements shown by similar research, and further research is required to determine the nature of the relationship between sentiment and higher model performance. Quantitative Finance Sentiment Analysis Support Vector Machine BERT Artificial Intelligence and Robotics Computational Linguistics Computer Sciences Data Science Finance Probability
228	Development of battery models for on-board health estimation in hybrid vehicles Riesco Refoyo, Javier January 2017 Following the positive reception of electric and hybrid transport solutions in the market, manufacturers keep developing their vehicles further while facing challenges taken on earlier. Knowing how lithium-ion batteries behave is still one of the key factors in the development of hybrid electric vehicles (HEVs), especially for the requirements of the battery management system during operation. Hence, this project focuses on the need for robust yet reasonably simple and cost-effective battery models for estimating the health status during operation of the vehicles. With this aim, the procedure and models to calculate the state-of-health (SOH) indicators, internal resistance and capacity, are proposed and the results discussed. Two machine-learning-based models are presented, a support vector machine (SVM) and a neural network (NN), together with one equivalent circuit model (ECM). The data used for training and validating the models comes from testing the batteries in the laboratory with standard performance tests and real driving cycles over the battery lifespan. However, data sets measured in actual heavy-duty vehicles during three years of operation are also analysed and compared. In this regard, a study of the battery materials, behaviour and operation attributes is carried out, highlighting the main aspects and issues that affect the development of the models. The inputs for the models are signals that can be measured on board the vehicles, such as current, voltage or temperature, and others derived from them, such as the state-of-charge (SOC) calculated by the internal battery management unit. Time series of the variables are used for simulation purposes.
The management of signals and the implementation of the models are done in the Matlab-Simulink environment, using some of its built-in functions and others developed specifically. The models are evaluated and compared by means of the normalized root mean squared error (NRMSE) of the voltage output profile relative to that of the tested batteries, and also by the error of the internal resistance computed from the voltage profile for the three models, and by the internal parameters in the case of the ECM. Although, despite the difficulties faced with the data, the models can eventually estimate the resistance accurately, the results of the capacity estimations are omitted from the document because no useful information could be derived from them. Nevertheless, the calculation procedure and other considerations regarding the capacity estimation and data sets are addressed. Finally, conclusions about the data used, the battery materials and the methods evaluated are drawn, laying down recommendations to design the performance tests according to the conditions of the driving cycles, and indicating the higher general performance of the SVM relative to the other two methods, while asserting the usefulness of the ECM. Moreover, the battery with NMC material composition is observed to be easier for the models to predict than LFP, and it also shows a different evolution of its internal resistance. Lithium-ion battery State-of-health Resistance Capacity Materials Model Equivalent circuit model Support vector machine Neural networks Materials Engineering Materialteknik
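The NRMSE criterion used in entry 228 to compare simulated and measured voltage profiles can be sketched as below. Normalizing by the range of the measured signal is an assumption made for this illustration; the abstract does not state which normalization (range, mean, or other) the thesis used, and the voltage values are hypothetical.

```python
import math

def nrmse(measured, simulated):
    """Root mean squared error divided by the range of the measured signal.

    Range normalization is an assumption; other conventions divide by the mean.
    """
    if len(measured) != len(simulated) or not measured:
        raise ValueError("signals must be non-empty and of equal length")
    mse = sum((m - s) ** 2 for m, s in zip(measured, simulated)) / len(measured)
    value_range = max(measured) - min(measured)
    if value_range == 0:
        raise ValueError("measured signal is constant; range normalization undefined")
    return math.sqrt(mse) / value_range

# Hypothetical cell voltages in volts (not thesis data).
measured_voltage = [3.0, 3.2, 3.4, 3.6, 4.0]
simulated_voltage = [3.1, 3.1, 3.5, 3.5, 4.0]

error = nrmse(measured_voltage, simulated_voltage)
print(f"NRMSE = {error:.4f}")
```

Dividing by the signal range makes errors comparable across cells or test profiles with different voltage windows, which is useful when the same metric is applied to several battery chemistries.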
229	Evaluating Statistical Machine Learning and Deep Learning Algorithms for Anomaly Detection in Chat Messages / Utvärdering av statistiska maskininlärnings- och djupinlärningsalgoritmer för anomalitetsdetektering i chattmeddelanden Freberg, Daniel January 2018 Automatically detecting anomalies in text is of great interest to surveillance entities, as vast amounts of data can be analysed to find suspicious activity. In this thesis, three distinct machine learning algorithms are evaluated as a chat message classifier is implemented for the purpose of market surveillance. Naive Bayes and Support Vector Machine belong to the statistical class of machine learning algorithms evaluated in this thesis, and both require feature selection. A side objective of the thesis is thus to find a suitable feature selection technique to ensure that these algorithms achieve high performance. The Long Short-Term Memory network is the deep learning algorithm evaluated in the thesis; rather than depending on feature selection, the deep neural network is evaluated as it is trained using word embeddings. Each of the algorithms achieved high performance, but the findings of the thesis suggest that the Naive Bayes algorithm in conjunction with a feature selection technique based on feature counts is the most suitable choice for this particular learning problem. / Being able to automatically detect anomalies in text has major implications for companies and authorities that monitor various kinds of communication. In this thesis, three different machine learning algorithms are evaluated for chat message classification in a market surveillance system. Naive Bayes and Support Vector Machine both belong to the statistical class of machine learning algorithms evaluated in the study, and both require selection of which features of the text are to be used in the algorithm.
A secondary goal of the study is thus to find a suitable selection technique so that the statistical algorithms perform as well as possible. Long Short-Term Memory Network is the deep learning algorithm evaluated in the study. Instead of using a selection technique, the deep learning algorithm uses word vectors to represent text. The results show that all evaluated algorithms can reach high performance for the purpose, in particular Naive Bayes together with term-frequency selection. machine learning NLP deep learning word vectors naive bayes support vector machine LSTM Computer Sciences Datavetenskap (datalogi)
230	Multitemporal Satellite Data for Monitoring Urbanization in Nanjing from 2001 to 2016 Cai, Zipan January 2017 As the rate of urbanization increases around the world, the population keeps shifting from rural to urban areas. China, the country with the largest population, has the highest urban population growth in Asia as well as in the world. However, urbanization in China is in turn leading to many social issues that reshape the living environment and cultural fabric. Many of these issues highlight the challenges of healthy and sustainable urban growth, particularly in the reasonable planning of urban land use and land cover features. It is therefore important to establish a set of comprehensive strategies for sustainable urban development to avoid detours in the urbanization process. Faced with such social phenomena, spatial and temporal technologies, including Remote Sensing and Geographic Information Systems (GIS), can help city decision makers make the right choices. Knowledge of land use and land cover changes in rural and urban areas helps identify the urban growth rate and trend both qualitatively and quantitatively, which provides a stronger basis for planning and designing a city in a more scientific and environmentally friendly way. This paper focuses on urban sprawl in Nanjing, Jiangsu, China, analyzed by monitoring the urban growth pattern over a study period. From 2001 to 2016, Nanjing Municipality experienced a substantial increase in urban area because of its growing population. In this paper, a supervised classifier with high accuracy, the Support Vector Machine (SVM), was used to extract thematic features from multitemporal satellite data including Landsat 7 ETM+, Landsat 8, and Sentinel-2A MSI.
The classification results were interpreted to identify the existence of an urban sprawl pattern based on the land use and land cover features in 2001, 2006, 2011, and 2016. Two different types of change detection analysis, post-classification comparison and change vector analysis (CVA), were performed to explore the detailed extent of urban growth within the study region. The two change detection methods were compared through an accuracy assessment. Based on the change detection analysis combined with the current state of urban development, some constructive recommendations and future research directions are given at the end. With the proposed methods, the urban land use and land cover changes were successfully captured. The results show a notable change in the urban or built-up land feature. The urban area increased by 610.98 km² while the agricultural land area decreased by 766.96 km², which demonstrates a land conversion among these land cover features in the study period. The urban area keeps growing in each sub-period, while the growth rate shows a decreasing trend from 2001 to 2016. In addition, both change detection techniques obtained similar results for the distribution of urban expansion in the study area. According to the result images from the two change detection methods, the expanded urban or built-up land in Nanjing is distributed mainly in the area surrounding the central city, on both sides of the Yangtze River, and in the southwest. The change detection accuracy assessment indicated that post-classification comparison has a higher overall accuracy (86.11%) and a higher Kappa coefficient (0.72) than CVA, for which the overall accuracy and Kappa coefficient are 75.43% and 0.51, respectively. These results show that the strength of agreement between predicted and ground-truth data is at the ‘good’ level for post-classification comparison and ‘moderate’ for CVA.
The results also confirmed the expectation from previous studies that the empirical threshold determination of CVA leads to relatively poor change detection accuracy. In general, the two change detection techniques are found to be effective and efficient in monitoring surface changes in the different classes of land cover features within the study period. Nevertheless, each has its advantages and disadvantages for change detection analysis, particularly for the topic of urban expansion. Urbanization Nanjing Remote Sensing GIS Support Vector Machine Post-Classification Comparison Change Vector Analysis Environmental Sciences Miljövetenskap
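The two accuracy figures reported in entry 230, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix. The sketch below shows the standard Cohen's kappa computation on a hypothetical 2x2 matrix (changed versus unchanged pixels); the counts are invented for illustration and do not reproduce the thesis results.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cohen_kappa(matrix):
    """Agreement between prediction and reference, corrected for chance agreement."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    observed = overall_accuracy(matrix)
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / (total * total)
    return (observed - expected) / (1 - expected)

# Hypothetical confusion matrix: rows = reference class, columns = predicted class.
confusion = [
    [40, 10],  # reference "changed"
    [5, 45],   # reference "unchanged"
]

accuracy = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
print(f"overall accuracy = {accuracy:.2%}, kappa = {kappa:.2f}")
```

Kappa is lower than overall accuracy because it subtracts the agreement expected by chance from the class proportions, which is why the abstract reports both values for each change detection method.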
