61 |
Predicting Customer Churn in a Subscription-Based E-Commerce Platform Using Machine Learning Techniques. Aljifri, Ahmed, January 2024.
This study investigates the performance of Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn on an e-commerce platform. These algorithms were chosen because of the particular characteristics of the dataset and the distinct perspective and value each algorithm offers. The models were examined iteratively, covering preprocessing techniques, feature engineering, and rigorous evaluation. Logistic Regression showed moderate predictive capability but lagged in identifying potential churners, owing to its assumption of a linear relationship between the log odds and the predictors. KNN emerged as the most accurate classifier, achieving the highest sensitivity and specificity (98.22% and 96.35%, respectively) and outperforming the other models. Random Forest, with a sensitivity of 91.75% and a specificity of 95.83%, was strong in specificity but lagged slightly in sensitivity. Feature importance analysis identified "Tenure" as the most influential variable for churn prediction. Preprocessing techniques affected performance differently across models, underscoring the importance of tailored preprocessing. The findings highlight the value of continuous model refinement and optimization in addressing complex business challenges such as customer churn. The insights give businesses a foundation for implementing targeted retention strategies, mitigating customer attrition, and promoting growth on e-commerce platforms.
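Sensitivity and specificity are the headline metrics in this abstract. A minimal sketch of how both follow from a binary confusion matrix, assuming churners are the positive class (label 1); the function name and toy labels are illustrative and not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): share of actual churners that are flagged.
    Specificity = TN / (TN + FP): share of non-churners correctly left alone.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical labels: 4 churners followed by 6 non-churners.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")  # → sensitivity=75.00% specificity=83.33%
```

Reporting both numbers matters: a model that never predicts churn scores 100% specificity and 0% sensitivity.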
62 |
Non-negative matrix decomposition approaches to frequency domain analysis of music audio signals. Wood, Sean, 12 1900.
We study the application of unsupervised matrix decomposition algorithms such as Non-negative Matrix Factorization (NMF) to frequency-domain representations of music audio signals. These algorithms, driven by a given reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare three reconstruction error functions when NMF is applied to monophonic and harmonized musical scales: least squares, Kullback-Leibler divergence, and a recently introduced "phase-aware" divergence measure. Novel supervised methods for interpreting the resulting decompositions are presented and compared to previously used methods that rely on acoustic domain knowledge.
Finally, we analyze how well the learned basis functions generalize across three musical parameters: note amplitude, note duration, and instrument type. To do so, we introduce two basis function labeling algorithms that outperform the previous labeling approach in the majority of our tests, with instrument type on monophonic audio being the only notable exception.
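NMF approximates a non-negative magnitude spectrogram V (frequency bins x time frames) as a product W H, where the columns of W act as basis functions (spectral templates) and the rows of H as their time-varying coefficients. A minimal, dependency-free sketch of the classic Lee and Seung multiplicative updates for two of the three error functions compared here, least squares and generalized Kullback-Leibler divergence; the phase-aware divergence is not covered, and this is not claimed to be the thesis's implementation. The toy spectrogram and helper names are hypothetical; real spectrograms would normally be handled with NumPy.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def elementwise(f, *mats):
    """Apply f entry by entry across equally shaped matrices."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*mats)]


def nmf(V, rank, loss="ls", iters=1000, seed=0, eps=1e-12):
    """Approximate non-negative V (m x n) by W (m x rank) times H (rank x n).

    loss="ls": least squares; loss="kl": generalized Kullback-Leibler divergence.
    Multiplicative updates only rescale entries, so W and H stay non-negative.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        if loss == "ls":
            Wt = transpose(W)
            # H <- H * (W^T V) / (W^T W H)
            H = elementwise(lambda h, nu, de: h * nu / (de + eps),
                            H, matmul(Wt, V), matmul(matmul(Wt, W), H))
            Ht = transpose(H)
            # W <- W * (V H^T) / (W H H^T)
            W = elementwise(lambda w, nu, de: w * nu / (de + eps),
                            W, matmul(V, Ht), matmul(W, matmul(H, Ht)))
        elif loss == "kl":
            # H_aj <- H_aj * sum_i W_ia V_ij / (WH)_ij / sum_i W_ia
            R = elementwise(lambda v, wh: v / (wh + eps), V, matmul(W, H))
            Wt = transpose(W)
            w_sums = [sum(row) for row in Wt]
            H = [[h * nu / (s + eps) for h, nu in zip(hr, nr)]
                 for hr, nr, s in zip(H, matmul(Wt, R), w_sums)]
            # W_ia <- W_ia * sum_j H_aj V_ij / (WH)_ij / sum_j H_aj
            R = elementwise(lambda v, wh: v / (wh + eps), V, matmul(W, H))
            h_sums = [sum(row) for row in H]
            W = [[w * nu / (s + eps) for w, nu, s in zip(wr, nr, h_sums)]
                 for wr, nr in zip(W, matmul(R, transpose(H)))]
        else:
            raise ValueError("loss must be 'ls' or 'kl'")
    return W, H


def relative_error(V, W, H):
    """Frobenius norm of V - WH divided by that of V."""
    WH = matmul(W, H)
    diff = sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr))
    return math.sqrt(diff / sum(v * v for vr in V for v in vr))


# Hypothetical toy spectrogram: 3 frequency bins x 4 frames built from two
# spectral templates (the columns of W_true).
W_true = [[1.0, 0.5], [0.2, 1.0], [1.0, 1.0]]
H_true = [[1.0, 0.5, 1.0, 2.0], [0.5, 2.0, 2.0, 0.5]]
V = matmul(W_true, H_true)

for loss in ("ls", "kl"):
    W, H = nmf(V, rank=2, loss=loss)
    print(loss, round(relative_error(V, W, H), 4))
```

Because a zero entry in W or H can never grow back under multiplicative updates, the sketch initializes both factors with strictly positive values.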
64 |
Détection dynamique des intrusions dans les systèmes informatiques / Dynamic intrusion detection in computer systems. Pierrot, David, 21 September 2018.
The democratization of the Internet, coupled with the effects of globalization, has interconnected people, states, and companies. The downside of this worldwide interconnection of information systems is a phenomenon called "cybercrime": malicious individuals and groups seek to undermine the integrity of information systems for financial gain or to serve a cause. The consequences of an intrusion can threaten the very existence of a company or an organization, with impacts that include financial loss, damage to the brand image, and a perceived lack of seriousness. Detecting an intrusion is not an end in itself; reducing the delay between detection and reaction has become the priority. Existing solutions are relatively heavy to deploy, both in the expertise they require and in keeping them up to date. Research has identified the best-performing data mining methods, but integrating them into an information system remains difficult. Capturing and converting the data requires substantial computing resources and does not necessarily allow detection within acceptable time frames.
Our contribution detects intrusions from a comparatively small amount of data. We use firewall events, which reduces the computing power required while limiting the knowledge of the information system held by the people in charge of intrusion detection. We propose an approach that addresses the technical aspects, through a hybrid data mining method, as well as the functional aspects. Together, these aspects are organized into four phases. The first phase visualizes and identifies network activity. The second phase detects abnormal activity by applying data mining methods both to the source of the traffic and to the targeted assets. The third and fourth phases use the results of a risk analysis and a technical security audit to prioritize the actions to be taken. Taken together, these points give an overall view of the hygiene of the information system as well as guidance on monitoring and the corrections to be made. The approach led to a prototype named D113. This prototype, tested on an experimental platform with two architectures of different sizes, validated our directions and approaches. The results are positive but can be improved, and directions for future work have been defined accordingly.
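The abstract does not specify the hybrid data mining method used in D113. Purely as an illustration of the second phase (flagging abnormal activity both per traffic source and per targeted asset, using firewall events alone), here is a minimal sketch that applies a robust outlier rule, median plus k times the median absolute deviation (MAD), to simple counts. The event tuple format, the threshold k, and the function names are assumptions, not the author's design.

```python
from collections import defaultdict
from statistics import median


def robust_outliers(counts, k=3.0):
    """Return keys whose count exceeds median + k * MAD (median absolute deviation)."""
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # fall back when spread is zero
    return sorted(key for key, v in counts.items() if v > med + k * mad)


def flag_anomalies(events, k=3.0):
    """events: iterable of hypothetical (src_ip, dst_ip, dst_port, action) records.

    Per source: number of distinct (destination, port) pairs contacted, which
    grows quickly during scans. Per asset: number of denied events received.
    """
    targets_per_src = defaultdict(set)
    denies_per_dst = defaultdict(int)
    for src, dst, port, action in events:
        targets_per_src[src].add((dst, port))
        # Touching the key also registers assets with zero denials in the baseline.
        denies_per_dst[dst] += 1 if action == "deny" else 0
    src_counts = {src: len(t) for src, t in targets_per_src.items()}
    return robust_outliers(src_counts, k), robust_outliers(dict(denies_per_dst), k)


# Synthetic log: five internal hosts reach six servers on port 443,
# and one external address probes 50 ports on a single server.
normal = [(f"10.0.0.{h}", f"10.0.1.{s}", 443, "allow")
          for h in range(1, 6) for s in range(1, 7)]
scan = [("203.0.113.7", "10.0.1.1", port, "deny") for port in range(1, 51)]
events = normal + scan
print(flag_anomalies(events))  # (['203.0.113.7'], ['10.0.1.1'])
```

The MAD rule is a deliberately simple stand-in; a production pipeline would at least compute these counts over sliding time windows.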
65 |
Fault detection of planetary gearboxes in BLDC-motors using vibration and acoustic noise analysis. Ahnesjö, Henrik, January 2020.
This thesis uses vibration and acoustic noise analysis to help the production line for a particular motor type ensure good quality. Noise from the gearbox is sometimes present, and it is currently detected by a person listening to the motor. This kind of inspection is subjective and prone to human error, so an automatic test that passes or fails each produced brushless direct current (BLDC) motor is desired. Two measurement setups were used: one based on an accelerometer for vibration measurements, and one based on a microphone for acoustic measurements. Acquisition and analysis were implemented with the CompactDAQ NI 9171 data acquisition device and the graphical programming software NI LabVIEW. Two methods, power spectrum analysis and machine learning, were used to analyze the vibration and acoustic signals and to identify gearbox faults. The first method applies the Fast Fourier Transform (FFT) to the sound recorded from the BLDC motor with its integrated planetary gearbox in order to identify peaks in the signal. The acoustic noise originates from a faulty planet gear with an indentation on one tooth flank, which could be measured and analyzed; this noise can serve as an indication of gear faults. The second method is based on the vibration characteristics of the BLDC motors and uses supervised machine learning to separate healthy motors from faulty ones. Support Vector Machine (SVM) is the proposed algorithm, using 23 different features. The best-performing model was a Coarse Gaussian SVM, with an overall accuracy of 92.25% on the validation data.
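The first method looks for peaks in the power spectrum of the recorded signal. A minimal, dependency-free sketch of that idea using a direct discrete Fourier transform (O(N^2), acceptable for short frames; a real setup would call an FFT routine such as LabVIEW's or NumPy's). The sample rate, tone frequencies, and function names are illustrative assumptions, not values from the thesis.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided power spectrum of a real signal x sampled at fs Hz.

    Returns (frequencies, power) for bins 0 .. N/2, computed with a direct DFT.
    """
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def top_peaks(freqs, power, count=2):
    """Frequencies of the `count` largest local maxima, strongest first."""
    peaks = [i for i in range(1, len(power) - 1)
             if power[i] > power[i - 1] and power[i] >= power[i + 1]]
    peaks.sort(key=lambda i: power[i], reverse=True)
    return [freqs[i] for i in peaks[:count]]


# Synthetic 0.25 s frame at 2 kHz: a strong 200 Hz tone plus a weaker 600 Hz
# tone standing in for a hypothetical fault component (both fall on exact bins).
fs = 2000
signal = [math.sin(2 * math.pi * 200 * t / fs) + 0.3 * math.sin(2 * math.pi * 600 * t / fs)
          for t in range(500)]
freqs, power = power_spectrum(signal, fs)
print(top_peaks(freqs, power))  # [200.0, 600.0]
```

Tones that do not fall exactly on a bin spread energy into neighboring bins (spectral leakage), which is why practical analyzers usually apply a window function before the transform.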
66 |
Evaluation of system design strategies and supervised classification methods for fruit recognition in harvesting robots / Undersökning av Systemdesignstrategier och Klassifikationsmetoder för Identifiering av Frukt i Skörderobotar. Björk, Gabriella, January 2017.
This master's thesis project was carried out by one student at the Royal Institute of Technology in collaboration with Cybercom Group. The aim was to evaluate and compare system design strategies for fruit recognition in harvesting robots, as well as the performance of supervised machine learning classification methods applied to this specific task. The thesis covers the fundamentals of these systems, for which parameters, constraints, requirements, and design decisions were investigated. This framework was then used as the foundation for implementing both the sensing system and the processing and classification algorithms. A plastic tomato plant with fruit of varying maturity was used for training and testing, and a Kinect for Windows v2, with sensors for high-resolution color, depth, and IR data, was used for image acquisition. The acquired data were processed, and features of the objects of interest were extracted, using MATLAB and the Kinect SDK provided by Microsoft. Multiple views of the plant were acquired by rotating it on a platform driven by a stepper motor controlled by an Arduino Uno. The algorithms tested were binary classifiers, including Support Vector Machine (SVM), Decision Tree, and k-Nearest Neighbor (k-NN). The models were trained and validated using five-fold cross-validation in MATLAB's Classification Learner application. Performance metrics such as precision, recall, and F1-score were calculated for accuracy comparison. The statistical models k-NN and SVM achieved the best scores, and SVM was considered the most promising method for fruit recognition.
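The models were compared with five-fold cross-validation in MATLAB's Classification Learner. A minimal stdlib Python sketch of the same kind of evaluation loop, assuming a plain k-nearest-neighbor classifier on made-up two-dimensional features (imagine a color value and a depth value per candidate region); the data, feature meaning, k, and helper names are hypothetical. Precision, recall, and F1 are pooled over the out-of-fold predictions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def out_of_fold_predictions(X, y, folds=5, k=3, seed=0):
    """Predict every sample with a model trained on the other folds only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    preds = [None] * len(X)
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        for i in test:
            preds[i] = knn_predict([X[j] for j in train], [y[j] for j in train], X[i], k)
    return preds


def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical data: label 1 = ripe fruit, label 0 = unripe fruit or background.
rng = random.Random(1)
X = [(rng.gauss(0.2, 0.05), rng.gauss(0.5, 0.1)) for _ in range(20)]
X += [(rng.gauss(0.8, 0.05), rng.gauss(0.5, 0.1)) for _ in range(20)]
y = [0] * 20 + [1] * 20
print(precision_recall_f1(y, out_of_fold_predictions(X, y)))
```

This sketch uses plain, unstratified folds after shuffling; with small or imbalanced datasets, stratified folds keep the class ratio similar in every fold.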
67 |
[en] A SUPERVISED LEARNING APPROACH TO PREDICT HOUSEHOLD AID DEMAND FOR RECURRENT CLIME-RELATED DISASTERS IN PERU / [pt] UMA ABORDAGEM DE APRENDIZADO SUPERVISIONADO PARA PREVER A DEMANDA DE AJUDA FAMILIAR PARA DESASTRES CLIMÁTICOS RECORRENTES NO PERU. RENATO JOSE QUILICHE ALTAMIRANO, 21 November 2023.
This dissertation presents a data-driven approach to the problem of predicting recurrent disasters in developing countries. Supervised machine learning methods are used to train classifiers that predict whether a household will be affected by recurrent climate threats, with one classifier trained for each natural hazard. The approach applies to any recurrent natural hazard affecting a country and allows disaster risk managers to target their operations with more knowledge. In addition, the predictive assessment lets managers understand the drivers of these predictions, supporting proactive policy formulation and operations planning to mitigate risks and prepare communities for recurrent disasters.
The proposed methodology was applied to a case study of Peru, where classifiers were trained for cold waves, floods, and landslides. For cold waves, the classifier is 73.82% accurate. The research found that low-income households in rural areas are vulnerable to cold-wave-related disasters and need proactive humanitarian intervention. Vulnerable households have poor urban infrastructure, including footpaths, roads, street lighting, and water and drainage networks. Health insurance, health status, and education play a minor role. Households with sick members are more likely to be affected by cold waves, and higher educational attainment of the household head is associated with a lower probability of being affected. For floods, the classifier is 82.57% accurate. Certain urban conditions, such as access to drinking water, street lighting, and drainage networks, can make rural households more susceptible to flooding. Owning a computer or laptop decreases the likelihood of being affected by flooding, while owning a bicycle and being headed by a married person increase it. Flooding is more common in less developed urban settlements than among isolated rural households.
For landslides, the classifier is 88.85% accurate and follows a different logic from the flood classifier. Feature importance is distributed more evenly across the features used to train the classifier, so the impact of any individual feature on the prediction is small. Long-term wealth appears more critical: the probability of being affected by a landslide is lower for households with certain appliances and building materials. Rural communities are more affected by landslides, especially those at higher altitudes and farther from cities and markets. The average marginal effect of altitude is non-linear.
The classifiers provide an intelligent, data-driven method that saves resources while maintaining accuracy. In addition, the research provides guidelines for improving the efficiency of aid distribution, such as facility location formulations and vehicle routing. The results have several managerial implications, and the authors call on disaster risk managers and other relevant stakeholders to act. Recurrent disasters challenge all of humanity.
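The dissertation interprets its classifiers through feature importance, but the abstract does not name the importance method. As one common, model-agnostic option, here is a minimal sketch of permutation importance: shuffle one feature column and measure how much accuracy drops. The rule-based toy model, feature names, and synthetic data are hypothetical stand-ins, not the Peruvian household data or the trained classifiers.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + (value,) + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical features per household: (altitude_km, owns_computer, household_size).
def toy_model(row):
    altitude_km, owns_computer, _ = row  # household_size is deliberately ignored
    return int(altitude_km > 3.0 and not owns_computer)


rng = random.Random(42)
X = [(rng.uniform(0.0, 5.0), rng.random() < 0.3, rng.randint(1, 8)) for _ in range(300)]
y = [toy_model(row) for row in X]  # labels follow the toy rule exactly
names = ["altitude_km", "owns_computer", "household_size"]
for name, score in zip(names, permutation_importance(toy_model, X, y)):
    print(f"{name}: {score:.3f}")
```

A feature the model ignores scores exactly zero; when importance is spread evenly, as reported for the landslide classifier, no single shuffled column moves accuracy much.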
68 |
Differentiation of Occlusal Discolorations and Carious Lesions with Hyperspectral Imaging In Vitro. Vosahlo, Robin; Golde, Jonas; Walther, Julia; Koch, Edmund; Hannig, Christian; Tetschke, Florian, 19 April 2024.
Stains and stained incipient lesions can be challenging to differentiate with established clinical tools. New diagnostic techniques are required for improved distinction to enable early noninvasive treatment. This in vitro study evaluates the performance of artificial intelligence (AI)-based classification of hyperspectral imaging data for early occlusal lesion detection and differentiation from stains. Sixty-five extracted permanent human maxillary and mandibular bicuspids and molars (International Caries Detection and Assessment System [ICDAS] II 0–4) were imaged with a hyperspectral camera (Diaspective Vision TIVITA® Tissue, Diaspective Vision, Pepelow, Germany) at a distance of 350 mm, acquiring spatial and spectral information in the wavelength range 505–1000 nm; 650 fissural spectra were used to train classification algorithms (models) for automated distinction between stained but sound enamel and stained lesions. Stratified 10-fold cross-validation was used. The model with the highest classification performance, a fine k-nearest neighbor classification algorithm, was used to classify five additional tooth fissural areas. Polarization microscopy of ground sections served as reference. Compared to stained lesions, stained intact enamel showed higher reflectance in the wavelength range 525–710 nm but lower reflectance in the wavelength range 710–1000 nm. A fine k-nearest neighbor classification algorithm achieved the highest performance with a Matthews correlation coefficient (MCC) of 0.75, a sensitivity of 0.95 and a specificity of 0.80 when distinguishing between intact stained and stained lesion spectra. The superposition of color-coded classification results on further tooth occlusal projections enabled qualitative assessment of the entire fissure’s enamel health. AI-based evaluation of hyperspectral images is highly promising as a complementary method to visual and radiographic examination for early occlusal lesion detection.
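The study ranks models by the Matthews correlation coefficient alongside sensitivity and specificity. For reference, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); it ranges from -1 to 1 and, unlike plain accuracy, stays informative when classes are imbalanced. A minimal sketch; the confusion-matrix counts in the example are invented and are not the study's data.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Invented counts: 100 stained-lesion spectra (positive), 100 stained-sound spectra.
tp, fn, tn, fp = 95, 5, 80, 20
print(round(sensitivity(tp, fn), 2), round(specificity(tn, fp), 2), round(mcc(tp, tn, fp, fn), 3))
# → 0.95 0.8 0.759
```

Because MCC depends on class proportions, the same sensitivity and specificity can yield a different MCC on a dataset with a different class balance.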