11 |
An Evaluation of Classification Algorithms for Machinery Fault Diagnosis
Buzza, Matthew 15 June 2017
No description available.
|
12 |
Analytical fusion of multimodal magnetic resonance imaging to identify pathological states in genetically selected Marchigian Sardinian alcohol-preferring (msP) rats
Cosa Liñán, Alejandro 06 November 2017
[EN] Alcohol abuse is one of the most alarming issues for health authorities. It is estimated that at least 23 million European citizens are affected by alcoholism, at a cost of around 270 million euros. Excessive alcohol consumption causes physical harm and, although it damages most organs of the body, the liver, pancreas, and brain are the most severely affected. Alcohol-related disorders involve not only physical harm but also comorbid psychiatric disorders such as depression. Alcohol is also involved in many violent behaviors and traffic injuries. Altogether, this reflects the high complexity of alcohol-related disorders and suggests the involvement of multiple brain systems.
With the emergence of non-invasive diagnostic techniques such as neuroimaging and EEG, many neurobiological factors have been shown to be fundamental to the acquisition and maintenance of addictive behaviors, the risk of relapse, and the validity of the available treatment alternatives. Alterations in brain structure and function seen in non-invasive imaging studies have been investigated repeatedly. However, it is not clear to what extent imaging measures can precisely characterize and differentiate the pathological stages of the disease, which is often accompanied by other pathologies. The use of animal models has elucidated the role of the neurobiological mechanisms that parallel alcohol misuse. Combining animal research with non-invasive neuroimaging studies is therefore a key tool for advancing the understanding of the disorder.
As the volume of data of very diverse nature available in clinical and research settings increases, data sets and methodologies must be integrated to explore the multidimensional aspects of psychiatric disorders. Complementing conventional mass-univariate statistics, interest in the predictive power of statistical machine learning applied to neuroimaging data is currently growing in the scientific community.
This doctoral thesis covers most of the aspects mentioned above. Starting from a well-established animal model in alcohol research, Marchigian Sardinian rats, we performed multimodal neuroimaging studies at several stages of the alcohol experimental design, including the etiological mechanisms modulating high alcohol consumption (in comparison to Wistar control rats), alcohol consumption, and treatment with the opioid antagonist Naltrexone, a drug that is well established in the clinic but has a heterogeneous response. Multimodal magnetic resonance imaging acquisition included Diffusion Tensor Imaging, structural imaging, and the calculation of MRI-derived relaxometry maps. We designed an analytical framework based on two algorithms widely used in the neuroimaging field, Random Forest and Support Vector Machine, combined in a wrapper fashion. The approach was applied to the same dataset with two different aims: exploring the validity of the approach for discriminating experimental stages at the subject level, and establishing predictive models at the voxel level to identify key anatomical regions modified during the course of the experiment.
As expected, combining multiple magnetic resonance imaging modalities enhanced predictive power (by between 3 and 16%), with heterogeneous contributions from the modalities. Surprisingly, we identified some inborn alterations that correlate with high alcohol preference, as well as thalamic neuroadaptations related to Naltrexone efficacy. In addition, a reproducible contribution of DTI- and relaxometry-related biomarkers was repeatedly identified, guiding further studies in alcohol research.
In summary, this research demonstrates the feasibility of incorporating multimodal neuroimaging, machine learning algorithms, and animal research to advance the understanding of alcohol-related disorders. / [ES] Alcohol abuse is one of the greatest concerns of the health authorities in the European Union. Excessive alcohol consumption affects the whole body to a greater or lesser degree, with the pancreas and liver being the most severely affected. In addition, the central nervous system suffers alcohol-related deterioration, which frequently occurs alongside other psychiatric pathologies such as depression or other addictions such as compulsive gambling. The presence of these comorbidities demonstrates the complexity of the pathology, in which many neuronal systems interact with one another.
Magnetic resonance (MR) imaging has helped in the study of psychiatric diseases, facilitating the discovery of neurological mechanisms fundamental to the development and maintenance of alcohol addiction, relapse, and the effect of the available treatments. Despite these advances, more research is still needed to identify the biological bases that contribute to the disease. In this regard, animal models serve to isolate the factors related solely to alcohol, controlling for other factors that facilitate the development of alcoholism. Magnetic resonance studies in laboratory animals and their subsequent evaluation in humans play a fundamental role in understanding psychiatric pathologies such as alcohol addiction.
Magnetic resonance imaging has been integrated into clinical settings as a non-invasive diagnostic test. As the volume of data grows, tools and methodologies are needed that can fuse information of very different nature and thus establish increasingly accurate diagnostic criteria. The predictive power of tools derived from artificial intelligence, such as machine learning, complements traditional statistical methods.
This work addresses most of these aspects. Multimodal magnetic resonance data were obtained from a validated model in research on pathologies derived from alcohol consumption, the Marchigian-Sardinian rats developed at the University of Camerino (Italy), whose alcohol consumption is comparable to that of humans. For each animal, data were acquired before and after alcohol consumption and under two abstinence conditions (with and without treatment with Naltrexone, an anti-relapse medication used as pharmacotherapy in alcoholism). The multimodal magnetic resonance data, consisting of diffusion, relaxometry, and structural images, were fused in a multivariate analytical scheme incorporating two tools commonly used on neuroimaging-derived data, Random Forest and Support Vector Machine. Our scheme was applied with two distinct objectives: on the one hand, to determine the experimental phase of a subject from biomarkers and, on the other, to identify brain systems susceptible to alteration by heavy alcohol intake and their evolution during abstinence.
Our results showed that when biomarkers derived from multiple neuroimaging modalities are fused in a single analysis, they produce more accurate diagnoses than those derived from a single modality (up to a 16% improvement). Biomarkers derived from diffusion and relaxometry images discriminate between experimental states. Some innate aspects related to later alcohol consumption behavior were also identified, as was a relationship between treatment response and the magnetic resonance data.
In summary, this thesis demonstrates that the use of multimodal magnetic resonance data in animal models, combined in multivariate analytical schemes, is a valid tool for understanding pathologies / [CAT] Alcohol abuse is one of the greatest concerns of the health authorities of the European Union. Although exact figures are difficult to establish, it is estimated that some 23 million Europeans currently suffer from diseases derived from alcoholism, at a cost to society exceeding 150,000 million euros. Excessive alcohol consumption affects the human body to a greater or lesser degree, with the pancreas and liver being the most affected. In addition, the brain suffers deterioration caused by alcohol, which frequently coexists with other pathologies such as depression or other addictions such as compulsive gambling. All of this demonstrates the complexity of the disease, in which multiple neuronal systems interact with one another.
Non-invasive techniques such as the electroencephalogram (EEG) and magnetic resonance (MR) imaging have helped in the study of psychiatric diseases, facilitating the discovery of neurological mechanisms fundamental to the development and maintenance of addiction, relapse, and the effectiveness of the available treatments. Despite these advances, more research is still needed to identify the biological bases that contribute to the disease. To this end, animal models serve to identify the factors that depend solely on alcohol abuse. Magnetic resonance studies in laboratory animals and their subsequent evaluation in humans would play a fundamental role in understanding alcohol use.
Non-invasive diagnostic tests have been integrated into clinical settings. As the volume of data grows, tools and methodologies are needed to fuse information of very different nature and thus establish increasingly accurate diagnostic criteria. The predictive capacity of tools developed in the field of artificial intelligence, such as machine learning, complements traditional statistical methods.
This research addresses all of these aspects. Multimodal magnetic resonance data were obtained from a validated animal model in the study of pathologies related to alcohol consumption, the Marchigian-Sardinian rats developed at the University of Camerino (Italy), whose alcohol consumption is comparable to that of humans. For each animal, data were acquired before and after alcohol consumption and under two different abstinence conditions (with and without anti-relapse treatment). The multimodal magnetic resonance data, consisting of diffusion, magnetic relaxometry, and structural images, were fused in multivariate analytical schemes incorporating two methodologies validated in the neuroimaging field, Random Forest and Support Vector Machine. Our scheme was applied with two distinct objectives. The first is to determine the experimental phase of a subject from biomarkers obtained by neuroimaging. The second is to identify the brain systems susceptible to alteration during heavy alcohol intake and their evolution during the treatment phase.
Our results showed that biomarkers derived from several neuroimaging modalities, fused in a multivariate analysis, produce more accurate diagnoses than those derived from a single modality (up to a 16% improvement). Biomarkers derived from diffusion and relaxometry images contributed to distinguishing experimental states. Innate aspects related to later alcohol preference were also identified, as was a relationship between the response to anti-relapse treatment and the magnetic resonance data.
In summary, this work demonstrates that the use of multimodal magnetic resonance data in animal models, combined in multivariate analytical schemes, is a very valid tool for understanding and advancing the study of psychiatric pathologies such as alcoholism. / Cosa Liñán, A. (2017). Analytical fusion of multimodal magnetic resonance imaging to identify pathological states in genetically selected Marchigian Sardinian alcohol-preferring (msP) rats [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90523
|
13 |
A machine learning approach for ethnic classification: the British Pakistani face
Khalid Jilani, Shelina, Ugail, Hassan, Bukar, Ali M., Logan, Andrew J., Munshi, Tasnim January 2017
Ethnicity is one of the most salient clues to face identity. Analysis of ethnicity-specific facial data is a challenging problem and is predominantly carried out using computer-based algorithms. The current published literature focusses on the use of frontal face images. We addressed the challenge of binary ethnicity classification (British Pakistani or other ethnicity) using profile facial images. The proposed framework is based on the extraction of geometric features using 10 anthropometric facial landmarks, within a purpose-built, novel database of 135 multi-ethnic and multi-racial subjects and a total of 675 face images. Image dimensionality was reduced using Principal Component Analysis (PCA) and Partial Least Squares (PLS) Regression. Classification was performed using a linear Support Vector Machine (SVM). The results of this framework are promising, with 71.11% ethnic classification accuracy using PCA with an SVM classifier, and 76.03% using PLS with an SVM classifier.
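The abstract does not say which geometric features were derived from the 10 landmarks. As a minimal sketch, assuming the features are scale-normalised pairwise distances (an assumption for illustration, not the authors' stated method), 10 landmarks yield 45 values per image. The landmark coordinates below are hypothetical.

```python
from itertools import combinations
from math import dist


def geometric_features(landmarks):
    """Pairwise Euclidean distances between landmarks, divided by the largest one.

    Dividing by the largest distance makes the features independent of image scale.
    """
    distances = [dist(a, b) for a, b in combinations(landmarks, 2)]
    scale = max(distances)
    return [d / scale for d in distances]


# Hypothetical (x, y) pixel positions of 10 profile landmarks.
example_landmarks = [
    (12.0, 40.0), (18.0, 55.0), (25.0, 61.0), (31.0, 70.0), (29.0, 82.0),
    (24.0, 90.0), (27.0, 97.0), (22.0, 108.0), (15.0, 115.0), (5.0, 75.0),
]
features = geometric_features(example_landmarks)
print(len(features))  # 10 landmarks give 45 pairwise distances
```

A vector like this could then be passed to a dimensionality reduction step and a linear classifier, as the abstract describes.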
|
14 |
Automating debugging through data mining / Automatisering av felsökning genom data mining
Thun, Julia, Kadouri, Rebin January 2017
Contemporary technological systems generate massive quantities of log messages. These messages can be stored, searched and visualized efficiently using log management and analysis tools. The analysis of log messages offers insights into system behavior such as performance, server status and execution faults in web applications. iStone AB wants to explore the possibility of automating its debugging process. Since iStone does most of its debugging manually, it takes time to find errors within the system. The aim was therefore to find solutions that reduce the time it takes to debug. Log messages in access and console logs were analyzed so that the most appropriate data mining techniques for iStone's system could be chosen. Data mining algorithms and log management and analysis tools were compared. The comparisons showed that the ELK Stack, as well as a mixture of Eclat and a hybrid algorithm (Eclat and Apriori), were the most appropriate choices. To demonstrate their feasibility, the ELK Stack and Eclat were implemented. The results show that data mining and the use of a platform for log analysis can facilitate debugging and reduce the time it takes. / Today's systems generate large quantities of log messages. These messages can be stored, searched, and visualized efficiently by using log management tools. Analysis of log messages gives insight into system behavior such as performance, server status, and execution faults that can arise in web applications. iStone AB wants to investigate the possibility of automating debugging. Since iStone mostly performs its debugging manually, it takes time to find faults within the system. The aim was therefore to find different solutions that reduce the time it takes to debug. An analysis of log messages in access and console logs was carried out in order to choose the most suitable data mining techniques for iStone's system.
Data mining algorithms and log management tools were compared. The results of the comparisons showed that the ELK Stack and a mixture of Eclat and a hybrid algorithm (Eclat and Apriori) were the most suitable choices. To show that this is the case, the ELK Stack and Eclat were implemented. The results produced show that data mining and the use of a platform for log analysis can facilitate debugging and reduce the time it takes.
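Eclat finds frequent itemsets by storing, for each item, the set of transaction ids that contain it and intersecting those sets to grow larger itemsets. The sketch below is a generic textbook-style version in plain Python, not the thesis implementation; the log tokens are made up for illustration.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tidset of prefix plus item) and is already frequent.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining candidates to build longer itemsets.
            narrower = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    narrower.append((other, shared))
            if narrower:
                extend(itemset, narrower)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Hypothetical log lines, each reduced to a set of tokens.
logs = [
    {"GET /cart", "500", "timeout"},
    {"GET /cart", "500", "timeout"},
    {"GET /home", "200"},
    {"GET /cart", "200"},
]
result = eclat(logs, min_support=2)
print(result[frozenset({"500", "timeout"})])  # number of log lines with both tokens
```

In a debugging setting, itemsets that combine an error code with a particular request or message point to events that tend to occur together.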
|
15 |
Εξόρυξη γνώσης από ιατροβιολογικά δεδομένα / Biomedical data mining
Καλλά, Μαρία-Παυλίνα 28 February 2013
Behind all of the data that exist lies a huge treasure of knowledge that we cannot perceive, because the form of the information does not allow it. Methods and techniques were therefore developed to help us find the hidden knowledge and exploit it, mainly for the benefit of the public; the best-known method, which we also deal with here, is Data Mining.
The work that follows discusses the use of Data Mining methods (as they are called) on biomedical data.
We begin with an overview of Molecular Biology and Bioinformatics. We then look at Knowledge Discovery in Databases. We examine Data Mining in detail, and classification methods in particular. Finally, we apply the algorithms to biomedical data and present the resulting conclusions as well as future extensions. / Behind all these data lies a huge hidden treasure of knowledge that we cannot perceive. Methods and techniques were therefore developed to help us find the hidden knowledge and use it for the benefit of the public.
The best-known method, which we study here, is Data Mining.
This work discusses the use of data mining methods (as they are called) on biomedical data.
We begin with background on Molecular Biology and Bioinformatics. We then look at knowledge discovery in databases. We examine Data Mining and the classification methods in detail. Finally, we apply the algorithms to biomedical data and present the conclusions and future extensions.
|
16 |
Data-Driven Emptying Detection for Smart Recycling Containers
Rutqvist, David January 2018
Waste management is one of the biggest challenges facing modern cities, driven by urbanisation and population growth. Smart Waste Management tries to solve this challenge with the help of techniques such as the Internet of Things, machine learning and cloud computing. Using smart algorithms, the time at which a recycling container will be full can be predicted. By continuously measuring the filling level of containers and then partitioning the filling-level data between consecutive emptyings, a regression model can be used for prediction. This requires accurate emptying detection. This thesis investigates different data-driven approaches to accurate emptying detection in a setting where the majority of the data are non-emptyings, i.e. suspected emptyings that manual examination has shown not to be actual emptyings. The work starts from the currently deployed legacy solution and increases performance step by step through optimisation and machine learning models. The final solution achieves a classification accuracy of 99.1 % and a recall of 98.2 % by using a random forest classifier on a set of features based on the filling level over different given time spans. This compares with a recall of 50 % for the legacy solution. It is concluded that the final solution, with a few minor practical modifications, is feasible for deployment in the next release of the system.
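The reason emptying detection matters can be sketched as follows: once the filling-level series is split at the detected emptyings, a regression on the most recent segment extrapolates when the container will be full. This is a minimal illustration using ordinary least squares; the function names, the percentage scale, and the sample readings are assumptions, not details from the thesis.

```python
def split_at_emptyings(times, levels, emptying_indices):
    """Split the samples into segments; each index marks the first sample after an emptying."""
    segments, start = [], 0
    for idx in sorted(emptying_indices):
        if idx > start:
            segments.append((times[start:idx], levels[start:idx]))
        start = idx
    segments.append((times[start:], levels[start:]))
    return segments


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_full_time(times, levels, full_level=100.0):
    """Extrapolate the time at which the fitted line reaches full_level."""
    slope, intercept = fit_line(times, levels)
    if slope <= 0:
        return None  # the level is not rising, so nothing can be extrapolated
    return (full_level - intercept) / slope


# Hypothetical readings: time in hours, filling level in percent, emptied before sample 4.
times = [0, 1, 2, 3, 4, 5, 6, 7]
levels = [10, 30, 50, 70, 5, 15, 25, 35]
current_times, current_levels = split_at_emptyings(times, levels, [4])[-1]
print(predict_full_time(current_times, current_levels))
```

A missed or false emptying would mix two fill cycles into one segment and distort the fitted slope, which is why the abstract treats accurate detection as a prerequisite.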
|
17 |
Mineração de dados aplicada à classificação do risco de evasão de discentes ingressantes em instituições federais de ensino superior
AMARAL, Marcelo Gomes do 08 July 2016
The Federal Institutions of Higher Education (IFES) play an
important role in the social and economic development of the country, contributing
to technological and scientific progress and fostering investment. In this
sense, better use of the educational resources offered by the IFES
contributes to the evolution of higher education as a whole. An effective
way to meet this need is to analyze the profile of incoming students and
try to predict, in advance, undesirable cases of dropout, which, the earlier
they are identified, the better they can be studied and handled by the
administration. This work proposes an approach for the direct application of
Data Mining techniques to classify incoming students according to the
dropout risk they present. As a proof of concept, the aspects inherent to the
proposed Data Mining process were analyzed through experiments conducted at
the Federal University of Pernambuco (UFPE). For some of the classification
algorithms, a classification accuracy of 73.9% was obtained using only
socioeconomic data available when the student enters the institution,
without using any data that depend on academic history. / Brazil's Federal
Institutions of Higher Education have an
important role in the social and economic development of the country,
contributing to technological and scientific advances and encouraging
investment. It is therefore possible to infer that better use of the educational
resources offered by those institutions contributes to the evolution of higher
education as a whole. An effective way to meet this need is to analyze the
profile of first-year students and try to predict, as early as possible,
undesirable cases of dropout, which, when identified earlier, can be examined and
addressed by the institution's administration. This work proposes the
development of an approach for the direct application of Data Mining techniques to
classify incoming students according to their dropout risk. As a proof of viability,
the proposed Data Mining approach was evaluated through experiments
conducted at the Federal University of Pernambuco. Some of the classification
algorithms tested achieved a classification accuracy of 73.9% using only
socioeconomic data available at the student's admission to the institution,
without the use of any academic data.
|
18 |
Análise temporal da sinalização elétrica em plantas de soja submetidas a diferentes perturbações externas / Temporal analysis of electrical signaling in soybean plants subjected to different external disturbances
Saraiva, Gustavo Francisco Rosalin 31 March 2017
Plants are complex organisms with dynamic processes that, due to their sessile way of life, are influenced by environmental conditions at all times. Plants can accurately perceive and respond to different environmental stimuli intelligently, but this requires a complex and efficient signaling system. Electrical signaling in plants has been known for a long time, but has recently gained prominence as its relationship to the physiological processes of plants has become better understood. The objective of this thesis was to test the following hypotheses: time series of data obtained from the electrical signaling of plants contain non-random information, with a dynamic and oscillatory pattern; this dynamic is affected by environmental stimuli; and there are specific patterns in the responses to stimuli. In a controlled environment, stressful environmental stimuli were applied to soybean plants, and the electrical signaling data were collected before and after the application of the stimulus. The time series obtained were analyzed using statistical and computational tools to determine the Frequency Spectrum (FFT), the Autocorrelation of values and the Approximate Entropy (ApEn). In order to verify the existence of patterns in the series, classification algorithms from the area of machine learning were used. The analysis of the time series showed that the electrical signals collected from plants had oscillatory dynamics with a power-law frequency distribution. The results make it possible to differentiate very effectively between series collected before and after the application of the stimuli. The PSD and autocorrelation analyses showed a large difference in the dynamics of the electrical signals before and after the application of the stimuli. The ApEn analysis showed a decrease in signal complexity after the application of the stimuli.
The classification algorithms reached significant accuracy values in pattern detection and classification of the time series, showing that there are mathematical patterns in the different electrical responses of the plants. It is concluded that time series of bioelectrical signals from plants contain discriminant information. The signals have oscillatory dynamics, and their properties are altered by environmental stimuli. There are also mathematical patterns embedded in plant responses to specific stimuli. / Plants are complex organisms with dynamic processes that, because of their sessile way of life, are influenced by environmental conditions at all times. Plants can perceive and respond accurately to different environmental stimuli in an intelligent way, but this requires a complex and efficient signaling system. Electrical signaling in plants has long been known, but it has recently gained prominence as its relationship to the physiological processes of plants has become better understood. The objective of this thesis was to test the following hypotheses: time series of data obtained from the electrical signaling of plants contain non-random information, with a dynamic and oscillatory pattern; this dynamic is affected by environmental stimuli; and there are specific patterns in the responses to stimuli. In a controlled environment, stressful environmental stimuli were applied to soybean plants, and electrical signaling data were recorded before and after their application. The time series obtained were analyzed using statistical and computational tools to determine the Frequency Spectrum (FFT), the Autocorrelation of values, and the Approximate Entropy (ApEn). To verify the existence of patterns in the series, classification algorithms from the field of machine learning were used.
The analysis of the time series showed that the electrical signals collected from plants had oscillatory dynamics with a power-law frequency distribution. The results make it possible to differentiate very effectively between series collected before and after the application of the stimuli. The PSD and autocorrelation analyses showed a large difference in the dynamics of the electrical signals before and after the application of the stimuli. The ApEn analysis showed a decrease in signal complexity after the application of the stimuli. The classification algorithms reached significant accuracy values in pattern detection and classification of the time series, showing that there are mathematical patterns in the different electrical responses of the plants. It is concluded that time series of bioelectrical signals from plants contain discriminant information. The signals have oscillatory dynamics, and their properties are altered by environmental stimuli. There are also mathematical patterns embedded in the plant's responses to specific stimuli.
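Approximate Entropy (ApEn) quantifies how often patterns of length m that match within a tolerance r still match when extended by one sample; lower values indicate a more regular, less complex signal. The sketch below follows the standard definition in plain Python. The parameters m=2 and an absolute tolerance r=0.2 are assumptions (the abstract does not give the thesis settings), and the two test signals are synthetic.

```python
import random
from math import log


def _phi(series, m, r):
    """Average log-frequency of windows of length m that match within tolerance r."""
    n = len(series) - m + 1
    windows = [series[i:i + m] for i in range(n)]
    total = 0.0
    for a in windows:
        # Chebyshev distance; every window matches itself, so the count is never zero.
        matches = sum(
            1 for b in windows if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) = phi(m) - phi(m + 1); r is an absolute tolerance here."""
    return _phi(series, m, r) - _phi(series, m + 1, r)


# Synthetic signals: a strictly alternating series and uniform noise.
periodic = [0.0, 1.0] * 30
rng = random.Random(0)
noisy = [rng.random() for _ in range(60)]

print(approximate_entropy(periodic))  # close to zero for a perfectly regular signal
print(approximate_entropy(noisy))     # larger for the irregular signal
```

A drop in ApEn after a stimulus, as reported in the abstract, would correspond to the recorded signal moving toward the more regular end of this scale.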
|
19 |
Comparison of Machine learning algorithms on Predicting Churn within Music streaming service
Gaddam, Lahari, Kadali, Sree Lakshmi Hiranmayee January 2022
Background: Customer churn prediction is one of the most popular concerns of big businesses and often helps companies with customer retention and revenue generation. Customer churn may lead to a huge loss of revenue, so it is important to analyze and determine the cause of churn. Moreover, it is easier to retain an existing customer than to acquire new clients. Therefore, to get a better understanding of churn prediction, this research focuses on finding the best-performing machine learning model through a comparison of four machine learning models. The research also gives a brief report on the latest literature on churn analysis in music streaming services.
Objectives: In this thesis, we research churn prediction in music streaming services. We focus on two main objectives. The first is a literature review of the latest research on churn prediction in music streaming services. The second is to compare the performance of four supervised machine learning algorithms to find the best-performing algorithm for churn prediction.
Methods: This thesis uses two methods, literature review and experimentation, to answer our research questions. We chose a literature review for RQ1 because it gives a better understanding of the selected problem and serves as the basis for our research. Experimentation was chosen for RQ2 to build and train the selected machine learning models and validate the performance of the algorithms. Experimentation was chosen because it gives better results and predictions than surveys and reviews.
Results: We selected four supervised classification algorithms for this research: Logistic Regression, Naive Bayes, KNN, and RF. After training the models with these algorithms on the preprocessed KKBox dataset, RF achieved the highest accuracy, 97%, compared to the other models.
Conclusions: We trained four models using the four machine learning algorithms to predict churn in the music streaming service domain. After training the models on the KKBox dataset and experimenting, we concluded that RF has the best performance, with better accuracy and AUC score.
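The two metrics used for the comparison, accuracy and AUC, measure different things: accuracy depends on a decision threshold, while AUC depends only on how well the scores rank churners above non-churners. A minimal sketch in plain Python follows; the labels and scores are made up for illustration and are not results from the thesis, and `scores_a` and `scores_b` stand for the outputs of two hypothetical models.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up churn labels (1 = churned) and predicted churn probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores_a = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.5]
scores_b = [0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1, 0.55]

for name, scores in (("model_a", scores_a), ("model_b", scores_b)):
    predictions = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an assumption
    print(name, accuracy(y_true, predictions), roc_auc(y_true, scores))
```

Reporting both, as the abstract does, guards against a model that looks good at one threshold but ranks customers poorly overall.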
|
20 |
Contributions to evaluation of machine learning models. Applicability domain of classification models
Rado, Omesaad A.M. January 2019
Artificial intelligence (AI) and machine learning (ML) present application opportunities and
challenges that can be framed as learning problems. The performance of machine learning models
depends on both the algorithms and the data. Learning algorithms create a model of reality through
learning from and testing with data, and their performance shows the degree to which the
assumed model agrees with reality. ML algorithms have been used successfully in numerous classification
problems. With the growing popularity of ML models for many purposes in different domains,
more formal validation of such predictive models is now required. There are
many studies on model evaluation, robustness, reliability, and the quality of the data and of
data-driven models. However, those studies do not yet consider the concept of the applicability domain
(AD). The issue is that the AD is often not well defined, or not defined at all, in many fields. This
work investigates the robustness of ML classification models from the applicability domain
perspective. A standard definition of the applicability domain is the spaces in which the model
provides results with a specified reliability.
The main aim of this study is to investigate the connection between the applicability domain approach
and classification model performance. We examine the usefulness of assessing the AD of a
classification model, i.e. the reliability, reuse, and robustness of classifiers. The work is carried out
through three approaches: first, assessing
the applicability domain of the classification model; second, investigating the robustness of the
classification model based on the applicability domain approach; third, selecting an optimal model
using Pareto optimality. The experiments are illustrated with different machine
learning algorithms for binary and multi-class classification on healthcare datasets from public
benchmark data repositories. In the first approach, the decision tree (DT) algorithm is used in the
classification stage, and a feature selection method is applied to choose
the features for classification. The resulting classifiers are used in the third approach for model
selection using Pareto optimality. The second approach is implemented in three steps: building the
classification model, generating synthetic data, and evaluating the results obtained.
The results obtained from the study provide an understanding of how the proposed approach can help
to define the model’s robustness and the applicability domain, for providing reliable outputs. These
approaches open opportunities for classification data and model management. The proposed
algorithms are implemented through a set of experiments on classification accuracy of instances,
which fall in the domain of the model. For the first approach, considering all the features, the
highest accuracy obtained is 0.98, with an average threshold of 0.34, for the Breast cancer dataset. After
applying the recursive feature elimination (RFE) method, the accuracy is 0.96 with an average
threshold of 0.27. For the robustness of the classification model based on the applicability domain approach,
the minimum accuracy is 0.62 for the Indian Liver Patient data at r=0.10, and the maximum accuracy is
0.99 for the Thyroid dataset at r=0.10. For the selection of an optimal model using Pareto optimality,
the optimally selected classifier gives an accuracy of 0.94 with an average threshold of 0.35.
This research investigates critical aspects of the applicability domain as related to the robustness of
classification ML algorithms. However, the performance of machine learning techniques depends on
the degree of reliable predictions of the model. In the literature, the robustness of the ML model can
be defined as the ability of the model to provide the testing error close to the training error. Moreover,
the properties can describe the stability of the model performance when being tested on the new
datasets. In conclusion, this thesis introduces the concept of the applicability domain for classifiers and
tests the use of this concept in case studies on health-related public benchmark datasets. / Ministry of Higher Education in Libya
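Pareto-optimal model selection keeps every candidate that no other candidate beats on all objectives at once. The abstract does not state which objectives were traded off, so the sketch below assumes two for illustration: accuracy (higher is better) and a cost such as the number of features (lower is better). The candidate names and values are made up.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is (name, accuracy, cost). One candidate dominates another
    when it is at least as good on both objectives and strictly better on one.
    """
    front = []
    for name, acc, cost in candidates:
        dominated = any(
            (a >= acc and c <= cost) and (a > acc or c < cost)
            for _, a, c in candidates
        )
        if not dominated:
            front.append((name, acc, cost))
    return front


# Made-up classifiers: (name, accuracy, number of features used).
candidates = [
    ("A", 0.98, 30),
    ("B", 0.96, 10),
    ("C", 0.90, 12),
    ("D", 0.85, 4),
]
print([name for name, _, _ in pareto_front(candidates)])
```

Here C drops out because B is both more accurate and cheaper; choosing among the remaining candidates still requires a preference between the objectives.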
|