Global ETD Search

131	Modelos probabilísticos e não probabilísticos de classificação binária para pacientes com ou sem demência como auxílio na prática clínica em geriatria / Probabilistic and non-probabilistic binary classification models for patients with or without dementia as an aid to clinical practice in geriatrics. Galdino, Maicon Vinícius. January 2020 (has links) Advisor: Liciana Vaz de Arruda Silveira / Abstract: This work presents several classification models (logistic regression, Naive Bayes, classification trees, Random Forest, k-nearest neighbors, and artificial neural networks) and compares them using resampling procedures on a geriatrics dataset (dementia diagnosis). It also analyzes each method's assumptions, advantages, and disadvantages, and the scenarios in which each method is most useful. The project is justified by the importance and practical value of the topic: as the elderly population grows worldwide (in developed countries and in developing ones such as Brazil), classification models can help medical professionals, especially general practitioners, diagnose dementia, since the diagnosis is often not straightforward. / Doctorate Logistic regression Classification tree Artificial neural networks Naive Bayes Random Forest kNN algorithm
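The abstract compares classifiers "using resampling procedures" without naming the scheme. As an illustrative assumption, the sketch below uses k-fold cross-validation, one common resampling scheme, to compare a from-scratch k-nearest-neighbours classifier against a majority-class baseline on synthetic data. The function names, the `fit(train_X, train_y)` convention, and the data are hypothetical and not taken from the thesis.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_classifier(train_X, train_y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def knn_classifier(train_X, train_y, k=3):
    """k-nearest neighbours: squared Euclidean distance, majority vote."""
    def predict(x):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
            for xi, yi in zip(train_X, train_y)
        )
        votes = Counter(label for _, label in dists[:k])
        return votes.most_common(1)[0][0]
    return predict


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean test-fold accuracy of a model built by fit(train_X, train_y)."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = fit(train_X, train_y)
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic two-class data (not the thesis' dementia dataset).
rng = random.Random(1)
X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)] + \
    [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(30)]
y = [0] * 30 + [1] * 30

print("kNN:", cross_val_accuracy(knn_classifier, X, y))
print("majority baseline:", cross_val_accuracy(majority_classifier, X, y))
```

Every model is scored on the same folds, so differences reflect the models rather than the split; any of the other listed methods could be compared the same way by wrapping it in a function with the same signature.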
132	Méthodes d'apprentissage interactif pour la classification des messages courts / Interactive learning methods for short text classification Bouaziz, Ameni 19 June 2017 (has links) Automatic short text classification is increasingly used in applications such as sentiment analysis and spam detection. Short texts such as tweets and SMS messages are harder to classify than traditional texts because of their brevity, sparsity, and lack of contextual information. This thesis presents two new approaches to improve short text classification. The first approach is "Semantic Forest". Its first step is a new enrichment method that uses an external enrichment source built in advance: a short text of a few words is transformed into a longer text containing more information, improving its quality before the classification model is built. Unlike the methods proposed in the literature, the second step does not use a traditional learning algorithm but a new one that takes the semantic links among words into account when inducing the Random Forest classifier. The second contribution is "IGLM" (Interactive Generic Learning Method), a new interactive approach that recursively updates the classification model by considering new data arriving over time and by leveraging user intervention to correct misclassified data. An abstraction method is combined with the update mechanism to improve short text quality. The experiments performed on these two methods show their effectiveness and that they outperform traditional algorithms in short text classification. Finally, the last part of the thesis presents a thorough, reasoned comparative study of the two proposed methods, taking into account various criteria such as accuracy and speed. Short text classification Semantics Random Forest Interactivity
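The first step of Semantic Forest enriches each short text from an external source built in advance. The abstract does not describe that resource or how terms are chosen, so the sketch below assumes a hypothetical dictionary `RELATED_TERMS` mapping words to related terms and simply appends a few of them per known word. It illustrates the enrichment idea only, not the thesis' method.

```python
# Hypothetical external resource: each word maps to related terms.
RELATED_TERMS = {
    "goal": ["football", "match", "score"],
    "vote": ["election", "politics"],
    "cheap": ["discount", "offer"],
}


def enrich(text, resource, max_terms=2):
    """Append up to `max_terms` related terms per known word, skipping duplicates."""
    tokens = text.lower().split()
    enriched = list(tokens)
    seen = set(tokens)
    for tok in tokens:
        for term in resource.get(tok, [])[:max_terms]:
            if term not in seen:
                enriched.append(term)
                seen.add(term)
    return enriched


print(enrich("Great goal tonight", RELATED_TERMS))
```

The longer token list gives a downstream classifier more overlapping features between otherwise disjoint short messages, which is the motivation the abstract gives for enrichment.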
133	Assessing palm decline in Florida by using advanced remote sensing with machine learning technologies and algorithms. Hanni, Christopher B. 21 March 2019 (has links) Native palms, such as the Sabal palmetto, play an important role in maintaining the ecological balance in Florida. As a side effect of modern globalization, new phytopathogens such as Texas Phoenix Palm Decline have been introduced into forest systems, where they threaten native palms. This presents new challenges for forestry managers and geographers. For many years, advances in remote sensing have assisted forestry by providing spatial metrics on the type, quantity, location, and health of trees. This study maps general palm decline in Florida by combining new developments in deep learning with high-resolution WorldView-2 multispectral/multitemporal satellite imagery and LiDAR point cloud data. A novel approach combining TensorFlow deep learning classification, multiband spatial statistics and indices, data reduction, and stepwise refinement masking yielded a significant improvement over Random Forest classification in a comparative analysis. The TensorFlow classification results were then used to build an Empirical Bayesian Kriging continuous raster, an informative map of palm decline zones based on Normalized Difference Vegetation Index change. The results show that a large portion of the study area exhibits palm decline, and the study provides a new methodology for applying TensorFlow deep learning to multispectral satellite imagery. Change Detection Empirical Bayesian Kriging GLCM Random Forest TensorFlow WorldView-2 Geographic Information Sciences
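The palm-decline map is based on Normalized Difference Vegetation Index change. NDVI is the standard ratio (NIR - Red) / (NIR + Red); the sketch below computes it per pixel for two dates and takes the difference. It assumes co-registered (NIR, red) reflectance pairs for the same pixels and leaves out the TensorFlow classification and Empirical Bayesian Kriging steps. Returning 0.0 when NIR + Red is zero is an arbitrary convention of this sketch.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # convention of this sketch for empty/no-signal pixels
    return (nir - red) / total


def ndvi_change(before, after):
    """Per-pixel NDVI difference (after minus before).

    `before` and `after` are lists of (nir, red) reflectance pairs for the
    same pixels; negative values suggest a loss of vegetation vigour."""
    return [ndvi(*a) - ndvi(*b) for b, a in zip(before, after)]


# Two hypothetical pixels: the first declines, the second is unchanged.
before = [(0.5, 0.1), (0.4, 0.2)]
after = [(0.3, 0.2), (0.4, 0.2)]
print(ndvi_change(before, after))
```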
134	Comparing Random forest and Kriging Methods for Surrogate Modeling Asritha, Kotha Sri Lakshmi Kamakshi January 2020 (has links) The problem with conducting physical experiments in design engineering is the cost of finding an optimal design that fulfills all design requirements and constraints. As an alternative to physical experiments, engineers use computer-aided design modeling and computer-simulated experiments. These simulations are run to understand functional behavior and to predict possible failure modes in design concepts. However, a simulation may take minutes, hours, or even days to finish. Surrogate modeling is used to reduce the time and the number of simulations required for design space exploration: its goal is to replace the original system with an approximation function of the simulations that can be computed quickly. Generating a surrogate model involves sample selection, model generation, and model evaluation. Using surrogate models in design engineering can reduce design cycle times and cost by enabling rapid analysis of alternative designs. A suitable surrogate modeling method for a given function with specific requirements can be selected by comparing different methods on different application problems and evaluation metrics. This thesis compares the random forest model and the kriging model on prediction accuracy, using quantitative experiments on mathematical test functions. The experimental analysis shows that the kriging models are more accurate than the random forests, whereas the random forest models have shorter execution times on the studied mathematical test problems.
Machine learning Regression Random Forest kriging Prediction models Surrogate models and Design engineering Computer Systems
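To make the kriging side of the comparison concrete, the sketch below fits a one-dimensional simple-kriging surrogate: zero mean, squared-exponential covariance with a fixed length scale, and predictions k(x)^T K^{-1} y. These are assumptions for illustration; the abstract does not say which kriging variant or covariance model the thesis used, and no hyperparameters are fitted here. A linear solver is written out so the block needs only the standard library.

```python
import math


def gaussian_cov(a, b, length=1.0, sill=1.0):
    """Squared-exponential (Gaussian) covariance between two scalar inputs."""
    return sill * math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x


def fit_simple_kriging(xs, ys, length=1.0, nugget=1e-10):
    """Zero-mean simple kriging: returns the predictor x -> k(x)^T K^{-1} y.

    A tiny nugget on the diagonal keeps K numerically invertible."""
    K = [[gaussian_cov(a, b, length) + (nugget if i == j else 0.0)
          for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    weights = solve(K, ys)
    return lambda x: sum(w * gaussian_cov(x, xi, length)
                         for w, xi in zip(weights, xs))


# Toy 1-D test function sampled at five design points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
predict = fit_simple_kriging(xs, ys)
print(predict(2.5), math.sin(2.5))
```

Noise-free kriging reproduces the sampled values at the design points, whereas a random forest regressor predicts averages of leaf values and generally does not; that structural difference is one plausible reason for the accuracy gap on smooth test functions, though the abstract does not attribute it to this.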
135	Remote sensing-based land cover classification and change detection using Sentinel-2 data and Random Forest: A case study of Rusinga Island, Kenya Hesping, Malena January 2020 (has links) Healthy forests and soils are crucial for the very existence of mankind, as they provide food, clean water and air, shade, and protection against floods and storms. Through photosynthetic carbon storage, they mitigate climate change, and they fertilise and stabilise soils. Unfortunately, deforestation and the loss of fertile soils are a bleak reality and among the world's most pressing challenges. Over the past decades Kenya has faced severe deforestation, but efforts are under way to reverse it, revegetate degraded land, and combat erosion. Satellite remote sensing is becoming increasingly useful for vegetation monitoring as data quality improves and costs decrease. This thesis explores the potential of free, open-access Sentinel-2 data for vegetation monitoring through Random Forest land cover classification and post-classification change detection on Rusinga Island, Kenya. Various single-date and multi-temporal predictor datasets, distinguishing either five or four classes, were examined to develop the most suitable model. The classification achieved acceptable results on an independent test dataset (overall accuracy of 90.06% with five classes and 96.89% with four classes); these results should, however, be confirmed on the ground and could potentially be improved with better reference data. In this study, change detection could only be analysed over a two-year time frame, which is too short to produce meaningful results. Nevertheless, the method was demonstrated conceptually and could be applied in the future to monitor land cover changes on Rusinga Island. Land cover classification post-classification change detection Random Forest remote sensing Sentinel-2 Environmental Sciences
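The reported figures (90.06% and 96.89%) are overall accuracies on an independent test set, and the change detection is post-classification, i.e. a comparison of two classified maps. The sketch below shows both computations with standard definitions: overall accuracy from a confusion matrix (rows as reference classes, columns as predictions), the per-class producer's and user's accuracies commonly reported alongside it, and a "from-to" change table. The counts and maps are made up for illustration and are not the thesis' results.

```python
def overall_accuracy(confusion):
    """Correctly classified test samples (diagonal) divided by all test samples.

    confusion[i][j] = number of test samples of reference class i predicted as j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def producers_users_accuracy(confusion):
    """Per-class producer's accuracy (diagonal / reference row total) and
    user's accuracy (diagonal / predicted column total)."""
    n = len(confusion)
    producers = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
    users = [confusion[j][j] / sum(row[j] for row in confusion) for j in range(n)]
    return producers, users


def change_matrix(map_t1, map_t2, n_classes):
    """Post-classification 'from-to' table: cell [a][b] counts pixels
    classified as class a at time 1 and as class b at time 2."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(map_t1, map_t2):
        table[a][b] += 1
    return table


# Made-up counts for three classes (not the thesis' results).
confusion = [[50, 2, 3],
             [4, 40, 1],
             [1, 2, 47]]
print(overall_accuracy(confusion))  # 137 correct out of 150 samples
print(producers_users_accuracy(confusion))
print(change_matrix([0, 0, 1, 2], [0, 1, 1, 2], 3))
```

Off-diagonal cells of the change table are the detected transitions; with a two-year interval, as the abstract notes, genuine change is hard to separate from classification error.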
136	A Statistical Framework for Classification of Tumor Type from microRNA Data / Ett statistiskt ramverk för klassificering av tumörtyp från mikroRNA data Röhss, Josefine January 2016 (has links) Hepatocellular carcinoma (HCC) is a type of liver cancer with a very low survival rate, not least because it is difficult to diagnose at an early stage. The objective of this thesis is to build a random forest classification method based on microRNA (and messenger RNA) expression profiles from patients with HCC. The main purpose is to distinguish between tumor samples and normal samples by measuring miRNA expression. If successful, this method could be used to detect HCC at an earlier stage and to design new therapeutics. The microRNAs and messenger RNAs whose expression differs significantly between tumor samples and normal samples are selected to build random forest classification models. These models are then tested on paired samples of tumor and surrounding normal tissue from patients with HCC. The results show that the models classify tumor and normal samples with high prediction accuracy, indicating strong potential for using microRNA and messenger RNA expression levels to diagnose HCC.
miRNA mRNA random forest classification HCC diagnosis Probability Theory and Statistics
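The feature-selection step keeps the microRNAs and messenger RNAs whose expression differs significantly between tumor and normal samples. The abstract does not name the statistical test; because the samples are paired per patient, the sketch below assumes a paired t statistic on tumor-minus-normal differences and a hypothetical fixed cutoff (a real analysis would more likely use p-values with a multiple-testing correction). Feature names and values are invented.

```python
import math
from statistics import mean, stdev


def paired_t(tumor_values, normal_values):
    """Paired t statistic on per-patient differences (tumor minus normal)."""
    diffs = [t - n for t, n in zip(tumor_values, normal_values)]
    m, s = mean(diffs), stdev(diffs)
    if s == 0:
        # Identical differences: the statistic is undefined; report 0 for
        # "no change" and +/- infinity for a perfectly consistent shift.
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (s / math.sqrt(len(diffs)))


def select_features(tumor, normal, threshold):
    """Names of features whose |paired t| exceeds the (hypothetical) threshold.

    `tumor` and `normal` map feature name -> expression values, with the
    i-th value of each list coming from the same patient."""
    return sorted(name for name in tumor
                  if abs(paired_t(tumor[name], normal[name])) > threshold)


# Hypothetical expression values for four patients.
tumor = {"miRNA_1": [5.1, 6.0, 5.5, 6.2], "miRNA_2": [2.0, 2.1, 1.9, 2.2]}
normal = {"miRNA_1": [3.0, 3.2, 3.1, 3.5], "miRNA_2": [2.1, 2.0, 2.0, 2.1]}
print(select_features(tumor, normal, threshold=3.0))
```

The selected features would then serve as inputs to the random forest classifier described in the abstract.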
137	Ranking Aspect-Based Features in Restaurant Reviews Chan, Jacob Ling Hang 07 December 2020 (has links) Consumers continuously review products and services on the internet, and other consumers frequently rely on those reviews when making purchasing decisions. Review texts are usually free-form and accompanied by a star rating on a 5-point scale. Because most restaurants receive an average rating of 3.5 or 4 stars, a star rating alone does not give readers enough information to make a decision. Many researchers have approached the problem with sentiment analysis, classifying a sentence or text as expressing a positive or a negative review. However, sentiment analysis, even at a fine-grained level, can only classify judgments on a given aspect as positive or negative. The novel method proposed in this thesis reveals which aspects reviewers consider relevant when assigning star ratings to restaurants. It does so with an interpretable star rating classifier that predicts the star rating from the aspects and polarity scores in a review. The model first assigns a polarity score to each aspect in the review text, then predicts a star rating, and finally outputs a ranked list of aspect importance, using a widely used restaurant review dataset. The results suggest that the classification model can produce a reliable ranking from review texts. Sentiment Analysis Star Rating Prediction Feature Importance Random Forest Arts and Humanities
138	Factors Affecting the Preference of Buying Hybrid and Electric Vehicles Zhao, Zhenyu January 2021 (has links) Electric vehicles are regarded as an important means of reducing emissions, but their adoption remains a challenge in many countries. Using survey data on respondents' demographics and attitudes, this paper proposes two classification models, logistic regression and random forest, both using Multiple Correspondence Analysis (MCA) as an intermediate step, to identify the factors affecting willingness to purchase electric vehicles. The analysis shows that adding MCA improves explanatory power at a small cost in predictive performance, and the results reveal that characteristics such as frequency of using modern transport services, car-sharing subscription, place of residence, and mode of frequent trips have a significant impact on EV purchases. Electric Vehicles Multiple Correspondence Analysis Logistic regression Random forest Probability Theory and Statistics
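MCA is correspondence analysis applied to the complete disjunctive (indicator) table of the categorical survey answers, with one 0/1 column per category of each variable. The sketch below only builds that table; the subsequent decomposition and the feeding of MCA dimensions into the classifiers are not shown, and the survey variables and answers are hypothetical.

```python
def indicator_matrix(rows, variables):
    """Build the complete disjunctive (one-hot) table used as MCA input.

    rows: list of dicts mapping variable name -> category.
    Returns (column_labels, matrix) with one 0/1 column per (variable, category)."""
    columns = []
    for var in variables:
        for cat in sorted({r[var] for r in rows}):
            columns.append((var, cat))
    matrix = [[1 if r[var] == cat else 0 for var, cat in columns] for r in rows]
    return columns, matrix


# Hypothetical survey answers, not the thesis' data.
rows = [
    {"car_sharing": "yes", "area": "urban"},
    {"car_sharing": "no", "area": "rural"},
    {"car_sharing": "yes", "area": "rural"},
]
columns, matrix = indicator_matrix(rows, ["car_sharing", "area"])
print(columns)
print(matrix)
```

Each row of the table sums to the number of variables, which is the property that distinguishes an indicator table from an arbitrary one-hot feature matrix.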
139	CAN STATISTICAL MODELS BEAT BENCHMARK PREDICTIONS BASED ON RANKINGS IN TENNIS? Svensson, William January 2021 (has links) The aim of this thesis is to beat a benchmark prediction accuracy of 64.58 percent based on player rankings on the ATP tour in tennis. The benchmark predicts that the better-ranked player in each match wins. Three statistical models are used: logistic regression, random forest, and XGBoost. The data cover the years 2000 to 2010 and contain over 60 000 observations with 49 variables each. After the data were prepared, new variables were created by taking the difference between the two players in each match, and with these all three statistical models outperformed the benchmark prediction. All three models had an accuracy of around 66 percent, with logistic regression performing best at 66.45 percent. The most important variables overall are the total win rate on different surfaces, the total win rate, and rank. Logistic Regression Random Forest XGBoost ATP tour Probability Theory and Statistics
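The benchmark and the difference variables can be made concrete with a short sketch: the benchmark predicts a win for the player with the better (numerically lower) ranking, and each model feature is player A's statistic minus player B's. The match records, field names such as `rank_a` and `win_rate_b`, and the tie rule (equal ranks predict player B) are assumptions of this sketch, not details from the thesis.

```python
def benchmark_accuracy(matches):
    """Share of matches won by the better-ranked player (lower rank number).

    Each match is a dict with 'rank_a', 'rank_b' and 'winner' ('a' or 'b').
    Equal ranks predict 'b', an arbitrary tie rule of this sketch."""
    correct = 0
    for m in matches:
        predicted = "a" if m["rank_a"] < m["rank_b"] else "b"
        correct += predicted == m["winner"]
    return correct / len(matches)


def difference_features(match, stats):
    """Player A's value minus player B's value for each statistic name."""
    return {s: match[f"{s}_a"] - match[f"{s}_b"] for s in stats}


# Hypothetical matches, not ATP data.
matches = [
    {"rank_a": 5, "rank_b": 20, "winner": "a"},
    {"rank_a": 3, "rank_b": 40, "winner": "b"},
    {"rank_a": 50, "rank_b": 10, "winner": "b"},
    {"rank_a": 12, "rank_b": 30, "winner": "a"},
]
print(benchmark_accuracy(matches))
print(difference_features(
    {"win_rate_a": 0.70, "win_rate_b": 0.55, "rank_a": 5, "rank_b": 20},
    ["win_rate", "rank"],
))
```

Differencing turns each match into a single symmetric row, so a model learns how a gap in win rate or rank shifts the odds rather than learning each player's absolute values.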
140	Spatial Data Science: Theory and Methods with Applications to Human Development in Morocco Lehnert, Matthew Ryan January 2021 (has links) No description available. Geography Geographic Information Science Spatial Econometrics Machine Learning Random Forest Morocco Human Development Index
