Global ETD Search

1	The crucial parts of text classification with TensorFlow.js and categorisation of news articles. Nordberg, Gustav; Grandien, Jesper. January 2020. Text classification is a subset of machine learning used to assign tags or categories to texts such as tweets, emails, news headlines or articles. Because news publishers can be inconsistent in how they categorise content, text classification could categorise articles autonomously and flag unclear categorisations. The TensorFlow library provides operations and tools for the machine learning workflow. This paper focuses on the crucial parts of working with machine learning in TensorFlow.js and on the extent to which such a model can categorise a news article. The authors evaluate different models to analyse how tuning the settings affects the accuracy of the model. The results are based on a literature study of official documentation and peer-reviewed reports, together with an empirical experiment in which machine learning models were trained in TensorFlow.js. The most accurate model, at 87.17%, was trained on 1,000 articles using ReLU and Softmax activation functions and the mean squared error loss function, while the least accurate model reached 75.5% using Sigmoid activation functions and categorical cross-entropy on the 5,000-article training set. The crucial parts of this development were the optimizer function, loss function, batch size, activation functions, labelled training and test data, normalisation function, layer shapes, and computing power. Many parts and functions must be taken into consideration when developing a text classification model in TensorFlow.js, and the training process has to be repeated several times because many parameters affect the model's results.
The model results can be improved by optimising and finding the best combination of functions and parameters. Keywords: TensorFlow.js; Machine learning; Text classification; JavaScript; Software Engineering.
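The comparison above turns on activation and loss functions. As a hedged illustration only (plain Python rather than TensorFlow.js, and not the authors' code), the sketch below computes a softmax output for one article and scores it with both losses the thesis compares, mean squared error and categorical cross-entropy, against a one-hot label. The three-category example is hypothetical.

```python
import math

def relu(xs):
    # ReLU activation applied element-wise.
    return [max(0.0, x) for x in xs]

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_squared_error(y_true, y_pred):
    # Average squared difference over the category dimension.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 so the logarithm stays finite.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# Hypothetical output layer logits for three news categories; the true category is the first.
probs = softmax([2.0, 1.0, 0.1])
label = [1.0, 0.0, 0.0]
mse = mean_squared_error(label, probs)
cce = categorical_cross_entropy(label, probs)
```

With a one-hot label, cross-entropy reduces to the negative log-probability of the true class, while MSE also penalises probability mass spread over the wrong classes.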
2	Sound analysis of the environment for healthcare and recognition of daily life activities for the elderly. Robin, Maxime. 17 April 2018. The average age of the French and European population is increasing, which brings new technical and societal challenges. Older people are the most fragile and vulnerable, especially with regard to domestic accidents and in particular falls. This is why many technical, academic and commercial projects to assist older people have emerged in recent years. This thesis was carried out under a Cifre agreement, jointly between the company KRG Corporate and the BMBI laboratory (Biomécanique et Bio-ingénierie, i.e. Biomechanics and Bioengineering) of the UTC (Université de technologie de Compiègne). Its purpose is to propose a sensor that recognises sounds and activities of daily living, in order to extend and improve the tele-assistance system already marketed by the company. Several speech recognition and speaker recognition methods have already been tested in the field of sound recognition, including GMM (Gaussian Mixture Model), SVM-GSL (Support Vector Machine with GMM Supervector Linear kernel) and HMM (Hidden Markov Model). In the same way, we proposed to use i-vectors for sound recognition. I-vectors are used in particular in speaker recognition, where they have recently revolutionized the field.
Then we broadened our scope and used deep learning, which currently gives very good classification results across all domains. We first used deep neural networks to reinforce the i-vectors, then as our sole classification system. The methods mentioned above were also tested under noisy and then real conditions. These experiments gave very satisfactory recognition rates: neural networks reinforcing i-vectors and neural networks alone were the most accurate systems, with a very significant improvement over the various systems derived from speech and speaker recognition. Keywords: Sound recognition; I-vectors; Deep learning; Deep neural networks; Acoustic parameters; Classification in noisy environments; Remarkable Energy Rate (RER).
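The GMM baseline named in this abstract scores an audio clip by how likely its frames are under a Gaussian mixture fitted to each sound class. The sketch below is a minimal illustration under stated assumptions: diagonal covariances, independent frames, and hand-picked (hypothetical) parameters in place of EM training. It is neither the i-vector system nor the thesis's implementation.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    # Log-density of a Gaussian with diagonal covariance.
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, weights, means, variances):
    # log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp for stability.
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def classify_frames(frames, models):
    # Sum per-frame log-likelihoods (frames assumed independent), pick the best class.
    scores = {label: sum(gmm_loglik(f, *params) for f in frames)
              for label, params in models.items()}
    return max(scores, key=scores.get)

# Hypothetical 2-D acoustic features; each model is (weights, means, variances).
models = {
    "door": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[0.2, 0.2], [0.2, 0.2]]),
    "glass_break": ([1.0], [[5.0, 5.0]], [[0.5, 0.5]]),
}
```

In practice each class model would be trained with EM on real acoustic features such as MFCCs; the scoring step stays the same.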
3	Evidential calibration and fusion of multiple classifiers: application to face blurring. Minary, Pauline. 08 December 2017. To improve overall performance on a classification problem, one line of research consists in using several classifiers and fusing their outputs. To perform this fusion, some approaches merge the outputs using a fusion rule. This requires the outputs to be made comparable beforehand, which is usually done through a probabilistic calibration of each classifier. The fusion can also be performed by concatenating the classifier outputs into a vector and applying a joint probabilistic calibration to it. Recently, extensions of the probabilistic calibration of an individual classifier have been proposed using evidence theory, in order to better represent the uncertainties inherent to the calibration process. In the first part of this thesis, this idea is adapted to joint probabilistic calibration techniques, leading to evidential versions. This approach is then compared with the aforementioned ones on classical classification datasets. In the second part, the challenging problem of blurring faces in images, which SNCF needs to address, is tackled. A state-of-the-art method for this problem is to use several face detectors, which return boxes with associated confidence scores, and to combine their outputs using an association step and an evidential calibration. This report shows that reasoning at the pixel level is preferable to reasoning at the box level, and that among the fusion approaches discussed in the first part, the evidential joint calibration yields the best results. Finally, the case of images coming from videos is considered. To leverage the information contained in videos, a classical tracking algorithm is added to the blurring system.
Keywords: Calibration; Face detection; Theory of belief functions; Classification; Information fusion; Logistic regression. 621.39
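Joint probabilistic calibration, as described in this abstract, concatenates the scores of several classifiers into one vector and learns a single mapping to a probability. A common choice for that mapping is logistic regression, which is what the sketch below uses; it shows only the probabilistic (non-evidential) baseline, fitted by plain batch gradient descent, and the detector scores are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_joint_calibration(score_vectors, labels, lr=0.1, epochs=2000):
    # Logistic regression on concatenated classifier scores:
    # P(y=1 | s) = sigmoid(w . s + b), fitted by batch gradient descent on log-loss.
    dim = len(score_vectors[0])
    w, b = [0.0] * dim, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for s, y in zip(score_vectors, labels):
            err = sigmoid(sum(wi * si for wi, si in zip(w, s)) + b) - y
            for i in range(dim):
                grad_w[i] += err * s[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def calibrated_probability(w, b, s):
    return sigmoid(sum(wi * si for wi, si in zip(w, s)) + b)

# Hypothetical scores from two face detectors; label 1 means "face".
scores = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
labels = [1, 1, 1, 0, 0, 0]
w, b = fit_joint_calibration(scores, labels)
```

The evidential variants studied in the thesis replace this single probability with a belief function that also reflects how much calibration data supports it.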
4	Classifying Urgency: A Study in Machine Learning for Classifying the Level of Medical Emergency of an Animal's Situation. Strallhofer, Daniel; Ahlqvist, Jonatan. January 2018. This paper explores the use of Naive Bayes as well as Linear Support Vector Machines to classify a text according to its level of medical emergency. The primary data source for testing is customer data from an online veterinary service. The aspects explored are whether a single text gives enough information for a medical decision to be made, and whether alternative data-gathering processes would be preferable. Past research has shown that text classifiers based on Naive Bayes and SVMs can often give good results. We show how to optimize the results so that important decisions can be made with these classifications as a basis; optimal data-gathering procedures are part of this optimization process. The business applications of such a venture are also discussed, since implementing such a system in an online medical service may affect customer flow, goodwill, cost/revenue, and online competitiveness.
Keywords: Medical urgency; Veterinarian; Text classification; Machine learning; Multinomial Naive Bayes; Linear Support Vector Classification; Edge cases; Data gathering process.
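Multinomial Naive Bayes, one of the two classifiers in this study, picks the class that maximises the log prior plus the summed log-probabilities of the message's words. The sketch below assumes Laplace (add-one) smoothing and pre-tokenised text; the urgency labels and example messages are hypothetical, not the study's veterinary data.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    # docs: list of token lists; labels: parallel list of class names.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Laplace smoothing gives unseen (class, word) pairs a small non-zero probability.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik

def predict_nb(model, tokens):
    log_prior, log_lik = model
    def score(c):
        # Words never seen in training are ignored.
        return log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
    return max(log_prior, key=score)

# Hypothetical training messages.
docs = [
    "dog not breathing collapsed".split(),
    "cat bleeding heavily collapsed".split(),
    "dog itchy skin for weeks".split(),
    "cat mild sneezing no fever".split(),
]
labels = ["urgent", "urgent", "non_urgent", "non_urgent"]
model = train_multinomial_nb(docs, labels)
```

A linear SVM would instead learn one weight per word from the same bag-of-words counts, trading the probabilistic interpretation for a margin-based decision.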
5	Improving the performance of the prediction analysis of microarrays algorithm via different thresholding methods and heteroscedastic modeling. Sahtout, Mohammad Omar. January 1900. Doctor of Philosophy / Department of Statistics / Haiyan Wang. This dissertation considers different methods to improve the performance of the Prediction Analysis of Microarrays (PAM) algorithm. PAM is a popular algorithm for high-dimensional classification. However, it tends to retain too many features, even after multiple runs of the algorithm to perform further feature selection: applied to 10 multi-class human cancer microarray datasets, PAM selects 2,611 features on average. Such a large number of features makes follow-up studies difficult. This drawback results from the soft thresholding method used in PAM and from PAM's estimate of the thresholding parameter. In this dissertation, we extend PAM with two other thresholding methods (hard and order thresholding) and a deep search algorithm to obtain a better estimate of the thresholding parameter. In addition to the newly proposed algorithms, we derive an approximation of the misclassification probability for the hard-thresholded algorithm in the binary case. Beyond this, the dissertation considers the heteroscedastic case, in which the variance of each feature differs across classes. PAM assumes that the variance of each predictor is constant across classes. We found that this homogeneity assumption is invalid for many features in most datasets, which motivated us to develop new heteroscedastic versions of the algorithms that incorporate the different thresholding methods. All the new algorithms are extensively tested and compared on real data or in Monte Carlo simulation studies.
In general, the newly proposed algorithms not only achieved better cancer-status prediction accuracy but also produced more parsimonious models with a significantly smaller number of genes. Keywords: Prediction analysis of microarrays; High-dimensional classification; Nearest shrunken centroids; Thresholding. Statistics (0463)
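PAM shrinks each class centroid toward the overall centroid by thresholding a standardized difference d with a parameter Δ. The sketch below shows the two standard rules contrasted in this abstract, soft thresholding (used by PAM) and hard thresholding, applied to one class's vector of differences. The function names are illustrative, and order thresholding and the heteroscedastic variants are not shown.

```python
import math

def soft_threshold(d, delta):
    # Soft thresholding: shrink |d| toward zero by delta, clipping at zero.
    return math.copysign(max(abs(d) - delta, 0.0), d)

def hard_threshold(d, delta):
    # Hard thresholding: keep d unchanged if |d| > delta, otherwise set it to zero.
    return d if abs(d) > delta else 0.0

def surviving_features(d_values, delta, rule):
    # Indices whose thresholded standardized difference is non-zero for this class.
    return [j for j, d in enumerate(d_values) if rule(d, delta) != 0.0]
```

For a fixed Δ both rules keep the same features; they differ in the retained magnitudes (hard thresholding does not shrink survivors), which changes the classifier and therefore the cross-validated choice of Δ.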
6	Spinning particles in algebraically special space-times. Šrámek, Milan. January 2013. Spinning-particle motion is studied, within the pole-dipole approximation, in algebraically special space-times of types N, III and D. The spin-curvature interaction is analysed for the Pirani and Tulczyjew spin supplementary conditions; for types N and D, the interaction is related to the relative acceleration of two nearby observers separated in the direction of the particle's spin. For Tulczyjew's condition, the momentum-velocity relation is also studied, together with its consequences for the spin-curvature interaction. Finally, the type of motion for which the two supplementary conditions considered are equivalent is described.
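For orientation, the pole-dipole equations of motion (the Mathisson-Papapetrou-Dixon equations), the two spin supplementary conditions named in this abstract, and the geodesic-deviation equation for the relative acceleration of nearby observers can be written as follows. This uses MTW sign conventions, which differ between authors, and is a standard textbook form rather than an excerpt from the thesis; p is the momentum, u the four-velocity, S the spin tensor and ξ the separation vector.

```latex
\begin{aligned}
\frac{Dp^{\mu}}{d\tau} &= -\tfrac{1}{2}\, R^{\mu}{}_{\nu\kappa\lambda}\, u^{\nu} S^{\kappa\lambda}, \\
\frac{DS^{\mu\nu}}{d\tau} &= p^{\mu} u^{\nu} - p^{\nu} u^{\mu}, \\
\text{Pirani:}\quad S^{\mu\nu} u_{\nu} &= 0, \qquad \text{Tulczyjew:}\quad S^{\mu\nu} p_{\nu} = 0, \\
\frac{D^{2}\xi^{\mu}}{d\tau^{2}} &= -R^{\mu}{}_{\nu\kappa\lambda}\, u^{\nu} \xi^{\kappa} u^{\lambda}.
\end{aligned}
```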
7	Disorderclassifier: text classification for the categorization of mental disorders. NUNES, Francisca Pâmela Carvalho. 23 August 2016. In recent years, the Internet has made communication broader and more accessible. With the growth of social media, blogs, and websites in general, an extensive and diverse content base has emerged in which users share their opinions and personal stories. These reports can be relevant for future observations or even help other people make decisions. However, this mass of information is scattered across the Web in free-text form, which hinders manual analysis and categorization. Automating this work is the best option, but understanding free-form text is not a simple task for a computer, given the irregularities and imprecision of natural language. In these circumstances, systems that automatically classify texts by topic, genre, features, and other criteria are emerging, based on Text Mining (TM) concepts. TM aims to extract important information from text by analyzing a set of text documents. Several TM studies have been proposed in various fields, for example psychiatry. A number of studies in this area try to identify textual features to detect psychological disorders, to analyze patients' sentiments, to detect security problems in medical records, or to explore the biomedical literature.
The work proposed here analyzes personal testimonies of potential patients in order to categorize the texts by type of mental disorder, following the DSM-5 taxonomy. The proposed procedure classifies the collected personal testimonies into four disorder types (anorexia, OCD, autism, and schizophrenia). TM techniques were used for text pre-processing and classification, with the support of the Weka software packages. Experimental results showed that the proposed method achieves high precision and that the text pre-processing phase affects these results. The Support Vector Machine (SVM) classification technique performed best for this purpose compared with other techniques used in the literature. Keywords: Text mining; Text classification; Personal testimonies; Mental disorder.
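The pre-processing phase that this abstract credits with affecting the results typically lowercases, tokenizes, removes stop words, and weights terms before a classifier such as an SVM sees them. The sketch below is a plain-Python stand-in for those steps using TF-IDF weighting; it is not Weka's implementation, and the tiny stop-word list and example sentences are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a full list.
STOP_WORDS = {"i", "my", "the", "a", "and", "to", "of", "about", "me"}

def preprocess(text):
    # Lowercase, keep alphabetic tokens, drop stop words.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tf_idf(documents):
    # documents: list of token lists. Returns one {term: weight} dict per document.
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # Term frequency normalized by document length, times inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

docs = [preprocess("I wash my hands again and again"),
        preprocess("I skip meals and fear gaining weight")]
vectors = tf_idf(docs)
```

Terms that occur in every document get an IDF of zero, so the weighting emphasizes words that distinguish one testimony from another.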
8	Design Patterns for Multi-Agent Systems. Juziuk, Joanna. January 2012. Design patterns document a field's systematic knowledge derived from experience. Despite the vast body of work in the field of multi-agent systems (MAS), design patterns for MAS are not popular among software practitioners. Because MAS have features widely considered key to engineering complex distributed applications, for example in manufacturing, robotics, e-commerce, traffic control and coordination, and scientific simulations, it is important to provide a clear overview of existing patterns to make this knowledge accessible. To that end, a systematic literature review was performed covering the main publication venues of the field since 1998, resulting in 206 patterns. The study shows that (1) there is no standard template for documenting design patterns for MAS, which hampers their use by practitioners; (2) associations between patterns are poorly described, which results in a lack of overview of the pattern space; (3) patterns for MAS have been used in a variety of application domains, which underpins their high potential for practitioners; and (4) classifications of design patterns for MAS are bound to specific pattern catalogs, and a more holistic view of the pattern space is missing. From the study, a number of guidelines are outlined that are important for future work on design patterns for MAS and their adoption in practice. Keywords: design patterns; multi-agent systems; classification; guidelines; Computer Sciences; Software Engineering.
9	Knowledge Discovery and Predictive Modeling from Brain Tumor MRIs. Zhou, Mu. 16 September 2015. Quantitative cancer imaging is an emerging field that develops computational techniques to acquire a deep understanding of cancer characteristics for cancer diagnosis and clinical decision making. The recent growth of clinical imaging data provides a wealth of opportunities to systematically explore quantitative information to advance cancer diagnosis. Crucial questions arise as to how we can develop specific computational models capable of mining meaningful knowledge from a vast quantity of imaging data, and how to transform such findings into improved personalized health care. This dissertation presents a set of computational models in the context of malignant brain tumors, specifically Glioblastoma Multiforme (GBM), which is notoriously aggressive and has a poor survival rate. In particular, this dissertation develops quantitative feature extraction approaches for tumor diagnosis from magnetic resonance imaging (MRI), including a multi-scale local computational feature and a novel regional habitat quantification analysis of tumors. In addition, we propose a histogram-based representation to investigate biological features that characterize ecological dynamics, which is of great clinical interest in evaluating tumor cellular distributions. Furthermore, with regard to clinical systems, generic machine learning techniques typically fail to generalize well to specific diagnostic problems. Therefore, quantitative analysis from a data-driven perspective is becoming critical. In this dissertation, we propose two specific data-driven models to tackle different types of clinical MRI data. First, we inspect cancer systems from a time-domain perspective, proposing a quantitative histogram-based approach that builds a prediction model by measuring the differences between pre- and post-treatment diagnostic MRI data.
Second, we investigate the problem of mining knowledge from a skewed distribution, in which the data samples of each survival group are unequally distributed. We propose an algorithmic framework to effectively predict survival groups by jointly considering imbalanced distributions and classifier design. Our approach achieved an accuracy of 95.24%, suggesting that it captures class-specific information in a challenging clinical setting. Keywords: Computer-aided diagnosis; Radiology; Data mining; Feature extraction; Classification; Computer Engineering.
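One widely used way to account for imbalanced class distributions inside a classifier is to weight each training sample inversely to its class frequency, so that every class contributes equally to the loss. The sketch below uses the n_samples / (n_classes × count) heuristic. It is a generic technique, not the framework proposed in this dissertation, and the survival-group labels are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    # weight_c = n_samples / (n_classes * count_c): rarer classes get larger weights.
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}

def sample_weights(labels):
    # Per-sample weights to pass to a weighted loss or classifier.
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]

# Hypothetical skewed cohort: 2 short-survival cases vs 6 long-survival cases.
labels = ["short_survival"] * 2 + ["long_survival"] * 6
weights = balanced_class_weights(labels)
per_sample = sample_weights(labels)
```

With these weights each class sums to the same total weight (4.0 here), so a classifier cannot lower its loss simply by favoring the majority group.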
10	Shearlet-Based Descriptors and Deep Learning Approaches for Medical Image Classification. Al-Insaif, Sadiq. 07 June 2021. In this Ph.D. thesis, we develop effective techniques for medical image classification, in particular for histopathological and magnetic resonance images (MRI). Our techniques are capable of handling the high variability in the content of such images. Handcrafted techniques based on texture analysis are used for the classification task. We also use deep learning models; because training such models from scratch can be challenging, we instead employ deep features and transfer learning. First, we propose a combined texture-based feature representation computed in the complex shearlet domain for histopathological image classification. Using the complex coefficients, we examine both the magnitude and the relative phase of shearlets to form the feature space. Our proposed techniques are successful for histopathological image classification. Furthermore, we investigate their ability to generalize to MRI datasets, which present an additional challenge, namely high dimensionality: an MRI sample consists of a large number of slices, so our shearlet-based feature representation for histopathological images cannot be used without adjustment. Therefore, we consider the 3D shearlet transform, given the volumetric nature of MRI data; an advantage of the 3D shearlet transform is that it takes adjacent MRI slices into account. Second, we study the classification of histopathological images using pre-trained deep learning models. A pre-trained model can act as a starting point for datasets with a limited number of samples, so we used various models either as unsupervised feature extractors or as weight initializers to classify histopathological images. For MRI samples, fine-tuning a deep learning model is not straightforward.
Pre-trained models are trained on RGB images, which have 3 channels, whereas an MRI sample has many more slices, so fine-tuning a convolutional neural network (CNN) requires adapting the model to MRI data. We fine-tune pre-trained models and then use them as feature extractors. We then demonstrate the effectiveness of the fine-tuned deep features with classical machine learning (ML) classifiers, namely a support vector machine and a decision tree bagger. Furthermore, instead of using a classical ML classifier for the MRI sample, we built a custom CNN that takes both the 3D shearlet descriptors and the deep features as input. This custom network processes our feature representation end-to-end and then classifies an MRI sample. On a hidden MRI dataset, our custom CNN is more effective than a classical ML classifier, which suggests that it is less susceptible to overfitting. Keywords: Texture descriptors; Shearlets; MRI images; Deep learning; SVM; Classification; Decision tree; Histology.
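The abstract notes that a 3-channel pre-trained CNN must be adapted before it can accept an MRI sample with many slices, but it does not say how. One heuristic seen in transfer-learning code, shown here purely as an assumption, rebuilds only the first convolution: average the RGB filters, replicate that average across all input slices, and rescale so the layer's response to a constant input is unchanged. The function name and nested-list weight layout are hypothetical.

```python
def adapt_first_conv(weights_rgb, n_slices):
    # weights_rgb: nested lists shaped [out_channels][3][kh][kw] from an RGB-pretrained layer.
    # Returns weights shaped [out_channels][n_slices][kh][kw].
    scale = 3.0 / n_slices  # keeps the summed response to a constant input unchanged
    adapted = []
    for filt in weights_rgb:
        kh, kw = len(filt[0]), len(filt[0][0])
        # Average the three colour filters of this output channel.
        mean = [[sum(filt[c][i][j] for c in range(3)) / 3.0 for j in range(kw)]
                for i in range(kh)]
        # Replicate the scaled average once per MRI slice.
        adapted.append([[[v * scale for v in row] for row in mean]
                        for _ in range(n_slices)])
    return adapted

# Hypothetical 1x1 filter with one output channel, adapted to 6 slices.
adapted = adapt_first_conv([[[[1.0]], [[2.0]], [[3.0]]]], 6)
```

The remaining layers keep their pre-trained weights, so fine-tuning starts from features that already respond sensibly to the new input.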
