271

A Client-Server Solution for Detecting Guns in School Environment using Deep Learning Techniques

Olsson, Johan January 2019
With the progress of deep learning methods over the last couple of years, object detection related tasks are improving rapidly. Using object detection for detecting guns in schools removes the need for constant human supervision and hopefully reduces police response time. This paper investigates how a gun detection system can be built by reading frames locally and using a server for detection. The detector is based on a pre-trained SSD model and through transfer learning is taught to recognize guns. The detector obtained an Average Precision of 51.1%, and the server response time for a frame of size 1920 x 1080 was 480 ms, but frames could be scaled down to 240 x 135 to reach 210 ms without affecting the accuracy. A non-gun class was implemented to reduce the number of false positives, and on a set of 300 images containing 165 guns, the number of false positives dropped from 21 to 11.
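As a rough illustration of the client side this abstract describes (not the author's actual code), the sketch below reads frames locally, downscales them to 240 x 135 as the thesis reports, and posts them to a detection server; the endpoint URL and response format are assumptions.

```python
import cv2
import requests

# Hypothetical detection endpoint; the thesis does not specify the actual API.
SERVER_URL = "http://localhost:8000/detect"
TARGET_SIZE = (240, 135)  # downscaled resolution reported to cut latency to ~210 ms

def send_frame(frame):
    """Downscale a captured frame and POST it to the detection server as JPEG."""
    small = cv2.resize(frame, TARGET_SIZE)
    ok, buf = cv2.imencode(".jpg", small)
    if not ok:
        return None
    resp = requests.post(SERVER_URL, data=buf.tobytes(),
                         headers={"Content-Type": "image/jpeg"})
    return resp.json()  # assumed to contain detected boxes and scores

cap = cv2.VideoCapture(0)  # read frames locally, e.g. from a surveillance camera
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    print(send_frame(frame))
```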
273

Transfer learning between domains : Evaluating the usefulness of transfer learning between object classification and audio classification

Frenger, Tobias, Häggmark, Johan January 2020
Convolutional neural networks have been successfully applied to both object classification and audio classification. The aim of this thesis is to evaluate how well transfer learning of convolutional neural networks, trained in the object classification domain on large datasets (such as CIFAR-10 and ImageNet), can be applied to the audio classification domain when only a small dataset is available. In this work, four different convolutional neural networks are tested with three configurations of transfer learning against a configuration without transfer learning. This allows for testing how transfer learning and the architectural complexity of the networks affect performance. Two models developed by Google (Inception-V3, Inception-ResNet-V2) are used; they are implemented through the Keras API and pre-trained on the ImageNet dataset. This thesis also introduces two new architectures developed by its authors, Mini-Inception and Mini-Inception-ResNet, which are inspired by Inception-V3 and Inception-ResNet-V2 but have significantly lower complexity. The audio classification dataset consists of audio from RC boats transformed into mel-spectrogram images. To make transfer learning possible, Mini-Inception and Mini-Inception-ResNet are pre-trained on the CIFAR-10 dataset. The results show that transfer learning is not able to increase performance; however, it does in some cases enable models to obtain higher performance in the earlier stages of training.
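A minimal Keras sketch of the kind of transfer-learning configuration the thesis describes: Inception-V3 loaded with ImageNet weights via the Keras API and given a new classification head for mel-spectrogram images. The class count and the choice to freeze the base are assumptions, not the authors' exact setup.

```python
import tensorflow as tf

NUM_CLASSES = 3  # placeholder; the RC-boat class count is not given in the abstract

# Load Inception-V3 with ImageNet weights, dropping the original classifier head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # one possible configuration: freeze the pre-trained layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(spectrogram_images, labels, epochs=10)  # mel-spectrograms as RGB images
```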
274

Can Wizards be Polyglots: Towards a Multilingual Knowledge-grounded Dialogue System

Liu, Evelyn Kai Yan January 2022
Research on open-domain, knowledge-grounded dialogue systems has been advancing rapidly due to the paradigm shift introduced by large language models (LLMs). While these strides have improved the performance of dialogue systems, the scope is mostly monolingual and English-centric. The lack of multilingual in-task dialogue data further discourages research in this direction. This thesis explores the use of transfer learning techniques to extend English-centric dialogue systems to multiple languages. In particular, this work focuses on five typologically diverse languages, such that well-performing models could generalize to languages in the same language families as the target languages, hence widening the accessibility of the systems to speakers of various languages. I propose two approaches: the Multilingual Retrieval-Augmented Dialogue Model (xRAD) and the Multilingual Generative Dialogue Model (xGenD). xRAD is adapted from a pre-trained multilingual question answering (QA) system and comprises a neural retriever and a multilingual generation model. Prior to response generation, the retriever fetches relevant knowledge and passes it to the generator as part of the dialogue context. This approach can incorporate knowledge into conversational agents, thus improving the factual accuracy of a dialogue model. In addition, xRAD has advantages over xGenD because of its modularity, which allows the fusion of QA and dialogue systems so long as appropriate pre-trained models are employed. On the other hand, xGenD takes advantage of an existing English dialogue model and performs a zero-shot cross-lingual transfer by training sequentially on English dialogue and multilingual QA datasets. Both automated and human evaluation were carried out to measure the models' performance against a machine translation baseline. The results showed that xRAD outperformed xGenD significantly and surpassed the baseline in most metrics, particularly in terms of relevance and engagingness. Whilst xRAD's performance was promising to some extent, a detailed analysis revealed that the generated responses were not actually grounded in the retrieved paragraphs. Suggestions were offered to mitigate the issue, which hopefully could lead to significant progress in multilingual knowledge-grounded dialogue systems in the future.
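For orientation only, a toy retrieve-then-generate loop in the spirit of xRAD: a multilingual retriever fetches the best-matching passage, and the generator conditions on it as part of the dialogue context. The model names and prompt format are illustrative stand-ins, not the thesis's actual components, and an untuned mT5 will not produce sensible responses without fine-tuning; this only shows the data flow.

```python
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Toy knowledge store; a real system would index Wikipedia-scale passages.
passages = ["The Eiffel Tower is in Paris.", "Mount Fuji is Japan's highest peak."]

retriever = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
passage_emb = retriever.encode(passages, convert_to_tensor=True)

tok = AutoTokenizer.from_pretrained("google/mt5-small")
gen = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

def respond(dialogue_context: str) -> str:
    # Fetch the most relevant passage and prepend it to the dialogue context.
    query_emb = retriever.encode(dialogue_context, convert_to_tensor=True)
    best = util.cos_sim(query_emb, passage_emb).argmax().item()
    prompt = f"knowledge: {passages[best]} dialogue: {dialogue_context}"
    inputs = tok(prompt, return_tensors="pt")
    out = gen.generate(**inputs, max_new_tokens=40)
    return tok.decode(out[0], skip_special_tokens=True)

print(respond("Where is the Eiffel Tower?"))
```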
275

Brain Tumor Grade Classification in MR images using Deep Learning

Chatzitheodoridou, Eleftheria January 2022
Brain tumors represent a diverse spectrum of cancer types which can induce grave complications and lead to poor life expectancy. Amongst the various brain tumor types, gliomas are primary brain tumors that make up about 30% of adult brain tumors. They are graded according to the World Health Organization into Grades 1 to 4 (G1-G4), where G4 is the highest grade, with the highest malignancy and poorest prognosis. Early diagnosis and classification of brain tumor grade are very important, as they can improve the treatment procedure and potentially prolong a patient's life; life expectancy largely depends on the level of malignancy and the tumor's histological characteristics. While clinicians have diagnostic tools they use as a gold standard, such as biopsies, these are either invasive or costly. A widely used non-invasive technique is magnetic resonance imaging, owing to its ability to produce images with different soft-tissue contrast and high spatial resolution thanks to multiple imaging sequences. However, the examination of such images can be overwhelming for radiologists due to the overall large amount of data. Deep learning approaches, on the other hand, have shown great potential in brain tumor diagnosis and can assist radiologists in the decision-making process. In this thesis, brain tumor grade classification in MR images is performed using deep learning. Two popular pre-trained CNN models (VGG-19, ResNet50) were employed, using single MR modalities and combinations of them, to classify gliomas into three grades. All models were trained using data augmentation on 2D images from the TCGA dataset, which consisted of 3D volumes from 142 anonymized patients. The models were evaluated based on accuracy, precision, recall, F1-score and AUC score, and the Wilcoxon Signed-Rank test was used to establish whether one classifier was statistically significantly better than the other. Since deep learning models are typically 'black box' models that can be difficult for non-experts to interpret, Gradient-weighted Class Activation Mapping (Grad-CAM) was used to address model explainability. For single modalities, VGG-19 displayed the highest performance with a test accuracy of 77.86%, whilst for combinations of two and three modalities, T1ce + FLAIR and T2 + T1ce + FLAIR were the best performing for VGG-19, with test accuracies of 74.48% and 75.78%, respectively. Statistical comparisons indicated that for single MR modalities and combinations of two MR modalities there was not a statistically significant difference between the two classifiers, whilst for the combination of three modalities one model was better than the other. However, given the small size of the test population, these comparisons have low statistical power. The use of Grad-CAM for model explainability indicated that ResNet50 was able to localize the tumor region better than VGG-19.
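A compact sketch of the Grad-CAM computation used for explainability here, written against a generic Keras classifier; the example layer name and preprocessing are assumptions to check against the actual model.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Grad-CAM heatmap for one preprocessed image of shape (H, W, C)."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])  # explain the predicted grade
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)
    # Weight each feature map by the mean gradient of the class score w.r.t. it.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

# Example with ResNet50, whose last convolutional block output is named
# "conv5_block3_out" in Keras (verify against model.summary()):
# model = tf.keras.applications.ResNet50(weights="imagenet")
# heatmap = grad_cam(model, preprocessed_slice, "conv5_block3_out")
```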
276

A study of transfer learning on data-driven motion synthesis frameworks

Chen, Nuo January 2022
Various research has shown the potential and robustness of deep learning-based approaches to synthesise novel motions for 3D characters in virtual environments such as video games and films. The models are trained with motion data that is bound to the respective character skeleton (rig). This limits the scalability and applicability of the models, since they can only learn motions from one particular rig (domain) and produce motions in that domain only. Transfer learning techniques can be used to overcome this issue and allow the models to better adapt to other domains with limited data. This work presents a study of three transfer learning techniques for the proposed Objective-driven motion generation model (OMG), a model for procedurally generating animations conditioned on positional and rotational objectives. Three transfer learning approaches for achieving rig-agnostic encoding (RAE) are proposed and experimented with: feature encoding (FE), feature clustering (FC) and feature selection (FS), to improve the learning of the model on new domains with limited data. All three approaches demonstrate significant improvement in both the performance and the visual quality of the generated animations compared to the vanilla performance. The empirical results indicate that the FE and FC approaches yield better transfer quality than the FS approach. It is inconclusive which of the two performs better, but the FE approach is more computationally efficient, which makes it the more favourable choice for real-time applications.
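The abstract does not define the three RAE approaches in detail; purely to illustrate the underlying transfer-learning pattern (reusing a representation learned on a data-rich source rig for a data-poor target rig), here is a hypothetical Keras sketch. The dimensions, adapter design and training setup are invented for illustration and do not reflect the actual OMG architecture.

```python
import tensorflow as tf

# Hypothetical dimensions: 60 joint features for the source rig, 45 for the target.
SRC_FEATURES, TGT_FEATURES, LATENT = 60, 45, 32

# Stand-in for a motion model pre-trained on the source rig.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(SRC_FEATURES,)),
    tf.keras.layers.Dense(LATENT, activation="relu"),
])
# ... assume encoder weights were learned on plentiful source-rig motion data ...
encoder.trainable = False  # reuse the learned motion representation as-is

# A small trainable adapter maps target-rig features into the source feature
# space, loosely in the spirit of a rig-agnostic encoding.
adapter = tf.keras.Sequential([
    tf.keras.layers.Dense(SRC_FEATURES, activation="relu",
                          input_shape=(TGT_FEATURES,)),
])
decoder = tf.keras.layers.Dense(TGT_FEATURES)  # predicts target-rig poses

model = tf.keras.Sequential([adapter, encoder, decoder])
model.compile(optimizer="adam", loss="mse")
# model.fit(target_rig_frames, target_rig_next_frames, epochs=50)  # limited data
```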
277

Transfer Learning for Soil Spectroscopy Based on Convolutional Neural Networks and Its Application in Soil Clay Content Mapping Using Hyperspectral Imagery

Liu, Lanfa, Ji, Min, Buchroithner, Manfred 14 December 2018
Soil spectra are often measured in the laboratory, and an increasing number of large-scale soil spectral libraries are being established across the world. However, calibration models developed from soil libraries are difficult to apply to spectral data acquired from the field or from space. Transfer learning has the potential to bridge this gap and make a calibration model transferable from one sensor to another. The objective of this study is to explore the potential of transfer learning for soil spectroscopy and its performance on soil clay content estimation using hyperspectral data. First, a one-dimensional convolutional neural network (1D-CNN) is used on Land Use/Land Cover Area Frame Survey (LUCAS) mineral soils. To evaluate whether the pre-trained 1D-CNN model was transferable, LUCAS organic soils were used to fine-tune and validate the model. The fine-tuned model achieved a good accuracy (coefficient of determination (R²) = 0.756, root-mean-square error (RMSE) = 7.07 and ratio of percent deviation (RPD) = 2.26) for the estimation of clay content. A spectral index, suggested as a simple transferable feature, was also explored on the LUCAS data, but did not perform well in estimating clay content. The pre-trained 1D-CNN model was then further fine-tuned with field samples collected in the study area, with spectra extracted from HyMap imagery, achieving an accuracy of R² = 0.601, RMSE = 8.62 and RPD = 1.54. Finally, the soil clay map was generated with the fine-tuned 1D-CNN model and hyperspectral data.
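A hedged sketch of the pre-train-then-fine-tune workflow the study describes, using a small 1D-CNN regressor in Keras; the architecture, band count and frozen-layer choice are assumptions rather than the paper's exact configuration.

```python
import tensorflow as tf

N_BANDS = 4200  # laboratory spectra band count assumed for illustration

# A simple 1D-CNN regressor for clay content, sketching the kind of model used.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 9, activation="relu", input_shape=(N_BANDS, 1)),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(64, 9, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # clay content (%)
])
model.compile(optimizer="adam", loss="mse")

# Pre-train on the large mineral-soil library ...
# model.fit(mineral_spectra, mineral_clay, epochs=100)

# ... then fine-tune on the smaller organic-soil set, freezing the conv layers.
for layer in model.layers[:4]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
# model.fit(organic_spectra, organic_clay, epochs=30)
```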
278

Zero-shot, One Kill: BERT for Neural Information Retrieval

Efes, Stergios January 2021
[Background]: The advent of bidirectional encoder representations from transformers (BERT) language models (Devlin et al., 2018) and MS MARCO, a large-scale human-annotated dataset for machine reading comprehension (Bajaj et al., 2016) that was made publicly available, led the field of information retrieval (IR) to experience a revolution (Lin et al., 2020). The BERT-based retrieval model of Nogueira and Cho (2019) had, by the time they published their paper, become the top entry in the MS MARCO passage re-ranking leaderboard, surpassing the previous state of the art by 27% in MRR@10. However, training such neural IR models for domains other than MS MARCO is still hard, because neural approaches often require a vast amount of training data to perform effectively, which is not always available. To address the shortage of labelled data, a new line of research emerged: training neural models with weak supervision. In weak supervision, given an unlabelled dataset, labels are generated automatically using an existing model, and a machine learning model is then trained on the artificial "weak" data. In the case of weak supervision for IR, the training dataset comes in the form of (query, passage) tuples. Dehghani et al. (2017) used the AOL query logs (Pass et al., 2006), a set of millions of real web queries, and BM25 to retrieve the relevant passages for each user query. A drawback of this approach is that it is hard to obtain query logs for every single domain. [Objective]: This thesis proposes an intuitive approach for addressing the shortage of data in domains with limited or no data at all, through transfer learning in the context of IR. We leverage Wikipedia's structure to create a Wikipedia-based generic IR training dataset for zero-shot neural models. [Method]: We create "pseudo-queries" by concatenating the titles of Wikipedia articles with each of their section titles, and we take the associated section's passage as the relevant passage for the pseudo-query. All of our experiments are evaluated on a standard collection: MS MARCO, a large-scale web collection. For our zero-shot experiments, our proposed model, called "Wiki", is a BERT model trained on the artificial Wikipedia-based dataset, and the baseline is a default BERT model without any additional training. In our second line of experiments, we explore the benefits gained by pre-fine-tuning on the Wikipedia-based IR dataset and further fine-tuning on in-domain data. Our proposed model, "Wiki+Ma", is a BERT model pre-fine-tuned on the Wikipedia-based dataset and further fine-tuned on MS MARCO, while the baseline is a BERT model fine-tuned only on MS MARCO. [Results]: Our first experiments show that the "Wiki" model, trained on the Wikipedia-based IR dataset, achieves 0.197 in MRR@10, about 10 points higher than a BERT model with default weights; in addition, results on the development set indicate that the "Wiki" model performs better than a BERT model trained on in-domain data when the data comprises between 10k and 50k instances. Our second line of experiments shows that pre-fine-tuning on the Wikipedia-based IR dataset benefits later fine-tuning steps on in-domain data in terms of stability.
[Conclusion]: Our findings suggest that transfer learning for IR tasks by leveraging the generic knowledge incorporated in Wikipedia is possible, though more experimentation is needed to understand its limitations in comparison with traditional approaches such as BM25.
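The pseudo-query construction lends itself to a short sketch; a minimal version, assuming a simple dict representation of parsed Wikipedia articles (the field names are hypothetical):

```python
def build_pseudo_queries(articles):
    """articles: iterable of dicts like
    {"title": str, "sections": [{"heading": str, "text": str}, ...]}.
    Returns (pseudo-query, relevant passage) pairs for weak supervision."""
    pairs = []
    for article in articles:
        for section in article["sections"]:
            # Article title + section title form the pseudo-query; the
            # section's passage is treated as its relevant passage.
            query = f"{article['title']} {section['heading']}"
            pairs.append((query, section["text"]))
    return pairs

wiki = [{"title": "Photosynthesis",
         "sections": [{"heading": "Light-dependent reactions",
                       "text": "In the light-dependent reactions, ..."}]}]
print(build_pseudo_queries(wiki))
# [('Photosynthesis Light-dependent reactions', 'In the light-dependent ...')]
```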
279

Deep Learning Models for Human Activity Recognition

Albert Florea, George, Weilid, Filip January 2019
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus provides researchers with remote-controlled meetings and natural meetings in an office environment: a meeting scenario in a four-person office room. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB colored images and the audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features, both during model training and when predicting the behavior of an activity, increases validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the extracted data from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. There are different types of neural network architectures; those investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN (Recurrent Neural Network). ImageNet weights were used to initialize the weights of the neural network base models. The ImageNet weights are provided by the Keras API and are optimized for each base model [2]. The base models use ImageNet weights when extracting features from the input data. Feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning approach using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
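A brief Keras sketch of the feature-extraction pattern described: a base model initialized with ImageNet weights producing per-frame features, which could then feed an LSTM for spatio-temporal sequence prediction. The specific base model, input size, sequence length and class count are assumptions.

```python
import tensorflow as tf

# ResNet50 as a frozen ImageNet feature extractor for batches of video frames.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

def extract_features(frame_batch):
    """frame_batch: float tensor of shape (batch, 224, 224, 3), RGB in 0-255."""
    x = tf.keras.applications.resnet50.preprocess_input(frame_batch)
    return base(x)  # (batch, 2048) feature vectors

# The per-frame features can then feed an LSTM over a sequence of frames
# (sequence length and activity class count assumed):
seq_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(10, 2048)),
    tf.keras.layers.Dense(4, activation="softmax"),
])
```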
280

Alzheimer prediction from connected speech extracts : assessment of generalisation to new data

Chafouleas, Geneviève 09 1900
Co-supervision: Simona Brambati / Many advances have been made in the early diagnosis of Alzheimer's Disease (AD) using connected speech elicited from a picture description task. The use of hand-built linguistic and acoustic features as well as deep learning approaches have shown promising results in the classification of AD patients. In this research, we compare both approaches on the Cookie Theft scene from the Boston Aphasia Examination, with models trained on features derived from the text and audio extracts as well as a deep learning approach using BERT. We train our models on the newer ADReSS challenge dataset and evaluate on the CCNA dataset, and vice versa, in order to assess the generalisation of the trained models to unseen examples from a different dataset. A thorough evaluation of the interpretability of the models is performed to see how well each model learns the representations related to the disease. We observe that the models do not perform well when evaluated on a different dataset from the same domain. The selected and learned representations of the models trained on either dataset are very different, which may explain the low performance in the evaluation step. While we demonstrate the importance of linguistic features in the classification of AD vs non-AD, we find the best overall model is BERT, which achieves a test accuracy of 62.6% on the ADReSS challenge dataset and 66.7% on the CCNA dataset.
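As a hedged sketch of the BERT approach (fine-tuning a sequence classifier on picture-description transcripts) using the Hugging Face transformers library; the checkpoint, label encoding and hyperparameters are assumptions, and the transcripts shown are toy examples, not data from ADReSS or CCNA.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Binary AD vs control classification from picture-description transcripts.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

transcripts = ["well the boy is taking cookies from the jar ...",
               "the water is overflowing in the sink ..."]  # toy examples
labels = tf.constant([1, 0])  # 1 = AD, 0 = control (assumed encoding)

enc = dict(tok(transcripts, truncation=True, padding=True, return_tensors="tf"))
# With no loss argument, the model's built-in classification loss is used.
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5))
model.fit(enc, labels, epochs=3, batch_size=2)
```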
