11 |
Vyhledávání fotografií podle obsahu / Content Based Photo Search. Dvořák, Pavel. January 2014 (has links)
This thesis covers the design and practical realization of a tool for quick similarity-based search in large image databases containing from tens to hundreds of thousands of photos. The proposed technique uses various methods of descriptor extraction, the creation of Bag of Words dictionaries, and methods of storing image data in a PostgreSQL database. Further, experiments with the implemented software were carried out to evaluate the search-time efficiency and the scaling possibilities of the proposed solution.
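To make the pipeline concrete, here is a minimal sketch of the kind of system the abstract describes: local descriptors are extracted, clustered into a Bag of Words dictionary, and each image is reduced to a visual-word histogram that can be compared for similarity. The choice of ORB descriptors, OpenCV and scikit-learn, and all parameter values are illustrative assumptions, not the thesis implementation (which additionally stores the data in PostgreSQL).

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def extract_descriptors(image_paths, n_features=500):
    """Collect local ORB descriptors from each image (ORB is one possible descriptor choice)."""
    orb = cv2.ORB_create(nfeatures=n_features)
    descriptor_sets = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(img, None)
        if desc is not None:
            descriptor_sets.append(desc)
    return descriptor_sets

def build_vocabulary(descriptor_sets, vocab_size=1000):
    """Cluster all descriptors into a dictionary of visual words."""
    stacked = np.vstack(descriptor_sets).astype(np.float32)
    return MiniBatchKMeans(n_clusters=vocab_size, random_state=0).fit(stacked)

def bow_histogram(descriptors, vocabulary):
    """Quantize one image's descriptors into a normalized Bag of Words histogram."""
    words = vocabulary.predict(descriptors.astype(np.float32))
    hist, _ = np.histogram(words, bins=np.arange(vocabulary.n_clusters + 1))
    return hist / max(hist.sum(), 1)
```

Search then reduces to comparing the query histogram with the stored histograms, for example by cosine distance.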
|
12 |
Authorship classification using the Vector Space Model and kernel methods. Westin, Emil. January 2020 (has links)
Authorship identification is the field of classifying a given text by its author, based on the assumption that authors exhibit unique writing styles. This thesis investigates the semantic shortcomings of the vector space model by constructing a semantic kernel derived from WordNet, which is evaluated on the problem of authorship attribution. A multiclass SVM classifier is constructed using the one-versus-all strategy and evaluated in terms of precision, recall, accuracy and F1 scores. Results show that using the semantic scores from WordNet degrades performance compared to a linear kernel. Experiments are run to identify the best feature-engineering configurations, showing that removing stopwords has a positive effect on the Reuters financial dataset, while the Kaggle dataset, consisting of short extracts of horror stories, benefits from keeping the stopwords.
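The idea under test, augmenting the vector space model with WordNet-derived term similarities and using the result as an SVM kernel, can be sketched as follows. The Wu-Palmer measure, the first-sense simplification, the dense matrices, and the one-vs-rest wrapper are assumptions made for brevity; the thesis's exact kernel construction may differ.

```python
import numpy as np
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def term_proximity_matrix(terms):
    """P[i, j] = WordNet Wu-Palmer similarity between the first senses of terms i and j."""
    n = len(terms)
    P = np.eye(n)
    senses = [wn.synsets(t)[:1] for t in terms]
    for i in range(n):
        for j in range(i + 1, n):
            if senses[i] and senses[j]:
                sim = senses[i][0].wup_similarity(senses[j][0]) or 0.0
                P[i, j] = P[j, i] = sim
    return P

def semantic_kernel(docs, vectorizer, P):
    """Gram matrix K = X P P^T X^T: documents compared after semantic smoothing."""
    X = vectorizer.transform(docs).toarray()
    return X @ P @ P.T @ X.T

# Possible wiring (train_docs / train_authors are hypothetical placeholders):
# vec = CountVectorizer().fit(train_docs)
# P = term_proximity_matrix(vec.get_feature_names_out())
# clf = OneVsRestClassifier(SVC(kernel="precomputed"))
# clf.fit(semantic_kernel(train_docs, vec, P), train_authors)
```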
|
13 |
Hierarchical Latent Networks for Image and Language Correlation. Frey, Nathan J. January 2011 (has links)
No description available.
|
14 |
LSTM vs Random Forest for Binary Classification of Insurance Related Text / LSTM vs Random Forest för binär klassificering av försäkringsrelaterad text. Kindbom, Hannes. January 2019 (has links)
The field of natural language processing has received increased attention lately, but less focus has been put on comparing models that differ in complexity. This thesis compares Random Forest to LSTM for the task of classifying a message as a question or a non-question. The comparison was done by training and optimizing the models on historic chat data from the Swedish insurance company Hedvig. Different types of word embeddings were also tested, such as Word2vec and Bag of Words. The results demonstrated that LSTM achieved slightly higher scores than Random Forest in terms of F1 and accuracy. The models' performance was not significantly improved after optimization and was also dependent on which corpus the models were trained on. An investigation of how a chatbot would affect Hedvig's adoption rate was also conducted, mainly by reviewing previous studies of chatbots' effects on user experience. The potential effects on the innovation's five attributes (relative advantage, compatibility, complexity, trialability and observability) were analyzed to answer the problem statement. The results showed that the adoption rate of Hedvig could be positively affected by improving the first two attributes. The effects a chatbot would have on complexity, trialability and observability were, however, suggested to be negligible, if not negative.
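A rough sketch of the two models being compared, a Bag of Words Random Forest and a small LSTM, assuming scikit-learn and TensorFlow/Keras; the feature limits, layer sizes, and training settings are illustrative rather than the thesis configuration.

```python
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

def random_forest_baseline(train_texts, train_labels):
    """Bag of Words features fed to a Random Forest."""
    vectorizer = CountVectorizer(max_features=5000)
    X = vectorizer.fit_transform(train_texts)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, train_labels)
    return vectorizer, model

def lstm_classifier(vocab_size=5000):
    """A small LSTM for binary question / non-question classification."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 64),  # learned embeddings; Word2vec vectors could be loaded instead
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Messages would be tokenized into padded integer sequences before training the LSTM, and both models would then be scored with F1 and accuracy as in the thesis.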
|
15 |
Исследование моделей генерации аннотаций для художественных произведений : магистерская диссертация / Research on Annotation Generation Models for Fiction. Драгомиров, Д. С., Dragomirov, D. S. January 2024 (has links)
In today's world, text processing and artificial intelligence are actively used to automate various processes, including the creation of annotations for works of fiction. Automatic annotation generation helps readers quickly grasp the content of a book and decide whether to read it. This dissertation investigates various models for generating annotations, such as Bag-of-Words (BoW), TF-IDF, Latent Dirichlet Allocation (LDA), Recurrent Neural Networks (RNNs), BERT (Bidirectional Encoder Representations from Transformers), T5, and PEGASUS. The effectiveness of these models is evaluated using the BLEU, ROUGE, METEOR, F1, and CIDEr scores. A dataset of books in .docx format is used to test the models. The results of the study identify the most effective methods for automatic annotation generation and suggest directions for further improvement of these models.
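Of the steps described, the evaluation is the simplest to illustrate. The snippet below scores a generated annotation against a reference with BLEU and ROUGE-L, assuming the nltk and rouge-score packages are available; METEOR, F1, and CIDEr, which the dissertation also reports, are omitted here.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def score_annotation(reference, generated):
    """Return BLEU and ROUGE-L F-measure for one generated annotation."""
    bleu = sentence_bleu(
        [reference.split()], generated.split(),
        smoothing_function=SmoothingFunction().method1,
    )
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure
    return {"bleu": bleu, "rougeL": rouge_l}

# Hypothetical example strings, not taken from the dissertation's dataset:
print(score_annotation(
    "A detective investigates a series of murders in a small town.",
    "A detective investigates murders in a small town.",
))
```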
|
16 |
Segmentation d'objets déformables en imagerie ultrasonore / Deformable object segmentation in ultra-sound images. Massich, Joan. 04 December 2013 (has links)
Breast cancer is the second most common type of cancer and the leading cause of cancer death among women in both Western and economically developing countries. Medical imaging is key for early detection, diagnosis and treatment follow-up. Although Digital Mammography (DM) remains the reference imaging modality, Ultra-Sound (US) imaging has proven to be a successful adjunct modality for breast cancer screening, especially because of the discriminative capabilities US offers for differentiating between solid lesions that are benign or malignant. Despite its usability, US suffers from its inherent noise, which compromises the diagnostic capabilities of radiologists; hence the research interest in providing radiologists with Computer Aided Diagnosis (CAD) tools to assist them during decision making. This thesis analyzes the current strategies for segmenting breast lesions in US data in order to infer meaningful information to be fed to CAD, and proposes a fully automatic methodology for generating accurate segmentations of breast lesions in US data with low false positive rates. The proposed method treats segmentation as the minimization of a multi-label probabilistic framework, using a Max-Flow/Min-Cut minimization algorithm to assign the most appropriate label, from a set of labels representing tissue types, to every pixel of the image. The image is divided into adjacent regions so that all pixels within a region receive the same label at the end of the process; stochastic models for the labeling are built from a training dataset.
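A minimal two-label (lesion versus background) sketch of the Max-Flow/Min-Cut labeling step described above, written with the PyMaxflow package. The real system assigns one of several tissue-type labels to adjacent regions using learned stochastic models, so this per-pixel binary version is only an illustration under simplifying assumptions.

```python
import numpy as np
import maxflow  # PyMaxflow package

def binary_graphcut(prob_lesion, smoothness=2.0):
    """Label pixels lesion/background by minimizing unary plus pairwise costs with max-flow.

    prob_lesion: 2-D array of per-pixel lesion probabilities, standing in for the
    learned stochastic models mentioned in the abstract.
    """
    p = np.clip(prob_lesion, 1e-6, 1.0 - 1e-6)
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(p.shape)
    g.add_grid_edges(nodes, smoothness)                      # pairwise smoothness between neighbors
    g.add_grid_tedges(nodes, -np.log(p), -np.log(1.0 - p))   # unary terms (negative log-likelihoods)
    g.maxflow()
    return g.get_grid_segments(nodes)                        # boolean mask: one side of the minimum cut
```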
|
17 |
Unsupervised Entity Classification with Wikipedia and WordNet / Klasifikace entit pomocí Wikipedie a WordNetu. Kliegr, Tomáš. January 2007 (has links)
This dissertation addresses the problem of classifying entities represented in text by noun phrases. The goal of this thesis is to develop a method for the automated classification of entities appearing in datasets consisting of short textual fragments. The emphasis is on unsupervised and semi-supervised methods that allow for fine-grained assigned classes and require no labeled instances for training. The set of target classes is either user-defined or determined automatically. Our initial attempt to address the entity classification problem is the Semantic Concept Mapping (SCM) algorithm. SCM maps the noun phrases representing the entities, as well as the target classes, to WordNet; graph-based WordNet similarity measures are used to assign the closest class to the noun phrase. If a noun phrase does not match any WordNet concept, a Targeted Hypernym Discovery (THD) algorithm is executed. The THD algorithm extracts a hypernym from a Wikipedia article defining the noun phrase using lexico-syntactic patterns. This hypernym is then used to map the noun phrase to a WordNet synset, but it can also be regarded as the classification result by itself, yielding an unsupervised classification system. SCM and THD were designed for English. While adapting these algorithms to other languages is conceivable, we decided to develop the Bag of Articles (BOA) algorithm, which is language-agnostic as it is based on the statistical Rocchio classifier. Since this algorithm utilizes Wikipedia as a source of data for classification, it does not require any labeled training instances. WordNet is used in a novel way to compute term weights; it is also used as a positive term list and for lemmatization. A disambiguation algorithm utilizing global context is also proposed. We consider the BOA algorithm to be the main contribution of this dissertation. Experimental evaluation of the proposed algorithms is performed on the WordSim353 dataset, which is used for evaluation in the Word Similarity Computation (WSC) task, and on the Czech Traveler dataset, the latter designed specifically for the purpose of our research. BOA achieves a Spearman correlation of 0.72 with human judgment on WordSim353, which is close to the 0.75 correlation of the ESA algorithm, to the author's knowledge the best-performing algorithm on this gold-standard dataset that does not require training data. The advantage of BOA over ESA is that it places smaller requirements on the preprocessing of the Wikipedia data. While SCM underperforms on the WordSim353 dataset, it outperforms BOA on the Czech Traveler dataset, which was designed specifically for our entity classification problem; this discrepancy requires further investigation. In a standalone evaluation of THD on the Czech Traveler dataset, the algorithm returned a correct hypernym for 62% of the entities.
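The core step of SCM, mapping a noun phrase and the candidate classes to WordNet and choosing the class with the highest graph-based similarity, can be sketched as follows with NLTK's WordNet interface. The Wu-Palmer measure and the first-sense shortcut are assumptions for brevity; the actual algorithms, including THD's hypernym extraction from Wikipedia and the BOA classifier, are considerably richer.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def classify_entity(noun_phrase, target_classes):
    """Assign the WordNet-closest target class to a noun phrase, or None if it cannot be mapped."""
    entity_synsets = wn.synsets(noun_phrase.replace(" ", "_"))
    if not entity_synsets:
        return None  # in the dissertation, Targeted Hypernym Discovery takes over here
    best_class, best_score = None, 0.0
    for cls in target_classes:
        for class_synset in wn.synsets(cls):
            score = entity_synsets[0].wup_similarity(class_synset) or 0.0
            if score > best_score:
                best_class, best_score = cls, score
    return best_class

print(classify_entity("cathedral", ["building", "person", "animal"]))
```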
|
18 |
RELOCALIZATION AND LOOP CLOSING IN VISION SIMULTANEOUS LOCALIZATION AND MAPPING (VSLAM) OF A MOBILE ROBOT USING ORB METHOD. Venkatanaga Amrusha Aryasomyajula (8728027). 24 April 2020
It is essential for a mobile robot during autonomous navigation to be able to detect revisited places, or loop closures, while performing Vision Simultaneous Localization And Mapping (VSLAM). Loop closing has been identified as one of the critical data association problems when building maps; it is an efficient way to eliminate errors and improve the accuracy of the robot's localization and mapping. To solve the loop closing problem, the ORB-SLAM algorithm is used: a feature-based simultaneous localization and mapping system that operates in real time. This system includes loop closing and relocalization and allows automatic initialization.

To check the performance of the algorithm, monocular, stereo and RGB-D cameras are used. The aim of this thesis is to show the accuracy of the relocalization and loop closing process using the ORB-SLAM algorithm in a variety of environmental settings. The performance of relocalization and loop closing in different challenging indoor scenarios is demonstrated by conducting various experiments. Experimental results show the applicability of the approach in real-time applications such as autonomous navigation.
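A small illustration of the ORB feature matching that underlies loop-closure and relocalization candidate verification, using OpenCV with a Hamming-distance matcher and a ratio test. This is not the ORB-SLAM implementation itself, which additionally relies on a bag-of-words place-recognition database and geometric validation; the parameters are illustrative.

```python
import cv2

def orb_match_count(frame_a, frame_b, ratio=0.75, n_features=1000):
    """Count ratio-test-surviving ORB matches between two grayscale frames."""
    orb = cv2.ORB_create(nfeatures=n_features)
    _, des_a = orb.detectAndCompute(frame_a, None)
    _, des_b = orb.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des_a, des_b, k=2)
    good = [p for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)

# A high match count between the current frame and a stored keyframe suggests a
# revisited place, i.e. a relocalization or loop-closure candidate.
```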
|
19 |
Natural language processing for research philosophies and paradigms dissertation (DFIT91). Mawila, Ntombhimuni. 28 February 2021
Research philosophies and paradigms (RPPs) reveal researchers' assumptions and provide a systematic way in which research can be carried out effectively and appropriately. Different studies highlight cognitive and comprehension challenges with RPP concepts at the postgraduate level. This study develops a natural language processing (NLP) supervised classification application that guides students in identifying the RPPs applicable to their study. Following a quantitative research approach, the study builds a corpus represented with the Bag of Words model and uses it to train Naïve Bayes, Logistic Regression, and Support Vector Machine algorithms. Computer experiments conducted to evaluate the performance of these algorithms reveal that Naïve Bayes achieves the highest accuracy and precision. In practice, user testing results show the varying impact of knowledge, performance, and effort expectancy. The findings contribute to minimizing the issues postgraduates encounter when identifying research philosophies and the underlying paradigms for their studies. / Science and Technology Education / MTech. (Information Technology)
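A minimal sketch of the classification core described above, a Bag of Words representation feeding a Naïve Bayes classifier evaluated with accuracy and precision, using scikit-learn. The class names and example texts are placeholders rather than data from the study.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder corpus: short study descriptions labeled with a research paradigm.
texts = [
    "measures variables with statistical tests on a large sample",
    "explores lived experiences through in-depth interviews",
    "tests hypotheses using structured survey data",
    "interprets participants' narratives and meanings",
]
labels = ["positivism", "interpretivism", "positivism", "interpretivism"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0, stratify=labels)
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(X_train, y_train)
predictions = model.predict(X_test)
print(accuracy_score(y_test, predictions),
      precision_score(y_test, predictions, average="macro", zero_division=0))
```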
|
20 |
Evaluating Random Forest and a Long Short-Term Memory in Classifying a Given Sentence as a Question or Non-Question. Ankaräng, Fredrik; Waldner, Fabian. January 2019 (has links)
Natural language processing and text classification are topics of much discussion among machine learning researchers, and contributions in the form of new methods and models are presented on a yearly basis. However, less focus is aimed at comparing models, especially comparing less complex models to state-of-the-art ones. This paper compares a Random Forest with a Long Short-Term Memory (LSTM) neural network on the task of classifying sentences as questions or non-questions, without considering punctuation. The models were trained and optimized on chat data from a Swedish insurance company, as well as user comments on newspaper articles. The results showed that the LSTM model performed better than the Random Forest. However, the difference was small, and Random Forest could therefore still be a preferable alternative in some use cases due to its simplicity and its ability to handle noisy data. The models' performance was not dramatically improved after hyperparameter optimization. A literature study was also conducted, aimed at exploring how customer service can be automated using a chatbot and which features and functionality should be prioritized by management during such an implementation. The findings of the study showed that a data-driven design should be used, where features are derived based on the specific needs and customers of the organization. However, three features were general enough to be presented: the personality of the bot, its trustworthiness, and the stage of the value chain in which the chatbot is implemented.
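The hyperparameter optimization mentioned above can be sketched, for the Random Forest side, as a standard grid search over a text-classification pipeline; the grid, the F1 scoring choice, and the variable names for the chat and comment data are assumptions for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([("bow", CountVectorizer()), ("rf", RandomForestClassifier(random_state=0))])
param_grid = {
    "bow__ngram_range": [(1, 1), (1, 2)],
    "rf__n_estimators": [100, 300],
    "rf__max_depth": [None, 20],
}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
# train_sentences / train_is_question are hypothetical placeholders for the chat and comment data:
# search.fit(train_sentences, train_is_question)
# print(search.best_params_, search.best_score_)
```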
|