11 |
Vyhledávání fotografií podle obsahu / Content Based Photo Search
Dvořák, Pavel. January 2014
This thesis covers the design and practical realization of a tool for quick, similarity-based search in large image databases containing from tens to hundreds of thousands of photos. The proposed technique uses various methods of descriptor extraction, the creation of Bag of Words dictionaries, and methods of storing image data in a PostgreSQL database. Experiments with the implemented software were then carried out to evaluate the search-time efficiency and the scalability of the designed solution.
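The Bag of Words step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a codebook of "visual words" already exists and uses toy 2-D vectors in place of real image descriptors. The names `quantize`, `bow_histogram` and `histogram_intersection` are hypothetical, and the thesis's actual descriptors, dictionary construction and PostgreSQL storage are not reproduced here.

```python
from math import dist

def quantize(descriptor, codebook):
    """Return the index of the codebook entry (visual word) nearest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def bow_histogram(descriptors, codebook):
    """Count how often each visual word occurs, then L1-normalize the counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[quantize(d, codebook)] += 1
    total = sum(counts) or 1  # avoid division by zero for an image without descriptors
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical toy data: 2-D "descriptors" and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_a = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (5.2, 4.9)]
image_b = [(0.0, 0.2), (1.1, 1.0), (4.8, 5.1), (5.0, 5.0)]

hist_a = bow_histogram(image_a, codebook)
hist_b = bow_histogram(image_b, codebook)
similarity = histogram_intersection(hist_a, hist_b)
```

Because every image is reduced to a fixed-length histogram, comparing two images costs the same regardless of how many descriptors each one produced, which is one reason this representation suits large collections.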
|
12 |
Authorship classification using the Vector Space Model and kernel methods
Westin, Emil. January 2020
Authorship identification is the task of classifying a given text by its author, based on the assumption that authors exhibit unique writing styles. This thesis investigates the semantic shortcomings of the vector space model by constructing a semantic kernel from WordNet, which is evaluated on the problem of authorship attribution. A multiclass SVM classifier is constructed using the one-versus-all strategy and evaluated in terms of precision, recall, accuracy and F1 scores. Results show that using the semantic scores from WordNet degrades performance compared to using a linear kernel. Experiments are run to identify the best feature engineering configurations, showing that removing stopwords has a positive effect on the financial dataset Reuters, while the Kaggle dataset, consisting of short extracts of horror stories, benefits from keeping the stopwords.
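The difference between a linear kernel and a semantic kernel can be sketched as follows. This assumes the common form k(x, y) = xᵀ S y, where S is a term-by-term similarity matrix. The vocabulary and the similarity values are invented for illustration and are not WordNet scores, and the thesis may construct its kernel differently.

```python
def linear_kernel(x, y):
    """Plain dot product: documents only match on identical terms."""
    return sum(a * b for a, b in zip(x, y))

def semantic_kernel(x, y, S):
    """x^T S y: related terms contribute through the similarity matrix S."""
    return sum(x[i] * S[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))

# Hypothetical vocabulary and similarity matrix (symmetric, ones on the diagonal).
vocab = ["car", "automobile", "ghost"]
S = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [1, 0, 0]  # mentions "car" once
doc_b = [0, 1, 0]  # mentions "automobile" once

linear = linear_kernel(doc_a, doc_b)
semantic = semantic_kernel(doc_a, doc_b, S)
```

With S set to the identity matrix the semantic kernel reduces to the linear one. For use in an SVM, S should be positive semidefinite so that the result is a valid kernel.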
|
13 |
Hierarchical Latent Networks for Image and Language Correlation
Frey, Nathan J. January 2011
No description available.
|
14 |
LSTM vs Random Forest for Binary Classification of Insurance Related Text / LSTM vs Random Forest för binär klassificering av försäkringsrelaterad text
Kindbom, Hannes. January 2019
The field of natural language processing has received increased attention lately, but less focus has been put on comparing models that differ in complexity. This bachelor's thesis compares Random Forest to LSTM for the task of classifying a message as a question or a non-question. The comparison was done by training and optimizing the models on historic chat data from the Swedish insurance company Hedvig. Different text representations, such as Word2vec and Bag of Words, were also tested. The results demonstrated that LSTM achieved slightly higher F1 and accuracy scores than Random Forest. The models' performance was not significantly improved by optimization, and it also depended on which corpus the models were trained on. An investigation of how a chatbot would affect Hedvig's adoption rate was also conducted, mainly by reviewing previous studies of chatbots' effects on user experience. The potential effects on the five attributes of an innovation (relative advantage, compatibility, complexity, trialability and observability) were analyzed to answer the problem statement. The results showed that Hedvig's adoption rate could be positively affected by improving the first two attributes. The effects a chatbot would have on complexity, trialability and observability were, however, suggested to be negligible, if not negative.
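The F1 and accuracy scores used to compare the two models can be computed as in the sketch below. The labels are invented (1 = question, 0 = non-question) and the function name `binary_metrics` is hypothetical. None of the numbers come from the thesis.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1 and accuracy for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (len(pairs) or 1)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical gold labels and predictions for eight messages.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that does well on only one of the two. That matters when questions and non-questions are unevenly represented in the data.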
|
15 |
Segmentation d'objets déformables en imagerie ultrasonore / Deformable object segmentation in ultra-sound images
Massich, Joan. 04 December 2013
Breast cancer is the most widespread type of cancer and the leading cause of death among women, in Western and developing countries alike. Medical imaging plays a key role in reducing breast cancer mortality by facilitating early detection through screening, diagnosis and guided biopsy. Although Digital Mammography (DM) remains the reference among existing examination methods, ultrasound has proven its place as a complementary modality. Ultrasound images provide information for distinguishing benign from malignant solid lesions, which DM cannot detect. Despite their clinical usefulness, ultrasound images are noisy, which compromises the diagnoses radiologists make from them. For this reason, one of the primary goals of medical imaging researchers is to improve image quality and methodologies so as to simplify and systematize the reading and interpretation of these images. The proposed method treats segmentation as the minimization of a multi-label probabilistic framework, using a Max-Flow/Min-Cut minimization algorithm to assign to every pixel of the image the appropriate label from a set of labels representing tissue types. The image is divided into adjacent regions so that all pixels of a region receive the same label at the end of the process. Stochastic models for the labelling are built from a training database. / Breast cancer is the second most common type of cancer and the leading cause of cancer death among females in both Western and economically developing countries. Medical imaging is key for early detection, diagnosis and treatment follow-up. Although Digital Mammography (DM) remains the reference imaging modality, Ultra-Sound (US) imaging has proven to be a successful adjunct modality for breast cancer screening, especially because of its ability to discriminate between benign and malignant solid lesions. Despite its usefulness, US suffers from inherent noise that compromises the diagnostic capabilities of radiologists. Hence the research interest in providing radiologists with Computer Aided Diagnosis (CAD) tools to assist them in decision making. This thesis analyzes the current strategies for segmenting breast lesions in US data in order to infer meaningful information to be fed to CAD, and proposes a fully automatic methodology for generating accurate segmentations of breast lesions in US data with low false positive rates.
|
16 |
Unsupervised Entity Classification with Wikipedia and WordNet / Klasifikace entit pomocí Wikipedie a WordNetu
Kliegr, Tomáš. January 2007
This dissertation addresses the problem of classifying entities that are represented in text by noun phrases. The goal of this thesis is to develop a method for the automated classification of entities appearing in datasets consisting of short textual fragments. The emphasis is on unsupervised and semi-supervised methods that allow the assigned classes to be fine-grained and require no labeled instances for training. The set of target classes is either user-defined or determined automatically. Our initial attempt to address the entity classification problem is the Semantic Concept Mapping (SCM) algorithm. SCM maps the noun phrases representing the entities, as well as the target classes, to WordNet. Graph-based WordNet similarity measures are used to assign the closest class to the noun phrase. If a noun phrase does not match any WordNet concept, a Targeted Hypernym Discovery (THD) algorithm is executed. The THD algorithm uses lexico-syntactic patterns to extract a hypernym from a Wikipedia article defining the noun phrase. This hypernym is then used to map the noun phrase to a WordNet synset, but it can also be regarded as the classification result in itself, resulting in an unsupervised classification system. The SCM and THD algorithms were designed for English. While adapting these algorithms to other languages is conceivable, we decided to develop the Bag of Articles (BOA) algorithm, which is language agnostic because it is based on the statistical Rocchio classifier. Since this algorithm uses Wikipedia as a source of data for classification, it does not require any labeled training instances. WordNet is used in a novel way to compute term weights. It is also used as a positive term list and for lemmatization. A disambiguation algorithm utilizing global context is also proposed. We consider the BOA algorithm to be the main contribution of this dissertation. Experimental evaluation of the proposed algorithms is performed on the WordSim353 dataset, which is used for evaluation in the Word Similarity Computation (WSC) task, and on the Czech Traveler dataset, the latter being specifically designed for the purpose of our research. On WordSim353, BOA achieves a Spearman correlation of 0.72 with human judgment, close to the 0.75 achieved by the ESA algorithm, which is, to the author's knowledge, the best-performing algorithm on this gold-standard dataset that does not require training data. The advantage of BOA over ESA is that it requires less preprocessing of the Wikipedia data. While SCM underperforms on the WordSim353 dataset, it overtakes BOA on the Czech Traveler dataset, which was designed specifically for our entity classification problem. This discrepancy requires further investigation. In a standalone evaluation of THD on the Czech Traveler dataset, the algorithm returned a correct hypernym for 62% of entities.
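The Rocchio classifier on which BOA is based can be sketched in its simplest form, as nearest-centroid classification over bag-of-words vectors with cosine similarity. The short class texts below are invented stand-ins for Wikipedia articles, and the sketch omits the WordNet-based term weighting, lemmatization and disambiguation described in the abstract. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector: lower-cased whitespace tokens with raw counts."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Average a list of bag-of-words vectors term by term."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(text, centroids):
    """Assign the class whose centroid is most similar to the text."""
    vec = bow(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Hypothetical class descriptions standing in for Wikipedia articles.
articles = {
    "accommodation": ["hotel room with bed and breakfast",
                      "hostel with shared room and bed"],
    "transport": ["train station with ticket office",
                  "bus station and train ticket"],
}
centroids = {label: centroid([bow(t) for t in texts])
             for label, texts in articles.items()}

label = classify("cheap room near the station with breakfast", centroids)
```

Because the class centroids are built from descriptive articles rather than hand-labeled examples, no annotated training set is needed, which is the property the abstract highlights.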
|
17 |
RELOCALIZATION AND LOOP CLOSING IN VISION SIMULTANEOUS LOCALIZATION AND MAPPING (VSLAM) OF A MOBILE ROBOT USING ORB METHOD
Venkatanaga Amrusha Aryasomyajula (8728027). 24 April 2020
It is essential for a mobile robot performing Vision Simultaneous Localization And Mapping (VSLAM) during autonomous navigation to be able to detect revisited places, or loop closures. Loop closing has been identified as one of the critical data association problems in map building. It is an efficient way to eliminate errors and improve the accuracy of robot localization and mapping. To solve the loop closing problem, the ORB-SLAM algorithm is used, a feature-based simultaneous localization and mapping system that operates in real time. This system includes loop closing and relocalization and allows automatic initialization.
To check the performance of the algorithm, monocular, stereo and RGB-D cameras are used. The aim of this thesis is to show the accuracy of the relocalization and loop closing process using the ORB-SLAM algorithm in a variety of environmental settings. The performance of relocalization and loop closing in different challenging indoor scenarios is demonstrated through various experiments. The experimental results show the applicability of the approach to real-time applications such as autonomous navigation.
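ORB descriptors are binary strings that are compared by Hamming distance, which can be illustrated with the sketch below. It uses 8-bit toy descriptors instead of real 256-bit ones, and the thresholds and the function `is_loop_candidate` are hypothetical. It is a simplified stand-in for feature matching between a current frame and a stored keyframe, not the place-recognition pipeline of ORB-SLAM itself.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match(query, candidates, max_distance):
    """For each query descriptor, keep its nearest candidate if it is close enough.

    Returns a list of (query_index, candidate_index, distance) tuples.
    """
    matches = []
    for qi, q in enumerate(query):
        ci, d = min(((i, hamming(q, c)) for i, c in enumerate(candidates)),
                    key=lambda pair: pair[1])
        if d <= max_distance:
            matches.append((qi, ci, d))
    return matches

def is_loop_candidate(query, candidates, max_distance=1, min_matches=2):
    """Flag a keyframe as a possible revisited place (thresholds are assumptions)."""
    return len(match(query, candidates, max_distance)) >= min_matches

# Hypothetical 8-bit descriptors for the current frame and a stored keyframe.
current_frame = [0b10110010, 0b01001101, 0b11110000]
keyframe = [0b10110011, 0b01001101, 0b00001111]

matches = match(current_frame, keyframe, max_distance=1)
revisited = is_loop_candidate(current_frame, keyframe)
```

A real system would typically verify such a candidate geometrically before accepting it as a loop closure, since descriptor matches alone can be misleading in repetitive indoor scenes.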
|
18 |
Natural language processing for research philosophies and paradigms dissertation (DFIT91)
Mawila, Ntombhimuni. 28 February 2021
Research philosophies and paradigms (RPPs) reveal researchers' assumptions and provide a systematic way in which research can be carried out effectively and appropriately. Different studies highlight the cognitive and comprehension challenges that RPP concepts pose at the postgraduate level. This study develops a natural language processing (NLP) supervised classification application that guides students in identifying the RPPs applicable to their study. By using algorithms rooted in a quantitative research approach, this study builds a corpus represented using the Bag of Words model to train the Naïve Bayes, Logistic Regression, and Support Vector Machine algorithms. Computer experiments conducted to evaluate the performance of the algorithms reveal that the Naïve Bayes algorithm presents the highest accuracy and precision levels. In practice, user testing results show the varying impact of knowledge, performance, and effort expectancy. The findings contribute to the minimization of issues postgraduates encounter in identifying research philosophies and the underlying paradigms for their studies. / Science and Technology Education / MTech. (Information Technology)
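A Naïve Bayes classifier trained on a Bag of Words corpus can be sketched as below, as a multinomial model with add-one smoothing. The four training sentences and their labels are invented for illustration and the function names are hypothetical. The study's corpus, preprocessing and reported scores are not reproduced.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns per-class counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(text, model):
    """Pick the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # tokens never seen in training are ignored
                score += log((word_counts[label][token] + 1)
                             / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training sentences, not taken from the study's corpus.
corpus = [
    ("we measure variables and test hypotheses statistically", "positivism"),
    ("objective measurement and statistical tests", "positivism"),
    ("we interpret participants lived experiences and meanings", "interpretivism"),
    ("subjective meanings emerge from interviews", "interpretivism"),
]
model = train_nb(corpus)
prediction = predict_nb("statistical tests of measured variables", model)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word from driving a class probability to zero.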
|
19 |
Evaluating Random Forest and a Long Short-Term Memory in Classifying a Given Sentence as a Question or Non-Question
Ankaräng, Fredrik; Waldner, Fabian. January 2019
Natural language processing and text classification are topics of much discussion among machine learning researchers. Contributions in the form of new methods and models are presented on a yearly basis. However, less focus is aimed at comparing models, especially at comparing less complex models with state-of-the-art models. This paper compares a Random Forest with a Long Short-Term Memory neural network for the task of classifying sentences as questions or non-questions, without considering punctuation. The models were trained and optimized on chat data from a Swedish insurance company, as well as on user comments on articles from a newspaper. The results showed that the LSTM model performed better than the Random Forest. However, the difference was small, so Random Forest could still be a preferable alternative in some use cases due to its simplicity and its ability to handle noisy data. The models' performance was not dramatically improved by hyperparameter optimization. A literature study was also conducted to explore how customer service can be automated using a chatbot and which features and functionality management should prioritize during such an implementation. The findings of the study showed that a data-driven design should be used, where features are derived from the specific needs and customers of the organization. However, three features were general enough to be presented: the personality of the bot, its trustworthiness, and the stage of the value chain in which the chatbot is implemented.
|
20 |
Stock Market Forecasting Using SVM With Price and News Analysis
Hansen, Patrik; Vojcic, Sandi. January 2020
Many machine learning approaches have been used for financial forecasting to estimate future stock trends. The focus of this project is to implement a Support Vector Machine that takes price and news analysis for companies within the technology sector as inputs, to predict whether the price of a stock will rise or fall in the coming days, and to observe how adding news to the technical analysis affects prediction accuracy. The price analysis is compiled from 9 different financial indicators used to indicate changes in price, and the news analysis uses the bag-of-words method to rate headlines as positive or negative. There is a slight indication that news improves the results: when the validation data is randomly sampled, the testing accuracy increases. When testing on the last fifth of each company's data, adding news to the calculation made only a small difference in the results, so no clear correlation can be seen. The resulting program has a mean and median testing accuracy over 50 % for almost all settings. Complications of using SVM for price forecasting in the stock market are also discussed. / Bachelor's thesis project in electrical engineering 2020, KTH, Stockholm
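How price indicators and a bag-of-words headline rating might be combined into one feature vector for a classifier such as an SVM is sketched below. The word lists, the two indicators (a simple moving average and a momentum term) and all function names are assumptions chosen for illustration. The abstract does not name the nine indicators or the vocabulary actually used.

```python
# Hypothetical sentiment word lists, invented for this sketch.
POSITIVE = {"beats", "surges", "record", "growth", "upgrade"}
NEGATIVE = {"misses", "falls", "lawsuit", "recall", "downgrade"}

def headline_score(headline):
    """Bag-of-words rating: positive word count minus negative word count."""
    tokens = headline.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def simple_moving_average(prices, window):
    """Mean of each run of `window` consecutive prices."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def feature_vector(prices, headlines, window=3):
    """Combine two price indicators with the summed headline score."""
    sma = simple_moving_average(prices, window)[-1]
    relative_to_sma = prices[-1] / sma - 1.0   # how far the last price sits from its average
    momentum = prices[-1] - prices[-window]    # price change across the window
    news = sum(headline_score(h) for h in headlines)
    return [relative_to_sma, momentum, news]

# Hypothetical closing prices and headlines.
prices = [10, 11, 12, 13]
headlines = ["Chipmaker beats estimates as revenue surges",
             "Regulator files lawsuit after product recall"]

features = feature_vector(prices, headlines)
```

When evaluating such features on time series, holding out the most recent portion of the data (as in the "last fifth" test described above) gives a more realistic estimate than random sampling, because random splits let the model see prices from after the days it is asked to predict.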
|