31

Evaluation of the correlation between test cases dependency and their semantic text similarity

Andersson, Filip January 2020 (has links)
An important step in developing software is to test the system thoroughly. Testing software requires generating test cases, which can reach large numbers, and executing them in the correct order. The information needed to schedule the test cases in the correct order is not always available, so getting the order right demands considerable manual work and valuable resources. By instead analyzing the test specifications, it may be possible to detect the functional dependencies between test cases. This study presents a natural language processing (NLP) based approach and performs cluster analysis on a set of test cases to evaluate the correlation between test case dependencies and their semantic similarities. After an initial feature selection, the similarities between test cases are calculated with the cosine distance function. The result of the similarity calculation is then clustered using the HDBSCAN clustering algorithm. The clusters represent relations between test cases: test cases with high similarity are placed in the same cluster, as they are expected to share dependencies. The clusters are then validated against a ground truth containing the correct dependencies. The result is an F-score of 0.7741. The approach in this study is applied to an industrial testing project at Bombardier Transportation in Sweden.
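A minimal sketch of this pipeline, assuming TF-IDF vectors as the feature representation (the exact feature selection is not specified here) and using the scikit-learn and hdbscan libraries; the test specifications are illustrative placeholders, not data from the thesis:

```python
import hdbscan
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

# Illustrative test specifications (placeholders, not thesis data).
test_specs = [
    "Apply power to the traction converter and verify the startup sequence",
    "Verify the startup sequence after applying power to the converter",
    "Open the passenger doors and check the door status indicator",
    "Check the door status indicator while the doors are opening",
]

# Feature selection: represent each specification as a TF-IDF vector.
vectors = TfidfVectorizer(stop_words="english").fit_transform(test_specs)

# Pairwise cosine distances between all test cases.
distances = cosine_distances(vectors)

# Cluster on the precomputed distances; test cases in the same cluster
# are expected to share dependencies (label -1 marks noise).
clusterer = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=2)
labels = clusterer.fit_predict(distances)
print(labels)
```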
32

Iris Biometric Identification Using Artificial Neural Networks

Haskett, Kevin Joseph 01 August 2018 (has links)
A biometric method is a more secure way of personal identification than passwords. This thesis examines the iris as a personal identifier, with neural networks as the classifier. A comparison of different feature extraction methods, including the Fourier transform, the discrete cosine transform, eigenanalysis, and the wavelet transform, is performed. The robustness of each method with respect to distortion and noise is also studied.
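As an illustration of one of the compared feature extractors, a hedged sketch of DCT-based features feeding a small neural network classifier, using scipy and scikit-learn; the image size, the number of retained coefficients, and the random placeholder data are all assumptions:

```python
import numpy as np
from scipy.fft import dctn
from sklearn.neural_network import MLPClassifier

def dct_features(iris_image, keep=8):
    """2-D DCT of a normalized iris image, keeping the low-frequency
    top-left keep-by-keep block as the feature vector."""
    coeffs = dctn(iris_image, norm="ortho")
    return coeffs[:keep, :keep].ravel()

# Random placeholder "iris images" for two subjects (10 samples each).
rng = np.random.default_rng(0)
X = np.array([dct_features(rng.random((64, 64))) for _ in range(20)])
y = np.repeat([0, 1], 10)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(X, y)
```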
33

Changing a user’s search experience by incorporating preferences of metadata / Ändra en användares sökupplevelse genom att inkorporera metadatapreferenser

Ali, Miran January 2014 (has links)
Implicit feedback is usually data that comes from users’ clicks, search queries, and text highlights. It exists in abundance, but it is riddled with noise and requires advanced algorithms to make good use of it. Several findings suggest that factors such as click-through data and reading time could be used to create user behaviour models in order to predict a user’s information need. This Master’s thesis aims to use click-through data and search queries, together with heuristics, to create a model that prioritises the metadata fields of documents in order to predict the information need of a user. Simply put, implicit feedback is used to try to improve the precision of a search engine. The Master’s thesis was carried out at Findwise AB, a search engine consultancy firm. Documents from the benchmark dataset INEX were indexed into a search engine. Two different heuristics were proposed that increment the priority of different metadata fields based on the users’ search queries and clicks. It was assumed that the heuristics would be able to change the listing order of the search results. Evaluations were carried out for the two heuristics, with the unmodified search engine as the baseline. The evaluations were based on simulating a user who issues queries and clicks on documents; the queries and documents, with manually tagged relevance, came from a dataset provided by INEX. It was expected that the listing order would change in a way that was favourable for the user: the top-ranking results would be documents that truly were in the user’s interest. The evaluations revealed that the heuristics and the baseline both behave erratically, and the metrics never converged to any specific mean relevance. A statistical test revealed no significant difference in accuracy between the heuristics and the baseline. These results mean that the proposed heuristics do not improve the precision of the search engine; several factors, such as the indexing of too much redundant metadata, could be responsible for this outcome.
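A toy illustration of the kind of click-driven field-boosting heuristic described, with hypothetical field names and update rules (the thesis's actual heuristics differ in detail):

```python
from collections import defaultdict

# Hypothetical metadata fields and boosts; all names are illustrative.
field_boost = defaultdict(lambda: 1.0)

def record_click(query_terms, clicked_doc):
    """When a clicked document matches the query in some metadata
    field, increment that field's priority."""
    for field, value in clicked_doc.items():
        if any(term.lower() in value.lower() for term in query_terms):
            field_boost[field] += 0.1

def score(query_terms, doc):
    """Weight per-field term matches by the learned boosts."""
    return sum(field_boost[field]
               for field, value in doc.items()
               for term in query_terms
               if term.lower() in value.lower())

docs = [{"title": "Implicit feedback in search", "author": "Doe, J."},
        {"title": "Search engine basics", "author": "Feedback, I."}]
record_click(["implicit", "feedback"], docs[0])
ranked = sorted(docs, key=lambda d: score(["feedback"], d), reverse=True)
```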
34

Termediator-II: Identification of Interdisciplinary Term Ambiguity Through Hierarchical Cluster Analysis

Riley, Owen G. 23 April 2014 (has links) (PDF)
Technical disciplines are evolving rapidly, leading to changes in their associated vocabularies, and this evolving terminology causes confusion in interdisciplinary communication. Two causes of confusion are multiple definitions (overloaded terms) and synonymous terms; the formal names for these two problems are polysemy and synonymy. Termediator-I, a web application built on top of a collection of glossaries, uses definition count as a measure of term confusion in an attempt to identify confusing cross-disciplinary terms. As more glossaries were added to the collection, this measure became ineffective. This thesis provides a measure of term polysemy: the text concepts, or definitions, of each term are semantically clustered, and the number of resulting clusters is counted. Hierarchical clustering uses a measure of proximity between the text concepts, and three such measures are evaluated: cosine similarity, latent semantic indexing, and latent Dirichlet allocation. Two linkage types for determining cluster proximity during the hierarchical clustering process are also evaluated: complete linkage and average linkage. An attempt to obtain a viable clustering threshold by public consensus, through crowdsourcing in a web application, was unsuccessful. An alternate metric of polysemy, the convergence value, is identified and tested as a viable clustering threshold. Six lists of terms ranked by cluster count based on convergence values are generated, one for each combination of similarity measure and linkage type. Each combination produces a competitive list, and no combination emerges as clearly superior. Semantic clustering successfully identifies polysemous terms, but each combination of similarity measure and linkage type provides slightly different results.
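A brief sketch of the cosine-similarity variant of this clustering, using scipy's hierarchical clustering; the definitions and the cut threshold are illustrative (the thesis derives its threshold from convergence values), and the LSI and LDA variants would substitute a different vector representation:

```python
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative glossary definitions of the term "kernel" (placeholders).
definitions = [
    "The core of an operating system managing hardware resources",
    "The central component of an OS that schedules processes",
    "A function used to compute inner products in a feature space",
    "A similarity function employed by support vector machines",
]

# Cosine-similarity variant: TF-IDF vectors with cosine distance.
vectors = TfidfVectorizer().fit_transform(definitions).toarray()
dist = pdist(vectors, metric="cosine")

# Average linkage; 'complete' is the other evaluated option.
tree = linkage(dist, method="average")

# Cut the dendrogram at a threshold; the resulting cluster count is the
# polysemy measure for the term.
labels = fcluster(tree, t=0.8, criterion="distance")
print(len(set(labels)), "senses")
```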
35

The spatial relationship of DCT coefficients between a block and its sub-blocks

Jiang, Jianmin, Feng, G.C. January 2002 (has links)
At present, almost all digital images are stored and transferred in compressed format, and discrete cosine transform (DCT)-based compression remains one of the most important data compression techniques due to the efforts of JPEG. In order to save computation and memory cost, it is desirable to have image processing operations such as feature extraction, image indexing, and pattern classification implemented directly in the DCT domain. To this end, we present in this paper a generalized analysis of the spatial relationships between the DCT of any block and the DCTs of its sub-blocks. The results reveal that the DCT coefficients of any block can be obtained directly from the DCT coefficients of its sub-blocks and that the inter-block relationship remains linear. This is useful for extracting global features in the compressed domain for general image processing tasks, such as those widely used in pyramid algorithms and image indexing. In addition, because the corresponding coefficient matrix of the linear combination is sparse, the computational complexity of the proposed algorithms is significantly lower than that of existing methods.
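A numerical illustration of why such a linear relationship must exist: the DCT and its inverse are linear maps, so the composition "invert the sub-block DCTs, reassemble, transform the full block" is itself a linear map from sub-block coefficients to full-block coefficients. This sketch only verifies the composition numerically; the paper derives the explicit sparse coefficient matrix:

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(1)
block = rng.random((16, 16))

# DCT of the full 16x16 block.
full_dct = dctn(block, norm="ortho")

# DCTs of the four 8x8 sub-blocks.
corners = [(0, 0), (0, 8), (8, 0), (8, 8)]
sub_dcts = [dctn(block[i:i + 8, j:j + 8], norm="ortho") for i, j in corners]

# Invert the sub-block DCTs, reassemble, and transform the full block.
# Every step is linear, so the full-block coefficients are a fixed
# linear combination of the sub-block coefficients.
rebuilt = np.empty_like(block)
for sd, (i, j) in zip(sub_dcts, corners):
    rebuilt[i:i + 8, j:j + 8] = idctn(sd, norm="ortho")

assert np.allclose(dctn(rebuilt, norm="ortho"), full_dct)
```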
36

How natural language processing can be used to improve digital language learning / Hur natural language processing kan användas för att förbättra digital språkinlärning

Kakavandy, Hanna, Landeholt, John January 2020 (has links)
The world is facing globalization, and with that, companies are growing and need to hire according to their needs. A great obstacle is the language barrier between job applicants and employers who want to hire competent candidates. One bright spot in this challenge is Lingio, whose product teaches profession-specific Swedish digitally. Lingio intends to make its existing product more interactive, and this thesis investigates aspects involved in that. The study evaluates system utterances that are planned to be used in Lingio’s product for language learners to practise with, and studies the feasibility of using cosine similarity over natural language representations to classify the correctness of answers to these utterances. The report also examines whether it is better to use crowd-sourced material or a gold standard as the benchmark for a correct answer. The results indicate that a number of improvements and developments need to be made to the model for it to classify answers accurately, owing to its formulation and the complexity of human language. It is also concluded that the utterances by Lingio might need further development to be efficient for language learning, and that crowd-sourced material works better than a gold standard. The study makes several interesting observations from the collected data and analysis, aiming to contribute to further research in natural language engineering for text classification and digital language learning.
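A minimal sketch of cosine-similarity answer classification, with TF-IDF vectors as a stand-in representation and an illustrative threshold; the reference answers are placeholders for the crowd-sourced or gold-standard material discussed above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Reference answers gathered for one utterance (placeholders); in the
# thesis these come from crowd-sourced material or a gold standard.
references = [
    "I have worked as a nurse for three years",
    "I worked three years in elderly care",
]
learner_answer = "I have three years of experience as a nurse"

vec = TfidfVectorizer().fit(references + [learner_answer])
ref_vecs = vec.transform(references)
ans_vec = vec.transform([learner_answer])

# Classify as correct if the answer is close enough to any reference.
best = cosine_similarity(ans_vec, ref_vecs).max()
is_correct = best >= 0.5  # threshold is illustrative
```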
37

Experiments on deep face recognition using partial faces

Elmahmudi, Ali A.M., Ugail, Hassan January 2018 (has links)
Face recognition is a subject of great current interest in visual computing. In the past, numerous face recognition and authentication approaches have been proposed, though the great majority of them use full frontal faces both for training machine learning algorithms and for measuring recognition rates. In this paper, we discuss novel experiments to test the performance of machine learning, especially deep learning, using partial faces as training and recognition cues. This study thus differs sharply from the common approach of using the full face for recognition tasks. In particular, we study the recognition rate subject to various parts of the face, such as the eyes, mouth, nose, and forehead. We use a convolutional neural network based architecture along with the pre-trained VGG-Face model to extract features for training. We then use two classifiers, namely cosine similarity and a linear support vector machine, to test the recognition rates. We ran our experiments on the Brazilian FEI dataset consisting of 200 subjects. Our results show that the cheek has the lowest recognition rate, at 15%, while the (top, bottom, and right) half and the 3/4 of the face have near 100% recognition rates. / Supported in part by the European Union's Horizon 2020 Programme H2020-MSCA-RISE-2017, under the project PDE-GIR with grant number 778035.
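A hedged sketch of the two classifiers over precomputed face descriptors; random vectors stand in for the VGG-Face features, and the gallery size and dimensionality are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Random vectors stand in for VGG-Face descriptors; the gallery holds
# two enrolled vectors per subject, the probe comes from a partial face.
rng = np.random.default_rng(2)
gallery = rng.random((20, 256))
labels = np.repeat(np.arange(10), 2)
probe = rng.random(256)

# Classifier 1: nearest neighbour under cosine similarity.
sims = gallery @ probe / (
    np.linalg.norm(gallery, axis=1) * np.linalg.norm(probe))
pred_cosine = labels[np.argmax(sims)]

# Classifier 2: a linear support vector machine on the same features.
svm = LinearSVC().fit(gallery, labels)
pred_svm = svm.predict(probe[None, :])[0]
```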
38

Performance Evaluation of Raised-Cosine Wavelet for Multicarrier Applications

Anoh, Kelvin O.O., Abd-Alhameed, Raed, Ochonogor, O., Dama, Yousef A.S., Jones, Steven M.R., Mapoka, Trust T. January 2014 (has links)
Wavelets are alternative building kernels for multicarrier systems such as orthogonal frequency division multiplexing (OFDM). Wavelets can be designed by changing the parent basis functions or by constructing new filters. Two new wavelets are considered for multicarrier design: one is designed using raised-cosine functions, while the other is constructed using ideal filters. The spectra of raised-cosine wavelet filters are controlled by a roll-off factor, which leads to many distorting sidelobes. The second family of wavelets, to which the raised-cosine wavelet is compared, has no distorting sidelobes. It is shown that, in terms of BER, raised-cosine wavelets are less suitable for multicarrier design than the wavelet constructed from the ideal filter.
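A sketch of the classical raised-cosine pulse and its spectrum, illustrating how the roll-off factor shapes the response; this is the textbook pulse shape under stated assumptions, not the paper's exact wavelet filter construction:

```python
import numpy as np

def raised_cosine(t, T=1.0, beta=0.5):
    """Raised-cosine pulse with symbol period T and roll-off factor beta."""
    num = np.sinc(t / T) * np.cos(np.pi * beta * t / T)
    den = 1.0 - (2.0 * beta * t / T) ** 2
    sing = np.isclose(den, 0.0)  # limit value at |t| = T / (2 * beta)
    return np.where(sing, (np.pi / 4) * np.sinc(1.0 / (2.0 * beta)),
                    num / np.where(sing, 1.0, den))

t = np.arange(-8, 8, 1 / 16)
h = raised_cosine(t, beta=0.5)

# Truncating the pulse to finite support yields spectral sidelobes whose
# level varies with the roll-off factor beta.
spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(h)) + 1e-12)
```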
39

MDCT Domain Enhancements For Audio Processing

Suresh, K 08 1900 (has links) (PDF)
The modified discrete cosine transform (MDCT), derived from the DCT-IV, has emerged as the most suitable choice for transform-domain audio coding due to its time-domain alias cancellation property and de-correlation capability. In the present research work, we focus on MDCT-domain analysis of audio signals for compression and other applications. We have derived algorithms for linear filtering in the DCT-IV and DST-IV domains for symmetric and non-symmetric filter impulse responses. These results are also extended to the MDCT and MDST domains, which have the special property of time-domain alias cancellation. We also derive filtering algorithms for the DCT-II and DCT-III domains. Comparison with other methods in the literature shows that the new algorithm is efficient in terms of multiply-accumulate (MAC) operations. These results are useful for MDCT-domain audio processing, such as reverb synthesis, without having to reconstruct the time-domain signal and then perform the necessary filtering operations.

In audio coding, the psychoacoustic model plays a crucial role and is used to estimate the masking thresholds for adaptive bit allocation. Transparent-quality audio coding is possible if the quantization noise is kept below the masking threshold for each frame. In existing methods, the masking threshold is calculated from the DFT of the signal frame, separately from the MDCT-domain adaptive quantization. We have extended the spectral-integration-based psychoacoustic model, proposed for sinusoidal modeling of audio signals, to the MDCT domain. This has been possible because of a detailed analysis of the relation between the DFT and the MDCT: we interpret the MDCT coefficients as co-sinusoids and then apply the sinusoidal masking model. The validity of the masking threshold so derived is verified through listening tests as well as objective measures.

Parametric coding techniques are used for low-bit-rate encoding of multi-channel audio such as 5.1-format surround. In these techniques, the surround channels are synthesized at the receiver using the analysis parameters of the parametric model. We develop algorithms for MDCT-domain analysis and synthesis of reverberation. Integrating these ideas, a parametric audio coder is developed in the MDCT domain. For the parameter estimation, we use a novel analysis-by-synthesis scheme in the MDCT domain, which results in better modeling of the spatial audio. The resulting parametric stereo coder is able to synthesize acceptable-quality stereo audio from the mono audio channel and side information of approximately 11 kbps. Further, an experimental audio coder incorporating the new psychoacoustic model and the parametric model is developed in the MDCT domain.
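A compact numpy sketch of the MDCT/IMDCT pair and the time-domain alias cancellation property, using the sine window (which satisfies the Princen-Bradley condition); the frame length and test signal are illustrative:

```python
import numpy as np

def mdct(frame):
    """MDCT of a 2N-sample frame -> N coefficients (DCT-IV based)."""
    N = len(frame) // 2
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ frame

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (basis @ coeffs)

# Sine window satisfies the Princen-Bradley condition w[i]^2 + w[i+N]^2 = 1.
N = 64
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
x = np.random.default_rng(3).standard_normal(4 * N)

# Analysis/synthesis with 50% overlap-add: the time-domain aliasing
# introduced by each frame cancels against its neighbours.
recon = np.zeros_like(x)
for start in (0, N, 2 * N):
    frame = x[start:start + 2 * N] * win
    recon[start:start + 2 * N] += win * imdct(mdct(frame))

assert np.allclose(recon[N:3 * N], x[N:3 * N])  # fully overlapped region
```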
40

A comparison of different methods in their ability to compare semantic similarity between articles and press releases / En jämförelse av olika metoder i deras förmåga att jämföra semantisk likhet mellan artiklar och pressmeddelanden

Andersson, Julius January 2022 (has links)
The goal of a press release is to have the information spread as widely as possible. A suitable approach to distributing the information is to target journalists who are likely to spread it further. Deciding which journalists to target has traditionally been done manually, without intelligent digital assistance, and has therefore been a time-consuming task. Machine learning can assist the user by predicting a ranking of journalists based on which of their written articles is most semantically similar to the press release. The purpose of this thesis was to compare different methods in their ability to measure semantic similarity between articles and press releases when used to rank journalists. Three methods were chosen for comparison: (1) TF-IDF together with cosine similarity, (2) TF-IDF together with soft-cosine similarity, and (3) sentence mover’s distance (SMD) together with SBERT. Based on the proposed heuristic success metric, both TF-IDF methods outperformed the SMD method. The best-performing method was TF-IDF with soft-cosine similarity.
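A minimal sketch of method (1), TF-IDF with plain cosine similarity, ranking journalists by their most similar article; the names and texts are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder press release and journalist articles.
press_release = "New battery factory creates 3000 jobs in northern Sweden"
articles = {
    "journalist_a": ["Electric vehicle makers invest in battery plants"],
    "journalist_b": ["Local football club wins the regional championship"],
}

corpus = [press_release] + [a for arts in articles.values() for a in arts]
vec = TfidfVectorizer().fit(corpus)
pr_vec = vec.transform([press_release])

# Score each journalist by their most similar article.
scores = {
    name: cosine_similarity(pr_vec, vec.transform(arts)).max()
    for name, arts in articles.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```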
