21

Mera sličnosti između modela Gausovih smeša zasnovana na transformaciji prostora parametara / A similarity measure between Gaussian mixture models based on a transformation of the parameter space

Krstanović Lidija 25 September 2017 (has links)
This thesis studies the possibility that the parameters of the Gaussian components of a particular Gaussian Mixture Model (GMM) lie approximately on a lower-dimensional surface embedded in the cone of positive definite matrices. For that case, we deliver a novel, more efficient similarity measure between GMMs, obtained by an LPP-like projection of the components of a particular GMM from the high-dimensional original parameter space to a much lower-dimensional space. Thus, finding the distance between two GMMs in the original space is reduced to finding the distance between two sets of lower-dimensional Euclidean vectors, weighted by the corresponding mixture weights. The proposed measure is suitable for applications that utilize high-dimensional feature spaces and/or a large overall number of Gaussian components. We confirm our results on artificial as well as real experimental data.
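The core idea lends itself to a compact illustration. The sketch below (a minimal Python rendering, with a generic projection matrix P standing in for the learned LPP-type projection from the thesis) embeds each Gaussian component as a parameter vector, projects it to a low-dimensional space, and compares two GMMs as weighted sets of projected vectors:

```python
import numpy as np

def embed_component(mean, cov):
    """Embed one Gaussian component as a flat parameter vector."""
    iu = np.triu_indices(cov.shape[0])   # upper triangle avoids duplicate entries
    return np.concatenate([mean, cov[iu]])

def gmm_distance(weights1, comps1, weights2, comps2, P):
    """Weighted average pairwise distance between projected components.

    comps*: lists of (mean, covariance) pairs; P: projection matrix of
    shape (embedding_dim, low_dim) -- a stand-in for the learned
    LPP-type projection.
    """
    X1 = np.array([embed_component(m, c) for m, c in comps1]) @ P
    X2 = np.array([embed_component(m, c) for m, c in comps2]) @ P
    # all pairwise Euclidean distances between projected component vectors
    d = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    return float(np.asarray(weights1) @ d @ np.asarray(weights2))

# toy usage: two 2-component GMMs over a 50-dimensional feature space,
# compared in a 5-dimensional projected space
rng = np.random.default_rng(0)
dim, emb = 50, 50 + 50 * 51 // 2
P = rng.standard_normal((emb, 5))
g = lambda: (rng.standard_normal(dim), np.eye(dim))
print(gmm_distance([0.5, 0.5], [g(), g()], [0.3, 0.7], [g(), g()], P))
```

With this reduction, the cost of a comparison scales with the number of components and the projected dimension rather than with the full dimensionality of the covariance matrices.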
22

Contributions for Handling Big Data Heterogeneity. Using Intuitionistic Fuzzy Set Theory and Similarity Measures for Classifying Heterogeneous Data

Ali, Najat January 2019 (has links)
A huge amount of data is generated daily by digital technologies such as social media, web logs, traffic sensors, online transactions, tracking data, videos, and so on. This has led to the archiving and storage of larger and larger datasets, many of which are multi-modal or contain different types of data, contributing to the problem now known as "Big Data". In the area of Big Data, the volume, variety and velocity problems remain difficult to solve. The work presented in this thesis focuses on the variety aspect of Big Data. For example, data can come in various and mixed formats for the same feature (attribute) or different features, and can be identified mainly by one of the following data types: real-valued, crisp and linguistic values. The increasing variety and ambiguity of such data are particularly challenging to process and make it hard to build accurate machine learning models. Therefore, data heterogeneity requires new methods of analysis and modelling techniques to enable useful information extraction and the modelling of achievable tasks. In this thesis, new approaches are proposed for handling heterogeneous Big Data. These include two techniques for filtering heterogeneous data objects: Two-Dimensional Similarity Space (2DSS), for data described by numeric and categorical features, and Three-Dimensional Similarity Space (3DSS), for real-valued, crisp and linguistic data. Both filtering techniques are used in this research to reduce the noise in the initial dataset and make the dataset more homogeneous. Furthermore, a new similarity measure based on intuitionistic fuzzy set theory is proposed. The proposed measure is used to handle the heterogeneity and ambiguity within crisp and linguistic data. In addition, new combined similarity models are proposed which allow for a comparison between heterogeneous data objects represented by a combination of crisp and linguistic values. Diverse examples are used to illustrate and discuss the efficiency of the proposed similarity models. The thesis also presents a modification of the k-Nearest Neighbour classifier, called k-Nearest Neighbour Weighted Average (k-NNWA), to classify heterogeneous datasets described by real-valued, crisp and linguistic data. Finally, the thesis introduces a novel classification model, called FCCM (Filter Combined Classification Model), for heterogeneous data classification. The proposed model combines the advantages of the 3DSS filter and the k-NNWA classifier and outperforms the latter algorithm. All the proposed models and techniques have been applied to weather datasets and evaluated using accuracy, F-score and ROC area measures. The experiments revealed that the proposed filtering techniques are an efficient approach for removing noise from heterogeneous data and improving the performance of classification models. Moreover, the experiments showed that the proposed similarity measure for intuitionistic fuzzy data is capable of handling the fuzziness of heterogeneous data, and that intuitionistic fuzzy set theory offers some promise in solving Big Data problems by handling the uncertainty and heterogeneity of the data.
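To make the intuitionistic fuzzy ingredient concrete, here is one standard distance-based similarity from the intuitionistic fuzzy set literature (1 minus the normalized Hamming distance over membership and non-membership degrees); it illustrates the family of measures the thesis builds on, not the author's exact proposal:

```python
import numpy as np

def ifs_similarity(mu_a, nu_a, mu_b, nu_b):
    """Similarity of two intuitionistic fuzzy sets A and B, where each
    element carries a membership degree mu and a non-membership degree nu
    (with mu + nu <= 1; the remainder is the hesitation degree)."""
    mu_a, nu_a, mu_b, nu_b = map(np.asarray, (mu_a, nu_a, mu_b, nu_b))
    n = len(mu_a)
    return 1.0 - np.sum(np.abs(mu_a - mu_b) + np.abs(nu_a - nu_b)) / (2 * n)

# two objects described by membership / non-membership degrees over
# three linguistic attributes:
print(ifs_similarity([0.8, 0.5, 0.6], [0.1, 0.3, 0.2],
                     [0.7, 0.4, 0.9], [0.2, 0.4, 0.1]))
```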
23

Fuzzer Test Log Analysis Using Machine Learning : Framework to analyze logs and provide feedback to guide the fuzzer

Yadav, Jyoti January 2018 (has links)
In the modern world, machine learning and deep learning have become popular choices for analysing and identifying patterns in large volumes of data. The focus of this thesis work has been the design of alternative strategies that use machine learning to guide the fuzzer in selecting the most promising test cases. The thesis work mainly focuses on the analysis of the data using machine learning techniques. A detailed analysis study is carried out in multiple phases. The first phase converts the data into a suitable format (pre-processing) so that the necessary features can be extracted and fed as input to unsupervised machine learning algorithms. Machine learning algorithms accept the input data in the form of matrices whose dimensions correspond to the extracted features. Several experiments and run-time benchmarks were conducted to choose the most efficient algorithm based on execution time and accuracy of results. Finally, the best choice was implemented to get the desired result. The second phase of the work deals with applying supervised learning over the clustering results. The final phase describes how an incremental learning model is built to score the test-case logs and return their score in near real time, which can act as feedback to guide the fuzzer.
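As a rough illustration of the final phase, the sketch below uses scikit-learn's partial_fit API to fold new labeled logs into a model incrementally and score incoming test-case logs in near real time. The feature extraction from raw fuzzer logs is assumed to have already happened, and SGDClassifier is a stand-in for whichever incremental model the thesis actually uses:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")  # logistic loss gives probabilistic scores
classes = np.array([0, 1])              # e.g. "not promising" / "promising"

def update(model, batch_features, batch_labels):
    """Fold one batch of labeled logs into the model without full retraining."""
    model.partial_fit(batch_features, batch_labels, classes=classes)

def score(model, new_log_features):
    """Near-real-time score for new test-case logs: probability of class 1,
    usable as feedback for the fuzzer's test-case selection."""
    return model.predict_proba(new_log_features)[:, 1]
```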
24

Travel Diary Semantics Enrichment of Trajectories Based on Trajectory Similarity Measures

LIU, RUI January 2018 (has links)
Trajectory data is playing an increasingly important role in our daily lives, as well as in commercial applications and scientific research. With the rapid development and popularity of GPS, people can locate themselves in real time. Users' behavior information can therefore be collected by analyzing their GPS trajectory data, so as to predict their new trajectories' destinations, ways of travelling and even the transportation mode they use, which forms a complete personal travel diary. The task in this thesis is to implement travel diary semantics enrichment of a user's trajectories based on the user's historical labeled data and trajectory similarity measures. Specifically, this dissertation studies the following tasks. Firstly, trip segmentation concerns detecting trips within a trajectory, which is an unbounded sequence of timestamped locations of the user; this means detecting the stops and moves of the user, and the trips between two consecutive stops. In this thesis, a heuristic rule is used to identify the stops. Secondly, tripleg segmentation concerns identifying the locations/time instances between two triplegs where/when a user changes between transport modes in the user's trajectory (makes transport mode transitions). Finally, mode inference concerns identifying the travel mode of each tripleg. Steps 2 and 3 are both based on the same trajectory similarity measure and project the information from the matched similar trip trajectory onto the unlabeled trip trajectory. The empirical evaluation of these three tasks is based on a real-world data set (4,240 trips and 5,451 triplegs with 14 travel modes for 206 users over a one-week study period), and the experimental performance (including trends, coverage and accuracy) is evaluated. Accuracy is around 25% for trip segmentation, varies between 50% and 55% for tripleg segmentation, and lies between 55% and 60% for mode inference. Moreover, accuracy is higher for longer trips than for shorter trips, probably because people have more mode choices on short-distance trips (like moped, bus and car), which confuses the measure. Accuracy can be increased by nearly 10% with the help of reverse-trip identification, because it gives a trip more similar historical trips and increases the probability that a new unlabeled trip can be matched against its historical trips.
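A minimal sketch of the matching step, with a generic symmetric Hausdorff distance standing in for the thesis's own trajectory similarity measure: the unlabeled trip is compared against labeled historical trips, and the travel mode of the closest match is projected onto it.

```python
import numpy as np
from scipy.spatial.distance import cdist

def trajectory_distance(t1, t2):
    """Symmetric Hausdorff distance between two (n, 2) point sequences
    of (latitude, longitude) coordinates."""
    d = cdist(t1, t2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def infer_mode(trip, history):
    """history: list of (trajectory, mode) pairs from labeled historical
    trips; returns the mode of the most similar historical trip."""
    best = min(history, key=lambda h: trajectory_distance(trip, h[0]))
    return best[1]
```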
25

A framework for comparing heterogeneous objects: on the similarity measurements for fuzzy, numerical and categorical attributes

Bashon, Yasmina M., Neagu, Daniel, Ridley, Mick J. 09 1900 (has links)
Real-world data collections are often heterogeneous (represented by a set of mixed attribute data types: numerical, categorical and fuzzy); since most available similarity measures can only be applied to one type of data, it becomes essential to construct an appropriate similarity measure for comparing such complex data. In this paper, a framework of new and unified similarity measures is proposed for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes. Examples are used to illustrate, compare and discuss the applications and efficiency of the proposed approach to heterogeneous data comparison and clustering.
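A hedged sketch of the framework's core idea, with common textbook per-type measures standing in for the paper's exact definitions: compute one similarity per attribute according to its type, then aggregate with attribute weights.

```python
def numeric_sim(a, b, value_range):
    """Distance-based similarity of two numeric values on a known range."""
    return 1.0 - abs(a - b) / value_range

def categorical_sim(a, b):
    """Simple overlap similarity for categorical values."""
    return 1.0 if a == b else 0.0

def fuzzy_sim(a, b):
    """a, b: membership vectors over the same fuzzy terms (e.g. low/med/high);
    set-theoretic (Jaccard-style) overlap of the two fuzzy sets."""
    overlap = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return overlap / union if union else 1.0

def object_similarity(sims, weights):
    """Weighted mean of the per-attribute similarities."""
    return sum(s * w for s, w in zip(sims, weights)) / sum(weights)

# comparing two weather records with mixed attribute types:
sims = [numeric_sim(20.0, 24.0, value_range=40.0),    # temperature
        categorical_sim("rainy", "sunny"),             # weather type
        fuzzy_sim([0.2, 0.7, 0.1], [0.1, 0.8, 0.1])]   # wind: low/med/high
print(object_similarity(sims, weights=[1, 1, 1]))
```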
26

Contributions to fuzzy object comparison and applications : similarity measures for fuzzy and heterogeneous data and their applications

Bashon, Yasmina Massoud January 2013 (has links)
This thesis makes an original contribution to knowledge in the field of data object comparison, where the objects are described by attributes of fuzzy or heterogeneous (numeric and symbolic) data types. Many real-world database systems and applications require information management components that provide support for managing such imperfect and heterogeneous data objects. For example, with new online information made available from various sources, in semi-structured, structured or unstructured representations, new information usage and search algorithms must consider that such data collections may contain objects/records with different types of data (fuzzy, numerical and categorical) for the same attributes. New approaches to similarity are presented in this research to support such data comparison. A generalisation of both geometric and set-theoretical similarity models has enabled the new similarity measures presented in this thesis, which handle the vagueness (fuzzy data type) within data objects. A framework of new and unified similarity measures for comparing heterogeneous objects described by numerical, categorical and fuzzy attributes has also been introduced. Examples are used to illustrate, compare and discuss the applications and efficiency of the proposed approaches to heterogeneous data comparison.
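As one concrete instance of the geometric family of similarity models the thesis generalises, consider Chen's similarity between triangular fuzzy numbers, where each fuzzy value is a triple (left, peak, right) on a normalized scale. This is an illustrative textbook example, not the thesis's own definition:

```python
def triangular_fuzzy_sim(a, b):
    """Chen-style geometric similarity of two triangular fuzzy numbers,
    each given as (left, peak, right) on a normalized [0, 1] scale."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# e.g. "around 0.3" vs. "around 0.35":
print(triangular_fuzzy_sim((0.2, 0.3, 0.4), (0.25, 0.35, 0.45)))  # 0.95
```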
27

The predictability problem

Ong, James Kwan Yau January 2007 (has links)
We try to determine whether it is possible to approximate the subjective Cloze predictability measure with two types of objective measures, semantic and word n-gram measures, based on the statistical properties of text corpora. The semantic measures are constructed either by querying Internet search engines or by applying Latent Semantic Analysis, while the word n-gram measures depend solely on the results of Internet search engines. We also analyse the role of Cloze predictability in the SWIFT eye movement model, and evaluate whether other parameters might be able to take the place of predictability. Our results suggest that a computational model that generates predictability values not only needs measures that can determine the relatedness of a word to its context; the presence of measures that assert unrelatedness is just as important. Despite the fact, however, that we only have relatedness measures, we predict that SWIFT should perform just as well when we replace Cloze predictability with our measures.
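A small sketch of how an LSA-based relatedness measure of this kind can be computed (the toy corpus and word pairs are placeholders; the thesis derives its measures from large corpora and search-engine counts):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["the doctor examined the patient",
          "the nurse helped the doctor",
          "the pilot flew the plane"]

vec = CountVectorizer()
X = vec.fit_transform(corpus)            # documents x terms
svd = TruncatedSVD(n_components=2)
word_vectors = svd.fit_transform(X.T)    # terms x latent dimensions

def relatedness(w1, w2):
    """Cosine similarity of two words in the latent semantic space."""
    i, j = vec.vocabulary_[w1], vec.vocabulary_[w2]
    return cosine_similarity(word_vectors[[i]], word_vectors[[j]])[0, 0]

print(relatedness("doctor", "nurse"), relatedness("doctor", "plane"))
```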
28

Μελέτη και συγκριτική αξιολόγηση μεθόδων δόμησης περιεχομένου ιστοτόπων : εφαρμογή σε ειδησεογραφικούς ιστοτόπους / Study and comparative evaluation of website content structuring methods: application to news websites

Στογιάννος, Νικόλαος-Αλέξανδρος 20 April 2011 (has links)
The proper structuring of a website's content, so as to increase the findability of the information provided and to ease the completion of typical user tasks, is one of the primary goals of website designers. The existing methods from the field of HCI that assist designers in this are often neglected due to the time and human resources they demand. This applies even more to news sites, whose size and daily content updates call for improved and more efficient techniques for organizing their content. In this thesis we investigate the efficiency of a method called AutoCardSorter, which has been suggested in the literature for semi-automatic content categorisation based on the semantic similarity of each webpage's content, in the context of organizing the information of news sites. To accomplish this we conducted five comparative studies in which the method was compared to the primary alternatives of the classic card sorting method (open and closed). The analysis of the results showed that AutoCardSorter suggested article categories in high agreement with the ones produced by the human participants of the card sorting studies, yet in a much more efficient way, confirming the results of similar previous studies on websites of other themes (e.g., travel, education). Moreover, the studies showed that a slightly modified version of the method places new articles under pre-existing categories with significantly lower agreement with the categorisation chosen by the participants. The thesis concludes with proposed directions for improving the method's efficiency, both in the context of organizing news site content and in general.
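The kind of semi-automatic categorisation AutoCardSorter performs can be sketched generically: represent each article by a TF-IDF vector and group articles by agglomerative clustering on cosine distances. This is a reconstruction of the general approach under stated assumptions (a recent scikit-learn with the metric parameter), not the tool's actual pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

articles = ["election results announced after national vote",
            "parliament votes on the annual budget",
            "team wins the championship final",
            "injured striker misses the decisive match"]

# TF-IDF vectors capture the lexical-semantic profile of each article
X = TfidfVectorizer().fit_transform(articles).toarray()

labels = AgglomerativeClustering(
    n_clusters=2, metric="cosine", linkage="average").fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: politics vs. sports
```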
29

Identifikace osob pomocí biometrie sítnice / Identification of persons using retinal biometry

Klimešová, Lenka January 2018 (has links)
This paper deals with the identification of persons using retinal biometry. The retinal vasculature is invariant and unique to each person, which makes it suitable for biometric purposes. The first part of the work includes information about biometrics, biometric systems and reliability measures. The next part describes the principle of an experimental video ophthalmoscope, which was used for retinal vascular imaging, and includes a literature review of the use of retinal images for biometrics, feature extraction methods and similarity measures. Finally, two algorithms for processing the input data are proposed and realized in the MATLAB® programming environment. The methods are tested and evaluated on a data set from the experimental video ophthalmoscope and on the publicly available STRaDe and DRIVE databases.
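The matching step of such a system can be illustrated with a deliberately simplified sketch: compare a probe vessel segmentation against enrolled templates with a Dice-style overlap score and accept the best match above a threshold. Segmentation and spatial alignment are assumed to have been done upstream; the thesis's actual feature extraction and similarity measures differ:

```python
import numpy as np

def dice(a, b):
    """Overlap between two binary vessel masks of equal shape."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def identify(probe, templates, threshold=0.7):
    """templates: dict person_id -> enrolled binary vessel mask.
    Returns the best-matching identity, or None (rejection)."""
    pid, score = max(((p, dice(probe, t)) for p, t in templates.items()),
                     key=lambda x: x[1])
    return pid if score >= threshold else None
```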
30

Optimalizace pro registraci obrazů založená na genetických algoritmech / Optimization based on genetic algorithms for image registration

Horáková, Pavla January 2012 (has links)
This diploma thesis focuses on global optimization methods and their use for medical image registration. The main aim is the creation of a genetic algorithm and the testing of its functionality on synthetic data. Besides test functions and test images, the algorithm was applied to real medical images. For this purpose, a graphical user interface was created that allows the parameters to be chosen according to the current requirements. After adding an iterative gradient method, the algorithm became a hybrid genetic algorithm.
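A minimal genetic algorithm of the kind described, applied to rigid image registration, could look like the following sketch: each individual encodes a transform (shift and rotation), fitness is an image-similarity score after warping, and evolution proceeds by truncation selection and Gaussian mutation (crossover omitted for brevity). Negative mean squared error stands in for whatever similarity metric the thesis uses:

```python
import numpy as np
from scipy.ndimage import shift, rotate

def fitness(params, fixed, moving):
    """Image similarity after applying the candidate rigid transform."""
    tx, ty, ang = params
    warped = shift(rotate(moving, ang, reshape=False), (tx, ty))
    return -np.mean((fixed - warped) ** 2)   # higher is better

def ga_register(fixed, moving, pop_size=30, gens=50, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-10, 10, size=(pop_size, 3))   # (tx, ty, angle)
    for _ in range(gens):
        scores = np.array([fitness(p, fixed, moving) for p in pop])
        # truncation selection: keep the better half as parents
        parents = pop[np.argsort(scores)[-pop_size // 2:]]
        # offspring: copy random parents, then apply Gaussian mutation
        children = parents[rng.integers(len(parents),
                                        size=pop_size - len(parents))]
        children = children + rng.normal(0, sigma, children.shape)
        pop = np.vstack([parents, children])
    return pop[np.argmax([fitness(p, fixed, moving) for p in pop])]
```

A hybrid variant, as in the thesis, would refine the GA's best individual with an iterative gradient step after each generation or at the end of the run.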
