Global ETD Search

31	Word2vec2syn : Synonymidentifiering med Word2vec / Word2vec2syn : Synonym Identification using Word2vec Pettersson, Tove January 2019
(Translated from Swedish) In natural language processing (NLP), synonym identification is one of the linguistic challenges that many take on. Fodina Language Technology AB is a company that has created a tool, Termograph, intended to collect terms within companies and keep internal language use consistent. Synonym identification in the tool relies on a combination of language technology methods, and Fodina wants broader coverage and a more dynamic extraction process. This work therefore aimed to develop a new method for synonym identification in addition to the existing method combination. A pre-trained Word2vec model was used, and its built-in cosine similarity function was used to retrieve synonyms and create clusters. The model was validated, tested and evaluated against the method combination. The validation showed that the model's estimates fell within a reasonable human range on average 60.30% of the time, and Spearman's correlation showed a significant, strong correlation. The testing showed that 32% of the processed clusters contained matching synonym suggestions. The evaluation showed that, in the cases where the suggestions did not match, the model's synonym suggestions were correct in 5.73% of cases compared with 3.07% for the method combination. The inter-rater reliability of the evaluators showed an existing but weak agreement, Fleiss' kappa = 0.19, CI (0.06, 0.33). Despite some uncertainty in the results, the work nevertheless demonstrates possibilities for further use of Word2vec models in Fodina's synonym identification.
/ One of the main challenges in the field of natural language processing (NLP) is synonym identification. Fodina Language Technology AB is the company behind the tool Termograph, which aims to collect terms and provide a consistent language within companies. A combination of multiple methods from the field of language technology constitutes the synonym identification, and Fodina would like to improve the coverage and make the working process more dynamic. The focus of this thesis was therefore to evaluate a new method for synonym identification beyond the combination already in use. A pre-trained Word2vec model was used, and its built-in function for cosine similarity was applied to identify synonyms and create clusters. The model was validated, tested and evaluated relative to the combination. The validation indicated that the model's estimates fell within a fair human-based range in 60.30% of cases on average, and Spearman's correlation indicated a strong, significant correlation. The testing showed that 32% of the processed synonym clusters contained matching synonym suggestions. The evaluation showed that, in the cases where the clusters did not match, the synonym suggestions from the model were correct in 5.73% of cases compared to 3.07% for the combination. The inter-rater reliability indicated slight agreement, Fleiss' kappa = 0.19, CI (0.06, 0.33). Despite uncertainty in the results, opportunities for further use of Word2vec models within Fodina's synonym identification are nevertheless demonstrated.
Keywords: Word2vec; synonym identification; vector space model; word vectors; cosine similarity
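The abstract above describes retrieving synonym candidates by cosine similarity between word vectors. A minimal sketch of that idea follows, assuming tiny hand-made three-dimensional vectors and a hypothetical threshold of 0.8; the helper names, the vectors and the threshold are illustrative and are not taken from the thesis or from any Word2vec library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def synonym_cluster(term, vectors, threshold=0.8):
    """Return (word, similarity) pairs at or above the threshold, best first.

    The threshold is a hypothetical cut-off chosen for this toy example.
    """
    target = vectors[term]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for a trained embedding (assumption, not real data).
toy_vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
}

cluster = synonym_cluster("car", toy_vectors)
print(cluster)
```

With a real pre-trained model the vectors would have hundreds of dimensions, and the choice of threshold would directly affect how many of the resulting clusters match a reference list.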
32	Integrating Structure and Meaning: Using Holographic Reduced Representations to Improve Automatic Text Classification Fishbein, Jonathan Michael January 2008
Current representation schemes for automatic text classification treat documents as syntactically unstructured collections of words (Bag-of-Words) or "concepts" (Bag-of-Concepts). Past attempts to encode syntactic structure have treated part-of-speech information as another word-like feature, but have been shown to be less effective than non-structural approaches. We propose a new representation scheme using Holographic Reduced Representations (HRRs) as a technique to encode both semantic and syntactic structure, though in very different ways. This method is unique in the literature in that it encodes the structure across all features of the document vector while preserving text semantics. Our method does not increase the dimensionality of the document vectors, allowing for efficient computation and storage. We present the results of various Support Vector Machine classification experiments that demonstrate the superiority of this method over Bag-of-Concepts representations and improvement over Bag-of-Words in certain classification contexts.
Keywords: Holographic Reduced Representations; Vector Space Model; Text Classification; Parts of Speech Tagging; Random Indexing; Support Vector Machines; Syntactic Structure; Semantics; System Design Engineering
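The claim that structure can be encoded without increasing dimensionality rests on the standard HRR binding operation, circular convolution, which maps two n-dimensional vectors to one n-dimensional vector. The sketch below illustrates binding and approximate unbinding in general; the role and word vectors, their names and the dimension 256 are assumptions for illustration, not the thesis's actual encoding scheme.

```python
import math
import random


def circular_convolution(a, b):
    """Bind two equal-length vectors; the result has the same length."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def involution(a):
    """Approximate inverse used for unbinding: result[i] = a[-i mod n]."""
    n = len(a)
    return [a[(-i) % n] for i in range(n)]


def random_vector(n, rng):
    """Elements drawn from a normal distribution with variance 1/n."""
    return [rng.gauss(0.0, math.sqrt(1.0 / n)) for _ in range(n)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


rng = random.Random(0)
dim = 256  # illustrative dimension

role_noun = random_vector(dim, rng)  # hypothetical part-of-speech role
word_dog = random_vector(dim, rng)   # hypothetical word vector
word_cat = random_vector(dim, rng)   # unrelated word, used as a control

# Binding keeps the dimensionality unchanged.
bound = circular_convolution(role_noun, word_dog)

# Unbinding with the involution of the role recovers a noisy copy of the word.
decoded = circular_convolution(involution(role_noun), bound)

sim_dog = cosine(decoded, word_dog)
sim_cat = cosine(decoded, word_cat)
print(len(bound), sim_dog > sim_cat)
```

The decoded vector is only an approximation of the original word vector, so in practice it is compared against a vocabulary of known vectors and the closest one is chosen.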
34	Φασματικές μέθοδοι ανάκτησης πληροφορίας, εργαλεία λογισμικού και εφαρμογές / Spectral information retrieval methods, software tools and applications Ζεϊμπέκης, Δημήτριος 20 October 2009
(Translated from Greek) The ever-increasing availability of electronic information sources has created new conditions and requirements in the field of Information Retrieval. There is a constant need to improve existing algorithms and to design new ones that achieve high performance and reliability. A further goal is to build a software environment that facilitates the use of existing algorithms, the introduction of new ones, their combination and their comparative evaluation. In this doctoral thesis we focus on information retrieval methods (with an emphasis on text retrieval) that have Linear Algebra techniques at their core, and more specifically on techniques that exploit the spectral characteristics of term-document matrices. We recall that, among the linear algebra techniques used in Information Retrieval, the singular values and singular vectors of matrices hold a prominent place. We also describe the design and construction of an integrated environment, based on the very widely used MATLAB environment, that helps users develop, use and evaluate algorithms.
We first examine the basic problems in Information Retrieval, namely clustering, retrieval of relevant documents, and classification. For the first class of problems, our goal is to improve traditional algorithms such as k-means and PDDP. In this context we propose a set of hybrid techniques that are based on these two algorithms and address problems associated with them. In particular, they improve performance in terms of the quality of the results or in terms of speed. Compared with k-means, they remove the element of randomness that characterizes it, which stems from its well-known sensitivity to initial conditions. In addition, we propose a unified set of efficient kernel methods that can be used when the data have nonlinear characteristics. The hybrid methods above are also applied to the problem of block diagonalization of stochastic matrices that model, for example, chemical processes by means of Markov chains. Our initial results suggest that this approach can significantly improve existing methods, while also providing approximations of the number of blocks that correspond to stable states of the Markov chain. Finally, we propose a different approach using the Oriented k-windows clustering algorithm which, like PDDP, uses singular vectors (equivalently, principal axes, PCA) to extract information about the dominant orientation of the clusters in Euclidean space.
We then present algorithms for relevant-document retrieval and for classification that are based on Latent Semantic Indexing (LSI). More specifically, we present an algorithmic framework based on a "representatives methodology", with which we try to approximate a collection semantically by extracting subspaces of the column space of the term-document matrix that approximate the optimal subspace of the singular value decomposition. Our methodology uses clustering algorithms, such as the hybrid methods mentioned above, to split the problem into a set of subproblems that are as independent as possible and can be solved more efficiently. Through an extensive experimental study, we show that this methodology can improve on other widely used approaches (LSI, LLSF, etc.). We also extend and apply the "representatives methodology" to kernel methods and to the problem of computing nonnegative matrix factorizations (NMF). We show that using the methodology brings a significant reduction in the memory and computation cost of kernel methods and improves the quality of the NMF results.
The thesis prompted the development of an integrated software environment. More specifically, the new methods mentioned above, as well as other widely used techniques, have been implemented and incorporated into the Text to Matrix Generator (TMG) environment. TMG is based mainly on MATLAB, while smaller parts of it are written in Perl. TMG consists of six modules and is easily extensible. These modules provide a broad collection of information retrieval methods, consisting of methods for (i) constructing and updating term-document matrices, (ii) computing reduced-dimension approximations, (iii) computing nonnegative factorizations, (iv) retrieving relevant documents, (v) clustering and (vi) classification. For all of the above, the tool provides suitably tailored graphical interfaces that assist the user. Alternatively, its functions can be called directly from the command line. TMG facilitates rapid prototyping of algorithms and is freely available from its webpage (http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/). Searches confirm that it has supported many scientists worldwide in both research and teaching. We also describe our recent work on offering TMG as a service on the World Wide Web. In particular, software is being developed for remote use of TMG through a dedicated API, and the foundations are laid for future research on improved performance and efficient use of the system.
/ The amount of digital data is growing rapidly and continuously motivates research innovation in Information Retrieval. Much of the data is text, so there is an ever-present need to push the field of Text Mining forward by designing and implementing novel, effective algorithms that attain high performance and reliability. It is also desirable to develop software environments that not only facilitate access to existing methods, but also enable the rapid prototyping, performance evaluation and incorporation of new algorithms for Text Mining. In this research we focus on algorithms that use Linear Algebra and Matrix Analysis tools as computational kernels. We use the term spectral to highlight the fact that our methods rely on the spectral characteristics of the underlying term-document matrices that encode the texts under study. We consolidate our new and existing algorithms in a software environment, called TMG, that we built on top of MATLAB and Perl.
First, we consider the basic text mining tasks, namely clustering, ad-hoc retrieval and text classification. In clustering, we focus on a well-known spectral method called PDDP (Principal Direction Divisive Partitioning) and investigate hybrid methods that combine PDDP with standard workhorses such as k-means. In particular, the proposed methods improve the performance of these algorithms with respect to the quality of the attained clustering, their speed, or both. Compared with k-means, our algorithms eliminate the non-determinism originating from k-means' initialization phase. We also propose a framework for kernel methods that can be used when the data exhibit non-linearities. Our spectral clustering algorithms are applied in sparse matrix reordering, specifically in the block diagonalization of row stochastic matrices. In addition to helping in the interpretation of a recent method for identifying metastable states of Markov chains, they also provide the means to improve its performance. Initial results demonstrate that the proposed methodology can improve significantly over existing techniques, deriving approximations of the number of blocks corresponding to distinct stable states of the underlying Markov chain. We also show how to use spectral methods to improve the performance of a density-based clustering approach called Oriented k-windows. In particular, the algorithm uses information derived from Principal Component Analysis (PCA) to guide a windowing technique, namely k-windows, so that it can give insights about the data orientation.
The next part of the thesis deals with ad-hoc retrieval and classification methods based on Latent Semantic Indexing (LSI). We propose an algorithmic framework based on a "representatives methodology" in order to approximate a collection semantically, by extracting subspaces of the column space of the term-document matrix that approximate the optimal subspace derived by the SVD. Our methodology uses clustering techniques, like the aforementioned hybrid methods, in a preprocessing stage. Our objective is to split the problem into a set of independent subproblems that can be solved more efficiently. Results from extensive experimentation indicate that our methodology can improve on a state-of-the-art method like LSI. We also apply the representatives methodology to kernel methods and Nonnegative Matrix Factorization (NMF). Extensive numerical experiments indicate that this methodology reduces the computational cost and memory requirements of kernel methods and also increases the quality of the nonnegative approximations.
We have incorporated all the proposed methods in a software environment called Text to Matrix Generator (TMG). The first release of TMG predates this Ph.D., but it has since undergone several upgrades and rewrites. TMG currently consists of six easily extensible modules. These modules provide methods for (i) constructing and updating term-document matrices, (ii) computing low-rank approximations, (iii) computing nonnegative factorizations, (iv) ad-hoc retrieval, (v) clustering and (vi) classification. TMG is accessible in two primary modes, graphical and command line, and is freely downloadable from its webpage (http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/). As our usage logs indicate, TMG is being used worldwide for research and education. We also give a brief overview of open problems and ongoing work. We describe our first version of "remote TMG", which views TMG as a Web resource and provides remote access to it by means of a special API.
Keywords: Vector space models; Clustering; Classification; PDDP; Matrix approximation; Text to matrix generators; CLSI; Oriented k-windows; 005.74
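The abstract notes that PDDP-based methods avoid the non-determinism of k-means initialization. A minimal sketch of a single basic PDDP split follows: documents are centered on their centroid, the leading principal direction is estimated by power iteration, and documents are divided by the sign of their projection. This shows only the textbook splitting step, not the hybrid methods of the thesis; the function name, the fixed starting vector and the toy data are assumptions.

```python
import math


def pddp_split(docs, iterations=100):
    """Split document vectors into two groups along the principal direction.

    docs is a list of equal-length numeric lists (one row per document).
    Returns two lists of document indices.
    """
    n_docs, dim = len(docs), len(docs[0])
    centroid = [sum(d[j] for d in docs) / n_docs for j in range(dim)]
    centered = [[d[j] - centroid[j] for j in range(dim)] for d in docs]

    # Power iteration on centered^T * centered with a fixed start vector,
    # so repeated runs give the same split (no random initialization).
    direction = [1.0 / (j + 1) for j in range(dim)]
    for _ in range(iterations):
        proj = [sum(row[j] * direction[j] for j in range(dim)) for row in centered]
        new = [sum(centered[i][j] * proj[i] for i in range(n_docs)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # all documents identical; nothing to split
        direction = [x / norm for x in new]

    proj = [sum(row[j] * direction[j] for j in range(dim)) for row in centered]
    left = [i for i, p in enumerate(proj) if p <= 0.0]
    right = [i for i, p in enumerate(proj) if p > 0.0]
    return left, right


# Toy term-frequency rows (assumption): documents 0-1 and 2-3 share vocabulary.
toy_docs = [
    [2.0, 0.0, 0.0],
    [3.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 1.0, 3.0],
]

left, right = pddp_split(toy_docs)
print(sorted([left, right]))
```

One common use of such a split is to supply starting centroids for k-means, which removes the dependence on a random initial assignment.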
35	Variedades afins e aplicações / Affine varieties and applications Diego Ponciano de Oliveira Lima 03 August 2013
In this work, we consider affine varieties in vector spaces in order to analyze and understand the geometric behavior of solution sets of systems of linear equations, of solutions of second-order linear ordinary differential equations resulting from the mathematical modeling of systems, etc. We observe characteristics of affine varieties in vector spaces, such as being a vector subspace translated by any vector belonging to the affine variety, and compare the geometric representations of the solution sets of the problem situations cited above with these characteristics.
/ (Translated from Portuguese) In this work, we consider affine varieties in a vector space to analyze and understand the geometric behavior of solution sets of systems of linear equations, of solutions of second-order linear ordinary differential equations arising from mathematical models of systems, etc. We verify characteristics of affine varieties in vector spaces, namely that each is a vector subspace translated by any vector belonging to the affine variety, and we compare the geometric representations of the solution sets of the problem situations mentioned above with these characteristics.
Keywords: affine variety; vector space; vector subspace; linear equations; differential equations; MATEMATICA
36	Módulos e grupos abelianos finitamente gerados Jesus, Elisângela Valéria de 16 May 2017
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
The concept of a module M over a ring A can be seen as a generalization of the concept of a vector space V over a field K. In this work, we present definitions, examples and results about modules, our main objective being to prove the structure theorem for abelian groups, which tells us that every finitely generated abelian group is a direct sum of cyclic subgroups.
/ (Translated from Portuguese) The concept of a module M over a ring A can be viewed as a generalization of the concept of a vector space V over a field K. In this work, we present definitions, examples and results concerning modules, with the main goal of proving the structure theorem for abelian groups, which states that every finitely generated abelian group is the direct sum of cyclic subgroups.
Keywords: Mathematics; Modules; Ring; Vector space; Field; Structure; Abelian groups; CIENCIAS EXATAS E DA TERRA::MATEMATICA
37	Matching Domain Model with Source Code using Relationships Bharat, Patil Tejas January 2014
We address the task of mapping a given domain model (e.g., an industry-standard reference model) for a given domain (e.g., ERP) to the source code of an independently developed application in the same domain. This has applications in improving the understandability of an existing application, migrating it to a more flexible architecture, or integrating it with other related applications. We build on a previous approach, which uses relationships among source code elements to improve the precision of the mapping process. We extend this approach by considering relationships among domain model elements in addition to relationships among source code elements, and also by stating the mapping process as an optimization problem. We have implemented our approach and compared it with the previous approach. We show that our approach gives significantly better precision as well as recall than the previous approach when applied to a real industry-standard domain model and an open-source application.
Keywords: Information Retrieval; Optimization Framework; Domain Models; Source Code; Vector Space Model; VSM Model; Computer Science
38	Shlukování slov podle významu / Word Sense Clustering Jadrníček, Zbyněk January 2015
This thesis focuses on the problem of semantic similarity of words in the English language. The reader is first introduced to the theory of word sense clustering, followed by a description of selected methods and tools related to the topic. In the practical part, we design and implement a system for determining semantic similarity using the Word2Vec tool, focusing in particular on biomedical texts from the MEDLINE database. At the end of the thesis, we discuss the results achieved and give some ideas for improving the system.
39	Elementary proof of the Riemann–Roch Theorem Sundgren, Hampus January 2023
This thesis covers an elementary proof of the Riemann–Roch Theorem for plane curves. We introduce the notion of divisors, a convenient way of computing the multiplicities of rational functions, and then introduce differentials. Furthermore, we introduce the K-vector space L(D), consisting of rational functions that are controlled by a divisor D. This is followed by some further results before we arrive at an elementary proof of the Riemann–Roch Theorem.
Keywords: Algebraic Geometry; Riemann–Roch; The Riemann–Roch Theorem; Divisors; Differentials; L(D); l(D); Vector space L(D); Proof of the Riemann–Roch Theorem; Algebra and Logic
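For orientation, the objects named in the abstract and keywords above can be written out in their standard textbook form. The notation here is an assumption and may differ from the thesis: $C$ is the curve, $K(C)$ its field of rational functions, $W$ a canonical divisor and $g$ the genus.

```latex
L(D) = \{\, f \in K(C)^{\times} : \operatorname{div}(f) + D \ge 0 \,\} \cup \{0\},
\qquad l(D) = \dim_K L(D),

l(D) - l(W - D) = \deg D + 1 - g .
```

The first line makes precise what "controlled by a divisor D" means: the poles of $f$ may be no worse than $D$ allows. The second line is the Riemann–Roch Theorem itself.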
40	Cohomologie de fibrés en droite sur le fibré cotangent de variétés grassmanniennes généralisées Ascah-Coallier, Isabelle 04 1900
(Translated from French) This thesis is concerned with the cohomology of line bundles on the cotangent bundle of projective varieties. More precisely, for $G$ a simple, connected and simply connected algebraic group, $P$ a maximal subgroup of $G$ and $\omega$ a dominant generator of the character group of $P$, we seek to understand the cohomology groups $H^i(T^*(G/P),\mathcal{L})$, where $\mathcal{L}$ is the sheaf of sections of a line bundle on $T^*(G/P)$. Under certain conditions, we will show that there is an isomorphism, up to grading, between $H^i(T^*(G/P),\mathcal{L})$ and $H^i(T^*(G/P),\mathcal{L}^{\vee})$. After working in a theoretical setting, we turn to certain parabolic subgroups related to nilpotent orbits. In this case, the Lie algebra of the unipotent radical of $P$, which we denote $\mathfrak{n}$, has the structure of a prehomogeneous vector space. We can then determine which cases satisfy the hypotheses required for the proof of the isomorphism by showing the existence of a $P$-covariant $f$ in $\mathbb{C}[\mathfrak{n}]$ and studying its properties. We then turn to the singularities of the affine variety $V(f)$. We will be able to show that its normalization has rational singularities.
/ In this thesis, we study the cohomology of line bundles on the cotangent bundle of projective varieties. To be more precise, let $G$ be a semisimple algebraic group which is simply connected, $P$ a maximal subgroup and $\omega$ a dominant weight that generates the character group of $P$. Our goal is to understand the cohomology groups $H^i(T^*(G/P),\mathcal{L})$, where $\mathcal{L}$ is the sheaf of sections of a line bundle on $T^*(G/P)$. Under some conditions, we will show that there exists an isomorphism, up to grading, between $H^i(T^*(G/P),\mathcal{L})$ and $H^i(T^*(G/P),\mathcal{L}^{\vee})$. After working in a theoretical setting, we will focus on maximal parabolic subgroups related to nilpotent varieties. In this case, the Lie algebra of the unipotent radical of $P$, denoted $\mathfrak{n}$, has the structure of a prehomogeneous vector space. We will be able to determine which cases satisfy the hypotheses of the isomorphism by showing the existence of a $P$-covariant $f$ in $\mathbb{C}[\mathfrak{n}]$ and by studying its properties. We will then be interested in the singularities of the affine variety $V(f)$. We will show that the normalisation of $V(f)$ has rational singularities.
Keywords: maximal parabolic subgroup; moment map; class group; reflexive module; cohomology; prehomogeneous vector space; covariant
