• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 112
  • 42
  • 13
  • 9
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 208
  • 208
  • 208
  • 80
  • 59
  • 54
  • 43
  • 36
  • 32
  • 28
  • 25
  • 25
  • 25
  • 23
  • 23
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
171

Bidirectional Helmholtz Machines

Shabanian, Samira 09 1900 (has links)
L'entraînement sans surveillance efficace et inférence dans les modèles génératifs profonds reste un problème difficile. Une approche assez simple, la machine de Helmholtz, consiste à entraîner du haut vers le bas un modèle génératif dirigé qui sera utilisé plus tard pour l'inférence approximative. Des résultats récents suggèrent que de meilleurs modèles génératifs peuvent être obtenus par de meilleures procédures d'inférence approximatives. Au lieu d'améliorer la procédure d'inférence, nous proposons ici un nouveau modèle, la machine de Helmholtz bidirectionnelle, qui garantit qu'on peut calculer efficacement les distributions de haut-vers-bas et de bas-vers-haut. Nous y parvenons en interprétant à les modèles haut-vers-bas et bas-vers-haut en tant que distributions d'inférence approximative, puis ensuite en définissant la distribution du modèle comme étant la moyenne géométrique de ces deux distributions. Nous dérivons une borne inférieure pour la vraisemblance de ce modèle, et nous démontrons que l'optimisation de cette borne se comporte en régulisateur. Ce régularisateur sera tel que la distance de Bhattacharyya sera minisée entre les distributions approximatives haut-vers-bas et bas-vers-haut. Cette approche produit des résultats de pointe en terme de modèles génératifs qui favorisent les réseaux significativement plus profonds. Elle permet aussi une inférence approximative amérliorée par plusieurs ordres de grandeur. De plus, nous introduisons un modèle génératif profond basé sur les modèles BiHM pour l'entraînement semi-supervisé. / Efficient unsupervised training and inference in deep generative models remains a challenging problem. One basic approach, called Helmholtz machine, involves training a top-down directed generative model together with a bottom-up auxiliary model used for approximate inference. Recent results indicate that better generative models can be obtained with better approximate inference procedures. Instead of improving the inference procedure, we here propose a new model, the bidirectional Helmholtz machine, which guarantees that the top-down and bottom-up distributions can efficiently invert each other. We achieve this by interpreting both the top-down and the bottom-up directed models as approximate inference distributions and by defining the model distribution to be the geometric mean of these two. We present a lower-bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized. This approach results in state of the art generative models which prefer significantly deeper architectures while it allows for orders of magnitude more efficient approximate inference. Moreover, we introduce a deep generative model for semi-supervised learning problems based on BiHM models.
172

Classification automatique pour la compréhension de la parole : vers des systèmes semi-supervisés et auto-évolutifs / Machine learning applied to speech language understanding : towards semi-supervised and self-evolving systems

Gotab, Pierre 04 December 2012 (has links)
La compréhension automatique de la parole est au confluent des deux grands domaines que sont la reconnaissance automatique de la parole et l'apprentissage automatique. Un des problèmes majeurs dans ce domaine est l'obtention d'un corpus de données conséquent afin d'obtenir des modèles statistiques performants. Les corpus de parole pour entraîner des modèles de compréhension nécessitent une intervention humaine importante, notamment dans les tâches de transcription et d'annotation sémantique. Leur coût de production est élevé et c'est la raison pour laquelle ils sont disponibles en quantité limitée.Cette thèse vise principalement à réduire ce besoin d'intervention humaine de deux façons : d'une part en réduisant la quantité de corpus annoté nécessaire à l'obtention d'un modèle grâce à des techniques d'apprentissage semi-supervisé (Self-Training, Co-Training et Active-Learning) ; et d'autre part en tirant parti des réponses de l'utilisateur du système pour améliorer le modèle de compréhension.Ce dernier point touche à un second problème rencontré par les systèmes de compréhension automatique de la parole et adressé par cette thèse : le besoin d'adapter régulièrement leurs modèles aux variations de comportement des utilisateurs ou aux modifications de l'offre de services du système / Two wide research fields named Speech Recognition and Machine Learning meet with the Automatic Speech Language Understanding. One of the main problems in this domain is to obtain a sufficient corpus to train an efficient statistical model. Such speech corpora need a lot of human involvement to transcript and semantically annotate them. Their production cost is therefore quite high and they are difficultly available.This thesis mainly aims at reducing the need of human intervention in two ways: firstly, reducing the amount of corpus needed to build a model thanks to some semi-supervised learning methods (Self-Training, Co-Training and Active-Learning); And lastly, using the answers of the system end-user to improve the comprehension model.This last point addresses another problem related to automatic speech understanding systems: the need to adapt their models to the fluctuation of end-user habits or to the modification of the services list offered by the system
173

Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

Täckström, Oscar January 2013 (has links)
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared to both unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide number of target languages, in the setting where no annotated training data is available in the target language.
174

Agrupamento de dados baseado em comportamento coletivo e auto-organização / Data clustering based on collective behavior and self-organization

Gueleri, Roberto Alves 18 June 2013 (has links)
O aprendizado de máquina consiste de conceitos e técnicas que permitem aos computadores melhorar seu desempenho com a experiência, ou, em outras palavras, aprender com dados. Um dos principais tópicos do aprendizado de máquina é o agrupamento de dados que, como o nome sugere, procura agrupar os dados de acordo com sua similaridade. Apesar de sua definição relativamente simples, o agrupamento é uma tarefa computacionalmente complexa, tornando proibitivo o emprego de algoritmos exaustivos, na busca pela solução ótima do problema. A importância do agrupamento de dados, aliada aos seus desafios, faz desse campo um ambiente de intensa pesquisa. Também a classe de fenômenos naturais conhecida como comportamento coletivo tem despertado muito interesse. Isso decorre da observação de um estado organizado e global que surge espontaneamente das interações locais presentes em grandes grupos de indivíduos, caracterizando, pois, o que se chama auto-organização ou emergência, para ser mais preciso. Os desafios intrínsecos e a relevância do tema vêm motivando sua pesquisa em diversos ramos da ciência e da engenharia. Ao mesmo tempo, técnicas baseadas em comportamento coletivo vêm sendo empregadas em tarefas de aprendizado de máquina, mostrando-se promissoras e ganhando bastante atenção. No presente trabalho, objetivou-se o desenvolvimento de técnicas de agrupamento baseadas em comportamento coletivo. Faz-se cada item do conjunto de dados corresponder a um indivíduo, definem-se as leis de interação local, e então os indivíduos são colocados a interagir entre si, de modo que os padrões que surgem reflitam os padrões originalmente presentes no conjunto de dados. Abordagens baseadas em dinâmica de troca de energia foram propostas. Os dados permanecem fixos em seu espaço de atributos, mas carregam certa informação a energia , a qual é progressivamente trocada entre eles. Os grupos são estabelecidos entre dados que tomam estados de energia semelhantes. Este trabalho abordou também o aprendizado semissupervisionado, cuja tarefa é rotular dados em bases parcialmente rotuladas. Nesse caso, foi adotada uma abordagem baseada na movimentação dos próprios dados pelo espaço de atributos. Procurou-se, durante todo este trabalho, não apenas propor novas técnicas de aprendizado, mas principalmente, por meio de muitas simulações e ilustrações, mostrar como elas se comportam em diferentes cenários, num esforço em mostrar onde reside a vantagem de se utilizar a dinâmica coletiva na concepção dessas técnicas / Machine learning consists of concepts and techniques that enable computers to improve their performance with experience, i.e., enable computers to learn from data. Data clustering (or just clustering) is one of its main topics, which aims to group data according to their similarities. Regardless of its simple definition, clustering is a complex computational task. Its relevance and challenges make this field an environment of intense research. The class of natural phenomena known as collective behavior has also attracted much interest. This is due to the observation that global patterns may spontaneously arise from local interactions among large groups of individuals, what is know as self-organization (or emergence). The challenges and relevance of the subject are encouraging its research in many branches of science and engineering. At the same time, techniques based on collective behavior are being employed in machine learning tasks, showing to be promising. The objective of the present work was to develop clustering techniques based on collective behavior. Each dataset item corresponds to an individual. Once the local interactions are defined, the individuals begin to interact with each other. It is expected that the patterns arising from these interactions match the patterns originally present in the dataset. Approaches based on dynamics of energy exchange have been proposed. The data are kept fixed in their feature space, but they carry some sort of information (the energy), which is progressively exchanged among them. The groups are established among data that take similar energy states. This work has also addressed the semi-supervised learning task, which aims to label data in partially labeled datasets. In this case, it has been proposed an approach based on the motion of the data themselves around the feature space. More than just providing new machine learning techniques, this research has tried to show how the techniques behave in different scenarios, in an effort to show where lies the advantage of using collective dynamics in the design of such techniques
175

O algoritmo de aprendizado semi-supervisionado co-training e sua aplicação na rotulação de documentos / The semi-supervised learning algorithm co-training applied to label text documents

Matsubara, Edson Takashi 26 May 2004 (has links)
Em Aprendizado de Máquina, a abordagem supervisionada normalmente necessita de um número significativo de exemplos de treinamento para a indução de classificadores precisos. Entretanto, a rotulação de dados é freqüentemente realizada manualmente, o que torna esse processo demorado e caro. Por outro lado, exemplos não-rotulados são facilmente obtidos se comparados a exemplos rotulados. Isso é particularmente verdade para tarefas de classificação de textos que envolvem fontes de dados on-line tais como páginas de internet, email e artigos científicos. A classificação de textos tem grande importância dado o grande volume de textos disponível on-line. Aprendizado semi-supervisionado, uma área de pesquisa relativamente nova em Aprendizado de Máquina, representa a junção do aprendizado supervisionado e não-supervisionado, e tem o potencial de reduzir a necessidade de dados rotulados quando somente um pequeno conjunto de exemplos rotulados está disponível. Este trabalho descreve o algoritmo de aprendizado semi-supervisionado co-training, que necessita de duas descrições de cada exemplo. Deve ser observado que as duas descrições necessárias para co-training podem ser facilmente obtidas de documentos textuais por meio de pré-processamento. Neste trabalho, várias extensões do algoritmo co-training foram implementadas. Ainda mais, foi implementado um ambiente computacional para o pré-processamento de textos, denominado PreTexT, com o objetivo de utilizar co-training em problemas de classificação de textos. Os resultados experimentais foram obtidos utilizando três conjuntos de dados. Dois conjuntos de dados estão relacionados com classificação de textos e o outro com classificação de páginas de internet. Os resultados, que variam de excelentes a ruins, mostram que co-training, similarmente a outros algoritmos de aprendizado semi-supervisionado, é afetado de maneira bastante complexa pelos diferentes aspectos na indução dos modelos. / In Machine Learning, the supervised approach usually requires a large number of labeled training examples to learn accurately. However, labeling is often manually performed, making this process costly and time-consuming. By contrast, unlabeled examples are often inexpensive and easier to obtain than labeled examples. This is especially true for text classification tasks involving on-line data sources, such as web pages, email and scientific papers. Text classification is of great practical importance today given the massive volume of online text available. Semi-supervised learning, a relatively new area in Machine Learning, represents a blend of supervised and unsupervised learning, and has the potential of reducing the need of expensive labeled data whenever only a small set of labeled examples is available. This work describes the semi-supervised learning algorithm co-training, which requires a partitioned description of each example into two distinct views. It should be observed that the two different views required by co-training can be easily obtained from textual documents through pre-processing. In this works, several extensions of co-training algorithm have been implemented. Furthermore, we have also implemented a computational environment for text pre-processing, called PreTexT, in order to apply the co-training algorithm to text classification problems. Experimental results using co-training on three data sets are described. Two data sets are related to text classification and the other one to web-page classification. Results, which range from excellent to poor, show that co-training, similarly to other semi-supervised learning algorithms, is affected by modelling assumptions in a rather complicated way.
176

Analyse harmonique sur graphes dirigés et applications : de l'analyse de Fourier aux ondelettes / Harmonic Analysis on directed graphs and applications : From Fourier analysis to wavelets

Sevi, Harry 22 November 2018 (has links)
La recherche menée dans cette thèse a pour but de développer une analyse harmonique pour des fonctions définies sur les sommets d'un graphe orienté. À l'ère du déluge de données, de nombreuses données sont sous forme de graphes et données sur ce graphe. Afin d'analyser d'exploiter ces données de graphes, nous avons besoin de développer des méthodes mathématiques et numériquement efficientes. Ce développement a conduit à l'émergence d'un nouveau cadre théorique appelé le traitement de signal sur graphe dont le but est d'étendre les concepts fondamentaux du traitement de signal classique aux graphes. Inspirées par l'aspect multi échelle des graphes et données sur graphes, de nombreux constructions multi-échelles ont été proposé. Néanmoins, elles s'appliquent uniquement dans le cadre non orienté. L'extension d'une analyse harmonique sur graphe orienté bien que naturelle, s'avère complexe. Nous proposons donc une analyse harmonique en utilisant l'opérateur de marche aléatoire comme point de départ de notre cadre. Premièrement, nous proposons des bases de type Fourier formées des vecteurs propres de l'opérateur de marche aléatoire. De ces bases de Fourier, nous en déterminons une notion fréquentielle en analysant la variation de ses vecteurs propres. La détermination d'une analyse fréquentielle à partir de la base des vecteurs de l'opérateur de marche aléatoire nous amène aux constructions multi-échelles sur graphes orientés. Plus particulièrement, nous proposons une construction en trames d'ondelettes ainsi qu'une construction d'ondelettes décimées sur graphes orientés. Nous illustrons notre analyse harmonique par divers exemples afin d'en montrer l'efficience et la pertinence. / The research conducted in this thesis aims to develop a harmonic analysis for functions defined on the vertices of an oriented graph. In the era of data deluge, much data is in the form of graphs and data on this graph. In order to analyze and exploit this graph data, we need to develop mathematical and numerically efficient methods. This development has led to the emergence of a new theoretical framework called signal processing on graphs, which aims to extend the fundamental concepts of conventional signal processing to graphs. Inspired by the multi-scale aspect of graphs and graph data, many multi-scale constructions have been proposed. However, they apply only to the non-directed framework. The extension of a harmonic analysis on an oriented graph, although natural, is complex. We, therefore, propose a harmonic analysis using the random walk operator as the starting point for our framework. First, we propose Fourier-type bases formed by the eigenvectors of the random walk operator. From these Fourier bases, we determine a frequency notion by analyzing the variation of its eigenvectors. The determination of a frequency analysis from the basis of the vectors of the random walk operator leads us to multi-scale constructions on oriented graphs. More specifically, we propose a wavelet frame construction as well as a decimated wavelet construction on directed graphs. We illustrate our harmonic analysis with various examples to show its efficiency and relevance.
177

Aprendizado semissupervisionado multidescrição em classificação de textos / Multi-view semi-supervised learning in text classification

Braga, Ígor Assis 23 April 2010 (has links)
Algoritmos de aprendizado semissupervisionado aprendem a partir de uma combinação de dados rotulados e não rotulados. Assim, eles podem ser aplicados em domínios em que poucos exemplos rotulados e uma vasta quantidade de exemplos não rotulados estão disponíveis. Além disso, os algoritmos semissupervisionados podem atingir um desempenho superior aos algoritmos supervisionados treinados nos mesmos poucos exemplos rotulados. Uma poderosa abordagem ao aprendizado semissupervisionado, denominada aprendizado multidescrição, pode ser usada sempre que os exemplos de treinamento são descritos por dois ou mais conjuntos de atributos disjuntos. A classificação de textos é um domínio de aplicação no qual algoritmos semissupervisionados vêm obtendo sucesso. No entanto, o aprendizado semissupervisionado multidescrição ainda não foi bem explorado nesse domínio dadas as diversas maneiras possíveis de se descrever bases de textos. O objetivo neste trabalho é analisar o desempenho de algoritmos semissupervisionados multidescrição na classificação de textos, usando unigramas e bigramas para compor duas descrições distintas de documentos textuais. Assim, é considerado inicialmente o difundido algoritmo multidescrição CO-TRAINING, para o qual são propostas modificações a fim de se tratar o problema dos pontos de contenção. É também proposto o algoritmo COAL, o qual pode melhorar ainda mais o algoritmo CO-TRAINING pela incorporação de aprendizado ativo como uma maneira de tratar pontos de contenção. Uma ampla avaliação experimental desses algoritmos foi conduzida em bases de textos reais. Os resultados mostram que o algoritmo COAL, usando unigramas como uma descrição das bases textuais e bigramas como uma outra descrição, atinge um desempenho significativamente melhor que um algoritmo semissupervisionado monodescrição. Levando em consideração os bons resultados obtidos por COAL, conclui-se que o uso de unigramas e bigramas como duas descrições distintas de bases de textos pode ser bastante compensador / Semi-supervised learning algorithms learn from a combination of both labeled and unlabeled data. Thus, they can be applied in domains where few labeled examples and a vast amount of unlabeled examples are available. Furthermore, semi-supervised learning algorithms may achieve a better performance than supervised learning algorithms trained on the same few labeled examples. A powerful approach to semi-supervised learning, called multi-view learning, can be used whenever the training examples are described by two or more disjoint sets of attributes. Text classification is a domain in which semi-supervised learning algorithms have shown some success. However, multi-view semi-supervised learning has not yet been well explored in this domain despite the possibility of describing textual documents in a myriad of ways. The aim of this work is to analyze the effectiveness of multi-view semi-supervised learning in text classification using unigrams and bigrams as two distinct descriptions of text documents. To this end, we initially consider the widely adopted CO-TRAINING multi-view algorithm and propose some modifications to it in order to deal with the problem of contention points. We also propose the COAL algorithm, which further improves CO-TRAINING by incorporating active learning as a way of dealing with contention points. A thorough experimental evaluation of these algorithms was conducted on real text data sets. The results show that the COAL algorithm, using unigrams as one description of text documents and bigrams as another description, achieves significantly better performance than a single-view semi-supervised algorithm. Taking into account the good results obtained by COAL, we conclude that the use of unigrams and bigrams as two distinct descriptions of text documents can be very effective
178

Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

Täckström, Oscar January 2013 (has links)
Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared to both unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the hitherto best published results for a wide number of target languages, in the setting where no annotated training data is available in the target language.
179

Μελέτη και σχεδίαση συστήματος ανάλυσης εικόνας κατατμημένου σπερματικού DNA με χρήση τεχνικών υπολογιστικής νοημοσύνης / Study and design of an image analysis system for sperm DNA fragmentation using computational intelligence techniques

Αλμπάνη, Ελένη 13 July 2010 (has links)
Ιατρικές έρευνες έχουν δείξει ότι η ανδρική υπογονιμότητα σχετίζεται άμεσα με την ύπαρξη κατατμημένου DNA στον πυρήνα των σπερματοζωαρίων. Οι διαταραχές στις τιμές της συγκέντρωσης σπερματοζωαρίων, της κινητικότητάς τους, του όγκου της εκσπερμάτισης και στη μορφολογία τους που παρατηρούνται σε ένα σπερμοδιάγραμμα έχουν σα βαθύτερο αίτιο την ύπαρξη κατατμημένου DNA. Το εργαστήριο πειραματικής εμβρυολογίας και ιστολογίας της Ιατρικής Αθηνών χρησιμοποιεί τη μέθοδο TUNEL (deoxynucleotidyl transferase-mediated dUTP nick end labeling) για να σηματοδοτήσει τα άκρα κάθε τμήματος του DNA με χρώμα διαφορετικό από αυτό που χρησιμοποιεί για το υπόλοιπο τμήμα του DNA. Αποτέλεσμα της επεξεργασίας που υφίστανται τα σπερματοζωάρια σε μια αντικειμενοφόρο πλάκα είναι ένα σύνολο από μπλε φθορίζοντα σπερματοζωάρια με πιθανό κόκκινο στο πυρήνα τους, στην περίπτωση που υπάρχει κατατμημένο DNA. Όσο μεγαλύτερος είναι ο βαθμός κατάτμησης, τόσο περισσότερο είναι το κόκκινο και τόσο περισσότερο παθολογικό το σπερματοζωάριο και άρα λιγότερο ικανό να γονιμοποιήσει. Τη διαδικασία της TUNEL ακολουθεί η φωτογράφηση της αντικειμενοφόρου πλάκας με κάμερα υψηλής ανάλυσης και μεγάλης ευαισθησίας, ειδική για εφαρμογές φθορισμού. Στη συνέχεια, οι εικόνες επεξεργάζονται με ειδικό λογισμικό, όπως έχει προταθεί στο «Automatic Analysis of TUNEL assay Microscope Images» από τους Kontaxakis et al. στο 2007 IEEE International Symposium on Signal Processing and Information Technology. Το αποτέλεσμα της επεξεργασίας των εικόνων είναι η ταξινόμηση των αντικειμένων που απεικονίζονται σε ομάδες από α) σπερματοζωάρια μονήρη β) επικαλυπτόμενα και γ) «σκουπίδια» όπως λευκοκύτταρα ή θραύσματα σπερματοζωαρίων. Στη συνέχεια για κάθε μονήρες σπερματοζωάριο γίνεται ο υπολογισμός των κόκκινων και μπλε pixels. Κατ’ αυτό τον τρόπο έχουμε ποσοτικοποιημένη την έκταση του κερματισμού κάθε σπερματοζωαρίου. Στόχος της διπλωματικής εργασίας είναι αρχικά η μελέτη και στη συνέχεια η σχεδίαση και υλοποίηση ενός συστήματος, το οποίο λαμβάνοντας υπόψη τα δεδομένα από την επεξεργασία εικόνας καθώς και δεδομένα που είναι γνωστά από το σπερμοδιάγραμμα, όπως η κινητικότητα και η συγκέντρωση των σπερματοζωαριών, χρησιμοποιώντας τεχνικές της υπολογιστικής νοημοσύνης θα εκπαιδεύεται και θα ταξινομεί αυτόματα ασθενείς ανάλογα με το συνολικό βαθμό κερματισμού του DNA τους. Τέλος, θα υπολογίζει και ένα κατώφλι ή μία περιοχή τιμών άνω της οποίας ένας ασθενής θα χαρακτηρίζεται ως στείρος. Απώτερος στόχος είναι να γίνει όλη η παραπάνω διαδικασία ένας έλεγχος ρουτίνας για τα εργαστήρια που ασχολούνται με την ανδρική υπογονιμότητα και την τεχνητή γονιμοποίηση, προφυλάσσοντας ζευγάρια από άσκοπες και επιβλαβείς για την υγεία της γυναίκας προσπάθειες τεχνητής γονιμοποίησης. / Studies have proven that male infertility is directly connected with the existence of fragmented DNA in sperm nucleus Structural disorders and functional abnormalities are often present in spermatozoa from infertile men, as they are the impact of DNA fragmentation. The histology and embryology laboratory in Medical School in Athens uses the TUNEL assay to mark the edges of DNA helix with color different from the rest of the helix. The result of this procedure is that the human spermatozoa are blue and in the interior of every cell, an area proportional to the degree of the cell DNA fragmentation has been stained in reddish color. The more reddish the area is, the more fragmented the DNA is and the more infertile the patient is. The TUNEL assay is followed by image collection using a camera of high sensitivity appropriate for fluorescence applications. Afterwards, the obtained images are processed as described in “Automatic Analysis of TUNEL assay Microscope Images” at IEEE International Symposium on Signal Processing and Information Technology in 2007. The results of the processing above is image segmentation, shapes classification in 3 groups, solitary spermatozoa, overlapped spermatozoa and debris and at last the area measurement of red pixel for each solitary spermatozoon. This way, we have in numbers how much fragmented the DNA is. This master thesis aims at the study and the design of a system, that taking into consideration the data from the image analysis accompanied by the data from the basic sperm analysis, like sperm concentration and motility, and using computational intelligence techniques, it will be trained and will automatically classify the patients according their DNA fragmentation degree. In the end, it will estimate a threshold or an area of values above which a patient will be considered as infertile. Our ultimate goal is the above procedure to be a routine for the labs that are dealing with male infertility and artificial insemination, so that couples are protected against pointless and prejudicial artificial insemination attempts.
180

Classifying Hate Speech using Fine-tuned Language Models

Brorson, Erik January 2018 (has links)
Given the explosion in the size of social media, the amount of hate speech is also growing. To efficiently combat this issue we need reliable and scalable machine learning models. Current solutions rely on crowdsourced datasets that are limited in size, or using training data from self-identified hateful communities, that lacks specificity. In this thesis we introduce a novel semi-supervised modelling strategy. It is first trained on the freely available data from the hateful communities and then fine-tuned to classify hateful tweets from crowdsourced annotated datasets. We show that our model reach state of the art performance with minimal hyper-parameter tuning.

Page generated in 0.117 seconds