481

Novel document representations based on labels and sequential information

Kim, Seungyeon 21 September 2015 (has links)
A wide variety of text analysis applications are based on statistical machine learning techniques. The success of those applications is critically affected by how we represent a document. Learning an efficient document representation faces two major challenges: sparsity and sequentiality. Sparsity often causes high estimation error, and the sequential nature of text, the interdependency between words, complicates matters further. This thesis presents novel document representations that address both challenges. First, I employ label characteristics to estimate a compact document representation. Because label attributes implicitly describe the geometry of the dense subspace that matters, I can effectively resolve the sparsity issue while focusing only on that compact subspace. Second, by modeling a document as a joint or conditional distribution over words and their sequential information, I can efficiently reflect the sequential nature of text in my document representations. Lastly, the thesis concludes with a document representation that employs both labels and sequential information in a unified formulation. Four criteria are used to evaluate the goodness of a representation: how close it is to the original data, how strongly representations can be distinguished from one another, how easily a human can interpret it, and how much computation it requires. Pursuing these criteria, I obtained document representations that are closer to the original data, stronger in discrimination, and easier to understand than traditional document representations. Efficient computational algorithms make the proposed approaches highly scalable. The thesis examines emotion prediction, temporal emotion analysis, modeling documents with edit histories, locally coherent topic modeling, and text categorization as possible applications.
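As a generic illustration of the second idea above (folding sequential information into a document representation), and not the specific models developed in the thesis, the sketch below represents a document as an empirical joint distribution over (word, next-word) pairs; the toy document is invented.

```python
# Generic illustration: a document as an empirical distribution over
# (word, next-word) pairs, one simple way to capture sequential information.
from collections import Counter

def bigram_distribution(text):
    tokens = text.lower().split()
    pairs = list(zip(tokens, tokens[1:]))
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

doc = "the cat sat on the mat and the cat slept"
rep = bigram_distribution(doc)
for pair, p in sorted(rep.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(p, 3))
```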
482

Answering complex questions : supervised approaches

Sadid-Al-Hasan, Sheikh, University of Lethbridge. Faculty of Arts and Science January 2009 (has links)
The term “Google” has become a verb for most of us. Search engines, however, have certain limitations: ask one, for example, about the impact of the current global financial crisis in different parts of the world, and you can expect to sift through thousands of results for the answer. This motivates research in complex question answering, where the purpose is to create summaries of large volumes of information as answers to complex questions, rather than simply offering a list of sources. Unlike simple questions, complex questions cannot be answered easily, as they often require inferencing and synthesizing information from multiple documents. Hence, this task is typically accomplished by query-focused multi-document summarization systems. In this thesis we apply different supervised learning techniques to the complex question answering problem. To run our experiments, we consider the DUC-2007 main task. A large amount of labeled data is a prerequisite for supervised training, and labeling is expensive and time consuming when performed manually by humans; automatic labeling can be a good remedy to this problem. We employ five different automatic annotation techniques to build extracts from human abstracts, using ROUGE, Basic Element (BE) overlap, a syntactic similarity measure, a semantic similarity measure and the Extended String Subsequence Kernel (ESSK). The representative supervised methods we use are Support Vector Machines (SVM), Conditional Random Fields (CRF), Hidden Markov Models (HMM) and Maximum Entropy (MaxEnt). We annotate DUC-2006 data and use it to train our systems, whereas 25 topics of the DUC-2007 data set are used as test data. The evaluation results reveal the impact of the automatic labeling methods on the performance of the supervised approaches to complex question answering. We also experiment with two ensemble-based approaches that show promising results for this problem domain. / x, 108 leaves : ill. ; 29 cm
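As a rough illustration of the automatic-labeling idea described above, the sketch below marks sentences as extract-worthy using a simplified unigram-overlap score against a human abstract (a stand-in for the ROUGE-based annotation, not the official toolkit used in the thesis) and trains an SVM on the resulting labels; the sentences, abstract and threshold are invented for illustration.

```python
# Hypothetical sketch: auto-label sentences by overlap with a human abstract,
# then train a binary "include in extract" SVM on those labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def unigram_recall(sentence, abstract):
    """Fraction of abstract unigrams covered by the sentence (ROUGE-1-like)."""
    sent = set(sentence.lower().split())
    ref = set(abstract.lower().split())
    return len(sent & ref) / max(len(ref), 1)

def auto_label(sentences, abstract, threshold=0.2):
    """Label a sentence 1 (summary-worthy) if its overlap exceeds the threshold."""
    return [1 if unigram_recall(s, abstract) >= threshold else 0 for s in sentences]

# Toy data standing in for DUC-style documents and human abstracts.
sentences = ["The financial crisis hit Europe hard.",
             "Banks in Asia reported heavy losses.",
             "The weather was mild that year."]
abstract = "The global financial crisis affected banks in Europe and Asia."

labels = auto_label(sentences, abstract)          # e.g. [1, 1, 0]
X = TfidfVectorizer().fit_transform(sentences)
clf = LinearSVC().fit(X, labels)                  # SVM sentence-extraction model
print(labels, clf.predict(X).tolist())
```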
483

Concept Based Knowledge Discovery from Biomedical Literature.

Radovanovic, Aleksandar. January 2009 (has links)
This thesis describes and introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with the researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
484

Active Learning : an unbiased approach

Ribeiro de Mello, Carlos Eduardo 04 June 2013 (has links) (PDF)
Active Learning arises as an important issue in several supervised learning scenarios where obtaining data is cheap but labeling is costly. In general, it consists of a query strategy: a greedy heuristic, based on some selection criterion, that searches for the potentially most informative observations to label in order to form a training set. A query strategy is therefore a biased sampling procedure, since it systematically favors some observations and generates biased training sets instead of making independent and identically distributed draws. The main hypothesis of this thesis lies in reducing the bias inherited from the selection criterion. The general proposal consists of selecting the minimal training set from which the estimated probability distribution is as close as possible to the underlying distribution of all observations. To that end, a novel general active learning query strategy has been developed using an information-theoretic framework. Several experiments have been performed in order to evaluate the performance of the proposed strategy. The obtained results confirm the hypothesis about the bias, showing that the proposal outperforms the baselines on different datasets.
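The sketch below is only a loose illustration of this idea, not the thesis's actual algorithm: a pool-based strategy that greedily queries the point whose addition makes the histogram of the labeled set closest, in KL divergence, to the histogram of the full pool, thereby limiting the sampling bias discussed above. The synthetic pool, bin count and label budget are arbitrary choices.

```python
# Illustrative pool-based query strategy: pick the unlabeled point whose
# addition minimizes KL divergence between the labeled-set distribution
# and the pool distribution (distribution-matching, not uncertainty sampling).
import numpy as np

def kl(p, q, eps=1e-9):
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def hist(x, bins):
    h, _ = np.histogram(x, bins=bins)
    return h / h.sum()

rng = np.random.default_rng(0)
pool = rng.normal(size=500)                      # unlabeled pool (1-D for simplicity)
bins = np.linspace(pool.min(), pool.max(), 11)
target = hist(pool, bins)                        # distribution we try to match

labeled_idx = []
for _ in range(20):                              # label budget of 20 queries
    best, best_div = None, np.inf
    for i in range(len(pool)):
        if i in labeled_idx:
            continue
        cand = pool[labeled_idx + [i]]
        d = kl(hist(cand, bins), target)
        if d < best_div:
            best, best_div = i, d
    labeled_idx.append(best)

print("queried indices:", labeled_idx)
print("final divergence:", best_div)
```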
485

Integrating Field and Remotely Sensed Data for Assessment of Coral Reef and Seagrass Habitats

Chris Roelfsema Unknown Date (has links)
Coral reef habitats are being threatened by global warming, natural disasters and the increased pressure of the global population. These habitats are in urgent need of efficient monitoring and management programs to sustain their biological, economic and cultural values for the global community. Habitat maps, describing the extent, composition and condition of the benthos in time and space, form a valuable information source for scientists and managers seeking to answer their management questions. Adequate and accurate habitat maps are needed and can be provided by a range of mapping approaches based on the integration of field and remotely sensed image data sets. Scientists, technicians and managers lack knowledge of the cost effectiveness of, and the procedures for calibrating and validating, mapping approaches that integrate field data and remote sensing imagery for use in various coral reef and seagrass environments. This knowledge is required to adequately design, apply and assess operational mapping approaches and their maps. Hence, the aim of this study is to improve habitat mapping capabilities by integrating low-cost remote sensing approaches with field calibration and validation methods for a range of coral reef and seagrass environments. To achieve this aim, commonly used habitat mapping approaches that integrate field calibration and validation methods with image-based processing techniques were studied in different coral reef and seagrass environments in Fiji and Australia. These environments varied in water clarity, water depth, benthic composition, spatial complexity of benthic features, and remoteness. The study had three objectives: (1) evaluate the accuracy, cost and perceived relevance of eight commonly used benthic cover mapping approaches for three different coral reef environments; (2) conduct a cost-benefit comparison of two field survey methods for calibrating and validating maps of coral reef benthos derived from high-spatial-resolution satellite images in three different coral reef environments; and (3) identify considerations for comparing the thematic accuracy of multi-use image-based habitat maps in various coral reef and seagrass environments. A scientific assessment, together with an evaluation of relevance for managers, was conducted on eight commonly used habitat mapping approaches for three different coral reef environments. This analysis revealed a preference for a mapping approach based on supervised classification of Quickbird imagery integrated with basic field data. This approach produced an accurate map within a short time and at low cost, in a form that suited the user's purpose. Additionally, the results indicated that user preference in selecting a suitable map was affected by variations in environmental complexity, map purpose, and resource management requirements. To assess the variation in performance of calibration and validation methods for coral reef benthic community maps derived from high-spatial-resolution satellite images, a comparison was conducted between spot-check and georeferenced photo-transect based mapping approaches. The assessment found that the transect-based method was a robust procedure that could be used in a range of coral reef environments to map benthic communities accurately. In contrast, the spot-check method is a fast and low-cost approach suitable for mapping benthic communities with lower spatial complexity.
However, the spot-check approach provides robust results if it is applied in a standardised manner, providing a description of selected homogeneous areas with georeferenced benthic cover photos. Considerations for comparing the thematic accuracy of multi-use image-based habitat maps in various coral reef and seagrass environments were then assessed. This included a review of 80 scientific publications on coral reef and seagrass habitat mapping, which revealed a lack of knowledge and reporting with regard to the assessment of thematic map accuracy. The thematic accuracy measures commonly used in these publications, and the factors controlling their variation, were then determined for various habitat mapping approaches in different coral reef and seagrass environments. Assessment of these measures found that variations in accuracy levels were not only a result of actual differences in map accuracy, but were also due to the spatial complexity of benthic features present in the study area, the distribution of the calibration and validation samples relative to each other, and the level of detail provided by these samples. Two main outcomes resulted from this dissertation. The first was the development of a robust mapping approach, based on the georeferenced photo-transect method integrated with high-spatial-resolution imagery, which is able to accurately map a variety of coral reef and seagrass habitats. The second outcome is an increase in capacity for coral reef and seagrass habitat mapping by scientists and managers. This increase is accomplished by providing knowledge on various habitat mapping approaches with regard to their cost and time, accuracy and user relevance; the performance of calibration and validation field methods; and the performance of accuracy measures when applied in a range of coral reef and seagrass environments. The findings and outcomes from this dissertation will contribute significantly to the management of coral reef and seagrass environments by enabling scientists and managers to choose appropriate combinations of field and image data sources, processing approaches, and validation methods for habitat mapping in these environments.
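As a minimal illustration of the thematic accuracy assessment discussed above, the sketch below builds an error (confusion) matrix from map labels and field-validation labels at the same points and reports overall accuracy; the class names and points are invented, not data from the study.

```python
# Minimal thematic accuracy sketch: error matrix and overall accuracy
# computed from co-located map and validation labels (illustrative data).
import numpy as np

classes = ["coral", "seagrass", "sand", "rubble"]
mapped    = ["coral", "coral", "sand", "seagrass", "rubble", "sand", "coral"]
validated = ["coral", "sand",  "sand", "seagrass", "rubble", "sand", "seagrass"]

idx = {c: i for i, c in enumerate(classes)}
matrix = np.zeros((len(classes), len(classes)), dtype=int)
for m, v in zip(mapped, validated):
    matrix[idx[v], idx[m]] += 1          # rows: reference (field), columns: map

overall_accuracy = np.trace(matrix) / matrix.sum()
print(matrix)
print(f"overall accuracy: {overall_accuracy:.2f}")
```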
486

Sketch Classification with Neural Networks : A Comparative Study of CNN and RNN on the Quick, Draw! data set

Andersson, Melanie, Arvola, Maja, Hedar, Sara January 2018 (has links)
The aim of the study is to apply and compare the performance of two different types of neural networks on the Quick, Draw! data set and, from this, determine whether interpreting the sketches as sequences gives higher accuracy than interpreting them as pixels. The two types of networks constructed were a recurrent neural network (RNN) and a convolutional neural network (CNN). The networks were optimised and the final architectures included five layers. The final evaluation accuracies achieved were 94.2% and 92.3% respectively, leading to the conclusion that the sequential interpretation of the Quick, Draw! data set is favourable.
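For orientation, the sketch below shows one plausible way such a comparison could be set up in Keras: a small CNN over rasterised 28x28 sketches and a small LSTM-based RNN over stroke sequences. The layer sizes, class count and input encodings are assumptions for illustration, not the architectures evaluated in the study.

```python
# Illustrative pixel-based CNN vs. stroke-sequence RNN (assumed architectures).
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 10          # assumption: a 10-category subset of Quick, Draw!

# CNN over 28x28 rasterised sketches.
cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(num_classes, activation="softmax"),
])

# RNN over variable-length stroke sequences (dx, dy, pen-lift per step).
rnn = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 3)),
    layers.Masking(),
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(128),
    layers.Dense(num_classes, activation="softmax"),
])

for model in (cnn, rnn):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
```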
487

Improving search results with machine learning : Classifying multi-source data with supervised machine learning to improve search results

Stakovska, Meri January 2018 (has links)
Sony’s Support Application team wanted an experiment conducted to determine whether Machine Learning was suitable for improving the quantity and quality of the search results of the in-application search tool. By improving the quantity and quality of the results, the team wanted to improve the customer’s journey. A supervised machine learning model was created to classify articles into four categories: Wi-Fi & Connectivity, Apps & Settings, System & Performance, and Battery Power & Charging. The same model was used to create a service that categorized search terms into one of the four categories. The classified articles and the classified search terms were used to complement the existing search tool. The baseline for the experiment was the output of the search tool without classification. The results of the experiment show that the number of articles returned did indeed increase, but, mainly due to the broadness of the categories, the search results were of low quality.
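A minimal sketch of this kind of four-category text classifier is shown below, using a TF-IDF representation and logistic regression; the training snippets are invented placeholders, and the actual model, features and data used in the experiment are not specified here.

```python
# Hedged sketch: a four-category support-article / search-term classifier.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "phone will not connect to the wifi network",
    "how do I change notification settings for an app",
    "device runs slowly after the latest system update",
    "battery drains quickly and charging takes hours",
]
labels = [
    "Wi-Fi & Connectivity",
    "Apps & Settings",
    "System & Performance",
    "Battery Power & Charging",
]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# The same model can categorise incoming search terms.
print(model.predict(["screen freezes and apps crash"]))
```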
488

Oriented filters for feature extraction in digital Images : Application to corners detection, Contours evaluation and color Steganalysis / Filtres orientés pour l'extraction de primitives dans les images : Application à la détection de coins, l'évaluation de contours, et à la stéganalyse d'images couleur

Abdulrahman, Hasan 17 November 2017 (has links)
L’interprétation du contenu de l’image est un objectif très important dans le traitement de l’image et la vision par ordinateur. Par conséquent, plusieurs chercheurs y sont intéressés. Une image contient des informations multiples qui peuvent être étudiées, telles que la couleur, les formes, les arêtes, les angles, la taille et l’orientation. En outre, les contours contiennent les structures les plus importantes de l’image. Afin d’extraire les caractéristiques du contour d’un objet, nous devons détecter les bords de cet objet. La détection de bords est un point clé dans plusieurs applications, telles que : la restauration, l’amélioration de l’image, la stéganographie, le filigrane, la récupération, la reconnaissance et la compression de l’image, etc. Toutefois, l’évaluation de la performance de la méthode de détection de bords reste un grand défi. Les images numériques sont parfois modifiées par une procédure légale ou illégale afin d’envoyer des données secrètes ou spéciales. Afin d’être moins visibles, la plupart des méthodes stéganographiques modifient les valeurs de pixels dans les bords/textures de parties de l’image. Par conséquent, il est important de détecter la présence de données cachées dans les images numériques. Cette thèse est divisée principalement en deux parties. La première partie discute l’évaluation des méthodes de détection des bords du filtrage, des contours et des angles. En effet, cinq contributions sont présentées dans cette partie : d’abord, nous avons proposé une nouvelle mesure normalisée et supervisée de la qualité des cartes de bords. En second lieu, nous avons proposé une nouvelle technique pour évaluer les méthodes de détection des bords de filtrage impliquant le score minimal des mesures considérées. En plus, nous avons construit une nouvelle vérité terrain de la carte de bords étiquetée d’une manière semi-automatique pour des images réelles. En troisième lieu, nous avons proposé une nouvelle mesure prenant en compte les distances de faux points positifs pour évaluer un détecteur de bords d’une manière objective. Enfin, nous avons proposé une nouvelle approche de détection de bords qui combine la dérivée directionnelle et l’homogénéité des grains. Notre approche proposée est plus stable et robuste au bruit que dix autres méthodes célèbres de détection. La seconde partie discute la stéganalyse de l’image en couleurs, basée sur l’apprentissage automatique (machine learning). En effet, trois contributions sont présentées dans cette partie : d’abord, nous avons proposé une nouvelle méthode de stéganalyse de l’image en couleurs, basée sur l’extraction de caractéristiques de couleurs à partir de corrélations entre les gradients des canaux rouge, vert et bleu. En fait, ces caractéristiques donnent le cosinus des angles entre les gradients. En second lieu, nous avons proposé une nouvelle méthode de stéganalyse de l’image en couleurs, basée sur des mesures géométriques obtenues par le sinus et le cosinus des angles de gradients entre tous les canaux de couleurs. Enfin, nous avons proposé une nouvelle méthode de stéganalyse de l’image en couleurs, basée sur une banque de filtres gaussiens orientables. Les trois méthodes proposées présentent des résultats intéressants et prometteurs en devançant l’état de l’art de la stéganalyse en couleurs. / Interpretation of image content is a very important objective in image processing and computer vision, and it has therefore received much attention from researchers.
An image contains a lot of information that can be studied, such as color, shapes, edges, corners, size and orientation. Moreover, contours include the most important structures in the image. In order to extract the contour features of an object, we must detect the edges of that object. Edge detection remains a key point and a very important step in a wide range of applications such as image restoration, enhancement, steganography, watermarking, image retrieval, recognition and compression. An efficient boundary detection method should create a contour image containing edges at their correct locations with a minimum of misclassified pixels. However, the performance evaluation of edge detection results is still a challenging problem. Digital images are sometimes modified, legally or illegally, in order to send special or secret data; these changes slightly modify coefficient values of the image. In order to be less visible, most steganography methods modify the pixel values in the edge/texture areas of the image. Therefore, it is important to detect the presence of hidden data in digital images. This thesis is divided into two main parts. The first part deals with filtering edge detection, contour evaluation and corner detection methods. Five contributions are presented in this part. First, we propose a new normalized supervised edge-map quality measure; the normalization strategy means that a score close to 0 corresponds to a good edge map, whereas a score of 1 indicates a poor segmentation. Second, we propose a new technique to evaluate filtering edge detection methods involving the minimum score of the considered measures; moreover, we build a new ground-truth edge map, labelled in a semi-automatic way, for real images. Third, we propose a new measure that takes into account the distances of false positive points in order to evaluate an edge detector objectively. Finally, we propose a new approach for corner detection based on the combination of directional derivative and homogeneity kernels; the proposed approach remains more stable and robust to noise than ten well-known corner detection methods. The second part deals with color image steganalysis based on machine learning classification. Three contributions are presented in this part. First, we propose a new color image steganalysis method based on extracting color features from correlations between the gradients of the red, green and blue channels; these features give the cosines of the angles between gradients. Second, we propose a new color steganalysis method based on geometric measures obtained from the sine and cosine of the gradient angles between all the color channels. Finally, we propose a new approach for color image steganalysis based on a bank of steerable Gaussian filters. All three proposed methods provide interesting and promising results, outperforming the state of the art in color image steganalysis.
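As a rough sketch of the kind of gradient-angle colour features described above (not the exact feature set of the thesis), the code below computes per-pixel cosines of the angles between the R, G and B channel gradients and pools them into histogram features; the random image simply stands in for a real cover or stego image.

```python
# Illustrative colour-channel gradient-angle features for steganalysis.
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((64, 64, 3))                       # H x W x RGB placeholder image

def channel_gradients(channel):
    gy, gx = np.gradient(channel)
    return np.stack([gx, gy], axis=-1)              # per-pixel 2-D gradient vector

grads = [channel_gradients(img[..., c]) for c in range(3)]

def cos_angle(g1, g2, eps=1e-12):
    dot = (g1 * g2).sum(axis=-1)
    norm = np.linalg.norm(g1, axis=-1) * np.linalg.norm(g2, axis=-1)
    return dot / (norm + eps)

# One histogram of cosines per channel pair (R-G, R-B, G-B) as a feature vector.
pairs = [(0, 1), (0, 2), (1, 2)]
features = np.concatenate([
    np.histogram(cos_angle(grads[a], grads[b]), bins=16, range=(-1, 1))[0]
    for a, b in pairs
])
print(features.shape)   # (48,) feature vector, ready for a classifier
```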
489

Estágio supervisionado de Ciências Biológicas : aproximações entre o legal e o real /

Sposito, Neusa Elisa Carignato. January 2009 (has links)
Orientador: Ana Maria de Andrade Caldeira / Banca: Álvaro Lorencini Júnior / Banca: Nelson Antonio Pirola / Banca: Celi Rodrigues Chaves Dominguez / Banca: Renato Eugênio da Silva Diniz / Resumo: Este estudo trata do estágio supervisionado realizado por dezenove licenciandos do Curso de Licenciatura de Ciências Biológicas, da UNESP - Bauru, no segundo semestre de 2006 em três diferentes Escolas de Educação Básica, públicas. O objetivo da pesquisa foi verificar se a efetivação do estágio supervisionado poderia ocorrer em atendimento às determinações das Diretrizes Curriculares Nacionais para a Formação de Professores (DCNs) no que se refere à parceria entre as instituições formadoras: a Universidade e a Escola de Educação Básica. Essa legislação propõe um acordo preliminar entre as diferentes instituições de ensino e, em decorrência desse acordo, a elaboração de um projeto de estágio com ações de mútua colaboração entre elas. Há necessidade de se pesquisar sobre esse assunto, em especial na Licenciatura em Ciências Biológicas, diante do pouco que foi publicado até agora, em virtude de a lei estar em fase de implementação, por exemplo. A presente pesquisa teve três etapas: a preliminar (momento da apresentação da pesquisadora às duas diretoras e uma vice-diretora das três escolas de educação básica), a intermediária (etapa de realização do estágio supervisionado propriamente dito, envolvendo os licenciandos, as professoras, os alunos das escolas públicas, as diretoras, vice-diretora e coordenador pedagógico) e a final (período de encerramento, com uma avaliação escrita feita pelos licenciandos sobre o estágio realizado). Trata-se de pesquisa qualitativa que se assemelha ao estudo de caso e está fundamentada em uma das modalidades da investigação qualitativa apresentada por Flick (2004), que se refere à construção e à compreensão de textos a partir da experiência realizada, ou seja, da pesquisa. Os instrumentos de coleta de dados foram: gravações, relatórios, anotações, questionários. Além das DCNs, os autores que sustentam esse trabalho são:... (Resumo completo, clicar acesso eletrônico abaixo) / Abstract: This study deals with the supervised student teaching carried out by nineteen students of the Biological Sciences teaching degree at UNESP - Bauru, in the second semester of 2006, in three different public Schools of Basic Education. The objective of the research was to verify whether the supervised student teaching could be carried out in accordance with the determinations of the National Curricular Directives for the Graduation of Teachers (DCNs) with regard to the partnership between the training institutions: the University and the School of Basic Education. This legislation proposes a preliminary agreement between the different teaching institutions and, as a result of this agreement, the elaboration of a student teaching project with cooperative actions between them. There is a need for research on this subject, especially in the Biological Sciences teaching degree, given how little has been published so far, since the law is still in its implementation phase, for example.
The present research had three stages: the preliminary stage (the introduction of the researcher to the two principals and one vice-principal of the three schools of basic education, centered on the partnership between institutions), the intermediary stage (carrying out the supervised student teaching itself, involving the student teachers, the teachers, the students of the public schools, the principals, the vice-principal and the pedagogical counselor) and the final stage (the closing period, with a written evaluation made by the student teachers about the student teaching accomplished). It is qualitative research, similar to a case study, and is based upon one of the forms of qualitative investigation presented by Flick (2004), which refers to the construction and comprehension of texts from the experience carried out, that is, the research. The instruments of data collection were records, reports... (Complete abstract click electronic access below) / Doutor
490

Traitement des dossiers refusés dans le processus d'octroi de crédit aux particuliers. / Reject inference in the process for granting credit.

Guizani, Asma 19 March 2014 (has links)
Le credit scoring est généralement considéré comme une méthode d’évaluation du niveau du risque associé à un dossier de crédit potentiel. Cette méthode implique l’utilisation de différentes techniques statistiques pour aboutir à un modèle de scoring basé sur les caractéristiques du client. Le modèle de scoring estime le risque de crédit en prévoyant la solvabilité du demandeur de crédit. Les institutions financières utilisent ce modèle pour estimer la probabilité de défaut qui va être utilisée pour affecter chaque client à la catégorie qui lui correspond le mieux : bon payeur ou mauvais payeur. Les seules données disponibles pour construire le modèle de scoring sont les dossiers acceptés dont la variable à prédire est connue. Ce modèle ne tient pas compte des demandeurs de crédit rejetés dès le départ, ce qui implique qu’on ne pourra pas estimer leurs probabilités de défaut, ce qui engendre un biais de sélection causé par la non-représentativité de l’échantillon. Nous essayons dans ce travail, en utilisant l’inférence des refusés, de remédier à ce biais par la réintégration des dossiers refusés dans le processus d’octroi de crédit. Nous utilisons et comparons différentes méthodes de traitement des refusés, classiques et semi-supervisées, nous en adaptons certaines à notre problème et montrons sur un jeu de données réel, en utilisant les courbes ROC confirmées par simulation, que les méthodes semi-supervisées donnent de bons résultats qui sont meilleurs que ceux des méthodes classiques. / Credit scoring is generally considered a method for evaluating the level of risk associated with a potential credit application. This method involves the use of different statistical techniques to arrive at a scoring model. Like any statistical model, a scoring model is based on historical data to help predict the creditworthiness of applicants. Financial institutions use this model to assign each applicant to the appropriate category: good payer or bad payer. The only data available to build the scoring model are the accepted applications, for which the variable to be predicted is known. The method has the drawback of not estimating the probability of default for refused applicants, which means that the results are biased when the model is built on the accepted data set only. In this work we try, using reject inference, to remedy this selection bias by reintegrating the refused applications into the credit granting process. We use and compare different reject inference methods, both classical and semi-supervised; we adapt some of them to our problem and show, on a real data set and using ROC curves, that the semi-supervised methods give good results that are better than those of the classical methods. We confirmed our results by simulation.
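As a hedged illustration of one semi-supervised treatment of refused applicants (self-training, one of several reject-inference strategies of the kind compared in the thesis), the sketch below fits a scoring model on accepted applicants, lets scikit-learn's self-training wrapper pseudo-label the rejected ones it is confident about, and refits; the data are synthetic, not a real credit portfolio.

```python
# Illustrative reject inference via self-training on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X_accepted = rng.normal(loc=0.5, size=(200, 4))     # financed applicants
y_accepted = (X_accepted[:, 0] + rng.normal(size=200) > 0.5).astype(int)
X_rejected = rng.normal(loc=-0.5, size=(100, 4))    # no repayment outcome known

# Unlabelled observations are flagged with -1 for the self-training wrapper.
X = np.vstack([X_accepted, X_rejected])
y = np.concatenate([y_accepted, -np.ones(100, dtype=int)])

scorer = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.8)
scorer.fit(X, y)

# Default probabilities for new applicants, rejected ones included.
print(scorer.predict_proba(X_rejected[:5])[:, 1])
```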
