281 |
Consciência fonológica: dimensionalidade e precisão de classificação do risco/não risco de dificuldade de leitura e de escrita / Phonological awareness: dimensionality and classification accuracy for risk/no-risk of reading and writing difficulties
Henriques, Flávia Guimarães, 23 February 2016
Funding: CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

The aims of this study were: 1) to review the literature on the accuracy with which phonological awareness measures classify individuals as being at risk or not at risk of presenting reading or writing difficulties; 2) to evaluate the dimensionality of phonological awareness in Brazilian Portuguese speakers; and 3) to verify the accuracy with which different phonological awareness measures classify risk/no risk of reading and writing difficulties. Overall, the reviewed studies showed that individual phonological awareness measures ranged from poor to fair at classifying children as at risk/not at risk of reading or writing difficulties. Two hundred and thirteen Brazilian Portuguese-speaking children took part in the present study. They were assessed with different phonological awareness tasks in the last year of early childhood education and, approximately one year later, with one reading measure and one writing measure. Factor analyses showed that the different phonological awareness measures index a predominantly unidimensional construct, and ROC curve analyses indicated that two composite phonological awareness measures were fair at discriminating children with and without risk of reading or writing difficulties, with areas under the curve (AUC) around 0.75.
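As a rough illustration of the ROC analysis reported above, the following sketch computes an AUC for a composite score against a binary risk label. The scores and labels are synthetic stand-ins, not data from the study.

```python
# Minimal sketch of a ROC/AUC analysis like the one described above.
# The composite phonological-awareness scores and risk labels are
# synthetic illustrations, not data from the study.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 213
at_risk = rng.integers(0, 2, size=n)            # 1 = later reading/writing difficulty
# At-risk children tend to score lower on the composite measure.
scores = rng.normal(loc=10 - 2 * at_risk, scale=3)

# Higher score = lower risk, so negate the score when using it as a risk index.
auc = roc_auc_score(at_risk, -scores)
print(f"AUC = {auc:.2f}")                        # an AUC near 0.75 counts as 'fair'
```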
|
282 |
Decomposição baseada em modelo de problemas de otimização de projeto utilizando redução de dimensionalidade e redes complexas / Model-based decomposition of design optimization problems using dimensionality reduction and complex networks
Cardoso, Alexandre Cançado, 16 September 2016
The divide-and-conquer strategy is common to many fields of activity, ranging from algorithm design to politics and sociology. In engineering, it is used, among other applications, to help solve general design problems or optimal design problems of large, complex or multidisciplinary systems. This work presents a method for decomposing such problems into smaller sub-problems using only information from the model itself (model-based decomposition). Patterns of relationships between variables, functions, simulations and other model elements are extracted by unsupervised learning algorithms in two steps: first, the dimensional space is reduced to highlight the most significant relationships; then community detection techniques from the complex networks field, or clustering techniques, are used to identify the sub-problems. Finally, the method is applied to design optimization problems from the structural and mechanical engineering literature, and the resulting sub-problems are evaluated against comparative and qualitative criteria.
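As a loose illustration of the two-step idea (reduce the space, then detect communities), the sketch below reduces a toy variable-relationship matrix with PCA and partitions a similarity graph with networkx's modularity-based community detection. The matrix is invented; the thesis's actual models and evaluation criteria are not reproduced here.

```python
# Hedged sketch of model-based decomposition: reduce a variable-relationship
# matrix, then find sub-problems via community detection.
# The interaction matrix below is a toy example, not one of the thesis cases.
import numpy as np
import networkx as nx
from sklearn.decomposition import PCA
from networkx.algorithms.community import greedy_modularity_communities

# Toy |variables| x |functions| relationship strengths (e.g., sensitivities).
A = np.array([
    [0.9, 0.8, 0.0, 0.1],
    [0.7, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.9, 0.7],
])

# Step 1: reduce dimensionality so only the dominant relations remain.
emb = PCA(n_components=2).fit_transform(A)

# Step 2: build a similarity graph over variables and detect communities.
G = nx.Graph()
for i in range(len(emb)):
    for j in range(i + 1, len(emb)):
        w = 1.0 / (1.0 + np.linalg.norm(emb[i] - emb[j]))
        G.add_edge(i, j, weight=w)

communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])  # each community = one sub-problem
```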
|
283 |
Méthodes de détection robustes avec apprentissage de dictionnaires. Applications à des données hyperspectrales / Detection tests for worst-case scenarios with optimized dictionaries. Applications to hyperspectral data
Raja Suleiman, Raja Fazliza, 16 December 2014
This Ph.D. dissertation deals with a "one among many" detection problem, in which one must discriminate between pure noise under H0 and one among L known alternatives under H1. The work focuses on the study and implementation of robust, reduced-dimension detection tests using optimized dictionaries; these detection methods are associated with the Generalized Likelihood Ratio test, and the proposed approaches are assessed mainly on hyperspectral data. The first part presents several technical topics underlying this framework. The second part develops the theoretical and algorithmic aspects of the proposed methods. Two issues arise from the large number of alternatives; in this context, we propose dictionary learning techniques based on a robust criterion that seeks to minimize the worst-case power loss (minimax). When the learned dictionary has K = 1 column, we show that the exact solution can be obtained; for K > 1, we propose three minimax learning algorithms. Finally, the third part presents several applications. The principal application concerns astrophysical hyperspectral data from the Multi Unit Spectroscopic Explorer instrument. Numerical results show that the proposed algorithms are robust and that the K > 1 case increases minimax detection performance over the K = 1 case. Other possible applications, such as worst-case recognition of faces and handwritten digits, are also presented.
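To make the "one among many" setting concrete, here is a hedged numpy sketch of a GLR-style statistic: the observation is matched against each of the L known signatures and the best match is compared to a threshold. Signatures, noise level and threshold are all illustrative assumptions, not values from the thesis.

```python
# Hedged sketch of a "one among many" GLR-style detector: score an observation
# against L known unit-norm signatures and compare the best match to a threshold.
import numpy as np

rng = np.random.default_rng(1)
n, L = 64, 10
S = rng.normal(size=(L, n))
S /= np.linalg.norm(S, axis=1, keepdims=True)    # L known alternatives

def glr_statistic(y, S):
    # For unit-norm signatures with unknown amplitude, the GLR statistic
    # reduces to the maximum squared correlation over the candidate set.
    return np.max((S @ y) ** 2)

# H1 example: the third signature plus noise.
y = 3.0 * S[2] + rng.normal(scale=1.0, size=n)
threshold = 9.0          # in practice set from a target false-alarm rate
print("detect" if glr_statistic(y, S) > threshold else "noise only")
```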
|
284 |
Etude de la dynamique structurale du domaine de liaison au ligand de RXRα et implication de la phosphorylation dans la transcription / Structural dynamics of the ligand binding domain of RXRα and implication of phosphorylation in transcription
Eberhardt, Jérôme, 12 December 2016
Many studies reveal that the ligand binding domain of RXRα is very dynamic, even in the presence of an agonist ligand. We used the experimental data available on this domain (HDX, NMR and X-ray) to set up a protocol, based on accelerated molecular dynamics, that efficiently explores the conformational dynamics of the RXRα ligand binding domain and validates the resulting conformational ensembles. This protocol was applied to analyse how the pSer260 phosphorylation, located close to the coactivator binding surface and implicated in the development of hepatocellular carcinoma, affects the structure and dynamics of this domain. In parallel, a dimensionality reduction method was developed to analyse long molecular dynamics trajectories. Using this method, we identified several new stable alternative conformations of the RXRα ligand binding domain.
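The thesis's own dimensionality reduction method is not detailed in this abstract; as a generic stand-in, the sketch below projects (synthetic) trajectory frames with PCA and clusters them in the reduced space to surface recurring conformations.

```python
# Generic stand-in for trajectory dimensionality reduction (the thesis's own
# method is not described here): PCA over the atomic coordinates of each frame,
# then clustering in the reduced space to expose recurring conformations.
# The trajectory below is random noise standing in for real MD frames.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

n_frames, n_atoms = 5000, 200
traj = np.random.default_rng(2).normal(size=(n_frames, n_atoms * 3))

proj = PCA(n_components=5).fit_transform(traj)        # low-dimensional projection
labels = KMeans(n_clusters=4, n_init=10).fit_predict(proj)

# Frames sharing a label are candidate members of one stable conformation.
for k in range(4):
    print(f"state {k}: {np.sum(labels == k)} frames")
```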
|
285 |
Transformace dat pomocí evolučních algoritmů / Evolutionary Algorithms for Data Transformation
Švec, Ondřej, January 2017
In this work, we propose a novel method for supervised dimensionality reduction, which learns the weights of a neural network using an evolutionary algorithm, CMA-ES, optimising the success rate of the k-NN classifier. If no activation functions are used in the neural network, the algorithm essentially performs a linear transformation, which can also be used inside the Mahalanobis distance; therefore our method can be considered a metric learning algorithm. By adding activations to the neural network, the algorithm can learn non-linear transformations as well. We consider reductions to low-dimensional spaces, which are useful for data visualisation, and demonstrate that the resulting projections provide better performance than other dimensionality reduction techniques and that the visualisations provide better distinctions between the classes in the data, thanks to the locality of the k-NN classifier.
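A minimal sketch of the approach follows, with a simple (1+λ) evolution strategy standing in for CMA-ES (the pycma package provides the real algorithm) and cross-validated k-NN accuracy as the fitness. The dataset and hyperparameters are illustrative, not those of the thesis.

```python
# Sketch of evolutionary metric learning: evolve a linear projection W so that
# k-NN accuracy in the projected space is maximised. A simple (1+10) evolution
# strategy stands in for CMA-ES here; hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(3)
d_in, d_out = X.shape[1], 2                 # reduce to 2-D for visualisation

def fitness(w):
    W = w.reshape(d_in, d_out)
    knn = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(knn, X @ W, y, cv=5).mean()

w = rng.normal(size=d_in * d_out)           # parent solution
best = fitness(w)
for _ in range(50):                          # 50 generations, 10 children each
    children = w + 0.1 * rng.normal(size=(10, w.size))
    scores = [fitness(c) for c in children]
    if max(scores) >= best:
        best, w = max(scores), children[int(np.argmax(scores))]
print(f"projected k-NN accuracy: {best:.3f}")
```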
|
286 |
Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease
Duan, Haoyang, January 2014
From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease based on Single-Nucleotide Polymorphisms (SNPs) from the Ontario Heart Genomics Study (OHGS). First, the thesis explains the k-Nearest Neighbour (k-NN) and Random Forest learning algorithms, and includes a complete proof that k-NN is universally consistent in finite-dimensional normed vector spaces. Second, the thesis introduces two dimensionality reduction techniques: Random Projections and a new method termed Mass Transportation Distance (MTD) Feature Selection. It then compares the performance of Random Projections with k-NN against MTD Feature Selection with Random Forest for predicting coronary artery disease. Results demonstrate that MTD Feature Selection with Random Forest is superior to Random Projections with k-NN. Random Forest obtains an accuracy of 0.6660 and an area under the ROC curve of 0.8562 on the OHGS dataset when 3335 SNPs are selected by MTD Feature Selection for classification. This AUC is considerably better than the previous high score of 0.608 obtained by Davies et al. in 2010 on the same dataset.
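The comparison pipeline can be sketched as follows on synthetic SNP-like data. MTD Feature Selection is specific to this thesis, so a generic univariate filter (SelectKBest) stands in for it; all data and numbers here are illustrative only.

```python
# Sketch of the two pipelines on synthetic SNP-like data (0/1/2 genotype
# counts). SelectKBest is a stand-in for the thesis's MTD Feature Selection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(4)
X = rng.integers(0, 3, size=(1000, 5000)).astype(float)   # samples x SNPs
y = rng.integers(0, 2, size=1000)                          # disease status
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pipeline A: Random Projections + k-NN.
rp = GaussianRandomProjection(n_components=100, random_state=0).fit(X_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(rp.transform(X_tr), y_tr)
auc_a = roc_auc_score(y_te, knn.predict_proba(rp.transform(X_te))[:, 1])

# Pipeline B: feature selection + Random Forest.
sel = SelectKBest(chi2, k=500).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(sel.transform(X_tr), y_tr)
auc_b = roc_auc_score(y_te, rf.predict_proba(sel.transform(X_te))[:, 1])

print(f"RP + k-NN AUC: {auc_a:.3f}   selection + RF AUC: {auc_b:.3f}")
```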
|
287 |
Data mining in large sets of complex data / Mineração de dados em grandes conjuntos de dados complexos
Robson Leonardo Ferreira Cordeiro, 29 August 2011
Due to the increasing amount and complexity of the data stored in enterprises' databases, the task of knowledge discovery is nowadays vital to support strategic decisions. However, the mining techniques used in the process usually have high computational costs, which come from the need to explore several alternative solutions, in different combinations, to obtain the desired knowledge. The most common mining tasks include data classification, labeling and clustering, outlier detection and missing data prediction. Traditionally, the data are represented by numerical or categorical attributes in a table that describes one element in each tuple. Although the same tasks applied to traditional data are also necessary for more complex data, such as images, graphs, audio and long texts, the complexity and the computational costs associated with handling large amounts of these complex data increase considerably, making most of the existing techniques impractical; special data mining techniques for this kind of data therefore need to be developed. This Ph.D. work focuses on the development of new data mining techniques for large sets of complex data, especially for the task of clustering, tightly associated with other data mining tasks that are performed together. Specifically, this doctoral dissertation presents three novel, fast and scalable data mining algorithms well suited to analyze large sets of complex data: the method Halite for correlation clustering; the method BoW for clustering Terabyte-scale datasets; and the method QMAS for labeling and summarization. Our algorithms were evaluated on real, very large datasets with up to billions of complex elements, and they always presented highly accurate results, being at least one order of magnitude faster than the fastest related works in almost all cases. The real data used come from the following applications: automatic breast cancer diagnosis, satellite imagery analysis, and graph mining on a large web graph crawled by Yahoo! and also on the graph with all users and their connections from the Twitter social network. Such results indicate that our algorithms allow the development of real-time applications that, potentially, could not be developed without this Ph.D. work, such as software to aid the diagnosis process on the fly in a worldwide healthcare information system, or a system to look for deforestation within the Amazon Rainforest in real time.
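Halite, BoW and QMAS are the thesis's own contributions and are not reproduced here; as a generic illustration of clustering complex objects at scale, the sketch below streams feature vectors through an incremental clustering model so the full dataset never has to fit in memory at once.

```python
# Generic illustration of clustering large sets of complex objects: extract a
# feature vector per object (random stand-ins here) and cluster incrementally.
# This is a stand-in sketch; the thesis's Halite/BoW/QMAS are distinct methods.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

n_chunks, chunk_size, n_features = 100, 10_000, 64
model = MiniBatchKMeans(n_clusters=20, random_state=0, n_init=3)

rng = np.random.default_rng(5)
for _ in range(n_chunks):                       # stream chunks, e.g. from disk
    chunk = rng.normal(size=(chunk_size, n_features))  # image features, etc.
    model.partial_fit(chunk)

labels = model.predict(rng.normal(size=(5, n_features)))
print(labels)                                    # cluster ids for new objects
```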
|
288 |
Action Recognition in Still Images and Inference of Object Affordances
Girish, Deeptha S., 15 October 2020
No description available.
|
289 |
Strojové učení v klasifikaci obrazu / Machine Learning in Image Classification
Král, Jiří, January 2011
This project deals with the analysis and testing of algorithms and statistical models that could potentially improve the results of FIT BUT in the ImageNet Large Scale Visual Recognition Challenge and TRECVID. A multinomial model was tested. The Phonotactic Intersession Variation Compensation (PIVCO) model was used for reducing random effects in image representation and for dimensionality reduction. PIVCO dimensionality reduction achieved the best mean average precision while reducing to one-twentieth of the original dimension. A KPCA model was tested to approximate kernel SVM. All statistical models were tested on the Pascal VOC 2007 dataset.
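The KPCA-approximates-kernel-SVM idea can be sketched as follows: project the data onto the leading kernel principal components, then train a fast linear SVM in that explicit feature space. The dataset and kernel settings below are illustrative, not those used in the project.

```python
# Sketch of approximating a kernel SVM via KPCA: map the data onto the leading
# kernel principal components, then fit a linear SVM there.
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC, SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Exact kernel SVM (expensive at scale).
exact = SVC(kernel="rbf", gamma=0.001).fit(X_tr, y_tr)

# KPCA approximation: explicit low-dimensional feature map + linear SVM.
kpca = KernelPCA(n_components=100, kernel="rbf", gamma=0.001).fit(X_tr)
approx = LinearSVC().fit(kpca.transform(X_tr), y_tr)

print(f"kernel SVM:  {exact.score(X_te, y_te):.3f}")
print(f"KPCA+linear: {approx.score(kpca.transform(X_te), y_te):.3f}")
```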
|
290 |
Algorithme de chemin de régularisation pour l'apprentissage statistique / Regularization path algorithm for statistical learning
Zapién Arreola, Karina, 09 July 2009
The selection of a proper model is an essential task in statistical learning. In general, for a given learning task, several classes of models are considered, ordered by some notion of "complexity"; model selection then amounts to finding the optimal complexity, allowing us to estimate a model that assures good generalization. This problem reduces to estimating one or more hyperparameters that define the model complexity, as opposed to the parameters that specify the model within the chosen complexity class. The usual approach to determining these hyperparameters is a "grid search": given a set of possible values, the generalization error of the best model is estimated for each of them. This thesis focuses on an alternative approach that computes the set of possible solutions for all values of the hyperparameters, known as the regularization path. For the learning problems of interest here, parametric quadratic programs (PQP), the regularization path associated with certain hyperparameters turns out to be piecewise linear, and its computation costs only a small integer multiple of the cost of fitting a model with a single set of hyperparameters. The thesis is organized in three parts. The first presents the general setting of Support Vector Machine (SVM) learning problems, together with the theoretical and algorithmic tools needed to address them. The second part deals with supervised learning for classification and ranking within the SVM framework; it shows that the regularization path of these problems is piecewise linear, gives alternative proofs to the one of Rosset [Ross 07b] via the subdifferential, and derives original classification and ranking algorithms from this result. The third part addresses semi-supervised and then unsupervised learning. For semi-supervised learning, we introduce a sparsity criterion and propose the associated regularization path algorithm. For unsupervised learning, we use a dimensionality reduction approach; unlike similarity-graph methods that use a fixed number of neighbors, we introduce a new method that chooses the number of neighbors adaptively and appropriately.
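For a concrete example of a piecewise-linear regularization path, the sketch below uses the Lasso/LARS setting (a different model family from the thesis's SVM paths, but the same phenomenon): scikit-learn's lars_path returns the exact breakpoints of the whole path in one call. The data are synthetic.

```python
# Concrete example of a piecewise-linear regularization path, in the
# Lasso/LARS setting rather than the SVM setting of the thesis: coefficients
# are exactly piecewise linear in the regularization parameter, so the entire
# path is described by a finite set of breakpoints.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)
alphas, _, coefs = lars_path(X, y, method="lasso")

# One column of `coefs` per breakpoint: between breakpoints every coefficient
# moves linearly, so these points determine the full path.
print(f"{len(alphas)} breakpoints on the path")
print("first coefficients at the 5 largest alphas:")
print(np.round(coefs[:5, :5], 3))
```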
|