  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
311

Neue Indexingverfahren für die Ähnlichkeitssuche in metrischen Räumen über großen Datenmengen / New indexing methods for similarity search in metric spaces over large data volumes

Guhlemann, Steffen 08 April 2016 (has links)
Ein zunehmend wichtiges Thema in der Informatik ist der Umgang mit Ähnlichkeit in einer großen Anzahl unterschiedlicher Domänen. Derzeit existiert keine universell verwendbare Infrastruktur für die Ähnlichkeitssuche in allgemeinen metrischen Räumen. Ziel der Arbeit ist es, die Grundlage für eine derartige Infrastruktur zu legen, die in klassische Datenbankmanagementsysteme integriert werden könnte. Im Rahmen einer Analyse des State of the Art wird der M-Baum als am besten geeignete Basisstruktur identifiziert. Dieser wird anschließend zum EM-Baum erweitert, wobei strukturelle Kompatibilität mit dem M-Baum erhalten wird. Die Abfragealgorithmen werden im Hinblick auf eine Minimierung notwendiger Distanzberechnungen optimiert. Aufbauend auf einer mathematischen Analyse der Beziehung zwischen Baumstruktur und Abfrageaufwand werden Freiheitsgrade in Baumänderungsalgorithmen genutzt, um Bäume so zu konstruieren, dass Ähnlichkeitsanfragen mit einer minimalen Anzahl an Anfrageoperationen beantwortet werden können. / A topic of growing importance in computer science is the handling of similarity in multiple heterogeneous domains. Currently there is no common infrastructure supporting this for general metric spaces. The goal of this work is to lay the foundation for such an infrastructure, which could be integrated into classical database management systems. After an analysis of the state of the art, the M-Tree is identified as the most suitable basis and is enhanced in multiple ways to the EM-Tree while retaining structural compatibility. The query algorithms are optimized to reduce the number of necessary distance calculations. Building on a mathematical analysis of the relation between tree structure and query performance, degrees of freedom in the tree edit algorithms are used to build trees optimized for answering similarity queries with a minimal number of distance calculations.
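The pruning idea underlying M-Tree-family indexes can be sketched without the tree itself: distances to a reference object, precomputed at build time, bound the true query distance via the triangle inequality. The following is a minimal illustration (a flat pivot filter, not the EM-Tree proposed in the thesis):

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

class PivotFilter:
    """Flat pivot-based filter: the triangle inequality gives
    |d(p, pivot) - d(q, pivot)| <= d(p, q), so any object whose
    lower bound already exceeds the radius is skipped without
    computing d(p, q)."""

    def __init__(self, objects, pivot, dist=euclidean):
        self.objects = objects
        self.pivot = pivot
        self.dist = dist
        # one distance computation per object, done once at build time
        self.pivot_dists = [dist(o, pivot) for o in objects]

    def range_query(self, query, radius):
        d_qp = self.dist(query, self.pivot)
        hits, evaluated = [], 0
        for obj, d_op in zip(self.objects, self.pivot_dists):
            if abs(d_op - d_qp) > radius:   # pruned by the lower bound
                continue
            evaluated += 1                  # exact distance needed
            if self.dist(obj, query) <= radius:
                hits.append(obj)
        return hits, evaluated
```

Here `evaluated` counts the exact distance computations actually performed at query time, i.e. the quantity the thesis seeks to minimize.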
312

馬庫塞的「單向度」概念研究 / A Study of Marcuse's Concept of "One-Dimensionality"

吳建騁 Unknown Date (has links)
In the preface written in 1960 for Reason and Revolution, Marcuse had already spoken of "the authority of established fact", which he regarded as "a force of oppression". Throughout his works Marcuse repeatedly refers to "the established reality", by which he means that individuals in advanced industrial society are oppressed and conditioned by the established system; in other words, advanced industrial society is full of unliberating elements. "Today, the combination of freedom (freedom) and servitude has become 'natural' and a vehicle of progress." Marcuse advanced this claim in Eros and Civilization and developed it more fully in One-Dimensional Man. The central argument of One-Dimensional Man is that the consciousness of society as a whole is constrained by the structure of late capitalist society, so that people's thinking cannot awaken: "the real is the rational" has become society's normal view, and most people live according to this pattern without seeking to change the status quo. Because this mode of consumption and production has fixed people's patterns of thought into one-dimensional thinking, and because people no longer reflect on the one-dimensional character of existing society, they cannot liberate themselves from it. Most people remain in an unaware, manipulated state, and the members of society jointly construct an oppressive mode of thinking; "human consciousness is determined by social existence", and the mind's tendency to reinforce the given reality gradually brings "one-dimensionality" into being.
313

Reconnaissance des actions humaines à partir d'une séquence vidéo / Human action recognition from a video sequence

Touati, Redha 12 1900 (has links)
The work done in this master's thesis, presents a new system for the recognition of human actions from a video sequence. The system uses, as input, a video sequence taken by a static camera. A binary segmentation method of the the video sequence is first achieved, by a learning algorithm, in order to detect and extract the different people from the background. To recognize an action, the system then exploits a set of prototypes generated from an MDS-based dimensionality reduction technique, from two different points of view in the video sequence. This dimensionality reduction technique, according to two different viewpoints, allows us to model each human action of the training base with a set of prototypes (supposed to be similar for each class) represented in a low dimensional non-linear space. The prototypes, extracted according to the two viewpoints, are fed to a $K$-NN classifier which allows us to identify the human action that takes place in the video sequence. The experiments of our model conducted on the Weizmann dataset of human actions provide interesting results compared to the other state-of-the art (and often more complicated) methods. These experiments show first the sensitivity of our model for each viewpoint and its effectiveness to recognize the different actions, with a variable but satisfactory recognition rate and also the results obtained by the fusion of these two points of view, which allows us to achieve a high performance recognition rate. / Le travail mené dans le cadre de ce projet de maîtrise vise à présenter un nouveau système de reconnaissance d’actions humaines à partir d'une séquence d'images vidéo. Le système utilise en entrée une séquence vidéo prise par une caméra statique. Une méthode de segmentation binaire est d'abord effectuée, grâce à un algorithme d’apprentissage, afin de détecter les différentes personnes de l'arrière-plan. 
Afin de reconnaitre une action, le système exploite ensuite un ensemble de prototypes générés, par une technique de réduction de dimensionnalité MDS, à partir de deux points de vue différents dans la séquence d'images. Cette étape de réduction de dimensionnalité, selon deux points de vue différents, permet de modéliser chaque action de la base d'apprentissage par un ensemble de prototypes (censé être relativement similaire pour chaque classe) représentés dans un espace de faible dimension non linéaire. Les prototypes extraits selon les deux points de vue sont amenés à un classifieur K-ppv qui permet de reconnaitre l'action qui se déroule dans la séquence vidéo. Les expérimentations de ce système sur la base d’actions humaines de Wiezmann procurent des résultats assez intéressants comparés à d’autres méthodes plus complexes. Ces expériences montrent d'une part, la sensibilité du système pour chaque point de vue et son efficacité à reconnaitre les différentes actions, avec un taux de reconnaissance variable mais satisfaisant, ainsi que les résultats obtenus par la fusion de ces deux points de vue, qui permet l'obtention de taux de reconnaissance très performant.
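The pipeline above (MDS-based dimensionality reduction feeding a K-NN classifier over prototypes) can be sketched in its simplest batch form. The data, prototypes, and labels below are placeholders, not the thesis's actual Weizmann setup or its two-viewpoint construction:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed a pairwise distance matrix D into k dims
    via eigendecomposition of the double-centered Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                 # Gram matrix from distances
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:k]             # largest eigenvalues first
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

def knn_predict(prototypes, labels, query, k=1):
    """Assign the majority label among the k nearest prototypes."""
    d = np.linalg.norm(prototypes - query, axis=1)
    nearest = np.argsort(d)[:k]
    vals, counts = np.unique(np.asarray(labels)[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```

For points that already live in a Euclidean space, classical MDS recovers the original configuration up to rotation and translation; its value in the thesis's setting is embedding data known only through pairwise distances.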
314

Étude et conception d'un système automatisé de contrôle d'aspect des pièces optiques basé sur des techniques connexionnistes / Investigation and design of an automatic system for optical devices' defects detection and diagnosis based on connexionist approach

Voiry, Matthieu 15 July 2008 (has links)
Dans différents domaines industriels, la problématique du diagnostic prend une place importante. Ainsi, le contrôle d’aspect des composants optiques est une étape incontournable pour garantir leurs performances opérationnelles. La méthode conventionnelle de contrôle par un opérateur humain souffre de limitations importantes qui deviennent insurmontables pour certaines optiques hautes performances. Dans ce contexte, cette thèse traite de la conception d’un système automatique capable d’assurer le contrôle d’aspect. Premièrement, une étude des capteurs pouvant être mis en oeuvre par ce système est menée. Afin de satisfaire à des contraintes de temps de contrôle, la solution proposée utilise deux capteurs travaillant à des échelles différentes. Un de ces capteurs est basé sur la microscopie Nomarski ; nous présentons ce capteur ainsi qu’un ensemble de méthodes de traitement de l’image qui permettent, à partir des données fournies par celui-ci, de détecter les défauts et de déterminer la rugosité, de manière robuste et répétable. L’élaboration d’un prototype opérationnel, capable de contrôler des pièces optiques de taille limitée valide ces différentes techniques. Par ailleurs, le diagnostic des composants optiques nécessite une phase de classification. En effet, si les défauts permanents sont détectés, il en est de même pour de nombreux « faux » défauts (poussières, traces de nettoyage...). Ce problème complexe est traité par un réseau de neurones artificiels de type MLP tirant parti d’une description invariante des défauts. Cette description, issue de la transformée de Fourier-Mellin est d’une dimension élevée qui peut poser des problèmes liés au « fléau de la dimension ». Afin de limiter ces effets néfastes, différentes techniques de réduction de dimension (Self Organizing Map, Curvilinear Component Analysis et Curvilinear Distance Analysis) sont étudiées.
On montre d’une part que les techniques CCA et CDA sont plus performantes que SOM en termes de qualité de projection, et d’autre part qu’elles permettent d’utiliser des classifieurs de taille plus modeste, à performances égales. Enfin, un réseau de neurones modulaire utilisant des modèles locaux est proposé. Nous développons une nouvelle approche de décomposition des problèmes de classification, fondée sur le concept de dimension intrinsèque. Les groupes de données de dimensionnalité homogène obtenus ont un sens physique et permettent de réduire considérablement la phase d’apprentissage du classifieur tout en améliorant ses performances en généralisation. / In various industrial fields, the problem of diagnosis is of great interest. For example, checking surface imperfections on an optical device is necessary to guarantee its operational performance. The conventional control method, based on visual inspection by a human expert, suffers from limitations which become critical for some high-performance components. In this context, this thesis deals with the design of an automatic system able to carry out the diagnosis of appearance flaws. To fulfil the time constraints, the suggested solution uses two sensors working at different scales. We present one of them, based on Nomarski microscopy, together with the image processing methods which allow, from the acquired data, detecting the defects and determining roughness in a reliable way. The development of an operational prototype, able to check small optical components, validates the proposed techniques. The final diagnosis also requires a classification phase. Indeed, if the permanent defects are detected, many "false" defects (dust, cleaning marks...) are detected as well. This complex problem is solved by an MLP artificial neural network using an invariant description of the defects.
This representation, resulting from the Fourier-Mellin transform, is a high-dimensional vector, which raises problems linked to the "curse of dimensionality". In order to limit these harmful effects, various dimensionality reduction techniques (Self Organizing Map, Curvilinear Component Analysis and Curvilinear Distance Analysis) are investigated. On one hand, we show that CCA and CDA outperform SOM in terms of projection quality. On the other hand, these methods allow using simpler classifiers with equal performance. Finally, a modular neural network which exploits local models is developed. We propose a new decomposition scheme for classification problems, based on the intrinsic dimension concept. The resulting data clusters of homogeneous dimensionality have a physical meaning and significantly reduce the training phase of the classifier while improving its generalization performance.
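The invariant description above relies on the Fourier-Mellin transform. Its first ingredient, translation invariance via the DFT magnitude, is easy to demonstrate in isolation (the log-polar resampling that adds rotation and scale invariance is omitted here):

```python
import numpy as np

def shift_invariant_descriptor(img):
    """Magnitude of the 2-D DFT: invariant to circular translation,
    since shifting the image only changes the phase of its spectrum.
    A full Fourier-Mellin descriptor would additionally resample this
    magnitude on a log-polar grid to gain rotation and scale
    invariance."""
    return np.abs(np.fft.fft2(img))
```

Because every spatial shift maps to a pure phase ramp in the frequency domain, a shifted copy of an image produces exactly the same magnitude spectrum.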
315

Towards on-line domain-independent big data learning : novel theories and applications

Malik, Zeeshan January 2015 (has links)
Feature extraction is an extremely important pre-processing step for pattern recognition and machine learning problems. This thesis highlights how one can best extract features from the data in an exhaustively online and purely adaptive manner. The solution to this problem is given for both labeled and unlabeled datasets, by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms. The incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). Further, the proposed incremental version is combined with an extreme learning machine (ELM), in which the ELM is used as a preprocessor before learning. In this first key contribution, the dynamic random expansion characteristic of ELM is combined with the proposed incremental LDA technique, and shown to offer a significant improvement in maximizing the discrimination between points in two different classes, while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques. In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel, purely incremental version of the slow feature analysis (SFA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. Further, the time-series expansions of echo state networks (ESN) and radial basis functions (RBF) are used as a pre-processor before learning. In addition, higher-order derivatives are used as a smoothing constraint on the output signal.
Finally, an online extension of the generalized eigenvalue problem, derived from James Stone’s criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique, to demonstrate its comparative effectiveness. In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA), for both twinned and multiple data streams, are derived by using the same method of solving the generalized eigenvalue problem. Further, the proposed method is enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. A recurrent set of connections, as used by ESNs, serves as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams. A solution to the problem of identifying a low-dimensional manifold in a high-dimensional data space is then presented in an incremental and adaptive manner. Finally, an online, locally optimized extension of Laplacian Eigenmaps is derived, termed the generalized incremental Laplacian Eigenmaps technique (GENILE). Apart from the benefit of the incremental nature of the proposed manifold-based dimensionality reduction technique, the projections produced by this method are shown, most of the time, to yield better classification accuracy than standard batch versions of these techniques, on both artificial and real datasets.
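The generalized eigenvalue problem A v = λ B v underpins each of the algorithms named above (LDA, SFA, CCA, Laplacian Eigenmaps). A minimal batch solver for symmetric A and positive-definite B can be sketched via Cholesky whitening; the thesis's contribution is the incremental/online setting, which this sketch does not cover:

```python
import numpy as np

def generalized_eig(A, B):
    """Solve A v = lam * B v for symmetric A and SPD B.
    Whitening with the Cholesky factor L of B (B = L L^T) reduces it
    to the standard symmetric problem (L^-1 A L^-T) u = lam u, with
    v = L^-T u. Eigenpairs are returned largest eigenvalue first."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    w, U = np.linalg.eigh(Linv @ A @ Linv.T)   # standard symmetric problem
    V = Linv.T @ U                             # map eigenvectors back
    return w[::-1], V[:, ::-1]
```

In batch LDA, for instance, A would be the between-class scatter and B the (regularized) within-class scatter, and the top eigenvector gives the most discriminative projection direction.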
316

Triangular similarity metric learning : A siamese architecture approach / Apprentissage métrique de similarité triangulaire : Une approche d'architecture siamois

Zheng, Lilei 10 May 2016 (has links)
Dans de nombreux problèmes d’apprentissage automatique et de reconnaissance des formes, il y a toujours un besoin de fonctions métriques appropriées pour mesurer la distance ou la similarité entre des données. La fonction métrique est une fonction qui définit une distance ou une similarité entre chaque paire d’éléments d’un ensemble de données. Dans cette thèse, nous proposons une nouvelle méthode, Triangular Similarity Metric Learning (TSML), pour spécifier automatiquement une fonction métrique à partir des données. Le système TSML proposé repose sur une architecture siamoise qui se compose de deux sous-systèmes identiques partageant le même ensemble de paramètres. Chaque sous-système traite un seul échantillon de données et donc le système entier reçoit une paire de données en entrée. Le système TSML comprend une fonction de coût qui définit la relation entre chaque paire de données et une fonction de projection permettant l’apprentissage des formes de haut niveau. Pour la fonction de coût, nous proposons d’abord la similarité triangulaire (Triangular Similarity), une nouvelle similarité métrique qui équivaut à la similarité cosinus. Sur la base d’une version simplifiée de la similarité triangulaire, nous proposons la fonction triangulaire (the triangular loss) afin d’effectuer l’apprentissage de métrique, en augmentant la similarité entre deux vecteurs dans la même classe et en diminuant la similarité entre deux vecteurs de classes différentes. Par rapport aux autres distances ou similarités, la fonction triangulaire et sa fonction gradient nous offrent naturellement une interprétation géométrique intuitive et intéressante qui explicite l’objectif d’apprentissage de métrique.
En ce qui concerne la fonction de projection, nous présentons trois fonctions différentes: une projection linéaire qui est réalisée par une matrice simple, une projection non-linéaire qui est réalisée par Multi-layer Perceptrons (MLP) et une projection non-linéaire profonde qui est réalisée par Convolutional Neural Networks (CNN). Avec ces fonctions de projection, nous proposons trois systèmes de TSML pour plusieurs applications: la vérification par paires, l’identification d’objet, la réduction de la dimensionnalité et la visualisation de données. Pour chaque application, nous présentons des expérimentations détaillées sur des ensembles de données de référence afin de démontrer l’efficacité de notre systèmes de TSML. / In many machine learning and pattern recognition tasks, there is always a need for appropriate metric functions to measure pairwise distance or similarity between data, where a metric function is a function that defines a distance or similarity between each pair of elements of a set. In this thesis, we propose Triangular Similarity Metric Learning (TSML) for automatically specifying a metric from data. A TSML system is built on a siamese architecture which consists of two identical sub-systems sharing the same set of parameters. Each sub-system processes a single data sample and thus the whole system receives a pair of data as the input. The TSML system includes a cost function parameterizing the pairwise relationship between data and a mapping function allowing the system to learn high-level features from the training data. In terms of the cost function, we first propose the Triangular Similarity, a novel similarity metric which is equivalent to the well-known Cosine Similarity in measuring a data pair. Based on a simplified version of the Triangular Similarity, we further develop the triangular loss function in order to perform metric learning, i.e.
to increase the similarity between two vectors in the same class and to decrease the similarity between two vectors of different classes. Compared with other distance or similarity metrics, the triangular loss and its gradient naturally offer us an intuitive and interesting geometrical interpretation of the metric learning objective. In terms of the mapping function, we introduce three different options: a linear mapping realized by a simple transformation matrix, a nonlinear mapping realized by Multi-layer Perceptrons (MLP) and a deep nonlinear mapping realized by Convolutional Neural Networks (CNN). With these mapping functions, we present three different TSML systems for various applications, namely, pairwise verification, object identification, dimensionality reduction and data visualization. For each application, we carry out extensive experiments on popular benchmarks and datasets to demonstrate the effectiveness of the proposed systems.
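The claimed equivalence between the Triangular Similarity and cosine similarity has a simple geometric core: for unit vectors a and b, the third side of the triangle (a, b, a + b) satisfies ||a + b||² = 2 + 2 cos θ. The following is a check of that identity only, a sketch of the underlying geometry rather than the thesis's exact loss formulation:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triangular_sim(a, b):
    """Recover cos(theta) from the length of the third triangle side
    a + b, after normalizing both vectors to unit length:
    ||a + b||^2 = ||a||^2 + ||b||^2 + 2 a.b = 2 + 2 cos(theta)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return (np.linalg.norm(a + b) ** 2 - 2.0) / 2.0
```

A loss built on ||a + b|| therefore pulls same-class pairs toward the same direction and pushes different-class pairs apart, exactly as a cosine-based objective would, while its gradient points along the triangle's sides.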
317

Coexistence onde de densité de charge / Supraconductivité dans TTF[Ni(dmit)2]2 / Charge Density Wave / Superconductivity coexistence in TTF[Ni(dmit)2]2

Kaddour, Wafa 11 October 2013 (has links)
Au cours de cette thèse, nous étudions la compétition entre les états onde de densité de charge (ODC) et supraconducteur dans le composé multi-bandes à deux chaînes TTF[Ni(dmit)2]2. Nous avons réalisé des mesures de résistivité, de pouvoir thermoélectrique et de conductivité thermique sous pression hydrostatique jusqu’aux basses températures. A basse pression, deux ondes de densité de charge détectées par mesures de rayons X et observées par nos mesures de résistivité transverse sont associées aux bandes 1D LUMO et HOMO-I de la chaîne Ni(dmit)2. A 12 kbar, la fusion de ces deux instabilités est associée à l’emboîtement des bandes LUMO avec les bandes HOMO-I à travers le point Γ de la première zone de Brillouin. A 18 kbar et sur un intervalle de 5-6 kbar, on observe un pic de commensurabilité attribué au vecteur d’emboîtement commensurable 2kF = 1/3 b*. La supraconductivité est observée à partir de 2 kbar et jusqu’à 22 kbar, avec une température critique de 0.6 K qui augmente avec la pression et est corrélée à la variation des températures de transition ODC. La supraconductivité est associée à la bande 2D HOMO-II du Ni(dmit)2. Les mesures de champ magnétique critique ont permis de donner des informations sur l’évolution de la texture de coexistence métal-ODC en fonction de la pression. Elles mettent aussi en évidence une supraconductivité non conventionnelle avec des nœuds dans le gap. Ces résultats nous ont permis de revisiter le diagramme de phase Température-Pression de ce composé, qui s'est révélé beaucoup plus riche que ce qui avait été rapporté jusqu’à maintenant. / In this thesis, we studied the competition between charge density wave (CDW) and superconductivity in the 1D multiband compound TTF[Ni(dmit)2]2.
Resistivity, thermoelectric power and thermal conductivity measurements were performed under pressure and down to very low temperatures. At low pressure, two charge density waves, detected by X-ray measurements and observed in our transverse resistivity measurements, are associated with the 1D LUMO and HOMO-I bands of the Ni(dmit)2 chains. At 12 kbar, the merging of these two CDWs is associated with the nesting of the LUMO bands with the HOMO-I ones through the Γ point of the Brillouin zone. At 18 kbar, and over a 5-6 kbar interval, we observe a commensurability peak due to the commensurate nesting vector 2kF = 1/3 b*. From 2 to 22 kbar, we observe a superconducting transition at a critical temperature of 0.6 K which grows with pressure and is correlated with the CDW transition temperatures. The superconductivity is associated with the 2D HOMO-II band of the Ni(dmit)2 chains. Measurements of the upper critical field give information about the evolution with pressure of the metal/charge-density-wave coexistence regions. They also bring out the unconventional nature of the superconductivity and the presence of nodes in the gap. Thanks to these results, we obtained a much richer Temperature-Pressure phase diagram of TTF[Ni(dmit)2]2 than the existing one.
318

Sobre coleções e aspectos de centralidade em dados multidimensionais / On collections and centrality aspects of multidimensional data

Oliveira, Douglas Cedrim 14 June 2016 (has links)
A análise de dados multidimensionais tem sido por muitos anos tópico de contínua investigação e uma das razões se deve ao fato desse tipo de dados ser encontrado em diversas áreas da ciência. Uma tarefa comum ao se analisar esse tipo de dados é a investigação de padrões pela interação em projeções multidimensionais dos dados para o espaço visual. O entendimento da relação entre as características do conjunto de dados (dataset) e a técnica utilizada para se obter uma representação visual desse dataset é de fundamental importância uma vez que esse entendimento pode fornecer uma melhor intuição a respeito do que se esperar da projeção. Por isso motivado, no presente trabalho investiga-se alguns aspectos de centralidade dos dados em dois cenários distintos: coleções de documentos com grafos de coautoria; dados multidimensionais mais gerais. No primeiro cenário, o dado multidimensional que representa os documentos possui informações mais específicas, o que possibilita a combinação de diferentes aspectos para analisá-los de forma sumarizada, bem como a noção de centralidade e relevância dentro da coleção. Isso é levado em consideração para propor uma metáfora visual combinada que possibilite a exploração de toda a coleção, bem como de documentos individuais. No segundo cenário, de dados multidimensionais gerais, assume-se que tais informações não estão disponíveis. Ainda assim, utilizando um conceito de estatística não-paramétrica, denominado funções de profundidade de dados (data-depth functions), é feita a avaliação da ação de técnicas de projeção multidimensionais sobre os dados, possibilitando entender como suas medidas de profundidade (centralidade) foram alteradas ao longo do processo, definindo também uma medida de qualidade para projeções. / Analysis of multidimensional data has been a topic of continuous research for many years, and one of the reasons is that such data can be found in several different areas of science.
A common task when analyzing such data is to investigate patterns by interacting with spatializations of the data onto the visual space. Understanding the relation between the underlying dataset characteristics and the technique used to provide a visual representation of that dataset is of fundamental importance, since it can provide a better intuition on what to expect from the spatialization. Motivated by this, in this work we investigate some aspects of centrality of the data in two different scenarios: document collections with co-authorship graphs, and general multidimensional data. In the first scenario, the multidimensional data which encodes the documents carries more specific information, making it possible to combine different aspects such as a summarized analysis, as well as notions of centrality and relevance among the documents in the collection. This is taken into account to propose a combined visual metaphor that enables the visual exploration of the whole document collection as well as of individual documents. In the second case, of general multidimensional data, we assume that such additional information is not available. Nevertheless, using the concept of data-depth functions from non-parametric statistics, we analyze the action of multidimensional projection techniques on the data during the projection process, in order to understand how depth measures computed on the data are modified along the process, which also defines a quality measure for multidimensional projections.
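Data-depth functions assign each point a centrality score with respect to the point cloud. Mahalanobis depth is one of the simplest members of the family (the thesis may rely on other depth functions, such as Tukey's halfspace depth); comparing depths before and after a projection is the kind of measurement described above:

```python
import numpy as np

def mahalanobis_depth(X, points=None):
    """Mahalanobis depth: 1 / (1 + squared Mahalanobis distance to the
    sample mean). Values near 1 are central, values near 0 peripheral."""
    if points is None:
        points = X
    mu = X.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(X.T))
    diff = points - mu
    d2 = np.einsum('ij,jk,ik->i', diff, Sinv, diff)  # per-point quadratic form
    return 1.0 / (1.0 + d2)
```

A projection's quality can then be scored by how well the depth ordering computed in the visual space agrees with the depth ordering in the original space.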
319

Emprego de técnicas de análise exploratória de dados utilizados em Química Medicinal / Use of different techniques for exploratory data analysis in Medicinal Chemistry

Gertrudes, Jadson Castro 10 September 2013 (has links)
Pesquisas na área de Química Medicinal têm direcionado esforços na busca por métodos que acelerem o processo de descoberta de novos medicamentos. Dentre as diversas etapas relacionadas ao longo do processo de descoberta de substâncias bioativas está a análise das relações entre a estrutura química e a atividade biológica de compostos. Neste processo, os pesquisadores da área de Química Medicinal analisam conjuntos de dados que são caracterizados pela alta dimensionalidade e baixo número de observações. Dentro desse contexto, o presente trabalho apresenta uma abordagem computacional que visa contribuir para a análise de dados químicos e, consequentemente, a descoberta de novos medicamentos para o tratamento de doenças crônicas. As abordagens de análise exploratória de dados, utilizadas neste trabalho, combinam técnicas de redução de dimensionalidade e de agrupamento para detecção de estruturas naturais que reflitam a atividade biológica dos compostos analisados. Dentre as diversas técnicas existentes para a redução de dimensionalidade, são discutidas o escore de Fisher, a análise de componentes principais e a análise de componentes principais esparsas. Quanto aos algoritmos de aprendizado, são avaliados o k-médias, fuzzy c-médias e modelo de misturas ICA aperfeiçoado. No desenvolvimento deste trabalho foram utilizados quatro conjuntos de dados, contendo informações de substâncias bioativas, sendo que dois conjuntos foram relacionados ao tratamento da diabetes mellitus e da síndrome metabólica, o terceiro conjunto relacionado a doenças cardiovasculares e o último conjunto apresenta substâncias que podem ser utilizadas no tratamento do câncer. Nos experimentos realizados, os resultados alcançados sugerem a utilização das técnicas de redução de dimensionalidade juntamente com os algoritmos não supervisionados para a tarefa de agrupamento dos dados químicos, uma vez que nesses experimentos foi possível descrever níveis de atividade biológica dos compostos estudados. 
Portanto, é possível concluir que as técnicas de redução de dimensionalidade e de agrupamento podem possivelmente ser utilizadas como guias no processo de descoberta e desenvolvimento de novos compostos na área de Química Medicinal. / Research in Medicinal Chemistry has focused on the search for methods that accelerate the process of drug discovery. Among the several steps in the discovery of bioactive substances is the analysis of the relationships between chemical structure and biological activity of compounds. In this process, medicinal chemistry researchers analyze data sets characterized by high dimensionality and a small number of observations. Within this context, this work presents a computational approach that aims to contribute to the analysis of chemical data and, consequently, to the discovery of new drugs for the treatment of chronic diseases. The exploratory data analysis approaches employed in this work combine dimensionality reduction and clustering techniques for detecting natural structures that reflect the biological activity of the analyzed compounds. Among the several existing techniques for dimensionality reduction, we focus on the Fisher score, principal component analysis and sparse principal component analysis. For the clustering procedure, this study evaluated k-means, fuzzy c-means and the enhanced ICA mixture model. The experiments used four data sets containing information on bioactive substances: two sets related to the treatment of diabetes mellitus and metabolic syndrome, a third set related to cardiovascular disease, and a last set of substances that can be used in cancer treatment.
In the experiments, the obtained results suggest the use of dimensionality reduction techniques along with clustering algorithms for the task of clustering chemical data, since in these experiments it was possible to describe different levels of biological activity of the studied compounds. Therefore, we conclude that dimensionality reduction and clustering techniques can be used as guides in the process of discovery and development of new compounds in the field of Medicinal Chemistry.
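The reduce-then-cluster pipeline described above can be sketched with PCA followed by k-means. This is a generic illustration on synthetic data, not the chemical datasets or the sparse-PCA/fuzzy/ICA variants the thesis evaluates:

```python
import numpy as np

def pca(X, k):
    """Project centered X onto its top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and
    re-centering. Assumes no cluster goes empty, which holds for
    well-separated data like the example below."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels
```

Reducing first with PCA removes noise directions and speeds up the distance computations; the thesis's finding is that such combinations can recover groups matching levels of biological activity.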
