About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Acquiring symbolic design optimization problem reformulation knowledge: On computable relationships between design syntax and semantics

Sarkar, Somwrita January 2009 (has links)
Doctor of Philosophy (PhD) / This thesis presents a computational method for the inductive inference of explicit and implicit semantic design knowledge from the symbolic-mathematical syntax of design formulations, using an unsupervised pattern recognition and extraction approach. Existing research shows that AI and machine-learning-based design computation approaches require either high levels of knowledge engineering or large training databases to acquire problem reformulation knowledge. The method presented in this thesis addresses these methodological limitations. The thesis develops, tests, and evaluates ways in which the method may be employed for design problem reformulation. The method is based on Singular Value Decomposition (SVD), a linear-algebra-based factorization method, combined with dimensionality reduction and similarity measurement through unsupervised clustering. The method computes linear approximations of the associative patterns of symbol co-occurrences in a design problem representation to infer induced coupling strengths between variables, constraints, and system components. Unsupervised clustering of these approximations is used to identify useful reformulations. These two components of the method automate a range of reformulation tasks that have traditionally required different solution algorithms. Example reformulation tasks that it performs include selection of linked design variables, parameters, and constraints; design decomposition; modularity and integrative systems analysis; heuristically aiding design “case” identification; topology modeling; and layout planning. The relationship between the syntax of a design representation and the semantic meaning it encodes is an open design theory research question. Based on the results of the method, the thesis presents a set of theoretical postulates on computable relationships between design syntax and semantics. The postulates relate the performance of the method to empirical findings and theoretical insights from cognitive neuroscience and cognitive science on how the human mind engages in symbol processing, and on the resulting capacities of symbolic representational systems to encode “meaning”. The performance of the method suggests that semantic “meaning” is a higher-order, global phenomenon that lies distributed in the design representation in explicit and implicit ways. A one-to-one local mapping between a design symbol and its meaning, the approach adopted by many AI and learning algorithms, may not be sufficient to capture and represent this meaning. By changing the theoretical standpoint on how a “symbol” is defined in design representations, it was possible to use a simple set of mathematical ideas to perform unsupervised inductive inference of knowledge in a knowledge-lean and training-lean manner, in a knowledge domain that traditionally relies on “giving” the system complex design domain and task knowledge to perform the same set of tasks.
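As an illustration of the general idea described above (factor a symbol co-occurrence pattern with SVD, then cluster the reduced representation to suggest a problem decomposition), here is a minimal Python sketch. The incidence matrix, variable names, retained rank, and cluster count are all invented for illustration; this is not the author's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical occurrence matrix: rows = constraints, columns = design variables.
# A[i, j] = 1 if variable j appears in constraint i (values here are made up).
A = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)
variables = ["x1", "x2", "x3", "x4", "x5", "x6"]

# Low-rank approximation of the symbol co-occurrence pattern via SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                # number of retained singular values (assumed)
V_k = Vt[:k].T * s[:k]               # variables embedded in the reduced space

# Unsupervised clustering of the embedded variables suggests a decomposition
# into weakly coupled groups of variables and constraints.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(V_k)
for var, lab in zip(variables, labels):
    print(f"{var} -> subproblem {lab}")
```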
203

Τμηματοποίηση εικόνων υφής με χρήση πολυφασματικής ανάλυσης και ελάττωσης διαστάσεων / Texture image segmentation using multispectral analysis and dimensionality reduction

Θεοδωρακόπουλος, Ηλίας 16 June 2010 (has links)
Texture segmentation is the process of partitioning an image into multiple segments (regions) based on the texture of each region, with many applications in computer vision, image retrieval, robotics, satellite image analysis, and related fields. The objective of this thesis is to investigate the ability of non-linear dimensionality reduction algorithms, and in particular the Laplacian Eigenmaps algorithm, to produce an efficient representation of data derived from multispectral image analysis with Gabor filters, for solving the texture segmentation problem. For this purpose, we introduce a new supervised texture segmentation method which exploits a low-dimensional representation of the feature vectors together with well-known clustering algorithms, such as Fuzzy C-means and K-means, to produce the final segmentation. The effectiveness of the method is compared with that of similar methods proposed in the literature, which use the original high-dimensional representation of the feature vectors. Experiments were performed on the Brodatz texture database. During evaluation, the Rand index was used as a similarity measure between each produced segmentation and the corresponding ground-truth segmentation.
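A minimal Python sketch of the kind of pipeline the abstract describes: a Gabor filter bank for multispectral texture features, Laplacian Eigenmaps (scikit-learn's SpectralEmbedding) for non-linear dimensionality reduction, and K-means for the final grouping. The synthetic image, filter-bank parameters, and subsampling are assumptions, and the adjusted Rand index is used here in place of the plain Rand index reported in the thesis.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic two-texture image (left/right halves); a real experiment would use Brodatz mosaics.
rng = np.random.default_rng(0)
img = np.hstack([rng.normal(0, 1, (64, 32)), rng.normal(0, 3, (64, 32))])
ground_truth = np.hstack([np.zeros((64, 32)), np.ones((64, 32))]).ravel()

# Multi-channel Gabor features: one response magnitude per (frequency, orientation).
feats = []
for frequency in (0.1, 0.2, 0.4):                  # assumed filter-bank parameters
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, imag = gabor(img, frequency=frequency, theta=theta)
        feats.append(np.hypot(real, imag).ravel())
X = np.stack(feats, axis=1)                        # (n_pixels, n_filters)

# Subsample pixels for speed, embed with Laplacian Eigenmaps, then cluster.
idx = rng.choice(X.shape[0], size=1000, replace=False)
low_dim = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X[idx])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(low_dim)

print("agreement with ground truth (adjusted Rand):",
      adjusted_rand_score(ground_truth[idx], labels))
```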
204

Mesure de durée de vie de porteurs minoritaires dans les structures semiconductrices de basse dimensionnalité / Measurement of the lifetime and diffusion length of minority charge carriers in low dimensionality materials

Daanoune, Mehdi 03 February 2015 (has links)
The minority carrier lifetime is one of the main parameters used to assess semiconductor quality, and photoconductivity decay (PCD) is one of the most widely used lifetime characterization methods. Thanks to the variety of automated equipment now available, lifetime measurement has become a routine technique for judging the quality of a material in every sector that uses semiconductors. However, the micro- and nano-materials used in the photovoltaic and microelectronics industries require an adaptation of the existing methods (PCD, photoluminescence, etc.). Indeed, with reduced dimensions (ultrathin layers such as epitaxial layers, SOI "silicon on insulator" layers, nanostructures and nanowires), the influence of the surface (interface states, traps, etc.) becomes predominant, and the substrates used for material growth or layer transfer can also disturb the measurements. Consequently, traditional lifetime measurement methods such as photoconductivity decay are difficult to apply to low-dimensional materials. This thesis focuses on adapting lifetime characterization techniques to low-dimensional materials (bulk, epitaxial layers, silicon on insulator, and nanowires). We first characterized bulk samples and epitaxial layers (with thicknesses around 50 µm) by photoluminescence. We developed a method to determine simultaneously the bulk lifetime and the surface recombination velocity of minority carriers in an epitaxial layer from room-temperature photoluminescence measurements. The procedure consists in measuring the photoluminescence intensity ratio at different incident laser wavelengths and excitation powers; these experimental ratios are then compared with simulated ratios, which allows the surface recombination velocity and the bulk lifetime to be evaluated. We then investigated SOI (silicon on insulator) structures with ultrathin semiconductor layers of the order of 100 nanometres. After a brief description of the manufacturing methods and some of their uses, we analyzed the existing electrical methods used to evaluate the quality of SOI substrates. This led us to propose a new characterization method that overcomes the limitations of these techniques. The method is based on a current-voltage measurement in the dark and under illumination in the PSEUDO-MOSFET configuration, where the substrate of the SOI structure serves as the transistor gate and two probes placed on the silicon film serve as source and drain. We applied this new method to characterize the carrier lifetime of an SOI substrate and, with the help of numerical simulation, were able to explain the recombination mechanisms at the interfaces and extract the associated parameters. Finally, the last part of this work concerns the study of nanowires for photovoltaic applications. In nanowires, the surface-to-volume ratio increases considerably, which reduces the effective lifetime because of the increased influence of the surfaces; the operation of the nanowire-based solar cells we studied therefore depends strongly on the quality of the interfaces. We studied the minority carrier lifetime in core-shell nanowire-based solar cells under dark conditions with a purely electrical approach, the reverse recovery transient (RRT) method, which exploits the proportionality between the amount of charge stored in the neutral regions of a biased pn junction and the minority carrier lifetime. Because this type of structure is rather complex, numerical simulations were used to analyze the recombination phenomena within the solar cell, validate the hypotheses used for parameter extraction, and extract the interface defect densities.
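The fitting logic of the lifetime-extraction methods above can be illustrated with a short sketch. The thesis's actual forward model (simulated photoluminescence intensity ratios versus excitation wavelength and power) is not reproduced here; as a stand-in with the same structure (matching measurements against a model over a grid of candidate bulk-lifetime and surface-recombination-velocity values), the sketch uses the common approximation 1/tau_eff = 1/tau_bulk + 2S/W for a wafer of thickness W, with made-up measurements.

```python
import numpy as np

def tau_eff(tau_bulk_s, S_cm_s, W_cm):
    """Effective lifetime from bulk recombination plus two identical surfaces
    (common low-S approximation: 1/tau_eff = 1/tau_bulk + 2*S/W)."""
    return 1.0 / (1.0 / tau_bulk_s + 2.0 * S_cm_s / W_cm)

# Hypothetical effective lifetimes (seconds) measured on two sample thicknesses (cm).
thicknesses = np.array([50e-4, 200e-4])          # 50 um and 200 um
measured = np.array([2.1e-6, 6.0e-6])

# Brute-force grid search over candidate (bulk lifetime, surface recombination velocity) pairs.
tau_grid = np.logspace(-6, -4, 200)              # 1 us .. 100 us
S_grid = np.logspace(1, 4, 200)                  # 10 .. 10^4 cm/s
best = None
for tb in tau_grid:
    for S in S_grid:
        err = np.sum((tau_eff(tb, S, thicknesses) - measured) ** 2)
        if best is None or err < best[0]:
            best = (err, tb, S)

_, tb_fit, S_fit = best
print(f"fitted bulk lifetime ~ {tb_fit*1e6:.1f} us, surface recombination ~ {S_fit:.0f} cm/s")
```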
205

Assessing Dimensionality in Complex Data Structures: A Performance Comparison of DETECT and NOHARM Procedures

January 2011 (has links)
abstract: The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in compensatory and noncompensatory multidimensional item response theory (MIRT) models of assessment data, using dimensionality assessment procedures based on conditional covariances (i.e., DETECT) and a factor-analytic approach (i.e., NOHARM). The DETECT-based methods typically outperformed the NOHARM-based methods in both two-dimensional (2D) and three-dimensional (3D) compensatory MIRT conditions. The DETECT-based methods yielded a high proportion correct, especially when correlations were .60 or smaller, data exhibited 30% or less complexity, and sample sizes were larger. As the complexity increased and the sample size decreased, the performance typically diminished. As the complexity increased, it also became more difficult to label the resulting sets of items from DETECT in terms of the dimensions. DETECT was consistent in the classification of simple items, but less consistent in the classification of complex items. Of the three NOHARM-based methods, χ²G/D and ALR generally outperformed RMSR. χ²G/D was more accurate when N = 500 and complexity levels were 30% or lower. As the number of items increased, ALR performance improved at a correlation of .60 and 30% or less complexity. When the data followed a noncompensatory MIRT model, the NOHARM-based methods, specifically χ²G/D and ALR, were the most accurate of all five methods. The marginal proportions for labeling sets of items as dimension-like were typically low, suggesting that the methods generally failed to label two (three) sets of items as dimension-like in 2D (3D) noncompensatory situations. The DETECT-based methods were more consistent in classifying simple items across complexity levels, sample sizes, and correlations. However, as complexity and correlation levels increased, the classification rates for all methods decreased. In most conditions, the DETECT-based methods classified complex items as consistently as or more consistently than the NOHARM-based methods. In particular, as complexity, the number of items, and the true dimensionality increased, the DETECT-based methods were notably more consistent than any NOHARM-based method. Despite DETECT's consistency, when data follow a noncompensatory MIRT model the NOHARM-based methods should be preferred over the DETECT-based methods for assessing dimensionality, owing to DETECT's poor performance in identifying the true dimensionality. / Dissertation/Thesis / Ph.D. Educational Psychology 2011
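For concreteness, a short sketch of how item responses under a compensatory two-dimensional MIRT model with a given proportion of complex (cross-loading) items might be simulated. All parameter values, sample sizes, and loading patterns below are invented, and the DETECT and NOHARM procedures themselves are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 500, 20

# Two correlated latent traits (compensatory 2D MIRT); correlation assumed to be 0.6.
rho = 0.6
theta = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n_persons)

# Discrimination matrix: first 8 items load on trait 1, next 8 on trait 2,
# last 4 are "complex" items loading on both (20% complexity; values are invented).
a = np.zeros((n_items, 2))
a[:8, 0] = 1.2
a[8:16, 1] = 1.2
a[16:, :] = 0.8
d = rng.uniform(-1, 1, size=n_items)             # item intercepts

# Compensatory model: P(X = 1) = logistic(a1*theta1 + a2*theta2 + d).
z = theta @ a.T + d
p = 1.0 / (1.0 + np.exp(-z))
X = (rng.random((n_persons, n_items)) < p).astype(int)

print("simulated response matrix:", X.shape, "proportion correct:", X.mean().round(2))
```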
206

Classificação de dados imagens em alta dimensionalidade, empregando amostras semi-rotuladas e estimadores para as probabilidades a priori / Classification of high dimensionality image data, using semilabeled samples and estimation of the a priori probabilities

Liczbinski, Celso Antonio January 2007 (has links)
In natural scenes, spectrally very similar land-cover classes, that is, classes whose mean vectors are nearly identical, occur with some frequency. In these cases, low-dimensional data from the more traditional sensor systems, such as Landsat-TM and SPOT, do not allow an accurate classification of the scene. On the other hand, it is well known that high-dimensional image data allow the separation of classes that are spectrally very similar, provided that their covariance matrices differ sufficiently. The classification of high-dimensional image data, however, poses new problems, such as the estimation of the parameters required by a parametric classifier. As the data dimensionality increases, so does the number of parameters to be estimated, particularly in the covariance matrix. In real cases, however, the number of available training samples is usually very limited, preventing a reliable estimation of the parameters required by the classifier. This paucity of training samples degrades the accuracy of the resulting thematic image, and the degradation becomes more noticeable as the data dimensionality increases. This condition is known as the Hughes Phenomenon, and it is well documented in the literature, where different approaches to mitigate it have been investigated. Among the proposed alternatives, techniques that use unlabeled and semi-labeled samples to compensate for the small number of training samples have shown promising results in the classification of high-dimensional remote sensing image data, such as data provided by the AVIRIS sensor. This study continues the methodology investigated by Lemos (2003), who used semi-labeled samples to estimate the parameters of the Gaussian Maximum Likelihood (GML) classifier. The contribution of the present work is an additional step that estimates the a priori probabilities P(wi) of the classes involved for use in the GML classifier. By using decision functions better adjusted to the scene under analysis, more accurate classification results were obtained. The results show that, with a limited number of training samples, techniques based on adaptive algorithms are effective in reducing the Hughes Phenomenon, and in all cases the quadratic model proved effective through the adaptive algorithm. The main conclusion of this dissertation is that the adaptive-algorithm method is useful for classifying images with high-dimensional data and classes with very similar spectral characteristics.
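A minimal sketch of where the estimated a priori probabilities P(wi) enter the Gaussian Maximum Likelihood decision rule discussed above. The adaptive, semi-labeled estimation loop of the dissertation is not reproduced; the class statistics, priors, and toy three-band data below are assumptions (real use would involve hundreds of AVIRIS bands).

```python
import numpy as np

def gml_discriminant(x, mean, cov, prior):
    """Gaussian ML discriminant g_i(x) = ln P(wi) - 0.5 ln|S_i| - 0.5 (x - m_i)' S_i^{-1} (x - m_i)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)

def classify(x, means, covs, priors):
    scores = [gml_discriminant(x, m, c, p) for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

# Toy two-class example: nearly identical means, distinct covariances, estimated priors.
rng = np.random.default_rng(0)
means = [np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.1, 0.1])]
covs = [np.eye(3), np.diag([2.0, 0.5, 1.5])]
priors = [0.7, 0.3]                              # a priori probabilities P(wi), assumed estimated

x = rng.multivariate_normal(means[1], covs[1])
print("assigned class:", classify(x, means, covs, priors))
```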
207

Efficient Bayesian Inference for Multivariate Factor Stochastic Volatility Models

Kastner, Gregor, Frühwirth-Schnatter, Sylvia, Lopes, Hedibert Freitas 24 February 2016 (has links) (PDF)
We discuss efficient Bayesian estimation of dynamic covariance matrices in multivariate time series through a factor stochastic volatility model. In particular, we propose two interweaving strategies (Yu and Meng, Journal of Computational and Graphical Statistics, 20(3), 531-570, 2011) to substantially accelerate convergence and mixing of standard MCMC approaches. Similar to marginal data augmentation techniques, the proposed acceleration procedures exploit non-identifiability issues which frequently arise in factor models. Our new interweaving strategies are easy to implement and come at almost no extra computational cost; nevertheless, they can boost estimation efficiency by several orders of magnitude as is shown in extensive simulation studies. To conclude, the application of our algorithm to a 26-dimensional exchange rate data set illustrates the superior performance of the new approach for real-world data. / Series: Research Report Series / Department of Statistics and Mathematics
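The interweaving MCMC sampler itself is beyond a short sketch (a full implementation exists in R, e.g. the factorstochvol package), but the factor stochastic volatility model being estimated can be written down compactly. The sketch below simulates data from such a model with assumed dimensions and parameter values and evaluates the implied time-varying covariance matrix; it is illustrative only, not the authors' estimation code.

```python
import numpy as np

rng = np.random.default_rng(42)
T, m, r = 300, 4, 1          # time points, observed series, latent factors (sizes assumed)

def sv_path(T, mu, phi, sigma, rng):
    """AR(1) log-volatility path: h_t = mu + phi*(h_{t-1} - mu) + sigma*eta_t."""
    h = np.empty(T)
    h[0] = mu + sigma / np.sqrt(1 - phi**2) * rng.standard_normal()
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma * rng.standard_normal()
    return h

# Idiosyncratic and factor log-volatilities, each following its own SV process.
h_idio = np.stack([sv_path(T, -1.0, 0.95, 0.2, rng) for _ in range(m)], axis=1)
h_fact = np.stack([sv_path(T, 0.0, 0.98, 0.3, rng) for _ in range(r)], axis=1)

Lambda = rng.normal(0, 0.5, size=(m, r))                 # factor loadings (assumed values)
f = rng.standard_normal((T, r)) * np.exp(h_fact / 2)     # latent factors
eps = rng.standard_normal((T, m)) * np.exp(h_idio / 2)   # idiosyncratic errors
y = f @ Lambda.T + eps                                   # observed multivariate series

# Implied time-varying covariance at time t: Lambda V_t Lambda' + diag(exp(h_idio[t])).
t = T - 1
Sigma_t = Lambda @ np.diag(np.exp(h_fact[t])) @ Lambda.T + np.diag(np.exp(h_idio[t]))
print("conditional covariance at final time point:\n", Sigma_t.round(3))
```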
208

Développement d'outils statistiques pour l'analyse de données transcriptomiques par les réseaux de co-expression de gènes / A systemic approach to statistical analysis of transcriptomic data through co-expression network analysis

Brunet, Anne-Claire 17 June 2016 (has links)
Today, new biotechnologies offer the opportunity to collect a large variety and volume of biological data (genomic, proteomic, metagenomic, ...), opening up new avenues of research into biological processes. In this thesis we are specifically interested in transcriptomic data, which characterize the activity or expression level of several tens of thousands of genes in a given cell. The aim was to propose statistical tools suited to analysing these data, which raise "high-dimension" problems (n << p) because they are collected on very small samples relative to the very large number of variables (here, gene expression). The first part of the thesis is devoted to supervised learning methods, such as Breiman's random forests and penalized regression models, used in the high-dimensional setting to select the genes (expression variables) that are most relevant to the pathology under study. We discuss the limits of these methods for selecting genes that are relevant not only statistically but also biologically, in particular when selecting within groups of highly correlated variables, that is, within groups of co-expressed genes. Classical supervised learning methods assume that each gene can act in isolation in the model, which is unrealistic in practice: an observable biological trait is the result of a set of reactions within a complex system in which genes interact with one another, and genes involved in the same biological function tend to be co-expressed (correlated expression). In a second part, we therefore turn to gene co-expression networks, in which two genes are linked if they are co-expressed. More precisely, we seek to identify communities of genes on these networks, that is, groups of co-expressed genes, and then to select the communities that are most relevant to the pathology, as well as the "key genes" of these communities. This favours biological interpretation, because a community of co-expressed genes can often be associated with a biological function. We propose an original and efficient approach that simultaneously addresses the modelling of the gene co-expression network and the detection of gene communities on that network. We demonstrate the performance of our approach by comparing it with existing methods that are popular for analysing gene co-expression networks (WGCNA and spectral methods). Finally, through the analysis of a real data set, the last part of the thesis shows that the proposed approach yields biologically convincing results that are more amenable to interpretation and more robust than those obtained with classical supervised learning methods.
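The thesis's own network-modelling-plus-community-detection approach is not detailed in the abstract; the sketch below only illustrates the generic gene co-expression pipeline it is compared against (correlation matrix, thresholded network, community detection, and "key genes" picked by connectivity), on invented expression data with an arbitrary threshold.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy expression matrix: 30 samples x 12 genes, with two built-in co-expressed blocks.
rng = np.random.default_rng(0)
base1, base2 = rng.standard_normal(30), rng.standard_normal(30)
expr = np.column_stack(
    [base1 + 0.3 * rng.standard_normal(30) for _ in range(6)]
    + [base2 + 0.3 * rng.standard_normal(30) for _ in range(6)]
)
genes = [f"g{i}" for i in range(12)]

# Co-expression network: connect genes whose absolute correlation exceeds a threshold.
corr = np.corrcoef(expr, rowvar=False)
G = nx.Graph()
G.add_nodes_from(genes)
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        if abs(corr[i, j]) > 0.6:                       # threshold is an arbitrary choice
            G.add_edge(genes[i], genes[j], weight=abs(corr[i, j]))

# Communities of co-expressed genes, and a "key gene" per community (highest degree).
communities = greedy_modularity_communities(G, weight="weight")
for k, comm in enumerate(communities):
    key_gene = max(comm, key=G.degree)
    print(f"community {k}: {sorted(comm)}  key gene: {key_gene}")
```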
209

Dimensionality Reduction for Commercial Vehicle Fleet Monitoring

Baldiwala, Aliakbar 25 October 2018 (has links)
A variety of new features have been added to present-day vehicles, such as pre-crash warning, vehicle-to-vehicle communication, semi-autonomous driving systems, telematics, and drive-by-wire. They demand very high bandwidth from in-vehicle networks. The various electronic control units inside the vehicle transmit useful information via automotive multiplexing, which allows information to be shared among the intelligent modules of an automotive electronic system; optimum functionality requires this data to be transmitted in real time. The high-bandwidth, high-speed requirement can be met either by using multiple buses or by implementing a higher-bandwidth bus, but both options increase the cost of the network and the complexity of the in-vehicle wiring. Another option is to implement a higher-layer protocol that reduces the amount of data transferred by using data reduction (DR) techniques, thus reducing bandwidth usage. The implementation cost is minimal, since only software changes are required, not hardware changes. In this work, we present a new data reduction algorithm termed the Comprehensive Data Reduction (CDR) algorithm. The proposed algorithm is used to minimize CAN bus utilization for a future vehicle. The bus load is reduced by compressing the transmitted parameters, so that more messages, including lower-priority messages, can be sent efficiently on the CAN bus. The work also presents a performance analysis comparing the proposed algorithm with the boundary-of-fifteen compression algorithm and with compression area selection algorithms (existing data reduction algorithms). The results of the analysis show that the proposed CDR algorithm provides better data reduction than the earlier algorithms, with promising results in terms of reduced bus utilization, compression efficiency, and percent peak load of the CAN bus. This reduction in bus utilization makes it possible to add more network nodes (ECUs) to the existing system without increasing its overall cost. The proposed algorithm has been developed for the automotive environment, but it can also be used in any application where extensive information is transmitted among control units over a multiplexed bus.
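The CDR algorithm itself is not spelled out in the abstract; the sketch below only illustrates the generic idea behind CAN data reduction schemes: send a small bitmask of changed signal bytes plus only those bytes, instead of resending the full 8-byte frame. All frame contents below are invented.

```python
def compress_frame(prev: bytes, curr: bytes) -> bytes:
    """Encode curr relative to prev as [change bitmask][changed bytes...]."""
    mask = 0
    payload = bytearray()
    for i, (p, c) in enumerate(zip(prev, curr)):
        if p != c:
            mask |= 1 << i
            payload.append(c)
    return bytes([mask]) + bytes(payload)

def decompress_frame(prev: bytes, packet: bytes) -> bytes:
    """Rebuild the current frame from the previous frame and the reduced packet."""
    mask, payload = packet[0], packet[1:]
    out, k = bytearray(prev), 0
    for i in range(len(prev)):
        if mask & (1 << i):
            out[i] = payload[k]
            k += 1
    return bytes(out)

prev = bytes([0x12, 0x34, 0x56, 0x78, 0x00, 0x00, 0x00, 0xFF])
curr = bytes([0x12, 0x35, 0x56, 0x78, 0x00, 0x00, 0x01, 0xFF])   # two bytes changed
packet = compress_frame(prev, curr)
assert decompress_frame(prev, packet) == curr
print(f"full frame: {len(curr)} bytes, reduced frame: {len(packet)} bytes")
```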
210

Representações textuais e a geração de hubs : um estudo comparativo / Textual representations and the generation of hubs: a comparative study

Aguiar, Raul Freire January 2017 (has links)
Advisor: Prof. Dr. Ronaldo Pratti / Master's dissertation - Universidade Federal do ABC, Graduate Program in Computer Science, 2017. / The hubness phenomenon, together with the curse of dimensionality, has been studied from different perspectives in recent years. These studies show that the problem is present in several real-world data sets and that the presence of hubs (the tendency of some examples to appear frequently in the nearest-neighbour lists of other examples) has a number of undesirable consequences, such as degrading classifier performance and increasing misclassification error. In text mining tasks, the problem also depends on the representation chosen for the documents. The main objective of this dissertation is therefore to evaluate the impact of hub formation in different textual representations. To the best of our knowledge, and during the period of this research, no in-depth study of the effects of hubness on different textual representations could be found in the literature. The results suggest that different textual representations lead to different propensities for hub formation in a corpus. It was also noticed that the incidence of hubs in the different textual representations has a similar influence on some classifiers. We also analyzed classifier performance after removing documents flagged as hubs in pre-established proportions of the total data set size. For some algorithms, this removal tended to improve performance. Thus, although not always effective, the strategy of identifying and removing hubs with a mostly bad neighbourhood can be an interesting preprocessing technique to consider for improving the predictive performance of the text classification task.
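A short sketch of the standard hubness diagnostics referred to above: the k-occurrence count N_k(x) (how often a point appears in other points' k-nearest-neighbour lists), a simple hub cutoff, and identification of "bad" hubs whose label disagrees with most of the points that list them. The synthetic data, the value of k, and the mean-plus-two-standard-deviations cutoff are assumptions, not values taken from the dissertation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy feature matrix standing in for a document-term representation; labels are invented.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.integers(0, 2, size=200)

# k-occurrence N_k(x): how often each point appears in other points' k-nearest-neighbour lists.
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)
neighbours = idx[:, 1:]                       # drop each point itself
N_k = np.bincount(neighbours.ravel(), minlength=len(X))

# Hubs: points whose k-occurrence is unusually high (mean + 2 std is one simple cutoff).
hub_mask = N_k > N_k.mean() + 2 * N_k.std()

# "Bad" hubs: hubs whose label disagrees with most of the points that list them as a neighbour.
bad_hubs = []
for h in np.where(hub_mask)[0]:
    reverse = np.where((neighbours == h).any(axis=1))[0]
    if (y[reverse] != y[h]).mean() > 0.5:
        bad_hubs.append(h)

print(f"{hub_mask.sum()} hubs found, of which {len(bad_hubs)} have a mostly bad neighbourhood")
```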
