About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Acquiring symbolic design optimization problem reformulation knowledge: On computable relationships between design syntax and semantics

Sarkar, Somwrita January 2009 (has links)
Doctor of Philosophy (PhD) / This thesis presents a computational method for the inductive inference of explicit and implicit semantic design knowledge from the symbolic-mathematical syntax of design formulations, using an unsupervised pattern recognition and extraction approach. Existing research shows that AI and machine-learning-based design computation approaches require either high levels of knowledge engineering or large training databases to acquire problem reformulation knowledge. The method presented in this thesis addresses these methodological limitations. The thesis develops, tests, and evaluates ways in which the method may be employed for design problem reformulation. The method is based on Singular Value Decomposition (SVD), a linear-algebra-based factorization method, combined with dimensionality reduction and similarity measurement through unsupervised clustering. The method computes linear approximations of the associative patterns of symbol co-occurrences in a design problem representation to infer induced coupling strengths between variables, constraints, and system components. Unsupervised clustering of these approximations is used to identify useful reformulations. These two components of the method automate a range of reformulation tasks that have traditionally required different solution algorithms. Example reformulation tasks that it performs include selection of linked design variables, parameters, and constraints; design decomposition; modularity and integrative systems analysis; heuristically aiding design “case” identification; topology modeling; and layout planning. The relationship between the syntax of a design representation and the semantic meaning it encodes is an open design theory research question. Based on the results of the method, the thesis presents a set of theoretical postulates on computable relationships between design syntax and semantics. The postulates relate the performance of the method to empirical findings and theoretical insights from cognitive neuroscience and cognitive science on how the human mind engages in symbol processing, and on the resulting capacities of symbolic representational systems to encode “meaning”. The performance of the method suggests that semantic “meaning” is a higher-order, global phenomenon that lies distributed in the design representation in explicit and implicit ways. A one-to-one local mapping between a design symbol and its meaning, the approach adopted by many AI and learning algorithms, may not be sufficient to capture and represent this meaning. By changing the theoretical standpoint on how a “symbol” is defined in design representations, it was possible to use a simple set of mathematical ideas to perform unsupervised inductive inference of knowledge in a knowledge-lean and training-lean manner, in a knowledge domain that traditionally relies on “giving” the system complex design domain and task knowledge to perform the same set of tasks.
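As an illustration of the general idea described above (factor a symbol co-occurrence pattern with SVD, then cluster the reduced representation to suggest a problem decomposition), here is a minimal Python sketch. The incidence matrix, variable names, retained rank, and cluster count are all invented for illustration; this is not the author's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical occurrence matrix: rows = constraints, columns = design variables.
# A[i, j] = 1 if variable j appears in constraint i (values here are made up).
A = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)
variables = ["x1", "x2", "x3", "x4", "x5", "x6"]

# Low-rank approximation of the symbol co-occurrence pattern via SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                # number of retained singular values (assumed)
V_k = Vt[:k].T * s[:k]               # variables embedded in the reduced space

# Unsupervised clustering of the embedded variables suggests a decomposition
# into weakly coupled groups of variables and constraints.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(V_k)
for var, lab in zip(variables, labels):
    print(f"{var} -> subproblem {lab}")
```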
203

Τμηματοποίηση εικόνων υφής με χρήση πολυφασματικής ανάλυσης και ελάττωσης διαστάσεων / Texture image segmentation using multispectral analysis and dimensionality reduction

Θεοδωρακόπουλος, Ηλίας 16 June 2010 (has links)
Texture segmentation is the process of partitioning an image into multiple segments (regions) based on the texture of each region, with many applications in computer vision, image retrieval, robotics, satellite image analysis, and related fields. The objective of this thesis is to investigate the ability of non-linear dimensionality reduction algorithms, and in particular the Laplacian Eigenmaps algorithm, to produce an efficient representation of data derived from multispectral image analysis with Gabor filters, for solving the texture segmentation problem. For this purpose, we introduce a new supervised texture segmentation method which exploits a low-dimensional representation of the feature vectors together with well-known clustering algorithms, such as Fuzzy C-means and K-means, to produce the final segmentation. The effectiveness of the method is compared with that of similar methods proposed in the literature, which use the original high-dimensional representation of the feature vectors. Experiments were performed on the Brodatz texture database. During evaluation, the Rand index was used as a similarity measure between each produced segmentation and the corresponding ground-truth segmentation.
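A minimal Python sketch of the kind of pipeline the abstract describes: a Gabor filter bank for multispectral texture features, Laplacian Eigenmaps (scikit-learn's SpectralEmbedding) for non-linear dimensionality reduction, and K-means for the final grouping. The synthetic image, filter-bank parameters, and subsampling are assumptions, and the adjusted Rand index is used here in place of the plain Rand index reported in the thesis.

```python
import numpy as np
from skimage.filters import gabor
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic two-texture image (left/right halves); a real experiment would use Brodatz mosaics.
rng = np.random.default_rng(0)
img = np.hstack([rng.normal(0, 1, (64, 32)), rng.normal(0, 3, (64, 32))])
ground_truth = np.hstack([np.zeros((64, 32)), np.ones((64, 32))]).ravel()

# Multi-channel Gabor features: one response magnitude per (frequency, orientation).
feats = []
for frequency in (0.1, 0.2, 0.4):                  # assumed filter-bank parameters
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, imag = gabor(img, frequency=frequency, theta=theta)
        feats.append(np.hypot(real, imag).ravel())
X = np.stack(feats, axis=1)                        # (n_pixels, n_filters)

# Subsample pixels for speed, embed with Laplacian Eigenmaps, then cluster.
idx = rng.choice(X.shape[0], size=1000, replace=False)
low_dim = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X[idx])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(low_dim)

print("agreement with ground truth (adjusted Rand):",
      adjusted_rand_score(ground_truth[idx], labels))
```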
204

Mesure de durée de vie de porteurs minoritaires dans les structures semiconductrices de basse dimensionnalité / Measurement of the lifetime and diffusion length of minority charge carriers in low dimensionality materials

Daanoune, Mehdi 03 February 2015 (has links)
The minority carrier lifetime is one of the main parameters used to assess semiconductor quality, and photoconductivity decay (PCD) is one of the most widely used lifetime characterization methods. Thanks to the variety of automated equipment now available, lifetime measurement has become a routine technique for judging the quality of a material in every sector that uses semiconductors. However, the micro- and nano-materials used in the photovoltaic and microelectronics industries require an adaptation of the existing methods (PCD, photoluminescence, etc.). Indeed, with reduced dimensions (ultrathin layers such as epitaxial layers, SOI "silicon on insulator" layers, nanostructures and nanowires), the influence of the surface (interface states, traps, etc.) becomes predominant, and the substrates used for material growth or layer transfer can also disturb the measurements. Consequently, traditional lifetime measurement methods such as photoconductivity decay are difficult to apply to low-dimensional materials. This thesis focuses on adapting lifetime characterization techniques to low-dimensional materials (bulk, epitaxial layers, silicon on insulator, and nanowires). We first characterized bulk samples and epitaxial layers (with thicknesses around 50 µm) by photoluminescence. We developed a method to determine simultaneously the bulk lifetime and the surface recombination velocity of minority carriers in an epitaxial layer from room-temperature photoluminescence measurements. The procedure consists in measuring the photoluminescence intensity ratio at different incident laser wavelengths and excitation powers; these experimental ratios are then compared with simulated ratios, which allows the surface recombination velocity and the bulk lifetime to be evaluated. We then investigated SOI (silicon on insulator) structures with ultrathin semiconductor layers of the order of 100 nanometres. After a brief description of the manufacturing methods and some of their uses, we analyzed the existing electrical methods used to evaluate the quality of SOI substrates. This led us to propose a new characterization method that overcomes the limitations of these techniques. The method is based on a current-voltage measurement in the dark and under illumination in the PSEUDO-MOSFET configuration, where the substrate of the SOI structure serves as the transistor gate and two probes placed on the silicon film serve as source and drain. We applied this new method to characterize the carrier lifetime of an SOI substrate and, with the help of numerical simulation, were able to explain the recombination mechanisms at the interfaces and extract the associated parameters. Finally, the last part of this work concerns the study of nanowires for photovoltaic applications. In nanowires, the surface-to-volume ratio increases considerably, which reduces the effective lifetime because of the increased influence of the surfaces; the operation of the nanowire-based solar cells we studied therefore depends strongly on the quality of the interfaces. We studied the minority carrier lifetime in core-shell nanowire-based solar cells under dark conditions with a purely electrical approach, the reverse recovery transient (RRT) method, which exploits the proportionality between the amount of charge stored in the neutral regions of a biased pn junction and the minority carrier lifetime. Because this type of structure is rather complex, numerical simulations were used to analyze the recombination phenomena within the solar cell, validate the hypotheses used for parameter extraction, and extract the interface defect densities.
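The fitting logic of the lifetime-extraction methods above can be illustrated with a short sketch. The thesis's actual forward model (simulated photoluminescence intensity ratios versus excitation wavelength and power) is not reproduced here; as a stand-in with the same structure (matching measurements against a model over a grid of candidate bulk-lifetime and surface-recombination-velocity values), the sketch uses the common approximation 1/tau_eff = 1/tau_bulk + 2S/W for a wafer of thickness W, with made-up measurements.

```python
import numpy as np

def tau_eff(tau_bulk_s, S_cm_s, W_cm):
    """Effective lifetime from bulk recombination plus two identical surfaces
    (common low-S approximation: 1/tau_eff = 1/tau_bulk + 2*S/W)."""
    return 1.0 / (1.0 / tau_bulk_s + 2.0 * S_cm_s / W_cm)

# Hypothetical effective lifetimes (seconds) measured on two sample thicknesses (cm).
thicknesses = np.array([50e-4, 200e-4])          # 50 um and 200 um
measured = np.array([2.1e-6, 6.0e-6])

# Brute-force grid search over candidate (bulk lifetime, surface recombination velocity) pairs.
tau_grid = np.logspace(-6, -4, 200)              # 1 us .. 100 us
S_grid = np.logspace(1, 4, 200)                  # 10 .. 10^4 cm/s
best = None
for tb in tau_grid:
    for S in S_grid:
        err = np.sum((tau_eff(tb, S, thicknesses) - measured) ** 2)
        if best is None or err < best[0]:
            best = (err, tb, S)

_, tb_fit, S_fit = best
print(f"fitted bulk lifetime ~ {tb_fit*1e6:.1f} us, surface recombination ~ {S_fit:.0f} cm/s")
```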
205

Assessing Dimensionality in Complex Data Structures: A Performance Comparison of DETECT and NOHARM Procedures

January 2011 (has links)
abstract: The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in compensatory and noncompensatory multidimensional item response theory (MIRT) models of assessment data, using dimensionality assessment procedures based on conditional covariances (i.e., DETECT) and a factor-analytic approach (i.e., NOHARM). The DETECT-based methods typically outperformed the NOHARM-based methods in both two-dimensional (2D) and three-dimensional (3D) compensatory MIRT conditions. The DETECT-based methods yielded a high proportion correct, especially when correlations were .60 or smaller, data exhibited 30% or less complexity, and sample sizes were larger. As the complexity increased and the sample size decreased, the performance typically diminished. As the complexity increased, it also became more difficult to label the resulting sets of items from DETECT in terms of the dimensions. DETECT was consistent in the classification of simple items, but less consistent in the classification of complex items. Of the three NOHARM-based methods, χ²G/D and ALR generally outperformed RMSR. χ²G/D was more accurate when N = 500 and complexity levels were 30% or lower. As the number of items increased, ALR performance improved at a correlation of .60 and 30% or less complexity. When the data followed a noncompensatory MIRT model, the NOHARM-based methods, specifically χ²G/D and ALR, were the most accurate of all five methods. The marginal proportions for labeling sets of items as dimension-like were typically low, suggesting that the methods generally failed to label two (three) sets of items as dimension-like in 2D (3D) noncompensatory situations. The DETECT-based methods were more consistent in classifying simple items across complexity levels, sample sizes, and correlations. However, as complexity and correlation levels increased, the classification rates for all methods decreased. In most conditions, the DETECT-based methods classified complex items as consistently as or more consistently than the NOHARM-based methods. In particular, as complexity, the number of items, and the true dimensionality increased, the DETECT-based methods were notably more consistent than any NOHARM-based method. Despite DETECT's consistency, when data follow a noncompensatory MIRT model the NOHARM-based methods should be preferred over the DETECT-based methods for assessing dimensionality, owing to DETECT's poor performance in identifying the true dimensionality. / Dissertation/Thesis / Ph.D. Educational Psychology 2011
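For concreteness, a short sketch of how item responses under a compensatory two-dimensional MIRT model with a given proportion of complex (cross-loading) items might be simulated. All parameter values, sample sizes, and loading patterns below are invented, and the DETECT and NOHARM procedures themselves are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 500, 20

# Two correlated latent traits (compensatory 2D MIRT); correlation assumed to be 0.6.
rho = 0.6
theta = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n_persons)

# Discrimination matrix: first 8 items load on trait 1, next 8 on trait 2,
# last 4 are "complex" items loading on both (20% complexity; values are invented).
a = np.zeros((n_items, 2))
a[:8, 0] = 1.2
a[8:16, 1] = 1.2
a[16:, :] = 0.8
d = rng.uniform(-1, 1, size=n_items)             # item intercepts

# Compensatory model: P(X = 1) = logistic(a1*theta1 + a2*theta2 + d).
z = theta @ a.T + d
p = 1.0 / (1.0 + np.exp(-z))
X = (rng.random((n_persons, n_items)) < p).astype(int)

print("simulated response matrix:", X.shape, "proportion correct:", X.mean().round(2))
```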
206

Classificação de dados imagens em alta dimensionalidade, empregando amostras semi-rotuladas e estimadores para as probabilidades a priori / Classification of high dimensionality image data, using semilabeled samples and estimation of the a priori probabilities

Liczbinski, Celso Antonio January 2007 (has links)
In natural scenes, spectrally very similar land-cover classes, that is, classes whose mean vectors are nearly identical, occur with some frequency. In these cases, low-dimensional data from the more traditional sensor systems, such as Landsat-TM and SPOT, do not allow an accurate classification of the scene. On the other hand, it is well known that high-dimensional image data allow the separation of classes that are spectrally very similar, provided that their covariance matrices differ sufficiently. The classification of high-dimensional image data, however, poses new problems, such as the estimation of the parameters required by a parametric classifier. As the data dimensionality increases, so does the number of parameters to be estimated, particularly in the covariance matrix. In real cases, however, the number of available training samples is usually very limited, preventing a reliable estimation of the parameters required by the classifier. This paucity of training samples degrades the accuracy of the resulting thematic image, and the degradation becomes more noticeable as the data dimensionality increases. This condition is known as the Hughes Phenomenon, and it is well documented in the literature, where different approaches to mitigate it have been investigated. Among the proposed alternatives, techniques that use unlabeled and semi-labeled samples to compensate for the small number of training samples have shown promising results in the classification of high-dimensional remote sensing image data, such as data provided by the AVIRIS sensor. This study continues the methodology investigated by Lemos (2003), who used semi-labeled samples to estimate the parameters of the Gaussian Maximum Likelihood (GML) classifier. The contribution of the present work is an additional step that estimates the a priori probabilities P(wi) of the classes involved for use in the GML classifier. By using decision functions better adjusted to the scene under analysis, more accurate classification results were obtained. The results show that, with a limited number of training samples, techniques based on adaptive algorithms are effective in reducing the Hughes Phenomenon, and in all cases the quadratic model proved effective through the adaptive algorithm. The main conclusion of this dissertation is that the adaptive-algorithm method is useful for classifying images with high-dimensional data and classes with very similar spectral characteristics.
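A minimal sketch of where the estimated a priori probabilities P(wi) enter the Gaussian Maximum Likelihood decision rule discussed above. The adaptive, semi-labeled estimation loop of the dissertation is not reproduced; the class statistics, priors, and toy three-band data below are assumptions (real use would involve hundreds of AVIRIS bands).

```python
import numpy as np

def gml_discriminant(x, mean, cov, prior):
    """Gaussian ML discriminant g_i(x) = ln P(wi) - 0.5 ln|S_i| - 0.5 (x - m_i)' S_i^{-1} (x - m_i)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff)

def classify(x, means, covs, priors):
    scores = [gml_discriminant(x, m, c, p) for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

# Toy two-class example: nearly identical means, distinct covariances, estimated priors.
rng = np.random.default_rng(0)
means = [np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.1, 0.1])]
covs = [np.eye(3), np.diag([2.0, 0.5, 1.5])]
priors = [0.7, 0.3]                              # a priori probabilities P(wi), assumed estimated

x = rng.multivariate_normal(means[1], covs[1])
print("assigned class:", classify(x, means, covs, priors))
```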
207

Efficient Bayesian Inference for Multivariate Factor Stochastic Volatility Models

Kastner, Gregor, Frühwirth-Schnatter, Sylvia, Lopes, Hedibert Freitas 24 February 2016 (has links) (PDF)
We discuss efficient Bayesian estimation of dynamic covariance matrices in multivariate time series through a factor stochastic volatility model. In particular, we propose two interweaving strategies (Yu and Meng, Journal of Computational and Graphical Statistics, 20(3), 531-570, 2011) to substantially accelerate convergence and mixing of standard MCMC approaches. Similar to marginal data augmentation techniques, the proposed acceleration procedures exploit non-identifiability issues which frequently arise in factor models. Our new interweaving strategies are easy to implement and come at almost no extra computational cost; nevertheless, they can boost estimation efficiency by several orders of magnitude as is shown in extensive simulation studies. To conclude, the application of our algorithm to a 26-dimensional exchange rate data set illustrates the superior performance of the new approach for real-world data. / Series: Research Report Series / Department of Statistics and Mathematics
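The interweaving MCMC sampler itself is beyond a short sketch (a full implementation exists in R, e.g. the factorstochvol package), but the factor stochastic volatility model being estimated can be written down compactly. The sketch below simulates data from such a model with assumed dimensions and parameter values and evaluates the implied time-varying covariance matrix; it is illustrative only, not the authors' estimation code.

```python
import numpy as np

rng = np.random.default_rng(42)
T, m, r = 300, 4, 1          # time points, observed series, latent factors (sizes assumed)

def sv_path(T, mu, phi, sigma, rng):
    """AR(1) log-volatility path: h_t = mu + phi*(h_{t-1} - mu) + sigma*eta_t."""
    h = np.empty(T)
    h[0] = mu + sigma / np.sqrt(1 - phi**2) * rng.standard_normal()
    for t in range(1, T):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma * rng.standard_normal()
    return h

# Idiosyncratic and factor log-volatilities, each following its own SV process.
h_idio = np.stack([sv_path(T, -1.0, 0.95, 0.2, rng) for _ in range(m)], axis=1)
h_fact = np.stack([sv_path(T, 0.0, 0.98, 0.3, rng) for _ in range(r)], axis=1)

Lambda = rng.normal(0, 0.5, size=(m, r))                 # factor loadings (assumed values)
f = rng.standard_normal((T, r)) * np.exp(h_fact / 2)     # latent factors
eps = rng.standard_normal((T, m)) * np.exp(h_idio / 2)   # idiosyncratic errors
y = f @ Lambda.T + eps                                   # observed multivariate series

# Implied time-varying covariance at time t: Lambda V_t Lambda' + diag(exp(h_idio[t])).
t = T - 1
Sigma_t = Lambda @ np.diag(np.exp(h_fact[t])) @ Lambda.T + np.diag(np.exp(h_idio[t]))
print("conditional covariance at final time point:\n", Sigma_t.round(3))
```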
208

Développement d'outils statistiques pour l'analyse de données transcriptomiques par les réseaux de co-expression de gènes / A systemic approach to statistical analysis of transcriptomic data through co-expression network analysis

Brunet, Anne-Claire 17 June 2016 (has links)
Today, new biotechnologies offer the opportunity to collect a large variety and volume of biological data (genomic, proteomic, metagenomic, ...), opening up new avenues of research into biological processes. In this thesis we are specifically interested in transcriptomic data, which characterize the activity or expression level of several tens of thousands of genes in a given cell. The aim was to propose statistical tools suited to analysing these data, which raise "high-dimension" problems (n << p) because they are collected on very small samples relative to the very large number of variables (here, gene expression). The first part of the thesis is devoted to supervised learning methods, such as Breiman's random forests and penalized regression models, used in the high-dimensional setting to select the genes (expression variables) that are most relevant to the pathology under study. We discuss the limits of these methods for selecting genes that are relevant not only statistically but also biologically, in particular when selecting within groups of highly correlated variables, that is, within groups of co-expressed genes. Classical supervised learning methods assume that each gene can act in isolation in the model, which is unrealistic in practice: an observable biological trait is the result of a set of reactions within a complex system in which genes interact with one another, and genes involved in the same biological function tend to be co-expressed (correlated expression). In a second part, we therefore turn to gene co-expression networks, in which two genes are linked if they are co-expressed. More precisely, we seek to identify communities of genes on these networks, that is, groups of co-expressed genes, and then to select the communities that are most relevant to the pathology, as well as the "key genes" of these communities. This favours biological interpretation, because a community of co-expressed genes can often be associated with a biological function. We propose an original and efficient approach that simultaneously addresses the modelling of the gene co-expression network and the detection of gene communities on that network. We demonstrate the performance of our approach by comparing it with existing methods that are popular for analysing gene co-expression networks (WGCNA and spectral methods). Finally, through the analysis of a real data set, the last part of the thesis shows that the proposed approach yields biologically convincing results that are more amenable to interpretation and more robust than those obtained with classical supervised learning methods.
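The thesis's own network-modelling-plus-community-detection approach is not detailed in the abstract; the sketch below only illustrates the generic gene co-expression pipeline it is compared against (correlation matrix, thresholded network, community detection, and "key genes" picked by connectivity), on invented expression data with an arbitrary threshold.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy expression matrix: 30 samples x 12 genes, with two built-in co-expressed blocks.
rng = np.random.default_rng(0)
base1, base2 = rng.standard_normal(30), rng.standard_normal(30)
expr = np.column_stack(
    [base1 + 0.3 * rng.standard_normal(30) for _ in range(6)]
    + [base2 + 0.3 * rng.standard_normal(30) for _ in range(6)]
)
genes = [f"g{i}" for i in range(12)]

# Co-expression network: connect genes whose absolute correlation exceeds a threshold.
corr = np.corrcoef(expr, rowvar=False)
G = nx.Graph()
G.add_nodes_from(genes)
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        if abs(corr[i, j]) > 0.6:                       # threshold is an arbitrary choice
            G.add_edge(genes[i], genes[j], weight=abs(corr[i, j]))

# Communities of co-expressed genes, and a "key gene" per community (highest degree).
communities = greedy_modularity_communities(G, weight="weight")
for k, comm in enumerate(communities):
    key_gene = max(comm, key=G.degree)
    print(f"community {k}: {sorted(comm)}  key gene: {key_gene}")
```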
209

Dimensionality Reduction for Commercial Vehicle Fleet Monitoring

Baldiwala, Aliakbar 25 October 2018 (has links)
A variety of new features have been added to present-day vehicles, such as pre-crash warning, vehicle-to-vehicle communication, semi-autonomous driving systems, telematics, and drive-by-wire. They demand very high bandwidth from in-vehicle networks. The various electronic control units inside the vehicle transmit useful information via automotive multiplexing, which allows information to be shared among the intelligent modules of an automotive electronic system; optimum functionality requires this data to be transmitted in real time. The high-bandwidth, high-speed requirement can be met either by using multiple buses or by implementing a higher-bandwidth bus, but both options increase the cost of the network and the complexity of the in-vehicle wiring. Another option is to implement a higher-layer protocol that reduces the amount of data transferred by using data reduction (DR) techniques, thus reducing bandwidth usage. The implementation cost is minimal, since only software changes are required, not hardware changes. In this work, we present a new data reduction algorithm termed the Comprehensive Data Reduction (CDR) algorithm. The proposed algorithm is used to minimize CAN bus utilization for a future vehicle. The bus load is reduced by compressing the transmitted parameters, so that more messages, including lower-priority messages, can be sent efficiently on the CAN bus. The work also presents a performance analysis comparing the proposed algorithm with the boundary-of-fifteen compression algorithm and with compression area selection algorithms (existing data reduction algorithms). The results of the analysis show that the proposed CDR algorithm provides better data reduction than the earlier algorithms, with promising results in terms of reduced bus utilization, compression efficiency, and percent peak load of the CAN bus. This reduction in bus utilization makes it possible to add more network nodes (ECUs) to the existing system without increasing its overall cost. The proposed algorithm has been developed for the automotive environment, but it can also be used in any application where extensive information is transmitted among control units over a multiplexed bus.
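The CDR algorithm itself is not spelled out in the abstract; the sketch below only illustrates the generic idea behind CAN data reduction schemes: send a small bitmask of changed signal bytes plus only those bytes, instead of resending the full 8-byte frame. All frame contents below are invented.

```python
def compress_frame(prev: bytes, curr: bytes) -> bytes:
    """Encode curr relative to prev as [change bitmask][changed bytes...]."""
    mask = 0
    payload = bytearray()
    for i, (p, c) in enumerate(zip(prev, curr)):
        if p != c:
            mask |= 1 << i
            payload.append(c)
    return bytes([mask]) + bytes(payload)

def decompress_frame(prev: bytes, packet: bytes) -> bytes:
    """Rebuild the current frame from the previous frame and the reduced packet."""
    mask, payload = packet[0], packet[1:]
    out, k = bytearray(prev), 0
    for i in range(len(prev)):
        if mask & (1 << i):
            out[i] = payload[k]
            k += 1
    return bytes(out)

prev = bytes([0x12, 0x34, 0x56, 0x78, 0x00, 0x00, 0x00, 0xFF])
curr = bytes([0x12, 0x35, 0x56, 0x78, 0x00, 0x00, 0x01, 0xFF])   # two bytes changed
packet = compress_frame(prev, curr)
assert decompress_frame(prev, packet) == curr
print(f"full frame: {len(curr)} bytes, reduced frame: {len(packet)} bytes")
```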
210

Representações textuais e a geração de hubs : um estudo comparativo / Textual representations and the generation of hubs: a comparative study

Aguiar, Raul Freire January 2017 (has links)
Advisor: Prof. Dr. Ronaldo Pratti / Master's dissertation - Universidade Federal do ABC, Graduate Program in Computer Science, 2017. / The hubness phenomenon, together with the curse of dimensionality, has been studied from different perspectives in recent years. These studies show that the problem is present in several real-world data sets and that the presence of hubs (the tendency of some examples to appear frequently in the nearest-neighbour lists of other examples) has a number of undesirable consequences, such as degrading classifier performance and increasing misclassification error. In text mining tasks, the problem also depends on the representation chosen for the documents. The main objective of this dissertation is therefore to evaluate the impact of hub formation in different textual representations. To the best of our knowledge, and during the period of this research, no in-depth study of the effects of hubness on different textual representations could be found in the literature. The results suggest that different textual representations lead to different propensities for hub formation in a corpus. It was also noticed that the incidence of hubs in the different textual representations has a similar influence on some classifiers. We also analyzed classifier performance after removing documents flagged as hubs in pre-established proportions of the total data set size. For some algorithms, this removal tended to improve performance. Thus, although not always effective, the strategy of identifying and removing hubs with a mostly bad neighbourhood can be an interesting preprocessing technique to consider for improving the predictive performance of the text classification task.
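A short sketch of the standard hubness diagnostics referred to above: the k-occurrence count N_k(x) (how often a point appears in other points' k-nearest-neighbour lists), a simple hub cutoff, and identification of "bad" hubs whose label disagrees with most of the points that list them. The synthetic data, the value of k, and the mean-plus-two-standard-deviations cutoff are assumptions, not values taken from the dissertation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy feature matrix standing in for a document-term representation; labels are invented.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.integers(0, 2, size=200)

# k-occurrence N_k(x): how often each point appears in other points' k-nearest-neighbour lists.
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, idx = nn.kneighbors(X)
neighbours = idx[:, 1:]                       # drop each point itself
N_k = np.bincount(neighbours.ravel(), minlength=len(X))

# Hubs: points whose k-occurrence is unusually high (mean + 2 std is one simple cutoff).
hub_mask = N_k > N_k.mean() + 2 * N_k.std()

# "Bad" hubs: hubs whose label disagrees with most of the points that list them as a neighbour.
bad_hubs = []
for h in np.where(hub_mask)[0]:
    reverse = np.where((neighbours == h).any(axis=1))[0]
    if (y[reverse] != y[h]).mean() > 0.5:
        bad_hubs.append(h)

print(f"{hub_mask.sum()} hubs found, of which {len(bad_hubs)} have a mostly bad neighbourhood")
```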
