Global ETD Search

1	Nouveaux points de vue sur la classification hiérarchique et normalisation linguistique pour la segmentation et le regroupement en locuteurs / New insights into hierarchical clustering and linguistic normalization for speaker diarization Bozonnet, Simon 02 May 2012 (has links) Given the growing volume of audio and multimedia data, technologies for data indexing and content analysis have attracted considerable interest in the scientific community. Among them, speaker segmentation and clustering (speaker diarization), which answers the question 'Who spoke when?', has emerged as a leading technique in the speech processing community. Significant progress has been made in the field in recent years, driven mainly by the international NIST evaluations. Throughout these evaluations, two approaches have stood out: one bottom-up, the other top-down. The best-performing systems of recent years have essentially all been bottom-up, but we explain in this thesis that the top-down approach also has certain advantages. First, we show that after introducing a new cluster purification component into the top-down approach, we obtain performance comparable to that of the bottom-up approach. Moreover, by studying the two approaches in detail, we show that they behave differently with respect to speaker discrimination and robustness to the lexical component. These differences are then exploited through a new system combining the two approaches. Finally, we present a new technology capable of limiting the influence of the lexical component, a potential source of artifacts in speaker segmentation and clustering. 
Our new approach is called Phone Adaptive Training, by analogy with Speaker Adaptive Training / The ever-expanding volume of available audio and multimedia data has elevated technologies related to content indexing and structuring to the forefront of research. Speaker diarization, commonly referred to as the 'who spoke when?' task, is one such example and has emerged as a prominent, core enabling technology in the wider speech processing research community. Speaker diarization involves the detection of speaker turns within an audio document (segmentation) and the grouping together of all same-speaker segments (clustering). Much progress has been made in the field over recent years, partly spearheaded by the NIST Rich Transcription evaluations, which focus on the meeting domain and in whose proceedings two general approaches are found: top-down and bottom-up. Even though the best-performing systems over recent years have all been bottom-up approaches, we show in this thesis that the top-down approach is not without significant merit. Indeed, we first introduce a new purification component that makes the top-down approach competitive with the bottom-up approach. Moreover, while investigating the two diarization approaches more thoroughly, we show that they behave differently in discriminating between individual speakers and in normalizing unwanted acoustic variation, i.e. that which does not pertain to different speakers. This difference in behaviour leads to a new top-down/bottom-up system combination that outperforms the respective baseline systems. Finally, we introduce a new technology able to limit the influence of linguistic effects, which are responsible for biasing the convergence of the diarization system. Our novel approach is referred to as Phone Adaptive Training (PAT). Data partitioning Segmentation
2	Local independence in computed tomography as a basis for parallel computing Martin, Daniel Morris 14 September 2007 (has links) Iterative CT reconstruction algorithms are superior to the standard convolution backprojection (CBP) methods when reconstructing from a small number of views (hence less radiation), but are computationally costly. To reduce the execution time, this work implements and tests a parallel approach to iterative algorithms using a cluster of workstations, which is a low-cost system found in many offices and non-academic sites. A previous implementation showed little speedup because of the significant cost of inter-processor communication. In this thesis, several data partitioning methods are examined, including some image tiling methods that exploit the spatial locality demonstrated by local CT. Using these methods, computation can proceed locally, without the need for inter-processor communication during every iteration. A relative speedup of up to 17 times is obtained using 25 processors, demonstrating that good performance can be obtained running computationally intensive CT reconstruction algorithms on distributed-memory hardware. / October 2007 parallel computing computed tomography algebraic reconstruction technique data partitioning
3	Phylogenomics of the Flowering Plant Clade Malpighiales Xi, Zhenxiang January 2012 (has links) The angiosperm order Malpighiales includes \(\sim 16,000\) species and constitutes up to 40% of the understory tree diversity in tropical rain forests. Despite remarkable progress in angiosperm phylogenetics during the last 20 years, relationships within Malpighiales have remained poorly resolved, possibly due to its rapid rise during the mid-Cretaceous. Using phylogenomic approaches, including analyses of 82 plastid genes from 58 species, we identified 12 new clades in Malpighiales and substantially increased resolution along the backbone (Chapter 1). This greatly improved phylogeny revealed a dynamic history of shifts in net species’ diversification rates across Malpighiales, with bursts of diversification noted in the Barbados cherries (Malpighiaceae), cocas (Erythroxylaceae), and passion flowers (Passifloraceae). We also found that commonly used a priori approaches for partitioning data in similar large-scale analyses, by gene or by codon position, performed poorly relative to the use of partitions identified a posteriori using a Bayesian mixture model. Another aspect of my thesis focused on investigating horizontal gene transfer (HGT) in Malpighiales. Recent studies have suggested that plant genomes have undergone potentially rampant HGT. Parasitic plants have provided the strongest evidence of HGT, which appears to be facilitated by the intimate physical association between the parasites and their hosts. Using phylogenomic approaches, we analyzed the nuclear transcriptome (Chapter 2) and mitochondrial genome (Chapter 3) of the holoparasite Rafflesiaceae, which represents an enigmatic subclade of Malpighiales. Our analyses show that several dozen actively transcribed nuclear genes, and as many as 34–47% of its mitochondrial gene sequences, show evidence of HGT depending on the species. 
Some of these HGTs appear to have maintained synteny with their donor and recipient lineages suggesting that vertically inherited genes have likely been displaced via homologous recombination, as is common in bacteria. Finally, our results establish for the first time that although the magnitude of HGT involving nuclear genes is appreciable in these parasitic plants, HGT involving mitochondrial genes is substantially higher. Moreover, the elevated rate of unidirectional host-to-parasite gene transfer raises the possibility that HGTs may provide a fitness benefit to Rafflesiaceae for maintaining these genes. Biology data partitioning horizontal gene transfer Malpighiales phylogenomics Rafflesiaceae
4	A data clustering algorithm for stratified data partitioning in artificial neural network Sahoo, Ajit Kumar Unknown Date No description available. data clustering algorithm data partitioning artificial neural network
5	Local independence in computed tomography as a basis for parallel computing Martin, Daniel Morris 14 September 2007 (has links) Iterative CT reconstruction algorithms are superior to the standard convolution backprojection (CBP) methods when reconstructing from a small number of views (hence less radiation), but are computationally costly. To reduce the execution time, this work implements and tests a parallel approach to iterative algorithms using a cluster of workstations, which is a low-cost system found in many offices and non-academic sites. A previous implementation showed little speedup because of the significant cost of inter-processor communication. In this thesis, several data partitioning methods are examined, including some image tiling methods that exploit the spatial locality demonstrated by local CT. Using these methods, computation can proceed locally, without the need for inter-processor communication during every iteration. A relative speedup of up to 17 times is obtained using 25 processors, demonstrating that good performance can be obtained running computationally intensive CT reconstruction algorithms on distributed-memory hardware. parallel computing computed tomography algebraic reconstruction technique data partitioning
6	Local independence in computed tomography as a basis for parallel computing Martin, Daniel Morris 14 September 2007 (has links) Iterative CT reconstruction algorithms are superior to the standard convolution backprojection (CBP) methods when reconstructing from a small number of views (hence less radiation), but are computationally costly. To reduce the execution time, this work implements and tests a parallel approach to iterative algorithms using a cluster of workstations, which is a low-cost system found in many offices and non-academic sites. A previous implementation showed little speedup because of the significant cost of inter-processor communication. In this thesis, several data partitioning methods are examined, including some image tiling methods that exploit the spatial locality demonstrated by local CT. Using these methods, computation can proceed locally, without the need for inter-processor communication during every iteration. A relative speedup of up to 17 times is obtained using 25 processors, demonstrating that good performance can be obtained running computationally intensive CT reconstruction algorithms on distributed-memory hardware. parallel computing computed tomography algebraic reconstruction technique data partitioning
7	A data clustering algorithm for stratified data partitioning in artificial neural network Sahoo, Ajit Kumar 06 1900 (has links) The statistical properties of training, validation and test data play an important role in assuring optimal performance in artificial neural networks (ANN). Researchers have proposed randomized data partitioning (RDP) and stratified data partitioning (SDP) methods for partition of input data into training, validation and test datasets. RDP methods based on genetic algorithm (GA) are computationally expensive as the random search space can be in the power of twenty or more for an average sized dataset. For SDP methods, clustering algorithms such as self organizing map (SOM) and fuzzy clustering (FC) are used to form strata. It is assumed that data points in any individual stratum are in close statistical agreement. Reported clustering algorithms are designed to form natural clusters. In the case of large multivariate datasets, some of these natural clusters can be big enough such that the furthest data vectors are statistically far away from the mean. Further, these algorithms are computationally expensive as well. Here a custom design clustering algorithm (CDCA) has been proposed to overcome these shortcomings. Comparisons have been made using three benchmark case studies, one each from classification, function approximation and prediction domain respectively. The proposed CDCA data partitioning method was evaluated in comparison with SOM, FC and GA based data partitioning methods. It was found that the CDCA data partitioning method not only performed well but also reduced the average CPU time. / Engineering Management data clustering algorithm data partitioning artificial neural network
8	Learning Level Sets and Level Learning Sets: innovations in variational methods for data partitioning Cai, Xiongcai, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 (has links) This dissertation proposes a novel theoretical framework for the data partitioning problem in computer vision and machine learning. The framework is based on level set methods that are derived from variational calculus and involve a curve-based objective function which integrates both boundary and region based information in a generic form. The proposed approaches within the framework provide original solutions to two important problems in variational methods, namely parameter tuning and information fusion, collectively termed Learning Level Sets in this thesis. Moreover, a novel pattern classification algorithm, namely Level Learning Sets, is proposed to classify any general dataset, including sparse and non-sparse data. It is based on the same optimisation process of the objective function directly related to the curve propagation theory used in level set theory. The proposed approach learns the knowledge required for parameter tuning and information fusion in level set methods using machine learning techniques. It uses acquired knowledge to automatically perform parameter tuning and information fusion in level set methods. In the case of pattern classification, variational methods using level set theory optimise decision boundary construction in feature space. Consequently, the optimised values of the objective level set function over the feature space represent the model for pattern classification. The proposed automatic parameter tuning and information fusion method embedded in the level set method framework has been employed to provide original solutions to image segmentation and object extraction in computer vision. On the other hand, the Level Learning Set has been extended and applied to a variety of pattern classification problems. 
Several experimental results for each of the above methods are provided, demonstrating the effectiveness of the proposed solutions and indicating the potential of the automatic and dynamic tuning and fusion approaches as well as the Level Learning Set model. Data partitioning. Learning Level Sets. Level Learning Sets. Computer.
9	A Deterministic Approach to Partitioning Neural Network Training Data for the Classification Problem Smith, Gregory Edward 28 September 2006 (has links) The classification problem in discriminant analysis involves identifying a function that accurately classifies observations as originating from one of two or more mutually exclusive groups. Because no single classification technique works best for all problems, many different techniques have been developed. For business applications, neural networks have become the most commonly used classification technique and, though they often outperform traditional statistical classification methods, their performance may be hindered because of failings in the use of training data. This problem can be exacerbated because of small data set size. In this dissertation, we identify and discuss a number of potential problems with typical random partitioning of neural network training data for the classification problem and introduce deterministic partitioning methods that overcome these obstacles and improve classification accuracy on new validation data. A traditional statistical distance measure enables this deterministic partitioning. Heuristics for both the two-group classification problem and k-group classification problem are presented. We show that these heuristics result in generalizable neural network models that produce more accurate classification results, on average, than several commonly used classification techniques. In addition, we compare several two-group simulated and real-world data sets with respect to the interior and boundary positions of observations within their groups' convex polyhedrons. We show by example that projecting the interior points of simulated data to the boundary of their group polyhedrons generates convex shapes similar to real-world data group convex polyhedrons. 
Our two-group deterministic partitioning heuristic is then applied to the repositioned simulated data, producing results superior to several commonly used classification techniques. / Ph. D. convex sets discriminant analysis Neural networks data partitioning
10	Genetic Algorithm Based Automatic Data Partitioning Scheme For HPF On A Linux Cluster Anand, Sunil Kumar 12 1900 (has links) (PDF) No description available. Data Partitioning (Computer Science) High Performance Fortran Fortran (Computer Program Language) Linux Computing Clusters Genetic Algorithms Automatic Data Partitioning Cluster (Computing) Linux Cluster Computer Science
