221

Modelling severe asthma variation

Newby, Christopher James January 2013 (has links)
Asthma is a heterogeneous disease that is mostly managed successfully using bronchodilators and anti-inflammatory drugs. Around 10-15% of asthmatics, however, have difficult or severe asthma that is less responsive to treatment. Asthma, and severe asthma in particular, is now thought of as a description of symptoms that may contain sub-groups with different pathologies, which could be useful for targeting different drugs at different sub-groups. However, little statistical work has been carried out to determine these sub-phenotypes. Studies have partitioned severe asthma variables into a number of sub-groups, but the algorithms used in these studies are not based on statistical inference, and it is difficult to select the best-fitting number of sub-groups with such methods. It is also unclear whether the clusters or sub-groups returned are actual sub-groups or reflect a single larger non-normal distribution. In this thesis we have developed a statistical model that combines factor analysis, a method for obtaining independent factors that describe a process and thus allow for variation over variables, with infinite mixture modelling, which determines the most probable number of mixtures or clusters and thus allows for variation over individuals. The resulting model is a Dirichlet process normal mixture latent variable model (DPNMLVM), and it is capable of determining the correct number of mixtures over each factor. The model was tested with simulations and used to analyse two severe asthma datasets and a cancer clinical trial. Sub-groups were found that reflect a high-eosinophilic group and an average-eosinophilic group, and a late-onset, older, non-atopic group and a highly atopic, younger, early-onset group. In the clinical trial data, three distinct mixtures were found, relating to existing biomarkers not used in the mixture analysis.
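As a rough sketch of the infinite-mixture component described above (not the author's DPNMLVM itself, which adds the latent factor structure), a truncated Dirichlet process Gaussian mixture can be fitted with scikit-learn; the two-group data here are synthetic placeholders for factor scores:

    # Sketch: truncated Dirichlet process Gaussian mixture, illustrating the
    # infinite-mixture component described above (not the full DPNMLVM).
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    # Synthetic stand-in for factor scores: two sub-groups differing in mean.
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                   rng.normal(3.0, 1.0, (100, 2))])

    # 'dirichlet_process' gives a truncated stick-breaking prior; surplus
    # components receive negligible weight, so the cluster count is inferred.
    dpgmm = BayesianGaussianMixture(
        n_components=10,  # truncation level, not the final number of clusters
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    ).fit(X)

    print("effective clusters:", int(np.sum(dpgmm.weights_ > 0.05)))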
222

Application of Clustering Method based on Orthogonal Procrustes Analysis to Analysis of Questionnaire Data

Furuhashi, Takeshi, Yamaga, Shinichiro, Yoshikawa, Tomohiro January 2008 (has links)
Session ID: TH-A4-3 / Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems, September 17-21, 2008, Nagoya University, Nagoya, Japan
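No abstract is available; as orientation only, the orthogonal Procrustes step named in the title (aligning one configuration of points to another by an optimal rotation before comparing them) can be sketched with SciPy. This is a generic illustration with synthetic data, not the authors' clustering algorithm:

    # Orthogonal Procrustes: find the rotation R minimising ||A @ R - B||_F,
    # then use the residual as a dissimilarity between configurations.
    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    rng = np.random.default_rng(1)
    A = rng.normal(size=(20, 3))                 # one group's item configuration
    true_R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    B = A @ true_R + rng.normal(scale=0.01, size=A.shape)

    R, scale = orthogonal_procrustes(A, B)
    print("residual after alignment:", np.linalg.norm(A @ R - B))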
223

Approximation Algorithms and New Models for Clustering and Learning

Awasthi, Pranjal 01 August 2013 (has links)
This thesis is divided into two parts. In part one, we study the k-median and k-means clustering problems. We take a different approach from the traditional worst-case analysis models: we show that by looking at certain well-motivated stable instances, one can design much better approximation algorithms for these problems. Our algorithms achieve arbitrarily good approximation factors on stable instances, something which is provably hard on worst-case instances. We also study a different model for clustering which introduces a limited amount of interaction with the user. Such interactive models are very popular in the context of learning algorithms, but their effectiveness for clustering is not well understood. We present promising theoretical and experimental results in this direction. The second part of the thesis studies the design of provably good learning algorithms that work under adversarial noise. One of the fundamental problems in this area is to understand the learnability of the class of disjunctions of Boolean variables. We design a learning algorithm that improves on the guarantees of the previously best known result for this problem. In addition, the techniques used seem fairly general and promise to be applicable to a wider class of problems. We also propose a new model for learning with queries, which restricts the algorithm's ability to asking only certain “local” queries. We motivate the need for the model and show that one can design efficient local-query algorithms for a wide class of problems.
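For context, the baseline the first part improves on is the standard k-means heuristic; a minimal sketch of Lloyd's algorithm follows (the thesis's stability-exploiting algorithms are not reproduced here):

    # Minimal Lloyd's algorithm for the k-means objective discussed above.
    import numpy as np

    def lloyd_kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            # Assign each point to its nearest center.
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Move each center to the mean of its assigned points.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        return centers, labels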
224

Unsupervised Aspect Discovery from Online Consumer Reviews

Suleman, Kaheer 18 March 2014 (links)
The success of on-line review websites has led to an overwhelming number of on-line consumer reviews, which have become an important tool for consumers when deciding whether to purchase a product. This growth has created a need for applications that present this information in a meaningful way. Such applications often rely on domain-specific semantic lexicons, which are both expensive and time-consuming to build. This thesis proposes an unsupervised approach to product aspect discovery in on-line consumer reviews. We apply a two-step hierarchical clustering process in which we first cluster terms based on the semantic similarity of their contexts and then on the similarity of the hypernyms of the cluster members. The method also includes a process for assigning class labels to each of the clusters. An experiment is also performed showing how the proposed methods can be used to measure aspect-based sentiment. The methods proposed in this thesis are evaluated on a set of 157,865 reviews from a major commercial website; the two-step clustering process is found to increase cluster F-scores over a single round of clustering. Finally, the proposed methods are compared to a state-of-the-art topic modelling approach by Titov and McDonald (2008).
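A hedged sketch of such a two-step agglomerative pipeline (cluster terms on one representation, then merge the resulting clusters on a second) is given below; the embeddings, the centroid-based second stage, and the thresholds are placeholders, not the thesis's actual features:

    # Two-stage agglomerative clustering of the kind described above.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    rng = np.random.default_rng(2)
    context_vecs = rng.normal(size=(50, 16))   # placeholder term-context vectors

    # Stage 1: cluster terms on context similarity.
    stage1 = fcluster(linkage(context_vecs, method="average", metric="cosine"),
                      t=0.6, criterion="distance")

    # Stage 2: represent each stage-1 cluster (here by its centroid, standing
    # in for the hypernym-based representation) and cluster the clusters.
    centroids = np.array([context_vecs[stage1 == c].mean(axis=0)
                          for c in np.unique(stage1)])
    stage2 = fcluster(linkage(centroids, method="average", metric="cosine"),
                      t=0.5, criterion="distance")
    print(len(centroids), "stage-1 clusters ->", len(np.unique(stage2)), "aspects")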
225

Weakly Supervised Learning Algorithms and an Application to Electromyography

Hesham, Tameem January 2014 (has links)
In the standard machine learning framework, training data is assumed to be fully supervised. However, collecting fully labelled data is not always easy. Due to cost, time, effort or other constraints, requiring the whole dataset to be labelled can be difficult in many applications, whereas collecting unlabelled data can be relatively easy. Therefore, paradigms that enable learning from unlabelled and/or partially labelled data have recently been growing in machine learning. The focus of this thesis is to provide algorithms that weakly annotate the unlabelled parts of data not provided in the standard supervised setting of an instance-label pair for each sample, and then learn from the weakly as well as the strongly labelled data. More specifically, the bulk of the thesis aims at finding solutions for data that come in the form of bags or groups of instances, where the available label information is at the bag level only. This is the form of electromyographic (EMG) data, which represent the main application of the thesis. EMG data can be used to diagnose muscles as either normal or suffering from a neuromuscular disease. Muscles can be classified into one of three labels: normal, myopathic or neurogenic. Each muscle consists of motor units (MUs); equivalently, an EMG signal detected from a muscle consists of motor unit potential trains (MUPTs). These data are an example of partially labelled data where instances (MUs) are grouped in bags (muscles) and labels are provided for bags but not for instances. First, we introduce and investigate a weakly supervised learning paradigm that aims at improving classification performance by using a spectral graph-theoretic approach to weakly annotate unlabelled instances before classification. The spectral graph-theoretic phase of this paradigm groups unlabelled data instances using similarity graph models; two new similarity graph models are also introduced in this paradigm. This paradigm improves overall bag accuracy for EMG datasets. Second, generative modelling approaches for multiple-instance learning (MIL) are presented. We introduce and analyse a variety of model structures and components of these generative models, and believe this analysis can serve as a methodological guide for other MIL tasks of similar form. This approach improves overall bag accuracy, especially for low-dimensional bags-of-instances datasets such as EMG datasets. MIL generative models are an example of models in which probability distributions need to be represented compactly and efficiently, especially when the number of variables in a model is large. Sum-product networks (SPNs) are a relatively new class of deep probabilistic models that aim to provide a compact and tractable representation of a probability distribution. SPNs are used to model the joint distribution of instance features in the MIL generative models. An SPN whose structure is learnt by a structure learning algorithm introduced in this thesis leads to improved bag accuracy for higher-dimensional datasets.
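As context for the spectral graph-theoretic phase, a generic spectral-clustering pass over unlabelled instances might look like the following; the RBF similarity graph and all parameters are illustrative stand-ins, not the thesis's two new graph models:

    # Generic spectral clustering used to weakly annotate unlabelled instances.
    import numpy as np
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(3)
    # Placeholder motor-unit feature vectors pooled across muscles (bags).
    instances = np.vstack([rng.normal(0, 1, (60, 8)), rng.normal(4, 1, (60, 8))])

    sc = SpectralClustering(n_clusters=2, affinity="rbf", gamma=0.5,
                            assign_labels="kmeans", random_state=0)
    weak_labels = sc.fit_predict(instances)
    # weak_labels can now augment the bag-level labels during classification.
    print(np.bincount(weak_labels))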
226

A methodology for transient fault detection in parallel applications on multicore clusters

Montezanti, Diego Miguel January 2014 (has links)
The increase in integration scale, aimed at improving the performance of current processors, together with the growth of computing systems, has made reliability a relevant concern. In particular, the growing vulnerability to transient faults has become critical, because of the capacity of these faults to corrupt application results. Historically, transient faults have been a concern in the design of critical systems, such as flight systems or high-availability servers, where the consequences of a fault can be disastrous. Although they are temporary faults, they can alter the behaviour of the computing system. Since the year 2000, reports of significant malfunctions in various supercomputers due to transient faults have become more frequent. The impact of transient faults becomes more relevant in the context of High Performance Computing (HPC). Even though the mean time between failures (MTBF) is on the order of two years for a commercial processor, in a supercomputer with hundreds or thousands of processors cooperating to solve a task, the MTBF decreases as the number of processors grows. This situation is aggravated by the advent of multicore processors and multicore cluster architectures, which incorporate a high degree of hardware-level parallelism. The incidence of transient faults is even greater for long-running applications that handle large volumes of data, given the high cost (in time and resource utilisation) of relaunching the execution from the beginning if incorrect results are obtained because of a fault. These factors justify the need to develop specific strategies to improve reliability in HPC systems; in this regard, it is crucial to be able to detect so-called silent faults, which alter application results but are not intercepted by the operating system or any other software layer of the system, and therefore do not cause the execution to terminate abruptly. In this context, this work analyses a software-based distributed methodology, designed for message-passing parallel scientific applications, capable of detecting transient faults by validating the contents of messages about to be sent to another process of the application. This previously published methodology attempts to address a problem not covered by existing proposals, detecting transient faults that allow execution to continue but are capable of corrupting the final results, improving system reliability and reducing the time after which the application can be relaunched, which is especially useful for long executions.
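As a hedged illustration of the send-side validation idea (re-run the computation that produces an outgoing message and compare digests before sending), a toy mpi4py version might look like this; the function names and the duplicate-and-compare scheme are assumptions for illustration, not the published protocol:

    # Toy send-side message validation: compute the outgoing buffer twice and
    # compare digests; a mismatch signals a transient fault before the send.
    # Run with: mpiexec -n 2 python validated_send.py  (hypothetical file name)
    import hashlib
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    def compute_block(n):
        # Stand-in for the application kernel whose output is about to be sent.
        x = np.arange(n, dtype=np.float64)
        return np.sqrt(x) * 1.5

    def validated_send(dest, tag, n):
        a = compute_block(n)
        b = compute_block(n)  # redundant re-execution
        if (hashlib.sha256(a.tobytes()).digest()
                != hashlib.sha256(b.tobytes()).digest()):
            raise RuntimeError("transient fault detected before send")
        comm.Send(a, dest=dest, tag=tag)

    if comm.rank == 0:
        validated_send(dest=1, tag=0, n=1000)
    elif comm.rank == 1:
        buf = np.empty(1000, dtype=np.float64)
        comm.Recv(buf, source=0, tag=0)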
227

Education policy and the viability of small school provision : the social significance of small primary schools in England and Wales post 1988

Ribchester, Christopher Brian January 1996 (has links)
No description available.
228

The density and velocity fields of the local universe

Teodoro, Luís Filipe Alves January 1999 (has links)
We present two self-consistent non-parametric models of the local cosmic velocity field based on the density distribution in the PSCz redshift survey of IRAS galaxies. Two independent methods have been applied, both based on the assumptions of gravitational instability and linear biasing. They give remarkably similar results, with no evidence of systematic differences and an r.m.s. discrepancy of only ~70 km s^-1 in each Cartesian velocity component. These uncertainties are consistent with a detailed independent error analysis carried out on mock PSCz catalogues constructed from N-body simulations. The denser sampling provided by the PSCz survey compared to previous IRAS galaxy surveys allows us to reconstruct the velocity field out to larger distances. The most striking feature of the model velocity field is a coherent large-scale streaming motion along a baseline connecting Perseus-Pisces, the Local Supercluster, the Great Attractor, and the Shapley Concentration. We find no evidence for back-infall onto the Great Attractor. Instead, material behind and around the Great Attractor is inferred to be streaming towards the Shapley Concentration, aided by the expansion of two large neighbouring underdense regions. The PSCz model velocities compare well with those predicted from the 1.2-Jy redshift survey of IRAS galaxies and, perhaps surprisingly, with those predicted from the distribution of Abell/ACO clusters, out to 140 h^-1 Mpc. Comparison of the real-space density fields (or, alternatively, the peculiar velocity fields) inferred from the PSCz and cluster catalogues gives a relative (linear) bias parameter between clusters and IRAS galaxies of b_c = 4.4 ± 0.6. In addition, we compare the cumulative bulk flows predicted from the PSCz gravity field with those measured from the MarkIII and SFI catalogues of peculiar velocities. A conservative estimate of β = Ω_0^0.6/b, where b is the bias parameter for IRAS galaxies, gives β = 0.76 ± 0.13 (1σ), in agreement with other recent determinations. Finally, we perform a detailed comparison of the IRAS PSCz and 1.2-Jy spherical harmonic coefficients of the density and velocity fields in redshift space. The monopole terms of the density and velocity fields predicted from the two surveys show some inconsistencies. The mismatch in the velocity monopole terms is resolved by masking the 1.2-Jy survey with the PSCz mask and using the galaxies within the PSCz survey with fluxes larger than 1.2 Jy. Davis, Nusser and Willick (1996) found a discrepancy between the IRAS 1.2-Jy gravity field and the MarkIII peculiar velocity field. We conclude that the use of the deeper IRAS PSCz catalogue alone cannot resolve this mismatch.
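For context, the linear gravitational-instability relation that underlies both reconstruction methods can be written as follows (standard linear theory, not a result specific to this thesis):

    % Peculiar velocity field predicted from the galaxy overdensity delta_g,
    % with beta = Omega_0^{0.6}/b (standard linear theory).
    \mathbf{v}(\mathbf{r}) = \frac{\beta}{4\pi}
      \int d^{3}r' \, \delta_g(\mathbf{r}') \,
      \frac{\mathbf{r}' - \mathbf{r}}{\left| \mathbf{r}' - \mathbf{r} \right|^{3}}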
229

Minimalist Multi-Robot Clustering of Square Objects: New Strategies, Experiments, and Analysis

Song, Yong 03 October 2013 (has links)
Studies of minimalist multi-robot systems consider multiple robotic agents, each with limited individual capabilities, but with the capacity for self-organization in order to collectively perform coordinated tasks. Object clustering is a widely studied task in which self-organized robots form piles from dispersed objects. Our work considers a variation of object clustering derived from the influential ant-inspired work of Beckers, Holland and Deneubourg, which proposed stigmergy as a design principle for such multi-robot systems. Since puck mechanics contribute to cluster accrual dynamics, we studied a new scenario with square objects, because square pucks aggregate into clusters differently from cylindrical ones. Although central clusters are usually desired, workspace boundaries can cause perimeter cluster formation to dominate. This research demonstrates successful clustering of square boxes - an especially challenging instance, since flat edges exacerbate adhesion to boundaries - using simpler robots than in previously published research. Our solution consists of two novel behaviours, Twisting and Digging, which exploit the objects' geometry to pry boxes free from boundaries. Physical robot experiments illustrate that cooperation between twisters and diggers can succeed in forming a single central cluster. We empirically explored the significance of different divisions of labor by measuring the spatial distribution of robots and the system performance. Data from over 40 hours of physical robot experiments show that different divisions of labor have distinct features, e.g., one is reliable while another is especially efficient.
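A heavily hedged sketch of a division-of-labour controller of the kind described (fixed twister/digger roles, with a boundary behaviour triggered on contact with a wall-adjacent box); the sensing predicates and behaviour names are hypothetical placeholders, not the paper's controller:

    # Sketch of a fixed division of labour between twisters and diggers.
    from dataclasses import dataclass

    @dataclass
    class Robot:
        role: str  # "twister" or "digger"

        def step(self, sees_box: bool, box_on_boundary: bool) -> str:
            if not sees_box:
                return "wander"               # stigmergic random-walk baseline
            if box_on_boundary:
                # Exploit the square geometry to pry the box off the wall.
                return "twist" if self.role == "twister" else "dig"
            return "push_to_cluster"          # default cluster-accrual behaviour

    # Example division of labour (the ratio was the experimental variable).
    team = ([Robot("twister") for _ in range(3)]
            + [Robot("digger") for _ in range(2)])
    print(team[0].step(sees_box=True, box_on_boundary=True))  # -> "twist"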
230

Fuzzy logic-based digital soil mapping in the Laurel Creek Conservation Area, Waterloo, Ontario

Ren, Que January 2012 (has links)
The aim of this thesis was to examine environmental covariate-related issues in the purposive sampling design for fuzzy logic-based digital soil mapping: resolution dependency, the contribution of vegetation covariates, and the use of LiDAR data. In this design, fuzzy c-means (FCM) clustering of environmental covariates was employed to determine proper sampling sites and assist soil survey and inference. Two subsets of the Laurel Creek Conservation Area were examined to explore the resolution and vegetation issues, respectively. Both conventional and LiDAR-derived digital elevation models (DEMs) were used to derive terrain covariates, and a vegetation index calculated from remotely sensed data was employed as a vegetation covariate. A basic field survey was conducted in the study area, and a validation experiment was performed in another area. The results show that the choice of the optimal number of clusters shifts as resolution is aggregated, which leads to variations in the optimal partition of the environmental covariate space and in the purposive sampling design. Combining vegetation covariates with terrain covariates produces different results from using terrain covariates alone. The level of resolution dependency and the influence of adding vegetation covariates vary with DEM source. This study suggests that DEM resolution, vegetation, and DEM source are significant for the purposive sampling design in fuzzy logic-based digital soil mapping. The interpretation of fuzzy membership values at sampled sites also indicates associations between fuzzy clusters and soil series, which lends promise to the applicability of fuzzy logic-based digital soil mapping in areas where fieldwork and data are limited.
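A compact sketch of the fuzzy c-means step used to partition covariate space for the purposive sampling design (standard FCM updates in plain NumPy; the covariate matrix is a synthetic placeholder):

    # Standard fuzzy c-means: alternate membership and centroid updates.
    import numpy as np

    def fcm(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per site
        for _ in range(iters):
            Um = U ** m
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            U_new = 1.0 / d ** (2.0 / (m - 1.0))
            U_new /= U_new.sum(axis=1, keepdims=True)
            if np.abs(U_new - U).max() < tol:
                break
            U = U_new
        return centers, U

    # Placeholder covariates (e.g. slope, wetness index, NDVI) at 300 cells.
    X = np.random.default_rng(4).normal(size=(300, 3))
    centers, U = fcm(X, c=4)
    # Cells with high maximum membership are candidate typical sampling sites.
    print(U.max(axis=1)[:5])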
