221 |
LRS Seimo narių grupavimas pagal balsavimą ir balsavimo kitimo aptikimas / Lithuanian Parliament members grouping by their voting behavior and its change detection
Bytautas, Kęstutis 20 June 2012
Politicians describe their own behavior in various ways, so the only way to hold them to account is to monitor them. This thesis analyzes the work of the Lithuanian Parliament (LRS, the Seimas) related to voting. It seeks to answer the question: can information technology tools determine whether a member's affiliation with a party (faction), or with the position (opposition), drives their voting? The main objectives are grouping members of the Seimas by their voting behavior and detecting changes in that behavior.
The thesis reviews the activity of the 2008-2012 parliamentary term, carries out a statistical analysis of the votes, and surveys other studies of parliamentary activity. We use clustering methods to group members of the Seimas. Clustering can be defined as partitioning objects into groups (clusters) such that differences between objects within a group are as small as possible and differences between groups are as large as possible. We survey various clustering methods and their principles of operation, and describe methods for calculating distances between objects and criteria for evaluating clustering quality. The voting data is stored in a MySQL database, so a tool was created for data processing. We describe all stages of the work: the tools used, the coding of the votes, and the division of the votes into periods.
The following methods were chosen for the experiments: k-Means and hierarchical clustering with complete (furthest neighbor), average, and single (nearest neighbor) linkage. We use Euclidean and Manhattan distances to measure similarity between objects, and the PURITY, RAND, and NMI measures to evaluate clustering quality... [see full text]
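The distance and quality measures named in this abstract can be illustrated with a minimal sketch. The vote coding used here (+1 for, -1 against, 0 abstained) and the sample vectors are assumptions made for illustration; the abstract does not state the coding the thesis actually uses.

```python
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equally long vote vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def manhattan(u, v):
    """Manhattan (city-block) distance between two vote vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def purity(cluster_ids, true_labels):
    """Fraction of objects that carry the majority label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(true_labels)


# Hypothetical coding: +1 = for, -1 = against, 0 = abstained.
member_a = [1, 1, -1, 0]
member_b = [1, -1, -1, 1]
print(euclidean(member_a, member_b))  # square root of 5
print(manhattan(member_a, member_b))  # 3

# Hypothetical clustering of five members compared with their factions.
print(purity([0, 0, 0, 1, 1], ["A", "A", "B", "B", "B"]))  # 0.8
```

A purity of 1.0 would mean that every cluster contains members of a single faction, which is one way to read the question of whether faction membership determines voting.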
|
222 |
Modelling severe asthma variation
Newby, Christopher James January 2013
Asthma is a heterogeneous disease that is mostly managed successfully using bronchodilators and anti-inflammatory drugs. However, around 10%-15% of asthmatics have difficult or severe asthma, which is less responsive to treatment. Asthma, and severe asthma in particular, is now thought of as a description of symptoms that may contain sub-groups with different pathologies, which could be useful for targeting different drugs at different sub-groups. However, little statistical work has been carried out to determine these sub-phenotypes. Studies have partitioned severe asthma variables into a number of sub-groups, but the algorithms used in these studies are not based on statistical inference, and it is difficult to select the best-fitting number of sub-groups using such methods. It is also unclear whether the clusters or sub-groups returned are actual sub-groups or reflect a larger non-normal distribution. In this thesis we have developed a statistical model that combines factor analysis, a method used to obtain independent factors to describe processes allowing for variation over variables, with infinite mixture modelling, a process that involves determining the most probable number of mixtures or clusters, thus allowing for variation over individuals. The resulting model is a Dirichlet process normal mixture latent variable model (DPNMLVN), and it is capable of determining the correct number of mixtures over each factor. The model was tested with simulations and used to analyse two severe asthma datasets and a cancer clinical trial. Sub-groups were found that reflect a high eosinophilic group and an average eosinophilic group, a late-onset older non-atopic group and a highly atopic younger early-onset group. In the clinical trial data, 3 distinct mixtures were found relating to existing biomarkers not used in the mixture analysis.
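As a rough illustration of why a Dirichlet process prior leaves the number of mixtures open, the sketch below simulates the Chinese restaurant process, a standard way of drawing cluster assignments from such a prior. It is not the thesis's model: the function name, the concentration parameter `alpha`, and the seed are assumptions made for this example, and no data likelihood or factor analysis is included.

```python
import random


def chinese_restaurant_process(n_items, alpha, seed=0):
    """Draw cluster assignments for n_items from a CRP with concentration alpha.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in cluster k
    assignments = []  # assignments[i] = cluster index of item i
    for _ in range(n_items):
        weights = counts + [alpha]  # last entry stands for "open a new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


assignments, counts = chinese_restaurant_process(n_items=50, alpha=1.0)
print(len(counts), "clusters with sizes", counts)
```

The number of clusters is an outcome of the draw rather than an input, which is the property the abstract relies on when it says the model determines the number of mixtures.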
|
223 |
Application of Clustering Method based on Orthogonal Procrustes Analysis to Analysis of Questionnaire Data
Furuhashi, Takeshi; Yamaga, Shinichiro; Yoshikawa, Tomohiro January 2008
Session ID: TH-A4-3 / Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on advanced Intelligent Systems, September 17-21, 2008, Nagoya University, Nagoya, Japan
|
224 |
Approximation Algorithms and New Models for Clustering and Learning
Awasthi, Pranjal 01 August 2013
This thesis is divided into two parts. In part one, we study the k-median and k-means clustering problems. We take a different approach from traditional worst-case analysis. We show that by looking at certain well-motivated stable instances, one can design much better approximation algorithms for these problems. Our algorithms achieve arbitrarily good approximation factors on stable instances, something which is provably hard on worst-case instances. We also study a different model for clustering which introduces a limited amount of interaction with the user. Such interactive models are very popular in the context of learning algorithms, but their effectiveness for clustering is not well understood. We present promising theoretical and experimental results in this direction.
The second part of the thesis studies the design of provably good learning algorithms which work under adversarial noise. One of the fundamental problems in this area is to understand the learnability of the class of disjunctions of Boolean variables. We design a learning algorithm which improves on the guarantees of the best previously known result for this problem. In addition, the techniques used seem fairly general and are likely to apply to a wider class of problems. We also propose a new model for learning with queries. This model restricts the algorithm to asking only certain “local” queries. We motivate the need for the model and show that one can design efficient local query algorithms for a wide class of problems.
|
225 |
Unsupervised Aspect Discovery from Online Consumer Reviews
Suleman, Kaheer 18 March 2014
The success of on-line review websites has led to an overwhelming number of on-line consumer reviews. These reviews have become an important tool for consumers when deciding whether to purchase a product. This growth has created a need for applications that present this information in a meaningful way. These applications often rely on domain-specific semantic lexicons, which are both expensive and time-consuming to make.
This thesis proposes an unsupervised approach for product aspect discovery in on-line consumer reviews. We apply a two-step hierarchical clustering process in which we first cluster based on the semantic similarity of the contexts of terms and then on the similarity of the hypernyms of the cluster members. The method also includes a process for assigning class labels to each of the clusters. Finally, we perform an experiment showing how the proposed methods can be used to measure aspect-based sentiment.
The methods proposed in this thesis are evaluated on a set of 157,865 reviews from a major commercial website. We find that the two-step clustering process increases cluster F-scores over a single round of clustering. The proposed methods are also compared to a state-of-the-art topic modelling approach by Titov and McDonald (2008).
|
226 |
Weakly Supervised Learning Algorithms and an Application to Electromyography
Hesham, Tameem January 2014
In the standard machine learning framework, training data is assumed to be fully supervised. However, collecting fully labelled data is not always easy. Due to cost, time, effort or other constraints, requiring the entire dataset to be labelled can be difficult in many applications, whereas collecting unlabelled data can be relatively easy. Therefore, paradigms that enable learning from unlabelled and/or partially labelled data have been growing recently in machine learning. The focus of this thesis is to provide algorithms that weakly annotate the unlabelled parts of data that do not come in the standard supervised form of one instance-label pair per sample, and that then learn from weakly as well as strongly labelled data. More specifically, the bulk of the thesis aims at finding solutions for data that come in the form of bags or groups of instances where information about the labels is available at the bag level only. This is the form of electromyographic (EMG) data, which represent the main application of the thesis. EMG data can be used to diagnose muscles as either normal or suffering from a neuromuscular disease. Muscles can be classified into one of three classes: normal, myopathic or neurogenic. Each muscle consists of motor units (MUs). Equivalently, an EMG signal detected from a muscle consists of motor unit potential trains (MUPTs). This data is an example of partially labelled data where instances (MUs) are grouped in bags (muscles) and labels are provided for bags but not for instances.
First, we introduce and investigate a weakly supervised learning paradigm that aims at improving classification performance by using a spectral graph-theoretic approach to weakly annotate unlabelled instances before classification. The spectral graph-theoretic phase of this paradigm groups unlabelled data instances using similarity graph models. Two new similarity graph models are introduced as well in this paradigm. This paradigm improves overall bag accuracy for EMG datasets.
Second, generative modelling approaches for multiple-instance learning (MIL) are presented. We introduce and analyse a variety of model structures and components of these generative models and believe they can serve as a methodological guide to other MIL tasks of similar form. This approach improves overall bag accuracy, especially for low-dimensional bags-of-instances datasets like EMG datasets.
MIL generative models provide an example of models where probability distributions need to be represented compactly and efficiently, especially when the number of variables in a model is large. Sum-product networks (SPNs) represent a relatively new class of deep probabilistic models that aims at providing a compact and tractable representation of a probability distribution. SPNs are used to model the joint distribution of instance features in the MIL generative models. An SPN whose structure is learnt by a structure learning algorithm introduced in this thesis leads to improved bag accuracy for higher-dimensional datasets.
|
227 |
Una metodología de detección de fallos transitorios en aplicaciones paralelas sobre cluster de multicores
Montezanti, Diego Miguel January 2014
Increasing integration scale, pursued to improve the performance of current processors, together with the growth of computing systems, has made reliability a relevant concern. In particular, the growing vulnerability to transient faults has become critical, because these faults can corrupt application results.
Historically, transient faults have been a concern in the design of critical systems, such as flight systems or high-availability servers, where the consequences of a fault can be disastrous. Although they are temporary, they can alter the behavior of the computing system. Since 2000, reports of significant malfunctions in various supercomputers caused by transient faults have become more frequent.
The impact of transient faults becomes more relevant in the context of High-Performance Computing (HPC). Even though the mean time between failures (MTBF) is on the order of 2 years for a commercial processor, in a supercomputer with hundreds or thousands of processors cooperating to solve a task, the MTBF decreases as the number of processors grows. This situation is aggravated by the advent of multicore processors and multicore cluster architectures, which incorporate a high degree of hardware-level parallelism. The incidence of transient faults is even greater for long-running applications that handle large volumes of data, given the high cost (in time and resource usage) of relaunching the execution from the beginning when a fault produces incorrect results.
These factors justify the need to develop specific strategies to improve reliability in HPC systems. In this regard, it is crucial to be able to detect so-called silent faults, which alter application results but are not intercepted by the operating system or any other software layer, and therefore do not cause the execution to terminate abruptly.
In this context, this work analyzes a distributed, software-based methodology designed for scientific parallel applications that use message passing, capable of detecting transient faults by validating the contents of messages that are about to be sent to another process of the application. This previously published methodology attempts to address a problem not covered by existing proposals: it detects transient faults that allow execution to continue but can corrupt the final results, improving system reliability and reducing the time after which the application can be relaunched, which is especially useful in long-running executions.
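The abstract's remark that the mean time between failures (MTBF) shrinks as processors are added can be made concrete with a small sketch. It assumes independent processors with constant failure rates and treats any single processor failure as a system failure, in which case the system MTBF is the per-processor MTBF divided by the processor count. These modelling assumptions and the function name are illustrative and do not come from the thesis.

```python
HOURS_PER_YEAR = 365 * 24


def system_mtbf_years(processor_mtbf_years, n_processors):
    """System MTBF assuming independent, constant-rate processor failures."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return processor_mtbf_years / n_processors


# The 2-year per-processor MTBF is the figure quoted in the abstract.
for n in (1, 100, 1000):
    hours = system_mtbf_years(2.0, n) * HOURS_PER_YEAR
    print(f"{n:>5} processors: about {hours:.1f} hours between failures")
```

Under these assumptions, a machine with 1000 such processors would see a failure roughly every 17.5 hours, which is shorter than many long-running scientific jobs.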
|
228 |
Education policy and the viability of small school provision : the social significance of small primary schools in England and Wales post 1988
Ribchester, Christopher Brian January 1996
No description available.
|
229 |
The density and velocity fields of the local universe
Teodoro, Luís Filipe Alves January 1999
We present two self-consistent non-parametric models of the local cosmic velocity field based on the density distribution in the PSCz redshift survey of IRAS galaxies. Two independent methods have been applied, both based on the assumptions of gravitational instability and linear biasing. They give remarkably similar results, with no evidence of systematic differences and an r.m.s. discrepancy of only ~70 km s$^{-1}$ in each Cartesian velocity component. These uncertainties are consistent with a detailed independent error analysis carried out on mock PSCz catalogues constructed from N-body simulations. The denser sampling provided by the PSCz survey compared to previous IRAS galaxy surveys allows us to reconstruct the velocity field out to larger distances. The most striking feature of the model velocity field is a coherent large-scale streaming motion along a baseline connecting Perseus-Pisces, the Local Supercluster, the Great Attractor, and the Shapley Concentration. We find no evidence for back-infall onto the Great Attractor. Instead, material behind and around the Great Attractor is inferred to be streaming towards the Shapley Concentration, aided by the expansion of two large neighbouring underdense regions. The PSCz model velocities compare well with those predicted from the 1.2-Jy redshift survey of IRAS galaxies and, perhaps surprisingly, with those predicted from the distribution of Abell/ACO clusters, out to 140 $h^{-1}$ Mpc. Comparison of the real-space density fields (or, alternatively, the peculiar velocity fields) inferred from the PSCz and cluster catalogues gives a relative (linear) bias parameter between clusters and IRAS galaxies of $b_c = 4.4 \pm 0.6$. In addition, we compare the cumulative bulk flows predicted from the PSCz gravity field with those measured from the MarkIII and SFI catalogues of peculiar velocities.
A conservative estimate of $\beta = \Omega_0^{0.6}/b$, where $b$ is the bias parameter for IRAS galaxies, gives $\beta = 0.76 \pm 0.13$ (1-$\sigma$), in agreement with other recent determinations. Finally, we perform a detailed comparison of the IRAS PSCz and 1.2-Jy spherical harmonic coefficients of the density and velocity fields in redshift space. Both the monopole terms of the density and velocity fields predicted from the surveys show some inconsistencies. The mismatch in the velocity monopole terms is resolved by masking the 1.2-Jy survey with the PSCz mask and using the galaxies within the PSCz survey for fluxes larger than 1.2 Jy. Davis, Nusser and Willick (1996) have found a discrepancy between the IRAS 1.2-Jy survey gravity field and the MarkIII peculiar velocity field. We conclude that the use of the deeper IRAS PSCz catalogue cannot alone resolve this mismatch.
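To show how the quoted β relates to the density parameter, the definition β = Ω_0^0.6 / b can be inverted to Ω_0 = (β b)^(1/0.6). The sketch below does this arithmetic. The value b = 1 (IRAS galaxies tracing the mass without bias) is an assumption chosen only for illustration; the abstract does not report a value of b or of Ω_0.

```python
def omega_from_beta(beta, bias):
    """Invert beta = omega_0**0.6 / bias to obtain omega_0."""
    return (beta * bias) ** (1.0 / 0.6)


beta = 0.76          # central value quoted in the abstract
assumed_bias = 1.0   # hypothetical: unbiased IRAS galaxies

omega_0 = omega_from_beta(beta, assumed_bias)
print(f"omega_0 = {omega_0:.2f} for b = {assumed_bias}")
```

Because Ω_0 scales as b^(1/0.6), a modest change in the assumed bias shifts the inferred Ω_0 considerably, which is why β rather than Ω_0 is the quantity reported.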
|
230 |
Minimalist Multi-Robot Clustering of Square Objects: New Strategies, Experiments, and Analysis
Song, Yong 03 October 2013
Studies of minimalist multi-robot systems consider multiple robotic agents, each with limited individual capabilities, but with the capacity for self-organization in order to collectively perform coordinated tasks. Object clustering is a widely studied task in which self-organized robots form piles from dispersed objects. Our work considers a variation of object clustering derived from the influential ant-inspired work of Beckers, Holland and Deneubourg, which proposed stigmergy as a design principle for such multi-robot systems. Since puck mechanics contribute to cluster accrual dynamics, we studied a new scenario with square objects, because these pack into clusters differently from cylindrical ones. Although central clusters are usually desired, workspace boundaries can cause perimeter cluster formation to dominate. This research demonstrates successful clustering of square boxes, an especially challenging instance since flat edges exacerbate adhesion to boundaries, using simpler robots than previously published research. Our solution consists of two novel behaviours, Twisting and Digging, which exploit the objects’ geometry to pry boxes free from boundaries. Physical robot experiments illustrate that cooperation between twisters and diggers can succeed in forming a single central cluster. We empirically explored the significance of different divisions of labor by measuring the spatial distribution of robots and the system performance. Data from over 40 hours of physical robot experiments show that different divisions of labor have distinct features, e.g., one is reliable while another is especially efficient.
|