131.
Finding Anomalous Energy Consumers Using Time Series Clustering in the Swedish Energy Market. Tonneman, Lukas, January 2023.
Improving the energy efficiency of buildings is important for many reasons, and there is a large body of data detailing buildings' hourly energy consumption. This work studies a large data set from the Swedish energy market and proposes a data-analysis methodology for identifying abnormal consumption patterns using two steps of clustering. First, typical weekly energy-usage profiles are extracted from each building by clustering week-long segments of the building's lifetime consumption and extracting the medoids of the clusters. Second, all the typical weekly energy-usage profiles are clustered using agglomerative hierarchical clustering. Large clusters are assumed to contain normal consumption patterns, and small clusters abnormal ones; buildings with a large presence in small clusters are deemed abnormal, and vice versa. The method employs Dynamic Time Warping (DTW) distance as its dissimilarity measure. Using a set of 160 buildings manually classified by domain experts, this thesis shows that the mean abnormality score is higher for abnormal buildings than for normal buildings (p ≈ 0.0036).
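A minimal sketch of this two-step pipeline, assuming hourly load arrays per building; the plain-Python DTW, the 168-hour segment length, the cluster counts, and the small-cluster threshold are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw(a, b):
    """Classic O(nm) dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def pairwise_dtw(series):
    k = len(series)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = dtw(series[i], series[j])
    return d

def weekly_medoids(load, n_profiles=3):
    """Step 1: cluster one building's week-long segments, return cluster medoids."""
    weeks = [load[i:i + 168] for i in range(0, len(load) - 167, 168)]
    d = pairwise_dtw(weeks)
    labels = fcluster(linkage(squareform(d), method="average"),
                      n_profiles, criterion="maxclust")
    medoids = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        # Medoid = member with the smallest total distance to its cluster.
        medoids.append(weeks[idx[np.argmin(d[np.ix_(idx, idx)].sum(axis=0))]])
    return medoids

def abnormality_scores(buildings, n_global=20, small=5):
    """Step 2: cluster all typical profiles; score = share landing in small clusters."""
    profiles, owner = [], []
    for name, load in buildings.items():
        for m in weekly_medoids(np.asarray(load, float)):
            profiles.append(m)
            owner.append(name)
    d = pairwise_dtw(profiles)
    labels = fcluster(linkage(squareform(d), method="average"),
                      n_global, criterion="maxclust")
    sizes = np.bincount(labels)
    in_small = sizes[labels] < small
    return {b: in_small[[o == b for o in owner]].mean() for b in buildings}
```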
132.
Non-Intrusive Load Monitoring to Assess Retrofitting Work. Zucchet, Julien, January 2022.
Non-intrusive load monitoring (NILM) refers to a set of statistical methods for inferring information about a household from its electricity load curve, without adding any additional sensors. The aim of this master's thesis is to adapt NILM techniques to the assessment of retrofitting work, providing a first version of a retrofit-assessment tool. Two models are developed: one formulated as a constrained optimization problem, and a hierarchical Bayesian mixture model. The models are tested on a set of houses with electric heating (the main target of retrofitting work) and deliver a satisfactorily accurate retrofit assessment for about half of the houses.
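The thesis's two models are not detailed in this abstract. As a purely illustrative sketch of the underlying signal, one can compare a house's heating sensitivity (kWh per heating degree-day) before and after the retrofit; the degree-day base, the known retrofit index, and the data layout are all assumptions, not the thesis's method.

```python
import numpy as np

def heating_slope(daily_kwh, daily_temp, base=15.0):
    """OLS slope of daily consumption on heating degree-days (base in deg C)."""
    hdd = np.maximum(base - np.asarray(daily_temp, float), 0.0)
    X = np.column_stack([np.ones_like(hdd), hdd])
    beta, *_ = np.linalg.lstsq(X, np.asarray(daily_kwh, float), rcond=None)
    return beta[1]  # kWh per heating degree-day

def retrofit_effect(daily_kwh, daily_temp, retrofit_idx):
    """Relative drop in heating sensitivity after the (assumed known) retrofit date."""
    before = heating_slope(daily_kwh[:retrofit_idx], daily_temp[:retrofit_idx])
    after = heating_slope(daily_kwh[retrofit_idx:], daily_temp[retrofit_idx:])
    return 1.0 - after / before  # e.g. 0.25 means 25% less kWh per degree-day
```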
133.
Clustering metagenome contigs using coverage with CONCOCT. Bjarnason, Brynjar Smári, January 2017.
Metagenomics allows studying the genetic potential of microorganisms without prior cultivation. Since metagenome assembly results in fragmented genomes, a key challenge is to cluster the genome fragments (contigs) into more or less complete genomes. The goal of this project was to investigate how well CONCOCT bins assembled contigs into taxonomically relevant clusters using the contigs' abundance profiles over multiple samples. This was done by studying how different parameter settings for CONCOCT affect the clustering of metagenome contigs from in silico model communities generated by mixing data from isolate genomes. These parameters control how the model CONCOCT trains is tuned and how the model assigns contigs to clusters. Each parameter was tested in isolation while the others were kept at their default values, and for each data set the number of clusters was kept constant at the known number of species and strains. The best-performing configuration used a tied covariance model, principal components explaining 90% of the variance, and filtered out contigs shorter than 3,000 bp; the results also suggested that all available samples should be used for the abundance profiles. With these parameters, CONCOCT was then run so as to estimate the number of clusters automatically. This gave poor results, leading to the conclusion that the cluster-number selection implemented in CONCOCT at the time, based on the Bayesian Information Criterion, was not good enough. A similar model with a different algorithm for estimating the number of clusters, the Dirichlet Process Gaussian Mixture Model, was therefore tested. It gave much better results, and later versions of CONCOCT have adopted a similar model.
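A minimal sketch of the two model choices discussed above using scikit-learn rather than CONCOCT itself: a tied-covariance Gaussian mixture on the principal components explaining 90% of the variance, versus a Dirichlet-process mixture that prunes unneeded components on its own. The random feature matrix is a stand-in for the contig coverage/composition profiles.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

X = np.random.rand(500, 20)          # stand-in for contig feature profiles

# Reduce to the principal components explaining 90% of the variance.
Z = PCA(n_components=0.90).fit_transform(X)

# Fixed number of clusters, tied covariance (one shared covariance matrix).
gmm = GaussianMixture(n_components=10, covariance_type="tied").fit(Z)

# Dirichlet-process mixture: give it an upper bound on components and let
# the stick-breaking prior shrink the weights of the unneeded ones.
dpgmm = BayesianGaussianMixture(
    n_components=30,
    weight_concentration_prior_type="dirichlet_process",
).fit(Z)
effective_k = np.sum(dpgmm.weights_ > 1e-2)   # clusters actually used
```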
134.
Multilevel Mixture IRT Modeling for the Analysis of Differential Item Functioning. Dras, Luke, 14 August 2023.
A multilevel mixture IRT (MMixIRT) model for DIF analysis has been proposed as a way to gain greater insight into the sources of nuisance factors that reduce the reliability and validity of educational assessments. The purpose of this study was to investigate the efficacy of a MMix2PL model in detecting DIF across a broad set of conditions in hierarchically structured, dichotomous data. Monte Carlo simulation was used to generate examinee response data under conditions common in the field of education: (a) two instrument lengths, (b) nine hierarchically structured sample sizes, (c) four latent class features, and (d) eight distinct DIF characteristics, yielding 576 unique data conditions. DIF analysis was performed using an iterative IRT-based ordinal logistic regression technique, with the focal group identified by estimating latent classes from a multilevel mixture model. For computational efficiency in analyzing 50 replications per condition, model parameters were recovered using maximum likelihood estimation (MLE) with the expectation maximization algorithm. Performance of the MMix2PL model for DIF analysis was evaluated by (a) the accuracy of recovering the true class structure, (b) the accuracy of membership classification, and (c) the sensitivity in detecting DIF items and the Type I error rates. Results demonstrate that the model is predominantly influenced by instrument length and by the separation between class mean abilities, referred to as impact. Enumeration accuracy improved by an average of 40% when analyzing the short 10-item instrument, but with 100 clusters enumeration accuracy was high regardless of the number of items. Classification accuracy was substantially influenced by the presence of impact: under conditions with no impact, classification was unsuccessful, as the matching between model-based class assignments and examinees' true classes averaged only 53.2%; at best, with impact of one standard deviation, classification accuracy averaged between 66.5% and 70.3%. Misclassification errors were then propagated forward and degraded the DIF analysis; detection power was poor, averaging only 0.34 across the analysis iterations that reached convergence. Additionally, the short 10-item instrument proved challenging for MLE, a condition in which a Bayesian estimation method appears necessary. Finally, this paper provides recommendations on data conditions that improve performance of the MMix2PL model for DIF analysis, and summarizes several suggested improvements to the MMix2PL analysis process that could make the model more feasible for DIF analysis.
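As a hedged illustration of the kind of data generation involved (not the study's actual code), the sketch below simulates dichotomous 2PL responses for two latent classes nested in clusters, with uniform DIF on a few items via class-specific difficulty shifts; all parameter values are invented, though the 10-item length, 100 clusters, and one-standard-deviation impact echo conditions named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(7)
n_clusters, per_cluster, n_items = 100, 20, 10
dif_items, dif_size = [0, 1], 0.6        # items with uniform DIF, shift in b

a = rng.lognormal(0.0, 0.3, n_items)     # discriminations
b = rng.normal(0.0, 1.0, n_items)        # difficulties (reference class)

# Cluster random effects and class membership (class 1 = focal group).
cluster_eff = rng.normal(0.0, 0.5, n_clusters)
cls = rng.integers(0, 2, n_clusters * per_cluster)
impact = 1.0                              # focal-class mean-ability shift
theta = (rng.normal(0.0, 1.0, n_clusters * per_cluster)
         + np.repeat(cluster_eff, per_cluster)
         + impact * cls)

# Class-specific difficulties: the focal class sees shifted b on DIF items.
b_cls = np.tile(b, (2, 1))
b_cls[1, dif_items] += dif_size

logits = a * (theta[:, None] - b_cls[cls])
y = (rng.random(logits.shape) < 1 / (1 + np.exp(-logits))).astype(int)
```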
135.
The Generalized Multiset Sampler: Theory and Its Application. Kim, Hang Joon, 25 June 2012.
No description available.
136.
Blanding’s Turtle Occupancy and Abundance in Southern Michigan and Ohio. Earl, Daniel James, 13 October 2022.
Blanding’s Turtle populations face direct threats to their survival. To help protect them, habitats that can best support Blanding’s Turtle populations need to be identified across the species’ range. Blanding’s Turtles are a difficult-to-detect species and may be present at a site even when not found during targeted surveys. Moreover, a population may persist at a site with little to no recruitment, so measures of site suitability beyond species presence are needed to identify more suitable, higher-quality habitats. In my research, I attempt to determine the suitability of sites for Blanding’s Turtles across Michigan and Ohio by fitting data collected under rapid-assessment protocols to single-season occupancy models with wetland and upland landcover types as covariates of occupancy. To further assess site suitability, I created single-season occupancy models for juvenile Blanding’s Turtles and used N-mixture abundance models to estimate the relative abundance of Blanding’s Turtles at a site, using the same landcovers as covariates of occupancy and abundance. Both modelling frameworks also allowed me to include detection covariates that could improve Blanding’s Turtle detection in future surveys.
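For readers unfamiliar with the single-season framework, a minimal sketch of its likelihood follows: each site is occupied with probability ψ, an occupied site yields a detection on each visit with probability p, and an all-zero detection history can arise either from absence or from missed detections. This is a generic, covariate-free illustration, not the author's fitted model.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_lik(params, Y):
    """Y: sites x visits binary detection histories (no missing visits)."""
    psi, p = expit(params)              # keep probabilities in (0, 1)
    det = Y.sum(axis=1)
    n_visits = Y.shape[1]
    # P(history) = psi * p^d * (1-p)^(K-d), plus (1-psi) when d == 0.
    lik = psi * p**det * (1 - p)**(n_visits - det)
    lik = lik + (det == 0) * (1 - psi)
    return -np.log(lik).sum()

# Toy data: 50 sites, 4 visits, true psi = 0.6, p = 0.3.
rng = np.random.default_rng(1)
z = rng.random(50) < 0.6                      # true occupancy states
Y = ((rng.random((50, 4)) < 0.3) & z[:, None]).astype(int)
fit = minimize(neg_log_lik, x0=[0.0, 0.0], args=(Y,))
print(expit(fit.x))                            # estimated (psi, p)
```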
Detection was largely driven by Julian date, with the highest detection probability from mid-May through late June. The length of trapping surveys also influenced Blanding’s Turtle detection, with a substantial decrease in daily trap capture rates by the fourth trap night of a survey. The Michigan occupancy and abundance models indicated that the most suitable sites in Michigan have high percentages of high-quality upland forest and woody wetland landcovers; the percentage of open water supported occupancy but had no discernible effect on abundance. Total upland forest also significantly increased the probability of juvenile occupancy in Michigan. In Michigan, I also observed that the survey method can greatly influence estimates of occupancy and abundance, and I determined that visual surveys cannot produce accurate estimates. The heavily disturbed nature of Ohio’s landscape weakened the predictive power of the landcovers used in my research for Blanding’s Turtle occupancy and abundance. The large differences between occupied habitats in Michigan and Ohio likewise weakened the regional-level model, and the relative abundance of Blanding’s Turtle populations cannot be accurately estimated at this scale using the spatial covariates I included. However, total undisturbed forest and total wetland were positive covariates of Blanding’s Turtle occupancy and abundance for adult and juvenile turtles in both states. Because the habitats used in each state differ greatly, future conservation decisions should be made at the state level as the largest spatial scale. My Michigan models can be used to identify suitable sites within the state and to compare relative abundance between sites to identify healthier populations. For future analyses in Ohio, different, smaller-scale spatial covariates should be used to explain differences in occupancy and abundance between sites.
137.
Bayesian Approach Dealing with Mixture Model Problems. Zhang, Huaiye, 5 June 2012.
In this dissertation, we focus on two research topics related to mixture models. The first topic is Adaptive Rejection Metropolis Simulated Annealing for Detecting Global Maximum Regions, and the second topic is Bayesian Model Selection for Nonlinear Mixed Effects Model.
In the first topic, we consider a finite mixture model, which is used in many applications to fit data from heterogeneous populations. The Expectation Maximization (EM) algorithm and Markov chain Monte Carlo (MCMC) are two popular methods for estimating the parameters of a finite mixture model. However, both may converge to local maximum regions rather than the global maximum when multiple local maxima exist. In this dissertation, we propose a new approach, Adaptive Rejection Metropolis Simulated Annealing (ARMS annealing), to improve the EM algorithm and MCMC methods. ARMS uses a piecewise-linear envelope function as its proposal distribution; under the simulated annealing (SA) framework, we start from a set of proposal distributions constructed by ARMS and generate a set of proper starting points that help detect separate modes and reach all possible ones. We refer to this approach as ARMS annealing. Combining ARMS annealing with the EM algorithm and with the Bayesian approach, respectively, yields two approaches: an EM ARMS annealing algorithm and a Bayesian ARMS annealing approach. EM ARMS annealing runs the EM algorithm from the set of starting points proposed by ARMS annealing; ARMS annealing likewise supplies starting points for MCMC. Both approaches capture the global maximum region and estimate the parameters accurately. An illustrative example uses survey data on the number of charitable donations.
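ARMS annealing itself is beyond a short sketch, but the problem it targets is easy to show: EM on a finite mixture can converge to different stationary points depending on where it starts, so the quality of the starting points matters. The toy sketch below, under invented data and fixed unit variances for brevity, runs EM from several starting points and keeps the fit with the highest log-likelihood; it illustrates the multimodality issue, not the ARMS annealing algorithm.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(4, 1, 100)])

def em_2gauss(x, mu, n_iter=200):
    """EM for a two-component Gaussian mixture (unit variances, for brevity)."""
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        dens = w * norm.pdf(x[:, None], mu, 1.0)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights and means.
        w = r.mean(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    loglik = np.log((w * norm.pdf(x[:, None], mu, 1.0)).sum(axis=1)).sum()
    return mu, loglik

# Run EM from several starting points; keep the best by log-likelihood.
fits = [em_2gauss(x, np.array(s)) for s in ([-1.0, 1.0], [5.0, 6.0], [0.0, 0.1])]
best_mu, best_ll = max(fits, key=lambda f: f[1])
```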
The second topic concerns the nonlinear mixed effects model (NLME). Typically, a parametric NLME model requires strong assumptions that make it less flexible and that are often not satisfied in real applications. To allow the NLME model more flexible assumptions, we present three semiparametric Bayesian NLME models constructed with Dirichlet process (DP) priors; a Dirichlet process model can be viewed as an infinite mixture model. We propose a unified approach, the penalized posterior Bayes factor, for model comparison. Using simulation studies, we compare the performance of two of the three semiparametric hierarchical Bayesian approaches with that of the parametric Bayesian approach. Simulation results suggest that the penalized posterior Bayes factor is a robust method for comparing hierarchical parametric and semiparametric models. An application to gastric emptying studies demonstrates the advantages of our estimation and evaluation approaches.
138.
A Bayesian approach to initial model inference in cryo-electron microscopy. Joubert, Paul, 4 March 2016.
A major application of single-particle analysis in cryo-electron microscopy is the characterization of the three-dimensional structure of macromolecular complexes. Tens of thousands of images are used, each showing a noisy two-dimensional projection of the particle. In a first step, a low-resolution initial model is reconstructed and the unknown image orientations are estimated. This is a difficult inverse problem with many unknowns, including an unknown orientation for each projection image. A good initial model is crucial for the success of the subsequent refinement step.
My dissertation introduces two new algorithms for reconstructing an initial model in cryo-electron microscopy, both based on a coarse representation of the electron density. The two main contributions of my work are the model representing the electron density and the new reconstruction algorithms.
The first main contribution is the use of Gaussian mixture models to represent electron densities in the reconstruction step. I use spherical mixture components with unknown positions, sizes, and weights. This representation has many advantages over the grid-based electron densities commonly used by other reconstruction algorithms. For example, it requires far fewer parameters, leading to faster and more robust algorithms.
The second main contribution is the development of Markov chain Monte Carlo methods within a Bayesian framework for estimating the model parameters. The first algorithm can be derived from the Gibbs sampler that fits Gaussian mixture models to point clouds; it is extended here to work with images, projections, and unknown rotations and translations.
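As a hedged sketch of the point-cloud starting point only (the extension to projections and unknown orientations is the dissertation's contribution and is not reproduced here), the following Gibbs sampler alternates component assignments, means, and weights for a spherical Gaussian mixture. The fixed component variance, the priors, and all numerical choices are illustrative assumptions.

```python
import numpy as np

def gibbs_gmm(X, K, n_sweeps=500, sigma2=1.0, tau2=25.0, rng=None):
    """Gibbs sampler for a spherical Gaussian mixture on a point cloud.

    Fixed component variance sigma2, N(0, tau2*I) prior on the means,
    symmetric Dirichlet(1) prior on the weights -- all illustrative choices.
    """
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)].astype(float)  # init at data points
    w = np.full(K, 1.0 / K)
    for _ in range(n_sweeps):
        # 1) Sample assignments given means and weights (constants cancel).
        sq = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        logp = np.log(w) - 0.5 * sq / sigma2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=pi) for pi in p])
        # 2) Sample means given assignments (conjugate normal update).
        for k in range(K):
            Xk = X[z == k]
            prec = len(Xk) / sigma2 + 1.0 / tau2
            mean = (Xk.sum(axis=0) / sigma2) / prec
            mu[k] = rng.normal(mean, np.sqrt(1.0 / prec), size=d)
        # 3) Sample weights given assignments (Dirichlet update).
        w = rng.dirichlet(1.0 + np.bincount(z, minlength=K))
    return mu, w, z
```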
The second algorithm takes a different approach: the forward model now assumes Gaussian errors, and sampling algorithms such as Hamiltonian Monte Carlo (HMC) are used to estimate the positions of the mixture components and the image orientations.
My dissertation presents extensive numerical experiments on simulated and real data that test the proposed algorithms in practice and compare them with other reconstruction methods.
139.
Essays in the economics of subjective well-being. Goldsmith, Glenn Fraser, January 2011.
This thesis explores three major issues in the burgeoning empirical literature on the determinants of subjective well-being (SWB). While economic theory assumes that it is current consumption that matters to SWB, empirical work has focused almost exclusively on the effect of income. In Part 1, we use household panel data from Russia and Britain to show that neither the standard theoretical account nor the standard empirical practice may be adequate. Consumption, income, and wealth each contribute separately to SWB, in particular via perceptions of status and anticipation of the future, and omitting consumption from SWB equations significantly understates the importance of money to SWB. Distinguishing between consumption and income is also important for identifying reference effects. In Part 2, we confirm earlier findings that others' income has a positive (informational) effect on SWB in Russia, but show that others' consumption has an offsetting negative (comparison) effect. The net effect depends on how individuals' reference groups are defined, so we develop a novel econometric model that lets us estimate these reference groups from the data. Contrary to previous results, we conclude that comparison dominates information. Most SWB analyses focus on the average effects of money, relationships, and other outcomes across a given population, yet there may be significant differences in what is important to different people. In Part 3, we employ parametric and semi-parametric random coefficient models to show that there are large differences in the determinants of individual SWB in Britain, and (in contrast to previous work) that such differences cannot simply be attributed to differences in individuals' reporting functions. While individual differences correlate with (some) observable demographic variables, they do not generally correlate with individuals' perceptions of what is important to them. The results of SWB research may therefore be a useful source of information.
140.
Bayesian Cluster Analysis: Some Extensions to Non-standard Situations. Franzén, Jessica, January 2008.
The Bayesian approach to cluster analysis is presented. We assume that all data stem from a finite mixture model, where each component corresponds to one cluster and is given by a multivariate normal distribution with unknown mean and variance. The method produces posterior distributions of all cluster parameters and proportions, as well as associated cluster probabilities for all objects. We extend this method in several directions to cover some common but non-standard situations. The first extension covers the case of a few deviant observations that do not belong to any of the normal clusters: an extra component/cluster is created for them, with a larger variance or a different distribution, e.g. uniform over the whole range. The second extension is the clustering of longitudinal data: all units are clustered separately at each time point, and the movements between time points are modeled by Markov transition matrices, so the clustering at one time point is affected by what happens at the neighbouring time points. The third extension handles datasets with missing data, e.g. item non-response: the missing values are imputed iteratively in an extra step of the Gibbs-sampler estimation algorithm. The Bayesian inference of mixture models has many advantages over the classical approach, but it is not without computational difficulties. A software package, written in Matlab, for Bayesian inference of mixture models is introduced. The programs of the package handle the basic cases of clustering data assumed to arise from mixtures of multivariate normal distributions, as well as the non-standard situations above.
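A hedged sketch of the first extension's idea, using EM rather than the thesis's Gibbs sampler for brevity: add a uniform "background" component over the data's bounding box so deviant observations need not distort the normal clusters. The densities, the fixed background volume, and the initialization are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_with_background(X, K, n_iter=100):
    """EM for K Gaussian clusters plus one uniform outlier component."""
    n, d = X.shape
    vol = np.prod(X.max(axis=0) - X.min(axis=0))   # bounding-box volume
    u = 1.0 / vol                                   # uniform outlier density
    rng = np.random.default_rng(0)
    mu = X[rng.choice(n, K, replace=False)].astype(float)
    cov = [np.cov(X.T) for _ in range(K)]
    w = np.full(K + 1, 1.0 / (K + 1))               # last weight = outlier comp.
    for _ in range(n_iter):
        # E-step: responsibilities over the K Gaussians + uniform component.
        dens = np.column_stack(
            [multivariate_normal.pdf(X, mu[k], cov[k]) for k in range(K)]
            + [np.full(n, u)]
        )
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weights for all components; mean/cov for the Gaussians only.
        w = r.mean(axis=0)
        for k in range(K):
            rk = r[:, k]
            mu[k] = rk @ X / rk.sum()
            diff = X - mu[k]
            cov[k] = (rk[:, None] * diff).T @ diff / rk.sum() + 1e-6 * np.eye(d)
    return mu, cov, w, r   # r[:, -1] = posterior outlier probability per point
```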