111

Design Of Polynomial-based Filters For Continuously Variable Sample Rate Conversion With Applications In Synthetic Instrumentation

Hunter, Matthew 01 January 2008 (has links)
In this work, the design and application of Polynomial-Based Filters (PBF) for continuously variable Sample Rate Conversion (SRC) is studied. The major contributions of this work are summarized as follows. First, an explicit formula for the Fourier transform of both a symmetrical and a nonsymmetrical PBF impulse response with variable basis function coefficients is derived. In the literature, only one explicit formula is given, and that for a symmetrical, even-length filter with fixed basis function coefficients. The frequency-domain optimization of PBFs via linear programming has been proposed in the literature; however, the algorithm was not detailed, nor were explicit formulas derived. In this contribution, a minimax optimization procedure is derived for the frequency-domain optimization of a PBF with time-domain constraints. Explicit formulas are given for direct input to a linear programming routine. Additionally, accompanying Matlab code implementing this optimization in terms of the derived formulas is given in the appendix.

In the literature, it has been pointed out that the frequency response of the Continuous-Time (CT) filter decays as frequency goes to infinity. It has also been observed that when implemented for SRC, the CT filter is sampled, resulting in CT frequency response aliasing. Thus, for example, the stopband sidelobes of the Discrete-Time (DT) implementation rise above the CT designed level. Building on these observations, it is shown how the rolloff rate of the frequency response of a PBF can be adjusted by adding continuous derivatives to the impulse response. This is of great advantage, especially when the PBF is used for decimation, as the aliasing-band attenuation can be made to increase with frequency. It is shown how this technique can be used to dramatically reduce the effect of alias build-up in the passband. In addition, it is shown that as the number of continuous derivatives of the PBF increases, the resulting DT implementation more closely matches the CT design.

When implemented for SRC, samples from a PBF impulse response are computed by evaluating the polynomials using a so-called fractional interval, µ. In the literature, the effect of quantizing µ on the frequency response of the PBF has been studied, and formulas have been derived to determine the number of bits required to keep frequency response distortion below prescribed bounds. Elsewhere, a formula has been given to compute the number of bits required to represent µ to obtain a given SRC accuracy for rational-factor SRC. In this contribution, it is shown that these two apparently competing requirements are in fact quite independent: the wordlength required for SRC accuracy need only be kept in the µ generator, which is a single accumulator. The output of the µ generator may then be truncated prior to polynomial evaluation. This results in significant computational savings, as polynomial evaluation can require several multiplications and additions.

Under the heading of applications, a new Wideband Digital Downconverter (WDDC) for Synthetic Instruments (SI) is introduced. DDCs first tune to a signal's center frequency using a numerically controlled oscillator and mixer, and then zoom in to the bandwidth of interest using SRC. The SRC is required to produce continuously variable output sample rates from a fixed input sample rate over a large range. Current implementations accomplish this using a pre-filter, an arbitrary-factor resampler, and integer decimation filters. In this contribution, the SRC of the WDDC is simplified, reducing the computational requirements by a factor of three or more. In addition, it is shown how this system can be used to develop a novel, computationally efficient FFT-based spectrum analyzer with continuously variable frequency spans. Finally, after giving the theoretical foundation, a real Field Programmable Gate Array (FPGA) implementation of a novel Arbitrary Waveform Generator (AWG) is presented. The new approach uses a fixed Digital-to-Analog Converter (DAC) sample clock in combination with an arbitrary-factor interpolator. Waveforms created at any sample rate are interpolated to the fixed DAC sample rate in real time. As a result, the additional lower-performance analog hardware required in current approaches, namely multiple reconstruction filters and/or additional sample clocks, is avoided. Measured results are given confirming the performance of the system predicted by the theoretical design and simulation.
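To make the fractional-interval mechanism concrete, here is a minimal, hedged sketch in Python (not the thesis's optimized PBF design): a cubic polynomial interpolator evaluated by Horner's rule, driven by a single accumulator that holds the full-precision position, with an optional truncation of µ before polynomial evaluation. The filter choice (a Catmull-Rom cubic), the function name, and the `mu_bits` parameter are illustrative assumptions.

```python
import numpy as np

def src_cubic(x, ratio, mu_bits=None):
    """Arbitrary-factor sample rate conversion with a cubic polynomial
    interpolator (Catmull-Rom) evaluated by Horner's rule.  `ratio` is the
    input-to-output sample rate ratio; values < 1 interpolate, > 1 decimate."""
    x = np.asarray(x, dtype=float)
    y = []
    t = 0.0                       # the mu generator: a single full-precision accumulator
    while t < len(x) - 3:
        n = int(t)                # integer sample index
        mu = t - n                # fractional interval in [0, 1)
        if mu_bits is not None:   # optionally truncate mu before polynomial evaluation
            mu = np.floor(mu * 2 ** mu_bits) / 2 ** mu_bits
        x0, x1, x2, x3 = x[n:n + 4]
        c0 = x1
        c1 = 0.5 * (x2 - x0)
        c2 = x0 - 2.5 * x1 + 2.0 * x2 - 0.5 * x3
        c3 = 0.5 * (x3 - x0) + 1.5 * (x1 - x2)
        y.append(((c3 * mu + c2) * mu + c1) * mu + c0)   # Horner evaluation
        t += ratio                # only the accumulator keeps the SRC-accuracy wordlength
    return np.array(y)

# Example: resample a test tone from 48 kHz to 44.1 kHz with mu truncated to 10 bits
tone = np.sin(2 * np.pi * 1000 * np.arange(4800) / 48000)
out = src_cubic(tone, ratio=48000 / 44100, mu_bits=10)
```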
112

GENERAL-PURPOSE STATISTICAL INFERENCE WITH DIFFERENTIAL PRIVACY GUARANTEES

Zhanyu Wang (13893375) 06 December 2023 (has links)
Differential privacy (DP) uses a probabilistic framework to measure the level of privacy protection of a mechanism that releases data analysis results to the public. Although DP is widely used by both government and industry, there is still a lack of research on statistical inference under DP guarantees. On the one hand, existing DP mechanisms mainly aim to extract dataset-level information instead of population-level information. On the other hand, DP mechanisms introduce calibrated noise into the released statistics, which often results in sampling distributions more complex and intractable than the non-private ones. This dissertation aims to provide general-purpose methods for statistical inference, such as confidence intervals (CIs) and hypothesis tests (HTs), that satisfy the DP guarantees.

In the first part of the dissertation, we examine a DP bootstrap procedure that releases multiple private bootstrap estimates to construct DP CIs. We present new DP guarantees for this procedure and propose to use deconvolution with DP bootstrap estimates to derive CIs for inference tasks such as population mean, logistic regression, and quantile regression. Our method achieves the nominal coverage level in both simulations and real-world experiments and offers the first approach to private inference for quantile regression.

In the second part of the dissertation, we propose to use the simulation-based "repro sample" approach to produce CIs and HTs based on DP statistics. Our methodology has finite-sample guarantees and can be applied to a wide variety of private inference problems. It appropriately accounts for biases introduced by DP mechanisms (such as by clamping) and improves over other state-of-the-art inference methods in terms of the coverage and type I error of the private inference.

In the third part of the dissertation, we design a debiased parametric bootstrap framework for DP statistical inference. We propose the adaptive indirect estimator, a novel simulation-based estimator that is consistent and corrects the clamping bias in the DP mechanisms. We also prove that our estimator has the optimal asymptotic variance among all well-behaved consistent estimators, and that the parametric bootstrap results based on our estimator are consistent. Simulation studies show that our framework produces valid DP CIs and HTs in finite-sample settings, and that it is more efficient than other state-of-the-art methods.
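For intuition about the first part, the sketch below (Python/NumPy) releases noisy bootstrap means of clamped data and forms a naive percentile interval. The clamping range, noise scale, and number of bootstrap replicates are illustrative assumptions, the noise is not calibrated to any particular (ε, δ) budget, and the dissertation additionally deconvolves the DP noise rather than using raw percentiles.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_bootstrap_estimates(x, n_boot=200, clamp=(0.0, 1.0), noise_mult=2.0):
    """Release n_boot bootstrap means of clamped data, each perturbed with
    Gaussian noise proportional to the sensitivity of a single mean.  The
    noise multiplier stands in for proper (epsilon, delta) calibration."""
    lo, hi = clamp
    xc = np.clip(x, lo, hi)
    n = len(xc)
    sensitivity = (hi - lo) / n
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(xc, size=n, replace=True)
        estimates[b] = resample.mean() + rng.normal(0.0, noise_mult * sensitivity)
    return estimates

data = rng.beta(2.0, 5.0, size=500)               # toy "population sample"
est = dp_bootstrap_estimates(data)
naive_ci = np.percentile(est, [2.5, 97.5])        # deconvolution would correct this further
```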
113

Evaluating Factors Contributing to Crash Severity Among Older Drivers: Statistical Modeling and Machine Learning Approaches

Alrumaidhi, Mubarak S. M. S. 23 February 2024 (has links)
Road crashes pose a significant public health issue worldwide, often leading to severe injuries and fatalities. This dissertation embarks on a comprehensive examination of the factors affecting road crash severity, with a special focus on older drivers and the unique challenges introduced by the COVID-19 pandemic. Utilizing a dataset from Virginia, USA, the research integrates advanced statistical methods and machine learning techniques to dissect this critical issue from multiple angles.

The initial study within the dissertation employs multilevel ordinal logistic regression to assess crash severity among older drivers, revealing the complex interplay of various factors such as crash type, road attributes, and driver behavior. It highlights the increased risk of severe crashes associated with head-on collisions, driver distraction or impairment, and the non-use of seat belts, specifically affecting older drivers. These findings are pivotal in understanding the unique vulnerabilities of this demographic on the road.

Furthermore, the dissertation explores the efficacy of both parametric and non-parametric machine learning models in predicting crash severity. It emphasizes the innovative use of synthetic resampling techniques, particularly random over-sampling examples (ROSE) and the synthetic minority over-sampling technique (SMOTE), to address class imbalances. This methodological advancement not only improves the accuracy of crash severity predictions for severe crashes but also offers a comprehensive understanding of diverse factors, including environmental and roadway characteristics.

Additionally, the dissertation examines the influence of the COVID-19 pandemic on road safety, revealing a paradoxical decrease in overall traffic crashes accompanied by an increase in the rate of severe injuries. This finding underscores the pandemic's transformative effect on driving behaviors and patterns, heightening risks for vulnerable road users like pedestrians and cyclists. The study calls for adaptable road safety strategies responsive to global challenges and societal shifts.

Collectively, the studies within this dissertation contribute substantially to transportation safety research. They demonstrate the complex nature of factors influencing crash severity and the efficacy of tailored approaches in addressing these challenges. The integration of advanced statistical methods with machine learning techniques offers a profound understanding of crash dynamics and sets a new benchmark for future research in transportation safety. This dissertation underscores the evolving challenges in road safety, especially amidst demographic shifts and global crises, and advocates for adaptive, evidence-based strategies to enhance road safety for all, particularly vulnerable groups like older drivers.

/ Doctor of Philosophy / Road crashes are a major concern worldwide, often leading to serious injuries and loss of life. This dissertation delves into the critical issue of road crash severity, with a special focus on older drivers and the challenges brought about by the COVID-19 pandemic. Drawing on data from Virginia, USA, the research combines cutting-edge statistical methods and machine learning to shed light on this pressing matter. One important part of the research focuses on older drivers. It uses advanced analysis to find out why crashes involving this group might be more serious. The study discovered that situations like head-on collisions, driver distraction or impairment, and not wearing seat belts greatly increase the risk for older drivers. Understanding these risks is crucial in identifying the special needs of older drivers on the road.

Then, the study explores the power of machine learning in predicting crash severity. Here, the research stands out by using innovative techniques to balance out the data, leading to more accurate predictions. This part of the study not only improves our understanding of what leads to severe crashes but also highlights how different environmental and road factors play a role. Following this, the research looks at how the COVID-19 pandemic has impacted road safety. Interestingly, while the overall number of crashes went down during the pandemic, the rate of severe injuries in the crashes that occurred increased. This suggests that the pandemic changed driving behaviors, posing increased risks, especially to pedestrians and cyclists.

In summary, this dissertation offers valuable insights into the complex factors affecting road crash severity. It underscores the importance of using advanced analysis techniques to understand these dynamics better, especially in the face of demographic changes and global challenges like the pandemic. The findings are not just academically significant; they provide practical guidance for policymakers and road safety experts to develop strategies that make roads safer for everyone, particularly older drivers.
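As a rough sketch of the synthetic-resampling step described above, the following Python example (assuming the scikit-learn and imbalanced-learn packages are available) oversamples the minority severity classes with SMOTE before fitting a classifier. The features, class proportions, and random forest model are synthetic stand-ins rather than the Virginia crash data or the dissertation's actual ordinal and machine learning models.

```python
import numpy as np
from imblearn.over_sampling import SMOTE                  # assumed available
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))                              # stand-in crash features
y = rng.choice([0, 1, 2], size=2000, p=[0.85, 0.12, 0.03])  # imbalanced severity levels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # oversample minority classes

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te)))       # evaluate on the untouched test set
```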
114

Importance sampling on the coalescent with recombination

Jenkins, Paul A. January 2008 (has links)
Performing inference on contemporary samples of homologous DNA sequence data is an important task. By assuming a stochastic model for ancestry, one can make full use of observed data by sampling from the distribution of genealogies conditional upon the sample configuration. A natural such model is Kingman's coalescent, with numerous extensions to account for additional biological phenomena. However, in this model the distribution of interest cannot be written down analytically, and so one solution is to utilize importance sampling. In this context, importance sampling (IS) simulates genealogies from an artificial proposal distribution, and corrects for this by weighting each resulting genealogy. In this thesis I investigate in detail approaches for developing efficient proposal distributions on coalescent histories, with a particular focus on a two-locus model mutating under the infinite-sites assumption and in which the loci are separated by a region of recombination. This model was originally studied by Griffiths (1981), and is a useful simplification for considering the correlated ancestries of two linked loci. I show that my proposal distribution generally outperforms an existing IS method which could be recruited to this model. Given today's sequencing technologies it is not difficult to find volumes of data for which even the most efficient proposal distributions might struggle. I therefore appropriate resampling mechanisms from the theory of sequential Monte Carlo in order to effect substantial improvements in IS applications. In particular, I propose a new resampling scheme and confirm that it ensures a significant gain in the accuracy of likelihood estimates. It outperforms an existing scheme which can actually diminish the quality of an IS simulation unless it is applied to coalescent models with care. Finally, I apply the methods developed here to an example dataset, and discuss a new measure for the way in which two gene trees are correlated.
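The sequential Monte Carlo resampling idea can be illustrated generically. The sketch below (Python) monitors the effective sample size (ESS) of the importance weights and resamples the particles when it drops below a threshold; the per-step state update and weight increment are dummy placeholders standing in for the coalescent proposal developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def ess(w):
    """Effective sample size of normalised importance weights."""
    return 1.0 / np.sum(w ** 2)

def sis_with_resampling(n_particles=1000, n_steps=50, threshold=0.5):
    state = rng.normal(size=n_particles)            # placeholder particle state
    logw = np.zeros(n_particles)
    for _ in range(n_steps):
        # Placeholder proposal and weight update (one "event" of the genealogy)
        state = state + rng.normal(scale=0.1, size=n_particles)
        logw += -0.005 * state ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if ess(w) < threshold * n_particles:        # resample only when weights degenerate
            idx = rng.choice(n_particles, size=n_particles, p=w)
            state, logw = state[idx], np.zeros(n_particles)
    return state, w

particles, weights = sis_with_resampling()
```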
115

Essays on asset allocation strategies for defined contribution plans

Basu, Anup K. January 2008 (has links)
Asset allocation is the most influential factor driving investment performance. While researchers have made substantial progress in the field of asset allocation since the introduction of the mean-variance framework by Markowitz, there is little agreement about appropriate portfolio choice for multi-period, long-horizon investors. Nowhere is this more evident than in trustees of retirement plans choosing different asset allocation strategies as default investment options for their members. This doctoral dissertation consists of four essays, each of which explores either a novel or an unresolved issue in the area of asset allocation for individual retirement plan participants. The goal of the thesis is to provide greater insight into the subject of portfolio choice in retirement plans and advance scholarship in this field.

The first study evaluates different constant-mix or fixed-weight asset allocation strategies and comments on their relative appeal as default investment options. In contrast to past research, which deals mostly with theoretical or hypothetical models of asset allocation, we investigate asset allocation strategies that are actually used as default investment options by superannuation funds in Australia. We find that strategies with moderate allocations to stocks are consistently outperformed, both in upside potential of exceeding the participant's wealth accumulation target and in downside risk of falling below that target, by very aggressive strategies whose allocation to stocks approaches 100%. The risk of extremely adverse wealth outcomes for plan participants does not appear to be very sensitive to asset allocation.

Drawing on the evidence of the previous study, the second essay explores possible solutions to the well-known problem of gender inequality in retirement investment outcomes. Using non-parametric stochastic simulation, we simulate and compare the retirement wealth outcomes for a hypothetical female and male worker under different assumptions about breaks in employment, superannuation contribution rates, and asset allocation strategies. We argue that modest changes in contribution and asset allocation strategy for the female plan participant are necessary to ensure an equitable wealth outcome in retirement. The findings provide strong evidence against the gender-neutral default contribution and asset allocation policy currently institutionalized in Australia and other countries.

In the third study we examine the efficacy of lifecycle asset allocation models, which allocate aggressively to risky asset classes when the employee participants are young and gradually switch to more conservative asset classes as they approach retirement. We show that the conventional lifecycle strategies make a costly mistake by ignoring the change in portfolio size over time as a critical input in the asset allocation decision. Due to this portfolio size effect, which has hitherto remained unexplored in the literature, the terminal value of accumulation in the retirement account is critically dependent on the asset allocation strategy adopted by the participant in later years relative to early years. The final essay extends the findings of the previous chapter by proposing an alternative approach to lifecycle asset allocation which incorporates performance feedback.
We demonstrate that strategies that dynamically alter allocation between growth and conservative asset classes at different points on the investment horizon based on cumulative portfolio performance relative to a set target generally result in superior wealth outcomes compared to those of conventional lifecycle strategies. The dynamic allocation strategy exhibits clear second-degree stochastic dominance over conventional strategies which switch assets in a deterministic manner as well as balanced diversified strategies.
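A toy Monte Carlo comparison in the spirit of these experiments is sketched below (Python). A deterministic lifecycle glide path is compared with a feedback rule that switches between aggressive and conservative allocations depending on whether accumulated wealth is behind or ahead of a target; the return distributions, contribution level, and target rule are invented for illustration and are not the thesis's calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(strategy, years=40, contrib=5000.0, n_paths=10000, target_growth=0.05):
    """Monte Carlo wealth accumulation under an allocation rule.
    `strategy(year, wealth, target)` returns the equity weight for that year."""
    wealth = np.zeros(n_paths)
    for t in range(years):
        target = contrib * ((1 + target_growth) ** (t + 1) - 1) / target_growth
        eq = np.array([strategy(t, w, target) for w in wealth])
        r_stock = rng.normal(0.07, 0.17, n_paths)       # assumed return parameters
        r_bond = rng.normal(0.03, 0.05, n_paths)
        wealth = (wealth + contrib) * (1 + eq * r_stock + (1 - eq) * r_bond)
    return wealth

lifecycle = lambda t, w, tgt: max(0.2, 0.9 - 0.02 * t)   # deterministic glide path
dynamic = lambda t, w, tgt: 0.9 if w < tgt else 0.3      # feedback on target shortfall

for name, rule in [("lifecycle", lifecycle), ("dynamic", dynamic)]:
    w = simulate(rule)
    print(name, np.percentile(w, [5, 50, 95]).round())
```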
116

Etude de la qualité géomorphologique de modèles numériques de terrain issus de l’imagerie spatiale / Study on the geomorphological quality of digital terrain models derived from space imagery

Hage, Mhamad El 12 November 2012 (has links)
The production of Digital Elevation Models (DEMs) has undergone significant evolution during the last two decades, resulting from a growing demand for scientific as well as industrial purposes. Many Earth observation satellites, using optical and radar sensors, have enabled the production of DEMs covering most of the Earth's surface. The algorithms of image and point cloud processing have also undergone significant evolution. This progress has provided DEMs on different scales, which can fulfill the requirements of many users. The applications based on geomorphology have benefitted from this evolution. Indeed, these applications concentrate specifically on landforms, for which the DEM constitutes the basic data.

The aim of this study is to assess the impact of the parameters of DEM production by photogrammetry and InSAR on position and shape quality. The position quality, assessed by DEM producers, is not sufficient for the evaluation of shape quality. Thus, the evaluation methods of position and shape quality, and the difference between them, are described. A novel method of internal validation, which does not require reference data, is proposed. Then, the impact of image matching and interferometric processing parameters, as well as resampling, on elevation and shapes is assessed. Finally, we conclude with recommendations on how to choose the production parameters correctly, particularly for photogrammetry.

We observe little impact from most of the parameters on the elevation, except the InSAR parameters. On the other hand, there is a significant impact on the elevation derivatives. The impact of matching parameters presents a strong dependence on the terrain morphology and the land cover. Therefore, these parameters have to be selected by taking into account these two factors. The effect of interferometric processing manifests itself as phase unwrapping errors that mainly affect the elevation and, to a lesser extent, the derivatives. The interpolation methods and the mesh size present a small impact on the elevation and a significant impact on the derivatives. Indeed, the value of the derivatives and their quality depend directly on the mesh size. The selection of this size has to be made according to the foreseen application. Finally, we conclude that these parameters are interdependent and can have similar effects. They must be selected according to the foreseen application, the terrain morphology and the land cover in order to minimize the error in the final results and the conclusions.
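To make the mesh-size effect on derivatives concrete, the sketch below (Python/NumPy, on a smooth synthetic surface rather than a real DEM) computes slope from the same terrain gridded at two spacings; the elevation values are essentially unchanged while the derived slope statistics shift with the grid spacing. The surface shape and parameter values are invented for illustration.

```python
import numpy as np

def slope_deg(dem, mesh):
    """Slope in degrees from a gridded DEM with square cells of size `mesh`."""
    dzdy, dzdx = np.gradient(dem, mesh)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

def synthetic_dem(mesh, size=1000.0):
    """Smooth synthetic terrain sampled on a grid with spacing `mesh` (metres)."""
    ax = np.arange(0.0, size, mesh)
    x, y = np.meshgrid(ax, ax)
    return 50.0 * np.sin(x / 150.0) * np.cos(y / 200.0) + 0.002 * x * y

for mesh in (10.0, 50.0):
    dem = synthetic_dem(mesh)
    s = slope_deg(dem, mesh)
    print(f"mesh {mesh:>4.0f} m: mean slope {s.mean():.2f} deg, max {s.max():.2f} deg")
```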
117

Contributions to decision tree based learning / Contributions à l’apprentissage de l’arbre des décisions

Qureshi, Taimur 08 July 2010 (has links)
Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data learning techniques which aim at producing high-level information, or models, from data. A typical knowledge discovery process consists of data selection, data preparation, data transformation, data mining and interpretation/validation of the results. Thus, we develop automatic learning techniques which contribute to the data preparation, transformation and mining tasks of knowledge discovery. In doing so, we try to improve the prediction accuracy of the overall learning process. Our work focuses on decision tree based learning and thus we introduce various preprocessing and transformation techniques such as discretization, fuzzy partitioning and dimensionality reduction to improve this type of learning. However, these techniques can also be used in other learning methods; for example, discretization can also be used for naive Bayes classifiers.

The data preparation step represents almost 80 percent of the problem and is both time consuming and critical for the quality of modeling. Discretization of continuous features is an important problem that affects the accuracy, complexity, variance and understandability of the induction models. In this thesis, we propose and develop resampling-based aggregation techniques that improve the quality of discretization. We then validate them by comparing with other discretization techniques and with an optimal partitioning method on 10 benchmark data sets.

The second part of our thesis concerns automatic fuzzy partitioning for soft decision tree induction. A soft, or fuzzy, decision tree is an extension of classical crisp tree induction in which fuzzy logic is embedded into the induction process, yielding more accurate models with reduced variance while remaining interpretable and autonomous. We modify the above resampling-based partitioning method to generate fuzzy partitions. In addition, we propose, develop and validate another fuzzy partitioning method that improves the accuracy of the decision tree.

Finally, we adopt a topological learning scheme and perform non-linear dimensionality reduction. We modify an existing manifold learning based technique and examine whether it can enhance the predictive power and interpretability of classification.
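A minimal sketch of a resampling-based discretization, in the general spirit of the aggregation idea above (Python; the equal-frequency base method, bin count, and averaging rule are illustrative assumptions, not the author's algorithm): cut points are computed on bootstrap replicates of a continuous feature and averaged to obtain a more stable partition.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_cutpoints(x, n_bins=5, n_boot=100):
    """Equal-frequency cut points averaged over bootstrap resamples of x."""
    qs = np.linspace(0, 100, n_bins + 1)[1:-1]          # interior quantiles
    cuts = np.empty((n_boot, len(qs)))
    for b in range(n_boot):
        sample = rng.choice(x, size=len(x), replace=True)
        cuts[b] = np.percentile(sample, qs)
    return cuts.mean(axis=0)                            # aggregated cut points

x = rng.lognormal(mean=0.0, sigma=1.0, size=500)        # a skewed continuous feature
cuts = bootstrap_cutpoints(x)
codes = np.digitize(x, cuts)                            # discretized feature, values 0..n_bins-1
```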
118

Dynamic Resampling for Preference-based Evolutionary Multi-objective Optimization of Stochastic Systems: Improving the efficiency of time-constrained optimization

Siegmund, Florian January 2016 (has links)
In preference-based Evolutionary Multi-objective Optimization (EMO), the decision maker is looking for a diverse, but locally focused, non-dominated front in a preferred area of the objective space, as close as possible to the true Pareto-front. Since solutions found outside the area of interest are considered less important or even irrelevant, the optimization can focus its efforts on the preferred area and find the solutions that the decision maker is looking for more quickly, i.e., with fewer simulation runs. This is particularly important if the available time for optimization is limited, as is the case in many real-world applications. Although previous studies in using this kind of guided search with preference information, for example with the R-NSGA-II algorithm, have shown positive results, only very few of them considered the stochastic outputs of simulated systems. In the literature, this phenomenon of stochastic evaluation functions is sometimes called noisy optimization.

If an EMO algorithm is run without any countermeasure to noisy evaluation functions, the performance will deteriorate compared to the case where the true mean objective values are known. While, in general, static resampling of solutions to reduce the uncertainty of all evaluated design solutions can allow EMO algorithms to avoid this problem, it will significantly increase the required simulation time/budget, as many samples will be wasted on candidate solutions which are inferior. In comparison, a Dynamic Resampling (DR) strategy can allow the exploration and exploitation trade-off to be optimized, since the required accuracy of objective values varies between solutions. In a dense, converged population, it is important to know the accurate objective values, whereas noisy objective values are less harmful when an algorithm is exploring the objective space, especially early in the optimization process. Therefore, a well-designed Dynamic Resampling strategy which resamples the solutions carefully, according to the resampling need, can help an EMO algorithm achieve better results than a static resampling allocation.

While there are abundant studies in simulation-based optimization that consider Dynamic Resampling, the survey done in this study found no related work that considers how combinations of Dynamic Resampling and preference-based guided search can further enhance the performance of EMO algorithms, especially if the problems under study involve computationally expensive evaluations, like production systems simulation. The aim of this thesis is therefore to study, design and then compare new combinations of preference-based EMO algorithms with various DR strategies, in order to improve the solution quality found by simulation-based multi-objective optimization with stochastic outputs, under a limited function evaluation or simulation budget. Specifically, based on the advantages and flexibility offered by interactive, reference point-based approaches, studies have been made of the performance enhancements of R-NSGA-II when augmented with various DR strategies, with increasing degrees of statistical sophistication, as well as several adaptive features in terms of optimization parameters. The research results clearly show that optimization results can be improved if a hybrid DR strategy is used and adaptive algorithm parameters are chosen according to the noise level and problem complexity.
In the case of a limited simulation budget, the results support the conclusion that both decision maker preferences and DR should be used at the same time to achieve the best results in simulation-based multi-objective optimization.
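The budget-allocation intuition behind Dynamic Resampling can be sketched independently of any particular algorithm. In the toy rule below (Python), a solution receives more replications of a noisy objective evaluation the later it is encountered in the run and the closer its estimated objectives lie to the decision maker's reference point; the formula, pilot-sample size, and objective functions are illustrative stand-ins, not one of the thesis's hybrid DR strategies or R-NSGA-II itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_objectives(x):
    """Stand-in for a stochastic simulation: true objectives plus noise."""
    f = np.array([np.sum(x ** 2), np.sum((x - 1.0) ** 2)])
    return f + rng.normal(0.0, 0.5, size=2)

def dynamic_samples(progress, distance, b_min=1, b_max=20):
    """More samples late in the run and near the reference point."""
    need = progress / (1.0 + distance)              # in (0, 1]
    return int(round(b_min + need * (b_max - b_min)))

ref_point = np.array([0.5, 0.5])                    # decision maker's preference
budget_used = 0
for t in range(1, 101):                             # iterations of some EMO loop
    progress = t / 100.0
    x = rng.uniform(-1, 2, size=3)                  # a candidate solution
    pilot = np.mean([noisy_objectives(x) for _ in range(3)], axis=0)   # cheap pilot estimate
    distance = np.linalg.norm(pilot - ref_point)
    b = dynamic_samples(progress, distance)
    f_hat = np.mean([noisy_objectives(x) for _ in range(b)], axis=0)   # refined estimate
    budget_used += b + 3
print("total simulation replications:", budget_used)
```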
119

Statistické vyhodnocení fylogeneze biologických sekvencí / Statistic evaluation of phylogeny of biological sequences

Vadják, Šimon January 2014 (has links)
This master's thesis provides a comprehensive overview of resampling methods for testing the correctness of the topology of phylogenetic trees, which estimate the process of phylogeny on the basis of the similarity of biological sequences. We focus on how errors can arise in this estimate and on the possibilities for detecting and removing them. Bootstrapping, jackknifing, OTU jackknifing and the PTP (permutation tail probability) test were implemented in Matlab. The work aims to test their applicability to various biological sequences and to assess the impact of the choice of input analysis parameters on the results of these statistical tests.
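The column-resampling idea behind bootstrap support can be sketched without a full tree-building pipeline: resample alignment columns with replacement, recompute pairwise p-distances, and count how often a grouping of interest (here, simply which sequence is the nearest neighbour of the first one) is reproduced. This is a hedged Python illustration on made-up sequences, not the thesis's Matlab implementation of the four tests.

```python
import numpy as np

rng = np.random.default_rng(0)

seqs = np.array([list(s) for s in [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGAACGTACGTACGT",
    "ACGAACGTACTTACGTACGA",
    "TCGAACGTACTTACGCACGA",
]])

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return np.mean(a != b)

def nearest_to_first(cols):
    """Index of the sequence closest to sequence 0, using the given columns."""
    d = [p_distance(seqs[0, cols], seqs[k, cols]) for k in range(1, len(seqs))]
    return 1 + int(np.argmin(d))

n_sites = seqs.shape[1]
observed = nearest_to_first(np.arange(n_sites))
support = np.mean([
    nearest_to_first(rng.integers(0, n_sites, size=n_sites)) == observed
    for _ in range(1000)
])
print(f"bootstrap support for the observed grouping: {support:.2f}")
```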
120

Cost-Aware Machine Learning and Deep Learning for Extremely Imbalanced Data

Ahmed, Jishan 11 August 2023 (has links)
No description available.
