11

Combined Use of Models and Measurements for Spatial Mapping of Concentrations and Deposition of Pollutants

Ambachtsheer, Pamela January 2004 (has links)
When modelling pollutants in the atmosphere, it is nearly impossible to get perfect results, as the chemical and mechanical processes that govern pollutant concentrations are complex. Results depend on the quality of the meteorological input as well as the emissions inventory used to run the model, and models cannot currently take every process into consideration. The model may therefore produce results that are close to, or show the general trend of, the observed values, but are not perfect. At the same time, due to the lack of observation stations, the resolution of the observational data is poor. Furthermore, the chemistry over large bodies of water differs from land chemistry, and in North America there are no stations located over the Great Lakes or the ocean, so the observed values cannot accurately cover these regions. We therefore combined model output and observational data when studying ozone concentrations in northeastern North America. We did this by correcting model output at observational sites with local data and then interpolating those corrections across the model grid, using a Kriging procedure, to produce results that have the resolution of the model output with the local accuracy of the observed values. Results showed that the corrected model output is much improved over either model results or observed values alone, both for sites that were used in the correction process and for sites that were omitted from it.
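Below is a minimal sketch of the correct-then-interpolate idea described in this abstract, assuming hypothetical station coordinates, observed values, and a small regular model grid; SciPy's griddata linear interpolation stands in here for the Kriging procedure actually used in the thesis.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical inputs: modelled ozone on a regular grid, plus a few stations.
grid_x, grid_y = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
model_ozone = 40 + 0.1 * grid_x + 0.05 * grid_y                  # model output (ppb)

station_xy = np.array([[10, 20], [40, 70], [80, 30], [60, 90]])  # station locations
obs_ozone = np.array([45.0, 52.0, 47.0, 55.0])                   # observed values

# Model values at the station locations (nearest grid cell, for simplicity).
model_at_stations = griddata(
    (grid_x.ravel(), grid_y.ravel()), model_ozone.ravel(),
    station_xy, method="nearest")

# Local corrections = observed - modelled, interpolated over the whole grid.
corrections = obs_ozone - model_at_stations
correction_field = griddata(station_xy, corrections,
                            (grid_x, grid_y), method="linear")
correction_field = np.nan_to_num(correction_field)  # zero outside the convex hull

corrected_ozone = model_ozone + correction_field
print(corrected_ozone.shape)
```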
12

Kernel Machine Methods for Risk Prediction with High Dimensional Data

Sinnott, Jennifer Anne 22 October 2012 (has links)
Understanding the relationship between genomic markers and complex disease could have a profound impact on medicine, but the large number of potential markers can make it hard to differentiate true biological signal from noise and false positive associations. A standard approach for relating genetic markers to complex disease is to test each marker for its association with disease outcome by comparing disease cases to healthy controls. It would be cost-effective to use control groups across studies of many different diseases; however, this can be problematic when the controls are genotyped on a platform different from the one used for cases. Since different platforms genotype different SNPs, imputation is needed to provide full genomic coverage, but introduces differential measurement error. In Chapter 1, we consider the effects of this differential error on association tests. We quantify the inflation in Type I error by comparing two healthy control groups drawn from the same cohort study but genotyped on different platforms, and assess several methods for mitigating this error. Analyzing genomic data one marker at a time can effectively identify associations, but the resulting lists of significant SNPs or differentially expressed genes can be hard to interpret. Integrating prior biological knowledge into risk prediction with such data by grouping genomic features into pathways reduces the dimensionality of the problem and could improve models by making them more biologically grounded and interpretable. The kernel machine framework has been proposed to model pathway effects because it allows nonlinear associations between the genes in a pathway and disease risk. In Chapter 2, we propose kernel machine regression under the accelerated failure time model. We derive a pseudo-score statistic for testing and a risk score for prediction using genes in a single pathway. We propose omnibus procedures that alleviate the need to prespecify the kernel and allow the data to drive the complexity of the resulting model. In Chapter 3, we extend methods for risk prediction using a single pathway to methods for risk prediction using multiple pathways, employing a multiple kernel learning approach to select important pathways and efficiently combine information across pathways.
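The following is a rough, self-contained sketch of the kernel machine idea for pathway-based risk prediction: the genes of a single hypothetical pathway are mapped through candidate kernels, a simple average of the kernels stands in for the omnibus / multiple kernel learning step, and kernel ridge regression produces a risk score. It is an illustration under these assumptions, not the pseudo-score test or the estimators developed in the thesis.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                      # expression of 30 genes in one pathway
y = X[:, :3].sum(axis=1) + rng.normal(size=200)     # hypothetical continuous outcome

# Candidate kernels; averaging them is a crude stand-in for the omnibus /
# multiple-kernel-learning step that lets the data drive model complexity.
K_lin = linear_kernel(X)
K_rbf = rbf_kernel(X, gamma=1.0 / X.shape[1])
K = 0.5 * K_lin + 0.5 * K_rbf

# Kernel ridge regression with a precomputed kernel gives a pathway risk score.
model = KernelRidge(alpha=1.0, kernel="precomputed").fit(K, y)
risk_score = model.predict(K)                       # in-sample scores
print(np.corrcoef(risk_score, y)[0, 1])
```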
13

Daugiamačių duomenų aproksimavimas / Approximation of multi-dimensional data

Katinas, Raimondas 16 July 2008 (has links)
Nowadays there is growing interest in the theory of multivariate data approximation. Approximation theory in multidimensional space is widely applied, for example in numerical analysis, wavelet analysis, signal processing, various information technology systems, computer graphics, astronomy, and oil reservoir exploration. The field is attractive because much of classical mathematics is difficult to apply to the analysis of multivariate problems, so new tools are needed to solve old problems. Function approximation problems abound in various areas of mathematics, physics, and engineering, and there are many ways and methods of solving them. These problems are solved easily when the function depends on one or two variables, but functions used in real life have many more unknowns, and the difficulty of the problem grows with the number of variables. For example, a function of one variable can be drawn in the plane as a curve, and a function of two variables corresponds to a surface drawn in three-dimensional space. Visualizing functions of three or more variables already causes problems, because humans cannot perceive higher-dimensional spaces. Since three-dimensional space can be depicted in the plane, it is thought that by a similar principle four-dimensional space could be depicted in three dimensions, and that space in turn in the plane; if such a method could be devised, the dimensionality of the space would no longer be a problem. Nevertheless, functions of three variables are visualized in two ways: 1. present... [see full text] / This Master's work covers a mathematical analysis system which can visualize multivariate data layers, approximate multidimensional functions by polynomials, estimate approximation accuracy, and present a few of the most effective approximation models. Multivariate approximation theory is an increasingly active research area today. It encompasses a wide range of tools for multivariate approximation, such as multidimensional splines and finite elements, shift-invariant spaces, and radial basis functions. Approximation theory in the multivariate setting has many applications, including numerical analysis, wavelet analysis, signal processing, geographic information systems, computer-aided geometric design, and computer graphics. The field is fascinating since much of the mathematics of the classical univariate theory does not straightforwardly generalize to the multivariate setting, so new tools are required. Graphs of functions of one variable are frequently displayed as curves, and bivariate functions as contour plots. In general it is very hard to display or grasp a function in the multivariate setting. However, some efforts have been made to render functions of precisely three variables. Two obvious approaches suggest themselves: 1. display a number of cross sections where one of the variables is held constant, or 2. display contour surfaces where the value of the function equals some constant. We use a modification of the first method in this Master's work. All function variables except... [to full text]
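As an illustration of the kind of polynomial approximation of multivariate functions discussed above, the sketch below fits a low-degree multivariate polynomial to a hypothetical three-variable function by least squares; the target function, degree, and sample size are all illustrative.

```python
import numpy as np
from itertools import combinations_with_replacement

def design_matrix(X, degree):
    """All monomials of the input columns up to the given total degree."""
    n, d = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 3))        # three input variables
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]      # function to approximate

A = design_matrix(X, degree=3)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares polynomial fit
rmse = np.sqrt(np.mean((A @ coef - y) ** 2))
print(f"degree-3 polynomial RMSE: {rmse:.4f}")
```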
14

Statistical Methods to Enhance Clinical Prediction with High-Dimensional Data and Ordinal Response

Leha, Andreas 25 March 2015 (has links)
Technological progress today makes it possible to examine the molecular configuration of individual cells or whole tissue samples. Such high-dimensional omics data from molecular biology, produced in large quantities, can be generated at ever lower cost and are therefore used more and more often for clinical questions. Personalized diagnosis or the prediction of treatment success on the basis of such high-throughput data is a modern application of machine learning techniques. In practice, clinical parameters such as health status or the side effects of a therapy are often recorded on an ordinal scale (for example good, normal, poor). It is common to treat classification problems with an ordinally scaled endpoint like general multi-class problems and thus to ignore the information contained in the ordering of the classes. Neglecting this information, however, can reduce classification performance or even produce an unfavourable, unordered classification. Classical approaches that model an ordinally scaled endpoint directly, such as the cumulative link model, typically cannot be applied to high-dimensional data. In this work we present hierarchical twoing (hi2), an algorithm for classifying high-dimensional data into ordinally scaled categories. hi2 exploits the power of well-understood binary classification to classify into ordinal categories. An open-source implementation of hi2 is available online. In a comparison study on the classification of real as well as simulated data with an ordinal endpoint, established methods designed specifically for ordered categories do not generally produce better results than state-of-the-art non-ordinal classifiers; an algorithm's ability to handle high-dimensional data dominates classification performance. We show that our algorithm hi2 consistently achieves good results and in many cases outperforms the other methods.
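A minimal sketch of the general "ordinal endpoint via binary classifiers" reduction that this line of work builds on: one penalized binary classifier per threshold in the class ordering, with class probabilities recovered from the cumulative predictions. This is a generic illustration on simulated data, not the hi2 algorithm itself or its open-source implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 1000))                        # high-dimensional features
latent = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)
y = np.digitize(latent, [-1.0, 1.0])                    # ordinal classes 0 < 1 < 2

# One penalized binary classifier per threshold "y > k", k = 0, 1.
classes = np.unique(y)
models = [LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
              .fit(X, (y > k).astype(int))
          for k in classes[:-1]]

# P(y = k) recovered from the cumulative probabilities P(y > k).
p_gt = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
p_class = np.column_stack([1 - p_gt[:, 0],
                           p_gt[:, 0] - p_gt[:, 1],
                           p_gt[:, 1]])
pred = p_class.argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```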
15

Visualizing large-scale and high-dimensional time series data

Yeqiang, Lin January 2017 (has links)
Time series are one of the main research objects in the field of data mining. Visualization is an important mechanism for presenting processed time series for further analysis by users. In recent years researchers have designed a number of sophisticated visualization techniques for time series, but most of them focus on the static format, trying to encode the maximal amount of information in a single image or plot. We propose the pixel video technique, a visualization technique that displays data in video format. In the pixel video technique, a hierarchical dimension cluster tree is first constructed to generate a similarity-based ordering of the dimensions; each frame image is then generated according to pixel-oriented techniques, so that the data are displayed in the form of a video.
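The sketch below illustrates the dimension-ordering step of such a pixel video: dimensions are clustered hierarchically by similarity, the leaf order of the resulting tree is used to arrange them, and each window of time steps is rendered as one pixel-oriented frame. The data, window size, and output file names are hypothetical, and assembling the frames into an actual video file is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
data = rng.normal(size=(1000, 64)).cumsum(axis=0)    # 1000 time steps, 64 dimensions

# Cluster the dimensions by similarity (correlation distance) and take the leaf order.
corr_dist = 1 - np.corrcoef(data.T)
np.fill_diagonal(corr_dist, 0.0)
order = leaves_list(linkage(squareform(corr_dist, checks=False), method="average"))

# Render one pixel-oriented frame per window of time steps.
window = 100
for i, start in enumerate(range(0, data.shape[0], window)):
    frame = data[start:start + window, order].T      # reordered dimensions x time
    plt.imsave(f"frame_{i:03d}.png", frame, cmap="viridis")
```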
16

High-dimensional statistical data integration

Qu, Zhe January 2019 (has links)
Modern biomedical studies often collect multiple types of high-dimensional data on a common set of objects. A representative model for the integrative analysis of multiple data types is to decompose each data matrix into a low-rank common-source matrix generated by latent factors shared across all data types, a low-rank distinctive-source matrix corresponding to each data type, and an additive noise matrix. We propose a novel decomposition method, called decomposition-based generalized canonical correlation analysis, which appropriately defines those matrices by imposing a desirable orthogonality constraint on distinctive latent factors that aims to sufficiently capture the common latent factors. To further delineate the common and distinctive patterns between two data types, we propose another new decomposition method, called common and distinctive pattern analysis. This method takes into account the common and distinctive information between the coefficient matrices of the common latent factors. We develop consistent estimation approaches for both proposed decompositions under high-dimensional settings, and demonstrate their finite-sample performance via extensive simulations. We illustrate the superiority of the proposed methods over state-of-the-art alternatives using real-world data examples obtained from The Cancer Genome Atlas and the Human Connectome Project.
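As a loose illustration of the common/distinctive decomposition model (not the decomposition-based generalized canonical correlation analysis or the common and distinctive pattern analysis proposed in the thesis), the sketch below splits two simulated data matrices into projections onto an SVD-estimated joint column space and the corresponding residuals.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p1, p2, r = 100, 200, 150, 3
shared = rng.normal(size=(n, r))                 # latent factors shared by both data types
X1 = shared @ rng.normal(size=(r, p1)) + 0.1 * rng.normal(size=(n, p1))
X2 = shared @ rng.normal(size=(r, p2)) + 0.1 * rng.normal(size=(n, p2))

# Estimate the common column space from the stacked data, then split each matrix
# into its projection onto that space (common part) and the residual
# (distinctive part plus noise).
U, _, _ = np.linalg.svd(np.hstack([X1, X2]), full_matrices=False)
P_common = U[:, :r] @ U[:, :r].T                 # projector onto estimated joint space

common1, distinct1 = P_common @ X1, X1 - P_common @ X1
common2, distinct2 = P_common @ X2, X2 - P_common @ X2
print(np.linalg.norm(distinct1) / np.linalg.norm(X1))
```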
17

Simultaneous Inference for High Dimensional and Correlated Data

Polin, Afroza 22 August 2019 (has links)
No description available.
18

Hierarchické shlukování s Mahalanobis-average metrikou akcelerované na GPU / GPU-accelerated Mahalanobis-average hierarchical clustering

Šmelko, Adam January 2020 (has links)
Hierarchical clustering algorithms are common tools for simplifying, exploring and analyzing datasets in many areas of research. For flow cytometry, a specific variant of agglomerative clustering has been proposed that uses cluster linkage based on Mahalanobis distance to produce results better suited for the domain. Applicability of this clustering algorithm is currently limited by its relatively high computational complexity, which does not allow it to scale to common cytometry datasets. This thesis describes a specialized, GPU-accelerated version of Mahalanobis-average linked hierarchical clustering, which improves the algorithm's performance by several orders of magnitude and thus allows it to scale to much larger datasets. The thesis provides an overview of current hierarchical clustering algorithms and details the construction of the variant used on GPU. The result is benchmarked on publicly available high-dimensional data from mass cytometry.
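A plain NumPy sketch of the Mahalanobis-average cluster linkage referred to above, assuming it is the mean Mahalanobis distance of one cluster's points with respect to the other cluster's covariance, symmetrised over the two directions; the GPU acceleration and the full agglomerative loop are omitted.

```python
import numpy as np

def mahalanobis_to_cluster(points, cluster):
    """Mean Mahalanobis distance of `points` to the distribution of `cluster`."""
    mu = cluster.mean(axis=0)
    cov = np.cov(cluster, rowvar=False)
    cov_inv = np.linalg.pinv(cov)                  # pseudo-inverse for numerical stability
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0)).mean()

def mahalanobis_average_linkage(a, b):
    """Symmetrised Mahalanobis-average distance between two clusters."""
    return 0.5 * (mahalanobis_to_cluster(a, b) + mahalanobis_to_cluster(b, a))

rng = np.random.default_rng(5)
a = rng.normal(loc=0.0, size=(50, 10))
b = rng.normal(loc=2.0, size=(60, 10))
print(mahalanobis_average_linkage(a, b))
```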
19

Improving the Accuracy of Variable Selection Using the Whole Solution Path

Liu, Yang 23 July 2015 (has links)
No description available.
20

Consistent bi-level variable selection via composite group bridge penalized regression

Seetharaman, Indu January 1900 (has links)
Master of Science / Department of Statistics / Kun Chen / We study composite group bridge penalized regression methods for conducting bi-level variable selection in high-dimensional linear regression models with a diverging number of predictors. The proposed method combines the ideas of bridge regression (Huang et al., 2008a) and group bridge regression (Huang et al., 2009) to achieve variable selection consistency at both the individual and group levels simultaneously, i.e., the important groups and the important individual variables within each group can both be correctly identified with probability approaching one as the sample size increases to infinity. The method takes full advantage of the prior grouping information, and the established bi-level oracle properties ensure that the method is immune to possible group misidentification. A related adaptive group bridge estimator, which uses adaptive penalization for improving bi-level selection, is also investigated. Simulation studies show that the proposed methods have superior performance in comparison to many existing methods.
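The sketch below evaluates an illustrative composite group bridge penalty, with a bridge-type inner sum over the coefficients within each group raised to a further power across groups, which is the mechanism that yields bi-level (group and within-group) sparsity; the exact objective, weights, and tuning used in the thesis may differ.

```python
import numpy as np

def composite_group_bridge_penalty(beta, groups, lam=1.0, gamma_in=0.5, gamma_out=0.5):
    """Illustrative composite group bridge penalty:
    sum over groups of (sum over the group of |beta|**gamma_in) ** gamma_out.
    With both exponents below one, it encourages sparsity at the group level
    and within groups simultaneously (bi-level selection)."""
    total = 0.0
    for idx in groups:
        inner = np.sum(np.abs(beta[idx]) ** gamma_in)
        total += inner ** gamma_out
    return lam * total

beta = np.array([0.0, 0.0, 1.5, 0.2, 0.0, 0.0, 0.0, 3.0])
groups = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]
print(composite_group_bridge_penalty(beta, groups))
```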
