1 |
Efficient inference in general semiparametric regression models
Maity, Arnab 15 May 2009
Semiparametric regression has become very popular in the field of Statistics over the
years. While increasingly sophisticated models are being developed,
the resulting theory and estimation procedures have become more and
more involved. The main problems that are addressed in this work are related to
efficient inferential procedures in general semiparametric regression problems.
We first discuss efficient estimation of population-level summaries in general semiparametric
regression models. Here our focus is on estimating general population-level
quantities that combine the parametric and nonparametric parts of the model (e.g.,
population mean, probabilities, etc.). We place this problem in a general context,
provide a general kernel-based methodology, and derive the asymptotic distributions
of estimates of these population-level quantities, showing that in many cases the estimates
are semiparametric efficient.
Next, motivated by the problem of testing for genetic effects on complex traits in
the presence of gene-environment interaction, we consider developing score tests in
general semiparametric regression problems that involve a Tukey-style 1 d.f. form of
interaction between parametrically and nonparametrically modeled covariates. We
develop adjusted score statistics which are unbiased and asymptotically efficient and
can be computed using standard bandwidth selection methods. In addition, to overcome
the difficulty of solving functional equations, we give easy interpretations of the
target functions, which in turn allow us to develop estimation procedures that can be
easily implemented using standard computational methods.
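For readers unfamiliar with the term, a Tukey-style 1 d.f. interaction is commonly written as the product of the two main effects scaled by a single parameter. The display below is an illustrative sketch only: the symbols (link function g, parametric covariate X with coefficient vector beta, smooth unknown function theta of the covariate Z, scalar gamma) are assumptions introduced here and are not taken from the thesis.

```latex
g\{ E(Y \mid X, Z) \} \;=\; X^{\top}\beta \;+\; \theta(Z) \;+\; \gamma \,\bigl(X^{\top}\beta\bigr)\,\theta(Z)
```

Under this form the interaction contributes only the single scalar gamma, which is where the "1 d.f." name comes from.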
Finally, we take up the important problem of estimation in a general semiparametric
regression model when covariates are measured with an additive measurement error
structure having normally distributed measurement errors. In contrast to methods
that require solving integral equations whose dimension equals that of the covariate measured
with error, we propose methodology based on Monte Carlo corrected scores to estimate
the model components and investigate the asymptotic behavior of the estimates.
For each of the problems, we present simulation studies to observe the performance of
the proposed inferential procedures. In addition, we apply our proposed methodology
to analyze nontrivial real life data sets and present the results.
|
2 |
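Machine Learning for Metabolite Identification with Mass Spectrometry Data
NGUYEN, DAI HAI 23 September 2020
Kyoto University / 0048 / New system, course-based doctorate / Doctor of Pharmaceutical Sciences / Kō No. 22754 / Pharmaceutical Sciences Doctorate No. 128 / 新制||薬科||14 (University Library) / Division of Bioinformatics and Chemical Genomics, Graduate School of Pharmaceutical Sciences, Kyoto University / (Chief examiner) Prof. 馬見塚 拓, Prof. 緒方 博之, Prof. 石濱 泰 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Pharmaceutical Sciences / Kyoto University / DFAM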
|
3 |
Geometry-Aware Learning Algorithms for Histogram Data Using Adaptive Metric Embeddings and Kernel Functions
Le, Thanh Tam 25 January 2016
Kyoto University / 0048 / New system, course-based doctorate / Doctor of Informatics / Kō No. 19417 / Informatics Doctorate No. 596 / 新制||情||104 (University Library) / 32442 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. 山本 章博, Prof. 黒橋 禎夫, Prof. 鹿島 久嗣, Assoc. Prof. Cuturi Marco / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
|
4 |
Interpretable machine learning approaches to high-dimensional data and their applications to biomedical engineering problems
Yoshida, Kosuke 26 March 2018
Kyoto University / 0048 / New system, course-based doctorate / Doctor of Informatics / Kō No. 21215 / Informatics Doctorate No. 668 / 新制||情||115 (University Library) / Department of Systems Science, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. 石井 信, Prof. 下平 英寿, Prof. 加納 学, 銅谷 賢治 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
|
5 |
Nonlinear Generalizations of Linear Discriminant Analysis: the Geometry of the Common Variance Space and Kernel Discriminant Analysis
Kim, Jiae January 2020
No description available.
|
6 |
The Kernel Method: Reproducing Kernel Hilbert Spaces in Application
Schaffer, Paul J. 17 May 2023
No description available.
|
7 |
On Kernel-based Multi-Task Learning
Li, Cong 01 January 2014
Multi-Task Learning (MTL) has been an active research area in machine learning for two decades. By training multiple related tasks simultaneously with information shared across tasks, it is possible to improve the generalization performance of each task compared to training each task independently. During the past decade, most MTL research has been based on the Regularization-Loss framework due to its flexibility in specifying various types of information sharing strategies, the opportunity it offers to yield kernel-based methods, and its capability to promote sparse feature representations. However, certain limitations exist in both the theoretical and practical aspects of Regularization-Loss-based MTL. Theoretically, previous research on generalization bounds for MTL Hypothesis Spaces (HSs), where the data of all tasks are pre-processed by a (partially) common operator, has been limited in two respects. First, all previous works assumed linearity of the operator, thereby completely excluding kernel-based MTL HSs, for which the operator is potentially non-linear. Second, all previous works, rather unnecessarily, assumed that all task weights are constrained within norm-balls of equal radii. The requirement of equal radii makes the relevant HSs significantly inflexible, which may cause the generalization performance of the corresponding MTL models to deteriorate. Practically, various algorithms have been developed for kernel-based MTL models, owing to the different characteristics of the formulations. Most of these algorithms are burdensome to develop and end up being quite sophisticated, so that practitioners may find them hard to interpret and implement, especially when multiple models are involved. This is even more so when Multi-Task Multiple Kernel Learning (MT-MKL) models are considered. This research largely resolves the above limitations.
Theoretically, a pair of new kernel-based HSs is proposed: one for single-kernel MTL, and another for MT-MKL. Unlike previous works, we allow each task weight to be constrained within a norm-ball whose radius is learned during training. By deriving and analyzing the generalization bounds of these two HSs, we show that such flexibility indeed leads to much tighter generalization bounds, which often results in significantly better generalization performance. Based on this observation, a pair of new models is developed, one for each case: single-kernel MTL and MT-MKL. From a practical perspective, we propose a general MT-MKL framework that covers most of the prominent MT-MKL approaches, including our new MT-MKL formulation. Then, a general-purpose algorithm is developed to solve the framework, which can also be employed for training all other models subsumed by this framework. A series of experiments is conducted to assess the merits of the proposed model when trained by the new algorithm. Certain properties of our HSs and formulations are demonstrated, and the advantage of our model in terms of classification accuracy is shown via these experiments.
|
8 |
Smoothing Parameter Selection In Nonparametric Functional Estimation
Amezziane, Mohamed 01 January 2004
This study aims to develop new techniques for obtaining completely data-driven choices of the smoothing parameter in functional estimation under minimal assumptions. The focus is on estimation of the distribution function, the density function, and their multivariate extensions, along with some of their functionals, such as the location and the integrated squared derivatives.
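As a point of reference for what a data-driven smoothing parameter choice looks like in practice, the sketch below selects a kernel density bandwidth by least-squares cross-validation (LSCV) over a grid. LSCV is a standard textbook criterion used here as a stand-in; it is not claimed to be one of the techniques developed in the thesis. The Gaussian kernel, the function names `normal_pdf`, `lscv_score` and `select_bandwidth`, the simulated sample, and the bandwidth grid are all assumptions made for illustration.

```python
import numpy as np


def normal_pdf(u, sd):
    """Density of N(0, sd^2) evaluated at u."""
    return np.exp(-0.5 * (u / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def lscv_score(x, h):
    """Least-squares cross-validation criterion for a Gaussian KDE with bandwidth h."""
    n = x.size
    diffs = x[:, None] - x[None, :]
    # Integral of the squared density estimate; for Gaussian kernels the
    # convolution of two kernels is a normal density with sd h * sqrt(2).
    integral_sq = normal_pdf(diffs, h * np.sqrt(2.0)).sum() / n**2
    # Average of the leave-one-out density estimates at the sample points.
    k = normal_pdf(diffs, h)
    np.fill_diagonal(k, 0.0)
    loo_mean = k.sum() / (n * (n - 1))
    return integral_sq - 2.0 * loo_mean


def select_bandwidth(x, grid):
    """Return the grid bandwidth minimizing the LSCV criterion, plus all scores."""
    scores = np.array([lscv_score(x, h) for h in grid])
    return grid[int(np.argmin(scores))], scores


# Hypothetical data: 200 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 30)
h_best, scores = select_bandwidth(sample, grid)
print("selected bandwidth:", h_best)
```

A grid search is used only for transparency; a one-dimensional optimizer over h would serve the same purpose.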
|
9 |
Budgeted Online Kernel Classifiers for Large Scale Learning
Wang, Zhuang January 2010
In an environment where new large-scale problems are emerging in various disciplines and pervasive computing applications are becoming more common, there is an urgent need for machine learning algorithms that can process increasing amounts of data using comparatively smaller computing resources in a computationally efficient way. Previous research has resulted in many successful learning algorithms that scale linearly or even sub-linearly with sample size and dimension, both in runtime and in space. However, linear or even sub-linear space scaling is often not sufficient, because it implies an unbounded growth in memory with sample size. This clearly opens another challenge: how to learn from large, or practically infinite, data sets or data streams using memory-limited resources. Online learning is an important learning scenario in which a potentially unlimited sequence of training examples is presented one example at a time and can only be seen in a single pass. This is opposed to offline learning, where the whole collection of training examples is at hand. The objective is to learn an accurate prediction model from the training stream. Upon receiving each fresh example from the stream, online learning algorithms typically attempt to update the existing model without retraining. The invention of Support Vector Machines (SVMs) attracted a lot of interest in adapting kernel methods for both offline and online learning. Typical online learning for kernel classifiers consists of observing a stream of training examples and including them as prototypes when specified conditions are met. However, such a procedure could result in an unbounded growth in the number of prototypes. In addition to the danger of exceeding the physical memory, this also implies an unlimited growth in both update and prediction time.
To address this issue, in my dissertation I propose a series of kernel-based budgeted online algorithms, which have constant space and constant update and prediction time. This is achieved by maintaining a fixed number of prototypes under the memory budget. Most of the previous work on budgeted online algorithms focuses on the kernel perceptron. In the first part of the thesis, I review and discuss these existing algorithms and then propose a kernel perceptron algorithm which removes the prototype with the minimal impact on classification accuracy to maintain the budget. This is achieved by dual use of cached prototypes for both model representation and validation. In the second part, I propose a family of budgeted online algorithms based on the Passive-Aggressive (PA) style. The budget maintenance is achieved by introducing an additional constraint into the original PA optimization problem. A closed-form solution is derived for the budget maintenance and model update. In the third part, I propose a budgeted online SVM algorithm. The proposed algorithm guarantees that the optimal SVM solution is maintained on all the prototype examples at any time. To maximize the accuracy, prototypes are constructed to approximate the data distribution near the decision boundary. In the fourth part, I propose a family of budgeted online algorithms for multi-class classification. The proposed algorithms are based on the recently proposed SVM training algorithm Pegasos. I prove that the gap between the budgeted Pegasos and the optimal SVM solution directly depends on the average model degradation due to budget maintenance. Following the analysis, I study greedy multi-class budget maintenance methods based on removal, projection and merging of SVs. In each of these four parts, the proposed algorithms are experimentally evaluated against state-of-the-art competitors.
The results show that the proposed budgeted online algorithms outperform the competing algorithms and achieve accuracy comparable to their non-budgeted counterparts while being extremely computationally efficient. / Computer and Information Science
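The idea of keeping a fixed number of prototypes can be illustrated with a minimal budgeted kernel perceptron. This is a sketch under stated assumptions, not the dissertation's algorithm: when the budget is exceeded it simply discards the oldest prototype, whereas the thesis describes removing the prototype with the least impact on accuracy. The class name `BudgetedKernelPerceptron`, the RBF kernel, the parameter values, and the synthetic two-class stream are all hypothetical.

```python
import numpy as np


def rbf(a, b, gamma=1.0):
    """RBF kernel between two vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))


class BudgetedKernelPerceptron:
    """Kernel perceptron storing at most `budget` prototypes."""

    def __init__(self, budget, gamma=1.0):
        self.budget = budget
        self.gamma = gamma
        self.prototypes = []  # list of (x, y) pairs, oldest first

    def decision(self, x):
        # Signed kernel vote of all stored prototypes (0 when none are stored).
        return sum(y * rbf(p, x, self.gamma) for p, y in self.prototypes)

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def update(self, x, y):
        # Perceptron rule: store the example only when it is misclassified.
        if y * self.decision(x) <= 0:
            self.prototypes.append((np.asarray(x, dtype=float), y))
            # Budget maintenance (assumed rule): drop the oldest prototype.
            if len(self.prototypes) > self.budget:
                self.prototypes.pop(0)


# Hypothetical stream: two Gaussian classes centred at (-2, -2) and (2, 2).
rng = np.random.default_rng(0)
labels = rng.choice([-1, 1], size=500)
points = rng.normal(size=(500, 2)) + 2.0 * labels[:, None]

model = BudgetedKernelPerceptron(budget=20, gamma=0.5)
for x, y in zip(points, labels):
    model.update(x, int(y))  # single pass, one example at a time

print("stored prototypes:", len(model.prototypes))
```

Because the prototype list never exceeds the budget, both the memory footprint and the cost of each `decision` call stay bounded regardless of the stream length.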
|
10 |
The Role of Data in Projected Quantum Kernels: The Higgs Boson Discrimination
Di Marcantonio, Francesco January 2022
The development of quantum machine learning is paving the way to fault-tolerant quantum computation by providing algorithms that run on current noisy intermediate-scale quantum devices. However, it is difficult to find use cases where quantum computers exceed their classical counterparts. The high energy physics community is experiencing rapid growth in the amount of data physicists need to collect, store, and analyze as more complex experiments are conceived. Our work approaches the study of a particle physics event involving the Higgs boson from a quantum machine learning perspective. We compare a quantum support vector machine with the best classical kernel method, grounding our study in a new theoretical framework based on metrics that examine three different aspects: the geometry between the classical and quantum learning spaces, the dimensionality of the feature space, and the complexity of the ML models. We exploit these metrics as a compass in the parameter space because of their predictive power.
Hence, we can exclude those areas where we do not expect any advantage in using quantum models and guide our study through the best parameter configurations. Indeed, the choice of the number of qubits in a quantum circuit and the number of data points in a dataset has so far been left to trial and error. We observe, in a vast parameter region, that the classical RBF kernel model outperforms the devised quantum kernels. We include in this study the projected quantum kernel, a kernel able to reduce the expressivity of the traditional fidelity quantum kernel by projecting its quantum state back to an approximate classical representation through the measurement of local quantum systems. The Higgs dataset has been shown to be low dimensional in the quantum feature space, meaning that the quantum encoding selected is not expressive enough for the dataset under study. Nonetheless, the optimization of the parameters on all the kernels proposed, classical and quantum, revealed a quantum advantage for the projected kernel, which classifies the Higgs boson events well and surpasses the classical ML model.
/ The development of quantum machine learning paves the way for new algorithms to solve demanding quantum computations on today's noisy quantum components. However, it is a challenge to find use cases for which such algorithms prove more efficient than their classical counterparts. Research in high energy physics is currently experiencing a drastic increase in the amount of data to collect, store, and analyze within more complex experiments. This work examines the Higgs boson from a quantum machine learning perspective. We compare the "quantum support vector machine" with the leading classical method with respect to three different metrics: the geometry of the learning spaces, the dimensionality of the feature spaces, and the time complexity of the machine learning methods. These three metrics are used to predict how the problem manifests itself in the parameter space.
In this way we can exclude regions of the space where quantum algorithms are not expected to outperform classical algorithms. There is an arbitrariness in how the number of qubits and the number of data points are determined, and the result depends on these parameters. In a broad region of the parameter space we nevertheless observe that the classical RBF kernel model outperforms the quantum kernels studied. In this study we include a projected quantum kernel, a kernel that reduces the total quantum state to an approximate classical representation by measuring a local part of the quantum system. The Higgs dataset studied has been shown to be of low dimension in the quantum feature space. However, optimization of the parameters for all kernels investigated, classical as well as quantum, showed a certain quantum advantage for the projected kernel, which classifies the Higgs events studied and surpasses the classical machine learning models.
|