51 |
Non-Destructive VIS/NIR Reflectance Spectrometry for Red Wine Grape Analysis
Fadock, Michael 04 August 2011
A novel non-destructive method of grape berry analysis is presented that uses reflected light to predict berry composition. Reflectance spectra were collected using a diode array spectrometer (350 to 850 nm) over the 2009 and 2010 growing seasons. Partial least squares regression (PLS) and support vector machine regression (SVMR) generated calibrations between reflected light and composition for five berry components: total soluble solids (°Brix), titratable acidity (TA), pH, total phenols, and anthocyanins. Standard methods of analysis for the components were employed and characterized for error. Decomposition of the reflectance data was performed by principal component analysis (PCA) and independent component analysis (ICA). Regression models were constructed using 10x10-fold cross-validated PLS and SVM models subject to smoothing, differentiation, and normalization pretreatments. All generated models were validated on the alternate season using two model selection strategies: minimum root mean squared error of prediction (RMSEP) and the "oneSE" heuristic.
PCA/ICA decomposition demonstrated features in the long VIS wavelengths and NIR region that were consistent across seasons. 2009 was generally more variable, possibly due to cold weather effects. RMSEP and R² statistics indicate that °Brix, pH, and TA are well predicted by PLS models for 2009 and 2010; SVM was marginally better. The R² values of the PLS °Brix, pH, and TA models were 0.84, 0.58, 0.56 for 2009 and 0.89, 0.81, 0.58 for 2010. 2010 °Brix models were suitable for rough screening. Optimal pretreatments were Savitzky-Golay (SG) smoothing and relative normalization. Anthocyanins were well predicted in 2009 (R² 0.65) but not in 2010 (R² 0.15). Phenols were not well predicted in either year (R² 0.15-0.25). Validation demonstrated that °Brix, pH, and TA models from 2009 transferred to 2010 with fair results (R² 0.70, 0.72, 0.31). Models built on 2010 reflectance data could not predict 2009 data. It is hypothesized that weather events present in 2009 and not in 2010 allowed the forward calibration transfer and prevented the reverse calibration transfer. Heuristic selection was superior to minimum RMSEP for transfer, indicating some overfitting in the minimum RMSEP models. The results demonstrate a reflectance-composition relationship in the VIS-NIR region for °Brix, pH, and TA that requires additional study and the development of further calibrations.
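The difference between the two selection strategies named in the abstract can be sketched in a few lines. This is a minimal illustration, not the thesis code: it assumes per-fold RMSE values are available for each candidate model complexity (for example, the number of PLS components), the numbers in `cv_errors` are made up, and "oneSE" is taken to mean the common one-standard-error rule, which picks the simplest model whose mean error lies within one standard error of the minimum.

```python
import math


def one_se_choice(cv_errors):
    """Compare minimum-error selection with the one-standard-error rule.

    cv_errors maps a model complexity (smaller = simpler) to a list of
    per-fold RMSE values. Returns (min_error_choice, one_se_choice).
    """
    stats = {}
    for k, errs in cv_errors.items():
        n = len(errs)
        mean = sum(errs) / n
        var = sum((e - mean) ** 2 for e in errs) / (n - 1)
        stats[k] = (mean, math.sqrt(var / n))  # (mean, standard error)

    # Strategy 1: complexity with the lowest mean cross-validated error.
    best = min(stats, key=lambda k: stats[k][0])
    # Strategy 2: simplest complexity within one SE of that minimum.
    threshold = stats[best][0] + stats[best][1]
    simplest = min(k for k, (m, _) in stats.items() if m <= threshold)
    return best, simplest


# Hypothetical per-fold RMSE values, for illustration only.
cv_errors = {
    1: [1.90, 2.10, 2.00, 2.05, 1.95],
    2: [1.20, 1.30, 1.25, 1.35, 1.15],
    3: [1.00, 1.10, 1.05, 1.15, 0.95],
    4: [0.98, 1.12, 1.02, 1.16, 0.92],
    5: [0.97, 1.13, 1.01, 1.17, 0.91],
}
best, chosen = one_se_choice(cv_errors)
print(best, chosen)
```

In this toy table the minimum-error rule selects the most complex model, while the one-SE rule falls back to a simpler one whose error is statistically indistinguishable, which is the kind of behaviour that would reduce overfitting during transfer between seasons.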
|
52 |
Clustering, Classification, and Factor Analysis in High Dimensional Data Analysis
Wang, Yanhong 17 December 2013
Clustering, classification, and factor analysis are three popular data mining techniques. In this dissertation, we investigate these methods in high dimensional data analysis. Since high dimensional data have many more features than samples and most of the features are non-informative, dimension reduction is necessary before clustering or classification can be performed. In the first part of this dissertation, we reinvestigate an existing clustering procedure, optimal discriminant clustering (ODC; Zhang and Dai, 2009), and propose to use cross-validation to select the tuning parameter. Then we develop a variation of ODC for high dimensional data, sparse optimal discriminant clustering (SODC), by adding a group-lasso type of penalty to ODC. We also demonstrate that both ODC and SODC can be used as dimension reduction tools for data visualization in cluster analysis. In the second part, three existing sparse principal component analysis (SPCA) methods, Lasso-PCA (L-PCA), Alternative Lasso PCA (AL-PCA), and sparse principal component analysis by choice of norm (SPCABP), are applied to a real data set from the International HapMap Project for AIM selection from genome-wide SNP data. Their classification accuracies are compared, and it is demonstrated that SPCABP outperforms the other two SPCA methods. Third, we propose a novel method called sparse factor analysis by projection (SFABP), based on SPCABP, and propose to use cross-validation to select the tuning parameter and the number of factors. Our simulation studies show that SFABP performs better than unpenalized factor analysis when both are applied to classification problems.
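To illustrate why a group-lasso type of penalty yields sparsity at the level of whole features, here is a small sketch. It is not the SODC objective, which the abstract does not spell out; it only assumes a coefficient matrix `W` whose rows correspond to features, and computes the generic penalty, `lam` times the sum of the Euclidean norms of the rows. The names `W`, `lam`, and both functions are hypothetical.

```python
import math


def row_norm(row):
    """Euclidean norm of one row of coefficients."""
    return math.sqrt(sum(w * w for w in row))


def group_lasso_penalty(W, lam):
    """Generic group-lasso penalty: lam * sum of row norms of W."""
    return lam * sum(row_norm(row) for row in W)


def selected_features(W, tol=1e-12):
    """Indices of features whose whole coefficient row is non-zero."""
    return [j for j, row in enumerate(W) if row_norm(row) > tol]


# Illustrative coefficients: feature 1 has been zeroed out entirely.
W = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8]]
penalty = group_lasso_penalty(W, lam=0.5)
kept = selected_features(W)
print(penalty, kept)
```

Because the penalty acts on the norm of each row rather than on individual entries, a feature is either dropped from every discriminant direction at once or kept, which is what makes the penalized solution usable for feature selection.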
|
53 |
Optimal Active Learning: experimental factors and membership query learning
Yu-hui Yeh Unknown Date
The field of Machine Learning is concerned with the development of algorithms, models and techniques that solve challenging computational problems by learning from data representative of the problem (e.g. given a set of medical images previously classified by a human expert, build a model to classify unseen images as either benign or malignant). Many important real-world problems have been formulated as supervised learning problems. The assumption is that a data set is available containing the correct output (e.g. class label or target value) for each given data point. In many application domains, obtaining the correct outputs (labels) for data points is a costly and time-consuming task. This has provided the motivation for the development of Machine Learning techniques that attempt to minimize the number of labeled data points while maintaining good generalization performance on a given problem. Active Learning is one such class of techniques and is the focus of this thesis. Active Learning algorithms select or generate unlabeled data points to be labeled and use these points for learning. If successful, an Active Learning algorithm should be able to produce learning performance (e.g. test set error) comparable to an equivalent supervised learner using fewer labeled data points. Theoretical, algorithmic and experimental Active Learning research has been conducted and a number of successful applications have been demonstrated. However, the scope of many of the experimental studies on Active Learning has been relatively small and there are very few large-scale experimental evaluations of Active Learning techniques. A significant amount of performance variability exists across Active Learning experimental results in the literature.
Furthermore, the implementation details and effects of experimental factors have not been closely examined in empirical Active Learning research, creating some doubt over the strength and generality of conclusions that can be drawn from such results. The Active Learning model/system used in this thesis is the Optimal Active Learning algorithm framework with Gaussian Processes for regression problems (however, most of the research questions are of general interest in many other Active Learning scenarios). Experimental and implementation details of the Active Learning system are described, using a number of regression problems and datasets of different types. It is shown that the experimental results of the system are subject to significant variability across problem datasets. The hypothesis that experimental factors can account for this variability is then investigated. The results show the impact of sampling and of the sizes of the datasets used when generating experimental results. Furthermore, preliminary experimental results expose performance variability across various real-world regression problems. The results suggest that these experimental factors can (to a large extent) account for the variability observed in experimental results. A novel resampling technique for Optimal Active Learning, called '3-Sets Cross-Validation', is proposed as a practical solution to reduce experimental performance variability. Further results confirm the usefulness of the technique. The thesis then proposes an extension to the Optimal Active Learning framework that performs learning via membership queries, using a novel algorithm named MQOAL. The MQOAL algorithm employs the Metropolis-Hastings Markov chain Monte Carlo (MCMC) method to sample data points for query selection.
Experimental results show that MQOAL provides comparable performance to the pool-based OAL learner, using a very generic, simple MCMC technique, and is robust to experimental factors related to the MCMC implementation. The possibility of making queries in batches is also explored experimentally, with results showing that while some performance degradation does occur, it is minimal for learning in small batch sizes, which is likely to be valuable in some real-world problem domains.
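For readers unfamiliar with the sampler mentioned above, the following is a minimal sketch of a generic random-walk Metropolis-Hastings sampler. It is not the MQOAL algorithm: the abstract does not say which density MQOAL samples from, so the one-dimensional target `toy_log_density` (a unit-variance Gaussian centred at 2.0) and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    log_density returns the log of an unnormalised target density.
    """
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centred on the current point.
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if logp_new >= logp or rng.random() < math.exp(logp_new - logp):
            x, logp = proposal, logp_new
        samples.append(x)  # a rejected proposal repeats the current point
    return samples


def toy_log_density(x):
    """Hypothetical target: unnormalised log density of N(2, 1)."""
    return -0.5 * (x - 2.0) ** 2


samples = metropolis_hastings(toy_log_density, x0=0.0, n_samples=20000)
kept = samples[1000:]  # discard an initial burn-in period
mean = sum(kept) / len(kept)
print(round(mean, 2))
```

In a membership-query setting the sampled points would be candidate inputs rather than draws from a Gaussian, and the proposal width `step` and burn-in length are examples of the MCMC implementation factors to which the abstract reports robustness.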
|
54 |
Dynamical analysis of respiratory signals for diagnosis of sleep disordered breathing disorders
Suren Rathnayake Unknown Date
Sleep disordered breathing (SDB) is a highly prevalent but under-diagnosed disease. Among adults aged 30 to 60 years, 24% of males and 9% of females show conditions of SDB, while 82% of men and 93% of women with moderate to severe SDB remain undiagnosed. Polysomnography (PSG) is the reference diagnostic test for SDB. During PSG, a number of physiological signals are recorded during an overnight sleep and then manually scored for sleep/wake stages and SDB events to obtain the reference diagnosis. The manual scoring of SDB events is an extremely time-consuming and cumbersome task with high inter- and intra-rater variation. PSG is a labour-intensive, expensive and inconvenient test for patients. Further, PSG facilities are limited, leading to long waiting lists. There is an enormous clinical need for automation of PSG scoring and for an alternative automated ambulatory method suitable for screening the population. In this thesis, we focus on (1) implementing a framework that enables more reliable scoring of SDB events while lowering manual scoring time, and (2) implementing a reliable automated screening procedure that can be used as a patient-friendly home-based study. The recordings of physiological measurements obtained during patients' sleep often suffer from data losses, interference and artefacts. In a typical sleep scoring session, artefact-corrupted signal segments are visually detected and removed from further consideration. We developed a novel framework for automated artefact detection and signal restoration, based on the redundancy among respiratory flow signals. The signals considered are the airflow (thermistor sensor) and nasal pressure signals, which are clinically significant in detecting respiratory disturbances. We treat the respiratory system as a dynamical system and use the celebrated Takens embedding theorem as the theoretical basis for signal prediction.
In this study, we categorise commonly occurring artefacts and distortions in the airflow and nasal pressure measurements into several groups and explore the efficacy of the proposed technique in detecting and recovering them. Results obtained from a database of clinical PSG signals indicated that the proposed technique can detect artefacts/distortions with a sensitivity >88% and specificity >92%. This work has the potential to simplify the work done by sleep scoring technicians and to improve automated sleep scoring methods. In the next phase of the thesis, we investigated the diagnostic ability of single- and dual-channel respiratory flow measuring devices. Recent studies have shown that single-channel respiratory flow measurements can be used for automated diagnosis/screening of SDB diseases. Improvements in reliable home-based monitoring for SDB may be achieved with the use of predictors based on recurrence quantification analysis (RQA). RQA essentially measures the complex structures present in a time series and is relatively independent of the nonlinearities present in the respiratory measurements, such as those due to breathing nonlinearities and sensor movements. The nasal pressure, thermistor-based airflow, abdominal movement and thoracic movement measurements obtained during polysomnography were used in this study to implement an algorithm for automated screening for SDB diseases. The algorithm predicts SDB-affected measurement segments using twelve features based on RQA, body mass index (BMI) and neck circumference, classified with mixture discriminant analysis (MDA). The rate of SDB-affected segments of data per hour of recording (RDIS) is used as a measure for the diagnosis of SDB diseases. The operating points to be chosen were the prior probability of SDB-affected data segments (π1) and the RDIS threshold value above which a patient is predicted to have an SDB disease.
Five-fold cross-validation, stratified by the RDI values of the recordings, was used to estimate the operating points. Sensitivity and specificity rates for the final classifier were estimated using a two-layer assessment approach, with the operating points chosen at the inner layer using five-fold cross-validation and the choice assessed at the outer layer using repeated learning-testing. The nasal pressure measurement showed higher accuracy than the other respiratory measurements when used alone. The nasal pressure and thoracic movement measurements were identified as the best pair of measurements for a dual-channel device. The estimated sensitivity and specificity (standard error) in diagnosing SDB disease (RDI ≥ 15) are 90.3(3.1)% and 88.3(5.5)% when nasal pressure is used alone, and 89.5(3.7)% and 100.0(0.0)% when it is used together with the thoracic movement. The present results suggest that RQA of a single respiratory measurement has potential for use in an automated SDB screening device, while a dual-channel device can be expected to give more reliable accuracy. Improvements may be possible by including other RQA-based features and optimising the parameters.
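The two signal-analysis ideas in this abstract, delay embedding in the sense of Takens and recurrence quantification, can be sketched together. This is a minimal illustration under stated assumptions rather than the thesis method: the embedding dimension, lag and radius are arbitrary, the input is a synthetic sine wave instead of a respiratory signal, and only the recurrence rate (the simplest RQA measure) is computed, since the abstract does not list its twelve features.

```python
import math


def delay_embed(series, dim, lag):
    """Map a scalar series to delay vectors (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    n = len(series) - (dim - 1) * lag
    return [tuple(series[i + j * lag] for j in range(dim)) for i in range(n)]


def recurrence_rate(points, radius):
    """Fraction of ordered pairs of distinct points closer than radius."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                count += 1
    return count / (n * (n - 1))


# Synthetic, perfectly periodic signal with a period of 20 samples.
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
points = delay_embed(series, dim=3, lag=5)
rr = recurrence_rate(points, radius=0.1)
print(len(points), round(rr, 4))
```

For this periodic toy signal a delay vector recurs only at whole multiples of the period, so the recurrence rate is low but strictly regular. A signal disturbed by irregular events would change the recurrence structure, which is the kind of difference RQA-based features are intended to capture.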
|
58 |
Machine learning in logistics : Increasing the performance of machine learning algorithms on two specific logistic problems / Maskininlärning i logistik : Öka prestandan av maskininlärningsalgoritmer på två specifika logistikproblem
Lind Nilsson, Rasmus January 2017
Data Ductus, a multinational IT consulting company, wants to develop an AI that monitors a logistics system and looks for errors. Once sufficiently trained, this AI will suggest corrections or automatically correct issues that arise. This project presents how one works with machine learning problems and provides deeper insight into how cross-validation and regularisation, among other techniques, are used to improve the performance of machine learning algorithms on the defined problem. These techniques are tested and evaluated in our logistics system on three different machine learning algorithms, namely Naïve Bayes, Logistic Regression and Random Forest. The evaluation leads us to conclude that Random Forest, using cross-validated parameters, gives the best performance on our specific problems, with the other two falling behind in each tested category. It became clear to us that cross-validation is a simple yet powerful tool for increasing the performance of machine learning algorithms.
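The combination of cross-validation and regularisation described in this abstract can be sketched as follows. This is a hedged toy example, not the project's code: it swaps the classifiers named above for a one-parameter ridge regression so that the whole loop fits in a few lines, and the data, the candidate values of `lam`, and all function names are invented for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Slope w minimising sum((y - w*x)^2) + lam * w^2 (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, folds):
    """Mean squared error on held-out folds for one regularisation strength."""
    total, count = 0.0, 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in test)
        count += len(test)
    return total / count


def select_lambda(xs, ys, candidates, k=5, seed=0):
    """Pick the candidate with the lowest cross-validated error."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = {lam: cv_mse(xs, ys, lam, folds) for lam in candidates}
    return min(scores, key=scores.get), scores


# Invented data: y = 3x plus a small alternating disturbance.
xs = [float(x) for x in range(1, 21)]
ys = [3.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
best_lam, scores = select_lambda(xs, ys, candidates=[0.0, 1.0, 10000.0])
print(best_lam)
```

The same pattern applies to any model with a tunable parameter: each candidate value is scored only on data the model did not see during fitting, so a heavily over-regularised setting such as the largest candidate here is rejected.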
|
59 |
Multilinear technics in face recognition / Técnicas multilineares em reconhecimento facial
Emanuel Dario Rodrigues Sena 07 November 2014
Coordenação de Aperfeiçoamento de Nível Superior / In this dissertation, the face recognition problem is investigated from the standpoint of multilinear algebra, more specifically tensor decompositions, making use of Gabor wavelets. Feature extraction occurs in two stages: first, the Gabor wavelets are applied holistically for feature selection; second, the facial images are modeled as a higher-order tensor according to the multimodal factors present. The Higher Order Singular Value Decomposition (HOSVD) is then applied to separate the multimodal factors that influence image formation. The proposed face recognition approach exhibits a higher average success rate and stability under variation in the multimodal factors, such as facial position, lighting condition and facial expression. We also propose a systematic way to perform cross-validation on tensor models to estimate the error rate of face recognition systems that exploit the multilinear nature of the image set. Through random partitioning of the data organized as a tensor, mode-n cross-validation creates folds by extracting subtensors along the desired mode; the method is stratified and can be repeated with different partitions.
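One way to read the mode-n cross-validation described above is sketched below. This is an interpretation, not the dissertation's procedure: it assumes a tensor stored as nested Python lists, partitions the indices of one chosen mode at random, and builds each fold by extracting the corresponding subtensor. The tensor layout (person x lighting x expression) and all function names are hypothetical.

```python
import random


def take(tensor, mode, indices):
    """Extract the subtensor that keeps only the given indices along one mode."""
    if mode == 0:
        return [tensor[i] for i in indices]
    return [take(sub, mode - 1, indices) for sub in tensor]


def shape(tensor):
    """Dimensions of a regular nested-list tensor."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def mode_n_folds(tensor, mode, k, seed=0):
    """Yield (train, test) subtensors from a random k-way split of one mode."""
    size = shape(tensor)[mode]
    idx = list(range(size))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    for test_idx in folds:
        train_idx = [i for i in range(size) if i not in test_idx]
        yield take(tensor, mode, train_idx), take(tensor, mode, test_idx)


# Hypothetical 4 x 6 x 2 tensor; each cell records its own coordinates.
tensor = [[[(p, l, e) for e in range(2)] for l in range(6)] for p in range(4)]
splits = list(mode_n_folds(tensor, mode=1, k=3))
print([(shape(train), shape(test)) for train, test in splits])
```

Because each fold removes whole slices along a single mode, every level of the other modes remains represented in both the training and the test subtensor, which is one plausible sense in which such a split is stratified. Changing `seed` gives a different partition for repeated cross-validation.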
|
60 |
Cross-validation for model selection in regression splines / 交叉驗證用於迴歸樣條的模型選擇之探討
謝式斌 Unknown Date
In nonparametric regression the form of the underlying function is unknown, so it is commonly approximated by functions of a known type, and spline functions can serve this purpose. Splines are piecewise polynomials joined at knots. When using splines to approximate an unknown function, it is crucial to determine the number of knots and the knot locations: more knots allow a closer approximation of the underlying function, but too many knots mean more parameters to estimate, which makes the estimate less accurate. In this thesis, I consider the problem of estimating an unknown regression function using spline approximation. I determine the knot locations using least squares for a given number of knots, and use cross-validation to find an appropriate number of knots. I consider three methods of splitting the data into training data and testing data, and compare the resulting knot selections and estimation results.
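The link between the number of knots and the number of parameters can be made concrete with a small sketch. It assumes a linear spline written in the truncated power basis, which is one common representation and not necessarily the one used in the thesis; the coefficients and knot positions are invented, and no fitting or cross-validation is performed here.

```python
def linear_spline_basis(x, knots):
    """Truncated power basis for a linear spline: 1, x, and (x - k)+ per knot."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def evaluate(x, coeffs, knots):
    """Value of the spline at x for the given basis coefficients."""
    return sum(c * b for c, b in zip(coeffs, linear_spline_basis(x, knots)))


def n_parameters(knots):
    """A linear spline needs two coefficients plus one per knot."""
    return 2 + len(knots)


# Invented example: slope 2 before the knot at x = 1, slope -1 after it.
knots = [1.0]
coeffs = [1.0, 2.0, -3.0]
left = evaluate(0.5, coeffs, knots)
at_knot = evaluate(1.0, coeffs, knots)
right = evaluate(2.0, coeffs, knots)
print(left, at_knot, right, n_parameters(knots))
```

Each additional knot adds one basis function and therefore one more coefficient to estimate from the same data, which is the trade-off that cross-validation over the number of knots is meant to balance.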
|