31 |
Avaliação de laranjeiras doces quanto à qualidade de frutos, períodos de maturação e resistência a Guignardia citricarpa / Evaluation of sweet orange trees for fruit quality, maturation periods and resistance to Guignardia citricarpa
Sousa, Patrícia Ferreira Cunha [UNESP] 17 February 2009 (has links) (PDF)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Despite its commercial importance, the number of orange varieties grown in Brazil is very limited. Citrus germplasm banks hold a large number of sweet orange genotypes to be explored and evaluated for their botanical, genetic and agronomic traits, with the aim of increasing the genetic variability and agronomic quality of the cultivars. As part of this work, 58 sweet orange genotypes were evaluated for the fresh-fruit market using 9 physical characters (fruit diameter, perimeter, height and weight; peel, albedo and pulp thickness; and number of seeds) and for industrial quality using 7 characters (total titratable acidity, total soluble solids, ratio, fruit weight, juice yield, ascorbic acid, and technological index = kg of soluble solids per 40.8 kg). Multivariate analysis indicated variability among the genotypes both in the physical characters relevant to the fresh-fruit market and in industrial quality. Two principal components with eigenvalues > 1 accounted for 66.03% of the total variance of the physical characters. The most discriminating variables on the first principal component were fruit diameter, perimeter, weight and height; the scores of this component were designated MI-CP1 (fresh-fruit market), and the genotypes with the highest values were the best suited to the fresh-fruit market. On the second principal component, the most discriminating variables were endocarp thickness and juice yield; its scores were designated S-CP2, physical characters well suited to industrial quality. On the scores of the two principal components (MI-CP1 and S-CP2), genotype 22-'Lanelate' stood out, followed by 43-Telde, 39-Rotuna, 44-Torregrossa, 46-Tua Mamede and 17-Grada. As for the evaluations of industrial quality (INDUST-CP1), the standouts were... (Complete abstract: click electronic access below)
|
32 |
Spectral methods and computational trade-offs in high-dimensional statistical inference
Wang, Tengyao January 2016 (has links)
Spectral methods have become increasingly popular in designing fast algorithms for modern high-dimensional datasets. This thesis looks at several problems in which spectral methods play a central role. In some cases, we also show that such procedures have essentially the best performance among all randomised polynomial time algorithms by exhibiting statistical and computational trade-offs in those problems. In the first chapter, we prove a useful variant of the well-known Davis-Kahan theorem, a spectral perturbation result that allows us to bound the distance between population eigenspaces and their sample versions. We then propose a semi-definite programming algorithm for the sparse principal component analysis (PCA) problem, and analyse its theoretical performance using the perturbation bounds derived earlier. It turns out that the parameter regime in which our estimator is consistent is strictly smaller than the consistency regime of a minimax optimal (yet computationally intractable) estimator. We show, through reduction from a well-known hard problem in computational complexity theory, that the difference in consistency regimes is unavoidable for any randomised polynomial time estimator, revealing subtle statistical and computational trade-offs in this problem. Such computational trade-offs also exist in the problem of restricted isometry certification. Certifiers for restricted isometry properties can be used to construct design matrices for sparse linear regression problems. Similar to the sparse PCA problem, we show that there is an intrinsic gap between the class of matrices certifiable using unrestricted algorithms and using polynomial time algorithms. Finally, we consider the problem of high-dimensional changepoint estimation, where we estimate the time of change in the mean of a high-dimensional time series with piecewise constant mean structure. Motivated by real-world applications, we assume that changes occur only in a sparse subset of all coordinates. We apply a variant of the semi-definite programming algorithm from sparse PCA to aggregate the signals across different coordinates in a near-optimal way so as to estimate the changepoint location as accurately as possible. Our statistical procedure shows superior performance compared to existing methods for this problem.
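A minimal numerical illustration of a Davis-Kahan-type perturbation bound, the kind of result the first chapter builds on, is sketched below. All matrices here are synthetic, and the constant 2 follows one published variant of the theorem; this is not the thesis's estimator or reduction, only a demonstration of the inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic check of a Davis-Kahan-type bound: the angle between the leading
# eigenvectors of a symmetric matrix and its perturbed version is controlled
# by the perturbation norm divided by the eigengap.
d = 50
eigvals = np.concatenate(([5.0, 1.0], 0.5 * rng.random(d - 2)))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = Q @ np.diag(eigvals) @ Q.T              # "population" matrix

E = rng.standard_normal((d, d))
E = 0.05 * (E + E.T)                            # small symmetric perturbation
Sigma_hat = Sigma + E                           # "sample" version

v = np.linalg.eigh(Sigma)[1][:, -1]             # leading population eigenvector
v_hat = np.linalg.eigh(Sigma_hat)[1][:, -1]     # leading sample eigenvector

sin_theta = np.sqrt(max(0.0, 1.0 - float(v @ v_hat) ** 2))
bound = 2 * np.linalg.norm(E, 2) / (eigvals[0] - eigvals[1])
print(f"sin(angle) = {sin_theta:.4f} <= bound = {bound:.4f}")
```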
|
33 |
Development of Fourier transform infrared spectroscopy for drug response analysis
Hughes, Caryn Sian January 2011 (has links)
The feasibility of FTIR-based spectroscopy as a tool to measure cellular response to therapeutics was investigated. Fourier transform mid-infrared spectroscopy has been used in conjunction with multivariate analysis (MVA) to assess the chemistry of many clinically relevant biological materials; however, the technique has not yet found its place in a clinical setting. One issue that has held the technique back is the spectral distortion caused by resonant Mie scattering (RMieS), which undermines confidence in the molecular assignment of spectral signals from biomaterials. In the light of a recently improved understanding of RMieS, which has resulted in a novel correction algorithm, the analytical robustness of corrected FTIR spectra was validated against multi-discipline methods to characterise a set of renal cell lines selected for their differences in morphology. After validation of the FTIR methodology by discriminating different cell lines, the second stage of analysis tested the sensitivity of the FTIR technique by determining whether discrete chemical differences could be highlighted within a cell population of the same origin. The renal carcinoma cell line 2245R is reported to contain a sub-population of cells displaying 'stem-cell-like' properties. These stem-like cells, however, are difficult to isolate and characterise by conventional '-omic' means. Finally, cellular response to chemotherapeutics was investigated using the established renal cell lines CAKI-2 and A-498. For the model, 5-fluorouracil (5FU), an established chemotherapeutic agent with known mechanisms of action, was used. Novel gold-based therapeutic compounds were also assessed in parallel to determine their efficacy against renal cell carcinoma. The novel compounds displayed initial activity, as the FTIR evidence suggested the compounds were able to enter the cells in the first instance, evoking a cellular response. Their long-term performance, tracked with standard proliferation assays and FTIR spectroscopy in the renal cancer cell model, was however poor. Rather than dismissing the compounds as inactive, they may simply be more effective in cancer cell types of a different nature; the FTIR-based evidence provided the means to suggest such a conclusion. Overall, the initial results suggest that the combination of FTIR and MVA, together with the novel RMieS-EMSC algorithm, can detect differences in cellular response to chemotherapeutics. The results were also in line with complementary biological techniques, demonstrating the powerful potential of the technique as a promising drug-screening tool.
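As a rough illustration of the scatter-correction idea underlying this work, the sketch below implements plain extended multiplicative signal correction (EMSC) on synthetic spectra. The RMieS-EMSC algorithm used in the thesis additionally models resonant Mie scattering and is not reproduced here; every value below is a placeholder.

```python
import numpy as np

def emsc(spectra, reference, poly_order=2):
    """Plain extended multiplicative signal correction (EMSC).

    Fits each spectrum as  s ~ b * reference + polynomial baseline,
    removes the baseline and divides by b. This is basic EMSC only,
    not the RMieS-EMSC resonant-Mie correction described in the thesis.
    """
    wn = np.linspace(-1.0, 1.0, spectra.shape[1])   # scaled wavenumber axis
    basis = np.vstack([reference] +
                      [wn ** k for k in range(poly_order + 1)]).T
    coefs, *_ = np.linalg.lstsq(basis, spectra.T, rcond=None)
    b = coefs[0]                                    # multiplicative scatter term
    baseline = basis[:, 1:] @ coefs[1:]             # additive baseline terms
    return (spectra - baseline.T) / b[:, None]

# Toy usage: 10 noisy, scaled copies of a reference band with sloped baselines.
ref = np.exp(-np.linspace(-3, 3, 200) ** 2)
raw = 1.5 * ref + 0.3 * np.linspace(0, 1, 200) + 0.01 * np.random.randn(10, 200)
corrected = emsc(raw, ref)
```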
|
34 |
Classify part of day and snow on the load of timber stacks : A comparative study between partitional clustering and competitive learning
Nordqvist, My January 2021 (has links)
In today's society, companies are trying to find ways to utilize all the data they have, which contains valuable information and insights for making better decisions. This includes data used to keep track of timber flowing between forest and industry. The growth of Artificial Intelligence (AI) and Machine Learning (ML) has enabled the development of ML models to automate the measurement of timber on timber trucks based on images. However, to improve the results there is a need to extract information from unlabeled images in order to determine weather and lighting conditions. The objective of this study is to perform an extensive study of methods for classifying unlabeled images into the categories daylight, darkness, and snow on the load. A comparative study between partitional clustering and competitive learning is conducted to investigate which method gives the best results in terms of different clustering performance metrics. It also examines how dimensionality reduction affects the outcome. The algorithms K-means and Kohonen Self-Organizing Map (SOM) are selected for the clustering. Each model is investigated according to the number of clusters, size of dataset, clustering time, clustering performance, and manual samples from each cluster. The results indicate a noticeable clustering performance discrepancy between the algorithms concerning the number of clusters, dataset size, and manual samples. The use of dimensionality reduction led to shorter clustering time but slightly worse clustering performance. The evaluation results further show that the clustering time of Kohonen SOM is significantly higher than that of K-means.
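A hedged sketch of the partitional side of this comparison is shown below, using scikit-learn's KMeans with and without PCA on synthetic stand-in features; the thesis clusters features from real truck images and also trains a Kohonen SOM, which is omitted here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

# Stand-in for feature vectors extracted from timber-truck images; the three
# synthetic groups loosely play the roles of daylight / darkness / snow-on-load.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(mu, 1.0, size=(300, 128)) for mu in (0.0, 2.0, 4.0)])

# Cluster with and without PCA dimensionality reduction, scoring both
# clusterings on the original features so the metrics are comparable.
for use_pca in (False, True):
    Z = PCA(n_components=20, random_state=0).fit_transform(X) if use_pca else X
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
    print(f"PCA={use_pca}: silhouette={silhouette_score(X, labels):.3f}")
```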
|
35 |
Analyzing Recycling Habits in Mahoning County, Ohio
Yengwia, Lawrenzo N. January 2017 (has links)
No description available.
|
36 |
Sparse Principal Component Analysis for High-Dimensional Data: A Comparative Study
Bonner, Ashley J. 10 1900 (has links)
Background: Through unprecedented advances in technology, high-dimensional datasets have exploded into many fields of observational research. For example, it is now common to expect thousands or millions of genetic variables (p) with only a limited number of study participants (n). Determining the important features proves statistically difficult, as multivariate analysis techniques become overwhelmed and mathematically insufficient when n < p. Principal Component Analysis (PCA) is a commonly used multivariate method for dimension reduction and data visualization but suffers from these issues. A collection of Sparse PCA methods have been proposed to counter these flaws but have not been tested in comparative detail. Methods: The performance of three Sparse PCA methods was evaluated through simulations. Data were generated for 56 different data structures, varying p, the number of underlying groups, and the variance structure within them. Estimation and interpretability of the principal components (PCs) were rigorously tested. Sparse PCA methods were also applied to a real gene-expression dataset. Results: All Sparse PCA methods showed improvements upon classical PCA. Some methods were best at obtaining an accurate leading PC only, whereas others were better for subsequent PCs. Different Sparse PCA methods are optimal depending on the within-group correlation and across-group variances; thankfully, one method repeatedly worked well under the most difficult scenarios. When the methods were applied to real data, concise groups of gene expressions were detected by the sparsest methods. Conclusions: Sparse PCA methods provide a new, insightful way to detect important features amidst complex high-dimensional data. / Master of Science (MSc)
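As a hedged illustration of the kind of comparison described, the sketch below contrasts classical PCA with scikit-learn's SparsePCA on synthetic n << p data. The three methods compared in the thesis are not identified here and may differ from this particular formulation; the data-generating setup is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

# Synthetic "n << p" data: 50 samples, 500 variables, with signal carried by
# the first 20 variables only, loosely mimicking a sparse-loading scenario.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 500))
X[:, :20] += rng.standard_normal((50, 1)) * 2.0   # one shared latent factor

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=2.0, random_state=0).fit(X)

# Classical PCA spreads weight over all 500 variables; the L1 penalty in
# SparsePCA zeroes most loadings, making the leading component interpretable.
print("nonzero loadings, PC1 (PCA):   ", np.sum(pca.components_[0] != 0))
print("nonzero loadings, PC1 (Sparse):", np.sum(spca.components_[0] != 0))
```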
|
37 |
Ionic Characterization of Laundry Detergents: Implications for Consumer Choice and Inland Freshwater Salinization
Mendoza, Kent Gregory 11 April 2024 (has links)
Increased salinity in freshwater systems – also called the Freshwater Salinization Syndrome (FSS) – can have far-ranging implications for the natural and built environment, agriculture, and public health at large. Such risks are clearly on display in the Occoquan Reservoir – a drinking water source for roughly one million people in the northern Virginia / National Capital Region. Sodium concentrations in the Occoquan Reservoir are approaching levels that can affect taste and health. The Reservoir is also noteworthy as a flagship example of indirect potable reuse, which further adds complexity to understanding the sources of rising levels of sodium and other types of salinity. To help understand the role residential discharges might play in the salinization of the Occoquan Reservoir, a suite of laundry detergent products was identified based upon survey data collected in the northern Virginia region. The ionic compositions of these products were then characterized using ion chromatography and inductively coupled plasma-mass spectrometry to quantify select ionic and elemental analytes. Sodium, chloride, and sulfate were consistently found in appreciable amounts. To comparatively characterize the laundry detergents, principal component analysis was employed to identify clusters of similar products. The physical formulation of the products was identified as a marker for their content, with dry formulations (free-flowing and encapsulated powders) being more enriched in sodium and sulfate. This result was corroborated by comparing nonparametric bootstrap intervals for individual analytes. The study's findings suggest an opportunity wherein consumer choice can play a role in mediating residential salt inputs to receiving bodies such as the Occoquan Reservoir. / Master of Science / Many streams, rivers, and other freshwater systems have become increasingly salty in recent decades. A rise in salinity can be problematic, stressing aquatic life, corroding pipes, and even enhancing the release of more pollutants into the water. This phenomenon, called Freshwater Salinization Syndrome, can threaten such systems' ability to serve as sources of drinking water, as is the case for the Occoquan Reservoir in northern Virginia. Serving roughly one million people, the Reservoir is notable for being one of the first in the country to purposely incorporate highly treated wastewater upstream of a drinking water supply. Despite the Reservoir's prominence, the reasons behind its rising salt levels are not well understood. This study sought to understand the role that individual residences could play when household products travel down the drain and are ultimately discharged into the watershed. Laundry detergents are potentially high-salt products. A survey of northern Virginians' laundry habits was conducted to understand local tastes and preferences. Informed by the survey, a suite of laundry detergents was chemically characterized to measure salt and element concentrations. The detergents were found to have notable amounts of sodium, chloride, and sulfate in particular, with sodium being the most abundant analyte in every detergent. However, not all detergents were equally salty; statistical tools revealed that dry formulations (such as powdered and powder-filled pak detergents) contributed more sodium and sulfate, among other things.
This study's findings suggest that laundry detergents could be contributing to Freshwater Salinization Syndrome in the Occoquan Reservoir, and that local consumers' choice of detergents could make a difference.
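A minimal sketch of the nonparametric bootstrap interval used for the analyte comparison might look as follows; the concentration values below are hypothetical placeholders, not the thesis's measurements.

```python
import numpy as np

def bootstrap_ci(x, stat=np.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Nonparametric percentile bootstrap interval for a statistic."""
    rng = np.random.default_rng(seed)
    boots = np.array([stat(rng.choice(x, size=len(x), replace=True))
                      for _ in range(n_boot)])
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

# Hypothetical sodium concentrations (mg per dose) for dry vs liquid
# detergents; non-overlapping intervals would support the dry-vs-liquid
# contrast the study reports.
dry = np.array([410.0, 515.0, 388.0, 620.0, 455.0, 540.0])
liquid = np.array([120.0, 95.0, 160.0, 140.0, 110.0, 150.0])
print("dry:   ", bootstrap_ci(dry))
print("liquid:", bootstrap_ci(liquid))
```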
|
38 |
Outlier detection with ensembled LSTM auto-encoders on PCA transformed financial data / Avvikelse-detektering med ensemble LSTM auto-encoders på PCA-transformerad finansiell data
Stark, Love January 2021 (has links)
Financial institutions today generate large amounts of data, data that can contain valuable information worth investigating to further the economic growth of the institution. There is interest in analyzing these data points, especially those that are anomalous relative to normal day-to-day activity. Finding these outliers is not an easy task, however, and is not possible to do manually given the massive amounts of data generated daily. Previous work has explored the use of machine learning to find outliers in such financial datasets, and previous studies have shown that pre-processing usually accounts for a large share of the information loss. This work studies whether there is a proper balance in how the pre-processing is carried out, retaining as much information as possible while not leaving the data too complex for the machine learning models. The dataset consisted of foreign exchange transactions supplied by the host company and was pre-processed using Principal Component Analysis (PCA). The main purpose of this work is to test whether an ensemble of Long Short-Term Memory recurrent neural networks (LSTM), configured as autoencoders, can be used to detect outliers in the data and whether the ensemble is more accurate than a single LSTM autoencoder. Previous studies have shown that ensembles of autoencoders can be more accurate than a single autoencoder, especially when SkipCells are implemented (a configuration that skips over LSTM cells to make the models more varied). A data point is considered an outlier if the LSTM model has trouble properly recreating it, i.e. a pattern that is hard to reconstruct, making it available for further manual investigation. The results show that the ensembled LSTM model was more accurate than a single LSTM model at reconstructing the dataset and, by our definition of an outlier, more accurate at outlier detection. The results from the pre-processing experiments reveal different methods for choosing an optimal number of components, one of which is to study the retained variance and accuracy of the PCA transformation against model performance for a given number of components. One conclusion of the work is that ensembled LSTM networks can prove very powerful, but that alternatives to the pre-processing should be explored, such as categorical embedding instead of PCA.
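A hedged Keras sketch of the core idea, an ensemble of LSTM autoencoders whose averaged reconstruction error flags outliers, is given below on placeholder data. The SkipCell variant is not shown, and the architecture, window shape, and 99th-percentile threshold are illustrative assumptions rather than the thesis's actual configuration.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_features = 30, 8  # e.g. windows of PCA-transformed FX features

def make_autoencoder(units):
    return keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        layers.LSTM(units),                          # encoder -> latent vector
        layers.RepeatVector(timesteps),              # repeat latent per timestep
        layers.LSTM(units, return_sequences=True),   # decoder
        layers.TimeDistributed(layers.Dense(n_features)),
    ])

X = np.random.randn(1000, timesteps, n_features).astype("float32")  # placeholder

# A small ensemble: members differ in hidden size; their reconstruction
# errors are averaged, and high-error windows are flagged as outliers.
errors = np.zeros(len(X))
for units in (16, 32, 64):
    model = make_autoencoder(units)
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, X, epochs=5, batch_size=64, verbose=0)
    recon = model.predict(X, verbose=0)
    errors += np.mean((X - recon) ** 2, axis=(1, 2)) / 3

outliers = np.where(errors > np.quantile(errors, 0.99))[0]
```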
|
39 |
Contribution à la modélisation de la qualité de l'orge et du malt pour la maîtrise du procédé de maltage / Modeling contribution of barley and malt quality for the malting process control
Ajib, Budour 18 December 2013 (has links)
In a continuously growing market, and to meet brewers' needs for quality malt, control of the malting process is essential. Malt quality depends strongly on the operating conditions, in particular the steeping conditions, but also on the quality of the raw material: barley. In this study, we established polynomial models relating the operating conditions to malt quality. Coupled with our genetic algorithms, these models allowed us to determine optimal malting conditions, either to reach a target malt quality (friability) or to enable malting at low water content (to reduce water consumption and control the environmental costs of production) while maintaining acceptable malt quality. However, the variability of the raw material is a limiting factor of our approach. The established models are in fact very sensitive to the barley species (spring, winter) and to the variety used. Above all, the models are highly dependent on the harvest year. The variations in properties observed from one harvest year to the next are poorly characterized and therefore not incorporated in our models, which prevents us from capitalizing on experimental information over time. Some structural properties of barley (porosity, hardness) were considered as new factors to better characterize the raw material, but they did not explain the variations observed in the malthouse. To characterize barley, 394 samples from the three crop years 2009, 2010 and 2011 were analysed by MIR spectroscopy. PCA analyses confirmed the significant effect of crop year, species, variety and sometimes place of harvest on the properties of barley. For some years and some species, a PLS regression made it possible to predict the protein and beta-glucan content of barley from the MIR spectra. These promising results still face product variability; however, the new PLS models could be exploited to implement control strategies for the malting process based on MIR spectroscopic measurements.
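A minimal sketch of PLS regression for predicting a quality trait from MIR spectra, assuming synthetic spectra and scikit-learn's PLSRegression, is given below; the thesis's actual spectra, preprocessing, and choice of components are not reproduced.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for MIR spectra: 100 barley samples x 800 wavenumbers,
# with protein content linearly encoded in a few spectral bands.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 800))
protein = 10.0 + X[:, 100:110].sum(axis=1) + 0.5 * rng.standard_normal(100)

# Latent-variable regression copes with p >> n and collinear wavenumbers.
pls = PLSRegression(n_components=10)
r2 = cross_val_score(pls, X, protein, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f} +/- {r2.std():.2f}")
```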
|
40 |
Spatio-temporal dynamics in land use and habitat fragmentation in Sandveld, South Africa
James Takawira Magidi January 2010 (has links)
This research assessed land use changes and trends in vegetation cover in the Sandveld using remote sensing images. Landsat TM satellite images from 1990, 2004 and 2007 were classified, using the maximum likelihood classifier, into seven land use classes: water, agriculture, fire patches, natural vegetation, wetlands, disturbed veld, and open sands. Change detection using remote sensing algorithms and landscape metrics was performed on these multi-temporal land use maps using the Land Change Modeller and Patch Analyst respectively. Markov stochastic modelling techniques were used to predict future scenarios of land use change based on the classified images and their transition probabilities. MODIS NDVI multi-temporal datasets with a 16-day temporal resolution were used to assess seasonal and annual trends in vegetation cover using time series analysis (PCA and time profiling). Results indicated that natural vegetation decreased from 46% to 31% of the total landscape between 1990 and 2007, and these biodiversity losses were attributed to an increasing agricultural footprint. The predicted future scenario based on transition probabilities revealed a continuing loss of natural habitat and growth of the agricultural footprint. Time series analysis results (principal components and temporal profiles) suggested that the landscape has a high degree of overall dynamic change with pronounced inter- and intra-annual changes, and there was an overall increase in greenness associated with increased agricultural activity. The study concluded that without future conservation interventions natural habitats will continue to disappear, a condition that will impact heavily on biodiversity and on significant water-dependent ecosystems such as wetlands. This has significant implications for the long-term provision of water from groundwater reserves and for the overall sustainability of current agricultural practices.
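A minimal sketch of the Markov projection step, with a hypothetical 3-class transition matrix rather than the probabilities estimated from the classified maps, might look like this:

```python
import numpy as np

# Hypothetical transition matrix (rows: from, columns: to) for natural
# vegetation, agriculture, and other; the study estimates its probabilities
# from the 1990/2004/2007 classified maps, not these illustrative values.
P = np.array([[0.90, 0.08, 0.02],    # vegetation -> veg / agri / other
              [0.02, 0.95, 0.03],    # agriculture is nearly absorbing
              [0.05, 0.10, 0.85]])

state = np.array([0.46, 0.30, 0.24])  # 1990-like class proportions

# Project the Markov chain forward one map-interval at a time.
for step in range(4):
    state = state @ P
    print(f"step {step + 1}: veg={state[0]:.2f} agri={state[1]:.2f}")
```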
|