351 |
Multivariate Statistical Methods Applied to the Analysis of Trace Evidence. Szkudlarek, Cheryl Ann, 22 August 2013
Indiana University-Purdue University Indianapolis (IUPUI) / The aim of this study was to use multivariate statistical techniques to: (1) determine the reproducibility of fiber evidence analyzed by MSP, (2) determine whether XRF is an appropriate technique for forensic tape analysis, and (3) determine if DART/MS is an appropriate technique for forensic tape analysis. This was achieved by employing several multivariate statistical techniques, including agglomerative hierarchical clustering, principal component analysis, discriminant analysis, and analysis of variance. First, twelve dyed textile fibers were analyzed by UV-Visible MSP. This analysis included an inter-laboratory study, external validations, differing preprocessing techniques, and color coordinates. The inter-laboratory study showed no statistically significant differences among the instruments. The external validations gave acceptable results overall. Using first derivatives as a preprocessing technique and color coordinates to define color did not yield any additional information. Next, the tape backings of thirty-three brands were analyzed by XRF. After chemometric analysis, it was concluded that the 3M tapes with black adhesive can be classified by brand, except for Super 33+ (Cold Weather) and Super 88. The colorless adhesive tapes were separated into two large groups, which correlated with the presence of aluminosilicate filler. Overall, no additional discrimination was seen using XRF compared to the traditional instrumentation for tape analysis previously published. Lastly, the backings of eighty-nine brands of tape were analyzed by DART/MS. The analysis of the black adhesive tapes again showed that discrimination between brands is possible, except for Super 33+ and Super 88; however, Tartan and Temflex became indistinguishable. The colorless adhesive tapes were again largely indistinguishable from one another, with the exception of Tuff Hand Tool, Qualpack, and a roll of 3M Tartan, which were found to be unique. Whether additional discrimination was achieved with DART/MS could not be determined, because the multivariate statistical techniques have not been applied to the other instrumental techniques used during tape analysis.
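The multivariate workflow described in this abstract can be illustrated with a brief, hedged sketch (hypothetical spectra and fiber labels, not the study's MSP data): hierarchical clustering and PCA explore the spectra, and linear discriminant analysis classifies them by fiber.

```python
# Sketch: multivariate analysis of fiber MSP spectra (hypothetical data, not the study's).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Assumed shapes: 12 dyed fibers x 10 replicate spectra, 301 wavelength channels.
spectra = rng.normal(size=(120, 301))
fiber_id = np.repeat(np.arange(12), 10)

# Agglomerative hierarchical clustering (Ward linkage) on the raw spectra.
tree = linkage(spectra, method="ward")
clusters = fcluster(tree, t=12, criterion="maxclust")
print("Cluster sizes:", np.bincount(clusters)[1:])

# PCA for dimension reduction, then LDA to classify spectra by fiber.
scores = PCA(n_components=10).fit_transform(spectra)
accuracy = cross_val_score(LinearDiscriminantAnalysis(), scores, fiber_id, cv=5).mean()
print(f"Cross-validated fiber classification accuracy: {accuracy:.2f}")
```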
|
352 |
Chemometric Applications To A Complex Classification Problem: Forensic Fire Debris Analysis. Waddell, Erin, 01 January 2013
Fire debris analysis currently relies on visual pattern recognition of the total ion chromatograms, extracted ion profiles, and target compound chromatograms to identify the presence of an ignitable liquid. This procedure is described in the ASTM International E1618-10 standard method. For large data sets, this methodology can be time-consuming and subjective, with accuracy dependent upon the skill and experience of the analyst. This research aimed to develop an automated classification method for large data sets and investigated the use of the total ion spectrum (TIS). The TIS is calculated by taking an average mass spectrum across the entire chromatographic range and has been shown to contain sufficient information content for the identification of ignitable liquids. The TIS of ignitable liquids and substrates were compiled into model data sets. Substrates are defined as common building materials and household furnishings that are typically found at the scene of a fire and are, therefore, present in fire debris samples. Fire debris samples obtained from laboratory-scale and large-scale burns were also used. An automated classification method was developed using computational software that was written in-house. Within this method, a multi-step classification scheme was used to detect ignitable liquid residues in fire debris samples and assign these to the classes defined in ASTM E1618-10. Classifications were made using linear discriminant analysis, quadratic discriminant analysis (QDA), and soft independent modeling of class analogy (SIMCA). The model data sets were tested by cross-validation and used to classify fire debris samples. Correct classification rates were calculated for each data set. Classifier performance metrics were also calculated for the first step of the classification scheme, including false positive rates, true positive rates, and the precision of the method. The first step, which determines whether a sample is positive or negative for ignitable liquid residue, is arguably the most important in the forensic application. Overall, the highest correct classification rates were achieved using QDA for the first step of the scheme and SIMCA for the remaining steps. In the first step of the classification scheme, correct classification rates of 95.3% and 89.2% were obtained using QDA to classify the cross-validation test set and fire debris samples, respectively. For this step, the cross-validation test set resulted in a true positive rate of 96.2%, a false positive rate of 9.3%, and a precision of 98.2%. The fire debris data set had a true positive rate of 82.9%, a false positive rate of 1.3%, and a precision of 99.0%. Correct classification rates of 100% were achieved for both data sets in the majority of the remaining steps, which used SIMCA for classification. The lowest correct classification rate, 69.2%, was obtained for the fire debris samples in one of the final steps in the classification scheme. In this research, the first statistically valid error rates for fire debris analysis have been developed through cross-validation of large data sets. The fire debris analyst can use the automated method as a tool for detecting and classifying ignitable liquid residues in fire debris samples. The error rates reduce the subjectivity associated with the current methods and provide a level of confidence in sample classification that does not currently exist in forensic fire debris analysis.
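A minimal sketch of the TIS idea and of the first step of such a classification scheme follows. It is not the in-house software described above; the GC-MS array shapes, labels, and the PCA-plus-QDA pipeline are illustrative assumptions.

```python
# Sketch: TIS computation and a first-stage ignitable-liquid-vs-negative classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# Hypothetical GC-MS data cubes: (samples, chromatographic scans, m/z channels).
gcms = rng.random(size=(100, 600, 100))
is_ignitable = rng.integers(0, 2, size=100)   # 1 = ignitable liquid residue present

# The TIS is the mass spectrum averaged over the whole chromatographic range,
# normalized so that samples with different total signal remain comparable.
tis = gcms.mean(axis=1)
tis = tis / tis.sum(axis=1, keepdims=True)

# First step of the scheme: decide positive/negative for ignitable liquid residue.
# PCA compresses the TIS before QDA so the class covariances stay well conditioned.
model = make_pipeline(PCA(n_components=10), QuadraticDiscriminantAnalysis())
pred = cross_val_predict(model, tis, is_ignitable, cv=5)
tn, fp, fn, tp = confusion_matrix(is_ignitable, pred).ravel()
print(f"TPR={tp/(tp+fn):.3f}  FPR={fp/(fp+tn):.3f}  precision={tp/(tp+fp):.3f}")
```

Cross-validated true/false positive rates and precision of the kind reported in the abstract fall out of the confusion matrix computed on the held-out predictions.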
|
353 |
Classification of Repeated Measurement Data Using Growth Curves and Neural Networks. Andersson, Kasper, January 2022
This thesis focuses on statistical and machine learning methods designed for sequential and repeated measurement data. We start off by considering the classic general linear model (MANOVA), followed by its generalization, the growth curve model (GMANOVA), designed for the analysis of repeated measurement data. By considering a binary classification problem for normal data together with the corresponding maximum likelihood estimators for the growth curve model, we demonstrate how a classification rule based on linear discriminant analysis can be derived and used for repeated measurement data in a meaningful way. We then proceed to neural networks, which serve as our second method of classification. The reader is introduced to classic neural networks and relevant subtopics are discussed. We present a generalization of the classic neural network model to the recurrent neural network model and the LSTM model, which are designed for sequential data. Lastly, we present three types of data sets, with a total of eight cases, on which the discussed classification methods are tested.
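A simplified sketch of the growth-curve-plus-LDA idea is given below: each subject's repeated measurements are summarized by least-squares growth-curve coefficients, which are then classified with linear discriminant analysis. The data shapes and the plain least-squares fit are assumptions for illustration; the thesis derives its rule from the model's maximum likelihood estimators.

```python
# Sketch: classify repeated-measurement profiles via growth-curve coefficients + LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_subjects, n_times, degree = 80, 6, 2
time = np.linspace(0.0, 1.0, n_times)
labels = rng.integers(0, 2, size=n_subjects)          # binary class per subject

# Hypothetical repeated measurements: two groups with different mean growth curves.
mean_curve = np.where(labels[:, None] == 0, 1.0 + 2.0 * time, 2.0 + 0.5 * time)
profiles = mean_curve + rng.normal(scale=0.5, size=(n_subjects, n_times))

# Within-individual design matrix of the growth curve model: columns 1, t, t^2.
P = np.vander(time, N=degree + 1, increasing=True)

# Least-squares growth-curve coefficients per subject, then LDA on the coefficients.
coeffs, *_ = np.linalg.lstsq(P, profiles.T, rcond=None)
accuracy = cross_val_score(LinearDiscriminantAnalysis(), coeffs.T, labels, cv=5).mean()
print(f"Cross-validated accuracy: {accuracy:.2f}")
```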
|
354 |
Three Stage Level Set Segmentation of Mass Core, Periphery, and Spiculations for Automated Image Analysis of Digital Mammograms. Ball, John E., 05 May 2007
In this dissertation, level set methods are employed to segment masses in digital mammographic images and to classify land cover classes in hyperspectral data. For the mammography computer aided diagnosis (CAD) application, level set-based segmentation methods are designed and validated for mass periphery segmentation, spiculation segmentation, and core segmentation. The proposed periphery segmentation uses the narrowband level set method in conjunction with an adaptive speed function based on a measure of the boundary complexity in the polar domain. The boundary complexity term is shown to be beneficial for delineating challenging masses with ill-defined and irregularly shaped borders. The proposed method is shown to outperform periphery segmentation methods currently reported in the literature. The proposed mass spiculation segmentation uses a generalized form of the Dixon and Taylor Line Operator along with narrowband level sets using a customized speed function. The resulting spiculation features are shown to be very beneficial for classifying the mass as benign or malignant. For example, when using patient age and texture features combined with a maximum likelihood (ML) classifier, the spiculation segmentation method increases the overall accuracy to 92% with 2 false negatives as compared to 87% with 4 false negatives when using periphery segmentation approaches. The proposed mass core segmentation uses the Chan-Vese level set method with a minimal variance criterion. The resulting core features are shown to be effective and comparable to periphery features, and are shown to reduce the number of false negatives in some cases. Most mammographic CAD systems use only a periphery segmentation, so those systems could potentially benefit from core features.
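As a hedged illustration of the core-segmentation idea, the sketch below runs a Chan-Vese-style level set (the morphological variant available in scikit-image, not the dissertation's implementation) on a synthetic bright disc standing in for a mass.

```python
# Sketch: Chan-Vese-style level set segmentation of a bright "mass" on a noisy background.
import numpy as np
from skimage.segmentation import morphological_chan_vese

rng = np.random.default_rng(3)
# Synthetic 128x128 image: a bright disc (stand-in for a mass core) plus noise.
yy, xx = np.mgrid[:128, :128]
image = ((yy - 64) ** 2 + (xx - 64) ** 2 < 30 ** 2).astype(float)
image += rng.normal(scale=0.3, size=image.shape)

# Evolve the level set; the result is a binary mask that, in the spirit of the
# Chan-Vese minimal-variance criterion, separates inside from outside regions.
mask = morphological_chan_vese(image, 100, init_level_set="checkerboard", smoothing=2)
print("Segmented area (pixels):", int(mask.sum()))
```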
|
355 |
A Revision of the Pleopeltis polypodioides Species Complex (POLYPODIACEAE). Sprunt, Susan V., 17 August 2010
No description available.
|
356 |
Computational Models of the Production and Perception of Facial Expressions. Srinivasan, Ramprakash, 07 November 2018
No description available.
|
357 |
Bayes Optimality in Classification, Feature Extraction and Shape Analysis. Hamsici, Onur C., 11 September 2008
No description available.
|
358 |
Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation. Vitale, Raffaele, 03 November 2017
The present Ph.D. thesis, primarily conceived to support and reinforce the relation between academic and industrial worlds, was developed in collaboration with Shell Global Solutions (Amsterdam, The Netherlands) in the endeavour of applying and possibly extending well-established latent variable-based approaches (i.e. Principal Component Analysis - PCA - Partial Least Squares regression - PLS - or Partial Least Squares Discriminant Analysis - PLSDA) for complex problem solving not only in the fields of manufacturing troubleshooting and optimisation, but also in the wider environment of multivariate data analysis. To this end, novel efficient algorithmic solutions are proposed throughout all chapters to address very disparate tasks, from calibration transfer in spectroscopy to real-time modelling of streaming flows of data. The manuscript is divided into the following six parts, focused on various topics of interest:
Part I - Preface, where an overview of this research work, its main aims and justification is given together with a brief introduction on PCA, PLS and PLSDA;
Part II - On kernel-based extensions of PCA, PLS and PLSDA, where the potential of kernel techniques, possibly coupled to specific variants of the recently rediscovered pseudo-sample projection, formulated by the English statistician John C. Gower, is explored and their performance compared to that of more classical methodologies in four different application scenarios: segmentation of Red-Green-Blue (RGB) images, discrimination of on-/off-specification batch runs, monitoring of batch processes and analysis of mixture designs of experiments;
Part III - On the selection of the number of factors in PCA by permutation testing, where an extensive guideline on how to accomplish the selection of PCA components by permutation testing is provided through the comprehensive illustration of an original algorithmic procedure implemented for such a purpose;
Part IV - On modelling common and distinctive sources of variability in multi-set data analysis, where several practical aspects of two-block common and distinctive component analysis (carried out by methods like Simultaneous Component Analysis - SCA - DIStinctive and COmmon Simultaneous Component Analysis - DISCO-SCA - Adapted Generalised Singular Value Decomposition - Adapted GSVD - ECO-POWER, Canonical Correlation Analysis - CCA - and 2-block Orthogonal Projections to Latent Structures - O2PLS) are discussed, a new computational strategy for determining the number of common factors underlying two data matrices sharing the same row- or column-dimension is described, and two innovative approaches for calibration transfer between near-infrared spectrometers are presented;
Part V - On the on-the-fly processing and modelling of continuous high-dimensional data streams, where a novel software system for rational handling of multi-channel measurements recorded in real time, the On-The-Fly Processing (OTFP) tool, is designed;
Part VI - Epilogue, where final conclusions are drawn, future perspectives are delineated, and annexes are included. / Vitale, R. (2017). Novel chemometric proposals for advanced multivariate data analysis, processing and interpretation [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90442
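The permutation-testing idea of Part III can be sketched roughly as follows (synthetic data and a simplified procedure, not the algorithm implemented in the thesis): a component is retained while the variance it explains exceeds what is typically obtained after independently permuting each column of the data, which destroys the between-variable correlation structure.

```python
# Rough sketch: permutation test for the number of significant PCA components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Synthetic data with two real latent factors plus noise (assumed, for illustration).
scores_true = rng.normal(size=(200, 2))
loadings_true = rng.normal(size=(2, 30))
X = scores_true @ loadings_true + 0.5 * rng.normal(size=(200, 30))
X = X - X.mean(axis=0)

def explained_variance(data, k=10):
    return PCA(n_components=k).fit(data).explained_variance_ratio_

observed = explained_variance(X)
n_perm, alpha, selected = 200, 0.05, 0
for k in range(len(observed)):
    # Null distribution for component k: permute every column independently.
    null = np.empty(n_perm)
    for b in range(n_perm):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[b] = explained_variance(Xp)[k]
    p_value = (np.sum(null >= observed[k]) + 1) / (n_perm + 1)
    if p_value > alpha:
        break
    selected += 1
print("Components retained by permutation testing:", selected)
```

Permuting within columns preserves each variable's marginal distribution while breaking correlations, so the null distribution reflects structure-free data of the same size.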
|
359 |
A predictive model of the states of financial health in South African businesses. Naidoo, Surendra Ramoorthee, 11 1900
The prediction of a company's financial health is of critical importance to a variety of stakeholders ranging from auditors, creditors, customers, employees, financial institutions and investors through to management.
There has been considerable research in this field, ranging from the univariate dichotomous approach of Beaver (1966) to the multivariate multi-state approaches of Lau (1987) and Ward (1994). All of the South African studies, namely Strebel and Andrews (1977), Daya (1977), De La Rey (1981), Clarke et al. (1991) and Court et al. (1999), and even Lukhwareni's (2005) four separate models, were dichotomous in nature, providing either a "Healthy" or a "Failed" state, or, in the latter case, a "Winner" or a "Loser". Notwithstanding, all of these models would be classified as first-stage, initial screening models.
This study has focused on a two-stage approach to identifying (first stage) and analysing (second stage) the States of Health in a company. It has not adopted the rigid "Healthy" or "Failed" dichotomous methodology.
For the first stage, three-state models were developed classifying a company as Healthy, Intermittent or Distressed. Both three-year and five-year Profit after Tax (PAT) averages for Real Earnings Growth (REG) calculations were used to determine the better definition of the Intermittent state, with the five-year average proving superior. Models were developed for the current year (Yn), one (Yn-1), two (Yn-2) and three years (Yn-3) forward using a Test sample of twenty companies, and their predictive accuracy was determined using a Holdout sample of twenty-two companies and all their data points or years of information. The statistical methods employed were a Naïve model using the simple Shareholder Value Added (SVA) ratio, CHAID and MDA, with MDA providing very disappointing results: for the Yn year (five-year average), the Test sample results were 100%, 95% and 95%, respectively, while the Holdout sample results were 81.3%, 83.8% and 52.5%, respectively. The Yn-1 to Yn-3 models produced very good results for the Test sample but somewhat disappointing Holdout sample results.
The two best Yn models, namely the Naïve and CHAID models, were modified to enable a comparison with the notable dichotomous De La Rey (1981) model. Three different approaches were adopted, and in all cases both the modified Naïve model (100%, 81.3%, 100%) and the modified CHAID model (100%, 85.9%, 98%) produced results superior to those of the De La Rey model (84.8%, 62.6%, 75.3%).
For the second stage, a Financial Risk Analysis Model (FRAM) using ratios in the categories of Growth, Performance Analysis, Investment Analysis and Financial Status was used to provide underlying information or clues, independent of the first-stage model, enabling the stakeholder to establish a more meaningful picture of the company. This paves the way for the appropriate strategy and course of action to take the company to the next level, whether that means taking the company out of a Distressed state (D) or further improving on its Healthy status (H). / Business Management / D. BL.
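A hedged sketch of the first-stage, three-state idea follows, with simulated financial ratios and labels: a decision tree stands in for CHAID (which scikit-learn does not provide) and linear discriminant analysis stands in for MDA, each evaluated on a holdout sample.

```python
# Sketch: three-state (Healthy / Intermittent / Distressed) screening models.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
# Hypothetical financial ratios per company-year (e.g. an SVA-style ratio, ROA, gearing, liquidity).
X = rng.normal(size=(400, 4))
# Hypothetical state driven mainly by the first ratio: 0 = Distressed, 1 = Intermittent, 2 = Healthy.
state = np.digitize(X[:, 0] + 0.3 * rng.normal(size=400), [-0.5, 0.5])

X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, state, test_size=0.3, random_state=0, stratify=state)

models = {
    "Decision tree (rough CHAID stand-in)": DecisionTreeClassifier(max_depth=3, random_state=0),
    "LDA (in place of MDA)": LinearDiscriminantAnalysis(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: holdout accuracy {model.score(X_holdout, y_holdout):.2f}")
```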
|
360 |
A comparison of the performance of three multivariate methods in investigating the effects of province and power usage on the amount of five power modes in South Africa. Kanyama, Busanga Jerome, 06 1900
Researchers apply the multivariate techniques MANOVA, discriminant analysis and factor analysis, most commonly in social science to identify and test effects. The use of these techniques in investigating the effects of power usage and province in South Africa on the amounts of the five power modes is, however, uncommon. This dissertation discusses this issue, along with the methodology and practical problems of the three multivariate techniques. The author examines the applications of each technique in social and public research and compares the three techniques.
The dissertation concludes with a discussion of both the concepts underlying the three multivariate techniques and the results found when applying them to household energy consumption. The author recommends focusing on the hypotheses of the study, or the typical questions surrounding each technique, to guide the researcher in choosing the appropriate analysis in social research, as each technique has its own strengths and limitations. / Statistics / M. Sc. (Statistics)
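As a hedged illustration (simulated data and assumed variable names, not the dissertation's survey data), the three methods compared in the dissertation can be run side by side as follows.

```python
# Sketch: MANOVA, discriminant analysis, and factor analysis on simulated household data.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(6)
n = 300
df = pd.DataFrame({
    "province": rng.choice(["Gauteng", "Limpopo", "WesternCape"], size=n),
    "usage": rng.choice(["low", "high"], size=n),
})
# Five hypothetical power-mode amounts, shifted slightly by usage group.
modes = ["electricity", "paraffin", "wood", "coal", "gas"]
shift = np.where(df["usage"] == "high", 1.0, 0.0)[:, None]
mode_values = rng.normal(size=(n, 5)) + shift
for i, m in enumerate(modes):
    df[m] = mode_values[:, i]

# 1) MANOVA: do province and usage affect the vector of five power-mode amounts?
formula = " + ".join(modes) + " ~ province + usage"
print(MANOVA.from_formula(formula, data=df).mv_test())

# 2) Discriminant analysis: how well do the five amounts separate the usage groups?
lda_acc = cross_val_score(LinearDiscriminantAnalysis(), df[modes], df["usage"], cv=5).mean()
print("LDA cross-validated accuracy (usage):", round(lda_acc, 2))

# 3) Factor analysis: latent structure underlying the five power modes.
fa = FactorAnalysis(n_components=2, random_state=0).fit(df[modes])
print("Factor loadings:\n", fa.components_.round(2))
```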
|