141

Chemometric Applications To A Complex Classification Problem: Forensic Fire Debris Analysis

Waddell, Erin 01 January 2013 (has links)
Fire debris analysis currently relies on visual pattern recognition of the total ion chromatograms, extracted ion profiles, and target compound chromatograms to identify the presence of an ignitable liquid. This procedure is described in the ASTM International E1618-10 standard method. For large data sets, this methodology can be time-consuming and subjective, with an accuracy that depends on the skill and experience of the analyst. This research aimed to develop an automated classification method for large data sets and investigated the use of the total ion spectrum (TIS). The TIS is calculated by taking an average mass spectrum across the entire chromatographic range and has been shown to contain sufficient information content for the identification of ignitable liquids. The TIS of ignitable liquids and substrates were compiled into model data sets. Substrates are defined as common building materials and household furnishings that are typically found at the scene of a fire and are, therefore, present in fire debris samples. Fire debris samples obtained from laboratory-scale and large-scale burns were also used. An automated classification method was developed using computational software that was written in-house. Within this method, a multi-step classification scheme was used to detect ignitable liquid residues in fire debris samples and assign these to the classes defined in ASTM E1618-10. Classifications were made using linear discriminant analysis, quadratic discriminant analysis (QDA), and soft independent modeling of class analogy (SIMCA). The model data sets were tested by cross-validation and used to classify fire debris samples. Correct classification rates were calculated for each data set. Classifier performance metrics were also calculated for the first step of the classification scheme, including false positive rates, true positive rates, and the precision of the method. The first step, which determines whether a sample is positive or negative for ignitable liquid residue, is arguably the most important in the forensic application. Overall, the highest correct classification rates were achieved using QDA for the first step of the scheme and SIMCA for the remaining steps. In the first step of the classification scheme, correct classification rates of 95.3% and 89.2% were obtained using QDA to classify the cross-validation test set and fire debris samples, respectively. For this step, the cross-validation test set resulted in a true positive rate of 96.2%, a false positive rate of 9.3%, and a precision of 98.2%. The fire debris data set had a true positive rate of 82.9%, a false positive rate of 1.3%, and a precision of 99.0%. Correct classification rates of 100% were achieved for both data sets in the majority of the remaining steps, which used SIMCA for classification. The lowest correct classification rate, 69.2%, was obtained for the fire debris samples in one of the final steps in the classification scheme. In this research, the first statistically valid error rates for fire debris analysis have been developed through cross-validation of large data sets. The fire debris analyst can use the automated method as a tool for detecting and classifying ignitable liquid residues in fire debris samples. The error rates reduce the subjectivity associated with the current methods and provide a level of confidence in sample classification that does not currently exist in forensic fire debris analysis.
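A minimal sketch of the kind of first-step evaluation described above, assuming scikit-learn and synthetic stand-in features rather than the thesis' in-house software or TIS data; it cross-validates a QDA positive/negative classifier and reports true positive rate, false positive rate, and precision.

```python
# Hypothetical two-class first step: positive/negative for ignitable liquid residue.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                       # stand-in for TIS-derived features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # 1 = ignitable liquid residue present

qda = QuadraticDiscriminantAnalysis()
y_hat = cross_val_predict(qda, X, y, cv=5)          # cross-validated class predictions

tn, fp, fn, tp = confusion_matrix(y, y_hat).ravel()
print("true positive rate :", tp / (tp + fn))
print("false positive rate:", fp / (fp + tn))
print("precision          :", precision_score(y, y_hat))
```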
142

Flying in the Academic Environment : An Exploratory Panel Data Analysis of CO2 Emission at KTH

Artman, Arvid January 2024 (has links)
In this study, a panel data set of flights made by employees at the Royal Institute of Technology (KTH) in Sweden is analyzed using generalized linear modeling approaches, with the aim of building models with high predictive capability for quarterly CO2 emissions and the number of flights in a year not included in the model estimation. A Zero-inflated Gamma regression model is fitted to the CO2 emission variable and a Zero-inflated Negative Binomial regression model is used for the number of flights. To build the models, cross-validation is performed with the observations from 2018 as the training set and the observations from the following year, 2019, as the test set. Variables are selected one at a time, each time adding the variable that best improves prediction on the test set (whether entered in the count model or in the zero-inflation model), until an additional variable turns out to be insignificant at the 5% significance level in the estimated model. In addition to the variables in the data, three lags of the dependent variables (CO2 emission and flights) were included, as well as transformed versions of the continuous variables and a random intercept for each of the categorical variables indicating quarter and KTH department. Neither model selected through the cross-validation process turned out to be particularly good at predicting the values for the upcoming year, but a number of variables were shown to have a statistically significant association with the respective dependent variable.
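A hedged sketch of the count-model component only (a zero-inflated negative binomial for flight counts) fitted on one year and evaluated on the next, assuming statsmodels and invented variables; the thesis' actual covariates, lags, and random intercepts are not reproduced here.

```python
# Fit on 2018, predict 2019; "staff" is an illustrative covariate, not from the data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "year": np.repeat([2018, 2019], 200),
    "staff": rng.poisson(30, 400),
    "flights": rng.poisson(2, 400) * rng.binomial(1, 0.6, 400),  # many structural zeros
})
train, test = df[df.year == 2018], df[df.year == 2019]

X_tr = sm.add_constant(train[["staff"]])
X_te = sm.add_constant(test[["staff"]])

model = ZeroInflatedNegativeBinomialP(train["flights"], X_tr, exog_infl=X_tr, p=2)
res = model.fit(method="bfgs", maxiter=500, disp=False)

pred = res.predict(X_te, exog_infl=X_te)             # expected flights per observation
rmse = np.sqrt(np.mean((test["flights"] - pred) ** 2))
print("out-of-year RMSE:", rmse)
```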
143

[en] NON-PARAMETRIC ESTIMATIONS OF INTEREST RATE CURVES: MODEL SELECTION CRITERION, PERFORMANCE DETERMINANT FACTORS AND BID-ASK SPREAD / [pt] ESTIMAÇÕES NÃO PARAMÉTRICAS DE CURVAS DE JUROS: CRITÉRIO DE SELEÇÃO DE MODELO, FATORES DETERMINANTES DE DESEMPENHO E BID-ASK SPREAD

ANDRE MONTEIRO DALMEIDA MONTEIRO 11 June 2002 (has links)
[pt] Esta tese investiga a estimação de curvas de juros sob o ponto de vista de métodos não-paramétricos. O texto está dividido em dois blocos. O primeiro investiga a questão do critério utilizado para selecionar o método de melhor desempenho na tarefa de interpolar a curva de juros brasileira em uma dada amostra. Foi proposto um critério de seleção de método baseado em estratégias de re-amostragem do tipo leave-k-out cross validation, onde 1 ≤ k ≤ K e K é função do número de contratos observados a cada curva da amostra. Especificidades do problema reduzem o esforço computacional requerido, tornando o critério factível. A amostra tem freqüência diária: janeiro de 1997 a fevereiro de 2001. O critério proposto apontou o spline cúbico natural - utilizado como método de ajuste perfeito aos dados - como o método de melhor desempenho. Considerando a precisão de negociação, este spline mostrou-se não viesado. A análise quantitativa de seu desempenho identificou, contudo, heterocedasticidades nos erros simulados. A partir da especificação da variância condicional destes erros e de algumas hipóteses, foi proposto um esquema de intervalo de segurança para a estimação de taxas de juros pelo spline cúbico natural, empregado como método de ajuste perfeito aos dados. O backtest sugere que o esquema proposto é consistente, acomodando bem as hipóteses e aproximações envolvidas. O segundo bloco investiga a estimação da curva de juros norte-americana construída a partir dos contratos de swaps de taxas de juros dólar-Libor pela Máquina de Vetores Suporte (MVS), parte do corpo da Teoria do Aprendizado Estatístico. A pesquisa em MVS tem obtido importantes avanços teóricos, embora ainda sejam escassas as implementações em problemas reais de regressão. A MVS possui características atrativas para a modelagem de curva de juros: é capaz de introduzir já na estimação informações a priori sobre o formato da curva e sobre aspectos da formação das taxas e liquidez de cada um dos contratos a partir dos quais ela é construída. Estas últimas são quantificadas pelo bid-ask spread (BAS) de cada contrato. A formulação básica da MVS é alterada para assimilar diferentes valores do BAS sem que as propriedades dela sejam perdidas. É dada especial atenção ao levantamento de informação a priori para seleção dos parâmetros da MVS a partir do formato típico da curva. A amostra tem freqüência diária: março de 1997 a abril de 2001. Os desempenhos fora da amostra de diversas especificações da MVS foram confrontados com aqueles de outros métodos de estimação. A MVS foi o método que melhor controlou o trade-off entre viés e variância dos erros. / [en] This thesis investigates interest rate curve estimation under a non-parametric approach. The text is divided into two parts. The first one focuses on the criterion used to select the best-performing method for interpolating the Brazilian interest rate curve. A selection criterion is proposed that measures out-of-sample performance by combining leave-k-out cross-validation resampling strategies applied to all curves in the sample, where 1 ≤ k ≤ K and K is a function of the number of contracts observed in each curve. Particularities of the problem substantially reduce the required computational effort, making the proposed criterion feasible. The sample is daily, from January 1997 to February 2001. The proposed criterion selected the natural cubic spline, used as a perfect-fitting estimation method. Considering the precision at which rates are traded, the spline is unbiased.
However, quantitative analysis of performance determinant factors showed that the out-of-sample errors are heteroskedastic. From a conditional variance specification of these errors, a security interval scheme is proposed for interest rates generated by the perfect-fitting natural cubic spline. A backtest showed that the proposed scheme is consistent, accommodating the assumptions and approximations involved. The second part estimates the US dollar-Libor interest rate swap curve using the Support Vector Machine (SVM), a method derived from Statistical Learning Theory. SVM research has produced important theoretical results, but implementations on real regression problems are still scarce. The SVM has attractive characteristics for interest rate curve modeling: it can introduce into the estimation process a priori information about the curve shape and about the liquidity and price formation of the contracts from which the curve is built. The latter information is quantified by the bid-ask spread of each contract. The basic SVM formulation is modified to incorporate different bid-ask spread values without losing its properties. Special attention is given to extracting a priori information from the typical shape of the swap curve for use in SVM parameter selection. The sample is daily, from March 1997 to April 2001. The out-of-sample performance of several SVM specifications is compared with that of other estimation methods. The SVM achieved the best control of the trade-off between bias and variance of the out-of-sample errors.
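As a rough illustration of the selection criterion in the first part, the sketch below runs a leave-one-out version of the resampling idea on a synthetic swap curve, using scipy's natural cubic spline as the perfect-fitting interpolator; the maturities and rates are invented, not the thesis data.

```python
# Leave-one-out error of a natural cubic spline interpolating an invented yield curve.
import numpy as np
from scipy.interpolate import CubicSpline

maturities = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10])       # years (illustrative)
rates = np.array([4.9, 5.0, 5.1, 5.3, 5.4, 5.6, 5.7, 5.8])  # per cent (illustrative)

errors = []
for i in range(1, len(maturities) - 1):                     # interior contracts only
    keep = np.arange(len(maturities)) != i
    spline = CubicSpline(maturities[keep], rates[keep], bc_type="natural")
    errors.append(rates[i] - spline(maturities[i]))

# errors are in percentage points; multiply by 100 to report in basis points
print("leave-one-out RMSE (bp):", 100 * np.sqrt(np.mean(np.square(errors))))
```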
144

Klientų duomenų valdymas bankininkystėje / Client data management in banking

Žiupsnys, Giedrius 09 July 2011 (has links)
Darbas apima banko klientų kredito istorinių duomenų dėsningumų tyrimą. Pirmiausia nagrinėjamos banko duomenų saugyklos, siekiant kuo geriau perprasti bankinius duomenis. Vėliau naudojant banko duomenų imtis, kurios apima kreditų grąžinimo istoriją, siekiama įvertinti klientų nemokumo riziką. Tai atliekama adaptuojant algoritmus bei programinę įrangą duomenų tyrimui, kuris pradedamas nuo informacijos apdorojimo ir paruošimo. Paskui pritaikant įvairius klasifikavimo algoritmus, sudarinėjami modeliai, kuriais siekiama kuo tiksliau suskirstyti turimus duomenis, nustatant nemokius klientus. Taip pat siekiant įvertinti kliento vėluojamų mokėti paskolą dienų skaičių pasitelkiami regresijos algoritmai bei sudarinėjami prognozės modeliai. Taigi darbo metu atlikus numatytus tyrimus, pateikiami duomenų vitrinų modeliai, informacijos srautų schema. Taip pat nurodomi klasifikavimo ir prognozavimo modeliai bei algoritmai, geriausiai įvertinantys duotas duomenų imtis. / This work analyzes regularities in the historical credit data of bank clients. First, the bank's data repositories are examined in order to understand the banking data. Then, using data mining algorithms and software on bank data sets describing credit repayment history, the insolvency risk of clients is estimated. The analysis starts with information preprocessing for data mining. Various classification algorithms are then applied to build models that classify the data sets and identify insolvent clients as accurately as possible. In addition to classification, regression algorithms are examined and prediction models are built to estimate how many days a client is late in repaying a loan. Based on this research, data mart models and an information flow schema are presented, together with the classification and prediction models and algorithms that best fit the given data sets.
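A minimal sketch of the two modelling tasks described above (classifying insolvent clients and predicting days overdue), assuming scikit-learn and synthetic client attributes rather than the bank's data or the algorithms actually selected in the thesis.

```python
# Classification (insolvent or not) and regression (days late) on invented features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))                            # client attributes (stand-in)
insolvent = (X[:, 0] - X[:, 1] > 1).astype(int)          # label: insolvent or not
days_late = np.clip(30 * (X[:, 0] - X[:, 1]), 0, None)   # target: days overdue

clf_acc = cross_val_score(RandomForestClassifier(random_state=0), X, insolvent, cv=5)
reg_mae = -cross_val_score(RandomForestRegressor(random_state=0), X, days_late,
                           cv=5, scoring="neg_mean_absolute_error")
print("classification accuracy per fold:", clf_acc.round(3))
print("days-late MAE per fold:", reg_mae.round(1))
```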
145

Validation croisée et pénalisation pour l'estimation de densité / Cross-validation and penalization for density estimation

Magalhães, Nelo 26 May 2015 (has links)
Cette thèse s'inscrit dans le cadre de l'estimation d'une densité, considéré du point de vue non-paramétrique et non-asymptotique. Elle traite du problème de la sélection d'une méthode d'estimation à noyau. Celui-ci est une généralisation, entre autres, du problème de la sélection de modèle et de la sélection d'une fenêtre. Nous étudions des procédures classiques, par pénalisation et par rééchantillonnage (en particulier la validation croisée V-fold), qui évaluent la qualité d'une méthode en estimant son risque. Nous proposons, grâce à des inégalités de concentration, une méthode pour calibrer la pénalité de façon optimale pour sélectionner un estimateur linéaire et prouvons des inégalités d'oracle et des propriétés d'adaptation pour ces procédures. De plus, une nouvelle procédure rééchantillonnée, reposant sur la comparaison entre estimateurs par des tests robustes, est proposée comme alternative aux procédures basées sur le principe d'estimation sans biais du risque. Un second objectif est la comparaison de toutes ces procédures du point de vue théorique et l'analyse du rôle du paramètre V pour les pénalités V-fold. Nous validons les résultats théoriques par des études de simulations. / This thesis takes place in the density estimation setting from a nonparametric and nonasymptotic point of view. It addresses the statistical algorithm selection problem, which generalizes, among others, the problems of model selection and bandwidth selection. We study classical procedures, such as penalization or resampling procedures (in particular V-fold cross-validation), which evaluate an algorithm by estimating its risk. We provide, thanks to concentration inequalities, an optimal penalty for selecting a linear estimator and we prove oracle inequalities and adaptive properties for resampling procedures. Moreover, a new resampling procedure, based on comparing estimators by means of robust tests, is introduced as an alternative to procedures relying on the unbiased risk estimation principle. A second goal of this work is to compare these procedures from a theoretical point of view and to understand the role of V for V-fold penalization. We validate these theoretical results with simulation studies.
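As a small illustration of V-fold cross-validation for selecting a kernel density estimator, the sketch below tunes the bandwidth of a Gaussian KDE by 5-fold likelihood cross-validation; scikit-learn is an assumption, and this does not reproduce the penalized or robust-test procedures studied in the thesis.

```python
# V-fold (V = 5) cross-validated bandwidth selection for a Gaussian kernel density.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 300)])[:, None]

grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.linspace(0.05, 1.0, 20)},
                    cv=5)                     # scores folds by held-out log-likelihood
grid.fit(x)
print("V-fold selected bandwidth:", grid.best_params_["bandwidth"])
```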
146

Improving Efficiency of Prevention in Telemedicine / Zlepšování učinnosti prevence v telemedicíně

Nálevka, Petr January 2010 (has links)
This thesis employs data-mining techniques and modern information and communication technology to develop methods which may improve the efficiency of prevention-oriented telemedical programs. In particular, it uses the ITAREPS program as a case study and demonstrates that an extension of the program based on the proposed methods may significantly improve the program's efficiency. ITAREPS itself is a state-of-the-art telemedical program that has been operating since 2006. It has been deployed in 8 different countries around the world, and in the Czech Republic alone it has helped prevent schizophrenic relapse in over 400 participating patients. The outcomes of this thesis are widely applicable not just to schizophrenic patients but also to other psychotic or non-psychotic diseases which follow a relapsing course and satisfy certain preconditions defined in this thesis. Two main areas of improvement are proposed. First, this thesis studies various temporal data-mining methods to improve relapse prediction efficiency based on the diagnostic data history. Second, the latest telecommunication technologies are used in order to improve the quality of the gathered diagnostic data directly at the source.
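A hedged illustration of the temporal idea in the first area of improvement: sliding-window features computed from a patient's score history feed a classifier whose performance is cross-validated. The score series, window lengths, and model below are invented for the sketch and are not ITAREPS data or methods.

```python
# Temporal features (level, trend, long-run mean) from a synthetic symptom-score history.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_patients, weeks = 300, 12
base = rng.gamma(shape=3.0, scale=1.0, size=n_patients)        # patient-specific severity
scores = rng.poisson(lam=base[:, None], size=(n_patients, weeks))
relapse = (scores[:, -4:].mean(axis=1) > 4).astype(int)        # synthetic outcome window

X = np.column_stack([
    scores[:, -5],                     # most recent score before the outcome window
    scores[:, -5] - scores[:, -9],     # four-week change (trend)
    scores[:, :-4].mean(axis=1),       # long-run mean of the history
])
auc = cross_val_score(LogisticRegression(), X, relapse, cv=5, scoring="roc_auc")
print("cross-validated AUC per fold:", auc.round(3))
```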
147

臺灣地區的人口推估研究 / The study of population projection: a case study in Taiwan area

黃意萍 Unknown Date (has links)
台灣地區的人口隨著生育率及死亡率的雙重下降而呈現快速老化，其中生育率的降低影響尤為顯著。民國50年時，台灣平均每位婦女生育5.58個小孩，到了民國70年卻只生育1.67個小孩，去年(民國90年)生育率更創歷年新低，只有1.4。死亡率的下降可由平均壽命的延長看出，民國75年時男性為70.97歲，女性為75.88歲；到了民國90年，男性延長到72.75歲，女性延長到78.49歲。由於生育率的變化幅度高於死亡率，對人口結構的影響較大，因此本文分成兩個部份，主要在研究台灣地區15至49歲婦女生育率的變化趨勢，再將研究結果用於台灣地區未來人口總數及其結構的預測。   本研究第一部分是生育率的研究，引進Gamma函數、Gompertz函數、Lee-Carter法三種模型及單一年齡組個別估計法，以民國40年至84年(西元1951年至1995年)的資料為基礎，民國85年至89年(西元1996年至2000年)資料為檢測樣本，比較模型的優劣，尋求較適合台灣地區生育率的模型，再以最合適的模型預測民國91年至140年(西元2002年至2051年)的生育率。第二部分是人口推估，採用人口變動要素合成方法(Cohort Component Projection Method)推估台灣地區未來50年的人口總數及其結構，其中生育率採用上述最適合台灣地區的模型、死亡率則引進國外知名的Lee-Carter法及SOA法(Society of Actuaries)，探討人口結構，並與人力規劃處的結果比較之。 / Both the fertility rate and the mortality rate have been decreasing dramatically in recent years. As a result, population aging has become one of the major concerns in the Taiwan area, and the proportion of the elderly (age 65 and over) increased rapidly from 2.6% in 1965 to 8.8% in 2001. The decrease in the fertility rate is especially significant. For example, the total fertility rate was 5.58 in 1961 and then decreased dramatically to 1.67 in 1981 (1.4 in 2001), a reduction of almost 70% within 20 years.   The goal of this paper is to study population aging in the Taiwan area, in particular the fertility pattern. The first part of this paper explores fertility models and decides which model is the most suitable based on age-specific fertility rates in Taiwan. The models considered are the Gamma function, the Gompertz function, the Lee-Carter method and individual age-group estimation. We use the data from 1951 to 1995 as pilot data and 1996 to 2000 as test data to judge which model fits best. The second part of this study projects the Taiwan population for the next 50 years, i.e. 2002-2051. The projection method used is the Cohort Component Projection method, assuming the population in the Taiwan area is closed. We also compare our projection results to those of the Council for Economic Planning and Development, Executive Yuan of the Republic of China.
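A minimal sketch of one of the candidate models named above, a Lee-Carter style fit log m(x,t) = a(x) + b(x)k(t) estimated by SVD; the rates below are synthetic, not the Taiwan fertility data, and the usual second-stage re-estimation of k(t) is omitted.

```python
# Lee-Carter style decomposition of a synthetic log-rate surface (ages x years).
import numpy as np

rng = np.random.default_rng(5)
ages = np.arange(15, 50)                       # reproductive ages 15-49
years = np.arange(1951, 1996)
shape = np.exp(-((ages - 28) / 6.0) ** 2)      # bell-shaped age pattern of fertility
trend = np.linspace(0.5, -1.5, years.size)     # declining fertility over time
log_rates = (np.log(0.2 * shape + 1e-4)[:, None]
             + 0.4 * np.outer(shape, trend)
             + rng.normal(scale=0.02, size=(ages.size, years.size)))

a_x = log_rates.mean(axis=1)                                   # average log age pattern
U, s, Vt = np.linalg.svd(log_rates - a_x[:, None], full_matrices=False)
b_x = U[:, 0] / U[:, 0].sum()                                  # normalised so sum(b_x) = 1
k_t = s[0] * Vt[0] * U[:, 0].sum()                             # time-varying index
fitted_log_rates = a_x[:, None] + np.outer(b_x, k_t)           # rank-1 Lee-Carter fit
print("variance explained by first component:",
      round(float(s[0] ** 2 / (s ** 2).sum()), 3))
```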
148

以部分法修正地理加權迴歸 / A conditional modification to geographically weighted regression

梁穎誼, Leong, Yin Yee Unknown Date (has links)
在二十世紀九十年代,學者提出地理加權迴歸(Geographically Weighted Regression;簡稱GWR)。GWR是一個企圖解決空間非穩定性的方法。此方法最大的特性,是模型中的迴歸係數可以依空間的不同而改變,這也意味著不同的地理位置可以有不同的迴歸係數。在係數的估計上,每個觀察值都擁有一個固定環寬,而估計值可以由環寬範圍內的觀察值取得。然而,若變數之間的特性不同,固定環寬的設定可能會產生不可靠的估計值。 為了解決這個問題,本文章提出CGWR(Conditional-based GWR)的方法嘗試修正估計值,允許各迴歸變數有不同的環寬。在估計的程序中,CGWR運用疊代法與交叉驗證法得出最終的估計值。本文驗證了CGWR的收斂性,也同時透過電腦模擬比較GWR, CGWR與local linear法(Wang and Mei, 2008)的表現。研究發現,當迴歸係數之間存有正相關時,CGWR比其他兩個方法來的優異。最後,本文使用CGWR分析台灣高齡老人失能資料,驗證CGWR的效果。 / Geographically weighted regression (GWR), first proposed in the 1990s, is a modelling technique used to deal with spatial non-stationarity. The main characteristic of GWR is that it allows regression coefficients to vary across space, and so the values of the parameters can vary depending on locations. The parameters for each location can be estimated by observations within a fixed range (or bandwidth). However, if the parameters differ considerably, the fixed bandwidth may produce unreliable or even unstable estimates. To deal with the estimation of greatly varying parameter values, we propose Conditional-based GWR (CGWR), where a different bandwidth is selected for each independent variable. The bandwidths for the independent variables are derived via an iteration algorithm using cross-validation. In addition to showing the convergence of the algorithm, we also use computer simulation to compare the proposed method with the basic GWR and a local linear method (Wang and Mei, 2008). We found that the CGWR outperforms the other two methods if the parameters are positively correlated. In addition, we use elderly disability data from Taiwan to demonstrate the proposed method.
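A minimal sketch of the basic GWR estimate at a single location, i.e. weighted least squares with Gaussian kernel weights and one shared bandwidth; the CGWR extension proposed above would instead iterate a cross-validated bandwidth search per covariate. All data below are synthetic.

```python
# Local weighted least squares at one site with a Gaussian distance kernel.
import numpy as np

rng = np.random.default_rng(6)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))                 # spatial locations
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.column_stack([coords[:, 0] * 0.1,              # spatially varying coefficients
                        np.sin(coords[:, 1]),
                        np.full(n, 0.5)])
y = (X * beta).sum(axis=1) + rng.normal(scale=0.1, size=n)

def gwr_coefficients(site, bandwidth):
    d2 = ((coords - site) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))               # Gaussian kernel weights
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)     # local WLS estimate

print("local coefficients at (5, 5):",
      gwr_coefficients(np.array([5.0, 5.0]), 1.0).round(2))
```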
149

High angular resolution diffusion-weighted magnetic resonance imaging: adaptive smoothing and applications

Metwalli, Nader 07 July 2010 (has links)
Diffusion-weighted magnetic resonance imaging (MRI) has allowed unprecedented non-invasive mapping of brain neural connectivity in vivo by means of fiber tractography applications. Fiber tractography has emerged as a useful tool for mapping brain white matter connectivity prior to surgery or in an intraoperative setting. The advent of high angular resolution diffusion-weighted imaging (HARDI) techniques in MRI for fiber tractography has allowed mapping of fiber tracts in areas of complex white matter fiber crossings. Raw HARDI images, as a result of elevated diffusion-weighting, suffer from depressed signal-to-noise ratio (SNR) levels. The accuracy of fiber tractography is dependent on the performance of the various methods extracting dominant fiber orientations from the HARDI-measured noisy diffusivity profiles. These methods will be sensitive to and directly affected by the noise. In the first part of the thesis this issue is addressed by applying an objective and adaptive smoothing to the noisy HARDI data via generalized cross-validation (GCV) by means of the smoothing splines on the sphere method for estimating the smooth diffusivity profiles in three dimensional diffusion space. Subsequently, fiber orientation distribution functions (ODFs) that reveal dominant fiber orientations in fiber crossings are then reconstructed from the smoothed diffusivity profiles using the Funk-Radon transform. Previous ODF smoothing techniques have been subjective and non-adaptive to data SNR. The GCV-smoothed ODFs from our method are accurate and are smoothed without external intervention facilitating more precise fiber tractography. Diffusion-weighted MRI studies in amyotrophic lateral sclerosis (ALS) have revealed significant changes in diffusion parameters in ALS patient brains. With the need for early detection of possibly discrete upper motor neuron (UMN) degeneration signs in patients with early ALS, a HARDI study is applied in order to investigate diffusion-sensitive changes reflected in the diffusion tensor imaging (DTI) measures axial and radial diffusivity as well as the more commonly used measures fractional anisotropy (FA) and mean diffusivity (MD). The hypothesis is that there would be added utility in considering axial and radial diffusivities which directly reflect changes in the diffusion tensors in addition to FA and MD to aid in revealing neurodegenerative changes in ALS. In addition, applying adaptive smoothing via GCV to the HARDI data further facilitates the application of fiber tractography by automatically eliminating spurious noisy peaks in reconstructed ODFs that would mislead fiber tracking.
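A hedged, one-dimensional illustration of the GCV criterion used above: for a linear smoother with hat matrix S(lambda), GCV(lambda) = n·RSS/(n − tr S)², minimized over lambda. The smoother here is a simple Gaussian kernel ridge on a 1-D signal, not the spherical smoothing splines applied to the HARDI diffusivity profiles.

```python
# Generalized cross-validation for choosing the smoothing parameter of a linear smoother.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 120)
y = np.sin(4 * np.pi * x) + rng.normal(scale=0.3, size=x.size)   # noisy signal

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05 ** 2))    # Gaussian Gram matrix

def gcv_score(lam):
    S = K @ np.linalg.inv(K + lam * np.eye(x.size))              # hat matrix of the smoother
    resid = y - S @ y
    return x.size * (resid @ resid) / (x.size - np.trace(S)) ** 2

lams = 10.0 ** np.linspace(-4, 2, 30)
best = min(lams, key=gcv_score)
print("GCV-selected lambda:", best)
```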
150

Three Essays on Application of Semiparametric Regression: Partially Linear Mixed Effects Model and Index Model / Drei Aufsätze über Anwendung der Semiparametrischen Regression: Teilweise Lineares Gemischtes Modell und Index Modell

Ohinata, Ren 03 May 2012 (has links)
No description available.
