  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Individual Differences in Reading Proficiency: Investigating Influencing Factors and How They Interact

Nisbet, Kelly January 2021 (has links)
This thesis investigates individual differences and their impact on reading proficiency using different measures of proficiency, a variety of data-collection and statistical methods, and different populations. The goal was to examine the impact that individual differences in certain reading-related skills and cognitive abilities have on reading proficiency, and how these differences interact. The three studies that make up this thesis yield several important discoveries and contributions to the field. Chapter 2 introduces ‘ClozApp’, an easy-to-use application for measuring cloze probability, which was created and made publicly available along with a user manual and sample programming code. Chapter 3 contributes a novel statistical method for analyzing variance between populations with different linguistic backgrounds. This method was used to demonstrate how an individual’s linguistic background (i.e., whether they are a first- or second-language speaker of English) affects how individual differences in reading skills influence reading fluency, as indicated by eye movements. This statistical prediction method is open source and was made widely available along with sample data and code. Chapter 4 identifies a new connection between two cognitive factors that are well known in the reading literature: statistical learning and motivation. Using mediation analyses, this project uncovered an interaction between these factors that further highlights the ways they impact reading proficiency.
This thesis demonstrates a comprehensive approach to investigating individual differences in reading proficiency in the following ways: (i) both reading fluency and comprehension were investigated as measures of reading proficiency, (ii) data collection included a variety of reading-related skills, cognitive abilities, and group differences, and (iii) unique statistical analysis methods were utilized to investigate both individual and group differences. This thesis highlights important new discoveries and makes significant lasting contributions to the field of reading research. / Thesis / Doctor of Philosophy (PhD) / This thesis investigates how individual differences influence reading proficiency. Specifically, it asks how the ways in which people differ on certain reading-related skills and cognitive abilities can determine how well they read. Using different measures of proficiency, a variety of data collection and statistical methods, and looking across different populations, the goal of this thesis was to examine the ways in which people differ in these skills and abilities, how these differences interact, and the resulting impact on reading proficiency. This thesis resulted in three significant contributions to the field. First, it made available a new application for collecting data on an important variable in reading research – cloze probability. In addition, it culminated in the development of a novel statistical method that demonstrates how an individual’s linguistic background can influence their reading fluency. Finally, a new connection was found between two important cognitive factors that interact to influence reading comprehension.
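The mediation logic of Chapter 4 can be sketched numerically. The example below is a minimal product-of-coefficients (Baron & Kenny style) mediation sketch in plain Python; the variable names (`stat_learning`, `motivation`, `proficiency`) and the noiseless toy data are hypothetical illustrations, not drawn from the thesis.

```python
# Product-of-coefficients mediation sketch (Baron & Kenny style).
# All variable names and data are hypothetical illustrations.

def ols(y, cols):
    """OLS coefficients for y ~ intercept + cols, via normal equations."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for i in range(k):                       # Gaussian elimination w/ pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [u - f * v for u, v in zip(A[r], A[i])]
    beta = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        beta[i] = (A[i][k] - sum(A[i][j] * beta[j]
                                 for j in range(i + 1, k))) / A[i][i]
    return beta

stat_learning = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wiggle = [0.5, -0.3, 0.2, -0.4, 0.1, -0.1]            # breaks collinearity
motivation = [2 * x + w for x, w in zip(stat_learning, wiggle)]
proficiency = [3 * m + x for m, x in zip(motivation, stat_learning)]

a = ols(motivation, [stat_learning])[1]                    # X -> M
b = ols(proficiency, [stat_learning, motivation])[2]       # M -> Y given X
direct = ols(proficiency, [stat_learning, motivation])[1]  # X -> Y given M
total = ols(proficiency, [stat_learning])[1]               # X -> Y
print("indirect a*b =", a * b, " direct =", direct, " total =", total)
```

On linear models the total effect decomposes exactly into direct plus indirect (a × b) effects, which is the identity that mediation analysis exploits.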
72

CONTENT TRADING AND PRIVACY-AWARE PRICING FOR EFFICIENT SPECTRUM UTILIZATION

Alotaibi, Faisal F. January 2019 (has links)
No description available.
73

Multi-class Supervised Classification Techniques for High-dimensional Data: Applications to Vehicle Maintenance at Scania / Övervakade Klassificerings Modeller för Högdimensionell Data och Multipla Klasser: Tillämpningar inom Fordonsunderhåll på Scania

Berlin, Daniel January 2017 (has links)
In vehicle repairs, locating the cause of a fault can be more time-consuming than the repair itself. A systematic way to accurately predict the fault-causing part would therefore be a valuable tool, especially for faults that are difficult to diagnose. This thesis explores the predictive ability of Diagnostic Trouble Codes (DTCs), produced by the electronic systems on Scania vehicles, as indicators of fault-causing parts. The statistical analysis is based on about 18,800 observations of vehicles for which both DTCs and replaced parts could be identified during the period March 2016 to March 2017. Two different approaches to forming classes are evaluated. Many classes had only a few observations and, to give the classifiers a fair chance, observations were omitted from classes based on their frequency in the data. After processing, the resulting data comprised 1,547 observations of 4,168 features, a very high dimensionality that makes it impossible to apply standard methods of large-sample statistical inference. Two supervised statistical learning procedures able to cope with high dimensionality and multiple classes, Support Vector Machines (SVM) and Neural Networks (NN), are applied and evaluated. On data with 1,547 observations of 4,168 features (unique DTCs) and 7 classes, SVM yielded an average prediction accuracy of 79.4%, compared to 75.4% for NN. The conclusion of the analysis is that DTCs hold potential as indicators of fault-causing parts in a predictive model, but the training data needs improvement in order to increase prediction accuracy. Scope for future research to improve and expand the model is provided, along with practical suggestions for exploiting supervised classifiers at Scania.
keywords: Statistical learning, Machine learning, Neural networks, Deep learning, Supervised learning, High dimensionality
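A linear SVM of the kind evaluated here can be trained with a few lines of stochastic subgradient descent on the hinge loss. The sketch below is a binary Pegasos-style trainer on made-up "DTC present/absent" feature vectors; the thesis's actual 7-class, 4,168-feature setup (and its NN competitor) is not reproduced.

```python
# Minimal linear SVM via Pegasos-style stochastic subgradient descent.
# The binary DTC-style feature vectors below are synthetic stand-ins.
import random

random.seed(0)

def train_svm(X, y, lam=0.01, epochs=200):
    """y in {-1,+1}; returns weight vector w (bias omitted for brevity)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in random.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Subgradient step: shrink w; add y_i * x_i if margin < 1
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# Toy data: class +1 tends to fire DTCs 0-1, class -1 fires DTCs 2-3
X = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
y = [+1, +1, +1, +1, -1, -1, -1, -1]

w = train_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else -1 for xi in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print("training accuracy:", accuracy)
```

A multi-class version, as used in the thesis, would typically train one such classifier per class (one-vs-rest) and predict the class with the largest score.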
74

An Efficient Implementation of a Robust Clustering Algorithm

Blostein, Martin January 2016 (has links)
Clustering and classification are fundamental problems in statistical and machine learning, with a broad range of applications. A common approach is the Gaussian mixture model, which assumes that each cluster or class arises from a distinct Gaussian distribution. This thesis studies a robust, high-dimensional extension of the Gaussian mixture model that automatically detects outliers and noise, along with a computationally efficient implementation of it. The contaminated Gaussian distribution is a robust elliptical distribution that allows automatic detection of "bad points", and it is used to robustify the usual factor analysis model. In turn, the mixtures of contaminated Gaussian factor analyzers (MCGFA) algorithm allows high-dimensional, robust clustering, classification, and detection of bad points. A family of MCGFA models is created through the introduction of different constraints on the covariance structure. A new, efficient implementation of the algorithm is presented, along with an account of its development. The fast implementation permits thorough testing of the MCGFA algorithm, and its performance is compared to two natural competitors: parsimonious Gaussian mixture models (PGMM) and mixtures of modified t factor analyzers (MMtFA). The algorithms are tested systematically on simulated and real data. / Thesis / Master of Science (MSc)
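The mixture-model machinery behind MCGFA can be illustrated in one dimension. The EM sketch below fits a plain two-component Gaussian mixture to synthetic data; it deliberately omits the factor-analyzer structure, covariance constraints, and contamination component that the actual MCGFA family adds.

```python
# One-dimensional EM for a two-component Gaussian mixture: a greatly
# simplified stand-in for the mixture fitting inside MCGFA.
import math
import random

random.seed(1)

def normpdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, iters=100):
    mu = [min(data), max(data)]      # crude but effective initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities (posterior component memberships)
        resp = []
        for x in data:
            p = [pi[k] * normpdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: weighted means, variances, and mixing proportions
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)   # guard against variance collapse
            pi[k] = nk / len(data)
    return mu, var, pi

# Two well-separated synthetic clusters
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(10.0, 1.0) for _ in range(200)])
mu, var, pi = em_gmm(data)
print("estimated means:", sorted(mu))
```

In the contaminated-Gaussian setting, each cluster would additionally carry a variance-inflated component with the same mean, and the same E-step responsibilities would flag "bad points" automatically.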
75

Statistical learning and predictive modeling in data mining

Li, Bin 13 September 2006 (has links)
No description available.
76

Role of Majorization in Learning the Kernel within a Gaussian Process Regression Framework

Kapat, Prasenjit 21 October 2011 (has links)
No description available.
77

Foundations of Vocabulary: Does Statistical Segmentation of Events Contribute to Word Learning?

Levine, Dani Fara January 2017 (has links)
This dissertation evaluates the untested assumption that the individuation of events into units matters for word learning, particularly the learning of terms which map onto relational event units (Gentner & Boroditsky, 2001; Maguire et al., 2006). We predicted that 3-year-old children’s statistical action segmentation abilities would relate to their verb comprehension and to their overall vocabulary knowledge (Research Question 1). We also hypothesized that statistical action segmentation would facilitate children’s learning of novel verbs (Research Question 2). Largely confirming our first prediction, children who were better able to statistically segment novel action sequences into reliable units had more sophisticated overall vocabularies and were quicker to select the correct referents of overall vocabulary items and verb vocabulary items; nevertheless, they did not have larger verb vocabularies. Unexpectedly, statistical action segmentation did not facilitate children’s learning of verbs for statistically consistent action units. However, children showed greater learning of verbs labeling statistical action part-units than verbs labeling statistical action non-units, providing some evidence for our second prediction. In sum, this dissertation takes an important step towards understanding how event segmentation may contribute to vocabulary acquisition. / Psychology
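The statistical action segmentation this dissertation measures rests on transitional probabilities: transitions inside a unit are highly predictable, while transitions across unit boundaries are not. A sketch with hypothetical three-element "action units":

```python
# Statistical segmentation via forward transitional probabilities.
# The three-element action units below are hypothetical examples.
import random
from collections import defaultdict

random.seed(0)
units = [("a1", "a2", "a3"), ("b1", "b2", "b3"), ("c1", "c2", "c3")]

# Build a long stream of randomly ordered units (no immediate repeats)
stream, prev = [], None
for _ in range(300):
    u = random.choice([x for x in units if x is not prev])
    stream.extend(u)
    prev = u

# Forward transitional probability P(y | x) from bigram counts
pair, uni = defaultdict(int), defaultdict(int)
for x, y in zip(stream, stream[1:]):
    pair[(x, y)] += 1
    uni[x] += 1
tp = {k: n / uni[k[0]] for k, n in pair.items()}

within = tp[("a1", "a2")]    # inside a unit: perfectly predictable
between = tp[("a3", "b1")]   # across a boundary: ~1/2 with two successors
print("within-unit TP:", within, " boundary TP:", between)
```

A learner (or model) that posits boundaries wherever the transitional probability dips would recover the original units from the continuous stream, which is the ability the dissertation's segmentation task taps.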
78

APPLICATIONS OF STATISTICAL LEARNING ALGORITHMS IN ELECTRON SPECTROSCOPY / TOWARDS CALIBRATION-INVARIANT SPECTROSCOPY USING DEEP LEARNING

Chatzidakis, Michael 06 1900 (has links)
Building on recent advances in computer vision with convolutional neural networks, we have built SpectralNet, a spectroscopy-optimized convolutional neural network architecture capable of classifying spectra despite large translational shifts (i.e., chemical or calibration shifts). Present methods of measuring the local chemical environment of atoms at the nano-scale involve manual feature extraction and dimensionality reduction of the original signal, such as using the peak onset, the ratio of peaks, or the full-width half-maximum of peaks. Convolutional neural networks like SpectralNet automatically find the parts of the spectra (i.e., features) that maximally discriminate between classes, without requiring manual feature extraction. The advantage of such a process is that it removes the bias and qualitative interpretation that manual feature extraction introduces into spectroscopy analysis. Because of this automated feature extraction, the method is also immune to instrument calibration differences, since it performs classification based on the shape of the spectra. Convolutional neural networks are an ideal statistical classifier for spectroscopy data (i.e., one-dimensional series data) because their shared-weight scheme is well suited to identifying local correlations between adjacent dimensions of the data. Over 2000 electron energy loss spectra of three oxidation states of Mn were collected using a scanning transmission electron microscope, and SpectralNet was trained to learn the differences between them. We demonstrate generalizability by training SpectralNet on electron energy loss spectroscopy data from one instrument and testing it, with perfect accuracy, on a variety of reference spectra found in the literature. We also test SpectralNet against a wide variety of high-noise samples that a trained human spectroscopist would find incomprehensible.
We also compare other neural network architectures used in the literature and determine that SpectralNet, a dense-layer-free neural network, is immune to calibration differences, whereas other styles of network are not. / Thesis / Master of Applied Science (MASc) / Spectroscopy is the study of the interaction between photons or electrons and a material, used to determine what that material is made of. One advanced way to make accurate measurements down to the atomic scale is to use high-energy electrons in a transmission electron microscope. Using this instrument, a special type of photograph (a spectrograph, or spectrum) can be taken of the material, detailed enough to identify which kinds of atoms are in the material. Spectrographs are very complicated to interpret, and the human eye struggles to find patterns in noisy and low-resolution data. The resulting spectrograph also changes depending on which instrument it is taken on, which adds extra difficulty. In this study, advanced algorithms are used to identify which types of atoms can be recognized in the noisy signal from the spectrograph, regardless of which instrument is used. These algorithms (convolutional neural networks) are also used in self-driving cars for the similar task of identifying objects; in this study we use them to identify atoms.
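The calibration invariance described above comes from computing a matched-filter (convolutional) response at every offset and keeping only the pooled maximum: shifting the spectrum moves the peak response without changing its value. The toy sketch below is a conceptual illustration of that mechanism, not SpectralNet itself; the "edge" shape and kernel are invented.

```python
# Convolution + global max pooling as a shift-invariant spectral feature.
# Conceptual sketch only; the edge shape and kernel are invented.

def conv_maxpool(spectrum, kernel):
    """Valid cross-correlation followed by global max pooling."""
    n, k = len(spectrum), len(kernel)
    responses = [sum(spectrum[i + j] * kernel[j] for j in range(k))
                 for i in range(n - k + 1)]
    return max(responses)

def spectrum_with_edge(length, onset):
    """Flat baseline with an ionization-edge-like bump starting at `onset`."""
    return [0.0] * onset + [1.0, 2.0, 1.5, 1.0] + [0.5] * (length - onset - 4)

kernel = [1.0, 2.0, 1.5, 1.0]   # matched to the edge shape

f1 = conv_maxpool(spectrum_with_edge(50, 10), kernel)  # edge calibrated at 10
f2 = conv_maxpool(spectrum_with_edge(50, 23), kernel)  # same edge shifted to 23
print(f1, f2)   # identical: the pooled feature ignores the calibration shift
```

A dense layer applied directly to the raw channels would instead tie each weight to an absolute channel position, which is why dense-layer-heavy architectures lose this invariance.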
79

Inteligência estatística na tomada de decisão médica: um estudo de caso em pacientes traumatizados / Statistical intelligence in medical decision making: a case study in traumatized patients

Garcia, Marcelo 22 November 2018 (has links)
The main objective of this study was to use information on occurrences of traumatic brain injury (TBI) to infer or generate findings associated with the patient's risk of severity, and to assist medical decision-making in defining the best prognosis, indicating which measures can be chosen given the severity of the injury suffered by the victim. Initially, descriptive statistics were analyzed for TBI patients at a hospital in the interior of São Paulo state. Fifty patients participated in the study. The results showed that the most frequent cause of trauma is traffic accidents (62%), followed by falls (24%). Trauma is much more frequent in male patients (88%) than in female patients. For modelling, the response variable Abbreviated Injury Scale (AIS) was dichotomized, with 0 (zero) assigned to patients not at risk and 1 (one) to those presenting some type of risk. Statistical learning techniques were then used to compare the performance of the following classifiers: Logistic Regression, a case of the Generalized Linear Model (GLM); Random Forest (RF); Support Vector Machine (SVM); and Naïve Bayes (NB). The best-performing model (RF) combined the indices Accuracy (ACC), Area Under the ROC Curve (AUC), Sensitivity (SEN), Specificity (SPE), and Matthews Correlation Coefficient (MCC), which presented the most favorable results for supporting medical decision-making, making it possible to choose the most appropriate clinical course for traumatized victims in view of the individual's risk to life. With the selected model it was possible to generate a ranking estimating the patient's life-risk probability.
A performance comparison was then made between the RF model (the new classifier) and the Revised Trauma Score (RTS), Injury Severity Score (ISS), and Barthel Index (IB) with respect to the risk classification of the patients.
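Apart from AUC, the indices combined to select the RF model (ACC, SEN, SPE, MCC) are simple functions of a binary confusion matrix. A sketch with hypothetical triage counts (not the study's data):

```python
# ACC, SEN, SPE, and MCC from a binary confusion matrix.
# The counts below are hypothetical, not taken from the study.
import math

def binary_metrics(tp, fp, fn, tn):
    acc = (tp + tn) / (tp + fp + fn + tn)          # accuracy
    sen = tp / (tp + fn)                           # sensitivity (recall)
    spe = tn / (tn + fp)                           # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sen, spe, mcc

# Hypothetical results: 40 at-risk patients, 60 not at risk
acc, sen, spe, mcc = binary_metrics(tp=35, fp=10, fn=5, tn=50)
print(acc, sen, spe, mcc)
```

MCC is often the most informative single index here because, unlike accuracy, it stays low when a classifier exploits class imbalance, which matters when at-risk patients are the minority class.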
80

Evolutionary algorithms in statistical learning : Automating the optimization procedure / Evolutionära algoritmer i statistisk inlärning : Automatisering av optimeringsprocessen

Sjöblom, Niklas January 2019 (has links)
Scania has worked with statistics for a long time but has more recently invested in becoming a data-driven company, and now uses data science in almost all business functions. The algorithms developed by the data scientists need to be optimized to be fully utilized, and traditionally this is a manual and time-consuming process. This thesis investigates whether, and how well, evolutionary algorithms can be used to automate the optimization process. The evaluation was done by implementing and analyzing four variations of genetic algorithms with different levels of complexity and tuning parameters. The algorithm subject to optimization was XGBoost, a gradient-boosted tree model, applied to data that had previously been modelled in a competition. The results show that evolutionary algorithms are applicable to finding good models, but they also emphasize the importance of proper data preparation.
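A genetic algorithm for hyperparameter search follows the select–crossover–mutate loop the thesis evaluates. In the sketch below, a cheap synthetic objective with a known optimum stands in for the XGBoost cross-validation score, so the example runs without any ML stack; the parameter names and ranges are invented for illustration.

```python
# Minimal genetic algorithm for hyperparameter search. A synthetic
# surrogate objective (optimum at lr=0.1, depth=6) replaces the real
# XGBoost cross-validation score used in the thesis.
import random

random.seed(42)

def fitness(lr, depth):
    # Hypothetical surrogate: peaks at learning_rate=0.1, max_depth=6
    return -((lr - 0.1) ** 2 * 100 + (depth - 6) ** 2 * 0.1)

def random_individual():
    return (random.uniform(0.01, 0.5), random.randint(2, 12))

def mutate(ind):
    lr, d = ind
    lr = min(0.5, max(0.01, lr + random.gauss(0, 0.05)))   # perturb lr
    if random.random() < 0.3:                              # occasionally step depth
        d = min(12, max(2, d + random.choice([-1, 1])))
    return (lr, d)

def crossover(a, b):
    # Uniform crossover over the two genes
    return (a[0], b[1]) if random.random() < 0.5 else (b[0], a[1])

pop = [random_individual() for _ in range(20)]
for gen in range(40):
    pop.sort(key=lambda ind: fitness(*ind), reverse=True)
    elite = pop[:10]                                       # truncation selection
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(10)]
    pop = elite + children                                 # elitism keeps the best

best = max(pop, key=lambda ind: fitness(*ind))
print("best (learning_rate, max_depth):", best)
```

In practice each fitness evaluation would be a cross-validated model fit, so the population size and generation count trade search quality against compute budget.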
