101

Statistical models for an MTPL portfolio

Pirozhkova, Daria January 2017 (has links)
In this thesis, we consider several statistical techniques applicable to claim frequency models of an MTPL portfolio, with a focus on overdispersion. The practical part of the work focuses on applying the models to real data represented by an MTPL portfolio and comparing them through goodness-of-fit measures. Furthermore, the predictive power of selected models is tested on the given dataset using simulation. The thesis thus combines an analysis of goodness-of-fit results with an assessment of the models' predictive power.
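The portfolio data are not public, but the core comparison the abstract describes, a Poisson claim-frequency model against an overdispersion-aware alternative, can be sketched on synthetic counts. The rating factors, the negative binomial stand-in, and all parameter values below are illustrative assumptions, not the thesis's models.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic claim counts with overdispersion (variance > mean),
# standing in for the non-public MTPL portfolio data.
rng = np.random.default_rng(42)
n = 5000
X = sm.add_constant(rng.normal(size=(n, 2)))          # two hypothetical rating factors
mu = np.exp(X @ np.array([-2.0, 0.3, -0.2]))          # expected claim frequency
y = rng.negative_binomial(n=2.0, p=2.0 / (2.0 + mu))  # overdispersed counts, mean mu

# Poisson GLM assumes variance == mean.
poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Negative binomial allows variance = mu + alpha * mu^2.
negbin = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()

# Compare goodness of fit: lower AIC / deviance indicates a better fit.
for name, model in [("Poisson", poisson), ("NegBin", negbin)]:
    print(f"{name}: AIC={model.aic:.1f}, deviance={model.deviance:.1f}")
```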
102

Approach to build realistic models for estimating project effort/cost in an uncertain environment: application to the software development field

Laqrichi, Safae 17 December 2015 (has links)
Software effort estimation is one of the most important tasks in software project management. It is the basis for planning, control, and decision making. Producing reliable estimates in the upstream phases of a project is a complex and difficult activity because of, among other things, a lack of information about the project and its future, rapid changes in the methods and technologies of the software field, and a lack of experience with similar projects. Many estimation models exist, but it is difficult to identify a model that performs well for all types of projects and is applicable to all companies (with their different levels of experience, mastered technologies, and project management practices). Overall, these models all make the strong assumptions that (1) the collected data are complete and sufficient, (2) the laws linking the parameters that characterize projects are fully identifiable, and (3) the information about the new project is certain and deterministic.
However, in practice these assumptions are difficult to guarantee. Two questions then emerge from these observations: how should an estimation model be selected for a specific company? And how should an estimate be produced for a new project subject to uncertainty? This thesis addresses these questions by proposing a general estimation approach covering two phases: a construction phase, in which the estimation system is built, and a usage phase, in which the system estimates new projects. The construction phase consists of three processes: 1) reliable evaluation and comparison of different estimation models and selection of the most suitable one, 2) construction of a realistic estimation system from the selected estimation model, and 3) use of the estimation system to estimate the effort of new projects characterized by uncertainty. The approach serves as a decision-support tool that helps project managers produce realistic estimates of the effort, cost, and duration of their software projects. The implementation of the processes and practices developed in this work has given rise to an open-source software prototype. The results of this thesis fall within the scope of the ProjEstimate FUI13 project.
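The first process, reliable comparison of candidate estimation models on a company's own project history, can be sketched with cross-validation so that every candidate is scored on the same held-out projects. The candidate models, features, and error measure below are illustrative assumptions, not the thesis's system.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Stand-in for a company's historical project base (features such as
# size or team experience); the real data are company-specific.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate estimation models, compared on the same folds so the
# selection step is a fair, like-for-like comparison.
candidates = {
    "linear regression": LinearRegression(),
    "neural network": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                   random_state=0),
}
for name, model in candidates.items():
    # MMRE-style measures are common in effort estimation; mean absolute
    # error is used here as a simple, robust proxy.
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.1f} (+/- {scores.std():.1f})")
```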
103

Beam position diagnostics with higher order modes in third harmonic superconducting accelerating cavities

Zhang, Pei January 2013 (has links)
Higher order modes (HOMs) are electromagnetic resonant fields. They can be excited by an electron beam entering an accelerating cavity and constitute a component of the wakefield. This wakefield has the potential to dilute the beam quality and, in the worst case, result in a beam break-up instability. It is therefore important to ensure that these fields are well suppressed by extracting energy through special couplers. In addition, the effect of the transverse wakefield can be reduced by aligning the beam on the cavity axis, because its strength depends on the transverse offset of the excitation beam. For suitably small offsets the dominant components of the transverse wakefield are dipole modes, with a linear dependence on the transverse offset of the excitation bunch. This enables the transverse beam position inside the cavity to be determined by measuring the dipole modes extracted from the couplers, similar to a cavity beam position monitor (BPM) but requiring no additional vacuum instrumentation.

At the FLASH facility at DESY, 1.3 GHz (known as TESLA) and 3.9 GHz (third harmonic) cavities are installed. Wakefields in the 3.9 GHz cavities are significantly larger than in the 1.3 GHz cavities, so it is particularly important to mitigate the adverse effects of HOMs on the beam by aligning the beam on the electric axis of the cavities. This alignment requires accurate beam position diagnostics inside the 3.9 GHz cavities, and it is this aspect on which this thesis focuses. Although the principle of beam diagnostics with HOMs has been demonstrated on 1.3 GHz cavities, its realization in 3.9 GHz cavities is considerably more challenging, owing to the dense HOM spectrum and the relatively strong coupling of most HOMs among the four cavities of the third harmonic cryo-module.

A comprehensive series of simulations and HOM spectrum measurements was performed in order to study the modal band structure of the 3.9 GHz cavities. The dependencies of various dipole modes on the offset of the excitation beam were subsequently studied using a spectrum analyzer. Several data analysis methods were used: modal identification, direct linear regression, singular value decomposition, and k-means clustering. These studies led to three modal options promising for beam position diagnostics, for which a set of test electronics was built. Experiments with these electronics suggest a resolution of 50 microns in predicting the local beam position in a cavity and a global resolution of 20 microns over the complete module. This constitutes the first demonstration of HOM-based beam diagnostics in a third harmonic 3.9 GHz superconducting cavity module. These studies have finalized the design of the online HOM-BPM for the 3.9 GHz cavities at FLASH.
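As a toy illustration of the regression-plus-SVD analysis the abstract mentions, the sketch below recovers transverse beam offsets from simulated dipole-mode waveforms. The linear coupling matrix, noise level, and number of retained SVD components are invented for the sketch; real HOM spectra are far denser, and couplers, calibration, and electronics are not modelled.

```python
import numpy as np

# Toy model of HOM-based position diagnostics: dipole-mode amplitudes
# depend linearly on the transverse bunch offset, so offsets can be
# recovered by linear regression after an SVD reduction of the spectra.
rng = np.random.default_rng(1)
n_pulses, n_samples = 200, 40
xy = rng.uniform(-1.0, 1.0, size=(n_pulses, 2))       # known offsets (mm)
coupling = rng.normal(size=(2, n_samples))            # unknown mode response
waveforms = xy @ coupling + 0.05 * rng.normal(size=(n_pulses, n_samples))

# Reduce the waveforms to a few dominant components with SVD.
U, s, Vt = np.linalg.svd(waveforms, full_matrices=False)
features = U[:, :4] * s[:4]                           # leading SVD amplitudes

# Calibrate: least-squares fit from SVD amplitudes to beam position.
A = np.hstack([features, np.ones((n_pulses, 1))])
coef, *_ = np.linalg.lstsq(A, xy, rcond=None)

# Resolution estimate: rms residual of the predicted positions.
residual = xy - A @ coef
print("rms residual (mm):", residual.std(axis=0))
```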
104

Extending covariance structure analysis for multivariate and functional data

Sheppard, Therese January 2010 (has links)
For multivariate data, when testing homogeneity of covariance matrices arising from two or more groups, Bartlett's (1937) modified likelihood ratio test statistic is appropriate under the null hypothesis of equal covariance matrices, where the null distribution of the test statistic is based on the restrictive assumption of normality. Zhang and Boos (1992) provide a pooled bootstrap approach for when the data cannot be assumed to be normally distributed. We give three alternative bootstrap techniques for testing homogeneity of covariance matrices when it is inappropriate to pool the data into one single population, as in the pooled bootstrap procedure, and when the data are not normally distributed. We further show that our alternative bootstrap methodology can be extended to testing Flury's (1988) hierarchy of covariance structure models. Where deviations from normality exist, we show by simulation that the normal-theory log-likelihood ratio test statistic is less viable than our bootstrap methodology.

For functional data, Ramsay and Silverman (2005) and Lee et al. (2002) together provide four computational techniques for functional principal component analysis (PCA) followed by covariance structure estimation. When individual profiles are smoothed using least squares cubic B-splines or regression splines, we find that the ensuing covariance matrix estimate suffers from loss of dimensionality. We show that ridge regression can resolve this problem, but only for the discretisation and numerical quadrature approaches to estimation, and that the choice of a suitable ridge parameter is not arbitrary. We further show the unsuitability of regression splines when deciding on the optimal degree of smoothing to apply to individual profiles. To gain insight into smoothing parameter choice for functional data, we compare kernel and spline approaches to smoothing individual profiles in a nonparametric regression context. Our simulation results justify a kernel approach using a new criterion based on predicted squared error. We also show by simulation that, when taking account of correlation, a kernel approach using a generalized cross-validatory type criterion performs well. These data-based methods for selecting the smoothing parameter are illustrated prior to a functional PCA on a real data set.
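The thesis's three alternative bootstraps are not detailed in the abstract, but the baseline they improve on, the pooled bootstrap of Zhang and Boos (1992) applied to a Box's M-type statistic, can be sketched as follows. The group sizes, dimension, and number of resamples are illustrative assumptions.

```python
import numpy as np

def log_m_statistic(groups):
    """Box's M-type statistic comparing group covariances to the pooled one."""
    k = len(groups)
    ns = np.array([len(g) for g in groups])
    covs = [np.cov(g, rowvar=False) for g in groups]
    pooled = sum((n - 1) * c for n, c in zip(ns, covs)) / (ns.sum() - k)
    stat = (ns.sum() - k) * np.linalg.slogdet(pooled)[1]
    for n, c in zip(ns, covs):
        stat -= (n - 1) * np.linalg.slogdet(c)[1]
    return stat

rng = np.random.default_rng(0)
g1 = rng.multivariate_normal(np.zeros(3), np.eye(3), size=60)
g2 = rng.multivariate_normal(np.zeros(3), 1.5 * np.eye(3), size=80)
observed = log_m_statistic([g1, g2])

# Pooled bootstrap: resample from the combined, centred sample so the
# null hypothesis of equal covariances holds by construction.
combined = np.vstack([g1 - g1.mean(0), g2 - g2.mean(0)])
boot = []
for _ in range(999):
    draw = rng.choice(len(combined), size=len(combined), replace=True)
    boot.append(log_m_statistic([combined[draw[:60]], combined[draw[60:]]]))
p_value = np.mean(np.array(boot) >= observed)
print(f"bootstrap p-value: {p_value:.3f}")
```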
105

Targeted learning in Big Data: bridging data-adaptive estimation and statistical inference

Zheng, Wenjing 21 July 2016 (has links)
This dissertation focuses on developing robust semiparametric methods for complex parameters that emerge at the interface of causal inference and biostatistics, with applications to epidemiological and medical research in the era of Big Data. Specifically, we address two statistical challenges that arise in bridging the disconnect between data-adaptive estimation and statistical inference. The first challenge arises in maximizing the information learned from randomized controlled trials (RCTs) through the use of adaptive trial designs. We present a framework to construct and analyze group-sequential, covariate-adjusted, response-adaptive (CARA) RCTs that admits the use of data-adaptive approaches both in constructing the randomization schemes and in estimating the conditional response model. This framework adds to the existing literature on CARA RCTs by allowing flexible options in both their design and analysis and by providing robust effect estimates even under model misspecification. The second challenge arises in obtaining a central limit theorem when data-adaptive estimation is used for the nuisance parameters. We consider as target parameter of interest the marginal risk difference of the outcome under a binary treatment, and propose a cross-validated targeted minimum loss estimator (CV-TMLE), which augments the classical TMLE with a sample-splitting procedure. The proposed CV-TMLE inherits the double robustness and efficiency properties of the classical TMLE, and achieves asymptotic linearity under minimal conditions by avoiding the Donsker class condition.
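CV-TMLE's targeting step is beyond a short sketch, but the sample-splitting idea it builds on can be illustrated with a cross-fitted doubly robust (AIPW) estimator of the same marginal risk difference; this is a close relative of CV-TMLE, not the thesis's estimator. The data-generating process and the random forest nuisance estimators below are assumptions made for the sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Cross-fitted doubly robust (AIPW) estimation of a marginal risk
# difference; nuisances are fit on one split and evaluated on another,
# which is the sample-splitting device CV-TMLE also relies on.
rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 3))                            # covariates
A = rng.binomial(1, 1 / (1 + np.exp(-W[:, 0])))        # binary treatment
Y = rng.binomial(1, 1 / (1 + np.exp(-(A + W[:, 1]))))  # binary outcome

psi_terms = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(W):
    # Nuisance 1: outcome regression Q(a, w), fit on the training split.
    Q = RandomForestClassifier(random_state=0).fit(
        np.column_stack([A[train], W[train]]), Y[train])
    q1 = Q.predict_proba(np.column_stack([np.ones(len(test)), W[test]]))[:, 1]
    q0 = Q.predict_proba(np.column_stack([np.zeros(len(test)), W[test]]))[:, 1]
    # Nuisance 2: propensity score g(w), truncated away from 0 and 1.
    g = RandomForestClassifier(random_state=0).fit(W[train], A[train])
    pa = np.clip(g.predict_proba(W[test])[:, 1], 0.025, 0.975)
    # Efficient influence function terms, evaluated on the held-out split.
    a, y = A[test], Y[test]
    psi_terms[test] = (q1 - q0
                       + a / pa * (y - q1)
                       - (1 - a) / (1 - pa) * (y - q0))

est = psi_terms.mean()
se = psi_terms.std(ddof=1) / np.sqrt(n)
print(f"risk difference: {est:.3f} +/- {1.96 * se:.3f}")
```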
106

Credit Scoring using Machine Learning Approaches

Chitambira, Bornvalue January 2022 (has links)
This project explores machine learning approaches used in credit scoring. In this study we consider consumer credit scoring rather than corporate credit scoring, and our focus is on methods currently used in practice by banks, such as logistic regression and decision trees; we also compare their performance against machine learning approaches such as support vector machines (SVM), neural networks, and random forests. In our models we address important issues such as dataset imbalance, model overfitting, and calibration of model probabilities. The six machine learning methods we study are support vector machines, logistic regression, k-nearest neighbours, artificial neural networks, decision trees, and random forests. We implement these models in Python and analyse their performance on a credit dataset with 30,000 observations from Taiwan, extracted from the University of California Irvine (UCI) machine learning repository.
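As a sketch of two of the issues the abstract names, class imbalance and probability calibration, the snippet below compares a class-weighted logistic regression with a calibrated random forest. Synthetic data with a similar imbalance stand in for the UCI Taiwan dataset, and the feature count and model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss

# Synthetic stand-in for the 30,000-client Taiwan credit dataset, with
# defaults as the minority class.
X, y = make_classification(n_samples=30000, n_features=23, weights=[0.78],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000,
                                              class_weight="balanced"),
    # Calibrate the forest so its scores behave like default probabilities.
    "calibrated random forest": CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=0), cv=3),
}
for name, model in models.items():
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    # AUC measures ranking quality; the Brier score measures calibration.
    print(f"{name}: AUC={roc_auc_score(y_te, p):.3f}, "
          f"Brier={brier_score_loss(y_te, p):.3f}")
```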
107

Prediction of Credit Risk using Machine Learning Models

Isaac, Philip January 2022 (has links)
This thesis investigates different machine learning (ML) models and their performance in order to find the best-performing model for predicting credit risk at a specific company. Since granting credit to corporate customers is part of this company's core business, managing credit risk is of high importance. As of today, the company has only one credit risk measurement, obtained through an external company, and the goal is to find a model that outperforms this measurement. The study considers two ML models, logistic regression (LR) and eXtreme Gradient Boosting. The thesis shows that both methods perform better than the external risk measurement, with LR achieving the best overall performance. One of the most important analyses in this thesis was handling the dataset and finding the best-suited combination of features for the ML models.
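A minimal sketch of the LR-versus-gradient-boosting comparison, assuming the xgboost package and synthetic data in place of the company's confidential customer records; ROC AUC is used because plain accuracy is misleading on a skewed default/non-default split. All sizes and hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Synthetic corporate-credit stand-in; defaults are the minority class.
X, y = make_classification(n_samples=5000, n_features=15, weights=[0.9],
                           random_state=1)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=3,
                             eval_metric="logloss", random_state=1),
}
# Score both candidates on identical folds for a fair comparison.
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {auc.mean():.3f} (+/- {auc.std():.3f})")
```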
108

Classifying human activities through machine learning

Lannge, Jakob, Majed, Ali January 2018 (has links)
Classifying activities of daily living (ADL) can be used in systems that monitor people's activities for different purposes, for example in emergency systems. Machine learning offers a way to classify ADL with high accuracy, using wearable sensors as input. In this paper, a proof-of-concept system consisting of three different machine learning algorithms is evaluated and compared across three different datasets: one publicly available (Ugulino, et al., 2012) and two collected for this paper using an Android device's accelerometer and gyroscope sensors. The algorithms are Multiclass Decision Forest, Multiclass Decision Jungle, and Multiclass Neural Network. The result shows how a system can be implemented using Azure Machine Learning Studio and how the three algorithms perform when classifying the three datasets. One algorithm achieves a higher accuracy on the Ugulino dataset than the machine learning model originally used with it.
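The pipeline described, windowed accelerometer/gyroscope signals reduced to summary features and then classified, can be sketched as below, with a scikit-learn random forest standing in for Azure Machine Learning Studio's Multiclass Decision Forest and synthetic signals standing in for the Ugulino and phone-collected datasets. The activities, sampling rate, and feature set are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic sensor windows: each activity produces a characteristic
# oscillation frequency plus noise, loosely mimicking body motion.
rng = np.random.default_rng(0)
activities = ["sitting", "walking", "standing"]
windows, labels = [], []
for label, freq in zip(activities, [0.0, 2.0, 0.2]):
    for _ in range(200):
        t = np.linspace(0, 2, 100)                    # 2 s window at 50 Hz
        signal = np.sin(2 * np.pi * freq * t) + 0.3 * rng.normal(size=100)
        # Typical per-window features: mean, std, min, max.
        windows.append([signal.mean(), signal.std(),
                        signal.min(), signal.max()])
        labels.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(windows), labels,
                                          stratify=labels, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```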
109

Performance evaluation of security mechanisms in Cloud Networks

Kannan, Anand January 2012 (has links)
Infrastructure as a Service (IaaS) is a cloud service provisioning model which largely focuses on data centre provisioning of computing and storage facilities. The networking aspects of IaaS beyond the data centre are a limiting factor preventing communication services that are sensitive to network characteristics from adopting this approach. Cloud networking is a new technology which integrates network provisioning with the existing cloud service provisioning models, thereby completing the cloud computing picture by addressing the networking aspects. In cloud networking, shared network resources are virtualized and provisioned to customers and end-users on demand in an elastic fashion. This technology allows various kinds of optimization, e.g., reducing latency and network load, and it allows service providers to offer network performance guarantees as part of their service offering. However, this new approach introduces new security challenges, many of which are addressed in the CloNe security architecture.
This thesis presents a set of potential techniques for securing different resources in a cloud network environment that are not addressed in the existing CloNe security architecture. The thesis begins with a holistic view of cloud networking, as described in the Scalable and Adaptive Internet Solutions (SAIL) project, along with its proposed architecture and security goals. This is followed by an overview of the problems that need to be solved and some of the methods that can be applied to solve parts of the overall problem: specifically, a comprehensive, tightly integrated, multi-level security architecture; a key management algorithm to support the access control mechanism; and an intrusion detection mechanism. For each method or set of methods, the respective state of the art is presented, and experiments measuring the performance of these mechanisms are evaluated on a simple cloud network test bed. The proposed key management scheme uses a hierarchical approach that provides fast and secure key updates when member-join and member-leave operations are carried out; experiments show that it enhances security and increases availability and integrity. A newly proposed genetic-algorithm-based feature selection technique is employed for effective feature selection, and a fuzzy SVM is used on the dataset for effective classification. Experiments show that the genetic feature selection algorithm reduces the number of features, and hence the classification time, while improving the detection accuracy of the fuzzy SVM classifier by minimizing the conflicting rules that may confuse the classifier. The main advantages of this intrusion detection system are the reduction in false positives and the increased security.
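A minimal sketch of genetic-algorithm feature selection wrapped around an SVM fitness function, with a plain scikit-learn SVC standing in for the thesis's fuzzy SVM and synthetic data in place of an intrusion detection dataset. Population size, mutation rate, and generation count are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Individuals are boolean feature masks; fitness is the cross-validated
# accuracy of an SVM trained on the selected features.
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=30, n_informative=6,
                           random_state=0)

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

pop = rng.random((20, X.shape[1])) < 0.5              # random initial masks
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]      # keep the fittest half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, X.shape[1])             # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(X.shape[1]) < 0.02        # rare bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print(f"{best.sum()} features kept, CV accuracy = {fitness(best):.3f}")
```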
110

Detecting Lumbar Muscle Fatigue Using Nanocomposite Strain Gauges

Billmire, Darci Ann 26 June 2023 (has links) (PDF)
Introduction: Muscle fatigue can contribute to acute flare-ups of lower back pain, with associated consequences such as pain, disability, lost work time, increased healthcare utilization, and increased opioid use and potential abuse. The SPINE Sense system is a wearable device with 16 high-deflection nanocomposite strain gauge sensors on kinesiology tape, which is adhered to the skin of the lower back. The device is used to correlate lumbar skin strains with the motion of the lumbar vertebrae and to phenotype lumbar spine motion. In this work it was hypothesized that the SPINE Sense device can detect differences in biomechanical movements consequent to muscle fatigue.

A human-subject study was completed in which 30 subjects performed 14 functional movements before and after fatiguing their back muscles through the Biering-Sørensen endurance test, with the SPINE Sense device on their lower back collecting skin strain data. Features extracted from the strain gauge signals were used as inputs to a random forest classification model. The accuracy of the model was assessed under two training/validation conditions, a hold-out method and a leave-one-out method, achieving accuracies of up to 84.22% and 78.37%, respectively.

Additionally, a system usability study was performed by presenting the device to 32 potential users (clinicians and individuals with lower back pain). Participants received a scripted explanation of the use of the device and were then instructed to score it with the validated System Usability Scale; they were also given the opportunity to voice concerns, ask questions, and offer additional feedback about the design and use of the device. The average System Usability Scale score across all participants was 72.03, with suggestions to improve the robustness of the electrical connections and to reduce the profile of the accompanying electronics. This feedback was used to make more robust electrical connections and smaller wires and electronics modules. These improvements were achieved with a two-piece design: one piece contains the sensors on kinesiology tape attached directly to the patient, and the other contains the wires sewn into stretch fabric to create stretchable electronic connections to the device.

It is concluded that a machine-learning model of the data from the SPINE Sense device can classify lumbar motion with sufficient accuracy for clinical utility, and that the device is usable and intuitive to use.
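The two validation conditions the abstract reports can be sketched on synthetic strain features: a pooled hold-out scheme (here, ordinary 5-fold cross-validation, in which windows from one subject can appear on both sides of a split) against leave-one-subject-out, which tests generalization to unseen wearers. Subject counts, feature dimensions, and the size of the simulated fatigue effect are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, LeaveOneGroupOut

# Synthetic stand-in for the 16-sensor SPINE Sense recordings:
# per-subject baselines plus a fatigue shift in the strain features.
rng = np.random.default_rng(0)
n_subjects, reps = 30, 40
subjects = np.repeat(np.arange(n_subjects), reps)
fatigued = np.tile(np.arange(reps) >= reps // 2, n_subjects)
X = (rng.normal(size=(n_subjects, 16))[subjects]
     + 0.8 * fatigued[:, None]
     + rng.normal(size=(len(subjects), 16)))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# Pooled hold-out style evaluation: subjects mix across folds.
holdout = cross_val_score(clf, X, fatigued, cv=5)
# Leave-one-subject-out: each fold holds out one whole subject.
loso = cross_val_score(clf, X, fatigued, groups=subjects,
                       cv=LeaveOneGroupOut())
print(f"hold-out accuracy:     {holdout.mean():.3f}")
print(f"leave-one-subject-out: {loso.mean():.3f}")
```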
