401

Analyse statistique de processus stochastiques : application sur des données d’orages / Inference for some stochastic processes : with application on thunderstorm data

Do, Van-Cuong 19 April 2019 (has links)
The work presented in this PhD dissertation concerns the statistical analysis of some particular cases of the Cox process. In the first part, we study the power-law process (PLP). Since the literature on the PLP is abundant, we provide a state-of-the-art review of the process. We consider the classical approach and recall some important properties of the maximum likelihood estimators. We then investigate a Bayesian approach with noninformative priors and conjugate priors, considering different parametrizations and scenarios of prior guesses. This leads us to define a family of distributions, which we name the H-B distribution, as the natural conjugate prior for the PLP. Bayesian analysis with the conjugate priors is conducted via a simulation study and an application on real data. In the second part, we study the exponential-law process (ELP). We review the maximum likelihood techniques. For Bayesian analysis of the ELP, we define conjugate priors: the modified-Gumbel distribution and the Gamma-modified-Gumbel distribution. We conduct a simulation study to compare maximum likelihood estimates and Bayesian estimates. In the third part, we investigate self-exciting point processes, and we integrate a power-law covariate model into the intensity of this process. A maximum likelihood procedure for the model is proposed, and a Bayesian approach is suggested. Lastly, we present an application on thunderstorm data collected in two French regions. We consider a strategy that defines a thunderstorm as a temporal process associated with the charges at a particular location. Some selected thunderstorms are analyzed. We propose a reduced maximum likelihood procedure to estimate the parameters of the Hawkes process. We then fit some thunderstorms to the power-law covariate self-exciting point process, taking into account the associated charges. In conclusion, we give some perspectives for further work.
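As a reading aid (not drawn from the thesis itself): the PLP admits closed-form maximum likelihood estimators for time-truncated data, which the abstract's classical part recalls. Below is a minimal Python sketch of simulating a PLP with intensity λ(t) = (β/θ)(t/θ)^(β−1) and recovering (β, θ) by MLE; all parameter values are illustrative, and the H-B conjugate-prior analysis that is the thesis's contribution is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_plp(beta, theta, t_end):
    """Simulate a power-law process (NHPP with intensity
    (beta/theta) * (t/theta)**(beta - 1)) on [0, t_end] by mapping
    unit-rate Poisson arrivals through the inverse mean function."""
    times, s = [], 0.0
    while True:
        s += rng.exponential(1.0)        # next unit-rate HPP arrival
        t = theta * s ** (1.0 / beta)    # t = Lambda^{-1}(s), Lambda(t) = (t/theta)**beta
        if t > t_end:
            return np.array(times)
        times.append(t)

def plp_mle(times, t_end):
    """Closed-form MLEs for a time-truncated PLP observed on [0, t_end]."""
    n = len(times)
    beta_hat = n / np.sum(np.log(t_end / times))
    theta_hat = t_end / n ** (1.0 / beta_hat)
    return beta_hat, theta_hat

events = simulate_plp(beta=1.8, theta=10.0, t_end=200.0)
print(plp_mle(events, 200.0))            # should land near (1.8, 10.0)
```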
402

Bayesian Approach for Reliable GNSS-based Vehicle Localization in Urban Areas

Obst, Marcus 19 December 2014 (has links)
Nowadays, satellite-based localization is a well-established technical solution that supports several navigation tasks in daily life. Besides its application inside portable devices, satellite-based positioning is used for in-vehicle navigation systems as well. Moreover, due to its global coverage and the availability of inexpensive receiver hardware, it is an appealing technology for numerous applications in the area of Intelligent Transportation Systems (ITSs). However, it has to be admitted that most of the aforementioned examples either rely on modest accuracy requirements or are not sensitive to temporary integrity violations. Although technical concepts of Advanced Driver Assistance Systems (ADASs) based on Global Navigation Satellite Systems (GNSSs) have been successfully demonstrated under open-sky conditions, practice reveals that such systems suffer from degraded satellite signal quality when deployed in urban areas. Thus, the main research objective of this thesis is to provide a reliable vehicle positioning concept which can be used in urban areas without the aforementioned limitations. Therefore, an integrated probabilistic approach which performs fault detection & exclusion, localization, and multi-sensor data fusion within one unified Bayesian framework is proposed. From an algorithmic perspective, the presented concept is based on a probabilistic data association technique with explicit handling of the outlier measurements present in urban areas. With this approach, accuracy, integrity, and availability are improved at the same time; that is, a consistent positioning solution is provided. In addition, a comprehensive and in-depth analysis of typical errors in urban areas within the pseudorange domain is performed. Based on this analysis, probabilistic models are proposed and later used to facilitate the positioning algorithm. Moreover, the presented concept clearly targets mass-market applications based on low-cost receivers and hence aims to replace costly sensors with smart algorithms. The benefits of these theoretical contributions are implemented and demonstrated on the example of a real-time vehicle positioning prototype as used inside the European research project GAlileo Interactive driviNg (GAIN). This work describes all necessary parts of this system, including GNSS signal processing, fault detection, and multi-sensor data fusion within one processing chain. Finally, the performance and benefits of the proposed concept are examined and validated with both simulated and comprehensive real-world sensor data from numerous test drives.
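To illustrate the general idea of probabilistic data association with explicit outlier handling (a sketch under assumed noise parameters, not the thesis's actual algorithm): each pseudorange residual can be scored under a Gaussian-inlier/uniform-outlier mixture, and the resulting inlier probabilities used as weights in a Gauss-Newton position update.

```python
import numpy as np

def inlier_weights(residuals, sigma=3.0, p_out=0.1, outlier_span=300.0):
    """Posterior probability that each pseudorange residual (in metres) is an
    inlier under a Gaussian-inlier / broad-uniform-outlier mixture."""
    gauss = np.exp(-0.5 * (residuals / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    unif = 1.0 / outlier_span                   # broad density for faulty measurements
    num = (1.0 - p_out) * gauss
    return num / (num + p_out * unif)

def weighted_position_step(x, sat_pos, pseudoranges):
    """One Gauss-Newton step for state x = [x, y, z, clock bias] (metres),
    down-weighting measurements that look like outliers."""
    rho = np.linalg.norm(sat_pos - x[:3], axis=1)        # geometric ranges
    residuals = pseudoranges - (rho + x[3])
    w = inlier_weights(residuals)
    H = np.hstack([-(sat_pos - x[:3]) / rho[:, None],    # unit line-of-sight vectors
                   np.ones((len(rho), 1))])              # clock-bias column
    W = np.diag(w)
    dx = np.linalg.solve(H.T @ W @ H, H.T @ W @ residuals)
    return x + dx
```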
403

Categorization of Swedish e-mails using Supervised Machine Learning / Kategorisering av svenska e-postmeddelanden med användning av övervakad maskininlärning

Mann, Anna, Höft, Olivia January 2021 (has links)
Society today is becoming increasingly digitalized, and a common way of communicating is to send e-mails. Currently, the company Auranest has a filtering method for categorizing e-mails, but the method is a few years old. The filter identifies e-mails that are valuable for jobseekers, through which employers can make contact. The company wants to know whether the categorization can be performed with a different method and improved. This degree project aims to investigate whether the categorization can be performed with higher accuracy using machine learning. Three supervised machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and Decision Tree, have been examined, and the algorithm with the best results has been compared with Auranest's existing filter. Accuracy, precision, recall, and F1 score have been used to determine which machine learning algorithm achieved the best results, both among themselves and in comparison with Auranest's filter. The results showed that the supervised machine learning algorithm SVM achieved the best results on all metrics. The comparison between Auranest's existing filter and SVM showed that SVM performed better on all calculated metrics, with an accuracy of 99.5% for SVM versus 93.03% for Auranest's filter. Accuracy was the only metric for which the two approaches received similar results; for the other metrics, there was a noticeable difference.
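A minimal sketch of the kind of evaluation described here, assuming `emails` and `labels` hold the message texts and their categories; the feature extraction (TF-IDF below) is an assumption, as the abstract does not specify how text was vectorized:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# emails and labels are assumed: parallel lists of message texts and categories
X_train, X_test, y_train, y_test = train_test_split(
    emails, labels, test_size=0.2, random_state=0, stratify=labels)

clf = make_pipeline(TfidfVectorizer(), LinearSVC())   # text features + linear SVM
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted")
print(f"accuracy={accuracy_score(y_test, y_pred):.3f} "
      f"precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```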
404

Ärendehantering genom maskininlärning / Case Management through Machine Learning

Bennheden, Daniel January 2023 (has links)
This thesis investigates how artificial intelligence can be used to automatically categorize fault reports that are processed in a case management system, using machine learning and techniques such as text mining. The study is based on Design Science Research Methodology and Peffers' six steps of design methodology, which in addition to the design of an artifact concern requirements and evaluation. The machine learning models were trained on historical data from the case management system Infracontrol Online, using four types of algorithms: Naive Bayes, Support Vector Machine, Neural Network, and Random Forest. A web application was developed to demonstrate how one of the trained machine learning models works and can be used to categorize text. Regular users of the system then had the opportunity to test the performance of the model and evaluate how it works by marking where it categorizes text prompts correctly. The results show that it is possible to solve the task using machine learning. A crucial part of the development was the selection of the data used to train the model. Different customers use the system in different ways, which made it advantageous to separate them and train models for different customers independently; a sketch of this data selection follows below. Another source of inconsistent results is that organizations change their processes, and thus their case management, over time. This issue was addressed by limiting how far back in time the model retrieves training data. These two strategies have the disadvantage that the amount of historical data available for training decreases, but the results do not show any clear disadvantage for the machine learning models trained on smaller data sets: they perform well, and tests show an acceptable level of accuracy for their predictions.
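The two data-selection strategies described above reduce, in a sketch, to a recency filter plus per-customer grouping; the column names, the 18-month window, and `train_model` are all hypothetical stand-ins:

```python
import pandas as pd

# df is assumed to hold one row per historical fault report, with hypothetical
# columns 'customer', 'created_at', 'text', and 'category'; train_model stands
# in for any of the four classifiers compared in the thesis.
cutoff = pd.Timestamp.now() - pd.DateOffset(months=18)   # assumed recency window
recent = df[df["created_at"] >= cutoff]                  # drop stale process data

models = {
    customer: train_model(group["text"], group["category"])
    for customer, group in recent.groupby("customer")    # one model per customer
}
```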
405

Machine Learning Algorithms to Predict Cost Account Codes in an ERP System: An Exploratory Case Study

Wirdemo, Alexander January 2023 (has links)
This study aimed to investigate how machine learning (ML) algorithms can be used to predict the cost account code to be used when handling invoices in an Enterprise Resource Planning (ERP) system commonly found in the Swedish public sector. This meant testing which of the examined algorithms performs best and which criteria need to be met for it to perform at its best. Previous studies on ML and its use in invoice classification have focused on either the accounts payable side or the accounts receivable side of the balance sheet. These studies have used a variety of methods, some involving not only common ML algorithms such as Random Forest, Naïve Bayes, Decision Tree, Support Vector Machine, Logistic Regression, Neural Network, or k-Nearest Neighbor, but also other classifiers such as rule classifiers and naïve classifiers. The general conclusion from previous studies is that several algorithms can classify invoices with a satisfactory accuracy score, and that Random Forest, Naïve Bayes, and Neural Network have shown the most promising results. The study was performed as an exploratory case study. The case company was a small municipality where finance clerks handle received invoices through an ERP system. The accounting step of invoice handling involves selecting the proper cost account code before submitting the invoice for review and approval. The data used were invoice summaries holding the organization number, bankgiro, postgiro, and account code used. The algorithms selected for the task were the supervised learning algorithms Random Forest and Naïve Bayes and the instance-based algorithm k-Nearest Neighbor (k-NN). The findings indicated that ML could be used to predict which cost account code to use by providing a pre-filled suggestion when the clerk opens the invoice. Among the algorithms tested, Random Forest performed best with 78% accuracy (Naïve Bayes and k-NN performed at 69% and 70% accuracy, respectively). One reason for this is Random Forest's ability to handle several input variables, generate an unbiased estimate of the generalization error, and give information about the relationship between the variables and the classification. However, a high level of support is needed for the algorithm to perform at its best; in this case, 335 occurrences serves as a guiding number.
406

Maskininlärning för dokumentklassificering av finansiella dokument med fokus på fakturor / Machine Learning for Document Classification of Financial Documents with Focus on Invoices

Khalid Saeed, Nawar January 2022 (has links)
Automated document classification is an essential technique that aims to process and manage documents in digital form. Many companies strive for a text classification methodology that can solve a plethora of problems. One of these problems is classifying and organizing a massive number of documents based on a set of predefined categories. This thesis aims to help Medius, a company that works with invoice workflows, to classify its documents into invoices and non-invoices. This has been accomplished by implementing and evaluating various machine learning classification methods in terms of their accuracy and efficiency for the task of financial document classification, where only invoices are of interest. Furthermore, the pre-processing steps necessary for achieving good performance are considered when evaluating the mentioned classification methods. In this study, two document representation methods, Term Frequency-Inverse Document Frequency (TF-IDF) and Doc2Vec, were used to represent the documents as fixed-length vectors. The representation aims to reduce the complexity of the documents and make them easier to handle. In addition, three classification methods were used to automate the document classification process for invoices: Logistic Regression, Multinomial Naïve Bayes, and Support Vector Machine. The results of this thesis indicate that all classification methods that used TF-IDF to represent the documents as vectors give high performance and accuracy. The accuracy of all three classification methods is over 90%, which was the prerequisite for the success of this study. Moreover, Logistic Regression appears to cope with this task very easily, since it classifies the documents more efficiently than the other methods. A test on real data flowing into Medius' invoice workflow shows that Logistic Regression is able to correctly classify up to 96% of the documents. In conclusion, Logistic Regression together with TF-IDF is determined to be the most appropriate method overall among those tested. Doc2Vec, by contrast, fails to provide a good result because the data set was not adapted and sufficient for the method to work well.
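The winning combination reported here, TF-IDF features with Logistic Regression, can be sketched in a few lines; `docs_train`, `y_train`, and `docs_new` are assumed inputs, and the vectorizer settings are illustrative rather than the thesis's:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# docs_train / y_train are assumed: extracted document texts and labels
# (1 = invoice, 0 = non-invoice)
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))),
    ("lr", LogisticRegression(max_iter=1000)),
])
pipe.fit(docs_train, y_train)

invoice_prob = pipe.predict_proba(docs_new)[:, 1]   # P(invoice) per new document
```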
407

Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data

Zhong, Jianling January 2015 (has links)
<p>Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape. </p><p>We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations. </p><p>We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites. </p><p>Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets. 
</p><p>This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.</p> / Dissertation
408

Fire size in tunnels

Carvel, Richard Oswald January 2004 (has links)
In recent years, a number of high-profile accidental fires have occurred in road and rail tunnels throughout the world. Many of these fires grew rapidly to catastrophic size and claimed many lives. The processes involved in the rapid growth and extreme severity of these fires are not yet adequately understood. The introduction to this thesis reviews a number of these accidental fires and describes much of the previous experimental research that has brought about the current understanding of tunnel fire behaviour. A detailed review of the relevant parts of elementary fire dynamics is also presented. This thesis addresses two main questions: 1. What is the influence of longitudinal ventilation on fire size in tunnels? and 2. What is the influence of tunnel geometry on fire size? The answers to both questions are determined using a probabilistic method based on Bayes' theorem. This provides a way of answering the above two questions using the handful of experimental data points that are available. It is found that the heat release rate (HRR) of a heavy goods vehicle (HGV) fire may be greatly increased by longitudinal ventilation, for example by about a factor of 5 at a longitudinal ventilation velocity of 3 m/s. It is also found that longitudinal ventilation may cause a significant increase in the HRR of large pool fires, but may cause a decrease in the HRR of small pool fires and car fires. An equation is derived to predict the influence of tunnel geometry on HRR. It is found that HRR varies principally with the width of the tunnel and the width of the fire object. The HRR of a fire in a tunnel may be increased up to four times due to the geometry of the tunnel.
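The probabilistic method can be illustrated with a toy discrete Bayes update; the grid, prior, observations, and noise level below are illustrative assumptions, not the thesis's experimental data:

```python
import numpy as np
from scipy.stats import norm

# Discrete Bayes update for k, the factor by which 3 m/s longitudinal
# ventilation multiplies an HGV fire's HRR.
k_grid = np.linspace(0.5, 10.0, 96)                    # candidate factors
posterior = np.full_like(k_grid, 1.0 / k_grid.size)    # start from a flat prior

observations = [4.2, 5.5, 4.9]                         # hypothetical noisy estimates of k
sigma = 1.0                                            # assumed observation noise

for obs in observations:
    posterior *= norm.pdf(obs, loc=k_grid, scale=sigma)   # Bayes' theorem, unnormalized
    posterior /= posterior.sum()

print(k_grid[posterior.argmax()])                      # posterior mode, near 5
```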
409

Bimodal adaptive hypermedia and interactive multimedia: a web-based learning environment based on Kolb's theory of learning style

Salehian, Bahram January 2003 (has links)
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
410

Stochastické modely tvorby škodních rezerv / Stochastic Loss Reserving Models

Košová, Nataša January 2012 (has links)
In this thesis we study and describe a stochastic loss reserving model for individual insurers. Specifically, the model is based on the following three features. The modelling of expected claims depends on unknown parameters whose estimates need to be as accurate as possible. Aggregated incurred and paid losses for particular years are modelled by a collective risk model. The final reserve is estimated by a Bayesian methodology that uses prior information from a significant number of insurers. Part of the thesis is also an implementation of a program that calculates reserves using our model, together with its testing on simulated data.
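A minimal sketch of the model's two probabilistic ingredients, with all numbers illustrative: a collective risk model (Poisson claim counts with Gamma severities) simulated by Monte Carlo, and a conjugate Gamma-Poisson update standing in for the prior information pooled across insurers.

```python
import numpy as np

rng = np.random.default_rng(1)

def aggregate_losses(lam, sev_shape, sev_scale, n_sims=100_000):
    """Collective risk model: S = sum of N iid Gamma severities, N ~ Poisson(lam).
    Uses the fact that a sum of n Gamma(shape, scale) variables is
    Gamma(n * shape, scale)."""
    n = rng.poisson(lam, size=n_sims)
    s = np.zeros(n_sims)
    pos = n > 0
    s[pos] = rng.gamma(sev_shape * n[pos], sev_scale)
    return s

S = aggregate_losses(lam=50, sev_shape=2.0, sev_scale=1000.0)
print(S.mean(), np.quantile(S, 0.995))   # mean and a high quantile of S

# Conjugate Gamma(a0, b0) prior on the Poisson claim frequency; the prior could
# be pooled from many insurers, the counts are one insurer's (illustrative) years
a0, b0 = 40.0, 1.0
counts = np.array([47, 52, 55])
a_post, b_post = a0 + counts.sum(), b0 + len(counts)
print(a_post / b_post)                   # posterior mean frequency = 48.5
```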
