Global ETD Search

431	Maskininlärning för dokumentklassificering av finansiella dokument med fokus på fakturor / Machine Learning for Document Classification of Financial Documents with Focus on Invoices Khalid Saeed, Nawar January 2022 (has links) Automated document classification is an essential technique for processing and managing documents in digital form. Many companies seek a text classification methodology that can solve a range of problems. One of these is classifying and organizing a large number of documents according to a set of predefined categories. This thesis aims to help Medius, a company that works with invoice workflows, classify the documents in its workflow into invoices and non-invoices. This was accomplished by implementing and evaluating several machine learning classification methods in terms of their accuracy and efficiency for financial document classification, where only invoices are of interest. The pre-processing steps needed for good performance are also considered when evaluating these methods. Two document representation methods, Term Frequency Inverse Document Frequency (TF-IDF) and Doc2Vec, were used to represent the documents as fixed-length vectors. The representation aims to reduce the complexity of the documents and make them easier to handle. Three classification methods were used to automate the classification of invoices: Logistic Regression, Multinomial Naïve Bayes, and Support Vector Machine. The results indicate that all classification methods that used TF-IDF to represent the documents as vectors achieve high performance and accuracy. The accuracy of all three classification methods is over 90%, which was the requirement for the study to be considered successful. Moreover, Logistic Regression appears to handle this task most easily, classifying the documents more efficiently than the other methods. A test on real data flowing into Medius' invoice workflow shows that Logistic Regression correctly classifies almost 96% of the documents. In conclusion, Logistic Regression together with TF-IDF is determined to be the most appropriate of the tested methods. Doc2Vec, by contrast, failed to give good results because the data set was not suited to, or large enough for, the method to work well. Keywords: Document classification, Text classification, Invoices, NLP, TF-IDF, Doc2vec, Machine Learning, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine, Computer Sciences.
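The TF-IDF plus Logistic Regression pipeline summarized in entry 431 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the six-document corpus is made up, the smoothed IDF formula is one common variant chosen here, and the classifier is plain batch gradient descent without regularization.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant); rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights and bias by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (invoice) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy corpus: 1 = invoice, 0 = non-invoice.
corpus = [
    ("invoice number total amount due payment", 1),
    ("invoice date amount vat total", 1),
    ("payment due invoice customer number", 1),
    ("meeting agenda project update notes", 0),
    ("newsletter product update and offers", 0),
    ("contract terms agreement signature notes", 0),
]
docs = [text.split() for text, _ in corpus]
labels = [label for _, label in corpus]
vocab, X = tfidf_vectors(docs)
w, b = train_logreg(X, labels)
train_accuracy = sum(predict(w, b, x) == y for x, y in zip(X, labels)) / len(labels)
print(train_accuracy)
```

The printed value is accuracy on the training documents only; a realistic evaluation would hold out a separate test set, as the thesis does with real workflow data.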
432	The Passionate Combining Entrepreneurs Nordström, Carin January 2015 (has links) Entrepreneurs are portrayed as salient drivers of regional development and for a number of years nascent entrepreneurs have been studied in a large number of countries as part of the Global Entrepreneurship Monitor project and the Panel Study of Entrepreneurial Dynamics. Scholars have devoted much effort to investigating factors that determine how individuals engage in entrepreneurial activities, with most of the discussion limited to business start-ups. However, since this type of project does not follow identical nascent entrepreneurs over time, limited knowledge exists about their development and whether they stay in this nascent phase for a long time. In practice, it is common for entrepreneurs to run a business and at the same time work in wage work, so-called combining entrepreneurs. In Sweden, almost half of all business owners combine wage work with a business. However, not all combining entrepreneurs will eventually decide to leave the wage work and invest fully in the business. Consequently, much research has focused on the first step of entering entrepreneurship full time, but less has focused on the second step, the transition from the combining phase to full-time self-employment. The aim of this thesis is therefore to contribute to the theory of entrepreneurship by gaining a deeper understanding of combining entrepreneurs and their motives and intentions. In the context of combining entrepreneurs, the theory of identity, resources and choice overload has been used to examine how entrepreneurs’ age (when starting the business), entrepreneurial tenure (the length of engagement in the side-business), hours spent (weekly involvement in the side-business), involvement in entrepreneurial teams (leading the business with one or more partners) and involvement in networks (business networks) influence their passion for engaging in entrepreneurship while sustaining wage work. 
Different categories of combining entrepreneurs and their intentions have also been examined. A survey was administered to 1457 entrepreneurs within the creative sector in two counties in Sweden (Gävleborgs County and Jämtlands County). Since there were no separate mailing lists to only combining entrepreneurs, the survey was sent to all entrepreneurs within the chosen industry and counties. The total response rate was 33.5 percent and of them 57.6 percent combined, yielding 262 combining entrepreneurs who answered the questionnaire. The survey was then followed up with eight focus group interviews and two single interviews to validate the answers from the questionnaire. The results indicate three types of combining entrepreneurs: nascent – with the intention to leave the combining phase for a transition into full-time self-employment, lifestyle – with the intention to stay in the combining phase, and occasional – with the intention to leave the combining phase for full-time wage work and close down the business. Transitioning fully to self-employment increases with the individual’s age. Also, a positive interactive effect exists with involvement in entrepreneurial networks. The results also indicate that the ability to work with something one is passionate about is the top motive for combining wage work with a side-business. Passion is also more likely to be the main motive behind the combining form among individuals who are older at business start-up, but passion is less likely to be the main motive behind the combining form among individuals who spend more time on the business. The longer the individual has had the side-business, the less likely passion is the main motive behind the combining form, and passion is less likely to be the main motive among those who are part of an entrepreneurial team. 
/ Note: The thesis is based on five papers; three were unpublished at the time of the defense, and two are linked here. Keywords: Combining entrepreneurs, hybrid entrepreneurs, self-employment, nascent entrepreneurs, lifestyle entrepreneurs, occasional entrepreneurs, passion, entrepreneurial tenure, entrepreneurial teams, choice, logistic regression.
433	BRIDGE END SETTLEMENT EVALUATION AND PREDICTION Zhang, Jiwen 01 January 2016 (has links) A bridge approach is usually built to provide a smooth and safe transition for vehicles from the roadway pavement to the bridge structure. However, differential settlement between the roadway pavement resting on embankment fill and the bridge abutment built on more rigid foundation often creates a bump in the roadway. Previous work examined this issue at a microscopic level and presented new methods for eliminating or minimizing the effects at specific locations. This research studies the problem at a macroscopic level by determining methods to predict settlement severity to assist designers in developing remediation plans during project development to minimize the lifecycle costs of bridge bump repairs. The study is based on historic data from a wide range of Kentucky roads and bridges relating to bridge approach inspection and maintenance history. A macro method considering a combination of maintenance times, maintenance measures, and observed settlement was used to classify the differential settlement scale as minimal, moderate, and severe, corresponding to the approach performance status good, fair, and poor. A series of project characteristics influencing differential settlement were identified and used as parameters to develop a model to accurately predict settlement severity during preliminary design. Eighty-seven bridges with different settlement severities were collected as the first sample by conducting a survey of local bridge engineers in 12 transportation districts. Sample two was created by randomly selecting 600 bridges in the inspection history of bridges in Kentucky. Ordinal and/or multinomial logistic regression analyses were implemented to identify the relationships between the levels of differential settlement and the input variables. Two predictive models were developed. 
Prediction of bridge approach settlement can play an important role in selecting proper design, construction, and maintenance techniques and measures. Depending on the purpose, users can apply one or both models to predict the approach settlement level for a new or an existing bridge. The significance of this study lies in its identification of the parameters that most influence settlement severity at bridge ends, and of how those parameters interact in a prediction model. The important parameters include geographic region, approach age, average daily traffic (ADT), the use of approach slabs, and the foundation soil depth. The regression results indicate that the use of approach slabs can improve the performance of approaches in mitigating the problem caused by differential settlement. In addition, current practices regarding differential settlement prediction and mitigation were summarized by surveying the bridge engineers in 5 transportation districts. Keywords: bump at the end of the bridge, bridge approach, differential settlement, approach slab, prediction model, logistic regression, Construction Engineering and Management, Transportation Engineering.
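Entry 433 mentions ordinal logistic regression over three settlement levels (minimal, moderate, severe). The sketch below shows how a proportional-odds (cumulative logit) model turns a linear predictor into class probabilities. The cutpoints and predictor values are hypothetical and are not taken from the dissertation, whose fitted models are not given in the abstract.

```python
import math

def ordinal_probabilities(eta, cutpoints):
    """Class probabilities under a cumulative logit model.

    P(Y <= k) = sigmoid(cutpoints[k] - eta), with increasing cutpoints,
    so a larger linear predictor eta shifts mass toward higher levels.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
    return probs

levels = ["minimal", "moderate", "severe"]
cutpoints = [-1.0, 1.0]  # hypothetical thresholds between adjacent levels

# Evaluate three hypothetical values of the linear predictor.
results = {eta: ordinal_probabilities(eta, cutpoints) for eta in (-2.0, 0.0, 2.0)}
for eta, probs in results.items():
    most_likely = levels[probs.index(max(probs))]
    print(eta, [round(p, 3) for p in probs], most_likely)
```

In a fitted model, eta would be a weighted sum of project characteristics such as approach age or ADT; here it is simply supplied by hand.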
435	Linear programming algorithms for detecting separated data in binary logistic regression models Konis, Kjell Peter January 2007 (has links) This thesis is a study of the detection of separation among the sample points in binary logistic regression models. We propose a new algorithm for detecting separation and demonstrate empirically that it can be computed fast enough to be used routinely as part of the fitting process for logistic regression models. The parameter estimates of a binary logistic regression model fit using the method of maximum likelihood sometimes do not converge to finite values. This phenomenon (also known as monotone likelihood or infinite parameters) occurs because of a condition among the sample points known as separation. There are two classes of separation. When complete separation is present among the sample points, iterative procedures for maximizing the likelihood tend to break down, which makes it clear that there is a problem with the model. However, when quasicomplete separation is present among the sample points, the iterative procedures for maximizing the likelihood tend to satisfy their convergence criterion before revealing any indication of separation. The new algorithm is based on a linear program with a nonnegative objective function that has a positive optimal value when separation is present among the sample points. We compare several approaches for solving this linear program and find that a method based on determining the feasibility of the dual to this linear program provides a numerically reliable test for separation among the sample points. A simulation study shows that this test takes about as long to compute as fitting the binary logistic regression model by iteratively reweighted least squares; hence the test is fast enough to be used routinely as part of the fitting procedure. 
An implementation of our algorithm (as well as the other methods described in this thesis) is available in the R package safeBinaryRegression. 519
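The monotone likelihood behavior described in entry 435 can be seen on a toy example. The sketch below does not implement the thesis's linear program or the safeBinaryRegression package; it only illustrates, on made-up data, why separation prevents finite maximum likelihood estimates: along a separating direction the log-likelihood keeps increasing toward 0 as the coefficients are scaled up, whereas with overlapping classes it eventually decreases.

```python
import math

def log_sigmoid(s):
    """Numerically stable log(1 / (1 + exp(-s)))."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))

def log_likelihood(beta0, beta1, xs, ys):
    """Binary logistic log-likelihood with an intercept and one covariate."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta0 + beta1 * x
        total += log_sigmoid(z if y == 1 else -z)
    return total

# Made-up data: the first response vector is completely separated at x = 3.5,
# the second one overlaps (x = 3 is a success, x = 4 is a failure).
xs = [1, 2, 3, 4, 5, 6]
separated = [0, 0, 0, 1, 1, 1]
overlapping = [0, 0, 1, 0, 1, 1]

# Scale the direction (beta0, beta1) = (-3.5, 1) by increasing factors k.
scales = [1, 2, 4, 8, 16]
sep_ll = [log_likelihood(-3.5 * k, k, xs, separated) for k in scales]
ovl_ll = [log_likelihood(-3.5 * k, k, xs, overlapping) for k in scales]

for k, a, b in zip(scales, sep_ll, ovl_ll):
    print(k, a, b)
```

Because the separated log-likelihood has no finite maximizer, an iterative fitter either diverges or stops at an arbitrary point, which is the situation a dedicated separation test is meant to flag.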
436	Les déterminants de l'accès à l'emploi chez les jeunes diplômés de la formation professionnelle au Maroc Schonholzer, Jennifer January 2008 (has links) Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal. Keywords: Access to employment, Vocational training, Morocco, Logistic regression, Public policies.
437	Régression logistique bayésienne : comparaison de densités a priori Deschênes, Alexandre 07 1900 (has links) Logistic regression is a generalized linear model (GLM) used for binary response variables. The model estimates the probability of success of the response as a function of a linear combination of explanatory variables. When the goal is to estimate as precisely as possible the impact of the various incentives in a marketing campaign (the coefficients of the logistic regression), one seeks the most accurate estimation method. Using the slice sampling MCMC method, we compare prior densities that differ in density type, location parameter and scale parameter. These comparisons are applied to samples of different sizes generated with different probabilities of success. The maximum likelihood estimator, Gelman's method and Genkin's method complete the comparison. Our simulations show that three estimation methods give the globally most accurate estimates of the logistic regression coefficients: slice sampling MCMC with a normal prior density centered at 0 with variance 3.125; slice sampling MCMC with a Student prior density with 3 degrees of freedom, also centered at 0 with variance 3.125; and Gelman's method with a Cauchy density centered at 0 with scale parameter 2.5. Keywords: Logistic regression, Bayesian, Prior density, MCMC simulation.
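To make the slice sampling setup in entry 437 concrete, here is a minimal sketch for a single logistic regression coefficient with a normal prior centered at 0 with variance 3.125 (the prior named in the abstract). Everything else is an assumption for illustration: the data are made up, the model has one covariate and no intercept, and the sampler is a generic univariate slice sampler with stepping-out and shrinkage rather than the author's code.

```python
import math
import random

def log_posterior(beta, xs, ys, prior_var=3.125):
    """Unnormalized log-posterior: logistic likelihood times a N(0, prior_var) prior."""
    lp = -beta * beta / (2.0 * prior_var)
    for x, y in zip(xs, ys):
        z = beta * x if y == 1 else -beta * x
        # Stable evaluation of log(sigmoid(z)).
        lp += -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))
    return lp

def slice_sample(logp, x0, n_samples, width=1.0, rng=random):
    """Univariate slice sampler with stepping-out and shrinkage."""
    samples = []
    x = x0
    for _ in range(n_samples):
        # Slice level, drawn uniformly under the density at the current point.
        log_y = logp(x) + math.log(1.0 - rng.random())
        # Stepping out: place an interval around x and widen it until both
        # ends fall outside the slice.
        left = x - width * rng.random()
        right = left + width
        while logp(left) > log_y:
            left -= width
        while logp(right) > log_y:
            right += width
        # Shrinkage: propose uniformly, shrinking the interval after each rejection.
        while True:
            cand = left + (right - left) * rng.random()
            if logp(cand) > log_y:
                x = cand
                break
            if cand < x:
                left = cand
            else:
                right = cand
        samples.append(x)
    return samples

# Hypothetical data: 20 customers received an incentive (x = 1), 15 responded.
xs = [1.0] * 20
ys = [1] * 15 + [0] * 5

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = slice_sample(lambda b: log_posterior(b, xs, ys), 0.0, 3000, rng=rng)
kept = samples[500:]  # discard burn-in draws
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

Swapping the first line of `log_posterior` for a Student or Cauchy log-density would give the kind of prior comparison the abstract describes, with the sampler itself unchanged.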
438	Vztah mezi užíváním alkoholu a tabáku se subjektivním well-being u žáků 6. tříd / Relationship between the use of alcohol and tobacco and subjective well-being among 6th graders Černá, Michala January 2015 (has links) Mental health, an integral part of general health, has recently been increasingly associated with the term well-being. An alarming problem, and not only in Czech society, is the high rate of early use of legal addictive substances (alcohol and tobacco) among children. This thesis provides insight into children's use of addictive substances in the context of their subjective perception of personal well-being. Establishing the prevalence of use of these substances among children made it possible to identify links between use and several factors that together represent subjective well-being. Satisfaction with oneself proved insignificant in relation to substance abuse. Furthermore, it was found that children may compensate for low self-esteem with alcohol. Based on these data, a mathematical model was built using logistic regression; given the factors of personal well-being, it predicts the probability of tobacco and alcohol use among children. Key words: prevention, substance abuse, well-being, alcohol, tobacco, logistic regression
439	Role Business Intelligence a data-miningu v pojistném fraud managamentu / The Role of Business Intelligence and Data Mining in the Insurance Fraud Management Betíková, Veronika January 2013 (has links) No description available.
440	Essays in nonlinear macroeconomic modeling and econometrics. Atems, Bebonchu January 1900 (has links) Doctor of Philosophy / Department of Economics / Lance J. Bachmeier / This dissertation consists of three essays in nonlinear macroeconomic modeling and econometrics. In the first essay, we decompose oil price movements into oil demand (stock market) shocks and oil supply (oil-market) shocks, and examine the response of the stock market to these shocks. We find that when oil prices are “net-increasing”, a stock market shock that causes the S&P 500 to rise by one percentage point will cause the price of oil to rise approximately 0.2 percentage points, with a statistically significant positive effect one day after the stock market shock. On the other hand, the response of the stock market to an oil market shock is a decline of 6.8 percent when the price of oil doubles. For other days, the initial response of the oil market to a stock market shock is the same as in the net oil price increase case (by construction). We then analyze the response of monetary policy to the identified stock market and oil market shocks and find that short-term interest rates respond to the stock market shocks but not the oil market shocks. Finally, we evaluate the predictive power of the decomposed stock market and oil shocks relative to the change in the price of oil. We find statistically significant gains in both the in-sample fit and out-of-sample forecast accuracy when using the identified stock market and oil market shocks rather than the change in the price of oil. The second essay revisits the statistical specification of near-multicollinearity in the logistic regression model using the Probabilistic Reduction approach. We argue that the ceteris paribus clause invoked with near-multicollinearity is rather misleading. 
This assumption states that one can assess the impact of near-multicollinearity by holding the parameters of the logistic regression model constant while examining the impact on their standard errors and t-ratios as the correlation (ρ) between the regressors increases. Using the Probabilistic Reduction approach, we derive the parameters (and related statistics) of the logistic regression model and show that they are functions of ρ, indicating that the ceteris paribus clause in the traditional account of near-multicollinearity is unattainable. Monte Carlo simulations in the paper confirm these findings. We also show that traditional near-multicollinearity diagnostics, such as the variance inflation factor and the condition number, can fail to detect near-multicollinearity. Overall, the paper finds that near-multicollinearity in the logistic model is highly variable and may not lead to the problems indicated by the traditional account. Therefore, unexpected, unreliable or unstable estimates and inferences should not be blamed on near-multicollinearity. Rather, the modeler should return to economic theory or statistical respecification of the model to address these problems. The third essay examines the correlations between income inequality and economic growth using a panel of income distribution data for 3,109 counties of the U.S. We examine the non-spatial dynamic correlations between county inequality and growth using a System GMM approach, and find significant negative relationships between changes in inequality in one period and growth in the subsequent period. We show that this finding is robust across different sample sizes. 
We further argue that because the space-specific time-invariant variables that affect economic growth and inequality can differ significantly across counties, failure to incorporate spatial effects into a model of growth and inequality may lead to biased results. We assume that dependence among counties arises only from the disturbance process, and therefore estimate a spatial error model. Our results indicate that the bias in the parameter for inequality amounts to about 2.66 percent, while that for initial income amounts to about 21.51 percent. Keywords: Economic growth, Macroeconomic shocks, Oil market shocks, Logistic regression, Near-multicollinearity, Inequality, Economics (0501), Economic Theory (0511).
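For readers unfamiliar with the variance inflation factor discussed in the second essay of entry 440, the sketch below shows the textbook two-regressor case, where the VIF depends only on the correlation ρ between the regressors. It illustrates the traditional diagnostic only; it does not reproduce the essay's Probabilistic Reduction derivation, and the function name is chosen here for illustration.

```python
def vif_two_regressors(rho):
    """Variance inflation factor when there are exactly two regressors.

    With two regressors, the R-squared from regressing one on the other
    equals rho**2, so VIF = 1 / (1 - rho**2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho * rho)

# The VIF grows without bound as the regressors become nearly collinear.
vif_table = {rho: vif_two_regressors(rho) for rho in (0.0, 0.5, 0.9, 0.99)}
for rho, vif in vif_table.items():
    print(rho, round(vif, 3))
```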
