Global ETD Search

261	Perception of biases in machine learning in production research: a structured literature review dissecting bias categories / Götte, Gesa; Antons, Oliver; Herzog, Andreas; Arlinghaus, Julia C. / 04 November 2024 / Factories are evolving into Cyber-Physical Production Systems, producing vast data volumes that can be leveraged using computational power. However, an easy and careless integration of machine learning (ML) can lead to overly simplistic or false pattern extraction, i.e. biased ML applications. This poses a significant risk when deploying ML, especially when models are trained on big data. Research has shown that there are sources of undesired bias throughout the entire ML life cycle and in the feedback loop between human, data and ML model. Methods to detect, mitigate and prevent those undesired biases in order to achieve "fair" ML solutions have been developed and established in toolboxes in recent years. In this article, we use a structured literature review to address the underappreciated biases in ML for production applications and highlight the ambiguity of the term bias. The review emphasizes the necessity for research on ML biases in production and identifies the most relevant blind spots so far. Filling those blind spots with research and guidelines that incorporate bias screening, treatment and risk assessment into the ML life cycle of industrial applications promises to enhance their robustness, resilience and trustworthiness.
262	Expert Knowledge Elicitation for Machine Learning : Insights from a Survey and Industrial Case Study / Svensson, Samuel; Persson, Oskar / January 2023 / While machine learning has shown success in many fields, it can be challenging to apply when training data is insufficient. By incorporating knowledge into the machine learning pipeline, one can overcome such limitations. Therefore, eliciting expert knowledge can play an important role in the machine learning project pipeline. Expert knowledge can come in many forms, and it is seldom easy to elicit and formalize it in a way that is easily implementable in a machine learning project. Although this has been done, little attention has been paid to how. Furthermore, the motivations for eliciting knowledge in a particular way, and the challenges involved in the elicitation, are not always addressed. Making educated decisions about knowledge elicitation can therefore be challenging for researchers. Hence, this work aims to explore and categorize how researchers have previously elicited expert knowledge. This was done by developing a taxonomy that was then used to analyze articles. A total of 43 articles were found, containing 97 elicitation paths that were categorized in order to identify trends and common approaches. The findings from our study were used to provide guidance for an industrial case in its initial stage, to show how the taxonomy presented in this work can be applied in a real-world scenario. / knowledge elicitation, machine learning, expert knowledge, informed machine learning, hybrid machine learning, survey, taxonomy, Computer Systems
263	Classifying Receipts and Invoices in Visma Mobile Scanner / Yasser, Almodhi / January 2016 / This paper presents a study on classifying receipts and invoices using machine learning. It also discusses the Naïve Bayes algorithm and the advantages of using it. With information gathered from theory and previous research, I will show how to classify an image as either a receipt or an invoice. The study includes pre-processing images using a variety of methods and extracting text using Optical Character Recognition (OCR). Moreover, the necessity of pre-processing images to reach a higher accuracy is discussed. The results include a comparison between the Tesseract OCR engine and the FineReader OCR engine. They show that combining the FineReader OCR engine with machine learning increases the accuracy of the image classification. / Machine Learning, classifying, OCR, Tesseract, FineReader
264	VisuNet: Visualizing Networks of feature interactions in rule-based classifiers / Anyango, Stephen Omondi Otieno / January 2016 / No description available.
265	Study of Single and Ensemble Machine Learning Models on Credit Data to Detect Underlying Non-performing Loans / Li, Qiongzhu / January 2016 / In this paper, we compare the performance of two feature dimension reduction methods, the LASSO and PCA. Both a simulation study and an empirical study show that the LASSO is superior to PCA when selecting significant variables. We apply Logistic Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT) and their corresponding ensemble machines constructed by bagging and adaptive boosting (AdaBoost) in our study. Three experiments are conducted to explore the impact of class-imbalanced data sets on all models. The empirical study indicates that when the percentage of performing loans exceeds 83.3%, the trained models should be applied with caution. With a class-balanced data set, ensemble machines do perform better than single machines. The weaker the single machine, the more obvious the improvement we can observe. / Machine learning, Feature Dimension Reduction, NPL
266	Applicability analysis of computation double entendre humor recognition with machine learning methods / Johansson, David / January 2016 / No description available. / Natural language processing, computational humor, machine learning
267	Feasibility of using network support data to predict risk level of trouble tickets / Laurentz, Henrik / January 2016 / Internet Service Providers gather vast amounts of data in the form of trouble tickets created from connectivity-related issues. This data is often stored and seldom used for proactive purposes. This thesis explores the feasibility of finding correlations in network support data through data mining. Such correlations could be used to improve troubleshooting or staffing-related activities. The approach uses the data mining methodology CRISP-DM to investigate typical data mining operations from the perspective of a Network Operation Center. The results show that correlations between the solving time and other ticket-related attributes do exist and that support data could be used for the activities mentioned. The results also show that there is considerable room for improvement in data mining activities on network support data. / Network support data, Risk level, Machine learning
268	On Effectively Creating Ensembles of Classifiers : Studies on Creation Strategies, Diversity and Predicting with Confidence / Löfström, Tuwe / January 2015 / An ensemble is a composite model, combining the predictions from several other models. Ensembles are known to be more accurate than single models. Diversity has been identified as an important factor in explaining the success of ensembles. In the context of classification, diversity has not been well defined, and several heuristic diversity measures have been proposed. The focus of this thesis is on how to create effective ensembles in the context of classification. Even though several effective ensemble algorithms have been proposed, there are still several open questions regarding the role diversity plays when creating an effective ensemble. Open questions relating to creating effective ensembles that are addressed include: what to optimize when trying to find an ensemble using a subset of models used by the original ensemble that is more effective than the original ensemble; how effective is it to search for such a sub-ensemble; how should the neural networks used in an ensemble be trained for the ensemble to be effective? The contributions of the thesis include several studies evaluating different ways to optimize which sub-ensemble would be most effective, including a novel approach using combinations of performance and diversity measures. The contributions of the initial studies presented in the thesis eventually resulted in an investigation of the underlying assumption motivating the search for more effective sub-ensembles. The evaluation concluded that even if several more effective sub-ensembles exist, it may not be possible to identify which sub-ensembles would be the most effective using any of the evaluated optimization measures. An investigation of the most effective ways to train neural networks to be used in ensembles was also performed.
The conclusions are that effective ensembles can be obtained by training neural networks in a number of different ways, and that either high average individual accuracy or high diversity can produce effective ensembles. Several findings regarding diversity and effective ensembles presented in the literature in recent years are also discussed and related to the results of the included studies. When creating confidence-based predictors using conformal prediction, there are several open questions regarding how data should be utilized effectively when using ensembles. Open questions related to predicting with confidence that are addressed include: how can data be utilized effectively to achieve more efficient confidence-based predictions using ensembles; how do problems with class imbalance affect the confidence-based predictions when using conformal prediction? Contributions include two studies: the first shows that using out-of-bag estimates with bagging ensembles results in more effective conformal predictors, and the second shows that a conformal predictor conditioned on the class labels, which avoids a strong bias towards the majority class, is more effective on problems with class imbalance. The research method used is mainly inspired by the design science paradigm, which is manifested by the development and evaluation of artifacts. / An ensemble is a composite model that combines the predictions of several different models. It is well known that ensembles are more accurate than individual models. Diversity has been identified as an important factor in explaining why ensembles are so successful. Until recently, diversity had not been unambiguously defined for classification, which led to many heuristic diversity measures being proposed. This thesis focuses on how classification ensembles can be created effectively.
The research method is mainly inspired by the design science paradigm, which is well suited to the development and evaluation of IT artifacts. Many successful ensemble algorithms already exist, yet questions remain about the role diversity plays in creating well-performing ensemble models. The diversity-related questions addressed in the thesis include: what should be optimized when searching for a subset of the available models in an attempt to create an ensemble that is better than the ensemble consisting of all models; how well does the strategy of searching for such sub-ensembles work; how should neural networks be trained to work as well as possible in an ensemble? The contributions of the thesis include several studies evaluating different ways of finding sub-ensembles that are better than the full ensemble, including a new approach that uses a combination of diversity and performance measures. The results of the first studies led to an examination of the underlying assumption that motivates the search for sub-ensembles. The conclusion was that, although several sub-ensembles were better than the full ensemble, there was no way to identify from the available data which the better sub-ensembles were. The thesis also examined how neural networks should be trained to work together as well as possible when used in an ensemble. The conclusions from that investigation are that well-performing ensembles can be created by having many models that are either good on average or different from one another (i.e. diverse). Insights presented in the literature in recent years are discussed and related to the results of the included studies.
When creating confidence-based models using a framework called conformal prediction, several questions about how data should best be used with ensembles need to be addressed. The questions relating to confidence-based prediction include: how can data best be used to achieve more efficient confidence-based predictions with ensembles; how does imbalanced data affect the confidence-based predictions when using conformal prediction? The contributions include two studies. The results of the first show that the most effective way to use data with a bagging ensemble is to use out-of-bag estimates. The results of the second show that imbalanced data needs to be handled with a class-conditional confidence-based model in order to avoid a strong tendency to favor the majority class. / At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 8: In press. / Dataanalys för detektion av läkemedelseffekter (DADEL) / Machine Learning, Predictive Modeling, Ensembles, Conformal Prediction
269	Dynamic planning and scheduling in manufacturing systems with machine learning approaches / Yang, Donghai., 杨东海. / January 2008 / published_or_final_version / Industrial and Manufacturing Systems Engineering / Doctoral / Doctor of Philosophy / Machine learning. Algorithms.
