1 |
Total Organic Carbon and Clay Estimation in Shale Reservoirs Using Automatic Machine Learning
Hu, Yue, 21 September 2021
High total organic carbon (TOC) and low clay content are two criteria for identifying the "sweet spots" in shale gas plays. Recently, machine learning has proven effective for estimating TOC and clay from well logs. The remaining questions are which algorithm to choose in the first place and whether already built models can be improved. Automatic machine learning (AutoML) is a promising tool for answering these practical questions because it trains multiple models and compares them automatically. Two wells with conventional well logs and elemental capture spectroscopy are selected from a shale gas play to test AutoML's ability to estimate TOC and clay. TOC and clay content are extracted from Schlumberger's ELAN interpretation and calibrated to cores. Generalizability is demonstrated on the blind test well, where the mean absolute test errors for TOC and clay estimation are 0.23% and 3.77%, respectively. The final models are generated from 829 data points with a train-test ratio of 75:25. The mean absolute test errors are 0.26% and 2.68% for TOC and clay, respectively, which are very low given that TOC ranges from 0-6% and clay from 35-65%. The results show AutoML's success and efficiency in the estimation. The trained models are interpreted to understand the variables' effects on the predictions. 235 wells are selected through data quality checking and fed into the models to create TOC and clay distribution maps. The maps provide guidance on where to drill a new well for higher shale gas production. / Master of Science / Locating "sweet spots", where shale gas production is much higher than in average areas, is critical for the successful commercial exploitation of a shale reservoir. Among the properties of shale, total organic carbon (TOC) and clay content are often selected to evaluate gas production potential. Multiple machine learning models have been tested for TOC and clay estimation in recent studies and have proved successful. 
The questions are which algorithm to choose for a specific task and whether already built models can be improved. Automatic machine learning (AutoML) has the potential to solve these problems by automatically training multiple models and comparing them to achieve the best performance. In our study, AutoML is tested for estimating TOC and clay using data from two gas wells in a shale gas field. First, one well is treated as a blind test well and the other is used as the training well to examine generalizability. The mean absolute errors for TOC and clay content are 0.23% and 3.77%, respectively, indicating reliable generalization. The final models are built using 829 data points, which are split into training and test sets at a ratio of 75:25. The mean absolute test errors are 0.26% and 2.68% for TOC and clay, respectively, which are very low given that TOC ranges from 0-6% and clay from 35-65%. Moreover, AutoML requires very little human effort and frees researchers and engineers from the tedious parameter-tuning process that is a critical part of machine learning. The trained models are interpreted to understand the mechanisms behind them. Distribution maps of TOC and clay are created by selecting 235 gas wells that pass data quality checking, feeding them into the trained models, and interpolating. The maps provide guidance on where to drill a new well for higher shale gas production.
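To make the reported figures concrete, the following minimal Python sketch shows a 75:25 split of 829 points, the mean absolute error metric, and the reported test errors expressed as a fraction of each property's range. The helper names (`split_75_25`, `mean_absolute_error`, `error_as_fraction_of_range`) and the shuffling seed are illustrative assumptions; the abstract does not say how the split was drawn, and this is not the author's AutoML pipeline.

```python
import random


def split_75_25(n_points, seed=0):
    """Shuffle indices 0..n_points-1 and split them 75:25 (hypothetical helper)."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    n_train = round(0.75 * n_points)
    return indices[:n_train], indices[n_train:]


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def error_as_fraction_of_range(mae, low, high):
    """Express an MAE relative to the span of the target variable."""
    return mae / (high - low)


# 829 data points, as stated in the abstract.
train_idx, test_idx = split_75_25(829)
print(len(train_idx), len(test_idx))

# Reported test MAEs relative to the reported ranges.
toc_fraction = error_as_fraction_of_range(0.26, 0.0, 6.0)     # TOC: 0-6%
clay_fraction = error_as_fraction_of_range(2.68, 35.0, 65.0)  # clay: 35-65%
print(f"TOC: {toc_fraction:.1%} of range, clay: {clay_fraction:.1%} of range")
```

Under these assumptions the split gives 622 training and 207 test points, and the reported errors amount to roughly 4.3% of the TOC range and 8.9% of the clay range.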
|
2 |
Predicting Consumer Purchase behavior using Automatic Machine Learning : A case study in online purchase flows / Prediktering av Konsumentbeteenden med Automatisk Maskininlärning : En fallstudie i onlinebaserade köpflöden
Sandström, Olle, January 2022
Online payment purchase flows are designed to be as effective and smooth as possible with regard to the user experience. The user is at the center of this process and, to a certain degree, decides whether the purchase will eventually be placed. What is left to the payment provider is to enable an effective purchase flow in which information needs to be collected for various purposes. To design these purchase flows as efficiently as possible, this research investigates whether and how consumer purchase behavior can be predicted. Which algorithms perform best at modeling the outcome, and what kinds of underlying features can be used to model it? The features are ranked by their feature importance to see how, and how much, they affect the best-performing model. To investigate consumer behavior, the task was set up as a supervised binary classification problem modeling the outcome of user purchase sessions: a session either results in a purchase or it does not. Several automatic machine learning (also referred to as automated machine learning) frameworks were considered before H2O AutoML was chosen because of its historical performance on other supervised binary classification problems. The dataset contained information from user sessions relating to the consumer, the transaction, and the time when the purchase was initiated. These variables were in either numerical or categorical format and were evaluated using the SHAP importance metric as well as an aggregated SHAP summary plot, which describes how features affect the model. The results show that the Distributed Random Forest algorithm performed best, achieving a 26 percentage point improvement in accuracy over an undersampled baseline of 50% when predicting whether a session will be converted into a purchase. 
Furthermore, two of the most important features according to the model were categorical features related to the intersection of consumer and transaction information. Another, time-based categorical variable also proved important in the model's predictions. The research also shows that automatic machine learning has come a long way in the pre-processing of variables, enabling model developers to deploy solutions to these kinds of machine learning problems more efficiently. The results echo earlier findings confirming that consumer purchase behavior can be predicted, in particular the outcome of a consumer session in a purchase flow. This implies that payment providers could hypothetically use these kinds of insights and predictions when developing their flows to cater individually to specific groups of consumers, enabling a more efficient and personalized payment flow. / Purchase flows for online payments are designed to be as efficient and smooth as possible with respect to the user experience. The user is at the center of the process and partly determines whether the purchase is ultimately converted. What is up to the payment provider is enabling an efficient purchase flow in which information needs to be collected for various purposes. To design these purchase flows as efficiently as possible, this work investigates whether and how consumers' purchase behavior can be predicted. Which algorithms work best for modeling the outcome, and what kinds of underlying features can be used to model it? These features are ranked by their relevance (feature importance) to see how, and how much, they actually affect the best-performing model. To investigate consumer behavior, the task was set up as a supervised binary classification problem to model the outcome of user sessions. A session either results in a purchase or it does not. 
Several frameworks for automatic machine learning were considered before H2O AutoML was chosen because of its historical performance on other supervised binary classification problems. The dataset contained information from user sessions concerning the consumer, the transaction, and the time at which the purchase was initiated. These variables were in either numerical or categorical format and were evaluated using the SHAP importance metric (SHAP Feature Importance) as well as an aggregated SHAP plot, which describes how the different features affect the model. The results show that the Distributed Random Forest algorithm performed best, yielding a 26 percentage point improvement in accuracy when predicting whether or not a session is converted into a purchase, based on an undersampled dataset with a baseline of 50%. In addition, two of the most important features according to the model were categorical features related to the intersection of consumer and transaction information. Another, time-based categorical variable also proved important in the prediction. The work also shows that automatic machine learning has come a long way in the pre-processing of variables, which allows model developers to deploy this type of machine learning problem more efficiently. The results reflect earlier insights confirming that it is possible to predict consumers' purchase behavior, and in particular the outcome of a consumer session in a purchase flow. This means that payment providers could hypothetically use this type of insight and prediction in the development of their flows to cater individually to specific groups of consumers, enabling an even more efficient and tailored payment flow.
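The "undersampled baseline of 50%" can be illustrated with a small Python sketch: the majority class is randomly reduced to the size of the minority class, after which always predicting one class is right half of the time. The session counts below are made up, and the helper names (`undersample`, `majority_baseline_accuracy`) are hypothetical; the abstract does not describe the thesis's actual sampling procedure or class proportions.

```python
import random
from collections import Counter


def undersample(labels, seed=0):
    """Return indices that keep an equal number of samples per class.

    The majority class is randomly reduced to the size of the minority class.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_keep = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Synthetic, made-up session outcomes: 1 = purchase, 0 = no purchase.
sessions = [1] * 800 + [0] * 200
balanced = [sessions[i] for i in undersample(sessions)]

print(majority_baseline_accuracy(sessions))  # baseline on the imbalanced data
print(majority_baseline_accuracy(balanced))  # baseline after undersampling
```

On the balanced sample, always guessing one class yields 50% accuracy, so the reported 26 percentage point gain would correspond to roughly 76% accuracy.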
|
3 |
Predicting Customer Satisfaction in the Context of Last-Mile Delivery using Supervised and Automatic Machine Learning
Höggren, Carl, January 2022
The prevalence of online shopping has risen steadily in the last few years. In response, last-mile delivery services have emerged that enable goods to reach customers within a shorter timeframe than traditional logistics providers offer. However, shorter lead times bring greater exposure to risks that directly influence customer satisfaction. This report therefore investigates the extent to which supervised and automatic machine learning can be leveraged to extract the features with the highest explanatory power for customer ratings. The implementation suggests that the Random Forest Classifier outperforms both the Multi-Layer Perceptron and the Support Vector Machine in predicting customer ratings on a highly imbalanced version of the dataset, while AutoML performs best when the dataset is undersampled. Using Permutation Feature Importance and Shapley Additive Explanations, it was further concluded that whether the delivery is on time, whether it is executed within the stated time window, and whether it is executed during the morning, afternoon, or evening are the main drivers of customer ratings. / The prevalence of online shopping has increased sharply in recent years. In the wake of these changes, a number of last-mile companies have been established that enable parcels to reach customers within a shorter period than traditional logistics companies. However, shorter lead times bring greater exposure to risks that directly affect customers' experience of the last-mile service. Given this, this report aims to investigate whether supervised and automatic machine learning can be used to extract the parameters that have the greatest impact on customer satisfaction. 
The implementation shows that random forests outperform both neural networks and support vector machines at predicting customer satisfaction on an imbalanced version of the training data, while automatic machine learning outperforms the other models on a balanced version. Using the methods Permutation Feature Importance and Shapley Additive Explanations, it emerged that whether the parcel is delayed, whether the parcel is delivered within the stated time window, and whether the parcel arrives in the morning, afternoon, or evening have the greatest impact on customer satisfaction.
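Permutation Feature Importance, one of the two interpretation methods named above, can be sketched in a few lines of Python: shuffle one feature column at a time and measure how much the model's accuracy drops. Everything below is a toy illustration under stated assumptions: the data are synthetic, `toy_model` is a hypothetical stand-in for a trained classifier, and the two features (`on_time`, `evening_delivery`) are only loosely inspired by the drivers mentioned in the abstract.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled_rows = [
            row[:j] + (value,) + row[j + 1:] for row, value in zip(rows, column)
        ]
        importances.append(baseline - accuracy(model, shuffled_rows, labels))
    return importances


# Synthetic, made-up deliveries: each row is (on_time, evening_delivery).
data_rng = random.Random(42)
rows = [(data_rng.randint(0, 1), data_rng.randint(0, 1)) for _ in range(1000)]
labels = [on_time for on_time, _ in rows]  # good rating depends only on on_time


def toy_model(row):
    """Hypothetical stand-in for a trained classifier: predicts from on_time."""
    return row[0]


on_time_importance, evening_importance = permutation_importance(
    toy_model, rows, labels, n_features=2
)
print(on_time_importance, evening_importance)
```

Because the toy model ignores `evening_delivery`, shuffling that column leaves accuracy unchanged and its importance is zero, while shuffling `on_time` destroys most of the predictive signal.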
|