231 |
Maskininlärning med konform förutsägelse för prediktiva underhållsuppgifter i industri 4.0 / Machine Learning with Conformal Prediction for Predictive Maintenance tasks in Industry 4.0 : Data-driven Approach. Liu, Shuzhou; Mulahuko, Mpova. January 2023 (has links)
This thesis is a cooperation with Knowit, Östrand & Hansen, and Orkla. It aimed to explore the application of Machine Learning and Deep Learning models with Conformal Prediction for a predictive maintenance situation at Orkla. Predictive maintenance is essential in numerous industrial manufacturing scenarios. It can help to reduce machine downtime, improve equipment reliability, and save unnecessary costs. In this thesis, various Machine Learning and Deep Learning models, including Decision Tree, Random Forest, Support Vector Regression, Gradient Boosting, and Long Short-Term Memory (LSTM), are applied to a real-world predictive maintenance dataset. The Orkla dataset was originally planned to be used in this thesis project. However, due to challenges encountered and time limitations, a NASA C-MAPSS dataset with a similar data structure was chosen to study how Machine Learning models could be applied to predict the remaining useful lifetime (RUL) in manufacturing. In addition, conformal prediction, a recently developed framework for measuring the prediction uncertainty of Machine Learning models, is integrated into the models for more reliable RUL prediction. The results show that both the Machine Learning and Deep Learning models with conformal prediction could predict RUL close to the true RUL, with LSTM outperforming the Machine Learning models. The conformal prediction intervals also provide informative and reliable estimates of the uncertainty of the predictions, which can help factory personnel take necessary maintenance actions in advance. Overall, this thesis demonstrates the effectiveness of Machine Learning and Deep Learning models with Conformal Prediction for predictive maintenance. Moreover, based on the modeling results on the NASA dataset, some insights are discussed on how to transfer these experiences to Orkla data for RUL prediction in the future.
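The conformal prediction idea the abstract describes can be illustrated with a minimal split-conformal sketch (illustrative code, not from the thesis; the synthetic sensor data, the Random Forest choice, and the 90% coverage level are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for C-MAPSS-style data: X are sensor readings, y is RUL.
X = rng.normal(size=(600, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=600)

X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Split (inductive) conformal prediction: calibrate on held-out absolute residuals.
alpha = 0.1  # target ~90% marginal coverage
residuals = np.abs(y_cal - model.predict(X_cal))
n = len(residuals)
q = np.quantile(residuals, np.ceil((1 - alpha) * (n + 1)) / n)

x_new = rng.normal(size=(1, 5))
pred = model.predict(x_new)[0]
interval = (pred - q, pred + q)  # conformal prediction interval around the point RUL
```

The interval width `2 * q` is the informative quantity: a wide interval flags an unreliable RUL estimate that maintenance personnel should treat with caution.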
|
232 |
[en] A METHOD FOR INTERPRETING CONCEPT DRIFTS IN A STREAMING ENVIRONMENT / [pt] UM MÉTODO PARA INTERPRETAÇÃO DE MUDANÇAS DE REGIME EM UM AMBIENTE DE STREAMING. JOAO GUILHERME MATTOS DE O SANTOS. 10 August 2021 (has links)
[pt] Em ambientes dinâmicos, os modelos de dados tendem a ter desempenho
insatisfatório uma vez que a distribuição subjacente dos dados muda. Este
fenômeno é conhecido como Concept Drift. Em relação a este tema, muito
esforço tem sido direcionado ao desenvolvimento de métodos capazes de
detectar tais fenômenos com antecedência suficiente para que os modelos
possam se adaptar. No entanto, explicar o que levou ao drift e entender
suas consequências ao modelo têm sido pouco explorado pela academia.
Tais informações podem mudar completamente a forma como adaptamos os
modelos. Esta dissertação apresenta uma nova abordagem, chamada Detector
de Drift Interpretável, que vai além da identificação de desvios nos dados. Ele
aproveita a estrutura das árvores de decisão para prover um entendimento
completo de um drift, ou seja, suas principais causas, as regiões afetadas do
modelo e sua severidade. / [en] In a dynamic environment, models tend to perform poorly once the
underlying distribution shifts. This phenomenon is known as Concept Drift.
In the last decade, considerable research effort has been directed towards
developing methods capable of detecting such phenomena early enough so
that models can adapt. However, not so much consideration is given to
explain the drift, and such information can completely change the handling
and understanding of the underlying cause. This dissertation presents a novel
approach, called Interpretable Drift Detector, that goes beyond identifying
drifts in data. It harnesses decision trees’ structure to provide a thorough
understanding of a drift, i.e., its principal causes, the affected regions of a tree model, and its severity. Moreover, besides all information it provides, our
method also outperforms benchmark drift detection methods in terms of false-positive rates and true-positive rates across several different datasets available in the literature.
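The core idea of harnessing a decision tree's structure to localize and quantify a drift can be sketched as follows (an illustrative reconstruction, not the dissertation's actual Interpretable Drift Detector; the synthetic drift and the total-variation severity measure are assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Reference window and a drifted window (feature 0 shifts by +1.5).
X_ref = rng.normal(size=(500, 2))
y_ref = (X_ref[:, 0] > 0).astype(int)
X_new = rng.normal(size=(500, 2))
X_new[:, 0] += 1.5

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_ref, y_ref)

# Compare how samples distribute across the tree's leaves in each window;
# leaves whose occupancy changes most point at the affected regions of the model.
leaves_ref = tree.apply(X_ref)
leaves_new = tree.apply(X_new)
leaf_ids = np.unique(leaves_ref)
p_ref = np.array([(leaves_ref == leaf).mean() for leaf in leaf_ids])
p_new = np.array([(leaves_new == leaf).mean() for leaf in leaf_ids])

shift_per_leaf = np.abs(p_ref - p_new)
severity = 0.5 * shift_per_leaf.sum()           # total variation distance over leaves
most_affected_leaf = leaf_ids[np.argmax(shift_per_leaf)]
```

Because each leaf corresponds to an axis-aligned region of the input space, the most-affected leaf directly names the feature ranges where the drift occurred, which is the interpretability gain the abstract refers to.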
|
233 |
[pt] OTIMIZAÇÃO DE ESTRATÉGIAS DINÂMICAS DE COMERCIALIZAÇÃO DE ENERGIA COM RESTRIÇÕES DE RISCO SOB INCERTEZAS DE CURTO E LONGO PRAZO / [en] RISK-CONSTRAINED OPTIMAL DYNAMIC TRADING STRATEGIES UNDER SHORT- AND LONG-TERM UNCERTAINTIES. ANA SOFIA VIOTTI DAKER ARANHA. 23 November 2021 (has links)
[pt] Mudanças recentes em mercados de energia com alta penetração de fontes
renováveis destacaram a necessidade de estratégias complexas que, além de
maximizar o lucro, proporcionam proteção contra a volatilidade de preços
e incerteza na geração. Neste contexto, este trabalho propõe um modelo
dinâmico para representar a tomada de decisão sequencial no cenário atual.
Ao contrário de trabalhos relatados anteriormente, este método fornece uma
estrutura para considerar as incertezas nos níveis estratégico (longo prazo)
e operacional (curto prazo) simultaneamente. É utilizado um modelo de
programação estocástica multiestágio em que as correlações entre previsões
de vazão, geração renovável, preços spot e preços contratuais são consideradas
por meio de uma árvore de decisão multi-escala. Além disso, a aversão ao risco
do agente comercializador é considerada por meio de restrições intuitivas e
consistentes no tempo. É apresentado um estudo de caso do setor elétrico
brasileiro, no qual dados reais foram utilizados para definir a estratégia
ótima de comercialização de um gerador de energia eólica, condicionada à
evolução futura dos preços de mercado. O modelo fornece ao comercializador
informações úteis, como o montante contratado ideal, além do momento
ótimo de negociação e duração dos contratos. Além disso, o valor desta
solução é demonstrado quando comparado a abordagens estáticas, através de
uma medida de desempenho baseada no equivalente de certo do problema
multiestágio. / [en] Recent market changes in power systems with high renewable energy penetration
highlighted the need for complex profit maximization and protection
against price volatility and generation uncertainty. This work proposes a dynamic
model to represent sequential decision making in this current scenario.
Unlike previously reported works, we contemplate uncertainties in both strategic
(long-term) and operational (short-term) levels, all considered as path-dependent
stochastic processes. The problem is represented as a multistage
stochastic programming model in which the correlations between inflow forecasts,
renewable generation, spot and contract prices are accounted for by
means of interconnected long- and short-term decision trees. Additionally, risk
aversion is considered through intuitive time-consistent constraints. A case
study of the Brazilian power sector is presented, in which real data was used
to define the optimal trading strategy of a wind power generator, conditioned
to the future evolution of market prices. The model provides the trader with
useful information such as the optimal contractual amount, settlement timing,
and term. Furthermore, the value of this solution is demonstrated when compared
to state-of-the-art static approaches using a multistage-based certainty
equivalent performance measure.
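A generic risk-constrained multistage trading formulation of the kind described above can be sketched as follows (an illustrative form, not the thesis' exact model; all symbols are assumptions introduced here):

```latex
\max_{q_1,\dots,q_T}\;
\mathbb{E}\!\left[\sum_{t=1}^{T} \pi_t\!\left(q_t,\, P_t^{\mathrm{spot}},\, P_t^{\mathrm{contract}},\, G_t\right)\right]
\quad \text{s.t.} \quad
\mathrm{CVaR}_{\alpha}\!\left[-\pi_t \,\middle|\, \mathcal{F}_{t-1}\right] \le \gamma_t,
\qquad t = 1,\dots,T,
```

where \(q_t\) is the contracted amount decided at stage \(t\), \(\pi_t\) the stage revenue driven by spot prices \(P_t^{\mathrm{spot}}\), contract prices \(P_t^{\mathrm{contract}}\), and renewable generation \(G_t\). Imposing the CVaR constraint stage by stage, conditioned on the information \(\mathcal{F}_{t-1}\) available when the decision is taken, is one standard way to obtain the time-consistent risk aversion the abstract mentions.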
|
234 |
Maskininlärning som verktyg för att extrahera information om attribut kring bostadsannonser i syfte att maximera försäljningspris / Using machine learning to extract information from real estate listings in order to maximize selling price. Ekeberg, Lukas; Fahnehjelm, Alexander. January 2018 (has links)
The Swedish real estate market has been digitalized over the past decade, with the current practice being to post real estate advertisements online. A question that has arisen is how a seller can optimize their public listing to maximize the selling premium. This paper analyzes the use of three machine learning methods to solve this problem: Linear Regression, Decision Tree Regressor, and Random Forest Regressor. The aim is to retrieve information regarding how certain attributes contribute to the premium value. The dataset used contains apartments sold within the years 2014-2018 in the Östermalm / Djurgården district in Stockholm, Sweden. The resulting models returned an R² value of approximately 0.26 and a Mean Absolute Error of approximately 0.06. While the models did not predict the premium accurately, information could still be extracted from them. In conclusion, a high number of views and a publication made in April provide the best conditions for an advertisement to reach a high selling premium. The seller should try to keep the number of days since publication lower than 15.5 and avoid publishing on a Tuesday. / Den svenska bostadsmarknaden har blivit alltmer digitaliserad under det senaste årtiondet med nuvarande praxis att säljaren publicerar sin bostadsannons online. En fråga som uppstår är hur en säljare kan optimera sin annons för att maximera budpremien. Denna studie analyserar tre maskininlärningsmetoder för att lösa detta problem: Linear Regression, Decision Tree Regressor och Random Forest Regressor. Syftet är att utvinna information om de signifikanta attribut som påverkar budpremien. Det dataset som använts innehåller lägenheter som såldes under åren 2014-2018 i Stockholmsområdet Östermalm / Djurgården. Modellerna som togs fram uppnådde ett R²-värde på approximativt 0.26 och Mean Absolute Error på approximativt 0.06. Signifikant information kunde extraheras från modellerna trots att de inte var exakta i att förutspå budpremien.
Sammanfattningsvis skapar ett stort antal visningar och en publicering i april de bästa förutsättningarna för att uppnå en hög budpremie. Säljaren ska försöka hålla antal dagar sedan publicering under 15.5 dagar och undvika att publicera på tisdagar.
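The model comparison the abstract reports can be sketched on synthetic data (illustrative only; the listing features, the synthetic premium formula, and the hyperparameters are assumptions, not the paper's dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(2)
# Hypothetical listing attributes: views, days since publication, month, weekday.
X = np.column_stack([
    rng.poisson(800, 400),        # number of views
    rng.integers(1, 60, 400),     # days since publication
    rng.integers(1, 13, 400),     # month
    rng.integers(0, 7, 400),      # weekday
])
# Synthetic premium: more views and fewer days on market raise it, plus noise.
y = 0.0002 * X[:, 0] - 0.002 * X[:, 1] + rng.normal(scale=0.05, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
results = {}
for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
                    ("forest", RandomForestRegressor(n_estimators=100, random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = (r2_score(y_te, pred), mean_absolute_error(y_te, pred))
```

Even with modest R², coefficient signs (linear model) and feature importances (tree models) still reveal which attributes push the premium up or down, which is how the paper extracts its advice despite weak predictive accuracy.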
|
235 |
Detection and Classification of Anomalies in Road Traffic using Spark Streaming. Consuegra Rengifo, Nathan Adolfo. January 2018 (has links)
Road traffic control has been around for a long time to guarantee the safety of vehicles and pedestrians. However, anomalies such as accidents or natural disasters cannot be avoided. Therefore, it is important to be prepared as soon as possible to prevent a higher number of human losses. Nevertheless, no existing system is accurate enough to detect and classify anomalies in road traffic in real time. To solve this issue, the following study proposes the training of a machine learning model for detection and classification of anomalies on the highways of Stockholm. Due to the lack of a labeled dataset, the first phase of the work is to detect the different kinds of outliers that can be found and manually label them based on the results of a data exploration study. Datasets containing information regarding accidents and weather are also included to further expand the number of anomalies. All experiments use real-world datasets coming from either the sensors located on the highways of Stockholm or from official accident and weather reports. Then, three models (Decision Trees, Random Forest and Logistic Regression) are trained to detect and classify the outliers. The design of an Apache Spark streaming application that uses the model with the best results is also provided. The outcomes indicate that Logistic Regression is better than the rest but still suffers from the imbalanced nature of the dataset. In the future, this project can be used not only to contribute to future research on similar topics but also to monitor the highways of Stockholm. / Vägtrafikkontroll har funnits länge för att garantera säkerheten hos fordon och fotgängare. Emellertid kan avvikelser som olyckor eller naturkatastrofer inte undvikas. Därför är det viktigt att vara förberedd så snart som möjligt för att förhindra ett större antal mänskliga förluster. Ändå finns det inget system som är noggrant nog för att upptäcka och klassificera avvikelser i vägtrafiken i realtid.
För att lösa detta problem föreslår följande studie utbildningen av en maskininlärningsmodell för detektering och klassificering av anomalier på Stockholms vägar. På grund av bristen på ett märkt dataset är den första fasen av arbetet att upptäcka olika slags avvikare som kan hittas och manuellt märka dem utifrån resultaten av en datautforskningsstudie. Dataset som innehåller information om olyckor och väder ingår också för att ytterligare öka antalet anomalier. Alla experiment använder verkliga dataset från antingen sensorerna på Stockholms vägar eller från officiella olycks- och väderrapporter. Därefter tränas tre modeller (beslutsträd, slumpmässig skog och logistisk regression) för att upptäcka och klassificera avvikarna. Utformningen av en Apache Spark streaming-applikation som använder modellen med de bästa resultaten ges också. Resultaten tyder på att logistisk regression är bättre än resten men fortfarande lider av datasetets obalanserade natur. I framtiden kan detta projekt användas för att inte bara bidra till framtida forskning kring liknande ämnen utan även att övervaka Stockholms vägar.
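The class-imbalance issue the study reports for its winning Logistic Regression model has a standard mitigation, sketched here outside Spark for brevity (illustrative code with synthetic traffic features; the ~5% anomaly rate and the `class_weight="balanced"` choice are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(3)
# Synthetic traffic features (e.g. speed, flow, occupancy); ~5% are anomalies.
n = 2000
X = rng.normal(size=(n, 3))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 2.4).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
# class_weight="balanced" reweights the rare anomaly class so the model
# does not simply predict "normal" everywhere.
clf = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)
f1 = f1_score(y_te, clf.predict(X_te))
```

In a production setting the equivalent knob exists in Spark MLlib's `LogisticRegression` via a per-row weight column; the sketch above only shows the principle.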
|
236 |
Learning to Grasp Unknown Objects using Weighted Random Forest Algorithm from Selective Image and Point Cloud Feature. Iqbal, Md Shahriar. 01 January 2014 (has links)
This work demonstrates an approach to determine the best grasping location on an unknown object using the Weighted Random Forest algorithm. It uses the RGB-D values of an object as input to find a suitable rectangular grasping region as the output. To accomplish this task, it uses a subspace of the most important features from a very high-dimensional feature space that contains both image and point cloud features. Using the most important features in the grasping algorithm makes the system computationally very fast while preserving maximum information gain. In this approach, the Random Forest operates using optimum parameters, e.g. number of trees, number of features at each node, and information gain criteria, which ensures optimized learning with the highest possible accuracy in minimum time in a practical setting. The Weighted Random Forest, chosen over Support Vector Machine (SVM), Decision Tree, and AdaBoost for the implementation of the grasping system, outperforms these machine learning algorithms in both training and testing accuracy and other performance estimates. The grasping system learns a score function and detects the rectangular grasping region by selecting the top rectangle with the largest score. The system is implemented and tested on a Baxter Research Robot with a parallel plate gripper.
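The selective-feature idea, keeping only the most informative dimensions of a large image-plus-point-cloud descriptor, can be sketched as follows (illustrative only; the synthetic features, the importance threshold of five features, and the class weighting are assumptions, not the thesis' pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
# Stand-in for descriptors of candidate grasp rectangles; only the first few
# dimensions carry signal, mimicking a high-dimensional feature space.
X = rng.normal(size=(1000, 20))
y = (X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
full = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                              random_state=0).fit(X_tr, y_tr)

# Keep only the most informative features (impurity-based importance),
# then retrain on the reduced subspace for a faster model.
top = np.argsort(full.feature_importances_)[-5:]
small = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                               random_state=0).fit(X_tr[:, top], y_tr)
acc_small = small.score(X_te[:, top], y_te)
```

At grasp time, each candidate rectangle would be scored with `small.predict_proba` and the highest-scoring rectangle selected, mirroring the top-rectangle selection described in the abstract.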
|
237 |
Inclusive hyper- to dilute-concentrated suspended sediment transport study using modified Rouse model: parametrized power-linear coupled approach using machine learning. Kumar, S.; Singh, H.P.; Balaji, S.; Hanmaiahgari, P.R.; Pu, Jaan H. 31 July 2022 (has links)
The transfer of suspended sediment can range widely from dilute to hyper-concentrated, depending on the local flow and ground conditions. Using the Rouse model and the
Kundu and Ghoshal (2017) model, it is possible to look at the sediment distribution for a range of
hyper-concentrated and diluted flows. According to the Kundu and Ghoshal model, the sediment
flow follows a linear profile for the hyper-concentrated flow regime and a power law applies for the
dilute concentrated flow regime. This paper describes these models and how the Kundu and
Ghoshal parameters (linear-law coefficients and power-law coefficients) are dependent on sediment
flow parameters using machine-learning techniques. The machine-learning models used are
XGBoost Classifier, Linear Regressor (Ridge), Linear Regressor (Bayesian), K Nearest Neighbours,
Decision Tree Regressor, and Support Vector Machines (Regressor). The models were implemented
on Google Colab and the models have been applied to determine the relationship between every
Kundu and Ghoshal parameter with each sediment flow parameter (mean concentration, Rouse
number, and size parameter) for both a linear profile and a power-law profile. The models correctly
calculated the suspended sediment profile for a range of flow conditions (0.268 mm ≤ d50 ≤ 2.29 mm, 0.00105 g/mm³ ≤ particle density ≤ 2.65 g/mm³, 0.197 mm/s ≤ vs ≤ 96 mm/s, 7.16 mm/s ≤ u* ≤ 63.3 mm/s, 0.00042 ≤ c̄ ≤ 0.54), including a range of Rouse numbers (0.0076 ≤ P ≤ 23.5). The models showed particularly good accuracy for testing at low and extremely high concentrations for type I to III profiles.
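The two regimes the abstract describes, a power law for dilute flow and a linear law for hyper-concentrated flow, can be illustrated by fitting both forms to synthetic profiles (the profile shapes, coefficients, and noise levels below are assumptions for illustration, not the Kundu and Ghoshal data):

```python
import numpy as np

rng = np.random.default_rng(5)
zeta = np.linspace(0.05, 0.95, 50)  # normalised height above the bed

# Synthetic dilute-regime profile: power law c = A * ((1-zeta)/zeta)^B
c_dilute = 0.2 * ((1 - zeta) / zeta) ** 0.8 * np.exp(rng.normal(scale=0.02, size=50))
# Synthetic hyper-concentrated profile: linear law c = a + b * zeta
c_hyper = 0.5 - 0.3 * zeta + rng.normal(scale=0.005, size=50)

# Power-law coefficients recovered by linear regression in log space.
xlog = np.log((1 - zeta) / zeta)
B, logA = np.polyfit(xlog, np.log(c_dilute), 1)
# Linear-law coefficients by ordinary least squares.
b, a = np.polyfit(zeta, c_hyper, 1)
```

The machine-learning step in the paper then regresses these fitted coefficients (A, B, a, b) against flow parameters such as mean concentration, Rouse number, and size parameter; the sketch only shows how the coefficients themselves are obtained.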
|
238 |
Assessing Machine Learning Algorithms to Develop Station-based Forecasting Models for Public Transport : Case Study of Bus Network in Stockholm. Movaghar, Mahsa. January 2022 (has links)
Public transport is essential for both residents and city planners because of its environmentally and economically beneficial characteristics. During the past decade, climate change, coupled with fuel and energy crises, has attracted significant attention toward public transportation. The increasing demand for public transport on the one hand, and its complexity on the other, have made optimum network design quite challenging for city planners. Ridership is affected by numerous variables and features, such as space and time. These fluctuations, coupled with inherent uncertainties due to different travel behaviors, make this procedure challenging. Any mismatch between demand and supply can result in great user dissatisfaction and wasted energy. In recent years, thanks to new technologies for recording and storing data and advances in data analysis techniques, finding patterns and predicting ridership based on historical data have improved significantly. This study aims to develop forecasting models by regressing boardings on population, time of day, month, and station. Using the available boarding dataset for blue bus line number 4 in Stockholm, Sweden, seven different machine learning algorithms were assessed for prediction: Multiple Linear Regression, Decision Tree, Random Forest, Bayesian Ridge Regression, Neural Networks, Support Vector Machines, and K-Nearest Neighbors. The models were trained and tested on the dataset from 2012 to 2019, before the start of the pandemic. KNN, with an average R-squared of 0.65 in 10-fold cross-validation, performed best and was accepted as the final model. This model was then used to predict the reduced ridership during the pandemic in 2020 and 2021. The results showed a reduction of 48.93% in 2020 and 82.24% in 2021 for the studied bus line.
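The winning model selection, KNN scored by average R² under 10-fold cross-validation, can be sketched as follows (illustrative only; the synthetic station features, k = 10, and the feature scaling step are assumptions, not the thesis' setup):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
# Hypothetical station-level predictors: population, hour of day, month, station id.
X = np.column_stack([
    rng.normal(10_000, 2_000, 800),   # catchment population
    rng.integers(5, 24, 800),         # hour of day
    rng.integers(1, 13, 800),         # month
    rng.integers(0, 30, 800),         # station id
])
# Synthetic boardings: population effect plus peak-hour surges plus noise.
y = 0.01 * X[:, 0] + 20 * np.isin(X[:, 1], [7, 8, 16, 17]) + rng.normal(scale=20, size=800)

# Scaling matters for distance-based models such as KNN.
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=10))
scores = cross_val_score(model, X, y,
                         cv=KFold(10, shuffle=True, random_state=0), scoring="r2")
mean_r2 = scores.mean()
```

Averaging the R² over the ten folds, as done here with `mean_r2`, is the same acceptance criterion the thesis reports (0.65 for its real data).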
|
239 |
A Cloud-Based Intelligent and Energy Efficient Malware Detection Framework. A Framework for Cloud-Based, Energy Efficient, and Reliable Malware Detection in Real-Time Based on Training SVM, Decision Tree, and Boosting using Specified Heuristics Anomalies of Portable Executable Files. Mirza, Qublai K.A. January 2017 (has links)
The continuing financial and other related losses due to cyber-attacks prove the substantial growth of malware and their lethal proliferation techniques. Every successful malware attack highlights the weaknesses in the defence mechanisms responsible for securing the targeted computer or network. Recent cyber-attacks reveal the presence of sophistication and intelligence in malware behaviour, with the ability to conceal their code and operate within the system autonomously. Conventional detection mechanisms not only lack adequate malware detection capabilities, they also consume a large amount of resources while scanning for malicious entities in the system. Many recent reports have highlighted this issue along with the challenges faced by the alternate solutions and studies conducted in the same area. There is an unprecedented need for a resilient and autonomous solution that takes a proactive approach against modern malware with stealth behaviour. This thesis proposes a multi-aspect solution comprising an intelligent malware detection framework and an energy-efficient hosting model. The malware detection framework is a combination of conventional and novel malware detection techniques. The proposed framework incorporates comprehensive feature heuristics of files generated by a bespoke static feature extraction tool. These comprehensive heuristics are used to train the machine learning algorithms, Support Vector Machine, Decision Tree, and Boosting, to differentiate between clean and malicious files. These two techniques, feature heuristics and machine learning, are combined to form a two-factor detection mechanism. This thesis also presents a cloud-based, energy-efficient, and scalable hosting model, which combines multiple infrastructure components of Amazon Web Services to host the malware detection framework.
This hosting model presents a client-server architecture, where the client is a lightweight service running on the host machine and the server resides in the cloud. The proposed framework and the hosting model were evaluated individually and in combination through specifically designed experiments using separate repositories of clean and malicious files. The experiments were designed to evaluate the malware detection capabilities and energy efficiency while operating within a system. The proposed malware detection framework and the hosting model showed significant improvement in malware detection while consuming very low CPU resources during operation.
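Training the three classifiers the thesis names on static file heuristics, and combining their verdicts, can be sketched as follows (illustrative only; the synthetic PE-style features and the majority-vote combination are assumptions, not the thesis' two-factor mechanism):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Stand-in for static PE-file heuristics (entropy, section counts, import stats...).
X = rng.normal(size=(1200, 10))
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.4, size=1200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "svm": SVC(kernel="rbf", random_state=0),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "boost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
accs = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}

# Simple majority vote across the three detectors (2 of 3 flag => malicious).
votes = sum(m.predict(X_te) for m in models.values())
vote_acc = ((votes >= 2).astype(int) == y_te).mean()
```

In the thesis' client-server layout, only the lightweight feature extraction would run on the host; the trained models and the voting would live on the cloud side, which is what keeps the client's CPU usage low.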
|
240 |
Decision Trees for Classification of Repeated Measurements. Holmberg, Julianna. January 2024 (has links)
Classification of data from repeated measurements is useful in various disciplines, for example medicine. This thesis explores how classification trees (CART) can be used for classifying repeated measures data. The reader is introduced to variations of the CART algorithm which can be used for classifying the data set, and the performance of these algorithms is tested on a data set that can be modelled using bilinear regression. The performance is compared with that of a classification rule based on linear discriminant analysis. It is found that while the performance of the CART algorithm can be satisfactory, using linear discriminant analysis is more reliable for achieving good results. / Klassificering av data från upprepade mätningar är användbart inom olika discipliner, till exempel medicin. Denna uppsats undersöker hur klassificeringsträd (CART) kan användas för att klassificera upprepade mätningar. Läsaren introduceras till varianter av CART-algoritmen som kan användas för att klassificera datamängden, och prestandan för dessa algoritmer testas på en datamängd som kan modelleras med hjälp av bilinjär regression. Prestandan jämförs med en klassificeringsregel baserad på linjär diskriminantanalys. Det har visat sig att även om prestandan för CART-algoritmen kan vara tillfredsställande, är användning av linjär diskriminantanalys mer tillförlitlig för att uppnå goda resultat.
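The CART-versus-LDA comparison can be sketched on synthetic repeated measurements (illustrative only; the two-group trajectory model below is a simple stand-in for the thesis' bilinear-regression data, and all parameters are assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
# Two groups of subjects, each measured at 5 time points; the groups differ
# in the slope of their mean trend over time.
t = np.arange(5)
n = 150
X0 = 1.0 + 0.2 * t + rng.normal(scale=0.5, size=(n, 5))   # class 0 trajectories
X1 = 1.0 + 0.6 * t + rng.normal(scale=0.5, size=(n, 5))   # class 1 trajectories
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

# Each row (one subject's repeated measurements) is treated as a feature vector.
cart_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```

LDA exploits the full covariance structure across time points in one linear rule, whereas CART splits on one measurement at a time, which is one intuition for the thesis' finding that LDA is the more reliable of the two on this kind of data.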
|