161 |
Exploring Alarm Data for Improved Return Prediction in Radios : A Study on Imbalanced Data ClassificationFärenmark, Sofia January 2023 (has links)
The global tech company Ericsson has been tracking the return rate of their products for over 30 years, using it as a key performance indicator (KPI). These KPIs play a critical role in making sound business decisions, identifying areas for improvement, and planning. To enhance the customer experience, the company highly values the ability to predict the number of returns in advance each month. However, predicting returns is a complex problem affected by multiple factors that determine when radios are returned. Analysts at the company have observed indications of a potential correlation between alarm data and the number of returns. This paper aims to address the need for better prediction models to improve return rate forecasting for radios, utilizing alarm data. The alarm data, which is stored in an internal database, includes logs of activated alarms at various sites, along with technical and logistical information about the products, as well as the historical records of returns. The problem is approached as a classification task, where radios are classified as either "return" or "no return" for a specific month, using the alarm dataset as input. However, because the number of returned radios is far smaller than the number of distributed ones, the dataset suffers from a heavy class imbalance. The class imbalance problem has garnered considerable attention in the field of machine learning in recent years, as traditional classification models struggle to identify patterns in the minority class of imbalanced datasets. A method that specifically addresses the class imbalance problem was therefore required to construct an effective prediction model for returns, and this paper adopts a systematic approach inspired by similar problems.
It applies the feature selection methods LASSO and Boruta, along with the resampling technique SMOTE, and evaluates several classifiers, including the support vector machine (SVM), random forest classifier (RFC), decision tree (DT), and a neural network (NN) with class weights, to identify the best-performing model. As accuracy is not a suitable evaluation metric for imbalanced datasets, the AUC and AUPRC values were calculated for all models to assess the impact of feature selection, weights, resampling techniques, and the choice of classifier. The best model was determined to be the NN with weights, achieving a median AUC value of 0.93 and a median AUPRC value of 0.043. Likewise, both the LASSO+SVM+SMOTE and LASSO+RFC+SMOTE models demonstrated similar performance, with median AUC values of 0.92 and 0.93, and median AUPRC values of 0.038 and 0.041, respectively. The baseline AUPRC value for this dataset was 0.005. Furthermore, the results indicated that resampling techniques are necessary for successful classification of the minority class. Thorough pre-processing and a balanced split between the test and training sets are crucial before applying resampling, as this technique is sensitive to noisy data. While feature selection improved performance to some extent, it could also produce unreliable results in the presence of noise. The choice of classifier had a smaller impact on model performance than resampling and feature selection.
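The reported AUPRC baseline of 0.005 is simply the positive-class prevalence, and the "NN with weights" relies on reweighting the loss by class frequency. A minimal sketch of both ideas, with invented counts (5 returns in 1000 radios) and inverse-frequency weighting as one common choice, not necessarily the thesis's exact scheme:

```python
# Sketch: why the AUPRC baseline equals the positive-class prevalence,
# and how class weights can be derived for an imbalanced dataset.
# The counts below are illustrative, not Ericsson's actual data.

def auprc_baseline(n_positive, n_total):
    """A no-skill classifier's precision at any recall equals the prevalence."""
    return n_positive / n_total

def balanced_class_weights(counts):
    """Inverse-frequency weights, as commonly fed to a weighted loss function."""
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# 5 returns among 1000 radios -> prevalence 0.005, matching the reported baseline.
baseline = auprc_baseline(5, 1000)
weights = balanced_class_weights({"return": 5, "no return": 995})
```

With these counts the minority class receives a weight of 100.0 versus roughly 0.5 for the majority class, which is what lets a weighted model attend to the rare "return" label.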
|
162 |
[en] DECISION DIAGRAMS FOR CLASSIFICATION: NEW CONSTRUCTIVE APPROACHES / [pt] DIAGRAMAS DE DECISÃO PARA CLASSIFICAÇÃO: NOVAS ABORDAGENS CONSTRUTIVASPEDRO SARMENTO BARBOSA MARTINS 16 October 2023 (has links)
[en] Decision diagrams are a generalization of decision trees. They have been repeatedly proposed as a supervised classification model for machine learning but have not been widely adopted. The reason appears to be the difficulty of training the model, as the requirement of deciding splits and merging nodes jointly can lead to difficult combinatorial optimization problems. A decision diagram has marked advantages over decision trees because it better models disjoint binary concepts, avoiding the replication of subtrees and thus suffering less sample fragmentation in internal nodes. Because of this, devising an effective construction algorithm is important. In this context, the Optimal Decision Diagram (ODD) algorithm was recently proposed, which formulates the problem of building a diagram as a mixed-integer linear program (MILP), with a warm start provided by a greedy constructive heuristic. Initial experiments have shown that this heuristic can be improved upon, in order to find close-to-optimal solutions more effectively and in turn provide the MILP with a better warm start. In this study, we report improvements to this constructive heuristic: randomizing the split decisions, pruning pure flows (i.e., flows with samples from a single class), and applying bottom-up pruning, which considers the complexity of the model in addition to its accuracy. All proposed improvements have positive effects on accuracy and generalization, as well as on the objective value of the ODD algorithm. The bottom-up pruning strategy, in particular, has a substantial impact on the objective value, and thus on the ability of the MILP solver to find optimal solutions. In addition, we provide experiments on the expressiveness of decision diagrams compared to trees in the context of small Boolean functions in Disjunctive Normal Form (DNF), as well as a web application for the visual exploration of the proposed constructive approaches.
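The subtree-replication argument can be made concrete with a toy example: for f = (x1 AND x2) OR (x3 AND x4), a decision tree must duplicate the (x3 AND x4) test under multiple branches, while a diagram routes those branches to one shared node. A small hand-built sketch (the node layout is ours for illustration, not output of the ODD algorithm):

```python
# Sketch: a decision diagram as a DAG that shares the subgraph a tree would
# replicate. Evaluates f = (x1 and x2) or (x3 and x4).
# Each node is (feature, low_child, high_child); leaves are booleans inline.

nodes = {
    "t34b": ("x4", False, True),    # tail of the (x3 and x4) test
    "t34a": ("x3", False, "t34b"),  # shared subdiagram for (x3 and x4)
    "t2":   ("x2", "t34a", True),   # reached when x1 = 1
    "root": ("x1", "t34a", "t2"),   # both paths reuse "t34a" -> no replication
}

def evaluate(node_id, x):
    feat, lo, hi = nodes[node_id]
    nxt = hi if x[feat] else lo
    return nxt if isinstance(nxt, bool) else evaluate(nxt, x)
```

A tree for the same function would need two copies of the x3/x4 subtree (one under x1 = 0, one under x1 = 1, x2 = 0); here both edges point to the single node "t34a".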
|
163 |
A REVIEW AND ANALYSIS OF THE LINKED DECISIONS IN THE CONFISCATION OF ILLEGALLY TRADED TURTLESSmith, Desiree 14 November 2023 (has links) (PDF)
Over the last few decades, freshwater turtles have become more common in the global illegal wildlife trade because of the growing demand in the pet trade. Illegally traded turtles may be intercepted and taken in by a number of agencies. However, when turtles are confiscated, many uncertainties and risks make releasing them back to the wild difficult. Therefore, we used tools from decision analysis to achieve the following three objectives: (1) to identify points of intervention in the illegal turtle trade using conceptual models, (2) to outline the linked decisions for turtle confiscation and repatriation using decision trees, and (3) to evaluate the decision trees for two example scenarios, one with complete information and one with uncertainty. We used the wood turtle (Glyptemys insculpta) as a case study, a species of conservation concern due in part to illegal wildlife trafficking. We conducted informational interviews of biologists, law enforcement, land managers, and zoo staff, whom we refer to as decision makers. Interviews revealed that decisions regarding the disposition of confiscated turtles are complicated by uncertainty in disease status and potential differences between origin and confiscation locations. Decision makers who handle confiscated turtles also recognize that their decisions are linked, where the linkages rely on personal contacts. In evaluating our decision trees, we found that despite different amounts and kinds of uncertainty, releasing the confiscated wood turtles to the wild provided the highest conservation value. Collectively, our research shows how the use of decision trees can help improve decision making in the face of uncertainty.
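The core calculation behind evaluating such decision trees is an expected-value comparison across branches. A minimal sketch with hypothetical probabilities and conservation scores (the thesis's actual values come from interviews and scenario analysis, not these placeholders):

```python
# Sketch of the expected-value logic behind a decision tree for confiscated
# turtles: compare release vs. captivity under uncertain disease status.
# All probabilities and conservation scores below are hypothetical.

def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs for one decision branch."""
    return sum(p * v for p, v in outcomes)

p_diseased = 0.2  # uncertainty about disease status (invented)
release = expected_value([(1 - p_diseased, 1.0),   # healthy turtle back in the wild
                          (p_diseased, -0.5)])     # risk of spreading disease
captivity = expected_value([(1.0, 0.3)])           # certain but lower value

best = "release" if release > captivity else "captivity"
```

With these placeholder numbers release wins (0.7 vs. 0.3), mirroring the abstract's finding that release tends to provide the highest conservation value even under uncertainty; in practice the conclusion depends on the elicited values.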
|
164 |
Tillämpning av maskininlärning för att införa automatisk adaptiv uppvärmning genom en studie på KTH Live-In Labs lägenheterVik, Emil, Åsenius, Ingrid January 2020 (has links)
The purpose of this study is to investigate whether it is possible to decrease Sweden's energy consumption through adaptive heating that uses climate data and machine learning to detect occupancy in apartments. The study was carried out using environmental data from one of the KTH Live-In Lab apartments. The data was first used to investigate the possibility of detecting occupancy through machine learning and was then used as input to an adaptive heating model to investigate the potential benefits for heating energy consumption and costs. The results of the study show that occupancy can be detected using environmental data, though not with 100% accuracy. They also show that the features with the greatest impact on detecting occupancy are light and carbon dioxide, and that the best-performing machine learning algorithm for the dataset used is the decision tree. The potential energy savings through adaptive heating were estimated to be up to 10.1%. The final part of the paper discusses how a value-creating service could be built around adaptive heating and its potential to reach the market.
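The kind of rule a decision tree ends up learning here is a pair of threshold tests on the most informative features. A sketch with invented thresholds on light and CO2, purely to illustrate the shape of such a model (the thesis learned its splits from the Live-In Lab data):

```python
# Sketch: threshold rules of the kind a decision tree might learn for
# occupancy detection from light and CO2 readings. Both thresholds are
# invented for illustration.

def occupied(light_lux, co2_ppm):
    if light_lux > 150:       # strong light suggests someone is home
        return True
    return co2_ppm > 800      # elevated CO2 as a fallback indicator

# (light, CO2) samples: bright room, dark room with high CO2, empty room
samples = [(300, 500), (50, 1000), (20, 450)]
predictions = [occupied(l, c) for l, c in samples]  # [True, True, False]
```

Note how the second sample is caught only by the CO2 branch, which matches the abstract's point that light and carbon dioxide together drive detection.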
|
165 |
A Data Analytics Framework for Regional Voltage ControlYang, Duotong 16 August 2017 (has links)
Modern power grids are some of the largest and most complex engineered systems. Due to economic competition and deregulation, power systems are operated closer to their security limits. When the system is operating under a heavy loading condition, an unstable voltage condition may cause a cascading outage. Voltage fluctuations are presently being further aggravated by the increasing integration of utility-scale renewable energy sources. In this regard, a fast-response, reliable voltage control approach is indispensable.
The continuing success of synchrophasor technology has ushered in new subdomains of power system applications for real-time situational awareness, online decision support, and offline system diagnostics. The primary objective of this dissertation is to develop a data-analytics-based framework for regional voltage control utilizing high-speed data streams delivered from synchronized phasor measurement units. The dissertation focuses on the following three studies: The first is centered on the development of decision-tree-based voltage security assessment and control. The second proposes an adaptive decision tree scheme that uses online ensemble learning to update the decision model in real time. A system network partition approach is introduced in the last study; its aim is to reduce the size of the training sample database and the number of control candidates for each regional voltage controller. The methodologies proposed in this dissertation are evaluated based on an open source software framework. / Ph. D. / Modern power grids are some of the largest and most complex engineered systems. When the system is heavily loaded, a small contingency may cause a large system blackout. In this regard, a fast-response, reliable control approach is indispensable. Voltage is one of the most important metrics for indicating the system condition. This dissertation develops a cost-effective control method to secure the power system based on real-time voltage measurements. The proposed method is built on an open source framework.
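The online-ensemble idea in the second study can be sketched as weighted majority voting, where members that misclassify newly arriving labelled cases have their weights decayed. This is a generic online-learning pattern, not the dissertation's exact update rule:

```python
# Sketch: a weighted-majority online ensemble in the spirit of the adaptive
# decision-tree scheme. Member models vote on "secure"/"insecure", and the
# weights of wrong voters decay as labelled synchrophasor cases stream in.

class OnlineEnsemble:
    def __init__(self, members, beta=0.5):
        self.members = members                      # name -> predict(sample)
        self.weights = {name: 1.0 for name in members}
        self.beta = beta                            # decay factor for mistakes

    def predict(self, sample):
        votes = {}
        for name, model in self.members.items():
            label = model(sample)
            votes[label] = votes.get(label, 0.0) + self.weights[name]
        return max(votes, key=votes.get)

    def update(self, sample, truth):
        for name, model in self.members.items():
            if model(sample) != truth:
                self.weights[name] *= self.beta     # penalise wrong members
```

Usage: after one "insecure" case arrives, a member that always says "secure" loses half its weight, so the ensemble's next vote flips toward the member that got it right.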
|
166 |
Using random forest and decision tree models for a new vehicle prediction approach in computational toxicologyMistry, Pritesh, Neagu, Daniel, Trundle, Paul R., Vessey, J.D. 22 October 2015 (has links)
Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means for vehicle selection without experimental trial would therefore be of benefit in saving time and money for the industry. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work on using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reduce a drug’s toxicity. Using data acquired from the National Institutes of Health’s (NIH) Developmental Therapeutics Program (DTP), we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80% using random forest models, whilst the decision tree models produce accuracies in the 70% region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.
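The AUC used to compare toxicity profiles has a convenient direct form: the probability that a randomly chosen positive case is scored above a randomly chosen negative one (the Mann-Whitney interpretation). A sketch with made-up scores, not DTP data:

```python
# Sketch: AUC computed directly from scores via the Mann-Whitney
# interpretation -- the fraction of positive/negative pairs in which the
# positive is ranked higher (ties count half). Scores below are invented.

def auc(pos_scores, neg_scores):
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores
               for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

score = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])  # 8 of 9 pairs correctly ordered
```

This pairwise form is handy for small datasets because it needs no explicit curve construction, only the two score lists.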
|
167 |
The Foundation of Pattern Structures and their ApplicationsLumpe, Lars 06 October 2021 (has links)
This thesis is divided into a theoretical part, aimed at developing statements around the newly introduced concept of pattern morphisms, and a practical part, where we present use cases of pattern structures.
A first insight of our work clarifies the facts on projections of pattern structures. We discovered that a projection of a pattern structure does not always lead again to a pattern structure.
A solution to this problem, and one of the most important points of this thesis, is the introduction of pattern morphisms in Chapter 4. Pattern morphisms make it possible to describe relationships between pattern structures, and thus enable a deeper understanding of pattern structures in general. They also provide the means to describe projections of pattern structures that lead to pattern structures again. In Chapter 5 and Chapter 6, we looked at the impact of morphisms between pattern structures on concept lattices and on their representations, and thus clarified the theoretical background of existing research in this field.
The application part reveals that random forests can be described through pattern structures, which constitutes another central achievement of our work.
In order to demonstrate the practical relevance of our findings, we include a use case in which this result is used to build an algorithm that solves a real-world classification problem for red wines. The random forest achieves better prediction accuracy, but the high interpretability of our algorithm makes it valuable.
Another approach to the red wine classification problem is presented in Chapter 8, where, starting from an elementary pattern structure, we built a classification model that yielded good results.
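An elementary pattern structure over numeric wine features can be built from intervals, where the meet (similarity) of two descriptions is the smallest interval description subsuming both. A sketch of that operation (the feature names and values are illustrative, not the thesis's dataset):

```python
# Sketch: the meet operation of an interval pattern structure, the kind of
# elementary construction a red-wine classifier could be built on.
# Each description maps a feature to a (low, high) interval.

def meet(d1, d2):
    """Smallest interval description subsuming both arguments."""
    return {f: (min(d1[f][0], d2[f][0]), max(d1[f][1], d2[f][1]))
            for f in d1}

wine_a = {"alcohol": (9.4, 9.4), "ph": (3.51, 3.51)}   # hypothetical samples
wine_b = {"alcohol": (9.8, 9.8), "ph": (3.20, 3.20)}
pattern = meet(wine_a, wine_b)  # {'alcohol': (9.4, 9.8), 'ph': (3.2, 3.51)}
```

Repeatedly applying `meet` over a set of samples yields the pattern shared by all of them, which is the basic step in pattern-structure-based classification.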
|
168 |
A Deep Learning Based Pipeline for Image Grading of Diabetic RetinopathyWang, Yu 21 June 2018 (has links)
Diabetic Retinopathy (DR) is one of the principal sources of blindness due to diabetes mellitus. It can be identified by lesions of the retina, namely microaneurysms, hemorrhages, and exudates. DR can be effectively prevented or delayed if discovered early enough and well-managed. Prior studies on diabetic retinopathy typically extract features manually, which is time-consuming and not accurate. In this research, we propose a research framework using advanced retina image processing, deep learning, and a boosting algorithm for high-performance DR grading. First, we preprocess the retina image datasets to highlight signs of DR, then apply a convolutional neural network to extract features of the retina images, and finally apply a boosting tree algorithm to make a prediction based on the extracted features. Experimental results show that our pipeline has excellent performance when grading diabetic retinopathy images, as evidenced by scores on both the Kaggle dataset and the IDRiD dataset. / Master of Science / Diabetes is a disease in which insulin cannot work properly, leading to long-term high blood sugar levels. Diabetic Retinopathy (DR), a result of diabetes mellitus, is one of the leading causes of blindness. It can be identified by lesions on the surface of the retina. DR can be effectively prevented or delayed if discovered early enough and well-managed. Prior image processing studies of diabetic retinopathy typically detect features manually, like retinal lesions, but are time-consuming and not accurate. In this research, we propose a framework using advanced retina image processing, deep learning, and a boosting decision tree algorithm for high-performance DR grading. Deep learning is a method that can be used to extract features of an image. A boosting decision tree is a method widely used in classification tasks. We preprocess the retina image datasets to highlight signs of DR, then use deep learning to extract features of the retina images.
Then, we apply a boosting decision tree algorithm to make a prediction based on extracted features. The results of experiments show that our pipeline has excellent performance when grading the diabetic retinopathy score for both Kaggle and IDRiD datasets.
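The three pipeline stages compose naturally: preprocessing, feature extraction, boosted grading. A stand-in sketch where every stage body is a placeholder (the thesis uses real image filters, a CNN embedding, and a gradient boosting library rather than these toy statistics and stumps):

```python
# Sketch of the pipeline's stage composition. Stage bodies are deliberate
# stand-ins: normalisation for preprocessing, two summary statistics for the
# CNN embedding, and two hypothetical stumps for the boosted tree ensemble.

def preprocess(image):
    # placeholder for contrast enhancement that highlights lesions
    peak = max(image) or 1
    return [px / peak for px in image]

def extract_features(image):
    # a CNN would produce a learned embedding; here: mean and dynamic range
    return [sum(image) / len(image), max(image) - min(image)]

def grade(features, trees):
    # boosting sums the outputs of many small trees
    return sum(tree(features) for tree in trees)

trees = [lambda f: 1.0 if f[0] > 0.5 else 0.0,   # invented stumps
         lambda f: 1.0 if f[1] > 0.8 else 0.0]
severity = grade(extract_features(preprocess([10, 200, 255, 30])), trees)
```

The point is only the data flow: raw pixels in, normalised pixels, fixed-length feature vector, summed tree scores out as a severity grade.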
|
169 |
Implementation of decision trees for embedded systemsBadr, Bashar January 2014 (has links)
This research work develops real-time incremental learning decision tree solutions suitable for real-time embedded systems, by virtue of having both a defined memory requirement and an upper bound on the computation time per training vector. In addition, the work provides embedded systems with the capability of rapid processing and training on streamed data problems, and adopts electronic hardware solutions to improve the performance of the developed algorithm. Two novel decision tree approaches, namely the Multi-Dimensional Frequency Table (MDFT) and the Hashed Frequency Table Decision Tree (HFTDT), represent the core of this research work. Both methods successfully incorporate a frequency table technique to produce a complete decision tree. The MDFT and HFTDT learning methods were designed with the ability to generate application-specific code for both training and classification purposes, according to the requirements of the targeted application. The MDFT allows the memory architecture to be specified statically before learning takes place, within a deterministic execution time. The HFTDT method is a development of the MDFT in which a reduction in the memory requirements is achieved within a deterministic execution time. The HFTDT achieved low memory usage when compared to existing decision tree methods, and hardware acceleration improved the performance by up to 10 times in terms of execution time.
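The frequency-table idea that gives such methods their fixed memory bound and constant-time-per-vector training can be sketched as quantising each training vector into a table cell of per-class counts. The bin layout and dictionary storage below are our simplification, not the thesis's static memory architecture:

```python
# Sketch: incremental training via a frequency table. Each training vector
# is quantised into a cell; classification returns the cell's majority
# class. Features are assumed pre-scaled to [0, 1).

class FrequencyTableClassifier:
    def __init__(self, bins_per_feature):
        self.bins = bins_per_feature
        self.table = {}  # cell index tuple -> {label: count}

    def _cell(self, x):
        return tuple(min(int(v * self.bins), self.bins - 1) for v in x)

    def train(self, x, label):
        # incremental: one vector at a time, bounded work per update
        counts = self.table.setdefault(self._cell(x), {})
        counts[label] = counts.get(label, 0) + 1

    def classify(self, x, default=None):
        counts = self.table.get(self._cell(x))
        return max(counts, key=counts.get) if counts else default
```

Because a cell update touches only one counter, both the per-vector training time and (with a fixed grid) the memory footprint are bounded up front, which is the property the embedded setting needs.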
|
170 |
官員職等陞遷分類預測之研究 / Classification prediction on government official’s rank promotion賴隆平, Lai, Long Ping Unknown Date (has links)
Civil service promotion is a highly complex process that conceals many recurring patterns. The relationships between senior officials and their subordinates, and among civil servants generally, are as intricate as a spider's web, and individual promotions may reflect factional struggles or the mentoring of junior staff. The publicly available Presidential Office Gazette (presidential orders) records the appointment history of all civil servants, including promotions, appointments, and dismissals; each record also lists the agency, unit, job title, and rank, making the data suitable for research of various kinds.
This study organizes these records into a promotion-sequence data model and applies two data mining algorithms, the support vector machine (SVM) and the decision tree, using human resources domain knowledge to identify the more influential attributes and to design the experimental models. Multiple models and multiple datasets are used in the experiments; overall average prediction results and charts present the outcome for each class, results computed from different attribute sets are compared to assess their reasonableness, and the method's validity and feasibility are finally evaluated against the relevant figures.
For both the SVM and the decision tree, prediction performance improves as the amount of training data grows, as with models in general, and the mined supervisory-position attributes and key attributes agree well with the logic of personnel promotion. Each classifier has its strengths, but overall the SVM performs slightly better. The SVM, however, requires that low-influence attributes be removed beforehand; otherwise they distort the hyperplane computation and degrade the predictions. The decision tree has no such limitation and is more widely applicable, since the type of each attribute can be declared in order to run classification experiments on different kinds of attribute data.
The prediction accuracy obtained with the SVM and the decision tree is roughly 77% to 82%, indicating that the promotion of mid- and high-level civil servants follows discernible patterns: it is governed by institutional norms and is stable rather than arbitrary. These data mining results can inform the public sector's human resource management, organizational development, promotion planning, and organizational downsizing, serving as reference information for adjustment and design; in addition, given the relevant attributes as input, the model can help serving civil servants estimate their promotion prospects for career planning.
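The attribute-ranking step that both classifiers depend on can be illustrated with a decision tree's information-gain criterion on toy promotion records (the agency and rank values below are invented; the thesis mines real presidential-order data):

```python
# Sketch: ranking categorical attributes by information gain, the criterion
# a decision tree uses to pick influential attributes. Records are invented.
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in {l: labels.count(l) for l in set(labels)}.values())

def info_gain(records, attr, labels):
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[attr], []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

records = [{"agency": "A", "rank": 9}, {"agency": "A", "rank": 9},
           {"agency": "B", "rank": 9}, {"agency": "B", "rank": 7}]
labels = ["promoted", "promoted", "not", "not"]
best = max(["agency", "rank"], key=lambda a: info_gain(records, a, labels))
```

In this toy dataset "agency" separates the classes perfectly (gain 1.0 bit) while "rank" does not, so the tree would split on it first; the same calculation, at scale, is what surfaces the influential promotion attributes.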
|