251 |
[en] A METHOD FOR INTERPRETING CONCEPT DRIFTS IN A STREAMING ENVIRONMENT / [pt] UM MÉTODO PARA INTERPRETAÇÃO DE MUDANÇAS DE REGIME EM UM AMBIENTE DE STREAMING
JOAO GUILHERME MATTOS DE O SANTOS, 10 August 2021
[pt] Em ambientes dinâmicos, os modelos de dados tendem a ter desempenho
insatisfatório uma vez que a distribuição subjacente dos dados muda. Este
fenômeno é conhecido como Concept Drift. Em relação a este tema, muito
esforço tem sido direcionado ao desenvolvimento de métodos capazes de
detectar tais fenômenos com antecedência suficiente para que os modelos
possam se adaptar. No entanto, explicar o que levou ao drift e entender
suas consequências ao modelo têm sido pouco explorado pela academia.
Tais informações podem mudar completamente a forma como adaptamos os
modelos. Esta dissertação apresenta uma nova abordagem, chamada Detector
de Drift Interpretável, que vai além da identificação de desvios nos dados. Ele
aproveita a estrutura das árvores de decisão para prover um entendimento
completo de um drift, ou seja, suas principais causas, as regiões afetadas do
modelo e sua severidade. / [en] In a dynamic environment, models tend to perform poorly once the
underlying distribution shifts. This phenomenon is known as Concept Drift.
In the last decade, considerable research effort has been directed towards
developing methods capable of detecting such phenomena early enough so
that models can adapt. However, much less attention has been given to explaining the drift,
even though such information can completely change how the underlying cause is handled
and understood. This dissertation presents a novel
approach, called Interpretable Drift Detector, that goes beyond identifying
drifts in data. It harnesses decision trees’ structure to provide a thorough
understanding of a drift, i.e., its principal causes, the affected regions of a tree model, and its severity. Moreover, besides all information it provides, our
method also outperforms benchmark drift detection methods in terms of false-positive rates and true-positive rates across several different datasets available in the literature.
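The abstract does not spell out the algorithm, so the short Python sketch below only illustrates one generic way a decision tree's structure can localize drift: fit a tree on a reference window, flag leaves whose class proportions shift in a newer window, and use the flagged fraction as a crude severity score. The function names, threshold, and synthetic data are assumptions for illustration, not the thesis's Interpretable Drift Detector.

```python
# Hypothetical sketch: localize drift by comparing class proportions per
# decision-tree leaf between a reference window and a recent window.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_class_rates(tree, X, y, n_classes):
    leaves = tree.apply(X)                      # leaf id for every sample
    rates = {}
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        counts = np.bincount(y[mask], minlength=n_classes)
        rates[leaf] = (counts / counts.sum(), mask.sum())
    return rates

def drift_report(tree, X_ref, y_ref, X_new, y_new, n_classes, tol=0.15):
    ref = leaf_class_rates(tree, X_ref, y_ref, n_classes)
    new = leaf_class_rates(tree, X_new, y_new, n_classes)
    affected = []
    for leaf, (p_ref, n_ref) in ref.items():
        p_new, n_new = new.get(leaf, (p_ref, 0))
        shift = np.abs(p_ref - p_new).max()     # largest per-class change in this region
        if shift > tol:
            affected.append((leaf, shift, n_new))
    severity = len(affected) / max(len(ref), 1)  # crude severity: share of affected leaves
    return affected, severity

# Usage: fit on the reference window, then inspect a newer window.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(500, 3)); y_ref = (X_ref[:, 0] > 0).astype(int)
X_new = rng.normal(size=(500, 3)); y_new = (X_new[:, 1] > 0).astype(int)  # drifted rule
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_ref, y_ref)
print(drift_report(tree, X_ref, y_ref, X_new, y_new, n_classes=2))
```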
|
252 |
[pt] OTIMIZAÇÃO DE ESTRATÉGIAS DINÂMICAS DE COMERCIALIZAÇÃO DE ENERGIA COM RESTRIÇÕES DE RISCO SOB INCERTEZAS DE CURTO E LONGO PRAZO / [en] RISK-CONSTRAINED OPTIMAL DYNAMIC TRADING STRATEGIES UNDER SHORT- AND LONG-TERM UNCERTAINTIES
ANA SOFIA VIOTTI DAKER ARANHA, 23 November 2021
[pt] Mudanças recentes em mercados de energia com alta penetração de fontes
renováveis destacaram a necessidade de estratégias complexas que, além de
maximizar o lucro, proporcionam proteção contra a volatilidade de preços
e incerteza na geração. Neste contexto, este trabalho propõe um modelo
dinâmico para representar a tomada de decisão sequencial no cenário atual.
Ao contrário de trabalhos relatados anteriormente, este método fornece uma
estrutura para considerar as incertezas nos níveis estratégico (longo prazo)
e operacional (curto prazo) simultaneamente. É utilizado um modelo de
programação estocástica multiestágio em que as correlações entre previsões
de vazão, geração renovável, preços spot e preços contratuais são consideradas
por meio de uma árvore de decisão multi-escala. Além disso, a aversão ao risco
do agente comercializador é considerada por meio de restrições intuitivas e
consistentes no tempo. É apresentado um estudo de caso do setor elétrico
brasileiro, no qual dados reais foram utilizados para definir a estratégia
ótima de comercialização de um gerador de energia eólica, condicionada à
evolução futura dos preços de mercado. O modelo fornece ao comercializador
informações úteis, como o montante contratado ideal, além do momento
ótimo de negociação e duração dos contratos. Além disso, o valor desta
solução é demonstrado quando comparado a abordagens estáticas, através de
uma medida de desempenho baseada no equivalente de certo do problema
multiestágio. / [en] Recent market changes in power systems with high renewable energy penetration
highlighted the need for complex strategies that combine profit maximization with protection
against price volatility and generation uncertainty. This work proposes a dynamic
model to represent sequential decision making in this current scenario.
Unlike previously reported works, we contemplate uncertainties in both strategic
(long-term) and operational (short-term) levels, all considered as path-dependent
stochastic processes. The problem is represented as a multistage
stochastic programming model in which the correlations between inflow forecasts,
renewable generation, spot and contract prices are accounted for by
means of interconnected long- and short-term decision trees. Additionally, risk
aversion is considered through intuitive time-consistent constraints. A case
study of the Brazilian power sector is presented, in which real data was used
to define the optimal trading strategy of a wind power generator, conditioned
to the future evolution of market prices. The model provides the trader with
useful information such as the optimal contractual amount, settlement timing,
and term. Furthermore, the value of this solution is demonstrated when compared
to state-of-the-art static approaches using a multistage-based certainty
equivalent performance measure.
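As a rough illustration of the risk-constrained trading idea only (a single-stage toy with made-up numbers, not the multistage model with interconnected long- and short-term trees described above), the sketch below picks a forward-contract quantity that maximizes expected profit over scenarios subject to a CVaR cap on losses:

```python
# Illustrative sketch (not the thesis model): choose a forward-contract quantity
# that maximizes expected profit subject to a CVaR limit on losses, using
# scenario enumeration and a simple grid search.
import numpy as np

rng = np.random.default_rng(1)
n_scen = 1000
gen = rng.uniform(20, 60, n_scen)        # wind generation per scenario (MWh), assumed
spot = rng.uniform(10, 120, n_scen)      # spot price per scenario ($/MWh), assumed
contract_price = 55.0                    # assumed fixed contract price

def profit(q):
    # Sell q at the contract price; settle the imbalance (gen - q) at the spot price.
    return q * contract_price + (gen - q) * spot

def cvar(losses, alpha=0.95):
    # Average loss in the worst (1 - alpha) tail of scenarios.
    var = np.quantile(losses, alpha)
    tail = losses[losses >= var]
    return tail.mean() if tail.size else var

best = None
for q in np.linspace(0, 60, 121):
    p = profit(q)
    if cvar(-p) <= -1000.0:              # tail-average profit must stay above 1000
        exp_p = p.mean()
        if best is None or exp_p > best[1]:
            best = (q, exp_p)
print("contracted quantity, expected profit:", best)
```

In this toy setting the contract price lies below the expected spot price, so the search settles on the smallest contracted quantity that still satisfies the risk cap, mirroring the hedging trade-off the abstract describes.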
|
253 |
Maskininlärning som verktyg för att extrahera information om attribut kring bostadsannonser i syfte att maximera försäljningspris / Using machine learning to extract information from real estate listings in order to maximize selling price
Ekeberg, Lukas; Fahnehjelm, Alexander, January 2018
The Swedish real estate market has been digitalized over the past decade, with the current practice being to post real estate advertisements online. A question that has arisen is how a seller can optimize their public listing to maximize the selling premium. This paper analyzes the use of three machine learning methods to solve this problem: Linear Regression, Decision Tree Regressor and Random Forest Regressor. The aim is to retrieve information regarding how certain attributes contribute to the premium value. The dataset used contains apartments sold within the years 2014-2018 in the Östermalm / Djurgården district in Stockholm, Sweden. The resulting models returned an R²-value of approx. 0.26 and a Mean Absolute Error of approx. 0.06. While the models were not accurate at predicting the premium, information could still be extracted from them. In conclusion, a high number of views and a publication made in April provide the best conditions for an advertisement to reach a high selling premium. The seller should try to keep the number of days since publication below 15.5 and avoid publishing on a Tuesday. / Den svenska bostadsmarknaden har blivit alltmer digitaliserad under det senaste årtiondet med nuvarande praxis att säljaren publicerar sin bostadsannons online. En fråga som uppstår är hur en säljare kan optimera sin annons för att maximera budpremie. Denna studie analyserar tre maskininlärningsmetoder för att lösa detta problem: Linear Regression, Decision Tree Regressor och Random Forest Regressor. Syftet är att utvinna information om de signifikanta attribut som påverkar budpremien. Det dataset som använts innehåller lägenheter som såldes under åren 2014-2018 i Stockholmsområdet Östermalm / Djurgården. Modellerna som togs fram uppnådde ett R²-värde på approximativt 0.26 och Mean Absolute Error på approximativt 0.06. Signifikant information kunde extraheras från modellerna trots att de inte var exakta i att förutspå budpremien. Sammanfattningsvis skapar ett stort antal visningar och en publicering i april de bästa förutsättningarna för att uppnå en hög budpremie. Säljaren ska försöka hålla antal dagar sedan publicering under 15.5 dagar och undvika att publicera på tisdagar.
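A minimal sketch of the model comparison described above, using synthetic listing features rather than the thesis's Östermalm/Djurgården data (the feature choices and coefficients are assumptions):

```python
# Hedged sketch: compare the three regressors on synthetic listing features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 500, n),          # number of views of the listing
    rng.integers(1, 13, n),           # publication month
    rng.integers(0, 7, n),            # publication weekday
    rng.uniform(0, 40, n),            # days since publication
])
premium = 0.04 + 0.0002 * X[:, 0] - 0.002 * X[:, 3] + rng.normal(0, 0.05, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, premium, random_state=0)
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, "R2:", round(r2_score(y_te, pred), 3),
          "MAE:", round(mean_absolute_error(y_te, pred), 3))
```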
|
254 |
Detection and Classification of Anomalies in Road Traffic using Spark Streaming
Consuegra Rengifo, Nathan Adolfo, January 2018
Road traffic control has existed for a long time to guarantee the safety of vehicles and pedestrians. However, anomalies such as accidents or natural disasters cannot be avoided. Therefore, it is important to be prepared as early as possible to prevent a higher number of human losses. Nevertheless, no existing system is accurate enough to detect and classify anomalies in road traffic in real time. To address this issue, the following study proposes training a machine learning model for detection and classification of anomalies on the highways of Stockholm. Due to the lack of a labeled dataset, the first phase of the work is to detect the different kinds of outliers that can be found and manually label them based on the results of a data exploration study. Datasets containing information on accidents and weather are also included to further expand the number of anomalies. All experiments use real-world datasets coming either from the sensors located on the highways of Stockholm or from official accident and weather reports. Then, three models (Decision Trees, Random Forest and Logistic Regression) are trained to detect and classify the outliers. The design of an Apache Spark streaming application that uses the best-performing model is also provided. The outcomes indicate that Logistic Regression performs better than the rest but still suffers from the imbalanced nature of the dataset. In the future, this project can not only contribute to future research on similar topics but also be used to monitor the highways of Stockholm. / Vägtrafikkontroll har funnits länge för att garantera säkerheten hos fordon och fotgängare. Emellertid kan avvikelser som olyckor eller naturkatastrofer inte undvikas. Därför är det viktigt att förberedas så snart som möjligt för att förhindra ett större antal mänskliga förluster. Ändå finns det inget system som är noggrant nog att upptäcka och klassificera avvikelser från vägtrafiken i realtid. För att lösa detta problem föreslår följande studie utbildningen av en maskininlärningsmodell för detektering och klassificering av anomalier på Stockholms vägar. På grund av bristen på en märkt dataset är den första fasen av arbetet att upptäcka olika slags avvikare som kan hittas och manuellt märka dem utifrån resultaten av en datautforskningsstudie. Dataset som innehåller information om olyckor och väder ingår också för att ytterligare öka antalet anomalier. Alla experiment använder realtidsdataset från antingen sensorerna på Stockholms vägar eller från officiella olyckor och väderrapporter. Därefter utbildas tre modeller (beslutsträd, slumpmässig skog och logistisk regression) för att upptäcka och klassificera outliersna. Utformningen av en Apache Spark streaming-applikation som använder modellen med de bästa resultaten ges också. Resultaten tyder på att logistisk regression är bättre än resten men fortfarande lider av datasetets obalanserade natur. I framtiden kan detta projekt användas för att inte bara bidra till framtida forskning kring liknande ämnen utan även att övervaka Stockholms vägar.
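A hedged sketch of the deployment idea mentioned above: applying a pre-trained Spark ML pipeline to a stream of traffic sensor readings with Structured Streaming. The schema, paths, and output column are assumptions; the abstract does not give the application's actual details.

```python
# Hypothetical sketch: score a stream of traffic sensor readings with a
# pre-trained Spark ML model. Paths, schema, and model are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, DoubleType, StringType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("traffic-anomaly-scoring").getOrCreate()

schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("speed", DoubleType()),
    StructField("flow", DoubleType()),
    StructField("occupancy", DoubleType()),
])

# Pipeline trained offline (e.g. the logistic-regression model reported as best).
model = PipelineModel.load("hdfs:///models/traffic_logreg")   # hypothetical path

readings = (spark.readStream
            .schema(schema)
            .json("hdfs:///streams/traffic"))                  # hypothetical source

# "prediction" is the default Spark ML output column; it may differ in practice.
scored = model.transform(readings).select("sensor_id", "prediction")

query = (scored.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```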
|
255 |
Learning to Grasp Unknown Objects using Weighted Random Forest Algorithm from Selective Image and Point Cloud Feature
Iqbal, Md Shahriar, 01 January 2014
This work demonstrates an approach to determining the best grasping location on an unknown object using a Weighted Random Forest algorithm. It uses the RGB-D values of an object as input to find a suitable rectangular grasping region as output. To accomplish this task, it uses a subspace of the most important features from a very high-dimensional feature space that contains both image and point cloud features. Using only the most important features in the grasping algorithm enables the system to be computationally very fast while preserving maximum information gain. In this approach, the Random Forest operates with optimum parameters, e.g. the number of trees, the number of features at each node, and the information gain criterion, ensuring optimized learning with the highest possible accuracy in minimum time in an advanced practical setting. The Weighted Random Forest, chosen over Support Vector Machine (SVM), Decision Tree and AdaBoost for the implementation of the grasping system, outperforms these machine learning algorithms in both training and testing accuracy as well as other performance estimates. The grasping system, which learns a score function, detects the rectangular grasping region by selecting the top rectangle with the largest score. The system is implemented and tested on a Baxter Research Robot with a Parallel Plate Gripper in action.
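A small illustrative sketch of one possible reading of this approach: score candidate grasp rectangles with a class-weighted random forest and execute the top-scoring one. The feature layout, weighting scheme, and data are placeholders, not the thesis's implementation.

```python
# Illustrative sketch: rank candidate grasp rectangles with a weighted random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n, d = 5000, 20                        # labelled rectangles x selected features
X = rng.normal(size=(n, d))            # image + point-cloud features per rectangle (assumed)
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 1, n) > 1.5).astype(int)  # toy "graspable" label

forest = RandomForestClassifier(
    n_estimators=200, max_features="sqrt",
    class_weight="balanced",           # one reading of "weighted": rare graspable class counts more
    random_state=0,
).fit(X, y)

# At run time: extract features for each candidate rectangle on the new object,
# score them, and execute the top-ranked grasp.
candidates = rng.normal(size=(300, d))
scores = forest.predict_proba(candidates)[:, 1]
best_rectangle = int(np.argmax(scores))
print("chosen rectangle:", best_rectangle, "score:", round(float(scores[best_rectangle]), 3))
```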
|
256 |
Inclusive hyper- to dilute-concentrated suspended sediment transport study using modified Rouse model: parametrized power-linear coupled approach using machine learning
Kumar, S.; Singh, H.P.; Balaji, S.; Hanmaiahgari, P.R.; Pu, Jaan H., 31 July 2022
The transfer of suspended sediment can range widely from being diluted to being hyperconcentrated, depending on the local flow and ground conditions. Using the Rouse model and the
Kundu and Ghoshal (2017) model, it is possible to look at the sediment distribution for a range of
hyper-concentrated and diluted flows. According to the Kundu and Ghoshal model, the sediment
flow follows a linear profile for the hyper-concentrated flow regime and a power law applies for the
dilute concentrated flow regime. This paper describes these models and how the Kundu and
Ghoshal parameters (linear-law coefficients and power-law coefficients) are dependent on sediment
flow parameters using machine-learning techniques. The machine-learning models used are
XGboost Classifier, Linear Regressor (Ridge), Linear Regressor (Bayesian), K Nearest Neighbours,
Decision Tree Regressor, and Support Vector Machines (Regressor). The models were implemented
on Google Colab and the models have been applied to determine the relationship between every
Kundu and Ghoshal parameter with each sediment flow parameter (mean concentration, Rouse
number, and size parameter) for both a linear profile and a power-law profile. The models correctly
calculated the suspended sediment profile for a range of flow conditions (0.268 mm ≤ d50 ≤ 2.29 mm, 0.00105 g/mm³ ≤ particle density ≤ 2.65 g/mm³, 0.197 mm/s ≤ v_s ≤ 96 mm/s, 7.16 mm/s ≤ u* ≤ 63.3 mm/s, 0.00042 ≤ c̄ ≤ 0.54), including a range of Rouse numbers (0.0076 ≤ P ≤ 23.5). The models showed particularly good accuracy for testing at low and extremely high concentrations for type I to III profiles.
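A minimal sketch of the regression setup described above: mapping the sediment-flow parameters (mean concentration, Rouse number, size parameter) to one Kundu and Ghoshal coefficient with several of the listed regressors, scored by cross-validated R². The synthetic data stand in for the paper's datasets, and XGBoost is omitted here to keep dependencies standard.

```python
# Hedged sketch: regress one Kundu-Ghoshal coefficient on the flow parameters.
import numpy as np
from sklearn.linear_model import Ridge, BayesianRidge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.uniform(0.0004, 0.54, n),      # mean concentration
    rng.uniform(0.0076, 23.5, n),      # Rouse number P
    rng.uniform(0.1, 3.0, n),          # size parameter
])
# Assumed toy relationship; the real one comes from the paper's datasets.
coef = 0.8 * X[:, 0] - 0.05 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 0.02, n)

models = {
    "ridge": Ridge(), "bayesian": BayesianRidge(),
    "knn": KNeighborsRegressor(), "tree": DecisionTreeRegressor(max_depth=4),
    "svr": SVR(),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, coef, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```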
|
257 |
Assessing Machine Learning Algorithms to Develop Station-based Forecasting Models for Public Transport: Case Study of Bus Network in Stockholm
Movaghar, Mahsa, January 2022
Public transport is essential for both residents and city planners because of its environmental and economic benefits. During the past decade, climate change, coupled with fuel and energy crises, has attracted significant attention toward public transportation. Increasing demand for public transport on the one hand and its complexity on the other have made optimum network design quite challenging for city planners. Ridership is affected by numerous variables and features such as space and time. These fluctuations, coupled with inherent uncertainties due to different travel behaviors, make this procedure challenging. Any mismatch between demand and supply can result in great user dissatisfaction and wasted energy. During the past years, thanks to new technologies for recording and storing data and advances in data analysis techniques, finding patterns and predicting ridership based on historical data have improved significantly. This study aims to develop forecasting models by regressing boardings on population, time of day, month, and station. Using the available boarding dataset for blue bus line number 4 in Stockholm, Sweden, seven different machine learning algorithms were assessed for prediction: Multiple Linear Regression, Decision Tree, Random Forest, Bayesian Ridge Regression, Neural Networks, Support Vector Machines, and K-Nearest Neighbors. The models were trained and tested on the dataset from 2012 to 2019, before the start of the pandemic. KNN, with an average R-squared of 0.65 in 10-fold cross-validation, was accepted as the best model. This model was then used to predict reduced ridership during the pandemic in 2020 and 2021. The results showed a reduction of 48.93% in 2020 and 82.24% in 2021 for the studied bus line.
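A hedged sketch of the station-based forecasting setup: k-nearest-neighbours regression of boardings on station, month, hour, and population, scored with 10-fold cross-validation as in the thesis. The feature encoding and the synthetic ridership curve are assumptions.

```python
# Hedged sketch: KNN regression of boardings on station/time features, 10-fold CV.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000
station = rng.integers(0, 30, n)              # stop index along the line
month = rng.integers(1, 13, n)
hour = rng.integers(5, 24, n)
population = rng.integers(2_000, 30_000, n)   # catchment population per stop (assumed)

boardings = (0.002 * population
             + 40 * np.exp(-((hour - 8) ** 2) / 8)    # morning peak (toy shape)
             + 25 * np.exp(-((hour - 17) ** 2) / 8)   # evening peak (toy shape)
             + rng.normal(0, 10, n))

X = np.column_stack([station, month, hour, population])
knn = KNeighborsRegressor(n_neighbors=10)
scores = cross_val_score(knn, X, boardings, cv=10, scoring="r2")
print("10-fold CV R^2:", scores.mean().round(3))
```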
|
258 |
A Cloud-Based Intelligent and Energy Efficient Malware Detection Framework. A Framework for Cloud-Based, Energy Efficient, and Reliable Malware Detection in Real-Time Based on Training SVM, Decision Tree, and Boosting using Specified Heuristics Anomalies of Portable Executable Files
Mirza, Qublai K.A., January 2017
The continuing financial and other related losses due to cyber-attacks demonstrate the substantial growth of malware and its lethal proliferation techniques. Every successful malware attack highlights the weaknesses in the defence mechanisms responsible for securing the targeted computer or network. Recent cyber-attacks reveal sophistication and intelligence in malware behaviour, with the ability to conceal its code and operate within the system autonomously. Conventional detection mechanisms not only fall short in their malware detection capabilities, they also consume a large amount of resources while scanning the system for malicious entities. Many recent reports have highlighted this issue, along with the challenges faced by alternative solutions and studies conducted in the same area. There is an unprecedented need for a resilient and autonomous solution that takes a proactive approach against modern malware with stealth behaviour. This thesis proposes a multi-aspect solution comprising an intelligent malware detection framework and an energy efficient hosting model. The malware detection framework is a combination of conventional and novel malware detection techniques. The proposed framework incorporates comprehensive feature heuristics of files generated by a bespoke static feature extraction tool. These comprehensive heuristics are used to train the machine learning algorithms (Support Vector Machine, Decision Tree, and Boosting) to differentiate between clean and malicious files. These two techniques, feature heuristics and machine learning, are combined to form a two-factor detection mechanism. This thesis also presents a cloud-based, energy efficient, and scalable hosting model, which combines multiple infrastructure components of Amazon Web Services to host the malware detection framework. This hosting model uses a client-server architecture, where the client is a lightweight service running on the host machine and the server is based in the cloud. The proposed framework and the hosting model were evaluated individually and in combination through specifically designed experiments using separate repositories of clean and malicious files. The experiments were designed to evaluate the malware detection capabilities and energy efficiency while operating within a system. The proposed malware detection framework and the hosting model showed significant improvement in malware detection while consuming very few CPU resources during operation.
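A minimal sketch of the classification step described above: training SVM, Decision Tree, and boosting classifiers on static feature heuristics of PE files. The feature names and the toy labelling rule are placeholders; the thesis uses a bespoke static feature extraction tool whose heuristics are not listed in the abstract.

```python
# Hedged sketch: train the three classifier families on placeholder PE heuristics.
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 3000
X = np.column_stack([
    rng.integers(1, 30, n),            # number of sections (assumed feature)
    rng.uniform(0, 8, n),              # mean section entropy (assumed feature)
    rng.integers(0, 500, n),           # imported-function count (assumed feature)
    rng.integers(0, 2, n),             # packer signature present? (assumed feature)
])
y = ((X[:, 1] > 6.5) | (X[:, 3] == 1)).astype(int)   # toy "malicious" rule, not real data

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
for name, clf in [("svm", SVC()), ("tree", DecisionTreeClassifier(max_depth=5)),
                  ("boosting", GradientBoostingClassifier())]:
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te), digits=3))
```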
|
259 |
Decision Trees for Classification of Repeated Measurements
Holmberg, Julianna, January 2024
Classification of data from repeated measurements is useful in various disciplines, for example that of medicine. This thesis explores how classification trees (CART) can be used for classifying repeated measures data. The reader is introduced to variations of the CART algorithm which can be used for classifying the data set and tests the performance of these algorithms on a data set that can be modelled using bilinear regression. The performance is compared with that of a classification rule based on linear discriminant analysis. It is found that while the performance of the CART algorithm can be satisfactory, using linear discriminant analysis is more reliable for achieving good results. / Klassificering av data från upprepade mätningar är användbart inom olika discipliner, till exempel medicin. Denna uppsats undersöker hur klassificeringsträd (CART) kan användas för att klassificera upprepade mätningar. Läsaren introduceras till varianter av CART-algoritmen som kan användas för att klassificera datamängden och testar prestandan för dessa algoritmer på en datamängd som kan modelleras med hjälp av bilinjär regression. Prestandan jämförs med en klassificeringsregel baserad på linjär diskriminantanalys. Det visar sig att även om prestandan för CART-algoritmen kan vara tillfredsställande, är användning av linjär diskriminantanalys mer tillförlitlig för att uppnå goda resultat.
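A short sketch of the comparison described above: classifying subjects from repeated measurements with a CART tree versus linear discriminant analysis, here by flattening each subject's time points into one feature vector (one of several possible ways to feed repeated measures to CART; the group-specific trend below is only a stand-in for the bilinear regression structure).

```python
# Hedged sketch: CART vs LDA on subjects with repeated measurements.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, p = 200, 6                     # subjects x repeated time points
t = np.arange(p)
group = rng.integers(0, 2, n_subjects)
# Group-specific linear trend over time plus noise (stand-in for a bilinear mean structure).
X = 1.0 + np.outer(0.5 + 0.4 * group, t) + rng.normal(0, 0.8, (n_subjects, p))

for name, clf in [("CART", DecisionTreeClassifier(max_depth=3, random_state=0)),
                  ("LDA", LinearDiscriminantAnalysis())]:
    acc = cross_val_score(clf, X, group, cv=10).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```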
|
260 |
[en] DEALING WITH DECISION POINTS IN PROCESS MINING / [pt] TRATANDO PONTOS DE DECISÃO EM MINERAÇÃO DE PROCESSOS
DANIEL DUQUE GUIMARAES SARAIVA, 26 April 2019
[pt] Devido ao grande aumento da competitividade e da, cada vez maior, demanda por eficiência, muitas empresas perceberam que é necessário repensar e melhorar seus processos. Para atingir este objetivo, elas têm cada vez mais buscado técnicas computacionais que sejam capazes de extrair novas informações e conhecimentos de suas grandes bases de dados. Os processos das empresas, normalmente, possuem momentos em que uma decisão deve ser tomada. É razoável esperar que casos similares tenham decisões parecidas sendo tomadas ao longo do processo. O objetivo desta dissertação é criar um minerador de decisão que seja capaz de automatizar a tomada de decisão dentro de um processo. A primeira parte do trabalho consiste na identificação dos pontos de decisão em uma rede de Petri. Em seguida, transformamos a tomada de decisão em um problema de classificação no qual cada possibilidade da decisão se torna uma classe. Para fazer a automatização, é utilizada uma árvore de decisão treinada com os atributos dos dados que estão presentes nos logs dos eventos. Um estudo de caso real é utilizado para validar que o minerador de decisão é confiável para processos reais. / [en] Due to the increasing competitiveness and demand for higher performance, many companies realized that it is necessary to rethink and enhance their business processes. In order to achieve this goal, companies have been turning to computational techniques that are capable of extracting new information and insights from their ever-increasing datasets. Business processes normally have many places where a decision has to be made. It is reasonable to expect that similar inputs have the same decisions made for them during the process. The goal of this dissertation is to create a decision miner that automates the decision-making inside a process. First, we will identify decision points in a Petri net model. Then, we will transform the decision-making problem into a classification one, where each of the possible decisions becomes a class. In order to automate the decision-making, a decision tree is trained using data attributes from the event logs. A real-world case study is used to validate that the decision miner is reliable when using real-world data.
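A minimal sketch of the decision-mining step described above: at a decision point (a place with two outgoing transitions in the Petri net), a decision tree is trained on case attributes from the event log to predict which branch each case takes. The attribute names and the rule generating the branches are made up for illustration.

```python
# Hedged sketch: learn the branching rule at one decision point from case attributes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n_cases = 1000
amount = rng.uniform(100, 10_000, n_cases)      # case attribute: claim amount (assumed)
customer_age = rng.integers(18, 90, n_cases)    # case attribute: customer age (assumed)
# Branch actually taken at the decision point, as recorded in the event log.
branch = np.where((amount > 5000) | (customer_age < 25), "manual_review",
                  "auto_approve")

X = np.column_stack([amount, customer_age])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, branch)

# The learned rules can then support or automate the decision at this point.
print(export_text(tree, feature_names=["amount", "customer_age"]))
```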
|