41 |
CloudIntell: An intelligent malware detection system
Mirza, Qublai K.A., Awan, Irfan U., Younas, M. 25 July 2017
Enterprises and individual users rely heavily on antivirus software and other security mechanisms. However, the methodologies used by such software are not enough to detect and prevent most malicious activities, and they also consume a large share of the host machine's resources for their regular operations. In this paper, we propose a combination of machine learning techniques applied to a rich set of features extracted from a large dataset of benign and malicious files through a bespoke feature extraction tool. We extracted a rich set of features from each file and applied support vector machine, decision tree, and boosting on decision tree to get the highest possible detection rate. We also introduce a cloud-based scalable architecture hosted on Amazon Web Services to cater to the needs of the detection methodology. We tested our methodology against different scenarios and generated high-achieving results with the lowest energy consumption of the host machine.
|
42 |
Power Efficient Wireless Sensor Node through Edge Intelligence
Damle, Abhishek Priyadarshan 04 August 2022
Edge intelligence can reduce power dissipation to enable power-hungry long-range wireless applications. This work applies edge intelligence to quantify the reduction in power dissipation. We designed a wireless sensor node with a LoRa radio and implemented a decision tree classifier, in situ, to classify behaviors of cattle. We estimate that employing edge intelligence on our wireless sensor node reduces its average power dissipation by up to a factor of 50, from 20.10 mW to 0.41 mW. We also observe that edge intelligence increases the link budget without significantly affecting average power dissipation. / Master of Science / Battery powered sensor nodes have access to a limited amount of energy. However, many applications of sensor nodes such as animal monitoring require energy intensive, long range data transmissions. In this work, we used machine learning to process motion data within our sensor node to classify cattle behaviors. We estimate that transmitting processed data dissipates up to 50 times less power when compared to transmitting raw data. Due to the properties of our transmission protocol, we also observe that transmitting processed data increases the range of transmissions without impacting power dissipation.
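The reduction factor quoted above follows directly from the two power figures. The short sketch below reproduces that arithmetic and, purely as an illustration, converts each figure into an idealised battery runtime. The battery capacity is a hypothetical value that does not come from the thesis, and the runtime model ignores self-discharge and conversion losses.

```python
def reduction_factor(raw_mw: float, processed_mw: float) -> float:
    """Ratio of average power when sending raw data vs. processed data."""
    return raw_mw / processed_mw


def battery_life_days(capacity_mwh: float, avg_power_mw: float) -> float:
    """Idealised runtime in days: capacity divided by average power."""
    return capacity_mwh / avg_power_mw / 24.0


# Figures taken from the abstract.
RAW_MW = 20.10        # average power when transmitting raw data
PROCESSED_MW = 0.41   # average power when transmitting classifier output

# Hypothetical battery capacity, chosen only for illustration.
CAPACITY_MWH = 2000.0

factor = reduction_factor(RAW_MW, PROCESSED_MW)
print(f"reduction factor: {factor:.1f}")
print(f"raw data:       {battery_life_days(CAPACITY_MWH, RAW_MW):.1f} days")
print(f"processed data: {battery_life_days(CAPACITY_MWH, PROCESSED_MW):.1f} days")
```

The computed ratio is slightly below 50, which is consistent with the abstract's wording "up to a factor of 50".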
|
43 |
Event categorisation and Machine-learning Techniques in Searches for Higgs Boson Pairs in the ATLAS Experiment at the LHC
Emadi, Milads January 2023
This thesis investigates the pair production of Higgs bosons (di-Higgs events) at the ATLAS experiment in the Large Hadron Collider (LHC), focusing on the channel where one Higgs boson decays into two bottom quarks and the other decays into two tau leptons. The main objective was to determine whether introducing a split in the invariant mass of the decay products from the two Higgs bosons (the di-Higgs mass) and using this as an analysis variable improves the sensitivity of the Boosted Decision Tree (BDT) machine learning algorithm to the di-Higgs signal. A mass split was performed at 350 GeV, and the BDT algorithm was trained on both the split and un-split data sets, where the split data set included a high-mass region (di-Higgs mass above 350 GeV) using the Standard Model Higgs boson coupling constant of 1 and a low-mass region (di-Higgs mass below 350 GeV) using the enhanced coupling constant of 10 to create a low-mass region more sensitive to the signal. The results showed that the BDT algorithm training performed on the split data set provided a 3.6% improvement in the exclusion limits, indicating an improvement in the algorithm's sensitivity to the di-Higgs signal compared to the training performed on the un-split data set. This finding suggests that the introduction of a split at 350 GeV can enhance the accuracy and efficiency of machine learning algorithms in detecting di-Higgs boson production at the LHC. The improvement in sensitivity was attributed to the enhanced discrimination between signal and background events provided by the split in the di-Higgs mass analysis variable. The improved separation between the signal and background events led to a higher signal-to-background ratio and a corresponding increase in the BDT algorithm's sensitivity to the di-Higgs signal.
In conclusion, this thesis provided evidence that introducing a split in the di-Higgs mass analysis variable can improve the sensitivity of machine learning algorithms to the di-Higgs signal in the channel where one Higgs boson decays into two bottom quarks and the other into two tau particles. This finding has important implications for future research on di-Higgs boson production at the LHC and could lead to more accurate and efficient detection of this rare and important process.
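The categorisation step described above can be pictured as a simple partition of events by di-Higgs mass before any BDT training takes place. The sketch below is only an illustration of that idea: the field name `m_hh`, the toy masses, and the choice to place events at exactly 350 GeV in the high-mass region are assumptions, not details from the thesis.

```python
from typing import Dict, List, Tuple

SPLIT_GEV = 350.0  # split point stated in the abstract

Event = Dict[str, float]


def split_by_mass(events: List[Event],
                  threshold: float = SPLIT_GEV) -> Tuple[List[Event], List[Event]]:
    """Partition events into (low_mass, high_mass) regions by di-Higgs mass."""
    low = [e for e in events if e["m_hh"] < threshold]
    # Assigning the boundary value to the high-mass region is an assumption.
    high = [e for e in events if e["m_hh"] >= threshold]
    return low, high


# Made-up masses in GeV, for illustration only.
toy_events = [{"m_hh": 280.0}, {"m_hh": 349.9}, {"m_hh": 350.0}, {"m_hh": 612.5}]

low_region, high_region = split_by_mass(toy_events)
print(len(low_region), "low-mass events;", len(high_region), "high-mass events")
```

In an analysis of this kind, each region would then be used to train and evaluate its own classifier, and the combined result compared with a single classifier trained on the un-split data.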
|
44 |
Machine Learning in credit risk : Evaluation of supervised machine learning models predicting credit risk in the financial sector
Lundström, Love, Öhman, Oscar January 2019
When banks lend money to another party, they face a risk that the borrower will not fulfill its obligation towards the bank. This risk is called credit risk, and it is the largest risk banks face. According to the Basel accord, banks need to hold a certain amount of capital to protect themselves against future financial crises. This amount is calculated for each loan with an attached risk-weighted asset, RWA. The main parameters in RWA are probability of default and loss given default. Banks are today allowed to use their own internal models to calculate these parameters. Since holding capital that earns no interest is a great cost, banks seek tools that better predict probability of default in order to lower the capital requirement. Machine learning and supervised algorithms such as Logistic regression, Neural network, Decision tree and Random Forest can be used to determine credit risk. By training algorithms on historical data with known results, the parameter probability of default (PD) can be determined with a higher degree of certainty than with traditional models, leading to a lower capital requirement. On the data set used in this article, Logistic regression appears to be the algorithm with the highest accuracy in classifying customers into the right category. However, it produces many false positives, meaning that the model predicts a customer will honour its obligation when in fact the customer defaults. This comes at a great cost for the banks. By implementing a cost function to minimize this error, we found that the Neural network has the lowest false positive rate and is therefore the model best suited for this specific classification task. / When banks lend money to another party, a risk arises that the borrower will not fulfil its commitment to the bank. This risk is called credit risk and is the largest risk a bank faces.
According to the Basel regulations, a bank must set aside a certain amount of capital for each loan it issues in order to protect itself against future financial crises. This amount is calculated for each individual loan with its associated risk weight, RWA. The main parameters in RWA are the probability that a customer cannot repay the loan and the amount the bank then loses. Today banks can use internal models to estimate these parameters. Because tied-up capital entails large costs for banks, they strive to find better tools for estimating the probability that a customer defaults, in order to reduce their capital requirement. Banks have therefore begun to look at the possibility of using machine learning algorithms to estimate these parameters. Machine learning algorithms such as Logistic regression, Neural networks, Decision trees and Random forest can be used to determine credit risk. By training algorithms on historical data with known outcomes, the parameter describing the chance that a customer does not repay the loan (PD) can be determined with higher certainty than with traditional methods. On the data on which this thesis is based, Logistic regression turns out to be the algorithm with the highest accuracy in classifying a customer into the right category. However, this algorithm classifies many customers as false positives, which means that it predicts that many customers will repay their loans when in fact they do not. This comes at a great cost for the banks. By instead evaluating the models with a cost function introduced to reduce this error, we find that the Neural network has the lowest false positive rate and is therefore the model best suited to this specific classification task.
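The contrast between plain accuracy and a cost-aware evaluation can be made concrete with a small sketch. Following the abstract's convention, a false positive is a customer predicted to repay who actually defaults. The label encoding, the cost weights and the toy predictions below are assumptions made for illustration; the thesis's actual cost function is not given in the abstract.

```python
from typing import Sequence

REPAYS, DEFAULTS = 1, 0  # label encoding assumed for this sketch


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of customers placed in the correct category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_cost(y_true: Sequence[int], y_pred: Sequence[int],
                           fp_cost: float = 5.0, fn_cost: float = 1.0) -> float:
    """Weighted error count; the weights are hypothetical, not from the thesis."""
    # False positive: predicted to repay, but the customer defaults.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == REPAYS and t == DEFAULTS)
    # False negative: predicted to default, but the customer repays.
    fn = sum(1 for t, p in zip(y_true, y_pred) if p == DEFAULTS and t == REPAYS)
    return fp_cost * fp + fn_cost * fn


# Toy data, for illustration only.
actual = [1, 1, 0, 0, 1, 0]
model_a = [1, 1, 1, 1, 1, 0]  # two false positives, no false negatives
model_b = [1, 0, 0, 0, 0, 0]  # no false positives, two false negatives

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, accuracy(actual, pred), misclassification_cost(actual, pred))
```

Both toy models have the same accuracy, yet the weighted cost separates them once false positives are penalised more heavily, which is the kind of effect the abstract describes.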
|
45 |
Predictions of train delays using machine learning / Förutsägelser av tågförseningar med hjälp av maskininlärning
Nilsson, Robert, Henning, Kim January 2018
Train delays occur on a daily basis in the commuter rail of Stockholm. This means that travellers may themselves arrive late at their destinations. To find the most accurate method for predicting train delays, the machine learning methods decision tree, with and without AdaBoost, and neural network were compared with different settings. The neural network achieved the best result when used with 3 layers and 22 neurons in each layer. Its delay predictions had an average error of 122 seconds compared to the actual delay. It might therefore be the best method for predicting train delays. However, the study was very limited in time, and more train departure data would need to be collected. / Train delays occur daily in Stockholm's commuter train traffic. As a result, travellers themselves may arrive late at their destinations. To find the most accurate method for predicting train delays, the machine learning methods decision tree, with and without AdaBoost, and artificial neural network were compared with different settings. The artificial neural network gave the best result when used with 3 layers and 22 neurons in each layer. Its delay prediction had an average error of 122 seconds compared to the actual delay. It may therefore be the best method for predicting train delays. However, this study had very limited time, and more information about train departures would have needed to be collected.
|
46 |
AN EXAMINATION OF MILK QUALITY EFFECTS ON MILK YIELD AND DAIRY PRODUCTION ECONOMICS IN THE SOUTHEASTERN UNITED STATES
Nolan, Derek T. 01 January 2017
Mastitis is one of the most costly diseases for dairy producers around the world, with milk yield loss being the biggest contributor to economic losses. The objective of the first study of this thesis was to determine the impact of high somatic cell counts on milk yield loss. To accomplish this, over one million cow data records were collected from Southeastern US dairy herds. The objective of the second study was to determine the optimum treatment cost of clinical mastitis by combining two economic modeling approaches used in animal health economics. The last objective of this thesis was to determine how much Southeastern US dairy producers are spending to control milk quality on farm and whether they understand how milk quality affects them economically. This was accomplished through a collaborative project within the Southeast Quality Milk Initiative.
|
47 |
Investiční možnosti obyvatel v ČR / Investing posibilities of citizen in the Czech Republic
Nocar, Jan January 2010
This thesis discusses the options households have when it comes to investing in capital markets in the Czech Republic. The issue of investing and capital market options is analyzed. Following this analysis comes the description of financial instruments, their characteristics, and the usability of these instruments by small investors. On the basis of the theory presented, a study was conducted to examine the usage of individual financial products. The collected data was processed using modern software tools, which helped in drawing several conclusions, results, and recommendations for investors and financial instrument providers alike.
|
48 |
Identification of Flying Drones in Mobile Networks using Machine Learning / Identifiering av flygande drönare i mobila nätverk med hjälp av maskininlärning
Alesand, Elias January 2019
Drone usage is increasing, both in recreational use and in the industry. With it comes a number of problems to tackle. Primarily, there are certain areas in which flying drones pose a security threat, e.g., around airports or other no-fly zones. Other problems can appear when there are drones in mobile networks which can cause interference. Such interference comes from the fact that radio transmissions emitted from drones can travel more freely than those from regular UEs (User Equipment) on the ground since there are few obstructions in the air. Additionally, the data traffic sent from drones is often high volume in the form of video streams. The goal of this thesis is to identify so-called "rogue drones" connected to an LTE network. Rogue drones are flying drones that appear to be regular UEs in the network. Drone identification is a binary classification problem where UEs in a network are classified as either a drone or a regular UE and this thesis proposes machine learning methods that can be used to solve it. Classifications are based on radio measurements and statistics reported by UEs in the network. The data for the work in this thesis is gathered through simulations of a heterogenous LTE network in an urban scenario. The primary idea of this thesis is to use a type of cascading classifier, meaning that classifications are made in a series of stages with increasingly complex models where only a subset of examples are passed forward to subsequent stages. The motivation for such a structure is to minimize the computational requirements at the entity making the classifications while still being complex enough to achieve high accuracy. The models explored in this thesis are two-stage cascading classifiers using decision trees and ensemble learning techniques. It is found that close to 60% of the UEs in the dataset can be classified without errors in the first of the two stages. 
The remaining UEs are forwarded to a more complex model, which requires more data from the UEs and can achieve up to 98% accuracy.
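The two-stage structure described above can be sketched as a cheap first stage that answers only when it is confident and otherwise forwards the example to a heavier second stage. Everything specific in the sketch below is hypothetical: the feature names `visible_cells` and `rsrp_spread_db`, the thresholds, and the rule-based stand-ins for the decision tree and ensemble models used in the thesis.

```python
from typing import Callable, Dict, Optional, Tuple

UE = Dict[str, float]


def stage1(ue: UE) -> Optional[str]:
    """Cheap rule on one made-up feature; returns None when unsure."""
    n = ue["visible_cells"]
    if n <= 3:
        return "regular"
    if n >= 12:
        return "drone"
    return None  # not confident: forward to the next stage


def stage2(ue: UE) -> str:
    """Stand-in for a heavier model that needs extra (made-up) measurements."""
    score = ue["visible_cells"] + ue["rsrp_spread_db"]
    return "drone" if score > 15 else "regular"


def cascade(first: Callable[[UE], Optional[str]],
            second: Callable[[UE], str],
            ue: UE) -> Tuple[str, int]:
    """Return (label, stage_used); stage 2 runs only if stage 1 abstains."""
    label = first(ue)
    if label is not None:
        return label, 1
    return second(ue), 2


print(cascade(stage1, stage2, {"visible_cells": 2, "rsrp_spread_db": 0.0}))
print(cascade(stage1, stage2, {"visible_cells": 8, "rsrp_spread_db": 9.0}))
```

The design point is that the extra measurements are requested only for examples the first stage cannot settle, which keeps the average computational and reporting cost low.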
|
49 |
Explorando técnicas para modelagem de dados agregados de óbitos provenientes de acidentes por automóvel / Exploring techniques for modeling of aggregates data from deaths automobile accidents
Santos, Murilo Castanho dos 01 October 2015
This dissertation explores techniques for modelling deaths from automobile accidents in the state of São Paulo. The analysis was aggregated by area and used the ratios of deaths per population, per area and per vehicle flow as dependent variables; the independent variables were socioeconomic characteristics, area, vehicle fleet, the Municipal Human Development Index (IDHM), annual vehicle flow and distances between micro-regions. Data from the year 2000 were used for calibration and data from 2010 for validation of the models, using data mining (Decision Tree, DT, algorithms: CART, Classification And Regression Tree, and CHAID, Chi-squared Automatic Interaction Detection) and Multiple Linear Regression (MLR) for comparison with the DT models. The results show that MLR was the technique with the best mean error, mean absolute error and correlation coefficient, and that the CART algorithm had the lowest mean normalized error. When comparing death rates, the ratio by area showed the better mean error and correlation coefficient, while the ratio by population obtained the lower mean normalized error and mean absolute error. It is worth noting that DT algorithms are suitable techniques for classifying areas according to ranges of values of the explanatory variables and mean values of the variable under study. Moreover, such techniques are more flexible with respect to some assumptions of regression models. The main contribution of this work therefore consists in exploring such algorithms for accident prediction and the classification of regions. / This dissertation explores techniques for modeling deaths from automobile accidents in the state of São Paulo. The analysis was aggregated by area and used the ratios of deaths per population, per area and per vehicle flow as dependent variables; the independent variables were socioeconomic characteristics, area, vehicle fleet, Municipal Human Development Index (MHDI), annual vehicle flow and distances between micro-regions. The 2000 data were used for calibration and the 2010 data for validation of the models, using data mining techniques (decision tree, DT, algorithms: CART, Classification And Regression Tree, and CHAID, Chi-squared Automatic Interaction Detection) and Multiple Linear Regression (MLR) for comparison with the DT models. The results show that MLR was the technique that achieved the better mean error, mean absolute error and correlation coefficient values, while the CART algorithm presented the lowest mean normalized error. When comparing death rates, the ratio by area showed better mean error and correlation coefficient values, whereas the ratio by population had lower mean normalized error and mean absolute error values. It is noteworthy that DT algorithms are suitable techniques for classifying areas according to value ranges of the explanatory variables and average values of the variable under study. Furthermore, such techniques are more flexible with respect to some assumptions of regression models. Thus, the main contribution of this study is the exploration of such algorithms for accident prediction and the classification of regions.
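Three of the comparison metrics named above have standard definitions and can be written in a few lines. The sketch below assumes the error is taken as predicted minus observed, which the abstract does not state; the mean normalized error is left out because its exact definition is not given there. The numbers are toy values, not results from the dissertation.

```python
import math
from typing import Sequence


def mean_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Average signed error (predicted minus observed; sign is an assumption)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def mean_absolute_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Average magnitude of the error, ignoring its sign."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


# Toy values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 2.0, 6.0]

print("mean error:", mean_error(observed, predicted))
print("mean absolute error:", mean_absolute_error(observed, predicted))
print("correlation:", pearson_r(observed, predicted))
```

A signed mean error close to zero can hide large errors of opposite sign, which is one reason a study may report it alongside the mean absolute error and a correlation coefficient.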
|
50 |
Modelo para tomada de decisão na escolha de sistema de tratamento de esgoto sanitário / A decision making model for choosing sewage treatment systems
Oliveira, Sonia Valle Walter Borges de 10 November 2004
Choosing the sewage treatment system to be installed in a city can be a difficult decision, since several variables affect its cost and its environmental quality. This work aims to show that decision analysis techniques, such as decision trees and sensitivity analysis, can be used to choose a sewage treatment system in an ecologically and economically sound way. To evaluate the systems, a model was developed with eight alternatives, composed of anaerobic biological processes (Upflow Anaerobic Sludge Blanket reactor and Anaerobic Pond) followed by aerobic processes (Activated Sludge, Facultative Pond, Trickling Filter, and Aerated Lagoon with Sedimentation Pond). The model sizes the treatment units and, from these data, estimates the cost of each system. The total cost of each alternative comprised implementation, operation and maintenance items. The model was evaluated for four cases with different populations, and the most suitable alternatives varied from case to case. The sensitivity analysis proved effective in identifying the most significant alternatives in the total cost of the systems. The results indicate, promisingly, that the model can assist in choosing treatment systems, as well as in their preliminary sizing, based on characteristics specific to the locality. / Selecting the wastewater treatment system to be installed in a city can be a difficult decision, since several variables affect its cost and its environmental quality. This study intends to show that decision analysis techniques, such as decision trees and sensitivity analysis, can be used to select a wastewater treatment system in an ecological and economical way. To evaluate the systems, a model was developed with eight alternatives, composed of anaerobic biological processes (Upflow Anaerobic Sludge Blanket and Anaerobic Pond) followed by aerobic processes (Activated Sludge, Facultative Pond, Trickling Filter, and Aerated Lagoon with Sedimentation Basin). The model dimensions the treatment units and, based on the dimensioning data, estimates the cost of each system. The total cost of each alternative was composed of construction, operation and maintenance items. The model was evaluated using four cases with different populations, and the most suitable alternatives varied from case to case. The sensitivity analysis proved effective in identifying the most significant alternatives in the total cost of the systems. The results indicate, in a promising way, that the model will be able to support the choice of treatment systems, as well as their pre-dimensioning, based on characteristics specific to the locality.
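The eight alternatives mentioned above correspond to pairing each of the two anaerobic processes with each of the four aerobic ones. The sketch below enumerates those pairings and shows how a lowest-total-cost alternative could be picked once costs are known. The cost tuples are placeholder numbers invented for illustration, and the helper names are hypothetical; the thesis's sizing and costing model is not reproduced here.

```python
from itertools import product
from typing import Dict, Tuple

ANAEROBIC = ["Upflow Anaerobic Sludge Blanket", "Anaerobic Pond"]
AEROBIC = [
    "Activated Sludge",
    "Facultative Pond",
    "Trickling Filter",
    "Aerated Lagoon with Sedimentation Basin",
]

# Every anaerobic process followed by every aerobic process: 2 x 4 = 8.
alternatives = [f"{a} + {b}" for a, b in product(ANAEROBIC, AEROBIC)]


def total_cost(construction: float, operation: float, maintenance: float) -> float:
    """Total cost as the sum of the three cost items named in the abstract."""
    return construction + operation + maintenance


def cheapest(costs: Dict[str, Tuple[float, float, float]]) -> str:
    """Name of the alternative with the lowest total cost."""
    return min(costs, key=lambda name: total_cost(*costs[name]))


# Placeholder costs (construction, operation, maintenance) in arbitrary units.
example_costs = {
    alternatives[0]: (10.0, 5.0, 1.0),
    alternatives[1]: (8.0, 9.0, 2.0),
}

print(len(alternatives), "alternatives")
print("cheapest of the two examples:", cheapest(example_costs))
```

A sensitivity analysis in the spirit of the abstract could rerun `cheapest` while varying one cost item at a time and note where the preferred alternative changes.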
|