1 |
Evaluating and enhancing the security of cyber physical systems using machine learning approaches
Sharma, Mridula 08 April 2020
The main aim of this dissertation is to address the security issues of the physical layer of Cyber Physical Systems. The network security is first assessed using a 5-level Network Security Evaluation Scheme (NSES).
The network security is then enhanced using a novel Intrusion Detection System (IDS) designed using Supervised Machine Learning. Defined as a complete architecture, this framework includes a complete packet analysis of the radio traffic of the Routing Protocol for Low-Power and Lossy Networks (RPL). A dataset of 300 different simulations of an RPL network is defined for normal traffic, hello flood attack, DIS attack, increased version attack and decreased rank attack. The IDS is a multi-model detection system that efficiently detects known as well as new attacks.
The model analysis is done with the cross-validation method as well as using the new data from a similar network. To detect the known attacks, the model performed at 99% accuracy rate and for the new attack, 85% accuracy is achieved. / Graduate
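The cross-validation step mentioned above can be sketched as follows. This is a minimal illustration, not the dissertation's code: the function names `k_fold_indices` and `cross_val_accuracy`, the choice of k = 5, and the caller-supplied `fit`/`predict` callables are all assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_val_accuracy(fit, predict, X, y, k=5):
    """Average accuracy over k folds; `fit` and `predict` are hypothetical callables."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)
```

With 300 samples and k = 5, each fold holds out 60 samples and trains on the remaining 240. Testing on traffic from a separate, similar network, as the abstract describes, is a stricter check than this, because no sample from that network is ever seen during training.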
|
2 |
Maximum margin learning under uncertainty
Tzelepis, Christos January 2018
In this thesis we study the problem of learning under uncertainty using the statistical learning paradigm. We first propose a linear maximum margin classifier that deals with uncertainty in data input. More specifically, we reformulate the standard Support Vector Machine (SVM) framework such that each training example can be modeled by a multi-dimensional Gaussian distribution described by its mean vector and its covariance matrix, the latter modeling the uncertainty. We address the classification problem and define a cost function that is the expected value of the classical SVM cost when data samples are drawn from the multi-dimensional Gaussian distributions that form the set of the training examples. Our formulation approximates the classical SVM formulation when the training examples are isotropic Gaussians with variance tending to zero. We arrive at a convex optimization problem, which we solve efficiently in the primal form using a stochastic gradient descent approach. The resulting classifier, which we name SVM with Gaussian Sample Uncertainty (SVM-GSU), is tested on synthetic data and five publicly available and popular datasets; namely, the MNIST, WDBC, DEAP, TV News Channel Commercial Detection, and TRECVID MED datasets. Experimental results verify the effectiveness of the proposed method. Next, we extended the aforementioned linear classifier so as to lead to non-linear decision boundaries, using the RBF kernel. This extension, where we use isotropic input uncertainty and we name Kernel SVM with Isotropic Gaussian Sample Uncertainty (KSVM-iGSU), is used in the problems of video event detection and video aesthetic quality assessment. The experimental results show that exploiting input uncertainty, especially in problems where only a limited number of positive training examples are provided, can lead to better classification, detection, or retrieval performance.
Finally, we present a preliminary study on how the above ideas can be used under the deep convolutional neural networks learning paradigm so as to exploit inherent sources of uncertainty, such as spatial pooling operations, that are usually used in deep networks.
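The cost described above, the expected value of the hinge loss when an input is Gaussian, has a closed form because the margin of a linear classifier on a Gaussian input is itself Gaussian. The sketch below uses the standard expectation of a rectified Gaussian; whether the thesis writes its cost in exactly this form is an assumption, and the function name and argument layout are hypothetical.

```python
import math

def expected_hinge_loss(w, b, mean, cov, label):
    """E[max(0, 1 - y(w.x + b))] for x ~ N(mean, cov), with y = label in {-1, +1}."""
    d = len(w)
    # The margin deficit u = 1 - y(w.x + b) is Gaussian with mean mu and variance w^T cov w.
    mu = 1.0 - label * (sum(w[i] * mean[i] for i in range(d)) + b)
    var = sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))
    sigma = math.sqrt(max(var, 0.0))
    if sigma < 1e-12:
        return max(0.0, mu)  # zero variance: the ordinary hinge loss
    z = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # Expectation of max(0, u) for u ~ N(mu, sigma^2).
    return mu * cdf + sigma * pdf
```

As the covariance shrinks toward zero the value approaches the ordinary hinge loss, which matches the abstract's remark that the formulation approximates the classical SVM in that limit.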
|
3 |
Övervakad maskininlärning för att identifiera nya kunder på energimarknaden / Supervised machine learning as a tool for identifying new customers on the energy market
Bojs, Robert, Feng, Benny January 2017
This paper explores alternative ways for smaller actors on the energy market to identify potential customers using publicly available data and different machine learning algorithms. During recent years, price has been considered to have the biggest impact on the behaviour of consumers on the energy market. Since the bigger actors on the market can use their economies of scale to lower their prices, smaller actors need to find alternative ways to reach out to consumers. The machine learning algorithms in this paper use the sales data from a small energy company operating in Sweden and attempt to find a connection between existing customers using their demographic properties. By acquiring a deeper knowledge of what differentiates consumers who are willing to purchase energy from the energy company from other consumers, the energy company may increase its rate of successful sales. Due to the lack of available customer data, coupled with a lack of relevant public data, the results in this paper are not conclusive. However, the paper provides a baseline for future research, as the results may become more reliable when the number of customers purchasing energy from the energy company increases. / This work explores alternative approaches for smaller actors on the energy market to identify new potential customers, based on publicly available data analyzed with machine learning algorithms. In recent years, price has been considered the factor that most affects the choice of supplier. Since larger actors on the market can exploit economies of scale, they can push prices down hard, while smaller actors must find other ways to win new customers. The machine learning algorithms in this thesis use sales data from a small energy company operating in Sweden, with the goal of finding a pattern between existing customers and their demographic data.
By acquiring deeper knowledge of what differentiates customers, the energy company can improve its sales. Because of a relatively small amount of customer data and a lack of public data, it was not possible to find a significant relationship between the customers and their demographic data. The results nevertheless form a good basis for further research, since they become more reliable as more customer data is acquired, which follows naturally as the energy company's sales continue to develop.
|
4 |
Gene fusions in cancer: Classification of fusion events and regulation patterns of fusion pathway neighbors
Hughes, Katelyn 05 May 2016
Cancer is a leading cause of death worldwide, resulting in an estimated 1.6 million new cases and 600,000 deaths in the US alone in 2015. Gene fusions, hybrid genes formed from two originally separated genes, are known drivers of cancer. However, gene fusions have also been found in healthy cells due to routine errors in replication. This project aims to understand the role of gene fusion in cancer. Specifically, we seek to achieve two goals. First, we would like to develop a computational method that predicts whether a gene fusion event is associated with a cancer sample or a healthy sample. Second, we would like to use this information to determine and characterize molecular mechanisms behind the gene fusion events. Recent studies have attempted to address these problems, but without explicit consideration of the fact that there are overlapping fusion events in both cancer and healthy cells. Here, we address this problem using FUsion Enriched Learning of CANcer Mutations (FUELCAN), a semi-supervised model, which classifies all overlapping fusion events as unlabeled to start. The model is trained using the known cancer and healthy samples and tested using the unlabeled dataset. Unlabeled data is classified as associated with healthy or cancer samples and the top 20 data points are put back into the training set. The process continues until all have been appropriately classified. Three datasets were analyzed from Acute Lymphoblastic Leukemia (ALL), breast cancer and colorectal cancer. We obtained similar results for both supervised and semi-supervised classification. To improve our model, we assessed the functional landscape of gene fusion events and observed that the pathway neighbors of both gene fusion partners are differentially expressed in each cancer dataset. The significant neighbors are also shown to have direct connections to cancer pathways and functions, indicating that these gene fusions are important for cancer development.
Future directions include applying the acquired transcriptomic knowledge to our machine learning algorithm, counting transcription factors and kinases within the gene fusion events and their neighbors and assessing the differences between upstream and downstream effects within the pathway neighbors.
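The iterative procedure in this abstract (train on labeled samples, classify the unlabeled pool, move the most confident predictions back into the training set, repeat) is a form of self-training. The sketch below is illustrative only: the abstract does not name the underlying classifier, so a nearest-centroid rule stands in for it, and the function names, feature vectors, and confidence measure are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labeled, unlabeled, top_k=20):
    """labeled: list of (features, label), label in {"cancer", "healthy"}, both present.
    unlabeled: list of feature vectors. Returns the grown labeled list."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Refit the stand-in classifier (nearest centroid) on the current training set.
        cents = {
            lab: centroid([f for f, l in labeled if l == lab])
            for lab in ("cancer", "healthy")
        }
        scored = []
        for feats in pool:
            d_c = sq_dist(feats, cents["cancer"])
            d_h = sq_dist(feats, cents["healthy"])
            pred = "cancer" if d_c < d_h else "healthy"
            scored.append((abs(d_c - d_h), feats, pred))  # larger gap = more confident
        scored.sort(key=lambda t: t[0], reverse=True)
        # Move the top_k most confident predictions into the training set.
        labeled.extend((feats, pred) for _, feats, pred in scored[:top_k])
        pool = [feats for _, feats, _ in scored[top_k:]]
    return labeled
```

The default `top_k=20` mirrors the "top 20 data points" in the abstract. A known risk of this design is that early mistakes are fed back as training labels, which is one reason to compare against a purely supervised baseline, as the abstract reports doing.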
|
5 |
A Machine Learning Framework to Classify Mosquito Species from Smart-phone Images
Minakshi, Mona 12 June 2018
Mosquito-borne diseases have been a constant scourge across the globe, with debilitating consequences and also death. To derive trends in the mosquito population of an area, trained personnel lay traps, and after collecting trapped specimens, they spend hours under a microscope inspecting each specimen to identify the actual species and log it. This is vital, because multiple species of mosquitoes can reside in any area, and the vectors that some of them carry are not the same ones carried by others. The species identification process is naturally laborious and imposes a severe cognitive burden, since sometimes hundreds of mosquitoes can get trapped. Most importantly, common citizens cannot aid in this task. In this paper, we design a system based on smart-phone images for mosquito species identification that integrates image processing, feature selection, unsupervised clustering, and a support vector machine based algorithm for classification. Results with a total of 101 female mosquito specimens spread across 9 different vector-carrying species (that were captured from a real outdoor trap) demonstrate an overall accuracy of 77% in species identification. When implemented as a smart-phone app, the latency and energy consumption were minimal. In terms of practical impact, common citizens can benefit from our system to identify mosquito species by themselves, and also share images with local and global mosquito control centers. In economically disadvantaged areas across the globe, tools like these can enable novel citizen-science mechanisms to combat the spread of mosquitoes.
|
6 |
A data-assisted approach to supporting instructional interventions in technology enhanced learning environments
2012 December 1900
The design of intelligent learning environments requires significant up-front resources and expertise. These environments generally maintain complex and comprehensive knowledge bases describing pedagogical approaches, learner traits, and content models. This has limited the influence of these technologies in higher education, which instead largely uses learning content management systems in order to deliver non-classroom instruction to learners.
This dissertation puts forth a data-assisted approach to embedding intelligence within learning environments. In this approach, instructional experts are provided with summaries of the activities of learners who interact with technology enhanced learning tools. These experts, who may include instructors, instructional designers, educational technologists, and others, use this data to gain insight into the activities of their learners. These insights lead experts to form instructional interventions which can be used to enhance the learning experience. The novel aspect of this approach is that the actions of the intelligent learning environment are now not just those of the learners and software constructs, but also those of the educational experts who may be supporting the learning process.
The kinds of insights and interventions that come from application of the data-assisted approach vary with the domain being taught, the epistemology and pedagogical techniques being employed, and the particulars of the cohort being instructed. In this dissertation, three investigations using the data-assisted approach are described. The first of these demonstrates the effects of making available to instructors novel sociogram-based visualizations of online asynchronous discourse. By making instructors aware of the discussion habits of both themselves and learners, the instructors are better able to measure the effect of their teaching practice. This enables them to change their activities in response to the social networks that form between their learners, allowing them to react to deficiencies in the learning environment. Through these visualizations it is demonstrated that instructors can effectively change their pedagogy based on seeing data of their students’ interactions.
The second investigation described in this dissertation is the application of unsupervised machine learning to the viewing habits of learners using lecture capture facilities. By clustering learners into groups based on behaviour and correlating groups with academic outcome, a model of positive learning activity can be described. This is particularly useful for instructional designers who are evaluating the role of learning technologies in programs as it contextualizes how technologies enable success in learners. Through this investigation it is demonstrated that the viewership data of learners can be used to assist designers in building higher level models of learning that can be used for evaluating the use of specific tools in blended learning situations.
Finally, the results of applying supervised machine learning to the indexing of lecture videos are described. Usage data collected from software is increasingly being used by software engineers to make technologies that are more customizable and adaptable. In this dissertation, it is demonstrated that supervised machine learning can provide human-like indexing of lecture videos that is more accurate than current techniques. Further, these indices can be customized for groups of learners, increasing the level of personalization in the learning environment. This investigation demonstrates that the data-assisted approach can also be used by application developers who are building software features for personalization into intelligent learning environments.
Through this work, it is shown that a data-assisted approach to supporting instructional interventions in technology enhanced learning environments is both possible and can positively impact the teaching and learning process. By making available to instructional experts the online activities of learners, experts can better understand and react to patterns of use that develop, making for a more effective and personalized learning environment. This approach differs from traditional methods of building intelligent learning environments, which apply learning theories a priori to instructional design, and do not leverage the in situ data collected about learners.
|
7 |
Simulation of suicide tendency by using machine learning
Calderon-Vilca, Hugo D., Wun-Rafael, William I., Miranda-Loarte, Roberto 07 1900
The full text of this work is not available in the Repositorio Académico UPC due to restrictions imposed by the publisher where it was published. / Suicide is one of the most prominent causes of death reported in the news worldwide. Several factors and variables can lead a person to commit this act, for example stress, self-esteem, and depression, among others. The causes and profiles of suicide cases are not revealed in detail by the competent institutions. We propose a simulation with a systematically generated dataset; such data reflect the adolescent population with suicidal tendency in Peru. We evaluate three supervised machine learning algorithms; of these, the tree-based C4.5 algorithm classifies the suicidal tendency of adolescents best. Finally, we propose a desktop tool that determines the suicidal tendency level of the adolescent. / Peer reviewed
|
8 |
Predictive maintenance for a wood chipper using supervised machine learning
Lindström, Johan January 2018
With a predictive model that can predict failures of a manufacturing machine, many benefits can be obtained. Unnecessary downtime and accidents can be avoided. In this study a wood chipper which has 12 replaceable knives was examined. The specific task was to create a predictive model that can predict whether a knife change is needed or not. To create a predictive model, supervised machine learning was used. Decision forest was the algorithm used in this study. Data samples were collected from vibration measurements. Each sample was labeled with the help of ocular inspections of the knives. Microsoft Azure learning studio was the workspace used to train all models. The data set acquired consists of 106 samples, of which only 9 belong to the minority class. Two strategies for training a model were used, with and without oversampling. The best model without oversampling obtained 87.5% precision and 77.8% recall. The best model with oversampling achieved 79% precision and 86.7% recall. This result indicates that the trained models can be useful. However, the validity of the result is weakened by the small data set and by many uncertainties in acquiring the data set.
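Precision, recall, and oversampling as used above can be sketched as follows. The abstract does not say which oversampling method was applied or give a confusion matrix, so random duplication of minority samples is an assumption here, and the counts in the usage note are hypothetical values chosen only to show the arithmetic.

```python
import random

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def random_oversample(samples, labels, minority, seed=0):
    """Duplicate random minority samples until both classes are the same size.

    Assumes a binary problem in which `minority` is the smaller class."""
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority]
    majority_count = len(labels) - len(minority_idx)
    # Draw extra minority indices (with replacement) to close the gap.
    extra = [rng.choice(minority_idx)
             for _ in range(majority_count - len(minority_idx))]
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

For example, hypothetical counts of 7 true positives, 1 false positive, and 2 false negatives give a precision of 7/8 = 87.5% and a recall of 7/9 ≈ 77.8%. With 9 minority samples out of 106, a single extra missed knife change moves recall by about 11 percentage points, which illustrates why the abstract cautions about the small data set. Oversampling should be applied to the training portion only, so that duplicated samples do not leak into the evaluation data.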
|
9 |
Automated Essay Scoring : Scoring Essays in Swedish
Smolentzov, Andre January 2013
Good writing skills are essential in the education system at all levels. However, the evaluation of essays is labor intensive and can entail a subjective bias. Automated Essay Scoring (AES) is a tool that may be able to save teacher time and provide more objective evaluations. There are several successful AES systems for essays in English that are used in large scale tests. Supervised machine learning algorithms are the core component in developing these systems. In this project four AES systems were developed and evaluated. The AES systems were based on standard supervised machine learning software, i.e., LDAC, SVM with RBF kernel, SVM with polynomial kernel, and Extremely Randomized Trees. The training data consisted of 1500 high school essays that had been scored by the students' teachers and blind raters. To evaluate the AES systems, the agreement between blind raters' scores and AES scores was compared to the agreement between blind raters' and teachers' scores. On average, the agreement between blind raters and the AES systems was better than between blind raters and teachers. The AES based on LDAC software had the best agreement, with a quadratic weighted kappa value of 0.475. In comparison, the teachers and blind raters had a value of 0.391. However, the AES results do not meet the required minimum agreement of a quadratic weighted kappa of 0.7 as defined by the US-based nonprofit organization Educational Testing Service. / I have developed and evaluated four systems for automated essay scoring (AES). LDAC, SVM with RBF kernel, SVM with polynomial kernel, and "Extremely Randomized Trees", which are standard classifier software, were used as the basis for building each AES system.
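The agreement measure used above, quadratic weighted kappa, compares the observed disagreement between two raters with the disagreement expected by chance, penalizing each disagreement by the squared distance between the two scores. A minimal sketch follows; the function name and the integer score range passed in are assumptions, since the abstract does not state the grading scale.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic weighted kappa for two equal-length lists of integer scores.

    Assumes at least two score levels and that the raters are not both constant."""
    n = max_score - min_score + 1
    total = len(rater_a)
    # observed[i][j] counts essays scored i by rater A and j by rater B.
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            expected = hist_a[i] * hist_b[j] / total  # chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. On this scale the reported 0.475 (LDAC versus blind raters) and 0.391 (teachers versus blind raters) both fall well short of the 0.7 threshold cited in the abstract.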
|
10 |
Context-based Human Activity Recognition Using Multimodal Wearable Sensors
Bharti, Pratool 17 November 2017
In the past decade, Human Activity Recognition (HAR) has been an important part of the regular day to day life of many people. Activity recognition has wide applications in the field of health care, remote monitoring of elders, sports, biometric authentication, e-commerce and more. Each HAR application needs a unique approach to provide solutions driven by the context of the problem. In this dissertation, we primarily discuss two applications of HAR in different contexts. First, we design a novel approach for in-home, fine-grained activity recognition using multimodal wearable sensors on multiple body positions, along with very small Bluetooth beacons deployed in the environment. State-of-the-art in-home activity recognition schemes with wearable devices are mostly capable of detecting coarse-grained activities (sitting, standing, walking, or lying down), but cannot distinguish complex activities (sitting on the floor versus on the sofa or bed). Such schemes are not effective for emerging critical healthcare applications – for example, in remote monitoring of patients with Alzheimer's disease, Bulimia, or Anorexia – because these applications require a more comprehensive, contextual, and fine-grained recognition of complex daily user activities. Second, we introduce Watch-Dog – a self-harm activity recognition engine, which attempts to infer self-harming activities from accelerometer data collected by wearable sensors worn on a subject's wrist. In the United States, there are more than 35,000 reported suicides every year, with approximately 1,800 of them being psychiatric inpatients. Staff perform intermittent or continuous observations in order to prevent such tragedies, but a study of 98 articles over time showed that 20% to 62% of suicides happened while inpatients were on an observation schedule. Reducing the instances of suicides of inpatients is a problem of critical importance to both patients and healthcare providers.
Watch-Dog uses a supervised learning algorithm to model a system that can discriminate harmful activities from non-harmful activities. The system is not only very accurate but also energy efficient. Apart from these two HAR systems, we also demonstrate the difference in activity patterns between elder and younger age groups. For this experiment, we used 5 activities of daily living (ADL). Based on our findings, we recommend that a context-aware, age-specific HAR model would be a better solution than all-age mixed models. Additionally, we find that personalized models for each individual elder person classify better than mixed models.
|