  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
831

The effect of quality metrics on the user watching behaviour in media content broadcast

Setterquist, Erik January 2016 (has links)
Understanding the effects of quality metrics on user behavior is important for the increasing number of content providers that need to maintain a competitive edge. The two data sets used are gathered from a provider of live streaming and a provider of video-on-demand streaming. The important quality and non-quality features are determined using both correlation metrics and relative importance estimated by machine learning methods. A model that can predict and simulate user behavior is developed and tested. A time series model, a machine learning model, and a combination of both are compared. Results indicate that both quality and non-quality features are important in understanding user behavior, and that the importance of quality features is reduced over time. For short prediction times, the model using quality features performs slightly better than the model not using quality features.
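The correlation step described in this abstract can be sketched in a few lines: rank candidate features by the absolute value of their Pearson correlation with a behaviour metric such as watch time. All feature names and numbers below are hypothetical illustrations, not the thesis's data or pipeline:

```python
# Minimal sketch: rank hypothetical session features by |Pearson correlation|
# with minutes watched. Feature names and values are illustrative only.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

watch_time = [10, 25, 40, 55, 70]          # minutes watched per session
features = {
    "bitrate_kbps":   [800, 1500, 2200, 3000, 3600],  # quality feature
    "buffering_secs": [30, 18, 12, 6, 2],             # quality feature
    "hour_of_day":    [20, 9, 14, 23, 11],            # non-quality feature
}

ranked = sorted(features, key=lambda f: abs(pearson(features[f], watch_time)),
                reverse=True)
print(ranked[0])  # feature most strongly correlated with watch time
```

In the thesis this correlation view is complemented by model-based relative importance, since a correlation-only ranking misses non-linear effects.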
832

Hidden Higgses and Dark Matter at Current and Future Colliders

Pyarelal, Adarsh January 2017 (has links)
Despite its indisputable successes, the Standard Model of particle physics (SM) is widely considered to be an effective low-energy approximation to an underlying theory that describes physics at higher energy scales. While there are many candidates for such a theory, nearly all of them predict the existence of additional particles beyond those of the Standard Model. In this work, we present three analyses aimed at discovering new particles at current and future particle colliders. The first two analyses are designed to probe extended scalar sectors, which often arise in theories beyond the Standard Model (BSM). The structure of these extended scalar sectors can be described by a physically well-motivated class of models known collectively as Two-Higgs-Doublet Models (2HDMs). The scalar mass spectrum of 2HDMs comprises two CP-even states h and H, a CP-odd state A, and a pair of charged states H±. Traditional searches for these states at particle colliders focus on finding them via their decays to SM particles. However, there are compelling scenarios in which these heavy scalars decay through exotic modes to non-SM final states. In certain regions of parameter space, these exotic modes can even dominate the conventional decay modes to SM final states, and thus provide a complementary avenue for discovering new Higgs bosons. The first analysis presented aims to discover charged Higgs bosons H± via top decay at the LHC. We find that the exotic decay modes outperform the conventional decay modes in regions of parameter space with low values of the 2HDM parameter tan β. The second analysis aims to systematically cover all the exotic decay scenarios that are consistent with theoretical and experimental constraints, at both the 14 TeV LHC and a future 100 TeV hadron collider.
We find the preliminary results promising: we are able to exclude a large swathe of 2HDM parameter space, up to scalar masses of 3.5 TeV, for a wide range of values of tan β, at a 100 TeV collider. In addition to these two analyses, we also present a third, aimed at discovering pair-produced higgsinos that decay to binos at a 100 TeV collider. Higgsinos and binos are new fermion states that arise in the Minimal Supersymmetric Standard Model (MSSM). This heavily studied model is the minimal phenomenologically viable incorporation of supersymmetry, a symmetry that connects fermions and bosons, into the Standard Model. In the scenario we consider, the bino is the lightest supersymmetric partner, which makes it a good candidate for dark matter. Using razor variables and boosted decision trees, we are able to exclude higgsinos up to 1.8 TeV for binos up to 1.3 TeV.
833

Evaluación de viabilidad para una plataforma marketplace para las PYMES del subsector moda de la ciudad de Lima / Viability assessment for a marketplace platform for SMEs in the fashion subsector of the city of Lima

Cruz Cuevas, Martha Alejandra, Ríos Carranza, Betty Sofía, Villena León, Alex David 05 August 2019 (has links)
In a national context where the technology environment, and e-commerce in particular, is growing rapidly, the question arises whether SMEs in the fashion subsector are ready to join this channel effectively and profitably. This research work evaluates the viability of implementing a marketplace that, powered by machine-learning technology, offers both SMEs and customers a unique value proposition that removes the barriers to entry to this channel and offers advantages over the physical channel. The proposal is to develop a web and mobile platform, built around customer centricity, that provides a superior and attractive shopping experience both for small fashion brands and for shoppers in socioeconomic segments A, B, and C aged 18 to 55, especially the so-called Millennial generation. Under the assumptions of the business model, the 5-year financial evaluation shows the project to be viable; however, sensitivity analysis across several scenarios reveals a significant degree of risk that would not justify the investment, mainly due to small entrepreneurs' limited technical management capacity and the informality of the sector. Finally, we recommend approaching technology ventures for SMEs from a social standpoint and with state participation. / Trabajo de investigación
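The kind of 5-year financial evaluation and sensitivity analysis described above can be illustrated with a minimal net-present-value sketch. The cash flows and discount rates below are hypothetical stand-ins, not figures from the study:

```python
# Minimal sketch: NPV of a 5-year project under several discount-rate
# scenarios, showing how viability can flip as assumed risk grows.
# All cash-flow figures are hypothetical.

def npv(rate, cashflows):
    """NPV of yearly cash flows; cashflows[0] is the initial outlay (year 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# Hypothetical marketplace project: initial investment, then 5 yearly inflows.
flows = [-100_000, 15_000, 25_000, 35_000, 45_000, 55_000]

for rate in (0.08, 0.12, 0.20):   # sensitivity over discount rates
    print(f"rate={rate:.0%}  NPV={npv(rate, flows):,.0f}")
```

With these toy numbers the project is viable at an 8% discount rate but not at 20%, mirroring the abstract's point that the base case is viable while risk-adjusted scenarios may not justify the investment.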
834

Development of a simple artificial intelligence method to accurately subtype breast cancers based on gene expression barcodes

Esterhuysen, Fanechka Naomi January 2018 (has links)
Magister Scientiae - MSc / INTRODUCTION: Breast cancer is a highly heterogeneous disease. The complexity of achieving an accurate diagnosis and an effective treatment regimen lies within this heterogeneity. Subtypes of the disease are not simply molecular, i.e. hormone receptor over-expression or absence; the tumour itself is heterogeneous in terms of tissue of origin, metastases, and histopathological variability. Accurate tumour classification vastly improves treatment decisions, patient outcomes, and 5-year survival rates. Gene expression studies aided by transcriptomic technologies such as microarrays and next-generation sequencing (e.g. RNA-Sequencing) have deepened oncology researchers' and clinicians' understanding of the complex molecular portraits of malignant breast tumours. Mechanisms governing cancers, including tumorigenesis, gene fusions, gene over-expression and suppression, and cellular process and pathway involvement, have been elucidated through comprehensive analyses of the cancer transcriptome. Over the past 20 years, gene expression signatures discovered with both microarray and RNA-Seq have reached clinical and commercial application through the development of tests such as Mammaprint®, OncotypeDX®, and FoundationOne® CDx, all of which focus on chemotherapy sensitivity, prediction of cancer recurrence, and tumour mutational level. The Gene Expression Barcode (GExB) algorithm was developed to allow easy interpretation and integration of microarray data through data normalization with frozen RMA (fRMA) preprocessing and conversion of relative gene expression to a sequence of 1's and 0's. Unfortunately, the algorithm has not yet been developed for RNA-Seq data. However, implementing the GExB with feature selection would contribute to a robust machine-learning-based breast cancer and subtype classifier. METHODOLOGY: For microarray data, we applied the GExB algorithm to generate barcodes for normal breast and breast tumour samples.
A two-class classifier for malignancy was developed through feature selection on barcoded samples, selecting genes with 85% stable absence or presence within a tissue type that were differentially stable between tissues. A multi-class feature-selection method was employed to identify genes with variable expression in one subtype but 80% stable absence or presence in all other subtypes, i.e. 80% in n-1 subtypes. For RNA-Seq data, a barcoding method needed to be developed that could mimic the GExB algorithm for microarray data. A z-score-to-barcode method was implemented, together with differential gene expression analysis selecting the top 100 genes as informative features for classification. The accuracy and discriminatory capability of both the microarray-based and RNA-Seq-based gene signatures were assessed through unsupervised and supervised machine-learning algorithms, i.e., K-means and hierarchical clustering, as well as binary and multi-class Support Vector Machine (SVM) implementations. RESULTS: The GExB-FS method for microarray data yielded 85-probe and 346-probe informative sets for the two-class and multi-class classifiers, respectively. The two-class classifier predicted samples as either normal or malignant with 100% accuracy, and the multi-class classifier predicted molecular subtype with 96.5% accuracy with SVM. Combining RNA-Seq DE analysis for feature selection with the z-score-to-barcode method resulted in a two-class classifier for malignancy, and a multi-class classifier for normal-from-healthy, normal-adjacent-tumour (from cancer patients), and breast tumour samples, each with 100% accuracy. Most notably, a normal-adjacent-tumour gene expression signature emerged that differentiated it from normal breast tissue in healthy individuals. CONCLUSION: A potentially novel method for microarray and RNA-Seq data transformation, feature selection, and classifier development was established.
The universal applicability of the microarray signatures and the validity of the z-score-to-barcode method were demonstrated by 95% accurate classification of RNA-Seq barcoded samples with a microarray-discovered gene expression signature. The results of this comprehensive study into the discovery of robust gene expression signatures hold immense potential for further R&D towards implementation at the clinical endpoint, and for translation to simpler, cost-effective laboratory methods such as qPCR-based tests.
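The z-score-to-barcode idea described in the methodology can be sketched as follows. This is a simplified illustration assuming a per-gene z-score across samples and a hypothetical cutoff of 1.0; the thesis's actual preprocessing (fRMA, frozen reference distributions) is more involved:

```python
# Minimal sketch: convert one gene's expression values across samples into a
# 1/0 barcode by thresholding z-scores. Cutoff and data are hypothetical.
from statistics import mean, stdev

def barcode(values, z_cut=1.0):
    """Return 1 (expressed) where the z-score exceeds z_cut, else 0 (silenced)."""
    m, s = mean(values), stdev(values)
    return [1 if (v - m) / s > z_cut else 0 for v in values]

# Hypothetical expression of one gene across five samples.
print(barcode([2.1, 2.3, 9.8, 2.0, 10.2]))  # [0, 0, 1, 0, 1]
```

The resulting binary vectors are what the feature-selection step then filters for stable absence or presence within a tissue type.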
835

Local And Network Ransomware Detection Comparison

Ahlgren, Filip January 2019 (has links)
Background. Ransomware is a malicious application that encrypts important files on a victim's computer. The ransomware asks the victim to pay a ransom through cryptocurrency. After the system is encrypted, there is virtually no way to decrypt the files other than using the encryption key bought from the attacker. Objectives. In this practical experiment, we examine how machine learning can be used to detect ransomware at the local and network levels, and compare the results to see which performs better. Methods. Data is collected from malware and goodware databases and then analyzed in a virtual environment to extract system information and network logs. Different machine learning classifiers are built from the extracted features in order to detect the ransomware. The classifiers go through a performance evaluation and are compared with each other to determine which performs best. Results. According to the tests, local detection was both more accurate and more stable than network detection. The local classifiers had an average accuracy of 96%, while the best network classifier had an average accuracy of 89.6%. Conclusions. In this case, the results show that local detection performs better than network detection. However, this may be because the network features were not specific enough for a network classifier. Network performance could have been better if the ransomware samples had consisted of fewer families, so that better features could have been selected.
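The local-versus-network comparison can be illustrated by training the same simple classifier on two different feature sets and comparing accuracies. The nearest-centroid model and all feature values below are hypothetical toys, not the thesis's classifiers or data:

```python
# Minimal sketch: one simple classifier, two feature sets (system-level vs.
# network-level), compared by accuracy. All vectors are hypothetical.

def centroid_classify(train, labels, sample):
    """Nearest-centroid: assign the class whose mean feature vector is closest."""
    centroids = {}
    for c in set(labels):
        rows = [x for x, y in zip(train, labels) if y == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda c: dist(sample, centroids[c]))

def accuracy(train, labels, tests, truths):
    hits = sum(centroid_classify(train, labels, x) == y
               for x, y in zip(tests, truths))
    return hits / len(tests)

# Hypothetical feature vectors (scaled 0-1) for ransomware vs. benign runs.
local_train = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]   # e.g. rename rate, write entropy
net_train   = [[0.6, 0.5], [0.4, 0.6], [0.5, 0.4], [0.55, 0.5]]  # e.g. DNS rate, outbound bytes
labels = ["ransomware", "ransomware", "benign", "benign"]

local_acc = accuracy(local_train, labels, [[0.7, 0.9], [0.2, 0.2]], ["ransomware", "benign"])
net_acc   = accuracy(net_train, labels, [[0.58, 0.58], [0.45, 0.5]], ["ransomware", "benign"])
print(local_acc, net_acc)  # local features separate the classes better
```

In this toy setup the overlapping network features drag accuracy down, which mirrors the abstract's caveat that the network features may simply not have been specific enough.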
836

Adversarial Anomaly Detection

Radhika Bhargava (7036556) 02 August 2019 (has links)
Considerable attention has been given to the vulnerability of machine learning to adversarial samples. This is particularly critical in anomaly detection; applications such as detecting fraud, intrusion, and malware must assume a malicious adversary. We specifically address poisoning attacks, where the adversary injects carefully crafted benign samples into the data, leading to concept drift that causes the anomaly detection to misclassify the actual attack as benign. Our goal is to estimate the vulnerability of an anomaly detection method to an unknown attack, in particular the expected minimum number of poison samples the adversary would need to succeed. Such an estimate is a necessary step in risk analysis: do we expect the anomaly detection to be sufficiently robust to be useful in the face of attacks? We analyze DBSCAN, LOF, and one-class SVM as anomaly detection methods, and derive estimates of their robustness to poisoning attacks. The analytical estimates are validated against the number of poison samples needed for the actual anomalies in standard anomaly detection test datasets. We then develop a defense mechanism, based on the concept drift caused by the poisonous points, to identify that an attack is underway. We show that while it is possible to detect the attacks, doing so leads to a degradation in the performance of the anomaly detection method. Finally, we investigate whether the adversarial samples generated for one anomaly detection method transfer to another.
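The poisoning mechanism described above can be illustrated with a deliberately simple distance-based detector: each injected point looks benign against the current data, yet the chain of injections drifts the model until the attack point passes as normal. The detector, threshold, and 1-D data are hypothetical simplifications, not the DBSCAN/LOF/one-class SVM analyses of the thesis:

```python
# Minimal sketch of a poisoning attack: a nearest-neighbour distance detector
# flags points far from the training data; crafted "benign" points bridge the
# gap to the attack point. Data and threshold are hypothetical.

def is_anomaly(point, data, threshold=1.5):
    """Flag `point` if its nearest neighbour in `data` is farther than threshold."""
    nearest = min(abs(point - x) for x in data)
    return nearest > threshold

benign = [0.0, 0.5, 1.0, 1.5, 2.0]      # legitimate baseline (1-D for clarity)
attack = 8.0

print(is_anomaly(attack, benign))        # True: far from all benign data

# Poisoning: each injected point passes the detector's own check at the time
# it is added, so the injections look like ordinary benign traffic.
poisoned = list(benign)
for p in (3.4, 4.8, 6.2, 7.6):
    assert not is_anomaly(p, poisoned)   # each poison sample evades detection
    poisoned.append(p)

print(is_anomaly(attack, poisoned))      # False: attack now looks normal
```

Here four poison samples suffice; the thesis's contribution is deriving such minimum-poison estimates analytically for real anomaly detectors.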
837

A Machine Learning-Based Statistical Analysis of Predictors for Spinal Cord Stimulation Success

Jacobson, Trolle, Segerberg, Gustav January 2019 (has links)
Spinal Cord Stimulation (SCS) is a treatment for lumbar back pain, and despite the proven efficacy of the technology, there is a lack of knowledge about how the treatment outcome varies between different patient groups. Furthermore, since the method is costly, in terms of material, surgery, and follow-up time, more accurate patient targeting would decrease healthcare costs. In recent years, Real World Data (RWD) has become a vital source of information for describing the effects of medical treatments. Its complexity, however, calls for new, innovative methods that use a larger set of features to explain the outcome of SCS treatments. This study employed machine learning algorithms, such as the Random Forest Classifier (RFC) and boosting algorithms, and compared the results with a baseline, logistic regression (LR). The results were that RFC tended to classify successful and unsuccessful patients better, while logistic regression was unstable on unbalanced data. In order to interpret the insights of the models, we also propose a Soft Accuracy Measurement (SAM) method to explain how RFC and LR differ. Some factors were shown to impact the success of SCS: age, income, duration of pain experience, and educational level. Many of these variables are also supported by earlier studies on factors of success in lumbar spine surgery.
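The instability of plain accuracy on unbalanced outcome data, noted above, can be illustrated in a few lines: a degenerate model that always predicts the majority class scores well on accuracy while missing the minority class entirely, which is why a per-class view is needed. The labels below are hypothetical, and this is a generic illustration, not the study's SAM method:

```python
# Minimal sketch: plain accuracy vs. per-class (balanced) accuracy on
# unbalanced treatment outcomes. Labels are hypothetical.

def accuracy(pred, true):
    return sum(p == t for p, t in zip(pred, true)) / len(true)

def balanced_accuracy(pred, true):
    recalls = []
    for c in set(true):
        idx = [i for i, t in enumerate(true) if t == c]
        recalls.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 9 successful treatments, 1 unsuccessful; a degenerate model predicts
# "success" for everyone.
true = ["success"] * 9 + ["failure"]
pred = ["success"] * 10

print(accuracy(pred, true))           # 0.9 — looks strong
print(balanced_accuracy(pred, true))  # 0.5 — the failure class is missed
```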
838

Machine Learning Methods for Personalized Medicine Using Electronic Health Records

Wu, Peng January 2019 (has links)
The theme of this dissertation is methods for estimating personalized treatment using machine learning algorithms that leverage information from electronic health records (EHRs). Current guidelines for medical decision making largely rely on data from randomized controlled trials (RCTs) studying average treatment effects. However, because RCTs are usually conducted under specific inclusion/exclusion criteria, they may be inadequate for making individualized treatment decisions in real-world settings. Large-scale EHRs provide opportunities to fulfill the goals of personalized medicine and learn individualized treatment rules (ITRs) that depend on patient-specific characteristics from real-world patient data. On the other hand, since patients' EHRs document treatment prescriptions in the real world, transferring information from EHRs to RCTs, if done appropriately, could potentially improve the performance of ITRs in terms of precision and generalizability. Furthermore, EHR data usually include text notes or similar structures, so topic modeling techniques can be adapted to engineer features. In the first part of this work, we address challenges with EHRs and propose a machine learning approach based on matching techniques (referred to as M-learning) to estimate optimal ITRs from EHRs. This new learning method performs matching, instead of the inverse probability weighting commonly used in many existing methods for estimating ITRs, to more accurately assess individuals' responses to alternative treatments and to alleviate confounding. Matching-based value functions are proposed to compare matched pairs under a unified framework in which various types of outcomes for measuring treatment response (including continuous, ordinal, and discrete outcomes) can easily be accommodated. We establish the Fisher consistency and convergence rate of M-learning.
Through extensive simulation studies, we show that M-learning outperforms existing methods when propensity scores are misspecified or when unmeasured confounders are present in certain scenarios. At the end of this part, we apply M-learning to estimate optimal personalized second-line treatments for type 2 diabetes patients to achieve better glycemic control or reduce major complications, using EHRs from New York Presbyterian Hospital (NYPH). In the second part, we propose a new domain adaptation method to learn ITRs by incorporating information from EHRs. Unless we assume no unmeasured confounding in EHRs, we cannot directly learn the optimal ITR from the combined EHR and RCT data. Instead, we first pre-train “super” features from EHRs that summarize physicians' treatment decisions and patients' observed benefits in the real world, which are likely to be informative of the optimal ITRs. We then augment the feature space of the RCT and learn the optimal ITRs stratified by these features using RCT patients only. We adopt Q-learning and a modified matched-learning algorithm for estimation. We present theoretical justifications and conduct simulation studies to demonstrate the performance of our proposed method. Finally, we apply our method to transfer information learned from EHRs of type 2 diabetes (T2D) patients to improve the learning of individualized insulin therapies from an RCT. In the last part of this work, we apply the M-learning method proposed in the first part to learn ITRs using interpretable features extracted from EHR documentation of medications and ICD diagnosis codes. We use a latent Dirichlet allocation (LDA) model to extract latent topics and weights as features for learning ITRs. Our method achieves confounding reduction in observational studies by matching treated and untreated individuals, and improves treatment optimization by augmenting the feature space with clinically meaningful LDA-based features.
We apply the method to extract LDA-based features from EHR data collected at the NYPH clinical data warehouse in studying optimal second-line treatment for T2D patients. We use cross-validation to show that the estimated ITRs outperform uniform treatment strategies (i.e., assigning insulin or another class of oral medication to all individuals), and that including topic modeling features leads to a greater reduction in post-treatment complications.
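The matching idea at the core of M-learning, pairing each treated patient with the most similar untreated patient on observed features before comparing outcomes, can be sketched as a greedy nearest-neighbour match. The features and values below are hypothetical, and this is a simplification of the dissertation's matching-based value functions:

```python
# Minimal sketch: greedy 1:1 nearest-neighbour matching of treated to
# untreated patients on observed features. All data is hypothetical.

def match_pairs(treated, untreated):
    """Return (treated_index, untreated_index) pairs; each control used once."""
    pairs, used = [], set()
    for ti, t in enumerate(treated):
        best = min((ui for ui in range(len(untreated)) if ui not in used),
                   key=lambda ui: sum((a - b) ** 2
                                      for a, b in zip(t, untreated[ui])))
        used.add(best)
        pairs.append((ti, best))
    return pairs

# Hypothetical (age, HbA1c) features for treated and untreated patients.
treated   = [(55, 8.1), (42, 7.2)]
untreated = [(43, 7.3), (70, 9.5), (54, 8.0)]

print(match_pairs(treated, untreated))  # [(0, 2), (1, 0)]
```

Within each matched pair, outcomes can then be contrasted to assess individual treatment response while reducing confounding from observed covariates.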
839

Anpassning av mobilnotifikationer med hjälp av maskininlärning / Adapting mobile notifications using machine learning

Saveh, Diana January 2019 (has links)
The aim of this study was to answer the question of whether it is possible to obtain notifications that work with the user instead of against them, since ill-timed notifications can be experienced as stressful and bothersome. To reduce stressful notifications, an application was created that acted as a notification controller, using machine learning to predict when the user wants to receive notifications. The pattern recognition technique used is association rule analysis, implemented with an FP-growth tree. A usability test was conducted before and after installation of the application, measuring both perceived stress and how the application worked. The study showed that screen time decreased by one hour and that the number of times the phone was opened was also reduced. This survey requires more data, as it may be that the user was not affected by the application but simply happened to use the phone less.
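The association-rule idea described above, learning in which contexts a user actually opens notifications, reduces to computing support and confidence for rules over logged transactions. The context items and logs below are hypothetical; FP-growth is just an efficient way of finding the same frequent itemsets this brute-force sketch scores directly:

```python
# Minimal sketch: support and confidence for an association rule such as
# {evening, at_home} -> {opened} over logged notification contexts.
# Transactions are hypothetical.

def support(itemset, transactions):
    hits = sum(itemset <= t for t in transactions)  # subset test per transaction
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

logs = [
    {"evening", "at_home", "opened"},
    {"evening", "at_home", "opened"},
    {"evening", "commuting"},
    {"morning", "at_home", "opened"},
    {"evening", "at_home", "ignored"},
]

rule_from, rule_to = {"evening", "at_home"}, {"opened"}
print(support(rule_from | rule_to, logs))    # fraction of logs matching the rule
print(confidence(rule_from, rule_to, logs))  # how often the context leads to an open
```

A notification controller can then hold back notifications in contexts whose rules fall below a chosen confidence threshold.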
840

Modelagem de propensão ao atrito no setor de telecomunicações / Modeling Attrition Propensity in the Telecommunication Sector

Arruda, Rodolfo Augusto da Silva 12 March 2019 (has links)
Customer satisfaction is key to maintaining the relationship with the company. When customers need to resolve a problem, the company must provide good service and have the capacity to resolve it. Mass-market service, however, often makes solutions that are sensitive to customers' needs impossible. Statistical methodology can help the company prioritize customers whose profile suggests they are likely to complain to a consumer protection agency (ODC), thus avoiding a situation of attrition. In this project, we modeled customer behaviour with respect to attrition propensity, testing logistic regression, random forest, and genetic algorithms. The results showed that genetic algorithms are a good option for making the model simpler (more parsimonious) without loss of performance, and that random forest provided a performance gain but makes the model more complex, both computationally and practically with respect to deployment in the company's production systems.
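The genetic-algorithm route to a more parsimonious model, as described above, can be sketched as a small bit-mask search over candidate features, where fitness rewards predictive value and penalizes model size. This is a toy illustration: the per-feature value scores, penalty, and GA settings are hypothetical, not the study's configuration:

```python
# Minimal sketch: GA feature selection. Individuals are bit-masks over
# features; fitness trades hypothetical predictive value against a size
# penalty (parsimony). Scores and settings are illustrative only.
import random

random.seed(0)
VALUE = [0.90, 0.80, 0.05, 0.04, 0.03, 0.02]  # stand-in usefulness per feature
PENALTY = 0.10                                 # cost per feature kept

def fitness(mask):
    return sum(v for v, keep in zip(VALUE, mask) if keep) - PENALTY * sum(mask)

def evolve(pop, generations=30):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]            # truncation selection (elitist)
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(VALUE))
            child = a[:cut] + b[cut:]             # one-point crossover
            i = random.randrange(len(VALUE))
            child[i] ^= random.random() < 0.2     # occasional bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

population = [[random.randint(0, 1) for _ in VALUE] for _ in range(12)]
best = evolve(population)
print(best, fitness(best))
```

Because only the two high-value features repay the per-feature penalty here, the search pressure is toward small masks, which is exactly the parsimony effect the study reports.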
