  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
241

Classification of Video Traffic : An Evaluation of Video Traffic Classification using Random Forests and Gradient Boosted Trees

Andersson, Ricky January 2017
Traffic classification is important for Internet providers and other organizations because it helps solve critical network management problems. The most common methods for traffic classification are Deep Packet Inspection (DPI) and port-based classification. These methods are becoming obsolete as more and more traffic is encrypted and applications use dynamic ports or the ports of other popular applications. An alternative method for traffic classification uses Machine Learning (ML). This ML method uses statistical features of network traffic flows, which solves the fundamental problems of DPI and port-based classification for encrypted flows. The data used in this study is divided into video and non-video traffic flows, and the goal of the study is to create a model that can classify video flows accurately in real time. Previous studies found tree-based algorithms to work well in classifying network traffic. In this study, random forests and gradient boosted trees are examined and compared, as they are two of the best-performing tree-based classification models. Random forest was found to work best, as its classification speed was significantly faster than that of gradient boosted trees. Over 93% of flows were correctly classified while keeping the random forest model small enough to maintain fast classification speeds. / HITS, 4707
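The flow-feature approach this abstract describes can be sketched as follows. The feature names, synthetic data, and model sizes are illustrative assumptions, not the thesis's actual pipeline; the small, shallow forest mirrors the speed/size trade-off the author mentions.

```python
# Sketch (not the thesis's code): video / non-video flow classification with
# a random forest over per-flow statistics. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_flows = 2000
# Hypothetical per-flow features: mean packet size, packet-size std,
# mean inter-arrival time, flow duration
X = rng.random((n_flows, 4))
y = (X[:, 0] + 0.5 * X[:, 3] > 0.9).astype(int)  # toy "is video" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
# Few, shallow trees keep the model small and per-flow prediction fast.
clf = RandomForestClassifier(n_estimators=50, max_depth=8, random_state=0)
clf.fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

In a real deployment the features would be computed incrementally per flow so that classification can happen while the flow is still active.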
242

Probabilistic Models to Detect Important Sites in Proteins

Dang, Truong Khanh Linh 24 September 2020
No description available.
243

Towards optimal measurement and theoretical grounding of L2 English elicited imitation: Examining scales, (mis)fits, and prompt features from item response theory and random forest approaches

Ji-young Shin (11560495) 14 October 2021
<p>The present dissertation investigated the impact of scales / scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales / scoring methods are an important feature for the validity and reliability of an L2 EI test, but less is known about them (Yan et al., 2016). Prompt linguistic features are also known to influence EI test quality, particularly item difficulty, but item discrimination and corpus-based, fine-grained measures have rarely been incorporated into examining the contribution of prompt linguistic features. The current study addressed these research needs using item response theory (IRT) and random forest modeling.</p><p>Data consisted of 9,348 oral responses to forty-eight items, including EI prompts, item scores, and rater comments, collected from 779 examinees of an L2 English EI test at Purdue University. First, the study explored the current and alternative EI scales / scoring methods that measure grammatical / semantic accuracy, focusing on optimal IRT-based measurement qualities (RQ1 through RQ4 in Phase I). Next, the project identified important prompt linguistic features that predict EI item difficulty and discrimination across different scales / scoring methods and proficiency levels, using multi-level modeling and random forest regression (RQ5 and RQ6 in Phase II).</p><p>The main findings were (although not limited to): 1) collapsing the exact-repetition and paraphrase categories led to more optimal measurement (i.e., adequacy of item parameter values, category functioning, and model / item / person fit) (RQ1); 2) there were fewer misfitting persons with lower proficiency and a higher frequency of unexpected responses in the extreme categories (RQ2); 3) the inconsistency of qualitatively distinguishing semantic errors and the wide range of grammatical accuracy in the minor-error category contributed to misfit (RQ3); 4) a quantity-based, 4-category ordinal scale outperformed quality-based or binary scales (RQ4); 5) sentence length significantly explained item difficulty only, with small variance explained (RQ5); and 6) corpus-based lexical measures and phrase-level syntactic complexity were important for predicting item difficulty, particularly for the higher ability level (RQ6). The findings have implications for EI scale / item development in human and automatic scoring settings and for L2 English proficiency development.</p>
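The Phase II analysis, ranking prompt features by their importance for predicting item difficulty with a random forest regression, can be sketched roughly as below. The feature names and synthetic data are assumptions for illustration only; the dissertation's actual feature set is far richer.

```python
# Sketch (not the dissertation's code): random forest importances for
# hypothetical prompt features predicting IRT item difficulty. Data are toy.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
feature_names = ["sentence_length", "lexical_frequency", "phrase_complexity"]
X = rng.random((200, 3))
# Toy difficulty, driven mostly by the lexical and phrasal features.
b = -1.5 * X[:, 1] + 1.0 * X[:, 2] + 0.1 * rng.standard_normal(200)

model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, b)
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda t: -t[1])
```

Importances of this kind are relative (they sum to one), which is what allows a statement like "sentence length explained little" to be made across scoring methods.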
244

Identifikace a verifikace osob pomocí záznamu EKG / ECG based human authentication and identification

Waloszek, Vojtěch January 2021
In recent years, the utilization of ECG for verification and identification in biometrics has been investigated, and it is the topic of this thesis. Recordings from the ECG-ID database from PhysioNet, together with our own ECG recordings captured with an Apple Watch 4, are used for training and testing the method. Many existing methods have proven the possibility of using ECG for biometrics, but they relied on clinical ECG devices; this thesis investigates using recordings from wearable devices, specifically a smart watch. Sixteen features are extracted from the ECG recordings, and a random forest classifier is used for verification and identification. The features include time intervals between fiducial points, voltage differences between fiducial points, and the variability of PR intervals within a recording. The average performance of the verification model on 14 people is TRR 96.19%, TAR 84.25%.
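The two verification metrics reported here, TAR (true acceptance rate) and TRR (true rejection rate), can be computed from a verifier's decisions as sketched below. The labels and decisions are made up for illustration; they are not the thesis's results.

```python
# Sketch of TAR / TRR computation for a verification model.
# 1 = genuine user attempt, 0 = impostor attempt; decisions are hypothetical.
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

genuine = y_true == 1
impostor = y_true == 0
tar = (y_pred[genuine] == 1).mean()   # fraction of genuine attempts accepted
trr = (y_pred[impostor] == 0).mean()  # fraction of impostor attempts rejected
```

Note that TAR and TRR trade off against each other: raising the classifier's acceptance threshold typically increases TRR at the cost of TAR.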
245

Comparison of Undersampling Methods for Prediction of Casting Defects Based on Process Parameters

Lööv, Simon January 2021
Companies make decisions, both big and small, on a daily basis, and the importance of having highly accurate techniques to support such decision-making is nothing new. Yet even though the importance of prediction is clear to most people, current estimation techniques are still often highly inaccurate, and the consequences of a misclassification can be huge, not just in industry but in many other areas. Machine learning has improved significantly in recent years and is now considered a reliable method for prediction. The main goal of this research is to predict casting defects with the help of a machine-learning algorithm based on process parameters; several sub-objectives were identified on the way to that goal. A common problem in machine learning is an unbalanced dataset: when training a network, it is essential that the dataset is balanced. In this research the dataset was successfully balanced using undersampling, and several undersampling methods were compared and evaluated to see which is best suited for this project. Three machine learning models, random forest, artificial neural network, and k-nearest neighbor, were also compared to each other to see which performs best. The conclusion reached was that the best undersampling method and machine learning model varied for many different reasons, so to find the best combination for a specific job, all the models and methods need to be tested. However, the undersampling method that most often performed best in this research was NearMiss version 2, and the artificial neural network was the most successful machine learning model, performing best in two out of three evaluations and comparisons.
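The balancing step described above can be sketched with the simplest variant, random undersampling of the majority class; the thesis's best-performing method, NearMiss version 2, is a distance-based refinement of the same idea (available, for example, as `imblearn.under_sampling.NearMiss(version=2)` in the imbalanced-learn library). The process-parameter data below are synthetic stand-ins.

```python
# Minimal sketch of balancing a defect dataset by random undersampling of
# the majority (non-defect) class. Data are synthetic, not from the thesis.
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((1000, 5))                   # hypothetical process parameters
y = (rng.random(1000) < 0.05).astype(int)   # ~5% of castings defective

minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)
# Keep every minority sample; draw an equal number of majority samples.
keep = np.concatenate([
    minority,
    rng.choice(majority, size=minority.size, replace=False),
])
X_bal, y_bal = X[keep], y[keep]
```

After balancing, both classes contribute equally to the training loss, which is what lets the models learn the rare defect class at all.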
246

Machine Learning-based Quality Prediction in the Froth Flotation Process of Mining : Master’s Degree Thesis in Microdata Analysis

Kwame Osei, Eric January 2019
In the iron ore mining fraternity, stakeholders rely on a conventional laboratory test technique, which usually takes more than two hours, to ascertain the two quality variables of interest in the froth flotation processing plant. Such substantial dead time makes it difficult to keep the inherently stochastic plant system in a steady state. The present study therefore evaluates the feasibility of using machine learning algorithms to predict the percentage of silica concentrate (SiO2) in the froth flotation processing plant in real time. The predictive model was constructed using an iron ore mining froth flotation system dataset obtained from Kaggle. Different feature selection methods, including random forest and the backward elimination technique, were applied to the dataset to extract significant features. The selected features were then used in multiple linear regression, random forest, and artificial neural network models, and the prediction accuracy of all the models was evaluated and compared. The results show that the artificial neural network generalizes best: its predictions were off by a mean squared error (MSE) of 0.38% on average, which is significant considering that the SiO2 concentrate ranges from 0.77% to 5.53% (MSE 1.1%). These results were obtained with a real-time processing time of 12 s in the worst case on Intel i7 hardware. The experimental results also suggest that the reagent variables have the most significant influence on SiO2 prediction, while the least important variable is Flotation Column.02.air.Flow. The results further indicate a promising prospect for both the multiple linear regression and random forest models in SiO2 prediction for iron ore froth flotation systems in general. Meanwhile, this study provides management, metallurgists, and operators with a better choice for real-time SiO2 prediction, per the accuracy demand, as opposed to the long dead-time laboratory test analysis that causes incessant loss of iron ore discharged to tailings.
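A pipeline of the shape this abstract describes, random-forest-based feature selection followed by a regression on the reduced feature set, can be sketched as below. The variable names, synthetic data, and thresholds are assumptions, not the study's actual dataset or settings.

```python
# Sketch (not the thesis's pipeline): pick the strongest predictors of
# % silica concentrate via random forest importances, then fit a regression
# on the reduced feature set. Data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((500, 8))                      # hypothetical plant variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + 0.05 * rng.standard_normal(500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
rf = RandomForestRegressor(n_estimators=200, random_state=3).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[-2:]   # two most important features
lin = LinearRegression().fit(X_tr[:, top], y_tr)
mse = mean_squared_error(y_te, lin.predict(X_te[:, top]))
```

Feature selection of this kind also serves the real-time requirement: fewer inputs mean less sensor plumbing and faster per-sample prediction.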
247

Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods

Al-Mter, Yusur January 2020
Heart rate variability (HRV) is the time variation between adjacent heartbeats. This variation is regulated by the autonomic nervous system (ANS) and its two branches, the sympathetic and the parasympathetic nervous system. HRV is considered an essential clinical tool for estimating the imbalance between the two branches, and hence an indicator of age and cardiac-related events. This thesis focuses on ECG recordings during nocturnal rest to estimate the influence of HRV in predicting the age decade of healthy individuals. Time-domain, frequency-domain, and non-linear methods are explored to extract the HRV features. Three feature-based methods (support vector machine (SVM), random forest, and extreme gradient boosting (XGBoost)) were employed, and the overall test accuracy achieved in capturing the actual class was relatively low (below 30%). The SVM classifier had the lowest performance, while random forest and XGBoost performed slightly better. Although the difference is negligible, random forest had the highest test accuracy, approximately 29%, using a subset of ten optimal HRV features. Furthermore, to validate the findings, the original dataset was shuffled and used as a test set, and the performance was compared with the outputs of other related research.
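Two of the standard time-domain HRV features of the kind such classifiers take as input, SDNN and RMSSD, can be computed from a series of RR intervals as below. The RR values are synthetic and only illustrate the arithmetic; the thesis's full feature set also covers frequency-domain and non-linear measures.

```python
# Sketch: two classic time-domain HRV features from synthetic RR intervals.
import numpy as np

rr = np.array([812, 790, 804, 820, 795, 808, 830, 815], dtype=float)  # ms

sdnn = rr.std(ddof=1)                       # overall variability of RR series
rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))  # beat-to-beat variability
```

RMSSD in particular is commonly read as reflecting parasympathetic activity, which is why features like these plausibly carry age information.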
248

Predicting the impact of prior physical activity on shooting performance / Prediktion av tidigare fysisk aktivitets inverkan på skytteprestanda

Berkman, Anton, Andersson, Gustav January 2019
The objectives of this thesis were to develop a machine learning tool-chain and to investigate the relationship between heart rate, trigger squeeze, and shooting accuracy when firing a handgun in a simulated environment. Several aspects affect the accuracy of a shooter. To accelerate the learning process and to complement the instructors, the shooter can use different sensors; by extracting sensor data and presenting it to the shooter in real time, the rate of improvement can potentially be accelerated. An experiment replicating precision shooting was conducted at SAAB AB using their GC-IDT simulator. Fourteen participants, with experience ranging from zero to over 30 years, took part and were randomly divided into two groups, one of which started the experiment with a heart rate of at least 150 beats per minute. The iTouchGlove2.3 was used to measure trigger squeeze, and a Polar H10 heart rate belt was used to measure heart rate. Random forest regression was then used to predict accuracy from the data collected in the experiment. A machine learning tool-chain was successfully developed to process raw sensor data, which was then fed to a random forest regression algorithm to form a prediction. This thesis provides insights and guidance for further experimental exploration of handgun exercises and shooting performance.
249

FAULT DETECTION FOR SMALL-SCALE PHOTOVOLTAIC POWER INSTALLATIONS : A Case Study of a Residential Solar Power System

Brüls, Maxim January 2020
Fault detection for residential photovoltaic power systems is an often-ignored problem. This thesis introduces a novel method for detecting power losses due to faults in solar panel performance. Five years of data from a residential system in Dalarna, Sweden, were used to train a random forest regression model that estimates power production. Estimated power was compared with true power to assess the performance of the power generating system, and by identifying trends in the difference between estimated and actual production, faults can be detected. The model is sufficiently competent to identify consistent energy losses of 10% or more of the expected power output, while requiring only minimal modifications to existing power generating systems.
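The residual check at the core of this approach is simple: compare model-estimated power with measured power and flag periods where measured output falls more than 10% below the estimate. The numbers below are illustrative, not the thesis's data.

```python
# Sketch of the residual-based fault flag: measured output more than 10%
# below the model's estimate is treated as suspicious. Values are made up.
import numpy as np

estimated = np.array([2.0, 3.5, 4.0, 4.2, 3.8])   # kW, model prediction
measured = np.array([1.95, 3.4, 3.5, 3.6, 3.7])   # kW, actual production

relative_loss = (estimated - measured) / estimated
fault_suspected = relative_loss > 0.10
```

In practice one would flag only *consistent* exceedances over a window, since single-sample deviations are easily caused by passing clouds or model error.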
250

Essays on Health Economics Using Big Data

Zarebanadkoki, Samane 01 January 2019 (has links)
This dissertation consists of three essays addressing different topics in health economics. In the first essay, we perform a systematic review of peer-reviewed articles examining consumer preference for the main electronic cigarette (e-cigarette) attributes, namely flavor, nicotine strength, and type. The search resulted in a pool of 12,933 articles; 66 articles met the inclusion criteria for this review. The current literature suggests consumers prefer flavored e-cigarettes, and such preference varies with age group and smoking status. Consumer preferences for nicotine strength and type depend on smoking status, e-cigarette use history, and gender. Adolescents consider flavor the most important factor when trying e-cigarettes and were more likely to initiate vaping through flavored e-cigarettes. Young adults prefer sweet, menthol, and cherry flavors, while non-smokers in particular prefer coffee and menthol flavors. Adults in general also prefer sweet flavors (though smokers like tobacco flavor the most) and dislike flavors that elicit bitterness or harshness. Non-smokers and inexperienced e-cigarette users tend to prefer no-nicotine or low-nicotine e-cigarettes, while smokers and experienced e-cigarette users prefer medium- and high-nicotine e-cigarettes. Weak evidence exists for a positive interaction between menthol flavor and nicotine strength. In the second essay, we investigate U.S. adult consumer preference for three key e-cigarette attributes, flavor, nicotine strength, and type, by applying a discrete choice model to the Nielsen scanner data (Consumer Panel data combined with retail data) for 2013 through 2017, generating novel findings as well as complementing the large literature on the topic based on focus groups, surveys, and experiments. We find that (adult) vapers prefer tobacco flavor, medium nicotine strength, and disposables, and that such preferences can vary with cigarette smoking status, purchase frequency, gender, race, and age. In particular, smokers prefer tobacco flavor, non-smokers and female vapers prefer medium strength, and infrequent vapers prefer disposables. Vapers also display loyalty (inertia) to e-cigarette brands, flavors, and nicotine strengths. One key policy implication is that a flavor ban will likely have a relatively larger impact on adolescents and young adults than on adults. The third essay employs a machine learning algorithm, specifically a random forest, to identify the importance of BMI information during kindergarten in predicting the children most likely to be obese by the 4th grade, using the Arkansas BMI screening program dataset. The potential value of BMI information during early childhood for predicting the likelihood of obesity later in life is one of the main benefits of a BMI screening program. This study identifies the value of this information by comparing the results of two random forests trained with and without kindergarten BMI information, assessing the ability of BMI screening to improve a predictive model beyond the personal, demographic, and socioeconomic measures typically used to identify children at high risk of excess weight gain. The BMI z-score from kindergarten is the most important variable and increases the accuracy of the prediction by 14%. The ability of BMI screening programs to identify the children at greatest risk of becoming obese is an important but neglected dimension that should be considered when evaluating their overall utility. In the last essay, we use the Nielsen retail scanner dataset and apply a difference-in-differences (DID) approach and the synthetic control method to test whether consumers in Utah reduced beef purchases after the 2009 Salmonella outbreak in ground beef products. The DID approach indicates that the Salmonella event reduced ground beef purchases in Utah by 17% in the four weeks after the recall. The price elasticity of demand is also estimated to be -2.04; the reduction in ground beef purchases as a result of the recall is therefore comparable to an almost 8.3% increase in the price of this product. Using the synthetic control method, which allows us to use all of the control states to produce a synthetic Utah, we find the effect of this event to be minimal compared with the DID estimate.
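The difference-in-differences logic behind the Salmonella estimate can be illustrated with a minimal two-period calculation; the purchase numbers below are invented for demonstration and are not the Nielsen estimates.

```python
# Minimal two-period difference-in-differences sketch. The estimate is the
# change in the treated unit minus the change in the controls. Numbers are
# made up for illustration only.

treated_pre, treated_post = 100.0, 80.0   # Utah, mean ground-beef purchases
control_pre, control_post = 100.0, 97.0   # control states

did = (treated_post - treated_pre) - (control_post - control_pre)
```

The identifying assumption is parallel trends: absent the recall, Utah's purchases would have moved like the controls'. The synthetic control method relaxes this by weighting control states to match Utah's pre-period path.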
