241 |
PePIP: a Pipeline for Peptide-Protein Interaction-site Prediction / PePIP: en Pipeline for Förutsägelse av Peptid-Protein Bindnings-site
Johansson-Åkhe, Isak, January 2017
Protein-peptide interactions play a major role in several biological processes, such as cell proliferation and cancer cell life cycles. Accurate computational methods for predicting protein-protein interactions exist, but few of these methods can be extended to predicting interactions between a protein and a particularly small or intrinsically disordered peptide. This thesis presents PePIP, a pipeline for predicting where on a given protein a given peptide will most probably bind. The pipeline uses structural alignment to search the Protein Data Bank for possible templates for the interaction to be predicted, using the larger chain as the query. A Random Forest classifier then evaluates whether each candidate template can represent the query protein and peptide, and the best templates are selected by combining the Random Forest evaluation with hierarchical clustering. These final templates are then combined to give a prediction of the binding site. When tested on a set of 502 experimentally determined protein-peptide structures, PePIP proved highly accurate, suggesting a binding site on the correct part of the protein surface roughly 4 out of 5 times.
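The final step, combining the selected templates into one binding-site prediction, can be illustrated as a residue vote. This is a minimal sketch under stated assumptions, not PePIP's actual code: each template is assumed to be already aligned to the query protein and reduced to a Random Forest probability plus the set of query residue numbers its peptide contacts, and the hierarchical clustering step is replaced here by simply keeping the top-scoring templates. The names `consensus_binding_site`, `top_n`, and `min_fraction` are hypothetical.

```python
from collections import Counter

def consensus_binding_site(templates, top_n=3, min_fraction=0.5):
    """Predict binding-site residues by majority vote over the best templates.

    templates: list of (rf_probability, set_of_query_residue_numbers) pairs,
    where the residues are the query positions contacted by the template
    peptide after structural alignment (an assumption of this sketch).
    """
    # Keep the templates the classifier rates as most plausible
    # (a simplified stand-in for the Random Forest + clustering selection).
    best = sorted(templates, key=lambda t: t[0], reverse=True)[:top_n]
    votes = Counter()
    for _, residues in best:
        votes.update(residues)
    # A residue is predicted if at least min_fraction of the kept templates agree.
    cutoff = min_fraction * len(best)
    return sorted(res for res, n in votes.items() if n >= cutoff)

templates = [
    (0.9, {10, 11, 12, 40}),
    (0.8, {11, 12, 13}),
    (0.7, {12, 13, 14}),
    (0.2, {90, 91}),  # low-probability template, dropped by top_n
]
print(consensus_binding_site(templates))  # [11, 12, 13]
```

Voting over several templates rather than trusting the single best one makes the prediction less sensitive to one misleading structural match.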
242 |
Divergence and reproductive isolation in the bushcricket Mecopoda elongata
Dutta, Rochishnu, January 2015
The evolution of isolating mechanisms within a species population impedes gene flow. This allows isolated populations to diverge along different trajectories, which may ultimately lead to the formation of new species. Our attempts to understand the evolution of isolating barriers have benefited enormously from studies of divergent populations that are still recognized as members of the same species. The co-occurrence of five acoustically distinct populations of the bushcricket Mecopoda elongata in south India provided us with the opportunity to study one such divergence of sympatric populations of a single species. In sympatric populations that share identical ecology, sexual selection has the potential to play a prominent role in maintaining reproductive isolation. Based on a previous traditional morphometric study, Mecopoda elongata in India were thought to be a morphologically indistinguishable cryptic species complex. The lack of morphological divergence suggests that ecology played a less significant role in the divergence of the group. One possibility is that songtypes are maintained by the preference of Mecopoda elongata females for mating with a specific songtype. In this thesis I show that female phonotaxis to their ‘own’ call has the potential to contribute to behavioural isolation among the songtypes, in particular between two songtypes with overlapping temporal call parameters. This finding is supported by an independent no-choice mating experiment using the same two songtypes. To investigate cues other than song that Mecopoda elongata females may use to exercise preference for their own type, I examined the composition of cuticular lipids, in particular cuticular hydrocarbons (CHCs), and the detailed structure of secondary sexual characters. I was able to differentiate all Mecopoda elongata songtypes with high probability based on CHC profiles and geometric morphometrics of the subgenital plate and cerci.
My study reveals that divergence in sexual traits other than acoustic signals, although far less obvious in nature, is present among Mecopoda elongata populations. This provides potential mechanisms for premating isolation among Mecopoda elongata songtypes in the wild, suggesting that reproductive isolation is maintained by female preferences for male sexual signals. Additionally, I discovered a parasitoid tachinid fly that infects three different songtypes of Mecopoda elongata, namely Double Chirper, Two Part and Helicopter. This tachinid fly appears to have a specialized hearing organ for tracking down calling Mecopoda elongata males, shedding light on a potential selection pressure and a possible mechanism for Mecopoda elongata song divergence.
243 |
Využití statistických metod při oceňování nemovitostí / Valuation of real estates using statistical methods
Funiok, Ondřej, January 2017
The thesis deals with the valuation of real estate in the Czech Republic using statistical methods. The work focuses on a complex task based on data from an advertising web portal. The aim of the thesis is to create a prototype statistical prediction model for valuing residential properties in Prague and to evaluate the possibilities for its further dissemination. The structure of the work follows the CRISP-DM methodology. Regression trees and random forests are tested on the pre-processed data and used to predict the price of real estate.
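The core operation of a regression tree, choosing the split that best separates prices, can be sketched in a few lines. This is an illustrative sketch, not the thesis's model: it searches a single numeric feature for the threshold that minimises the summed squared error (SSE) of predicting each side by its mean price. The function names and the toy floor-area and price values are hypothetical.

```python
def sse(values):
    """Sum of squared errors when every value is predicted by the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Find the threshold on one feature that minimises total SSE of the two leaves."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold, best_sse

# Hypothetical toy data: floor area (m^2) and asking price (millions CZK).
area = [30, 45, 60, 80, 95, 120]
price = [3.0, 3.2, 3.1, 6.0, 6.4, 6.2]
threshold, error = best_split(area, price)
print(threshold, round(error, 4))  # 70.0 0.1
```

A full regression tree applies this search recursively to each side over all features; a random forest averages many such trees, each grown on a bootstrap sample of the listings with a random subset of features considered at each split.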
244 |
Classification of Video Traffic: An Evaluation of Video Traffic Classification using Random Forests and Gradient Boosted Trees
Andersson, Ricky, January 2017
Traffic classification is important for Internet providers and other organizations to solve some critical network management problems. The most common methods for traffic classification are Deep Packet Inspection (DPI) and port-based classification. These methods are becoming obsolete as more and more traffic is encrypted and applications increasingly use dynamic ports or the ports of other popular applications. An alternative method for traffic classification uses Machine Learning (ML). This ML method uses statistical features of network traffic flows, which solves the fundamental problems of DPI and port-based classification for encrypted flows. The data used in this study is divided into video and non-video traffic flows, and the goal of the study is to create a model that can classify video flows accurately in real time. Previous studies found tree-based algorithms to work well for classifying network traffic. In this study, random forest and gradient boosted trees are examined and compared, as they are two of the best-performing tree-based classification models. Random forest was found to work best, as its classification speed was significantly faster than that of gradient boosted trees. Over 93% of flows were correctly classified while keeping the random forest model small enough to maintain fast classification speeds. / HITS, 4707
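To make "statistical features of network traffic flows" concrete, the sketch below summarises one flow from packet timestamps and sizes alone, which is why such features remain available for encrypted traffic. The specific features are common examples chosen for illustration, not the feature set used in the thesis, and `flow_features` is a hypothetical name.

```python
import statistics

def flow_features(packets):
    """Summarise one flow given (timestamp_seconds, payload_bytes) tuples in arrival order.

    These statistics never inspect packet contents or rely on port numbers.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    inter_arrival = [b - a for a, b in zip(times, times[1:])]
    duration = times[-1] - times[0]
    return {
        "packet_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_size": statistics.fmean(sizes),
        "std_size": statistics.pstdev(sizes),
        "mean_inter_arrival": statistics.fmean(inter_arrival) if inter_arrival else 0.0,
        "bytes_per_second": sum(sizes) / duration if duration > 0 else 0.0,
    }

# Hypothetical first packets of a flow; a real-time classifier would feed a
# feature vector like this to the trained random forest.
flow = [(0.0, 1200), (0.5, 1400), (1.0, 1000)]
print(flow_features(flow))
```

Computing features over only the first few packets of a flow is one way to keep classification fast enough for real-time use, at some cost in accuracy.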
245 |
Probabilistic Models to Detect Important Sites in Proteins
Dang, Truong Khanh Linh, 24 September 2020
No description available.
246 |
Towards optimal measurement and theoretical grounding of L2 English elicited imitation: Examining scales, (mis)fits, and prompt features from item response theory and random forest approaches
Ji-young Shin (11560495), 14 October 2021
The present dissertation investigated the impact of scales / scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales / scoring methods are an important feature for the validity and reliability of L2 EI tests, but comparatively little is known about them (Yan et al., 2016). Prompt linguistic features are also known to influence EI test quality, particularly item difficulty, but item discrimination and corpus-based, fine-grained measures have rarely been incorporated into examining the contribution of prompt linguistic features. The current study addressed these research needs using item response theory (IRT) and random forest modeling.
Data consisted of 9,348 oral responses to forty-eight items, including EI prompts, item scores, and rater comments, which were collected from 779 examinees of an L2 English EI test at Purdue University. First, the study explored the current and alternative EI scales / scoring methods that measure grammatical / semantic accuracy, focusing on optimal IRT-based measurement qualities (RQ1 through RQ4 in Phase Ⅰ). Next, the project identified important prompt linguistic features that predict EI item difficulty and discrimination across different scales / scoring methods and proficiency levels, using multi-level modeling and random forest regression (RQ5 and RQ6 in Phase Ⅱ).
The main findings included (but were not limited to) the following: collapsing the exact repetition and paraphrase categories led to more optimal measurement (i.e., adequacy of item parameter values, category functioning, and model / item / person fit) (RQ1); there were fewer misfitting persons with lower proficiency and a higher frequency of unexpected responses in the extreme categories (RQ2); the inconsistency in qualitatively distinguishing semantic errors and the wide range of grammatical accuracy in the minor error category contributed to misfit (RQ3); a quantity-based, 4-category ordinal scale outperformed quality-based or binary scales (RQ4); sentence length significantly explained only item difficulty, with small variance explained (RQ5); and corpus-based lexical measures and phrase-level syntactic complexity were important in predicting item difficulty, particularly at the higher ability level. The findings have implications for EI scale / item development in human and automatic scoring settings and for L2 English proficiency development.
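How a 4-category ordinal scale, item difficulty, and item discrimination fit together in IRT can be sketched with a polytomous model. The abstract does not name the model used, so this assumes the generalized partial credit model (GPCM), one common choice for ordinal scores: each category boundary has a step difficulty, and the discrimination `a` controls how sharply category probabilities change with ability `theta`. The function name and parameter values are hypothetical.

```python
import math

def gpcm_probs(theta, thresholds, a=1.0):
    """Category probabilities P(X = k | theta), k = 0..len(thresholds),
    under the generalized partial credit model with discrimination a.
    With a = 1 this reduces to the (Rasch) partial credit model."""
    logits = [0.0]  # category 0 has an empty sum
    running = 0.0
    for delta in thresholds:  # delta: step difficulty between adjacent categories
        running += a * (theta - delta)
        logits.append(running)
    weights = [math.exp(z) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical step difficulties for a 4-category (0-3) EI item.
steps = [-1.0, 0.0, 1.0]
print([round(p, 3) for p in gpcm_probs(0.0, steps)])  # [0.134, 0.366, 0.366, 0.134]
```

Fit statistics such as the item and person misfit discussed above compare observed scores with the expectations implied by probabilities like these.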
247 |
Identifikace a verifikace osob pomocí záznamu EKG / ECG based human authentication and identification
Waloszek, Vojtěch, January 2021
In recent years, the use of ECG for verification and identification in biometrics has been investigated, and this thesis investigates the topic further. Recordings from the ECG ID database from PhysioNet and our own ECG recordings made with an Apple Watch 4 are used to train and test the method. Many existing methods have shown that ECG can be used for biometrics, but they used clinical ECG devices. This thesis investigates recordings from wearable devices, specifically a smartwatch. Sixteen features are extracted from the ECG recordings, and a random forest classifier is used for verification and identification. The features include time intervals between fiducial points, voltage differences between fiducial points, and the variability of PR intervals within a recording. The average performance of the verification model on 14 people is a true rejection rate (TRR) of 96.19% and a true acceptance rate (TAR) of 84.25%.
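The two reported verification metrics can be computed as follows. This is a minimal sketch assuming the classifier outputs a score for "this recording belongs to the claimed person" and that an attempt is accepted when the score reaches a threshold; the function name, threshold, and scores are hypothetical.

```python
def verification_rates(genuine_scores, impostor_scores, threshold=0.5):
    """Return (TAR, TRR) for a verification system.

    genuine_scores: scores for attempts where the claimed identity is correct.
    impostor_scores: scores for attempts where it is not.
    An attempt is accepted when its score reaches the threshold.
    """
    tar = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
    trr = sum(s < threshold for s in impostor_scores) / len(impostor_scores)
    return tar, trr

# Hypothetical random-forest probabilities for "this ECG belongs to the claimed person".
genuine = [0.9, 0.8, 0.4, 0.95]
impostor = [0.1, 0.3, 0.6, 0.2, 0.05]
print(verification_rates(genuine, impostor))  # (0.75, 0.8)
```

Raising the threshold rejects more impostors (higher TRR) at the cost of rejecting more genuine users (lower TAR), so the two rates should be reported together.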
248 |
Comparison of Undersampling Methods for Prediction of Casting Defects Based on Process Parameters
Lööv, Simon, January 2021
Most companies have to make predictions for both big and small decisions on a daily basis, and the importance of a highly accurate technique for decision-making is not new. However, even though the importance of prediction is widely recognized, current estimation techniques are still often highly inaccurate. The consequences of an inaccurate prediction can be large and depend on the type of misclassification, not just in industry but in many other areas. Machine learning has improved significantly in recent years and is now considered a reliable method for prediction. The main goal of this research is to predict casting defects from process parameters with a machine learning algorithm. To achieve this goal, several sub-objectives were identified. A common problem in machine learning is an unbalanced dataset; when training a network, it is essential that the dataset is balanced. In this research, the dataset was successfully balanced using undersampling. The research compares and evaluates several undersampling methods to see which is best suited for this project. Three machine learning models, “random forest”, “artificial neural network”, and “k-nearest neighbor”, are also compared to see which performs best. The conclusion reached was that the best undersampling method and machine learning model varied for many reasons, so to find the best model and method for a specific job, all the models and methods need to be tested. However, the undersampling method that most often provided the best performance in this research was NearMiss version 2. The artificial neural network was the most successful machine learning model, performing best in two out of three evaluations and comparisons.
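NearMiss version 2 can be sketched directly from its usual definition: keep the majority-class samples whose average distance to their k farthest minority-class samples is smallest. This is an illustrative reimplementation, not the code used in the thesis; the function name, `k`, and the 2-D points standing in for process parameters are hypothetical.

```python
import math

def nearmiss2(majority, minority, n_keep, k=3):
    """Undersample the majority class with the NearMiss-2 criterion.

    Each majority point is scored by its mean distance to its k *farthest*
    minority points; the n_keep points with the smallest scores are kept.
    """
    k = min(k, len(minority))
    scored = []
    for idx, x in enumerate(majority):
        farthest = sorted((math.dist(x, m) for m in minority), reverse=True)[:k]
        scored.append((sum(farthest) / k, idx))
    scored.sort()
    return [majority[idx] for _, idx in scored[:n_keep]]

# Hypothetical 2-D process parameters: majority = good castings, minority = defective.
good = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0), (3.0, 3.0)]
defective = [(1.0, 0.0), (2.0, 1.0)]
kept = nearmiss2(good, defective, n_keep=len(defective), k=2)
print(kept)  # [(1.0, 1.0), (0.0, 0.0)]
```

Because it keeps majority samples close to the whole minority class, NearMiss-2 tends to preserve the region where the classes overlap, which is where a classifier most needs training examples. The features should be scaled comparably before computing distances.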
249 |
Machine Learning-based Quality Prediction in the Froth Flotation Process of Mining: Master’s Degree Thesis in Microdata Analysis
Kwame Osei, Eric, January 2019
In the iron ore mining industry, stakeholders rely on a conventional laboratory test technique to achieve the desired quality in the froth flotation processing plant, and this technique usually takes more than two hours to ascertain the two variables of interest. Such a substantial dead time makes it difficult to keep the inherently stochastic plant system in a steady state. The present study therefore evaluates the feasibility of using machine learning algorithms to predict the percentage of silica concentrate (SiO2) in the froth flotation processing plant in real time. The predictive model was constructed using an iron ore mining froth flotation dataset obtained from Kaggle. Several feature selection methods, including Random Forest and backward elimination, were applied to the dataset to extract significant features. The selected features were then used in Multiple Linear Regression, Random Forest, and Artificial Neural Network models, and the prediction accuracy of all the models was evaluated and compared. The results show that the Artificial Neural Network generalizes better, with predictions off by a mean squared error (MSE) of 0.38% on average, which is significant considering that SiO2 ranges from 0.77% to 5.53% (MSE 1.1%). These results were obtained with real-time processing of 12 s in the worst-case scenario on Intel i7 hardware. The experimental results also suggest that the reagent variables have the most significant influence on SiO2 prediction and that the least important variable is Flotation Column.02.air.Flow. The results also indicate promising prospects for both the Multiple Linear Regression and Random Forest models for SiO2 prediction in iron ore froth flotation systems in general.
Meanwhile, this study provides management, metallurgists and operators with a better option for real-time SiO2 prediction, depending on accuracy requirements, compared with the laboratory test analysis, whose long dead time causes incessant loss of iron ore discharged to tailings.
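For readers comparing the reported error figures, the sketch below shows how a mean squared error (MSE) is computed and why comparing against a naive baseline helps interpret it. The values are hypothetical, not taken from the thesis. Note that MSE is expressed in the square of the target's unit, so its square root (RMSE) is the figure directly comparable with the SiO2 range.

```python
import math

def mse(actual, predicted):
    """Mean squared error; its unit is the square of the target's unit."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical % SiO2 lab values and model predictions.
actual = [1.2, 2.5, 4.0, 0.9]
predicted = [1.0, 2.9, 3.7, 1.1]
# Naive baseline: always predict the mean of the observed values.
baseline = [sum(actual) / len(actual)] * len(actual)

model_mse = mse(actual, predicted)
baseline_mse = mse(actual, baseline)
print(round(model_mse, 4), round(baseline_mse, 4))  # 0.0825 1.5025
print(round(math.sqrt(model_mse), 4))  # RMSE, back in % SiO2 units
```

A model is only useful in practice if its error is clearly below that of the constant-mean baseline on held-out data.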
250 |
Automatic Prediction of Human Age based on Heart Rate Variability Analysis using Feature-Based Methods
Al-Mter, Yusur, January 2020
Heart rate variability (HRV) is the variation in time between adjacent heartbeats. This variation is regulated by the autonomic nervous system (ANS) and its two branches, the sympathetic and parasympathetic nervous systems. HRV is considered an essential clinical tool for estimating the imbalance between the two branches, and hence an indicator of age and cardiac-related events. This thesis focuses on ECG recordings during nocturnal rest to estimate how useful HRV is for predicting the age decade of healthy individuals. Time-domain, frequency-domain, and non-linear methods are explored to extract the HRV features. Three feature-based methods (support vector machine (SVM), random forest, and extreme gradient boosting (XGBoost)) were employed, and the overall test accuracy in capturing the actual class was relatively low (below 30%). The SVM classifier had the lowest performance, while random forest and XGBoost performed slightly better. Although the difference is negligible, random forest had the highest test accuracy, approximately 29%, using a subset of ten optimal HRV features. Furthermore, to validate the findings, the original dataset was shuffled and used as a test set, and the performance was compared with the results of related research.
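A few standard time-domain HRV features can be computed directly from the RR intervals between successive heartbeats. This sketch covers only time-domain measures (the thesis also used frequency-domain and non-linear ones), and conventions vary between tools: here SDNN uses the sample standard deviation. The function name and RR values are hypothetical; vectors like this would then be fed to the SVM, random forest, or XGBoost classifier.

```python
import math
import statistics

def hrv_time_domain(rr_ms):
    """Basic time-domain HRV features from successive RR (NN) intervals in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    mean_rr = statistics.fmean(rr_ms)
    return {
        "mean_rr_ms": mean_rr,
        "mean_hr_bpm": 60000.0 / mean_rr,  # 60,000 ms per minute
        "sdnn_ms": statistics.stdev(rr_ms),  # sample standard deviation of intervals
        "rmssd_ms": math.sqrt(statistics.fmean([d * d for d in diffs])),
    }

# Hypothetical RR intervals from a short nocturnal ECG segment.
features = hrv_time_domain([800, 820, 780, 800])
print({name: round(value, 2) for name, value in features.items()})
```

RMSSD reflects beat-to-beat (mainly parasympathetic) variation, while SDNN captures overall variability, which is why both are commonly included.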