  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Klasifikace malých nekódujících RNA / Classification of Small Noncoding RNAs

Žigárdi, Tomáš January 2015
This master's thesis describes a tool designed and implemented for classifying plant microRNA without a genome. The tool uses properties of the mature and star sequences in microRNA duplexes. The method first clusters the RNA sequences with CD-HIT, mainly to reduce their number. Representatives selected from each cluster are then classified with a support vector machine. Classification performance exceeds 96%, as estimated by cross-validation on the training data.
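The cluster-then-classify idea can be illustrated with a minimal sketch of greedy incremental clustering, the general strategy CD-HIT follows: sequences are processed longest first, and each one joins the first representative it is similar enough to. This is not CD-HIT itself. The similarity measure is Python's `difflib.SequenceMatcher` ratio, used here only as a stand-in, and the function name, the 0.8 threshold, and the toy sequences are all assumptions. Only the representatives would then be passed to a classifier such as an SVM.

```python
from difflib import SequenceMatcher


def greedy_cluster(sequences, threshold=0.8):
    """Group sequences around representatives, longest sequences first."""
    clusters = {}  # representative -> list of member sequences
    for seq in sorted(sequences, key=len, reverse=True):
        for rep in clusters:
            # Stand-in similarity; CD-HIT uses its own identity computation.
            if SequenceMatcher(None, rep, seq).ratio() >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


# Hypothetical toy sequences, not data from the thesis.
clusters = greedy_cluster(["AUGCAUGC", "AUGCAUGG", "CCCCGGGG"])
representatives = list(clusters)
```

Reducing the input to one representative per cluster is what keeps the later classification step small.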
142

Automatic Flight Maneuver Identification Using Machine Learning Methods

Bodin, Camilla January 2020
This thesis proposes a general approach to solve the offline flight-maneuver identification problem using machine learning methods. The purpose of the study was to provide means for the aircraft professionals at the flight test and verification department of Saab Aeronautics to automate the procedure of analyzing flight test data. The suggested approach succeeded in generating binary classifiers and multiclass classifiers that identified six flight maneuvers of different complexity from real flight test data. The binary classifiers solved the problem of identifying one maneuver from flight test data at a time, while the multiclass classifiers solved the problem of identifying several maneuvers from flight test data simultaneously. To achieve these results, the difficulties that this time series classification problem entailed were simplified by using different strategies. One strategy was to develop a maneuver extraction algorithm that used handcrafted rules. Another strategy was to represent the time series data by statistical measures. There was also an issue of an imbalanced dataset, where one class far outweighed others in number of samples. This was solved by using a modified oversampling method on the dataset that was used for training. Logistic Regression, Support Vector Machines with both linear and nonlinear kernels, and Artificial Neural Networks were explored, where the hyperparameters for each machine learning algorithm were chosen during model estimation by 4-fold cross-validation and solving an optimization problem based on important performance metrics. A feature selection algorithm was also used during model estimation to evaluate how the performance changes depending on how many features were used. The machine learning models were then evaluated on test data consisting of 24 flight tests. The results given by the test data set showed that the simplifications were reasonable, but the maneuver extraction algorithm could sometimes fail.
Some maneuvers were easier to identify than others and the linear machine learning models resulted in a poor fit to the more complex classes. In conclusion, both binary classifiers and multiclass classifiers could be used to solve the flight maneuver identification problem, and solving a hyperparameter optimization problem boosted the performance of the finalized models. Nonlinear classifiers performed the best on average across all explored maneuvers.
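Two of the strategies named in the abstract, representing a time series by statistical measures and oversampling the minority classes, can be sketched as follows. This is an illustration under assumptions: the choice of summary statistics, the plain random oversampling (the thesis used a modified method that the abstract does not specify), and all names and values are hypothetical.

```python
import random
import statistics


def summarize(window):
    """Represent one time-series window by a few summary statistics."""
    return [statistics.fmean(window), statistics.pstdev(window),
            min(window), max(window)]


def oversample(samples, labels, seed=0):
    """Plain random oversampling: duplicate minority samples until all
    classes are as large as the largest class (training data only)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical windows: three of class "level", one of class "roll".
windows = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.0, 1.0, 0.0], [5.0, 9.0, 7.0]]
features = [summarize(w) for w in windows]
balanced_x, balanced_y = oversample(features, ["level", "level", "level", "roll"])
```

Oversampling is applied to the training split only, so that duplicated samples never leak into the data used for evaluation.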
143

Machine Learning Applications for Downscaling Groundwater Storage Changes Integrating Satellite Gravimetry and Other Observations

Agarwal, Vibhor January 2021
No description available.
144

Online Non-linear Prediction of Financial Time Series Patterns

da Costa, Joel 11 September 2020
We consider a mechanistic non-linear machine learning approach to learning signals in financial time series data. A modularised and decoupled algorithm framework is established and is proven on daily sampled closing time-series data for JSE equity markets. The input patterns are based on input data vectors of data windows preprocessed into a sequence of daily, weekly and monthly or quarterly sampled feature measurement changes (log feature fluctuations). The data processing is split into a batch processed step where features are learnt using a Stacked AutoEncoder (SAE) via unsupervised learning, and then both batch and online supervised learning are carried out on Feedforward Neural Networks (FNNs) using these features. The FNN output is a point prediction of measured time-series feature fluctuations (log differenced data) in the future (ex-post). Weight initializations for these networks are implemented with restricted Boltzmann machine pretraining and variance-based initializations. The validity of the FNN backtest results is shown under a rigorous assessment of backtest overfitting using both Combinatorially Symmetrical Cross Validation and Probabilistic and Deflated Sharpe Ratios. Results are further used to develop a view on the phenomenology of financial markets and the value of complex historical data under unstable dynamics.
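The "log feature fluctuations" at several sampling horizons can be illustrated with a short sketch. The prices, the function name, and the assumption of five trading days per week are hypothetical; the thesis may preprocess its windows differently.

```python
import math


def log_changes(series, step):
    """Log differences log(x[t] / x[t - step]) of a strictly positive series."""
    return [math.log(series[t] / series[t - step])
            for t in range(step, len(series))]


# Hypothetical daily closing prices, not JSE data.
closes = [100.0, 101.0, 99.5, 102.0, 103.5, 104.0, 102.5]
daily = log_changes(closes, 1)
weekly = log_changes(closes, 5)  # assumes five trading days per week
```

Log differences are additive over time, so the daily values between two dates sum to the log change over the whole span.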
145

Machine Learning for Exploring State Space Structure in Genetic Regulatory Networks

Thomas, Rodney H. 01 January 2018
Genetic regulatory networks (GRNs) offer a useful model for clinical biology. Specifically, such networks capture interactions among genes, proteins, and other metabolic factors. Unfortunately, it is difficult to understand and predict the behavior of networks that are of realistic size and complexity. In this dissertation, behavior refers to the trajectory of a state, through a series of state transitions over time, to an attractor in the network. This project assumes asynchronous Boolean networks, implying that a state may transition to more than one attractor. The goal of this project is to efficiently identify a network's set of attractors and to predict the likelihood with which an arbitrary state leads to each of the network’s attractors. These probabilities will be represented using a fuzzy membership vector. Predicting fuzzy membership vectors using machine learning techniques may address the intractability posed by networks of realistic size and complexity. Modeling and simulation can be used to provide the necessary training sets for machine learning methods to predict fuzzy membership vectors. The experiments comprise several GRNs, each represented by a set of output classes. These classes consist of thresholds τ and ¬τ, where τ = [τ_low, τ_high]; state s belongs to class τ if the probability of its transitioning to attractor A lies in the range [τ_low, τ_high]; otherwise it belongs to class ¬τ. Finally, each machine learning classifier was trained with the training sets that were previously collected. The objective is to explore methods to discover patterns for meaningful classification of states in realistically complex regulatory networks. The research design took a GRN and a machine learning method as input and produced output class < Ατ > and its negation ¬ < Ατ >.
For each GRN, attractors were identified, data was collected by sampling each state to create fuzzy membership vectors, and machine learning methods were trained to predict whether a state is in a healthy attractor or not. For T-LGL, SVMs had the highest accuracy in predictions (between 93.6% and 96.9%) and precision (between 94.59% and 97.87%). However, naive Bayesian classifiers had the highest recall (between 94.71% and 97.78%). All experiments were highly significant (p-value < 0.0001). This research helps clinical biologists submit genetic states and obtain an initial result on their outcomes. For future work, this implementation could use other machine learning classifiers such as XGBoost or deep learning methods. Another suggestion is to develop methods that improve the performance of state transitions so that larger training sets can be sampled.
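How sampling can produce a fuzzy membership vector is easiest to see on a toy network. The sketch below is an assumption-laden illustration, not the dissertation's implementation: the two-gene rules are invented, only fixed-point attractors are handled (real networks can also have cyclic attractors), and the threshold range is hypothetical.

```python
import random
from collections import Counter

# Hypothetical two-gene network: each rule maps the full state to the
# next value of one gene.
RULES = (
    lambda s: s[0] or s[1],   # gene 0 switches on if either gene is on
    lambda s: s[0] and s[1],  # gene 1 stays on only if both genes are on
)


def is_fixed_point(state):
    return all(rule(state) == state[i] for i, rule in enumerate(RULES))


def run_to_attractor(state, rng, max_steps=1000):
    """Asynchronous update: one randomly chosen gene changes per step."""
    for _ in range(max_steps):
        if is_fixed_point(state):
            return state
        i = rng.randrange(len(RULES))
        state = state[:i] + (RULES[i](state),) + state[i + 1:]
    raise RuntimeError("no fixed point reached; cyclic attractors not handled")


def membership_vector(state, n_samples=2000, seed=0):
    """Estimate the probability of reaching each attractor from `state`."""
    rng = random.Random(seed)
    counts = Counter(run_to_attractor(state, rng) for _ in range(n_samples))
    return {attractor: c / n_samples for attractor, c in counts.items()}


vector = membership_vector((0, 1))
# Class tau for attractor (1, 1) with a hypothetical range [0.4, 0.6].
in_class_tau = 0.4 <= vector.get((1, 1), 0.0) <= 0.6
```

From state (0, 1) the first update decides the outcome: updating gene 0 leads to (1, 1) and updating gene 1 leads to (0, 0), so the estimated memberships should both be close to one half.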
146

Systemic Identification of Radiomic Features Resilient to Batch Effects and Acquisition Variations for Diagnosis of Active Crohn's Disease on CT Enterography

Pattiam Giriprakash, Pavithran 23 August 2021
No description available.
147

Algorithmic Methods for Multi-Omics Biomarker Discovery

Li, Yichao January 2018
No description available.
148

Accuracy and Reproducibility of Laboratory Diffuse Reflectance Measurements with Portable VNIR and MIR Spectrometers for Predictive Soil Organic Carbon Modeling

Semella, Sebastian, Hutengs, Christopher, Seidel, Michael, Ulrich, Mathias, Schneider, Birgit, Ortner, Malte, Thiele-Bruhn, Sören, Ludwig, Bernard, Vohland, Michael 09 June 2023
Soil spectroscopy in the visible-to-near infrared (VNIR) and mid-infrared (MIR) is a cost-effective method to determine the soil organic carbon content (SOC) based on predictive spectral models calibrated to analytically determined SOC reference data. The degree to which uncertainty in reference data and spectral measurements contributes to the estimated accuracy of VNIR and MIR predictions, however, is rarely addressed and remains unclear, in particular for current handheld MIR spectrometers. We thus evaluated the reproducibility of both the spectral reflectance measurements with portable VNIR and MIR spectrometers and the analytical dry combustion SOC reference method, with the aim of assessing how varying spectral inputs and reference values impact the calibration and validation of predictive VNIR and MIR models. Soil reflectance spectra and SOC were measured in triplicate, the latter by different laboratories, for a set of 75 finely ground soil samples covering a wide range of parent materials and SOC contents. Predictive partial least-squares regression (PLSR) models were evaluated in a repeated, nested cross-validation approach with systematically varied spectral inputs and reference data, respectively. We found that SOC predictions from both VNIR and MIR spectra were equally highly reproducible on average and similar to the dry combustion method, but MIR spectra were more robust to calibration sample variation. The contributions of spectral variation (ΔRMSE < 0.4 g·kg−1) and reference SOC uncertainty (ΔRMSE < 0.3 g·kg−1) to spectral modeling errors were small compared to the difference between the VNIR and MIR spectral ranges (ΔRMSE ~1.4 g·kg−1 in favor of MIR). For reference SOC, uncertainty was limited to the case of biased reference data appearing in either the calibration or validation.
Given better predictive accuracy, comparable spectral reproducibility and greater robustness against calibration sample selection, the portable MIR spectrometer was considered overall superior to the VNIR instrument for SOC analysis. Our results further indicate that random errors in SOC reference values are effectively compensated for during model calibration, while biased SOC calibration data propagates errors into model predictions. Reference data uncertainty is thus more likely to negatively impact the estimated validation accuracy in soil spectroscopy studies where archived data, e.g., from soil spectral libraries, are used for model building, but it should be negligible otherwise.
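The structure of a nested cross-validation can be sketched in a few lines: an inner loop picks a tuning parameter (for PLSR, typically the number of components) and an outer loop estimates the error of the tuned model on samples the inner loop never saw. Everything below is an assumption for illustration: the fold counts, the `evaluate` callback, and the toy error function are hypothetical, and no PLSR model is actually fitted.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(n_samples, params, evaluate, outer_k=5, inner_k=4, seed=0):
    """Return (chosen parameter, outer test error) for each outer fold.

    `evaluate(param, fit_idx, test_idx)` is a caller-supplied function
    that fits on `fit_idx` and returns an error (e.g. RMSE) on `test_idx`.
    """
    rng = random.Random(seed)
    outer = k_folds(range(n_samples), outer_k, rng)
    results = []
    for test in outer:
        train = [i for fold in outer if fold is not test for i in fold]
        inner = k_folds(train, inner_k, rng)

        def inner_error(param):
            errors = []
            for val in inner:
                fit = [i for fold in inner if fold is not val for i in fold]
                errors.append(evaluate(param, fit, val))
            return sum(errors) / len(errors)

        best = min(params, key=inner_error)
        results.append((best, evaluate(best, train, test)))
    return results


def toy_evaluate(param, fit_idx, test_idx):
    # Hypothetical error surface with its minimum at param == 3.
    assert not set(fit_idx) & set(test_idx)
    return abs(param - 3)


results = nested_cv(75, [1, 2, 3, 4, 5], toy_evaluate)
```

A repeated variant would call `nested_cv` with several different seeds and pool the outer errors.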
149

Chemometric Applications To A Complex Classification Problem: Forensic Fire Debris Analysis

Waddell, Erin 01 January 2013
Fire debris analysis currently relies on visual pattern recognition of the total ion chromatograms, extracted ion profiles, and target compound chromatograms to identify the presence of an ignitable liquid. This procedure is described in the ASTM International E1618-10 standard method. For large data sets, this methodology can be time consuming, and it is subjective: its accuracy depends on the skill and experience of the analyst. This research aimed to develop an automated classification method for large data sets and investigated the use of the total ion spectrum (TIS). The TIS is calculated by taking an average mass spectrum across the entire chromatographic range and has been shown to contain sufficient information content for the identification of ignitable liquids. The TIS of ignitable liquids and substrates were compiled into model data sets. Substrates are defined as common building materials and household furnishings that are typically found at the scene of a fire and are, therefore, present in fire debris samples. Fire debris samples obtained from laboratory-scale and large-scale burns were also used. An automated classification method was developed using computational software that was written in-house. Within this method, a multi-step classification scheme was used to detect ignitable liquid residues in fire debris samples and assign these to the classes defined in ASTM E1618-10. Classifications were made using linear discriminant analysis, quadratic discriminant analysis (QDA), and soft independent modeling of class analogy (SIMCA). The model data sets were tested by cross-validation and used to classify fire debris samples. Correct classification rates were calculated for each data set. Classifier performance metrics were also calculated for the first step of the classification scheme, which included false positive rates, true positive rates, and the precision of the method.
The first step, which determines a sample to be positive or negative for ignitable liquid residue, is arguably the most important in the forensic application. Overall, the highest correct classification rates were achieved using QDA for the first step of the scheme and SIMCA for the remaining steps. In the first step of the classification scheme, correct classification rates of 95.3% and 89.2% were obtained using QDA to classify the cross-validation test set and fire debris samples, respectively. For this step, the cross-validation test set resulted in a true positive rate of 96.2%, a false positive rate of 9.3%, and a precision of 98.2%. The fire debris data set had a true positive rate of 82.9%, a false positive rate of 1.3%, and a precision of 99.0%. Correct classification rates of 100% were achieved for both data sets in the majority of the remaining steps, which used SIMCA for classification. The lowest correct classification rate, 69.2%, was obtained for the fire debris samples in one of the final steps in the classification scheme. In this research, the first statistically valid error rates for fire debris analysis have been developed through cross-validation of large data sets. The fire debris analyst can use the automated method as a tool for detecting and classifying ignitable liquid residues in fire debris samples. The error rates reduce the subjectivity associated with the current methods and provide a level of confidence in sample classification that does not currently exist in forensic fire debris analysis.
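The two calculations the abstract relies on, the total ion spectrum and the first-step performance metrics, follow directly from their definitions. The sketch below uses invented scan intensities and invented confusion counts; it does not reproduce the thesis's data or its in-house software, and the function names are assumptions.

```python
def total_ion_spectrum(scans):
    """Average the intensity of each m/z channel over all scans.

    `scans` is a list of mass spectra, one per retention-time point,
    all sharing the same m/z channels.
    """
    n_scans = len(scans)
    return [sum(channel) / n_scans for channel in zip(*scans)]


def first_step_metrics(tp, fp, tn, fn):
    """Rates for a positive/negative ignitable-liquid decision."""
    return {
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


# Hypothetical data: two scans over three m/z channels.
tis = total_ion_spectrum([[10.0, 0.0, 5.0], [30.0, 2.0, 5.0]])
# Hypothetical confusion counts, not the counts behind the reported rates.
metrics = first_step_metrics(tp=90, fp=5, tn=95, fn=10)
```

Because the TIS averages over the whole chromatographic range, it no longer depends on retention time.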
150

Flying in the Academic Environment: An Exploratory Panel Data Analysis of CO2 Emission at KTH

Artman, Arvid January 2024
In this study, a panel data set of flights made by employees at the Royal Institute of Technology (KTH) in Sweden is analyzed using generalized linear modeling approaches, with the aim of creating a model with high predictive capability for the quarterly CO2 emission and the number of flights in a year not included in the model estimation. A Zero-inflated Gamma regression model is fitted to the CO2 emission variable and a Zero-inflated Negative Binomial regression model is used for the number of flights. To build the models, cross-validation is performed with the observations from 2018 as the training set and the observations from the next year, 2019, as the test set. One at a time, the variable that best improves the prediction of the test set data (either as included in the count model or the zero-inflation model) is selected until an additional variable turns out to be insignificant at the 5% significance level in the estimated model. In addition to the variables in the data, three lags of the dependent variables (CO2 emission and flights) were included, as well as transformed versions of the continuous variables, and a random intercept each for the categorical variables indicating quarter and department at KTH, respectively. Neither model selected through the cross-validation process turned out to be particularly good at predicting the values for the upcoming year, but a number of variables were shown to have a statistically significant association with the respective dependent variable.
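The variable selection procedure described above is a greedy forward selection with a significance-based stopping rule. A minimal sketch follows, assuming two caller-supplied functions that are hypothetical here: `test_error`, which fits a model on the training year and returns its prediction error on the test year, and `is_significant`, which reports whether the newly added variable is significant at the chosen level. The zero-inflated models themselves are not fitted, and the toy gains are invented.

```python
def forward_select(candidates, test_error, is_significant):
    """Add, one at a time, the variable that most reduces test error;
    stop when the best remaining variable is not significant."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best = min(remaining, key=lambda v: test_error(selected + [v]))
        if not is_significant(selected + [best], best):
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical stand-ins for model fitting: each variable has a fixed
# contribution, and a variable counts as significant if it helps at all.
gains = {"lag1": 3.0, "department": 2.0, "quarter": 0.0}


def toy_error(variables):
    return 10.0 - sum(gains[v] for v in variables)


def toy_significant(variables, new_variable):
    return gains[new_variable] > 0.0


chosen = forward_select(gains, toy_error, toy_significant)
```

In the thesis each candidate can enter either the count part or the zero-inflation part of the model; the sketch treats those as separate entries in `candidates`.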
