241 |
Identifying Mitochondrial Genomes in Draft Whole-Genome Shotgun Assemblies of Six Gymnosperm Species / Identifiering av mitokondriers arvsmassa från preliminära versioner av arvsmassan för sex gymnospermer
Eldfjell, Yrin January 2018
Sequencing efforts for gymnosperm genomes typically focus on nuclear and chloroplast DNA, with only three complete mitochondrial genomes published as of 2017. The availability of additional mitochondrial genomes would aid biological and evolutionary understanding of gymnosperms. Identifying mtDNA from existing whole genome sequencing (WGS) data (i.e. contigs) removes the need for additional experimental work, but previous classification methods show limitations in sensitivity or accuracy, particularly in difficult cases. In this thesis I present a classification pipeline based on (1) k-mer probability scoring and (2) SVM classification applied to the available contigs. Using this pipeline the mitochondrial genomes of six gymnosperm species were obtained: Abies sibirica, Gnetum gnemon, Juniperus communis, Picea abies, Pinus sylvestris and Taxus baccata. Cross-validation experiments showed a satisfactory and, for some species, excellent degree of accuracy. / Vid sekvensering av gymnospermers arvsmassa har fokus oftast lagts på kärn- och kloroplast-DNA. Bara tre fullständiga mitokondriegenom har publicerats hittills (2017). Fler mitokondriegenom skulle kunna leda till nya kunskaper om gymnospermers biologi och evolution. Då mitokondriernas arvsmassa identifieras från tillgängliga sekvenser för hela organismen (så kallade “contiger”) behövs inget ytterligare laboratoriearbete, men detta förfarande har visat sig leda till bristfällig känslighet och korrekthet, särskilt i svåra fall. I denna avhandling presenterar jag en metod baserad på (1) kmer-sannolikheter och (2) SVM-klassificering applicerad på de tillgängliga contigerna. Med denna metod togs arvsmassan för mitokondrien hos sex gymnospermer fram: Abies sibirica, Gnetum gnemon, Juniperus communis, Picea abies, Pinus sylvestris och Taxus baccata. Korsvalideringsexperiment visade en tillfredställande och för vissa arter utmärkt precision.
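The abstract does not say how the k-mer probability score is computed. As a minimal sketch, assume it is a smoothed log-odds ratio between k-mer frequencies learned from known mitochondrial sequence and from known nuclear sequence. The function names, the add-one smoothing, and the toy training strings below are all assumptions made for illustration, not the thesis's method.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count all overlapping k-mers across a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds_score(contig, mito_counts, nuclear_counts, k):
    """Mean per-k-mer log-odds that a contig is mitochondrial.

    Add-one smoothing over the 4**k possible k-mers keeps unseen
    k-mers from producing log(0). Positive scores favour the
    mitochondrial model, negative scores the nuclear one.
    """
    vocab = 4 ** k
    mito_total = sum(mito_counts.values()) + vocab
    nuclear_total = sum(nuclear_counts.values()) + vocab
    n_kmers = len(contig) - k + 1
    if n_kmers <= 0:
        return 0.0  # contig shorter than k: no evidence either way
    score = 0.0
    for i in range(n_kmers):
        kmer = contig[i:i + k]
        p_mito = (mito_counts[kmer] + 1) / mito_total
        p_nuclear = (nuclear_counts[kmer] + 1) / nuclear_total
        score += math.log(p_mito / p_nuclear)
    return score / n_kmers


# Toy training data, purely illustrative (not real gymnosperm sequence).
mito_train = ["ATATATATATATAT", "TATATATATATA"]
nuclear_train = ["GCGCGCGCGCGCGC", "CGCGCGCGCGCG"]
K = 3
mito_model = kmer_counts(mito_train, K)
nuclear_model = kmer_counts(nuclear_train, K)

at_rich = log_odds_score("ATATATAT", mito_model, nuclear_model, K)
gc_rich = log_odds_score("GCGCGCGC", mito_model, nuclear_model, K)
```

In a two-stage pipeline like the one described, a score of this kind could serve as one input feature to the SVM stage, alongside other per-contig features.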
|
242 |
Learning-Based Fusion for Data Deduplication: A Robust and Automated Solution
Dinerstein, Jared 01 December 2010
This thesis presents two deduplication techniques that overcome the following critical and long-standing weaknesses of rule-based deduplication: (1) traditional rule-based deduplication requires significant manual tuning of the individual rules, including the selection of appropriate thresholds; (2) the accuracy of rule-based deduplication degrades when there are missing data values, significantly reducing the efficacy of the expert-defined deduplication rules.
The first technique is a novel rule-level match-score fusion algorithm that employs kernel-machine-based learning to discover the decision threshold for the overall system automatically. The second is a novel clue-level match-score fusion algorithm that addresses both problems (1) and (2). This solution provides robustness against missing or incomplete record data via the selection of a best-fit support vector machine. Empirical evidence shows that the combination of these two solutions eliminates both long-standing problems, providing accurate and robust results in rule-based deduplication.
|
243 |
Predicting transit times for outbound logistics
Cochenour, Brooke R. 08 1900
Indiana University-Purdue University Indianapolis (IUPUI) / On-time delivery of supplies to industry is essential because delays can disrupt
production schedules. The aim of the proposed application is to predict transit times
for outbound logistics thereby allowing suppliers to plan for timely mitigation of
risks during shipment planning. The predictive model consists of a classifier that is
trained for each specific source-destination pair using historical shipment, weather,
and social media data. The model estimates the transit times for future shipments
using a Support Vector Machine (SVM). These estimates were validated using four case
study routes of varying distances in the United States. A predictive model is trained
for each route. The results show that the contribution of each input feature to the
predictive ability of the model varies for each route. The mean absolute error (MAE)
values of the model vary for each route due to the availability of testing and training
historical shipment data as well as the availability of weather and social media data.
In addition, it was found that the inclusion of the historical traffic data provided by
INRIX™ improves the accuracy of the model. Sample INRIX™ data was available
for one of the routes. One of the main limitations of the proposed approach is the
availability of historical shipment data and the quality of social media data. However,
if the data is available, the proposed methodology can be applied to any supplier with
high volume shipments in order to develop a predictive model for outbound transit
time delays over any land route.
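The per-route MAE comparison can be illustrated with a short sketch, reading MAE as the mean absolute error between predicted and actual transit times. The route names and transit times (in days) below are invented for illustration and are not taken from the thesis.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (actual, predicted) transit times in days for two routes.
routes = {
    "route_a": ([2.0, 3.0, 2.5, 4.0], [2.5, 3.0, 2.0, 3.0]),
    "route_b": ([5.0, 6.0, 7.0], [5.0, 6.5, 6.5]),
}

# One MAE per route, mirroring the one-model-per-route setup.
mae_by_route = {
    name: mean_absolute_error(actual, predicted)
    for name, (actual, predicted) in routes.items()
}
```

Because each route has its own model and its own test set, MAE values of this kind are only comparable across routes with the differing amounts of test data in mind.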
|
244 |
Machine Learning Classification of Facial Affect Recognition Deficits after Traumatic Brain Injury for Informing Rehabilitation Needs and Progress
Iffat Naz, Syeda 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / A common impairment after a traumatic brain injury (TBI) is a deficit in emotional recognition, such as inferring others’ intentions. Some researchers have found these impairments in 39% of the TBI population. Much of the information needed to make inferences about emotions and mental states comes from visually presented, nonverbal cues (e.g., facial expressions or gestures). Theory of mind (ToM) deficits after TBI are partially explained by impaired visual attention and the processing of these important cues. This research found that patients with deficits in visual processing differ from healthy controls (HCs). Furthermore, we found that visual processing problems can be identified from eye-tracking data collected with industry-standard eye-tracking hardware and software. We predicted that the eye-tracking data of the overall population is correlated with performance on the TASIT test. Visual processing differs significantly between impaired participants (who answered at least one TASIT question incorrectly) and unimpaired participants (who answered all TASIT questions correctly). We divided the eye-tracking data into 3-second blocks of time-series data to detect the individual blocks most salient to the TASIT score. Our preliminary results suggest that we can predict impairment across the whole population from eye-tracking data, with the F1 score improving from 0.54 to 0.73. For this, we developed optimized support vector machine (SVM) and random forest (RF) classifiers.
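Two steps mentioned in the abstract lend themselves to a small sketch: cutting a recording into 3-second blocks and scoring a classifier with F1. The sampling rate, the decision to drop a trailing partial block, and the toy labels below are assumptions; the abstract does not give those details.

```python
def split_into_blocks(samples, sample_rate_hz, block_seconds=3):
    """Split a sample list into consecutive fixed-length blocks.

    A trailing partial block is dropped (an assumption made here).
    """
    block_len = int(sample_rate_hz * block_seconds)
    if block_len <= 0:
        raise ValueError("block length must be at least one sample")
    n_full = len(samples) // block_len
    return [samples[i * block_len:(i + 1) * block_len] for i in range(n_full)]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 20 hypothetical gaze samples at 2 Hz: three full 3-second blocks.
gaze_x = list(range(20))
blocks = split_into_blocks(gaze_x, sample_rate_hz=2)

# Toy labels: 1 = impaired, 0 = unimpaired.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_score(y_true, y_pred)
```

Scoring each block separately against the impairment label is one way the most salient blocks could be ranked, though the abstract does not say how salience was measured.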
|
245 |
Case Influence and Model Complexity in Regression and Classification
TU, SHANSHAN 17 October 2019
No description available.
|
246 |
Determining Anomalies in Radar Data for Seedbed Tine Harrow Operation
Winbladh, William, Persson, Karl January 2022
The agricultural industry is constantly evolving, with automation as one of its current main focuses. This thesis concerns the automation of a seedbed tine harrow, specifically the control of the tillage depth. The tillage depth is instrumental to farming as it determines the quality of the tilth, how well clods are broken up, and how well the soil aggregates are sorted. Poor control of the tillage depth could result in a bad harvest for the farmer. To control the tillage depth, several pulse radar sensors are installed on the harrow. The sensors measure the distance from the tines of the harrow to the ground. This distance is used in a control loop that controls the hydraulic actuators that lift and push down the frame of the harrow. Because of the rough working conditions of the tine harrow, the pulse radar sensors are in danger of being damaged or disturbed. A sensor not working as intended will lead to poor control of the tillage depth or even an unstable control system. The purpose of this thesis is to develop diagnosis systems that detect a faulty sensor output and generate an alarm. Four different systems are developed: three machine learning approaches and one model-based approach. To be able to test and train models without having to go out on a field with a real harrow, a test rig is available. The test rig emulates a harrow driving on a field and the tests are designed to imitate plausible sensor errors. The models trained on and tuned to the test rig data are validated with data gathered from a real tine harrow. The validation data from the harrow reveal that the main differences between the field data and the test rig data are the vibrations and the sensor heights. The test rig produces negligible vibrations whereas the vibrations on a real harrow are immense. These differences affect the performance of the models, and some tuning has to be done to the models to account for the vibrations.
The performance of the model-based approach is good and no larger adjustments have to be made to it. The machine learning models created from the test rig data do not work in the field, so new models are trained using field data. The new models are accurate and show great potential, although a lot more data would need to be collected for further training, specifically for training the machine learning models on varying heights. In conclusion, the test rig data is similar to the field data, but the vibrations in the system are missing and the heights differ. As a result, the models trained on test rig data do not work as intended on field data. The conventional diagnostics approach works, but the generated alarms are binary, meaning that an alarm only reveals whether the signal is good or bad and does not provide any nuance. The machine learning models do provide nuance: they can detect errors, identify what is causing an error, and warn if an error is about to occur. However, the machine learning models need a lot of training data to make this happen.
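The abstract does not describe the model-based diagnosis in detail. One common pattern, sketched here purely as an assumption, is a residual test: each sensor is compared with a reference (here the median across sensors) and a binary alarm is raised when the residual stays above a threshold for several consecutive samples. The threshold, the streak length, and the readings are invented for illustration.

```python
from statistics import median


def residual_alarms(readings, threshold, min_consecutive=3):
    """Binary per-sensor alarms from a median-based residual test.

    readings: list of time steps, each a list with one distance (m)
    per sensor. Returns, for each time step, the set of sensor
    indices whose residual has exceeded the threshold for at least
    min_consecutive steps in a row.
    """
    if not readings:
        return []
    streak = [0] * len(readings[0])
    alarms = []
    for step in readings:
        reference = median(step)
        active = set()
        for idx, value in enumerate(step):
            if abs(value - reference) > threshold:
                streak[idx] += 1
            else:
                streak[idx] = 0  # residual back in range: reset
            if streak[idx] >= min_consecutive:
                active.add(idx)
        alarms.append(active)
    return alarms


# Hypothetical distances from three sensors; sensor 2 drifts for
# three steps and then recovers.
readings = [
    [0.30, 0.31, 0.29],
    [0.30, 0.32, 0.55],
    [0.31, 0.30, 0.56],
    [0.30, 0.31, 0.57],
    [0.30, 0.31, 0.30],
]
alarms = residual_alarms(readings, threshold=0.10)
```

Requiring several consecutive exceedances is one way to keep vibration-induced spikes from triggering alarms, at the cost of a short detection delay. The output is good-or-bad only, which matches the binary character the abstract attributes to the conventional approach.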
|
247 |
Klassificering av transkriberade telefonsamtal med Support Vector Machines för ökad effektivitet inom vården / Classification of transcribed telephone calls with support vector machines for increased efficiency in healthcare
Höglind, Sanna, Sundström, Emelie January 2019
Patientnämndens förvaltning i Stockholm tar årligen emot tusentals samtal som önskar framföra klagomål på vården i Region Stockholm. Syftet med arbetet är att undersöka hur en NLP-robot för klassificering av inkomna klagomål skulle kunna bidra till en ökad effektivitet av verksamheten. Klassificeringen av klagomålen har utförts med hjälp av en metod baserad på Support Vector Machines. För att optimera modellens korrekthet undersöktes hur längden av ordvektorerna påverkar korrektheten. Modellen gav en slutgiltig korrekthet 53,10 %. Detta resultat analyserades sedan med målsättningen att identifiera potentiella förbättringsmöjligheter hos modellen. För framtida arbeten kan det därför vara intressant att undersöka hur antalet samtal, antalet personer som spelar in samtal och klassfördelningen i datamängden påverkar korrektheten. För att undersöka hur effektiviteten hos Patientnämndens förvaltning i Stockholm skulle påverkas av implementeringen av en NLP-robot användes en SWOT-analys. Denna analys visade på tydliga fördelar med automatisering av klagomålshanteringen, men att en sådan implementation måste ske med försiktighet där det säkerställs att tillgången på kompetens är tillräcklig för att förebygga potentiella hot. / Every year Patientnämnden receives thousands of phone calls from patients wishing to make complaints about the health care in Stockholm. The aim of this work is to investigate how an NLP robot for classifying received phone calls could contribute to increased efficiency of the operation. The complaints were classified using a method based on Support Vector Machines. In order to optimize the accuracy of the model, the impact of the length of the word vectors was investigated. The final result was an accuracy of 53.10%. The result was analyzed with the goal of identifying potential improvements to the model.
For future work it could be interesting to investigate how the number of calls, the number of people recording the calls and the distribution between the classes affect the accuracy. A SWOT analysis was performed in order to investigate how the efficiency of Patientnämnden would be affected by the implementation of an NLP robot. The analysis showed clear benefits of automating complaint management, but also that such an implementation must be done with great caution to ensure that the available competence is sufficient to prevent potential threats.
|
248 |
Churn prediction using time series data / Prediktion av kunduppsägelser med hjälp av tidsseriedata
Granberg, Patrick January 2020
Customer churn is problematic for any business trying to expand its customer base. The acquisition of new customers to replace churned ones is associated with additional costs, whereas taking measures to retain existing customers may prove more cost efficient. As such, it is of interest to estimate the time until the occurrence of a potential churn for every customer in order to take preventive measures. The application of deep learning and machine learning to this type of problem using time series data is relatively new and there is a lot of recent research on this topic. This thesis is based on the assumption that early signs of churn can be detected by the temporal changes in customer behavior. Recurrent neural networks, and more specifically long short-term memory (LSTM) and gated recurrent unit (GRU), are suitable contenders since they are designed to take the sequential time aspect of the data into account. Random forest (RF) and support vector machine (SVM) are machine learning models that are frequently used in related research. The problem is solved through a classification approach, and a comparison is done with implementations using LSTM, GRU, RF, and SVM. According to the results, LSTM and GRU perform similarly while being slightly better than RF and SVM in the task of predicting customers that will churn in the coming six months, and all models could potentially lead to cost savings according to simulations (using non-official but reasonable costs assigned to each prediction outcome). Predicting the time until churn is a more difficult problem and none of the models can give reliable estimates, but all models are significantly better than random predictions. / Kundbortfall är problematiskt för företag som försöker expandera sin kundbas. Förvärvandet av nya kunder för att ersätta förlorade kunder är associerat med extra kostnader, medan vidtagandet av åtgärder för att behålla kunder kan visa sig mer lönsamt.
Som så är det av intresse att för varje kund ha pålitliga tidsestimat till en potentiell uppsägning kan tänkas inträffa så att förebyggande åtgärder kan vidtas. Applicerandet av djupinlärning och maskininlärning på denna typ av problem som involverar tidsseriedata är relativt nytt och det finns mycket ny forskning kring ämnet. Denna uppsats är baserad på antagandet att tidiga tecken på kundbortfall kan upptäckas genom kunders användarmönster över tid. Recurrent neural networks och mer specifikt long short-term memory (LSTM) och gated recurrent unit (GRU) är lämpliga modellval eftersom de är designade att ta hänsyn till den sekventiella tidsaspekten i tidsseriedata. Random forest (RF) och support vector machine (SVM) är maskininlärningsmodeller som ofta används i relaterad forskning. Problemet löses genom en klassificeringsapproach, och en jämförelse utförs med implementationer av LSTM, GRU, RF och SVM. Resultaten visar att LSTM och GRU presterar likvärdigt samtidigt som de presterar bättre än RF och SVM på problemet om att förutspå kunder som kommer att säga upp sig inom det kommande halvåret, och att samtliga modeller potentiellt kan leda till kostnadsbesparingar enligt simuleringar (som använder icke-officiella men rimliga kostnader associerat till varje utfall). Att förutspå tid till en kunduppsägning är ett svårare problem och ingen av de framtagna modellerna kan ge pålitliga tidsestimat, men alla är signifikant bättre än slumpvisa gissningar.
|
249 |
Predicting SNI Codes from Company Descriptions: A Machine Learning Solution
Lindholm, Erik, Nilsson, Jonas January 2023
This study aims to develop an automated solution for assigning area of industry codes to businesses based on the contents of their business descriptions. The Swedish standard industrial classification (SNI) is a system used by Statistics Sweden (SCB) for categorizing businesses for their statistics reports. Assignment of SNI codes has so far been done manually by the person registering a new company, but this is a far from optimal solution. Some of the 88 main group areas of industry are hard to tell apart from one another, and this often leads to incorrect assignments. Our approach to this problem was to train machine learning models using the Naive Bayes and SVM classifier algorithms and conduct an experiment. In 2019, Dahlqvist and Strandlund had attempted this and reached an accuracy score of 52 percent using a gradient boosting classifier, but this was considered too low for real-world implementation. Our main goal was to achieve a higher accuracy than that of Dahlqvist and Strandlund, which we eventually did: our best-performing SVM model reached a score of 60.11 percent. Like Dahlqvist and Strandlund, we concluded that the low quality of the dataset was the main obstacle to achieving higher scores. The dataset we used was severely imbalanced, and much time was spent on investigating and applying oversampling and undersampling as strategies for mitigating this problem. However, we found during the testing phase that none of these strategies had any positive effect on the accuracy scores.
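The abstract names oversampling as one of the imbalance strategies but not the specific variant used. A minimal sketch of the simplest form, random oversampling with replacement up to the size of the largest class, might look as follows. The function name, the company descriptions, and the two class labels are invented for illustration.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical descriptions with illustrative two-digit class labels.
descriptions = [
    "bakery and cafe",
    "bread wholesale",
    "pastry shop",
    "software consulting",
]
codes = ["10", "10", "10", "62"]

balanced_texts, balanced_codes = random_oversample(descriptions, codes)
class_sizes = Counter(balanced_codes)
```

Resampling of this kind is normally applied to the training split only, so that duplicated examples cannot leak into the test set. Because it changes class priors without adding new information, it can leave overall accuracy unchanged, which is consistent with the result the authors report.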
|
250 |
SVM Classification and Analysis of Margin Distance on Microarray Data
Shaik Abdul, Ameer Basha 16 June 2011
No description available.
|