191 |
Computational Methods for Solving Next Generation Sequencing Challenges
Aldwairi, Tamer Ali (13 December 2014)
In this study we build solutions to three common challenges in the field of bioinformatics by applying statistical methods and developing computational approaches. First, we address a common problem in genome-wide association studies: linking genotype features within organisms of the same species to their phenotype characteristics. Specifically, we studied FHA domain genes in Arabidopsis thaliana plants distributed across Eurasian regions by clustering plants that share similar genotype characteristics and comparing the clusters with the regions from which the plants were taken. Second, we developed a tool for calculating transposable element density within different regions of a genome. The tool uses the information provided by other transposable element annotation tools and gives the user a number of options for calculating density for various genomic elements, such as genes, piRNAs and miRNAs, or for the whole genome. It also provides a detailed calculation of densities for each family and subfamily of transposable elements. Finally, we address the problem of mapping multi-reads (reads that map equally well to more than one location in the genome) and their effects on gene expression estimates. To accomplish this, we implemented methods to determine the statistical significance of expression values within genes using a weighting scheme for both unique reads and multi-reads. We believe this approach provides a much more accurate measure of gene expression than existing methods, such as discarding multi-reads completely or assigning them randomly to one of a set of best-scoring locations, while also providing a better estimate of the proper mapping locations of ambiguous reads. Overall, the solutions built in these studies provide researchers with tools and approaches that help solve some of the common challenges that arise in the analysis of high-throughput sequence data.
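The abstract does not specify the multi-read weighting scheme. As a minimal sketch of the general idea, assuming the common heuristic of splitting each multi-read across its candidate genes in proportion to their unique-read counts (not necessarily the author's method), the allocation could look like this. The function name `allocate_multireads` and the input layout are hypothetical.

```python
from collections import defaultdict


def allocate_multireads(unique_counts, multi_reads):
    """Return fractional read counts per gene (illustrative heuristic only).

    unique_counts: {gene: number of uniquely mapped reads}   (hypothetical layout)
    multi_reads:   list of candidate-gene lists, one list per ambiguous read
    """
    totals = defaultdict(float)
    for gene, count in unique_counts.items():
        totals[gene] += count
    for candidates in multi_reads:
        # Weight each candidate by its unique-read evidence.
        weights = [unique_counts.get(g, 0) for g in candidates]
        norm = sum(weights)
        if norm == 0:
            # No unique evidence for any candidate: split the read evenly.
            for g in candidates:
                totals[g] += 1.0 / len(candidates)
        else:
            for g, w in zip(candidates, weights):
                totals[g] += w / norm
    return dict(totals)


counts = allocate_multireads({"geneA": 30, "geneB": 10}, [["geneA", "geneB"]])
print(counts)  # {'geneA': 30.75, 'geneB': 10.25}
```

Each multi-read still contributes exactly one read in total, so library size is preserved; genes with more unique evidence absorb a larger share of the ambiguous reads.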
|
192 |
The development and analysis of a computationally efficient data driven suit jacket fit recommendation system
Bogdanov, Daniil (January 2017)
In this master's thesis we design and analyze a data-driven suit jacket fit recommendation system that aims to guide shoppers in assessing garment fit over the web. The system is divided into two stages. In the first stage we analyze labelled customer data, train supervised learning models to predict optimal suit jacket dimensions for unseen shoppers, and determine an appropriate model for each suit jacket dimension. In the second stage the recommendation system uses the results from stage one to sort a garment collection from best fit to worst fit; this sorted collection is what the system returns. We propose a particular design of stage two that aims to reduce the complexity of the system at the cost of reduced quality of the results, and we identify and weigh the resulting trade-offs against each other. The results of stage one show that simple supervised learning models with linear regression functions suffice when the independent and dependent variables align at particular landmarks on the body. If style preferences are also to be incorporated into the models, non-linear regression functions should be considered to account for the increased complexity. The results of stage two show that the complexity of the recommendation system can be made independent of the complexity of how fit is assessed. As technology enables more advanced ways of assessing garment fit, such as 3D body scanning, the proposed design allows highly complex fit-assessment techniques to be used without affecting the responsiveness of the system at run time.
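A minimal sketch of the two-stage design, assuming a single body measurement predicts each jacket dimension by ordinary least squares (stage one) and that fit is scored as the sum of absolute deviations from the predicted optimum (stage two). The function names, the dimension name and all numbers are hypothetical toy values, not taken from the thesis.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rank_garments(optimal, garments):
    """Return garment ids sorted from best to worst fit.

    optimal:  {dimension: predicted optimal value} for one shopper (stage one output)
    garments: {garment_id: {dimension: garment value}}
    Misfit is the sum of absolute deviations over the predicted dimensions (an assumption).
    """
    def misfit(dims):
        return sum(abs(dims[d] - optimal[d]) for d in optimal)

    return sorted(garments, key=lambda gid: misfit(garments[gid]))


# Stage one: made-up training pairs (body chest in cm -> preferred jacket chest in cm).
a, b = fit_line([90, 100, 110], [100, 110, 120])
optimal = {"chest": a + b * 105}  # predicted optimum for a new shopper
# Stage two: sort a made-up collection by fit.
collection = {"46": {"chest": 112}, "48": {"chest": 116}, "50": {"chest": 121}}
print(rank_garments(optimal, collection))  # ['48', '46', '50']
```

Because `rank_garments` only needs a per-garment misfit score, a more elaborate fit assessment could replace `misfit` without changing the sorting step, which is in the spirit of the decoupling the abstract describes.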
|
193 |
Identification, investigation and prediction of post-COVID phenotypes : Using Cluster analysis and Ordinal logistic regression to determine severity of post-COVID
Malmquist, Sara; Rykatkin, Oliver (January 2023)
A large number of people are believed to experience persistent symptoms after COVID-19, a condition known as post-COVID. The formal definition and diagnostic criteria of post-COVID have been scientifically controversial, and so far there is no reliable system for distinguishing its severity. Such a measure would be helpful for future targeted therapies. This thesis therefore aims to evaluate the relationship between an individual's current functional status and the symptoms present, and to identify relevant groups of post-COVID based on 17 long-term post-COVID symptoms. A further aim is to produce a model that predicts which of these groups an individual belongs to. Post-COVID Syndrome scores are produced using cluster analysis and ordinal logistic regression, based on data from both hospitalised and non-hospitalised subjects collected through a project called COMBAT post-covid. The individuals are then divided into groups based on these scores, and a prediction model is built using ordinal logistic regression with backward deletion. Three well-separated groups of post-COVID are found based on the produced scores. The prediction model indicates that nine variables, namely sex, BMI, smoking, snuff use, heart disease, lung disease, diabetes, chronic pain and symptom severity at onset, appear important for predicting an individual's group. The study showed that the persistent symptoms affected individuals' functional status, including self-reported working ability and general health.
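Ordinal logistic regression is commonly fitted as a proportional-odds (cumulative logit) model, P(Y ≤ j | x) = σ(θ_j − x·β), where the θ_j are increasing thresholds between the ordered groups. The sketch below assumes that parameterization (the one used, for example, by R's `MASS::polr`) and already-estimated coefficients, and shows how group probabilities follow from the cumulative probabilities. It does not perform the fitting or the backward deletion, and the coefficient values are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(x, beta, thresholds):
    """Category probabilities under a proportional-odds model.

    P(Y <= j | x) = sigmoid(theta_j - x . beta) for increasing thresholds
    theta_1 < ... < theta_{K-1}; returns K probabilities for the ordered groups.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        # Probability of a group = difference of consecutive cumulative probabilities.
        probs.append(c - previous)
        previous = c
    return probs


p = ordinal_probs([0.0], [1.0], [-1.0, 1.0])
print([round(v, 3) for v in p])  # [0.269, 0.462, 0.269]
```

A larger linear predictor x·β lowers every cumulative probability, shifting mass toward the higher-ordered groups; the shared β across thresholds is the proportional-odds assumption.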
|
194 |
Genetic Variations and Physiological Mechanisms Underlying Photosynthetic Capacity in Soybean (Glycine max (L.) Merrill)
Shamim, Mohammad Jan (26 September 2022)
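Kyoto University / Doctoral degree by coursework (new system) / Doctor (Agricultural Science) / Degree No. Kō 24240 / Nōhaku No. 2519 / 新制||農||1094 (University Library) / 学位論文||R4||N5411 (Faculty of Agriculture Library) / Division of Agronomy, Graduate School of Agriculture, Kyoto University / Examiners: Professor Tatsuhiko Shiraiwa (chief examiner), Professor Motoaki Doi, Professor Shuhei Nasuda / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Agricultural Science / Kyoto University / DFAM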
|
195 |
Cluster Analysis: Applying Agglomerative Hierarchical and k-Means Clustering to Find Good Clusters among Football Players Based on Player Statistics
Balbas, Sacko; Törnquist, Arvid (January 2024)
This work examines how cluster analysis, a multivariate analysis tool, can be applied to find meaningful groups of players based on player statistics. The aim is to find good clusters among players in the Spanish top football division, La Liga, for the 2022-2023 season. Agglomerative hierarchical clustering and k-means clustering were applied and compared to address this aim. The results showed that no good clusters could be identified among the players based on player statistics from the 2022-2023 La Liga season.
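The abstract does not state how cluster quality was judged. Below is a minimal sketch of k-means (Lloyd's algorithm) together with the mean silhouette coefficient, one common criterion for whether clusters are compact and well separated. The deterministic initialization from the first k points and the choice of silhouette are assumptions for illustration; in practice the player statistics would usually be standardized first.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm. Initial centroids are the first k points (a simple,
    deterministic choice for illustration; k-means++ is more usual in practice)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient (assumes at least two clusters).
    Values near 1 indicate tight, well-separated clusters; values near 0 indicate overlap."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(pts, 2)
print(labels)  # [0, 0, 1, 1]
print(silhouette(pts, labels))
```

A low mean silhouette across a range of k values would be one way to support a conclusion like "no good clusters could be identified".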
|
196 |
Analysis of Transactional Data with Long Short-Term Memory Recurrent Neural Networks
Nawaz, Sabeen (January 2020)
One issue that authorities and banks face is fraud in payments and transactions, through which a party suffers large monetary losses or money laundering schemes are carried out. Previous work applying machine learning to fraud detection has treated the issue as a supervised learning problem. In this thesis, we propose a model that can be used in a fraud detection system for transactions and payments that are unlabeled. The proposed model is a Long Short-Term Memory network in an autoencoder (encoder-decoder) architecture (LSTM-AED), which is trained and tested on transformed data. The data is transformed by reducing it to principal components and clustering it with k-means. The model is trained to reconstruct the sequence with high accuracy. Our results indicate that the LSTM-AED performs better than a random sequence-generating process at learning and reconstructing a sequence of payments. We also found that a large loss of information occurs in the pre-processing stages.
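The pre-processing described above (principal components followed by k-means) turns each transaction into a discrete cluster label that a sequence model can learn. The sketch below assumes a single principal component computed by power iteration and cluster centroids that are already known (hypothetical values in the usage); it shows only the projection and labeling steps, not the LSTM autoencoder. The explained-variance ratio it returns is one way to quantify how much variance the projection discards, which relates to the information loss the abstract reports; snapping each score to a centroid discards more.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (mean, unit direction, explained-variance ratio) of the first PC.

    Uses power iteration on the sample covariance matrix; assumes the data are
    not constant and the start vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the leading eigenvalue (v is unit length).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return mean, v, eigenvalue / total_variance


def to_tokens(rows, mean, direction, centroids):
    """Project each row onto the PC and replace it by the index of the nearest 1-D centroid."""
    scores = [sum((r[j] - mean[j]) * direction[j] for j in range(len(r))) for r in rows]
    return [min(range(len(centroids)), key=lambda c: abs(s - centroids[c])) for s in scores]


rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, direction, ratio = first_principal_component(rows)
print(to_tokens(rows, mean, direction, [-2.0, 2.0]))  # [0, 0, 1, 1]
```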
|
197 |
An Unsupervised Machine-Learning Framework for Behavioral Classification from Animal-Borne Accelerometers
Dentinger, Jane Elizabeth (03 May 2019)
Studies of animal spatial distributions typically use prior knowledge of animal habitat requirements and behavioral ecology to deduce the most likely explanations of observed habitat use. Animal-borne accelerometers can be used to distinguish behaviors, which allows in situ behavior to be incorporated into our understanding of spatial distributions. Past research has focused on supervised machine learning, which requires behaviors to be specified a priori in order to identify signals, whereas unsupervised approaches allow the model to identify as many signal types as the data permit. The framework presented here couples direct observations to behavioral clusters identified by unsupervised machine learning on a large accelerometry dataset. A behavioral profile was constructed to describe the proportion of behaviors observed per cluster, and the framework was applied to an acceleration dataset collected from wild pigs (Sus scrofa). Although most clusters represented combinations of behaviors, a leave-p-out validation procedure indicated that this classification system accurately predicted new data.
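A behavioral profile of the kind described can be tabulated directly from accelerometer segments that carry both an unsupervised cluster label and a directly observed behavior. The sketch below assumes such (cluster, behavior) pairs as input and, as an illustrative prediction rule, labels a cluster with its most frequently observed behavior. The behavior names are hypothetical.

```python
from collections import Counter, defaultdict


def behavioral_profile(pairs):
    """Proportion of each observed behavior within each cluster.

    pairs: iterable of (cluster_id, observed_behavior) for segments that were
           both clustered and directly observed.
    """
    counts = defaultdict(Counter)
    for cluster, behavior in pairs:
        counts[cluster][behavior] += 1
    profile = {}
    for cluster, behaviors in counts.items():
        total = sum(behaviors.values())
        profile[cluster] = {b: n / total for b, n in behaviors.items()}
    return profile


def most_likely_behavior(profile, cluster):
    """Predict the behavior with the highest observed proportion in a cluster."""
    behaviors = profile[cluster]
    return max(behaviors, key=behaviors.get)


prof = behavioral_profile([(0, "foraging"), (0, "foraging"), (0, "walking"), (1, "resting")])
print(most_likely_behavior(prof, 0))  # foraging
```

Keeping the full proportions rather than a single label preserves the information that a cluster mixes several behaviors, which the abstract reports for most clusters.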
|
198 |
Computational Intelligence and Data Mining Techniques Using the Fire Data Set
Storer, Jeremy J. (04 May 2016)
No description available.
|
199 |
How Do Socio-Demographics and The Built Environment Affect Individual Accessibility Based on Activity Space as A Transport Exclusion Indicator?
Chen, Na (08 November 2016)
No description available.
|
200 |
The Energy Efficiency Model of a DC Motor for the Control of HEVs
Cai, Jiacheng (January 2020)
This thesis studies a DC motor for a racing hybrid electric vehicle (HEV) prototype. The development of optimization-based energy management strategies (EMS) requires an accurate quasi-static model of the driving motor, including a 2D efficiency map with output torque and rotational speed as inputs. However, a DC motor's efficiency varies considerably across operating points, and the efficiency map in the technical manual does not always match real-world applications. This thesis therefore investigates a quasi-static modeling method based on field testing to construct the DC motor efficiency map using only portable, limited testing resources. First, a test bench is designed, manufactured, integrated and configured with the necessary accessories. The test bench consists of the motor under test, a braking motor that provides load torque, a servo amplifier for torque control and sensing, a host computer for data acquisition, and power supplies. Next, a self-contained test plan is designed to cover as many different test points as possible within the braking motor's power limit. The experiments are then performed on the test bench, and the input electrical power and output mechanical power at steady state are recorded. Several data-processing methods are explored to analyze the collected test data. Root mean square (RMS) values are used to reduce measurement variance, and invalid outliers are identified and filtered out based on their residuals. The qualified samples are used to build the 2D efficiency map by fourth-degree polynomial regression. Linear, quadratic and cubic fits are then tried separately to estimate the relationship between input power and output torque at different speeds. The results show that the quadratic model is the best option, giving a smaller root mean square error (RMSE) at a reasonable computational cost.
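The degree comparison described above can be sketched as least-squares polynomial fits of input power against torque for one speed, scored by RMSE. This is a dependency-free illustration that solves the normal equations directly (`numpy.polyfit` would normally be used); the data in the usage are synthetic. Note that on the fitting data itself a higher degree can never increase the least-squares RMSE, so a comparison that favors a lower degree presumably relies on held-out samples or on weighing computational cost; the abstract does not say which.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients c0..c_degree via the normal equations."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, m))) / A[i][i]
    return coeffs


def rmse(xs, ys, coeffs):
    """Root mean square error of a polynomial (coefficients in ascending order)."""
    pred = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys))


def compare_degrees(xs, ys, degrees=(1, 2, 3)):
    """RMSE per polynomial degree for one speed's (torque, input power) samples."""
    return {d: rmse(xs, ys, polyfit(xs, ys, d)) for d in degrees}


# Synthetic (torque, input power) samples for one speed; not measured data.
torque = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
power = [2.0 + 0.5 * t + 0.1 * t ** 2 for t in torque]
print(compare_degrees(torque, power))
```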
In conclusion, the quasi-static dynamic model of a DC motor, which includes a 2D efficiency map and a speed-based polynomial expression for input power, can be established by a new method that relies on fewer and simpler devices than traditional methods. The method avoids much of the tedious adjustment involved in precise motor speed control, which depends heavily on high-precision sensors. The resulting 2D efficiency map will effectively support the future development of model-based EMS, and the polynomial expression offers a more efficient way to estimate instantaneous energy efficiency in an embedded system application.
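To illustrate why a speed-based polynomial for input power suits embedded use, the sketch below estimates instantaneous efficiency as η = P_mech / P_in with P_mech = T·ω, evaluating a quadratic P_in(T) in Horner form and interpolating linearly between tabulated speeds. The speed grid and coefficients are made-up placeholders, not results from the thesis, and the interpolation scheme is an assumption.

```python
import bisect
import math

# Hypothetical placeholder table: quadratic coefficients (a0, a1, a2) of input power
# P_in [W] versus torque T [N*m] at a few speeds [rpm]. Real values would come from
# the fitted test-bench data.
SPEEDS_RPM = [1000.0, 2000.0, 3000.0]
COEFFS = [(20.0, 110.0, 15.0), (30.0, 215.0, 16.0), (45.0, 320.0, 18.0)]


def input_power(torque, coeffs):
    """Quadratic in Horner form: two multiplications and two additions."""
    a0, a1, a2 = coeffs
    return a0 + torque * (a1 + torque * a2)


def efficiency(torque, speed_rpm):
    """Estimated efficiency P_mech / P_in, interpolating linearly between tabulated speeds.
    Speeds outside the table are extrapolated from the nearest pair (less reliable)."""
    omega = speed_rpm * 2.0 * math.pi / 60.0  # rad/s
    p_mech = torque * omega                   # W
    i = min(max(bisect.bisect_left(SPEEDS_RPM, speed_rpm), 1), len(SPEEDS_RPM) - 1)
    s0, s1 = SPEEDS_RPM[i - 1], SPEEDS_RPM[i]
    w = (speed_rpm - s0) / (s1 - s0)
    p_in = (1 - w) * input_power(torque, COEFFS[i - 1]) + w * input_power(torque, COEFFS[i])
    return p_mech / p_in


print(round(efficiency(1.0, 2000.0), 3))  # 0.802
```

Each call costs only a table lookup and a handful of arithmetic operations, which is the kind of saving over evaluating a full 2D map that makes such an expression attractive for an embedded controller.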
|