1 |
Prediction Of Protein Subcellular Localization Based On Primary Sequence Data
Ozarar, Mert 01 January 2003
Subcellular localization is crucial for determining the functions of proteins.
A system called Prediction of Protein Subcellular Localization (P2SL) is designed; it predicts
the subcellular localization of proteins in eukaryotic organisms from the
amino acid content of their primary sequences, taking amino acid order into account.
The approach is to find the most frequent motifs for each protein
in a given class by clustering with self-organizing maps and then
to use these most frequent motifs as features for classification with
multilayer perceptrons. This approach allows classification independent
of the length of the sequence. In addition, a new encoding
scheme for the amino acids is described that conserves biological function
based on the point accepted mutation (PAM) substitution matrix. The statistical
test results of the system are presented on a four-class problem. P2SL achieves
slightly higher prediction accuracy than similar studies.
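The idea of length-independent motif features can be illustrated with a minimal sketch. It is not the P2SL method: it replaces the self-organizing-map clustering with plain counting of overlapping fixed-length substrings, and the function names, the motif length `k` and the motif list are assumptions made only for this example.

```python
from collections import Counter


def motif_frequencies(sequence, k=3):
    """Relative frequency of every overlapping length-k substring (motif)."""
    total = len(sequence) - k + 1
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + k] for i in range(total))
    return {motif: n / total for motif, n in counts.items()}


def feature_vector(sequence, motifs, k=3):
    """Fixed-length feature vector: one frequency per motif in `motifs`.

    The vector length depends only on `motifs`, not on the sequence length,
    so sequences of any length can be fed to the same classifier.
    """
    freqs = motif_frequencies(sequence, k)
    return [freqs.get(m, 0.0) for m in motifs]


# Hypothetical motif list and toy sequences, for illustration only.
motifs = ["MKV", "KVM", "XXX"]
short_features = feature_vector("MKVMKV", motifs)
long_features = feature_vector("MKVMKVMKVMKVAAA", motifs)
```

Both vectors have three components although the sequences differ in length, which is the property the abstract attributes to motif-based features.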
|
2 |
Analysis of Three-Way Data and Other Topics in Clustering and Classification
Gallaugher, Michael Patrick Brian January 2020
Clustering and classification are the processes of finding underlying group structure in heterogeneous data. With the rise of the “big data” phenomenon, data structures have become more complex, so that traditional clustering methods are often not advisable or feasible. This thesis presents methodology for analyzing three examples of these more complex data types. The first is three-way (matrix variate) data, or data that come in the form of matrices. A large emphasis is placed on clustering skewed three-way data and high-dimensional three-way data. The second is clickstream data, which captures a user’s internet search patterns. Finally, co-clustering methodology is discussed for very high-dimensional two-way (multivariate) data. Parameter estimation for all these methods is based on the expectation-maximization (EM) algorithm. Both simulated and real data are used for illustration. / Thesis / Doctor of Philosophy (PhD)
|
3 |
Discovery of retinal biomarkers for vascular conditions through advancement of artery-vein detection and fractal analysis
Relan, Devanjali January 2016
Research into automatic retina image analysis has become increasingly important, not just in ophthalmology but also in other clinical specialities such as cardiology and neurology. In the retina, blood vessels can be directly visualised non-invasively in vivo, and hence it serves as a "window" to cardiovascular and neurovascular complications. Biomarker research, i.e. investigating associations between the morphology of the retinal vasculature (as a means of revealing microvascular health or disease) and particular conditions affecting the body or brain, could play an important role in detecting disease early enough to affect patient treatment and care. A fundamental requirement of biomarker research is access to large datasets to achieve sufficient power and significance when ascertaining associations between retinal measures and clinical characterisation of disease. Crucially, the vascular changes that appear can affect arteries and veins differently. An essential part of automatic systems for retinal morphology quantification and biomarker extraction is, therefore, a computational method for classifying vessels into arteries and veins. Artery-vein classification enables the efficient extraction of biomarkers such as the Arteriolar-to-Venular Ratio, which is a well-established predictor of stroke and other cardiovascular events. While structural parameters of the retinal vasculature such as vessel calibre, branching angle, and tortuosity may individually convey some information regarding specific aspects of the health of the retinal vascular network, they do not convey a global summary of the branching pattern and its state or condition. The retinal vascular tree can be considered a fractal structure as it has a branching pattern that exhibits the property of self-similarity.
Fractal analysis, therefore, provides an additional means for the quantitative study of changes to the retinal vascular network and may be of use in detecting abnormalities related to retinopathy and systemic diseases. In this thesis, new developments in fully automated retinal vessel classification and fractal analysis were explored in order to extract potential biomarkers. These novel processes were tested and validated on several datasets of retinal images acquired with fundus cameras. The major contributions of this thesis include: 1) developing a fully automated retinal blood vessel classification technique, 2) developing a fractal analysis technique that quantifies regional as well as global branching complexity, 3) validating the methods using multiple datasets, and 4) applying the proposed methods in multiple retinal vasculature analysis studies.
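One common way to quantify branching complexity is the box-counting dimension. The abstract does not say which fractal measure the thesis uses, so the following is only a sketch under that assumption; the function names and the representation of a segmented vessel map as a set of pixel coordinates are hypothetical.

```python
import math


def box_count(points, size):
    """Number of size-by-size boxes that contain at least one vessel pixel."""
    return len({(x // size, y // size) for x, y in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log(box count) against log(1 / box size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on shapes with a known dimension (toy data, not retinal images).
sizes = [1, 2, 4, 8, 16]
line = {(x, 0) for x in range(64)}
square = {(x, y) for x in range(64) for y in range(64)}
line_dim = box_counting_dimension(line, sizes)
square_dim = box_counting_dimension(square, sizes)
```

A straight line yields a value close to 1 and a filled square a value close to 2; a branching vessel tree would be expected to fall between the two. Applying the same function to a sub-window of the image gives a regional rather than global estimate.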
|
4 |
Integrating network analysis and data mining techniques into effective framework for Web mining and recommendation : a framework for Web mining and recommendation
Nagi, Mohamad January 2015
The main motivation for the study described in this dissertation is to benefit from developments in technology and from the huge amount of available data that can be easily captured, stored and maintained electronically. We concentrate on Web usage (i.e., log) mining and Web structure mining. Analysing Web log data reveals valuable feedback on how effective the current structure of a web site is and helps the owner of the site understand the behaviour of its visitors. We developed a framework that integrates statistical analysis, frequent pattern mining, clustering, classification, and network construction and analysis. We concentrated on the statistical data related to the visitors and on how they surf and pass through the various pages of a given web site to land at some target pages. Further, frequent pattern mining was used to study the relationships between the various pages constituting a given web site. Clustering is used to study the similarity of users and pages. Classification suggests a target class for a new entity by comparing its characteristics to those of the known classes. Network construction and analysis is also employed to identify and investigate the links between the pages of a web site: a network is constructed from the frequency of access to the pages, such that pages are linked if the frequent pattern mining process identifies them as frequently accessed together. The knowledge discovered by analysing a web site and its related data should be valuable for online shoppers and commercial web site owners. Building on the outcome of the study, a recommendation system was developed to suggest pages to visitors based on their profiles as compared to similar profiles of other visitors.
The experiments conducted using popular datasets demonstrate the applicability and effectiveness of the proposed framework for Web mining and recommendation. As a by-product of the proposed method, we demonstrate that it is also effective for feature reduction in another domain, gene expression data analysis, with some interesting results reported in Chapter 5.
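The network construction step described above can be sketched as follows. This is a simplified illustration, not the dissertation's implementation: it considers only page pairs rather than general frequent patterns, and the function name, the `min_support` threshold and the toy sessions are assumptions.

```python
from collections import Counter
from itertools import combinations


def build_page_network(sessions, min_support=2):
    """Link two pages if they occur together in at least `min_support` sessions.

    Returns a dict mapping an alphabetically ordered page pair to the number
    of sessions containing both pages (the edge weight).
    """
    pair_counts = Counter()
    for session in sessions:
        # Count each pair at most once per session, regardless of visit order.
        for pair in combinations(sorted(set(session)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical visitor sessions extracted from a Web log.
sessions = [
    ["home", "cart", "pay"],
    ["home", "cart"],
    ["home", "about"],
]
network = build_page_network(sessions, min_support=2)
```

Here only "cart" and "home" are linked, because they are the only pair seen together in two sessions. A recommender in this spirit could suggest the neighbours of the pages a visitor has already viewed.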
|
5 |
Fuzzer Test Log Analysis Using Machine Learning : Framework to analyze logs and provide feedback to guide the fuzzer
Yadav, Jyoti January 2018
Machine learning and deep learning have become popular choices for analysing and identifying patterns in large volumes of data. This thesis focuses on the design of alternative strategies that use machine learning to guide the fuzzer in selecting the most promising test cases, and mainly on analysing the data with machine learning techniques. The analysis is carried out in multiple phases. The first phase converts the data into a suitable format (pre-processing) so that the necessary features can be extracted and fed as input to unsupervised machine learning algorithms. Machine learning algorithms accept the input data in the form of matrices that represent the dimensionality of the extracted features. Several experiments and run-time benchmarks were conducted to choose the most efficient algorithm based on execution time and accuracy of results. Finally, the best choice was implemented to get the desired result. The second phase applies supervised learning to the clustering results. The final phase describes how an incremental learning model is built to score the test case logs and return their scores in near real time, which can act as feedback to guide the fuzzer.
|
7 |
A framework for processing correlated probabilistic data
van Schaik, Sebastiaan Johannes January 2014
The amount of digitally-born data has surged in recent years. In many scenarios, this data is inherently uncertain (or: probabilistic), such as data originating from sensor networks, image and voice recognition, location detection, and automated web data extraction. Probabilistic data requires novel and different approaches to data mining and analysis, which explicitly account for the uncertainty and the correlations therein. This thesis introduces ENFrame: a framework for processing and mining correlated probabilistic data. Using this framework, it is possible to express both traditional and novel algorithms for data analysis in a special user language, without having to explicitly address the uncertainty of the data on which the algorithms operate. The framework will subsequently execute the algorithm on the probabilistic input, and perform exact or approximate parallel probability computation. During the probability computation, correlations and provenance are succinctly encoded using probabilistic events. This thesis contains novel contributions in several directions. An expressive user language – a subset of Python – is introduced, which allows a programmer to implement algorithms for probabilistic data without requiring knowledge of the underlying probabilistic model. Furthermore, an event language is presented, which is used for the probabilistic interpretation of the user program. The event language can succinctly encode arbitrary correlations using events, which are the probabilistic counterparts of deterministic user program variables. These highly interconnected events are stored in an event network, a probabilistic interpretation of the original user program. Multiple techniques for exact and approximate probability computation (with error guarantees) of such event networks are presented, as well as techniques for parallel computation. 
Adaptations of multiple existing data mining algorithms are shown to work in the framework, and are subsequently subjected to an extensive experimental evaluation. Additionally, a use-case is presented in which a probabilistic adaptation of a clustering algorithm is used to predict faults in energy distribution networks. Lastly, this thesis presents techniques for integrating a number of different probabilistic data formalisms for use in this framework and in other applications.
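Why correlations must be tracked explicitly can be seen in a small possible-worlds calculation. The sketch below is not ENFrame's user or event language; it assumes independent Boolean input variables, enumerates every assignment, and sums the weights of the worlds in which an event holds. All names are hypothetical, and the enumeration is exponential in the number of variables.

```python
from itertools import product


def event_probability(event, var_probs):
    """Exact probability of `event` by enumerating all possible worlds.

    `var_probs` maps each independent Boolean variable to P(variable is True);
    `event` is a function from a world (dict of variable -> bool) to bool.
    """
    names = sorted(var_probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name in names:
            p = var_probs[name]
            weight *= p if world[name] else 1.0 - p
        if event(world):
            total += weight
    return total


var_probs = {"a": 0.5, "b": 0.5}
e1 = lambda w: w["a"]             # event depending on a only
e2 = lambda w: w["a"] or w["b"]   # event sharing variable a with e1
p_e1 = event_probability(e1, var_probs)
p_e2 = event_probability(e2, var_probs)
p_both = event_probability(lambda w: e1(w) and e2(w), var_probs)
```

Because `e1` and `e2` share the variable `a`, the joint probability `p_both` differs from the product `p_e1 * p_e2`; treating derived events as independent would give the wrong answer.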
|
8 |
Data Fusion of Infrared, Radar, and Acoustics Based Monitoring System
Mirzaei, Golrokh 22 July 2014
No description available.
|
9 |
Outliers detection in mixtures of dissymmetric distributions for data sets with spatial constraints / Détection de valeurs aberrantes dans des mélanges de distributions dissymétriques pour des ensembles de données avec contraintes spatiales
Planchon, Viviane 29 May 2007
In soil chemical analyses, the frequency distributions for some elements are markedly asymmetric, with a very pronounced spread to the right or to the left. A high frequency of extreme values is also observed, and a mixture of several distributions may occur because various soil types are present within a single geographical unit. An original outlier detection procedure has therefore been developed for detecting outliers and establishing detection limits; it estimates the extreme quantiles above and below which observations are considered outliers. The detection limits are estimated separately from the right and left tails of the distribution. A first estimation is made for each elementary geographical unit to determine an appropriate truncation level. Then, a spatial classification creates adjoining homogeneous groups of geographical units, so that robust limit values can be estimated from an optimal number of observations.
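The notion of detection limits defined by extreme quantiles can be illustrated with plain empirical percentiles. This is a deliberately simplified sketch, not the procedure developed in the thesis, which estimates the limits from the distribution tails with a truncation level and spatial grouping. The function names and the 1st/99th percentile defaults are assumptions; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def detection_limits(values, lower_pct=1, upper_pct=99):
    """Lower and upper limits taken as empirical percentiles of the sample."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def flag_outliers(values, lower_pct=1, upper_pct=99):
    """Observations strictly below the lower limit or above the upper limit."""
    low, high = detection_limits(values, lower_pct, upper_pct)
    return [v for v in values if v < low or v > high]


# Toy measurements for one geographical unit (illustrative values only).
values = list(range(101))
limits = detection_limits(values)
outliers = flag_outliers(values)
```

Separate percentages for the two tails allow asymmetric limits, which matters for the skewed distributions the abstract describes.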
|
10 |
Algorithmische Bestimmung der Alterungscharakteristik von Mittelspannungskabelmuffen basierend auf diagnostischen Messwerten und Betriebsmitteldaten [Algorithmic determination of the ageing characteristics of medium-voltage cable joints based on diagnostic measurements and equipment data]
Hunold, Sven 15 December 2016
In the condition assessment of cables, the medium-voltage network is currently the focus of attention. The medium-voltage network connects the high-voltage network to the low-voltage network and is therefore of particular importance. Faults in this network directly affect end consumers as supply interruptions. Around 80 to 85 % of supply interruptions result from problems in the medium-voltage network, so activities there offer the greatest leverage for improving supply quality.
By assessing the condition of cables, hidden faults can be detected or the ageing condition of the cables can be determined. Not every diagnosed fault leads directly to failure. However, it accelerates ageing, which ultimately leads to failure.
The work deals with the identification of faults in medium-voltage cable joints in connection with ageing, in order to make full use of the remaining service life and to pre-empt failure.
|