About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
471

Concept Based Knowledge Discovery from Biomedical Literature.

Radovanovic, Aleksandar. January 2009 (has links)
This thesis introduces novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. This thesis shows how the technology presented can be integrated with researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
472

Detecting Switching Points and Mode of Transport from GPS Tracks

Araya, Yeheyies January 2012 (has links)
In recent years, various research efforts have been undertaken to enhance the quality of travel surveys, mainly with the aid of GPS technology. Initially, this research focused on the vehicle travel mode, owing to the availability of GPS technology in vehicles; nowadays, with GPS devices accessible for personal use, researchers have shifted their focus to personal mobility across all travel modes. This master's thesis aimed at developing a mechanism to extract one type of travel survey information, namely travel mode, from a collected GPS dataset. The available GPS dataset covers the travel modes walk, bike and car, as well as the public transport modes bus, train and subway. The developed procedure consists of two stages. The first divides the tracks into trips, and the trips into segments, by means of a segmentation process; the segmentation is based on the assumption that a traveler walks when switching from one transportation mode to another, so the trips are divided into walking and non-walking segments. The second stage develops a classification model to label the separated segments with the travel modes walk, bike, bus, car, train and subway. To develop the classification model, a supervised classification method using a decision tree algorithm was adopted. The highest prediction accuracy obtained by the classification system was for the walk travel mode, at 75.86%, while the bike and bus modes showed the lowest prediction accuracy. Overall, the developed system showed promising results that could serve as a baseline for further similar research.
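
A minimal sketch of the second, mode-classification stage, assuming per-segment features such as mean speed, maximum speed and mean acceleration have already been computed from the GPS fixes (the feature set and data here are illustrative, not taken from the thesis):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
MODES = ["walk", "bike", "bus", "car", "train", "subway"]

# Stand-in for the real segment table: rows = segments, columns =
# [mean speed (m/s), max speed (m/s), mean |acceleration| (m/s^2)].
X = rng.uniform([0.0, 0.0, 0.0], [30.0, 45.0, 3.0], size=(600, 3))
y = rng.choice(MODES, size=600)  # placeholder labels; real ones come from travel diaries

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=8, random_state=0)  # the supervised decision tree
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))  # per-mode precision/recall
```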
473

Unsupervised hidden Markov model for automatic analysis of expressed sequence tags

Alexsson, Andrei January 2011 (has links)
This thesis provides an in-depth analysis of expressed sequence tags (ESTs), which represent pieces of eukaryotic mRNA, using an unsupervised hidden Markov model (HMM). ESTs are short nucleotide sequences that are used primarily for rapid identification of new genes with potential coding regions (CDS). ESTs are produced by sequencing double-stranded cDNA, and the synthesized ESTs are stored in digital form, usually in FASTA format. Since sequencing is often randomized and parts of mRNA contain non-coding regions, some ESTs will not represent a CDS. It is desirable to remove these unwanted ESTs if the purpose is to identify genes associated with a CDS. Application of a stochastic HMM allows identification of the region content of an EST. Software such as ESTScan uses HMMs trained by supervised learning on annotated data. However, because annotated data are not always at hand, this thesis focuses on the ability to train an HMM by unsupervised learning on data containing ESTs both with and without CDS, where the training data are not annotated, i.e. the regions that an EST consists of are unknown. In this thesis a new HMM is introduced whose parameters are chosen to be reasonably consistent with biologically important regions of an mRNA, such as the Kozak sequence, poly(A) signals and poly(A) tails, so as to guide training and decoding of ESTs to the proper states in the HMM. The transition probabilities of the HMM have been adapted to represent the mean length and distribution of the different regions in mRNA. The HMM's specificity and sensitivity have been tested via BLAST, by blasting each EST and comparing the BLAST results with the HMM prediction results. A regression analysis shows that the length of the ESTs used when training the HMM is significantly important: the longer the better. The final results show that it is possible to train an HMM by unsupervised machine learning, but to be comparable to a supervised approach such as ESTScan, further expansion of the HMM is necessary, such as frame-shift correction of ESTs by improving the HMM's ability to choose correctly positioned start codons or nucleotides. The false positive results are usually due to incorrectly positioned start codons leading to CDS lengths that are too short; since no frame-shift correction is implemented, short predicted CDS lengths are not acceptable and are hence not counted as coding regions during prediction. However, when supervised models are lacking, an unsupervised HMM is a potential replacement with stable performance that can be adapted for any eukaryotic organism.
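
A rough sketch of the unsupervised training step, assuming the hmmlearn package (its CategoricalHMM class is available in recent versions); the three-state topology and toy sequences are illustrative stand-ins for the thesis's actual region model:

```python
import numpy as np
from hmmlearn.hmm import CategoricalHMM

ALPHABET = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(seq):
    # hmmlearn expects integer observations as a column vector.
    return np.array([[ALPHABET[c]] for c in seq])

# Toy stand-ins for ESTs read from a FASTA file.
ests = ["ATGGCCATTGTAATGGGCCGC", "TTTTTTAAAAAAAAAAAAAA", "GCCACCATGGCGTAA"]
X = np.concatenate([encode(s) for s in ests])
lengths = [len(s) for s in ests]

# E.g. hidden states for 5'UTR, CDS and 3'UTR/poly(A) regions,
# trained by Baum-Welch (EM) without any region labels.
model = CategoricalHMM(n_components=3, n_iter=100, random_state=0)
model.fit(X, lengths)

# Viterbi decoding assigns each nucleotide of an EST to a hidden region.
print(model.predict(encode(ests[0])))
```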
474

Representing and Recognizing Temporal Sequences

Shi, Yifan 15 August 2006 (has links)
Activity recognition falls in the general area of pattern recognition, but it resides mainly in the temporal domain, which leads to distinctive characteristics. We provide an extensive survey of existing tools including FSM, HMM, BNT, DBN, SCFG and the Symbolic Network Approach (PNF-network). These tools fail to meet many of the requirements of activity recognition, leading this work to develop a new graphical model: the Propagation Net (P-Net). Many activities can be represented by a partially ordered set of temporal intervals, each of which corresponds to a primitive motion. Each interval has both temporal and logical constraints that control its duration and its relationship with other intervals. P-Net takes advantage of these fundamental constraints, providing a graphical conceptual model to describe human knowledge and an efficient computational model to facilitate recognition and learning. P-Nets define an exponentially large joint distribution that standard Bayesian inference cannot handle. We devise two approximation algorithms to interpret a multi-dimensional observation sequence of evidence as a multi-stream propagation process through the P-Net. First, the Local Maximal Search Algorithm (LMSA) is constructed with polynomial complexity; second, we introduce a particle-filter-based framework, the Discrete Condensation (D-Condensation) algorithm, which samples the discrete state space more efficiently than the original Condensation algorithm. To construct a P-Net-based system, we need two parts: the P-Net and the corresponding detector set. Given topology information and a detector library, P-Net parameters can be extracted easily from a relatively small number of positive examples. To avoid the tedious process of manually constructing the detector library, we introduce a semi-supervised learning framework to build the P-Net and the corresponding detectors together. Furthermore, we introduce the Contrast Boosting algorithm, which forces the detectors to be as different as possible but not necessarily non-overlapping. The classification and learning ability of P-Nets are verified on three data sets: 1) a vision-tracked indoor activity data set; 2) a vision-tracked glucose monitor calibration data set; 3) a sensor data set of simple weight-lifting exercises. Comparison with standard SCFG and HMM shows that a P-Net-based system is easier to construct and has a superior ability to classify complex human activity and detect anomalies.
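
The interval representation can be illustrated with a short sketch. This is not the P-Net model or its inference algorithms, only a hypothetical example of the duration bounds and partial-order constraints a P-Net encodes:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    name: str
    start: float
    end: float

# Hypothetical activity model: duration bounds per primitive motion,
# and a partial order "a must finish before b starts".
DURATION = {"reach": (0.2, 1.5), "grasp": (0.1, 1.0), "lift": (0.3, 2.0)}
BEFORE = [("reach", "grasp"), ("grasp", "lift")]

def satisfies(intervals):
    by_name = {iv.name: iv for iv in intervals}
    # Every interval's duration must fall within its bounds.
    for iv in intervals:
        lo, hi = DURATION[iv.name]
        if not lo <= iv.end - iv.start <= hi:
            return False
    # Every precedence constraint must hold.
    return all(by_name[a].end <= by_name[b].start for a, b in BEFORE)

print(satisfies([Interval("reach", 0.0, 0.8),
                 Interval("grasp", 0.9, 1.3),
                 Interval("lift", 1.4, 2.6)]))  # True
```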
475

Support vector classification analysis of resting state functional connectivity fMRI

Craddock, Richard Cameron 17 November 2009 (has links)
Since its discovery in 1995, resting state functional connectivity derived from functional MRI data has become a popular neuroimaging method for studying psychiatric disorders. Current methods for analyzing resting state functional connectivity in disease involve thousands of univariate tests and the specification of regions of interest to employ in the analysis. There are several drawbacks to these methods. First, the mass univariate tests employed are insensitive to the information present in distributed networks of functional connectivity. Second, the null hypothesis testing employed to select functional connectivity differences between groups does not evaluate the predictive power of the identified functional connectivities. Third, the specification of regions of interest is confounded by experimenter bias in terms of which regions should be modeled, and by experimental error in terms of the size and location of these regions. The objective of this dissertation is to improve the methods for functional connectivity analysis using multivariate predictive modeling, feature selection, and whole-brain parcellation. A method of applying support vector classification (SVC) to resting state functional connectivity data was developed in the context of a neuroimaging study of depression. The interpretability of the obtained classifier was optimized using feature selection techniques that incorporate reliability information. The problem of selecting regions of interest for whole-brain functional connectivity analysis was addressed by clustering whole-brain functional connectivity data to parcellate the brain into contiguous, functionally homogeneous regions. This newly developed framework was applied to derive a classifier capable of correctly separating the functional connectivity patterns of patients with depression from those of healthy controls 90% of the time. The features most relevant to the obtained classifier match those identified in previous studies, but also include several regions not previously implicated in the functional networks underlying depression.
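
A minimal sketch of the SVC step, assuming each subject has been reduced to a vector of pairwise region-to-region correlations; subject counts, region counts and labels below are placeholders, not the study's data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_subjects, n_regions = 40, 90
iu = np.triu_indices(n_regions, k=1)  # upper triangle = unique region pairs

# Stand-in data: one regional time-series matrix per subject, turned into
# a correlation matrix, then flattened to a connectivity feature vector.
ts = rng.standard_normal((n_subjects, 200, n_regions))
X = np.stack([np.corrcoef(t, rowvar=False)[iu] for t in ts])
y = rng.integers(0, 2, n_subjects)  # placeholder: 0 = control, 1 = patient

clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
print(cross_val_score(clf, X, y, cv=5).mean())  # 5-fold cross-validated accuracy
```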
476

Estimation of glottal source features from the spectral envelope of the acoustic speech signal

Torres, Juan Félix 17 May 2010 (has links)
Speech communication encompasses diverse types of information, including phonetics, affective state, voice quality, and speaker identity. From a speech production standpoint, the acoustic speech signal can be mainly divided into glottal source and vocal tract components, which play distinct roles in rendering the various types of information it contains. Most deployed speech analysis systems, however, do not explicitly represent these two components as distinct entities, as their joint estimation from the acoustic speech signal becomes an ill-defined blind deconvolution problem. Nevertheless, because of the desire to understand glottal behavior and how it relates to perceived voice quality, there has been continued interest in explicitly estimating the glottal component of the speech signal. To this end, several inverse filtering (IF) algorithms have been proposed, but they are unreliable in practice because of the blind formulation of the separation problem. In an effort to develop a method that can bypass the challenging IF process, this thesis proposes a new glottal source information extraction method that relies on supervised machine learning to transform smoothed spectral representations of speech, which are already used in some of the most widely deployed and successful speech analysis applications, into a set of glottal source features. A transformation method based on Gaussian mixture regression (GMR) is presented and compared to current IF methods in terms of feature similarity, reliability, and speaker discrimination capability on a large speech corpus, and potential representations of the spectral envelope of speech are investigated for their ability to represent glottal source variation in a predictable manner. The proposed system was found to produce glottal source features that reasonably matched their IF counterparts in many cases, while being less susceptible to spurious errors. The development of the proposed method entailed a study into the aspects of glottal source information that are already contained within the spectral features commonly used in speech analysis, yielding an objective assessment regarding the expected advantages of explicitly using glottal information extracted from the speech signal via currently available IF methods, versus the alternative of relying on the glottal source information that is implicitly contained in spectral envelope representations.
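
A compact sketch of Gaussian mixture regression as described: fit a joint GMM over spectral-envelope features x and glottal features y, then predict y from x via the conditional mean of the mixture. The feature dimensions and synthetic data are illustrative assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dx, dy = 12, 3  # e.g. 12 envelope coefficients -> 3 glottal source features
X = rng.standard_normal((500, dx))
Y = X[:, :dy] * 0.5 + 0.1 * rng.standard_normal((500, dy))  # fake dependency

# Joint density over [x, y], estimated by EM.
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(np.hstack([X, Y]))

def gmr_predict(x):
    """Conditional mean E[y | x] under the fitted joint GMM."""
    w = np.empty(gmm.n_components)
    cond = np.empty((gmm.n_components, dy))
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mux, muy = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        w[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mux, Sxx)
        cond[k] = muy + Syx @ np.linalg.solve(Sxx, x - mux)
    return (w / w.sum()) @ cond  # responsibility-weighted conditional means

print(gmr_predict(X[0]), "vs actual", Y[0])
```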
477

Stochastic m-estimators: controlling accuracy-cost tradeoffs in machine learning

Dillon, Joshua V. 15 November 2011 (has links)
m-Estimation represents a broad class of estimators, including least-squares and maximum likelihood, and is a widely used tool for statistical inference. Its successful application, however, often requires negotiating physical resources for desired levels of accuracy. These limiting factors, which we abstractly refer to as costs, may be computational, such as time-limited cluster access for parameter learning, or they may be financial, such as purchasing human-labeled training data under a fixed budget. This thesis explores these accuracy-cost tradeoffs by proposing a family of estimators that maximizes a stochastic variation of the traditional m-estimator. Such "stochastic m-estimators" (SMEs) are constructed by stitching together different m-estimators at random. Each such instantiation resolves the accuracy-cost tradeoff differently, and taken together they span a continuous spectrum of accuracy-cost tradeoff resolutions. We prove the consistency of the estimators and provide formulas for their asymptotic variance and statistical robustness. We also assess their cost for two concerns typical of machine learning: computational complexity and labeling expense. For the sake of concreteness, we discuss experimental results in the context of a variety of discriminative and generative Markov random fields, including Boltzmann machines, conditional random fields, and model mixtures. The theoretical and experimental studies demonstrate the effectiveness of the estimators when computational resources are insufficient or when obtaining additional labeled samples is necessary. We also demonstrate that in some cases the stochastic m-estimator is associated with robustness, thereby increasing its statistical accuracy and representing a win-win.
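
A toy illustration of the stitching idea: each observation is randomly assigned one of two component m-estimators, here squared loss versus absolute loss for a location parameter. The losses and data are placeholders, not the estimators analyzed in the thesis:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
# Gaussian data plus a handful of outliers.
x = np.concatenate([rng.normal(2.0, 1.0, 200), rng.normal(12.0, 1.0, 10)])

# Random stitch: True -> squared loss (cheap/efficient under Gaussian noise),
# False -> absolute loss (robust to outliers).
assign = rng.random(x.size) < 0.5

def sme_objective(theta):
    r = x - theta
    return np.sum(r[assign] ** 2) + np.sum(np.abs(r[~assign]))

theta_hat = minimize_scalar(sme_objective).x
print(theta_hat)  # lands between the sample mean and the sample median
```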
478

Novel document representations based on labels and sequential information

Kim, Seungyeon 21 September 2015 (has links)
A wide variety of text analysis applications are based on statistical machine learning techniques. The success of those applications is critically affected by how we represent a document. Learning an efficient document representation has two major challenges: sparsity and sequentiality. Sparsity often causes high estimation error, and text's sequential nature, the interdependency between words, complicates matters further. This thesis presents novel document representations to overcome these two challenges. First, I employ label characteristics to estimate a compact document representation. Because label attributes implicitly describe the geometry of the dense subspace that carries the most useful signal, I can effectively resolve the sparsity issue while focusing only on that compact subspace. Second, by modeling a document as a joint or conditional distribution between words and their sequential information, I can efficiently reflect the sequential nature of text in my document representations. Lastly, the thesis concludes with a document representation that employs both labels and sequential information in a unified formulation. Four criteria are used to evaluate the goodness of a representation: how close it is to the original data, how strongly one representation can be distinguished from another, how easily a human can interpret it, and how much computational effort it requires. While pursuing these criteria, I was able to obtain document representations that are closer to the original data, stronger in discrimination, and easier to understand than traditional document representations. Efficient computation algorithms make the proposed approaches highly scalable. This thesis examines emotion prediction, temporal emotion analysis, modeling documents with edit histories, locally coherent topic modeling, and text categorization tasks as possible applications.
479

Answering complex questions : supervised approaches

Sadid-Al-Hasan, Sheikh, University of Lethbridge. Faculty of Arts and Science January 2009 (has links)
The term “Google” has become a verb for most of us. Search engines, however, have certain limitations. Ask one, for example, about the impact of the current global financial crisis in different parts of the world, and you can expect to sift through thousands of results for the answer. This motivates research in complex question answering, where the purpose is to create summaries of large volumes of information as answers to complex questions, rather than simply offering a listing of sources. Unlike simple questions, complex questions cannot be answered easily, as they often require inferencing and synthesizing information from multiple documents; hence, this task is accomplished by query-focused multi-document summarization systems. In this thesis we apply different supervised learning techniques to the complex question answering problem. To run our experiments, we consider the DUC-2007 main task. A huge amount of labeled data is a prerequisite for supervised training, and labeling is expensive and time consuming when performed manually by humans, so automatic labeling can be a good remedy. We employ five different automatic annotation techniques to build extracts from human abstracts, using ROUGE, Basic Element (BE) overlap, a syntactic similarity measure, a semantic similarity measure, and the Extended String Subsequence Kernel (ESSK). The representative supervised methods we use are Support Vector Machines (SVM), Conditional Random Fields (CRF), Hidden Markov Models (HMM) and Maximum Entropy (MaxEnt). We annotate the DUC-2006 data and use it to train our systems, while 25 topics of the DUC-2007 data set are used as test data. The evaluation results reveal the impact of the automatic labeling methods on the performance of the supervised approaches to complex question answering. We also experiment with two ensemble-based approaches that show promising results for this problem domain. / x, 108 leaves : ill. ; 29 cm
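
A small sketch of the automatic-labeling idea, using plain unigram recall as a crude stand-in for ROUGE and a linear SVM as a stand-in for the supervised learners; the sentences and abstract are invented examples:

```python
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

sentences = [
    "The financial crisis affected markets worldwide.",
    "Banks in Europe and Asia reported heavy losses.",
    "The committee will meet again next Tuesday.",
    "Unemployment rose sharply in several countries.",
]
abstract = "The global financial crisis caused bank losses and rising unemployment."

def unigram_recall(sent, ref):
    # Fraction of reference words covered by the sentence (ROUGE-1-like).
    s = set(re.findall(r"[a-z]+", sent.lower()))
    r = set(re.findall(r"[a-z]+", ref.lower()))
    return len(s & r) / len(r)

scores = np.array([unigram_recall(s, abstract) for s in sentences])
y = (scores >= np.median(scores)).astype(int)  # auto-labels: 1 = extract-worthy

X = TfidfVectorizer().fit_transform(sentences)
clf = LinearSVC().fit(X, y)  # train a sentence-extraction classifier
print(clf.predict(X))
```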
