431

Improving Scalability and Efficiency of ILP-based and Graph-based Concept Discovery Systems

Mutlu, Alev 01 July 2013 (has links) (PDF)
Concept discovery is the problem of finding definitions of a target relation in terms of other relations given as background knowledge. Inductive Logic Programming (ILP)-based and graph-based approaches are two competing approaches to the concept discovery problem. Although ILP-based systems have long dominated the area, graph-based systems have recently gained popularity as they overcome certain shortcomings of ILP-based systems. While having applications in numerous domains, ILP-based concept discovery systems still suffer from scalability and efficiency issues. These issues generally arise from the large search spaces such systems build. In this work we propose memoization-based and parallelization-based methods that modify the search space construction step and the evaluation step of ILP-based concept discovery systems to overcome these problems. We propose three memoization-based methods, called Tabular CRIS, Tabular CRIS-wEF, and Selective Tabular CRIS. In these methods, evaluation queries are stored in look-up tables for later use. While preserving some core functions in common, each proposed method improves the efficiency and scalability of its predecessor by introducing constraints on which evaluation queries to store in look-up tables and for how long. The proposed parallelization method, called pCRIS, parallelizes the search space construction and evaluation steps of ILP-based concept discovery systems in a data-parallel manner. The proposed method introduces policies to minimize redundant work and waiting time among the workers at synchronization points. Graph-based approaches were first introduced to the concept discovery domain to handle the so-called local plateau problem. They have recently gained more popularity in concept discovery as they provide a convenient environment for representing relational data and are able to overcome certain shortcomings of ILP-based concept discovery systems. Graph-based approaches can be classified into structure-based approaches and path-finding approaches. The first class needs to employ expensive algorithms such as graph isomorphism to find frequently appearing substructures. The methods that fall into the second class need sophisticated indexing mechanisms to find the frequently appearing paths that connect nodes of interest. In this work, we also propose a hybrid method for graph-based concept discovery which requires neither costly substructure matching algorithms nor path indexing mechanisms. The proposed method builds the graph in such a way that similar facts are grouped together, and paths that eventually turn out to be concept descriptors are built while the graph is constructed.
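To illustrate the memoization idea described in this abstract, the following minimal Python sketch (not taken from CRIS itself) looks evaluation queries up in a table before running them against the background knowledge; the simple size bound stands in for the selective retention policies that distinguish the proposed variants, and all names (QueryCache, run_query) are illustrative assumptions.

    # Hypothetical sketch of memoizing evaluation queries; not the actual CRIS code.
    class QueryCache:
        def __init__(self, max_entries=100_000):
            self.table = {}                    # query text -> cached evaluation result
            self.max_entries = max_entries     # stands in for a selective retention policy

        def evaluate(self, query, run_query):
            """Return the cached result for `query`, or run it once and remember it."""
            if query in self.table:
                return self.table[query]       # reuse earlier work instead of re-querying
            result = run_query(query)          # e.g. a coverage test against the database
            if len(self.table) < self.max_entries:
                self.table[query] = result
            return result

In the systems themselves, the constraints on which queries are stored and for how long are what distinguish Tabular CRIS, Tabular CRIS-wEF, and Selective Tabular CRIS.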
432

Investigation of Schizophrenia-Related Genes and Pathways through Genome-Wide Association Studies

Dom, Huseyin Alper 01 January 2013 (has links) (PDF)
Schizophrenia is a complex mental disorder commonly characterized by deterioration of intellectual processes and emotional responses, and it affects 1% of any given population. SNPs are single-nucleotide changes that take place in DNA sequences and constitute the largest share of genomic variation. In this study, our goal was to identify SNPs as genomic markers related to schizophrenia and to investigate the genes and pathways identified through the analysis of these SNPs. Genome-wide association studies (GWAS) analyse the whole genome of case and control groups to identify genetic variations and search for related markers, such as SNPs. GWAS are the most common method for investigating the genetic causes of a complex disease such as schizophrenia, because regular linkage studies are not sufficient. Out of 909,622 SNPs, analysis of the dbGaP schizophrenia genotyping data identified 25,555 SNPs with a p-value below 5x10^-5. Next, a combined p-value approach to identify associated genes and pathways and AHP-based prioritization to select biologically relevant SNPs with high statistical association were applied through the METU-SNP software. 6,000 SNPs had an AHP score above 0.4; these mapped to 2,500 genes suggested to be associated with schizophrenia and related conditions. In addition to previously described neurological pathways, pathway and network analysis showed enrichment of two pathways: melanogenesis and vascular smooth muscle contraction were found to be highly associated with schizophrenia. We have also shown that these pathways can be organized into one biological network, which might have a role in the molecular etiology of schizophrenia. Overall, the analysis revealed two novel candidate genes, SOS1 and GUCY1B3, that have a possible relation to schizophrenia.
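As a rough illustration of the two-stage selection described in this abstract (statistical association followed by AHP-based prioritization), the Python sketch below filters a hypothetical SNP table by the stated thresholds; the data layout and function name are assumptions, not part of METU-SNP.

    # Illustrative two-stage SNP selection; field names and layout are assumed.
    def select_candidate_snps(snps, p_threshold=5e-5, ahp_threshold=0.4):
        """snps: iterable of dicts like {"id": "rs123", "p": 1e-6, "ahp": 0.55, "gene": "SOS1"}."""
        associated = [s for s in snps if s["p"] < p_threshold]             # statistical association
        prioritized = [s for s in associated if s["ahp"] > ahp_threshold]  # biological relevance (AHP)
        genes = sorted({s["gene"] for s in prioritized if s.get("gene")})
        return prioritized, genes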
433

Design and Performance Evaluation of Service Discovery Protocols for Vehicular Networks

Abrougui, Kaouther 28 September 2011 (has links)
Intelligent Transportation Systems (ITS) are gaining momentum among researchers. ITS encompasses several technologies, including wireless communications, sensor networks, data and voice communication, and real-time driving assistance systems. These state-of-the-art technologies are expected to pave the way for a plethora of vehicular network applications. In fact, we have recently witnessed a growing interest in Vehicular Networks from both the research community and industry. Several potential applications of Vehicular Networks are envisioned, such as road safety and security, traffic monitoring, and driving comfort, just to mention a few. It is critical that the existence of convenience or driving-comfort services does not negatively affect the performance of safety services. In essence, the dissemination of safety services or the discovery of convenience applications requires communication among service providers and service requesters over constrained bandwidth resources. Therefore, service discovery techniques for vehicular networks must use the available common resources efficiently. In this thesis, we focus on the design of bandwidth-efficient and scalable service discovery protocols for Vehicular Networks. Three types of service discovery architectures are introduced: infrastructure-less, infrastructure-based, and hybrid architectures. Our proposed algorithms are network-layer based: service discovery messages are integrated into the routing messages for lightweight discovery. Moreover, our protocols use channel diversity for efficient service discovery. We describe our algorithms and discuss their implementation. Finally, we present the main results of the extensive set of simulation experiments used to evaluate their performance.
434

Statistical Learning in Drug Discovery via Clustering and Mixtures

Wang, Xu January 2007 (has links)
In drug discovery, thousands of compounds are assayed to detect activity against a biological target. The goal of drug discovery is to identify compounds that are active against the target (e.g. inhibit a virus). Statistical learning in drug discovery seeks to build a model that uses descriptors characterizing molecular structure to predict biological activity. However, the characteristics of drug discovery data can make it difficult to model the relationship between molecular descriptors and biological activity. Among these characteristics are the rarity of active compounds, the large volume of compounds tested by high-throughput screening, and the complexity of molecular structure and its relationship to activity. This thesis focuses on the design of statistical learning algorithms/models and their applications to drug discovery. The two main parts of the thesis are: an algorithm-based statistical method and a more formal model-based approach. Both approaches can facilitate and accelerate the process of developing new drugs. A unifying theme is the use of unsupervised methods as components of supervised learning algorithms/models. In the first part of the thesis, we explore a sequential screening approach, Cluster Structure-Activity Relationship Analysis (CSARA). Sequential screening integrates High Throughput Screening with mathematical modeling to sequentially select the best compounds. CSARA is a cluster-based, algorithm-driven method. To gain further insight into this method, we use three carefully designed experiments to compare its predictive accuracy with Recursive Partitioning, a popular structure-activity relationship analysis method. The experiments show that CSARA outperforms Recursive Partitioning. Comparisons include problems with many descriptor sets and situations in which many descriptors are not important for activity. In the second part of the thesis, we propose and develop constrained mixture discriminant analysis (CMDA), a model-based method. The main idea of CMDA is to model the distribution of the observations given the class label (e.g. active or inactive class) as a constrained mixture distribution, and then use Bayes' rule to predict the probability of being active for each observation in the testing set. Constraints are used to deal with the otherwise explosive growth of the number of parameters with increasing dimensionality. CMDA is designed to address several challenges in modeling drug data sets, such as multiple mechanisms, the rare target problem (i.e. imbalanced classes), and the identification of relevant subspaces of descriptors (i.e. variable selection). We focus on the CMDA1 model, in which univariate densities form the building blocks of the mixture components. Due to the unboundedness of the CMDA1 log-likelihood function, it is easy for the EM algorithm to converge to degenerate solutions. A special multi-step EM algorithm is therefore developed and explored via several experimental comparisons. Using the multi-step EM algorithm, the CMDA1 model is compared to model-based clustering discriminant analysis (MclustDA). The CMDA1 model is either superior to or competitive with the MclustDA model, depending on which model generates the data. The CMDA1 model has better performance than the MclustDA model when the data are high-dimensional and unbalanced, an essential feature of the drug discovery problem. An alternative approach to the problem of degeneracy is penalized estimation.
By introducing a group of simple penalty functions, we consider penalized maximum likelihood estimation of the CMDA1 and CMDA2 models. This strategy improves the convergence of the conventional EM algorithm and helps avoid degenerate solutions. Extending techniques from Chen et al. (2007), we prove that the PMLEs of the two-dimensional CMDA1 model can be asymptotically consistent.
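The Bayes'-rule step at the heart of CMDA can be sketched with univariate Gaussian mixtures standing in for the class-conditional densities; this is a simplified illustration under assumed component parameters, not the CMDA1 estimation procedure itself.

    # Simplified sketch: posterior probability of activity from class-conditional mixtures.
    import math

    def gaussian_pdf(x, mean, var):
        return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

    def mixture_density(x, components):
        """components: list of (weight, mean, variance) tuples; weights sum to 1."""
        return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

    def posterior_active(x, active_mix, inactive_mix, prior_active=0.01):
        """Bayes' rule with a small prior reflecting the rarity of active compounds."""
        p_act = prior_active * mixture_density(x, active_mix)
        p_inact = (1 - prior_active) * mixture_density(x, inactive_mix)
        return p_act / (p_act + p_inact)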
435

Aiding Human Discovery of Out-of-the-Moment Handwriting Recognition Errors

Stedman, Ryan January 2009 (has links)
Handwriting recognizers frequently misinterpret digital ink input, requiring human verification of recognizer output to identify and correct errors before the output can be used with any confidence in its correctness. Technologies like Anoto pens can make this error discovery and correction task more difficult, because verification of recognizer output may occur many hours after data input, creating an "out-of-the-moment" verification scenario. This difficulty can increase the number of recognition errors missed by users during verification. To increase the accuracy of human-verified recognizer output, methods of aiding users in the discovery of handwriting recognition errors need to be created. While this need has been recognized by the research community, no published work exists examining this problem. This thesis explores the problem of creating error discovery aids for handwriting recognition. Design possibilities for the creation of error discovery aids are explored, and concrete designs for error discovery aids are presented. Evaluations are performed on a set of these proposed discovery aids, showing that the visual proximity aid improves user performance in error discovery. Following the evaluation of the discovery aids proposed in this thesis, the one discovery aid that has been proposed in the literature, confidence highlighting, is explored in detail and its potential as a discovery aid is highlighted. A technique is then presented, complementary to error discovery aids, to allow a system to monitor and respond to user performance in error discovery. Finally, a set of implications for the design of verification interfaces for handwriting recognition is derived from the presented work.
436

A Requirements-Based Exploration of Open-Source Software Development Projects – Towards a Natural Language Processing Software Analysis Framework

Vlas, Radu 07 August 2012 (has links)
Open source projects do have requirements; they are, however, mostly informal text descriptions found in requests, forums, and other correspondence. Understanding such requirements provides insight into the nature of open source projects. Unfortunately, manual analysis of natural language requirements is time-consuming and, for large projects, error-prone. Automated analysis of natural language requirements, even if partial, will be of great benefit. Towards that end, I describe the design and validation of an automated natural language requirements classifier for open source software development projects. I compare two strategies for recognizing requirements in open forums of software features. The results suggest that classifying text at the forum post aggregation and sentence aggregation levels may be effective. Initial results suggest that the classifier can reduce the effort required to analyze requirements of open source software development projects. Software development organizations and communities currently employ a large number of software development techniques and methodologies. This complexity is compounded by a wide range of software project types and development environments. The resulting lack of consistency in the software development domain leads to one important challenge that researchers encounter while exploring this area: specificity. This results in an increased difficulty of maintaining a consistent unit of measure or analysis approach while exploring a wide variety of software development projects and environments. The problem of specificity is most prominently exhibited in an area of software development characterized by a dynamic evolution, a unique development environment, and a relatively young history of research when compared to traditional software development: the open-source domain. While performing research on open source and the associated communities of developers, one can notice the same challenge of specificity being present in requirements engineering research as in the case of closed-source software development. Whether research is aimed at performing longitudinal or cross-sectional analyses, or attempts to link requirements to other aspects of software development projects and their management, specificity calls for a flexible analysis tool capable of adapting to the needs and specifics of the explored context. This dissertation covers the design, implementation, and evaluation of a model, a method, and a software tool comprising a flexible software development analysis framework. These design artifacts use a rule-based natural language processing approach and are built to meet the specifics of a requirements-based analysis of software development projects in the open-source domain. This research follows the principles of design science research as defined by Hevner et al. and includes stages of problem awareness, suggestion, development, evaluation, and results and conclusion (Hevner et al. 2004; Vaishnavi and Kuechler 2007). The long-term goal of the research stream stemming from this dissertation is to propose a flexible, customizable, requirements-based natural language processing software analysis framework which can be adapted to meet the research needs of multiple different types of domains or different categories of analyses.
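A toy version of a rule-based pass over forum text, classifying sentences as likely requirement statements, might look like the following; the cue patterns and sentence splitter are illustrative assumptions and are much simpler than the rules developed in the dissertation.

    # Toy rule-based sentence classifier for requirement-like statements.
    import re

    REQUIREMENT_CUES = [r"\bshould\b", r"\bmust\b", r"\bneeds? to\b", r"\bfeature request\b"]

    def classify_sentences(post):
        """Split a forum post into sentences and flag likely requirement statements."""
        sentences = re.split(r"(?<=[.!?])\s+", post.strip())
        return [(s, any(re.search(cue, s, re.IGNORECASE) for cue in REQUIREMENT_CUES))
                for s in sentences if s]

    # classify_sentences("The plugin must support UTF-8. Thanks for the great work!")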
437

Distributed Search in Semantic Web Service Discovery

Ziembicki, Joanna January 2006 (has links)
This thesis presents a framework for semantic Web Service discovery using descriptive (non-functional) service characteristics in a large-scale, multi-domain setting. The framework uses the Web Ontology Language for Services (OWL-S) to design a template for describing non-functional service parameters in a way that facilitates service discovery, and presents a layered scheme for organizing the ontologies used in service description. This service description scheme serves as a core for designing the four main functions of a service directory: a template-based user interface, semantic query expansion algorithms, a two-level indexing scheme that combines Bloom filters with a Distributed Hash Table, and a distributed approach for storing service descriptions. The service directory is, in turn, implemented as an extension of the Open Service Discovery Architecture.

The search algorithms presented in this thesis are designed to maximize the precision and completeness of service discovery, while the distributed design of the directory allows individual administrative domains to retain a high degree of independence and maintain access control to information about their services.
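As background on the first level of the two-level index mentioned above, a Bloom filter lets a node cheaply answer "possibly stored here / definitely not here" before any costlier lookup; the sketch below is a generic Python illustration, with sizes and hashing chosen arbitrarily rather than taken from the thesis.

    # Generic Bloom filter sketch: a compact, probabilistic membership test.
    import hashlib

    class BloomFilter:
        def __init__(self, size=1024, num_hashes=3):
            self.size, self.num_hashes = size, num_hashes
            self.bits = bytearray(size)

        def _positions(self, key):
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
                yield int.from_bytes(digest[:4], "big") % self.size

        def add(self, key):
            for pos in self._positions(key):
                self.bits[pos] = 1

        def might_contain(self, key):
            # False positives are possible; false negatives are not.
            return all(self.bits[pos] for pos in self._positions(key))

In a directory of this kind, a node would consult might_contain(...) for a requested service description before forwarding the query into the Distributed Hash Table.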
438

An Attempt to Automate NP-Hardness Reductions via SO∃ Logic

Nijjar, Paul January 2004 (has links)
We explore the possibility of automating NP-hardness reductions. We motivate the problem from an artificial intelligence perspective, then propose the use of second-order existential (SO∃) logic as a representation language for decision problems. Building upon the theoretical framework of J. Antonio Medina, we explore the possibility of implementing seven syntactic operators. Each operator transforms SO∃ sentences in a way that preserves NP-completeness. We subsequently propose a program which implements these operators. We discuss a number of theoretical and practical barriers to this task. We prove that determining whether two SO∃ sentences are equivalent is as hard as GRAPH ISOMORPHISM, and that determining whether an arbitrary SO∃ sentence represents an NP-complete problem is undecidable.
