  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Semantics-based resource discovery in global-scale grids

Li, Juan 11 1900 (has links)
Grid computing is a virtualized distributed computing environment aimed at enabling the sharing of geographically distributed resources. Grid resources have traditionally consisted of dedicated supercomputers, clusters, or storage units. With today's ubiquitous network connections and the growing computational and storage capabilities of modern everyday-use computers, more resources such as PCs, devices (e.g., PDAs and sensors), applications, and services are joining grid networks. The grid is expected to evolve from a computing and data management facility into a pervasive, world-wide resource-sharing infrastructure. To fully utilize this wide range of grid resources, effective resource discovery mechanisms are required. However, resource discovery in a global-scale grid is challenging due to the considerable diversity, large number, dynamic behavior, and geographical distribution of the resources. The resource discovery technology required to achieve the ambitious global grid vision is still in its infancy, and existing applications have difficulty achieving both rich searchability and good scalability. In this thesis, we investigate the resource discovery problem for open-networked, global-scale grids. In particular, we propose a distributed semantics-based discovery framework. We show how this framework can be used to address the discovery problem in such grids and improve three aspects of performance: expressiveness, scalability, and efficiency. Expressiveness is the first characteristic that a grid resource-searching mechanism should have. Most existing search systems use simple keyword-based lookups, which limit the searchability of the system. Our framework improves search expressiveness in two ways: First, it uses a semantic metadata scheme to provide users with a rich and flexible representation mechanism, enabling effective descriptions of desired resource properties and query requirements. 
Second, we employ ontological domain knowledge to assist in the search process. The system is thus able to understand the semantics of query requests according to their meanings in a specific domain; this procedure helps the system to locate only semantically related results. The more expressive the resource description and query request, however, the more difficult it is to design a scalable and efficient search mechanism. We ensure scalability by reconfiguring the network with respect to shared ontologies. This reconfiguration partitions the large unorganized search space into multiple well-organized semantically related sub-spaces that we call semantic virtual organizations. Semantic virtual organizations help to discriminatively distribute resource information and queries to related nodes, thus reducing the search space and improving scalability. To further improve the efficiency of searching the virtual organizations, we propose two semantics-based resource-integrating and searching systems: GONID and OntoSum. These two systems address searching problems for applications based on different network topologies: structured and unstructured peer-to-peer overlay networks. Queries in the search systems are processed in a transparent way, so that users accessing the data can be insulated from the fact that the information is distributed across different sources and represented with different formats. In both systems, ontological knowledge is decomposed into different coarse-grained elements, and then these elements are indexed with different schemes to fit the requirements of different applications. Resource metadata reasoning, integrating, and searching are based on the index. A complex query can be evaluated by performing relational operations such as select, project, and join on combinations of the indexing elements. We evaluate the performance of our system with extensive simulation experiments, the results of which confirm the effectiveness of the design. 
In addition, we implement a prototype that incorporates our ontology-based virtual organization formation and semantics-based query mechanisms. Our deployment of the prototype verifies the system's feasibility and its applicability to real-world applications.
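The semantic matching idea at the core of this abstract can be illustrated with a minimal sketch (the ontology, resource names, and functions here are hypothetical illustrations, not the thesis's GONID or OntoSum implementations): an "is-a" hierarchy lets a query for a general concept match resources advertised under more specific concepts, which a plain keyword lookup would miss.

```python
# Hypothetical sketch: expand a resource query through a small is-a ontology
# so that semantically related resources match, not just exact keywords.

ONTOLOGY = {  # child -> parent ("is-a") relations
    "gpu_cluster": "compute_resource",
    "pc": "compute_resource",
    "pda": "device",
    "sensor": "device",
    "compute_resource": "resource",
    "device": "resource",
}

def ancestors(concept):
    """Return the concept followed by all of its ancestors in the ontology."""
    chain = [concept]
    while concept in ONTOLOGY:
        concept = ONTOLOGY[concept]
        chain.append(concept)
    return chain

def semantic_match(query_concept, resource_concept):
    """A resource matches if its type is the query concept or a descendant of it."""
    return query_concept in ancestors(resource_concept)

resources = {"node-a": "gpu_cluster", "node-b": "sensor", "node-c": "pc"}
hits = [n for n, t in resources.items() if semantic_match("compute_resource", t)]
# A keyword lookup for "compute_resource" finds nothing here; the semantic
# match finds node-a and node-c.
```

The same subsumption test also suggests how nodes sharing an ontology fragment can be grouped into a virtual organization: nodes whose resource types share an ancestor concept route queries about that concept among themselves.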
52

An Architecture for Geographically-Oriented Service Discovery on the Internet

Li, Qiyan January 2002 (has links)
Most of the service discovery protocols available on the Internet are built upon its logical structure, which is evident in how they behave. For instance, Jini and SLP service providers announce their presence by multicasting service advertisements, an approach that is neither intended nor able to scale to the size of the Internet. With mobile and wireless devices becoming increasingly popular, there is a growing need to perform service discovery in a wide-area context, yet there is very little direct correlation between the Internet topology and geographic locations. Even for desktop computers, such a need can arise from time to time. This problem suggests the need for an architecture that allows users to locate resources on the Internet using geographic criteria. This thesis presents an architecture that can be deployed with minimal effort in the existing network infrastructure. The geographic information can be shared among multiple applications in a fashion similar to the way DNS is shared throughout the Internet. The design and implementation of the architecture are discussed in detail, and three case studies illustrate how the architecture can be employed by various applications to satisfy dramatically different needs of end-users.
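The kind of geographic query such an architecture must answer can be sketched in a few lines (an illustrative toy, not the thesis's design; the service names and coordinates are invented): filter advertised services by great-circle distance from the client and return the nearest first.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_services(services, lat, lon, radius_km):
    """Return names of services within radius_km of (lat, lon), nearest first."""
    in_range = [(haversine_km(lat, lon, s_lat, s_lon), name)
                for name, (s_lat, s_lon) in services.items()
                if haversine_km(lat, lon, s_lat, s_lon) <= radius_km]
    return [name for _, name in sorted(in_range)]

# Invented example registry: two printers, one in Waterloo, one in Toronto.
services = {"printer-uw": (43.47, -80.54), "printer-toronto": (43.65, -79.38)}
print(nearby_services(services, 43.47, -80.52, 50))  # only the Waterloo printer is in range
```

Note that no step here consults the network topology at all, which is the point the thesis makes: geographic proximity and IP-level proximity are independent questions, so the geographic index has to be a separate, shared piece of infrastructure.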
53

Microarray analysis using pattern discovery

Bainbridge, Matthew Neil 10 December 2004 (has links)
Analysis of gene expression microarray data has traditionally been conducted using hierarchical clustering. However, such analysis has many known disadvantages, and pattern discovery (PD) has been proposed as an alternative technique. In this work, three similar but distinct PD algorithms, Teiresias, Splash, and Genes@Work, were benchmarked for time and memory efficiency on a small yeast cell-cycle data set. Teiresias was found to be the fastest and best overall program; however, Splash was more memory efficient. This work also investigated the performance of four methods of discretizing microarray data: sign-of-the-derivative, K-means, pre-set value, and Genes@Work stratification. The first three methods were evaluated on their predisposition to group together biologically related genes. On a yeast cell-cycle data set, the sign-of-the-derivative method yielded the most biologically significant patterns, followed by the pre-set value and K-means methods. K-means, pre-set value, and Genes@Work were also compared on their ability to classify tissue samples from diffuse large B-cell lymphoma (DLBCL) into two subtypes determined by standard techniques. The Genes@Work stratification method produced the best patterns for discriminating between the two subtypes of lymphoma. However, the results from the second-best method, K-means, call into question the accuracy of the classification by the standard technique. Finally, a number of recommendations for improving pattern discovery algorithms and discretization techniques are made.
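The sign-of-the-derivative discretization the abstract names can be sketched as follows (an illustrative reimplementation under my own assumptions, not the thesis's benchmarked code): each expression profile becomes a string of U/D/S symbols recording whether expression rises, falls, or holds between consecutive time points, so genes with the same up/down behaviour share a pattern even when their magnitudes differ.

```python
def sign_of_derivative(profile, eps=0.0):
    """Discretize a time-series expression profile into U(p)/D(own)/S(ame) symbols,
    one symbol per consecutive pair of measurements."""
    symbols = []
    for prev, curr in zip(profile, profile[1:]):
        delta = curr - prev
        if delta > eps:
            symbols.append("U")
        elif delta < -eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

# Two genes with different expression levels but the same trend discretize
# to the same pattern, so a PD step can group them together.
gene_a = [1.0, 2.5, 2.5, 1.2]
gene_b = [0.1, 0.9, 0.9, 0.4]
print(sign_of_derivative(gene_a))  # USD
print(sign_of_derivative(gene_b))  # USD
```

The `eps` tolerance (an assumption of this sketch) absorbs measurement noise: differences smaller than `eps` count as "same" rather than a spurious up or down.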
54

A targeted evaluation of OpenEye’s methods for virtual ligand screens and docking

Lantz, Mikael January 2005 (has links)
The process of drug discovery is slow and expensive, so there is a need for reliable in silico methods; however, the performance of these methods varies.

This work presents a targeted study of how the drug-discovery methods in OpenEye’s tools ROCS, EON, and FRED perform on targets with small ligands. It was examined whether 12 compounds (markers) somewhat similar to AMP could be detected by ROCS in a random data set of 1000 compounds, and whether EON could find any electrostatic similarities between the queries and the markers. The performance of FRED with respect to re-generating bound ligand modes was examined on ten protein/ligand complexes from the Brookhaven Protein Data Bank. It was also examined whether FRED is suitable as a screening tool, since several other docking methods are used in that way. Finally, it was examined whether the time requirements of ROCS for multiconformer queries could be reduced by combining single-conformer queries with multiconformer queries.

The conclusions drawn from this project were that FRED is not a good screening tool, but ROCS performs well as one; the scoring functions are the weak spot of FRED. EON is probably very sensitive to the conformers used, but can in some cases strengthen the results from ROCS. A novel and simple way to reduce the time requirements of multiconformer ROCS queries was discovered and shown to work well.
55

Detecting differentially expressed genes while controlling the false discovery rate for microarray data

Jiao, Shuo. January 2009 (has links)
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2009. / Title from title screen (site viewed March 2, 2010). PDF text: 100 p. : col. ill. ; 953 K. UMI publication number: AAT 3379821. Includes bibliographical references. Also available in microfilm and microfiche formats.
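Work in this area builds on the standard Benjamini–Hochberg step-up procedure for controlling the false discovery rate; a short sketch of that well-known procedure (illustrative background, not the thesis's own method) follows.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha
    using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Sort p-values ascending, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis among the k_max smallest p-values.
    return sorted(order[:k_max])

# Toy p-values for eight genes; with alpha = 0.05 only the two smallest survive.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))  # [0, 1]
```

The step-up structure matters: a p-value that fails its own threshold can still be rejected if a larger p-value below it in rank passes, which is why the loop keeps the largest passing rank rather than stopping at the first failure.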
56

Classroom investigations into the adaptation and evaluation of elementary human biology topics using the more recent inquiry techniques.

Beckett, B. S. January 1972 (has links)
Thesis--M.A.(Ed.), University of Hong Kong. / Typewritten.
57

Pattern Discovery in DNA Sequences

Yan, Rui 20 March 2014 (has links)
A pattern is a relatively short sequence that represents a phenomenon in a set of sequences. Not all short sequences are patterns; only those that are statistically significant are referred to as patterns or motifs. Pattern discovery methods analyze sequences and attempt to identify and characterize meaningful patterns. This thesis extends the application of pattern discovery algorithms to a new problem domain: Single Nucleotide Polymorphism (SNP) classification. SNPs are single base-pair (bp) variations in the genome, and are probably the most common form of genetic variation. On average, one in every thousand bps may be an SNP. The function of most SNPs, especially those not associated with protein sequence changes, remains unclear. However, genome-wide linkage analyses have associated many SNPs with disorders ranging from Crohn’s disease, to cancer, to quantitative traits such as height or hair color. As a result, many groups are working to predict the functional effects of individual SNPs. In contrast, very little research has examined the causes of SNPs: Why do SNPs occur where they do? This thesis addresses this problem by using pattern discovery algorithms to study DNA non-coding sequences. The hypothesis is that short DNA patterns can be used to predict SNPs. For example, such patterns found in the SNP sequence might block the DNA repair mechanism for the SNP, thus causing SNP occurrence. In order to test the hypothesis, a model is developed to predict SNPs by using pattern discovery methods. The results show that SNP prediction with pattern discovery methods is weak (50 ± 2%), whereas machine learning classification algorithms can achieve prediction accuracy as high as 68%. 
To determine whether the poor performance of pattern discovery is due to data characteristics (such as sequence length or pattern length) or to the specific biological problem (SNP prediction), a survey was conducted by profiling eight representative pattern discovery methods at multiple parameter settings on 6,754 real biological datasets. This is the first systematic review of pattern discovery methods with assessments of prediction accuracy, CPU usage, and memory consumption. It was found that current pattern discovery methods do not consider positional information and do not handle short sequences well (<150 bps), including SNP sequences. Therefore, this thesis proposes a new supervised pattern discovery classification algorithm, referred to as Weighted-Position Pattern Discovery and Classification (WPPDC). The WPPDC is able to exploit positional information to identify positionally-enriched motifs, and to select motifs with a high information content for further classification. A tree structure is applied to WPPDC (the result is referred to as T-WPPDC) in order to reduce algorithmic complexity. Compared to pattern discovery methods, T-WPPDC not only showed consistently superior prediction accuracy but also generated patterns with positional information. Machine-learning classification methods (such as Random Forests) showed comparable prediction accuracy. However, unlike T-WPPDC, they are classification methods and are unable to generate SNP-associated patterns.
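The positional-enrichment idea behind WPPDC can be illustrated with a hypothetical sketch (my own simplified construction, not the thesis's algorithm): count how often each k-mer occurs at each position across a set of aligned, equal-length sequences, and report k-mers that appear at the same position in a large fraction of the sequences. Plain pattern discovery, which ignores position, cannot make this distinction.

```python
from collections import Counter

def positional_kmer_counts(sequences, k):
    """Count k-mer occurrences per start position across equal-length sequences."""
    length = len(sequences[0])
    counts = {pos: Counter() for pos in range(length - k + 1)}
    for seq in sequences:
        for pos in range(length - k + 1):
            counts[pos][seq[pos:pos + k]] += 1
    return counts

def positionally_enriched(sequences, k, min_fraction=0.75):
    """Return (position, kmer) pairs occurring at that position in at least
    min_fraction of the sequences."""
    counts = positional_kmer_counts(sequences, k)
    n = len(sequences)
    return [(pos, kmer)
            for pos, ctr in counts.items()
            for kmer, c in ctr.items()
            if c / n >= min_fraction]

# Invented toy data: ACG at position 0 and CGT at position 1 are enriched.
seqs = ["ACGTAC", "ACGTTT", "ACGTGG", "TTGTAC"]
print(positionally_enriched(seqs, k=3, min_fraction=0.75))
```

In the WPPDC setting the sequences would be windows around known SNP sites, and the enriched, position-anchored k-mers would become candidate features for the downstream classifier; the `min_fraction` threshold here stands in for the information-content selection the abstract describes.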
60

A Conceptual Framework for Evaluating and Designing Information Discovery and Curation Tools

Voyloshnikova, Elena 29 April 2015 (has links)
Everyday life revolves around the discovery and curation of digital information. People search the Web continuously, from quickly looking up the information needed to complete a task, to endlessly searching for inspiration and knowledge. A variety of studies have modeled information seeking strategies and characterized information seeking and curation activities on the Web. However, there is a lack of research on how existing Web applications support the discovery and curation of information, especially concerning the motivations behind them and how different approaches can be compared. In this thesis, I present a study of information discovery tools and how they relate to the nature of information seeking. I propose a conceptual framework that deals with Web application design elements that support different aspects of information discovery and curation. This framework can be used when designing, evaluating, or updating Web applications.
