About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Effective web service discovery using a combination of a semantic model and a data mining technique

Bose, Aishwarya January 2008 (has links)
With the advent of Service Oriented Architecture, Web services have gained tremendous popularity. Because a large number of Web services are available, finding an appropriate Web service that matches a user's requirements is a challenge, which warrants an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods that improve the accuracy of Web service discovery in matching the best service. The discovery process typically suggests many individual services that each partially fulfil the user's interest. Considering the semantic relationships of the words used to describe services, together with their input and output parameters, can lead to more accurate Web service discovery, and appropriately linking individually matched services can then fully satisfy the user's requirements. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology is proposed. The first phase performs match-making to find semantically similar Web services for a user query. To perform semantic analysis on the content of the Web Service Description Language document, a support-based latent semantic kernel is constructed using an innovative concept of binning and merging over a large collection of text documents covering diverse knowledge domains. The use of a generic latent semantic kernel constructed from a large number of terms helps to uncover hidden meanings of the query terms that could not otherwise be found. Sometimes a single Web service cannot fully satisfy the user's requirement; in such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking whether multiple Web services can be linked is done in the second phase.
Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an all-pairs shortest-path algorithm is applied to find the minimum-cost traversal path. The third phase, system integration, combines the results of the preceding two phases using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. To evaluate the performance of the proposed method, extensive experimentation was performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with those of a standard keyword-based information-retrieval method and a clustering-based machine-learning method. The proposed method outperforms both. Experimental results and statistical analysis also show that the best Web service compositions are obtained by considering 10 to 15 of the Web services found in phase I for linking. Empirical results further show that the fusion engine boosts the accuracy of Web service discovery by systematically combining the inputs from the semantic analysis (phase I) and the link analysis (phase II). Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
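The link-analysis phase described above can be sketched in a few lines. The snippet below models a handful of hypothetical services as graph nodes and runs the classic Floyd-Warshall all-pairs shortest-path algorithm to find the minimum-cost composition path; the service names and edge costs are invented for illustration and are not taken from the thesis, which does not specify which all-pairs algorithm it uses.

```python
# Illustrative sketch: services as graph nodes, edges as linking costs,
# Floyd-Warshall to compute all-pairs shortest paths.

def floyd_warshall(nodes, edges):
    """edges: dict mapping (u, v) -> traversal cost between services."""
    INF = float("inf")
    dist = {(u, v): (0 if u == v else edges.get((u, v), INF))
            for u in nodes for v in nodes}
    for k in nodes:          # allow k as an intermediate service
        for i in nodes:
            for j in nodes:
                if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
                    dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    return dist

# Hypothetical services and linking costs
services = ["flights", "hotels", "payment"]
links = {("flights", "hotels"): 2, ("hotels", "payment"): 1,
         ("flights", "payment"): 5}
dist = floyd_warshall(services, links)
print(dist[("flights", "payment")])  # → 3 (via "hotels", cheaper than the direct link)
```

The quoted finding that compositions work best over 10 to 15 candidate services fits this approach: Floyd-Warshall is cubic in the number of nodes, which is trivial at that scale.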

Granule-based knowledge representation for intra and inter transaction association mining

Yang, Wanzhong January 2009 (has links)
With the phenomenal growth of electronic data and information, there is strong demand for efficient and effective systems (tools) that perform data mining tasks on multidimensional databases. Association rules describe associations between items in the same transaction (intra) or in different transactions (inter). Association mining attempts to find interesting or useful association rules in databases; this is a crucial issue for the application of data mining in the real world. Association mining can be used in many application areas, such as discovering associations between customers' locations and shopping behaviours in market basket analysis. Association mining includes two phases. The first phase, called pattern mining, is the discovery of frequent patterns. The second phase, called rule generation, is the discovery of interesting and useful association rules among the discovered patterns. The first phase, however, often takes a long time to find all frequent patterns, which also include much noise. The second phase is also time-consuming and can generate many redundant rules. To improve the quality of association mining in databases, this thesis provides an alternative technique, granule-based association mining, for knowledge discovery in databases, where a granule refers to a predicate that describes common features of a group of transactions. The new technique first transfers transaction databases into basic decision tables, then uses multi-tier structures to integrate pattern mining and rule generation in one phase for both intra- and inter-transaction association rule mining. To evaluate the proposed technique, this research defines the concept of meaningless rules by considering the correlations between data dimensions for intra-transaction association rule mining. It also uses precision to evaluate the effectiveness of inter-transaction association rules.
The experimental results show that the proposed technique is promising.
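The two classical phases the abstract contrasts with its granule-based alternative can be sketched concretely. The snippet below is a minimal, deliberately naive Apriori-style implementation of pattern mining followed by rule generation; the transactions and thresholds are hypothetical, and real implementations prune candidates far more aggressively than this exhaustive enumeration does.

```python
from itertools import combinations

# Phase 1: pattern mining - find all itemsets whose support meets a threshold.
def frequent_itemsets(transactions, min_support):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for size in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            support = sum(set(cand) <= t for t in transactions) / n
            if support >= min_support:
                freq[cand] = support
                found = True
        if not found:   # no frequent itemset of this size -> none larger either
            break
    return freq

# Phase 2: rule generation - split each frequent itemset into LHS -> RHS rules.
def rules(freq, min_confidence):
    out = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for i in range(1, len(itemset)):
            for lhs in combinations(itemset, i):
                conf = support / freq[lhs]   # support(itemset) / support(lhs)
                if conf >= min_confidence:
                    rhs = tuple(x for x in itemset if x not in lhs)
                    out.append((lhs, rhs, conf))
    return out

# Hypothetical market-basket transactions
ts = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = frequent_itemsets(ts, min_support=0.5)
print(rules(freq, min_confidence=0.6))
```

The cost of enumerating every candidate itemset in phase 1, and the redundancy among the generated rules, are exactly the weaknesses the granule-based, single-phase technique is designed to avoid.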

Essays on the dynamic relationship between different types of investment flow and prices

OH, Natalie Yoon-na, Banking & Finance, Australian School of Business, UNSW January 2005 (has links)
This thesis presents three related essays on the dynamic relationship between different types of investment flow and prices in the equity market. These studies attempt to provide greater insight into the evolution of prices by investigating not "what moves prices" but "who moves prices", utilising a unique database from the Korean Stock Exchange. The first essay investigates the trading behaviour and performance of online equity investors in comparison to other investors on the Korean stock market. Whilst the usage of online resources for trading is becoming more and more prevalent in financial markets, the literature on the role of online investors and their impact on prices is limited. The main finding arising from this essay supports the claim that online investors are noise traders at an aggregate level. Whereas foreigners show distinct trading patterns as a group in terms of consensus on the direction of market movements, online investors do not show such distinct trading patterns. The essay concludes that online investors do not trade on clear information signals and introduce noise into the market. Direct performance and market timing ability measures further show that online investors are the worst performers and market timers whereas foreign investors consistently show outstanding performance and market timing ability. Domestic mutual funds in Korea have not been extensively researched. The second essay analyses mutual fund activity and relations between stock market returns and mutual fund flows in Korea. Although regulatory authorities have been cautious about introducing competing funds, contractual-type mutual funds have not been cannibalized by the US-style corporate mutual funds that started trading in 1998. Negative feedback trading activity is observed between stock market returns and mutual fund flows, measured as net trading volumes using stock purchases and sales volume.
It is predominantly returns that drive flows, although stock purchases contain information about returns, partially supporting the price pressure hypothesis. After controlling for declining markets, the results suggest Korean equity fund managers tend to swing indiscriminately between increasing purchases and increasing sales in times of rising market volatility, possibly viewing volatility as an opportunity to profit and defying the mean-variance framework, which predicts that investors should retract from the market as volatility increases. Mutual funds respond indifferently to wide dispersions in investor beliefs. The third essay focuses on the conflicting issue of home bias by looking at the impact on domestic prices of foreign trades relative to local trades, using high-frequency data from the Korean Stock Exchange (KSE). This essay extends the work of Choe, Kho and Stulz (2004) (CKS) in three ways. First, it analyses the post-Asian financial crisis period, whereas CKS (2004) analyse the crisis (1996-98) period. Second, it adopts a modified version of the CKS method that better captures the aggregate behaviour of each investor type by utilising the participation ratio. Third, it does not limit investigation to intra-day analysis but extends to daily analysis up to 50 days, observing the effect of intensive trading activity over a longer horizon than the CKS study. In contrast to the CKS findings, this essay finds that foreigners have a short-lived private-information advantage over locals, and that trades by foreigners have a larger impact on prices in intra-day data. However, assuming investors buy and hold for up to 50 days, local individuals have a greater impact and earn more profitable returns than foreigners. Superior performance is documented for buys rather than sells.

Discovery and Validation for Composite Services on the Semantic Web

Gooneratne, Nalaka Dilshan, s3034554@student.rmit.edu.au January 2009 (has links)
Current technologies for locating and validating composite services are insufficient, for the following reasons.
• Current frameworks cannot create complete service descriptions because they do not model all the functional aspects together (i.e. the purpose of a service, state transitions, data transformations). Those that deal with behavioural descriptions are unable to model the ordering constraints between concurrent interactions completely, since they do not consider the time taken by interactions. Furthermore, there is no mechanism to assess the correctness of a functional description.
• Existing semantic-based matching techniques cannot locate services that conform to global constraints. Semantic-based techniques use ontological relationships to perform mappings between the terms in service descriptions and user requests. Therefore, unlike techniques that perform either direct string matching or schema matching, semantic-based approaches can match descriptions created with different terminologies and achieve a higher recall. Global constraints are restrictions on the values of two or more attributes of multiple constituent services.
• Current techniques that generate and validate global communication models of composite services yield inaccurate results (i.e. detect phantom deadlocks or ignore actual deadlocks) because they either (i) do not support all types of interactions (i.e. only send and receive, not service and invoke) or (ii) do not consider the time taken by interactions.
This thesis presents novel ideas to deal with the stated limitations. First, we propose two formalisms (WS-ALUE and WS-π-calculus) for creating functional and behavioural descriptions respectively. WS-ALUE extends the Description Logic language ALUE with new predicates and models all the functional aspects together. WS-π-calculus extends π-calculus with Interval Time Logic (ITL) axioms.
ITL axioms accurately model temporal relationships between concurrent interactions. A technique that compares a WS-π-calculus description of a service against its WS-ALUE description is introduced to detect any errors that are not equally reflected in both descriptions. We propose novel semantic-based matching techniques to locate composite services that conform to global constraints. These constraints are of two types: strictly dependent or independent. A constraint is strictly dependent if, once a value is assigned to one restricted attribute, the values of all the remaining restricted attributes are uniquely determined; any global constraint that is not strictly dependent is independent. A complete and correct technique that locates services conforming to strictly dependent constraints in polynomial time is defined using a three-dimensional data cube. The proposed approach for independent constraints is a heuristic that is correct but not complete; it incorporates user-defined objective functions, greedy algorithms and domain rules to locate conforming services. Finally, we propose a new approach, an extension of a transitive temporal reasoning mechanism, to generate global communication models of composite services that are free of deadlocks and synchronisation conflicts.

The performance of multiple hypothesis testing procedures in the presence of dependence

Clarke, Sandra Jane January 2010 (has links)
Hypothesis testing is foundational to the discipline of statistics. Procedures exist which control for individual Type I error rates and more global or family-wise error rates for a series of hypothesis tests. However, the ability of scientists to produce very large data sets with increasing ease has led to a rapid rise in the number of statistical tests performed, often with small sample sizes. This is seen particularly in the area of biotechnology and the analysis of microarray data. This thesis considers this high-dimensional context with particular focus on the effects of dependence on existing multiple hypothesis testing procedures.

While dependence is often ignored, there are many existing techniques employed currently to deal with this context, but these are typically highly conservative or require difficult estimation of large correlation matrices. This thesis demonstrates that, in this high-dimensional context, when the distribution of the test statistics is light-tailed, dependence is not as much of a concern as in classical contexts. This is achieved with the use of a moving average model. One important implication is that, when this condition is satisfied, procedures designed for independent test statistics can be used confidently on dependent test statistics.

This is not the case, however, for heavy-tailed distributions, where we expect an asymptotic Poisson cluster process of false discoveries. In these cases, we estimate the parameters of this process along with the tail weight from the observed exceedances and attempt to adjust procedures. We consider both conservative error rates such as the family-wise error rate and more popular methods such as the false discovery rate. We are able to demonstrate that, in the context of DNA microarrays, it is rare to find heavy-tailed distributions because most test statistics are averages.
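The false discovery rate mentioned above is usually controlled with the Benjamini-Hochberg step-up procedure, which is short enough to sketch in full. The p-values below are hypothetical; the thesis's point is precisely about when procedures like this one, derived for independent test statistics, remain trustworthy under dependence.

```python
# Minimal sketch of the Benjamini-Hochberg step-up procedure for FDR control.

def benjamini_hochberg(p_values, q=0.05):
    """Return the (sorted) indices of hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ranks by p-value
    k = 0  # largest rank i such that p_(i) <= (i/m) * q
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    return sorted(order[:k])  # reject the k smallest p-values

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pvals, q=0.05))  # → [0, 1]
```

Note the step-up character: every p-value at or below the largest passing rank is rejected, even if some intermediate ranks fail their own threshold. Contrast this with the family-wise Bonferroni rule, which would compare every p-value against the single, far more conservative cutoff q/m.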

In silico virulence prediction and virulence gene discovery of Streptococcus agalactiae

Lin, Frank Po-Yen, Centre for Health Informatics, Faculty of Medicine, UNSW January 2009 (has links)
Physicians frequently face challenges in predicting which bacterial subpopulations are likely to cause severe infections. A more accurate prediction of virulence would improve diagnostics and limit the extent of antibiotic resistance. Bacterial pathogens can now be typed with high accuracy using advanced genotyping technologies; however, effective translation of bacterial genotyping data into assessments of clinical risk remains largely unexplored. The discovery of unknown virulence genes is another key determinant of successful prediction of infectious disease outcomes. The trial-and-error method of virulence gene discovery is time-consuming and resource-intensive, so selecting candidate genes with higher precision can reduce the number of futile trials. Several in silico candidate gene prioritisation (CGP) methods have been proposed to aid the search for genes responsible for inherited diseases in humans, but it remains uninvestigated how the CGP concept can assist with virulence gene discovery in bacterial pathogens. The main contribution of this thesis is to demonstrate the value of translational bioinformatics methods in addressing challenges in virulence prediction and virulence gene discovery. This thesis studied an important perinatal bacterial pathogen, group B streptococcus (GBS), the leading cause of neonatal sepsis and meningitis in developed countries. While several antibiotic prophylactic programs have successfully reduced the number of early-onset neonatal diseases (infections that occur within 7 days of life), the prevalence of late-onset infections (infections that occur between 7 and 30 days of life) has remained constant. In addition, the widespread use of intrapartum prophylactic antibiotics may introduce undue risk of penicillin allergy and may trigger the development of antibiotic-resistant microorganisms. To minimise such potential harm, a more targeted approach to antibiotic use is required.
Distinguishing virulent GBS strains from their colonising counterparts thus lays the cornerstone of tailored therapy. This thesis has three aims: 1. Prediction of virulence by analysis of bacterial genotype data: To identify markers that may be associated with GBS virulence, statistical analysis was performed on GBS genotype data consisting of 780 invasive and 132 colonising S. agalactiae isolates. From a panel of 18 molecular markers studied, only the alp3 gene (which encodes a surface protein antigen commonly associated with serotype V) showed an increased association with invasive disease (OR=2.93, p=0.0003, Fisher's exact test). Molecular serotype II (OR=10.0, p=0.0007) was found to have a significant association with early-onset neonatal disease when compared with late-onset disease. To investigate whether clinical outcomes can be predicted from the panel of genotype markers, logistic regression and machine learning algorithms were applied to distinguish invasive from colonising isolates. Nevertheless, the predictive analysis yielded only weak predictive power (area under the ROC curve, AUC: 0.56–0.71, stratified 10-fold cross-validation). It was concluded that a definitive predictive relationship between the molecular markers and clinical outcomes may be lacking, and that more discriminative markers of GBS virulence need to be investigated. 2. Development of two computational CGP methods to assist with functional discovery of prokaryotic genes: Two in silico CGP methods were developed based on comparative genomics: statistical CGP exploits differences in gene frequency between phenotypic groups, while inductive CGP applies supervised machine learning to identify genes with similar occurrence patterns across a range of bacterial genomes. Three rediscovery experiments were carried out to evaluate the CGP methods: a) Rediscovery of peptidoglycan genes was attempted with 417 published bacterial genome sequences.
Both CGP methods achieved their best AUC of >0.911 in the Escherichia coli K-12 genome and >0.978 in the Streptococcus agalactiae 2603 (SA-2603) genome, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 genome examples in each group in the rediscovery of the peptidoglycan metabolism genes. b) A maximum 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes. c) In the rediscovery experiment with genes of 31 metabolic pathways in SA-2603, 14 pathways achieved an AUC >0.9 and 28 pathways achieved an AUC >0.8 with the best inductive CGP algorithms. The results of the rediscovery experiments demonstrated that the two CGP methods can assist with the study of functionally uncategorised genomic regions and the discovery of bacterial gene-function relationships. 3. Application of the CGP methods to discover GBS virulence genes: Both statistical and inductive CGP were applied to assist with the discovery of unknown GBS virulence factors. Among a list of hypothetical protein genes, several highly ranked genes were plausibly involved in molecular mechanisms of GBS pathogenesis, including genes encoding family 8 glycosyltransferases, family 1 and family 2 glycosyltransferases, multiple adhesins, streptococcal neuraminidase, staphylokinase, and other factors that may contribute to GBS virulence. Such genes may be candidates for further biological validation. In addition, the co-occurrence of these genes with currently known virulence factors suggests that the virulence mechanisms of GBS in causing perinatal diseases are multifactorial. The procedure demonstrated in this prioritisation task should assist with the discovery of virulence genes in other pathogenic bacteria.
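The AUC values quoted throughout these rediscovery experiments have a simple rank-statistic reading: the probability that a randomly chosen positive gene is ranked above a randomly chosen negative one (the Mann-Whitney formulation). The sketch below computes AUC directly from that definition; the scores and labels are hypothetical candidate-gene rankings, not data from the thesis.

```python
# Minimal AUC computation from the Mann-Whitney rank-statistic definition.

def auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical prioritisation scores; label 1 = true pathway gene
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # → 0.8888888888888888
```

On this reading, an AUC of 0.978 means a true peptidoglycan gene outranks a non-member about 98% of the time, while the 0.56–0.71 range in aim 1 is only modestly better than the 0.5 of random ranking, which is why the thesis judges those genotype markers insufficient.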

Integrative methods for gene data analysis and knowledge discovery on the case study of KEDRI's brain gene ontology

Wang, Yuepeng January 2008 (has links)
Thesis (MCIS), AUT University, 2008. Includes bibliographical references. Also held in print (131 leaves : ill. ; 30 cm) in the Archive at the City Campus (T 616.99404200285 WAN).

Strategies for the Discovery of Carbohydrate-Active Enzymes from Environmental Bacteria

Larsbrink, Johan January 2013 (has links)
The focus of this thesis is a comparative study of approaches in discovery of carbohydrate-active enzymes (CAZymes). CAZymes synthesise, bind to, and degrade all the multitude of carbohydrates found in nature. As such, when aiming for sustainable methods to degrade plant biomass for the generation of biofuels, for which there is a strong drive in society, CAZymes are a natural source of environmentally friendly molecular tools. In nature, microorganisms are the principal degraders of carbohydrates. Not only do they degrade plant matter in forests and aquatic habitats, but also break down the majority of carbohydrates ingested by animals. These symbiotic microorganisms, known as the microbiota, reside in animal digestive tracts in immense quantities, where one of the key nutrient sources is complex carbohydrates. Thus, microorganisms are a plentiful source of CAZymes, and strategies in the discovery of new enzymes from bacterial sources have been the basis for the work presented here, combined with biochemical characterisation of several enzymes. Novel enzymatic activities for the glycoside hydrolase family 31 have been described as a result of the initial projects of the thesis. These later evolved into projects studying bacterial multi-gene systems for the partial or complete degradation of the heterogeneous plant polysaccharide xyloglucan. These systems contain, in addition to various hydrolytic CAZymes, necessary binding-, transport-, and regulatory proteins. The results presented here show, in detail, how very complex carbohydrates can efficiently be degraded by bacterial enzymes of industrial relevance.

A systemic view of the Piracanjuba Archaeological Site: knowledge discovery at archaeological sites

Franco, Clélia [UNESP] 26 February 2007 (has links) (PDF)
In recent decades, the capacity to generate and collect data has grown rapidly, creating the need for new techniques and tools able to process and analyse these data and to discover new and useful information in them. This gave rise to a prominent research field for extracting knowledge from data: Knowledge Discovery in Databases. By applying an indirect knowledge-discovery methodology to the attributes of ceramic fragments collected at soil level at the Piracanjuba Archaeological Site (Piraju, SP, Brazil), this work aims to provide archaeology experts with a systemic view that can assist them in understanding the past populations that once lived there. (Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Universidade Estadual de Maringá (UEM).)

The use of ontologies on context and discovery services for self-adaptation of applications

Leila Negris Bezerra 13 July 2011 (has links)
Context-aware applications demand ways of retrieving context information from their environment. Based on the current context, such applications are able to self-adapt to provide the correct information and services to their users. The usual approach in supporting infrastructures for context-aware applications provides facilities for resource discovery using <key-value> pairs and discovery engines that perform only syntactic matching. This approach does not consider the possible semantic relations between the keywords used, so its limited semantic expressiveness often leads to a discovery service with low recall and low accuracy. This work presents a different approach to the context and discovery service, one that uses ontologies to represent the resources of the execution context and to capture the semantics of the user's query, thereby improving the discovery process for the self-adaptation of context-aware systems. The proposed approach also offers extension points to client applications through the use of other ontologies. This approach was integrated into the CDRF infrastructure, adding semantics to the services developed in that project. Example applications are also proposed to demonstrate the use of the new services.
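The gap between syntactic <key-value> matching and ontology-aware discovery can be shown in miniature. In the sketch below, a query term matches a resource when the resource's concept equals, or is a subclass of, the requested concept; the tiny subclass hierarchy and resource names are hypothetical and stand in for a real ontology such as one written in OWL.

```python
# Hypothetical mini-ontology: child concept -> parent concept
SUBCLASS_OF = {
    "LaserPrinter": "Printer",
    "InkjetPrinter": "Printer",
    "Printer": "Device",
}

def is_a(concept, target):
    """Walk up the hierarchy; True if concept equals or specialises target."""
    while concept is not None:
        if concept == target:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

def discover(resources, requested_concept):
    """Semantic discovery: match on concept subsumption, not string equality."""
    return [name for name, concept in resources.items()
            if is_a(concept, requested_concept)]

resources = {"hp42": "LaserPrinter", "cam1": "Camera"}
print(discover(resources, "Printer"))  # → ['hp42']
```

A purely syntactic engine comparing the string "Printer" against the value "LaserPrinter" would return nothing here, which is the low-recall failure mode the abstract attributes to <key-value> matching.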
