  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
441

Event-Level Pattern Discovery for Large Mixed-Mode Database

Wu, Bin January 2010 (has links)
For a large mixed-mode database, discretizing its continuous data into interval events remains a practical approach. If the database has no class labels, we have no helpful correlation reference for this task. In practice, a large relational database may contain various correlated attribute clusters. To handle these kinds of problems, we first have to partition the database into sub-groups of attributes that share some sort of correlated relationship. This process has become known as attribute clustering, and it is an important way to reduce our search when looking for or discovering patterns. Furthermore, once correlated attribute groups are obtained, from each of them we can find the most representative attribute, the one with the strongest interdependence with all the other attributes in that cluster, and use it as a candidate class label for that group. That sets up a correlation attribute to drive the discretization of the other continuous data in each attribute cluster. This thesis provides the theoretical framework, the methodology and the computational system to achieve that goal.
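A minimal sketch of the representative-attribute step described in this abstract: within one attribute cluster, choose the attribute whose summed pairwise interdependence with the rest of the cluster is largest, so it can act as the cluster's surrogate class label. The column names, bin count, and the use of mutual information as the interdependence measure are illustrative assumptions, not the thesis's exact formulation.

```python
# Hedged sketch: choose a cluster's representative attribute by summed mutual
# information; assumes a complete-case DataFrame with no missing values.
import numpy as np
import pandas as pd
from sklearn.metrics import mutual_info_score

def representative_attribute(cluster: pd.DataFrame, bins: int = 5) -> str:
    """Attribute with the largest total interdependence with the rest of the cluster."""
    # Pre-discretize continuous columns so mutual information is defined on events.
    disc = cluster.apply(lambda c: pd.cut(c, bins, labels=False)
                         if np.issubdtype(c.dtype, np.number) else c)
    cols = disc.columns
    totals = {a: sum(mutual_info_score(disc[a], disc[b]) for b in cols if b != a)
              for a in cols}
    return max(totals, key=totals.get)

# Hypothetical usage on one attribute cluster:
# rep = representative_attribute(df[["temperature", "pressure", "flow_rate"]])
```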
442

Design and Implementation of a Service Discovery and Recommendation Architecture for SaaS Applications

Sukkar, Muhamed January 2010 (has links)
An increasing number of software vendors are offering, or planning to offer, their applications as Software-as-a-Service (SaaS) to leverage the benefits of cloud computing and Internet-based delivery. Potential clients will therefore face a growing number of providers that satisfy their requirements and must choose among them, so there is an increasing demand for automating this time-consuming and error-prone task. In this work, we develop an architecture for automated service discovery and selection in a cloud computing environment. The system is based on an algorithm that recommends service choices to users based on both functional and non-functional characteristics of the available services. The system also derives automated ratings from monitoring results of past service invocations to objectively detect badly behaving providers. We demonstrate the effectiveness of our approach using an early prototype that was developed following an object-oriented methodology and implemented using various open-source Java technologies and frameworks. The prototype uses a Chord DHT as its distributed backing store to achieve scalability.
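A small sketch of the recommendation idea only: rank candidate offers by a weighted blend of functional match against the client's requirements, advertised non-functional (QoS) quality, and a rating derived from monitored past invocations. The thesis prototype is Java over a Chord DHT; this Python sketch, its field names, and its weights are illustrative assumptions.

```python
# Hedged sketch of ranking SaaS offers by functional and non-functional criteria.
from dataclasses import dataclass

@dataclass
class Offer:
    name: str
    features: set            # functional capabilities advertised by the provider
    qos: float               # normalized non-functional score in [0, 1]
    observed_rating: float   # derived from monitoring past invocations, in [0, 1]

def recommend(offers, required, w_func=0.5, w_qos=0.3, w_rating=0.2):
    """Return offers sorted from best to worst under a simple weighted score."""
    def score(o: Offer) -> float:
        func = len(o.features & required) / len(required) if required else 1.0
        return w_func * func + w_qos * o.qos + w_rating * o.observed_rating
    return sorted(offers, key=score, reverse=True)

# Hypothetical usage:
# ranked = recommend([Offer("A", {"crm", "sso"}, 0.8, 0.9),
#                     Offer("B", {"crm"}, 0.9, 0.4)], required={"crm", "sso"})
```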
443

An economic analysis of crude oil exploration in Saskatchewan and Alberta

Kamsari, Haul 28 February 2005 (has links)
The international market for crude oil and natural gas is well established and very competitive. Knowledge about costs is important in helping to understand the current position of producers within the industry. In the eyes of the producers, the lower the costs, the more profitable they will be at a given price of crude. This thesis focuses on an economic analysis of crude oil exploration in Saskatchewan and Alberta. In a competitive market, the producers require estimates of finding costs in both regions, and public policies designed to encourage crude exploration also rely heavily on reliable estimates of these costs. The finding costs are estimated using a methodology (Uhler 1979) that has been widely accepted within the economic literature on non-renewable resources. The results show that Saskatchewan's per-unit finding cost is significantly lower than Alberta's in spite of the geological differences between the two provinces. The results also support the hypothesis that finding costs in both regions are increasing and the argument that these costs will converge in the long run, except for the last six years of the analysis.
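For readers unfamiliar with the term, a per-unit finding cost is, at its simplest, exploration expenditure divided by the reserves added in the same period; the thesis estimates it econometrically following Uhler (1979). The figures below are invented purely to illustrate the arithmetic.

```python
# Hedged illustration of per-unit finding cost; all numbers are made up.
expenditure = {"Saskatchewan": 120.0, "Alberta": 450.0}        # $ million spent on exploration
reserve_additions = {"Saskatchewan": 60.0, "Alberta": 150.0}   # million barrels discovered

per_unit_cost = {r: expenditure[r] / reserve_additions[r] for r in expenditure}
print(per_unit_cost)  # {'Saskatchewan': 2.0, 'Alberta': 3.0}  ($ per barrel)
```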
444

Making sense of the mess : do CDS's help?

Esau, Heidi Marie 12 April 2010 (has links)
In a firm-level matched sample of 499 firms, we examine the information flow between stocks and credit default swaps (CDSs) over the period January 2004 to December 2008. Our study confirms the general finding of previous studies that information generally flows from the equity market to the CDS market. However, for a much smaller number of firms we find that information also flows from the CDS to its stock. A major advantage of our sample period is that it allows us to examine the information flow both before and during the crisis. This paper makes two contributions. First, we document that the number of firms for which information flows from the CDS to the stock increases almost tenfold during the crisis; since the current crisis is often referred to as a credit crisis, this finding is consistent with what is expected of CDSs. The major contribution of this paper is that it identifies the firm-specific factors that influence the information flow across the two markets. We show that characteristics such as asset size, profitability, and industry, amongst others, play an important role in determining information flow.
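Information flow between two daily series is commonly operationalized as a lead-lag (Granger-causality) relationship; below is a sketch of that kind of check for a single firm. The thesis's exact econometric specification may differ, and the column names, lag order, and choice of test statistic here are assumptions.

```python
# Hedged sketch: does the stock lead the CDS for one firm? Assumes stationary
# daily series in columns 'cds_change' and 'stock_return'.
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def stock_leads_cds(df: pd.DataFrame, maxlag: int = 5, alpha: float = 0.05) -> bool:
    # statsmodels tests whether the second column Granger-causes the first.
    res = grangercausalitytests(df[["cds_change", "stock_return"]],
                                maxlag=maxlag, verbose=False)
    pvals = [res[lag][0]["ssr_ftest"][1] for lag in range(1, maxlag + 1)]
    return min(pvals) < alpha   # reject "no Granger causality" at some tested lag

# Running the same check on df[["stock_return", "cds_change"]] asks the reverse
# question: does information flow from the CDS to the stock?
```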
445

Text Mining Biomedical Literature for Genomic Knowledge Discovery

Liu, Ying 20 July 2005 (has links)
The last decade has been marked by unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. Almost every known or postulated piece of information pertaining to genes, proteins, and their role in biological processes is reported somewhere in the vast amount of published biomedical literature. We believe the ability to rapidly survey and analyze this literature and extract pertinent information constitutes a necessary step toward both the design and the interpretation of any large-scale experiment. Moreover, automated literature mining offers a yet untapped opportunity to integrate many fragments of information gathered by researchers from multiple fields of expertise into a complete picture exposing the interrelated roles of various genes, proteins, and chemical reactions in cells and organisms. In this thesis, we show that functional keywords in biomedical literature, particularly Medline, represent very valuable information and can be used to discover new genomic knowledge. To validate our claim we present an investigation into text mining biomedical literature to assist microarray data analysis, yeast gene function classification, and biomedical literature categorization. We conduct the following studies: 1. We test sets of genes to discover common functional keywords among them and use these keywords to cluster them into groups; 2. We show that it is possible to link genes to diseases by an expert human interpretation of the functional keywords for the genes; none of these diseases are as yet mentioned in public databases; 3. By clustering genes based on common functional keywords it is possible to group genes into meaningful clusters that reveal more information about their functions, links to diseases, and roles in metabolic pathways; 4. Using extracted functional keywords, we demonstrate that for yeast genes we can produce a better functional grouping than is available in public microarray and phylogenetic databases; 5. We show an application of our approach to literature classification: using functional keywords as features, we are able to extract epidemiological abstracts automatically from Medline with higher sensitivity and accuracy than a human expert.
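A toy sketch of study 1 above: once functional keywords have been extracted for each gene, the genes can be grouped by keyword commonality, for instance with TF-IDF vectors and a standard clustering algorithm. The gene names, keywords, and choice of k-means are invented for illustration; they are not the thesis's actual pipeline.

```python
# Hedged sketch: cluster genes by the functional keywords extracted for them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

gene_keywords = {                      # keyword extraction from Medline is not shown
    "GENE_A": "dna repair damage response checkpoint",
    "GENE_B": "dna repair excision checkpoint",
    "GENE_C": "glycolysis metabolism atp synthesis",
}

X = TfidfVectorizer().fit_transform(gene_keywords.values())
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
clusters = dict(zip(gene_keywords, labels))
print(clusters)   # e.g. {'GENE_A': 0, 'GENE_B': 0, 'GENE_C': 1}
```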
446

Feature-Based Hierarchical Knowledge Engineering for Aircraft Life Cycle Design Decision Support

Zhao, Wei 09 April 2007 (has links)
The design process of aerospace systems is becoming more and more complex. As the process progressively becomes enterprise-wide, it involves multiple vendors and encompasses the entire life cycle of the system, as well as a system-of-systems perspective. The amount of data and information generated under this paradigm has increased exponentially, creating a difficult situation for data storage, management, and retrieval. Furthermore, the data themselves are not suitable or adequate for use in most cases and must be translated into knowledge at a proper level of abstraction. Adding to the problem is the fact that the knowledge discovery process needed to support the growth of data in aerospace systems design has not been developed to the appropriate level. In fact, important design decisions are often made without sufficient understanding of their overall impact on the aircraft's life, because the data have not been efficiently converted and interpreted in time to support design. In order to make the design process adapt to this life-cycle-centric requirement, this thesis proposes a methodology to provide the supporting knowledge necessary for better design decision making. The primary contribution is the establishment of a knowledge engineering framework for design decision support that effectively discovers knowledge from existing data and efficiently manages and presents that knowledge throughout all phases of the aircraft life cycle. The second contribution is the proposed methodology for feature generation and exploration, which significantly improves the knowledge discovery process. In addition, the proposed work demonstrates several multimedia-based approaches to knowledge presentation.
447

Computer Simulation of Interaction between Protein and Organic Molecules

Wang, Cheng-Chieh 21 July 2011 (has links)
Docking is one of the methods used in virtual screening. From around 1980 to now, many docking programs have been developed, but they still have shortcomings. The software currently used for docking suffers from several disadvantages: poor efficiency, rigid treatment of protein and ligand structures, poor accuracy, and no account of polarization after binding, which leaves virtual screening stuck in a supporting role. Our new method improves on these shortcomings. With it, we obtain the following improvements in the docking process: better efficiency, flexible structures for both proteins and ligands, and better accuracy. In a test docking depression-related proteins with traditional Chinese medicine compounds, we change the conformations of the ligands to match the shapes of the active sites before posing, which makes the conformation of the complex much more reasonable, even for complicated, large ligands. In the random-site docking experiment, we found a possible path by which compounds travel into active sites; we illustrate a docking area by linking all possible docking sites, and the lead compound may not successfully travel into the active site when this area is occupied by other proteins or ligands. In the docking experiment with side-chain rotation, we rotate torsion angles to let the side chains relax. We obtained results similar to molecular dynamics while saving a great deal of time.
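A purely geometric sketch of the side-chain relaxation step mentioned at the end of this abstract: atoms beyond a rotatable bond are rotated about that bond's axis by a chosen torsion increment (Rodrigues' rotation). The energy scoring that decides which rotation to keep is not shown, and the function and atom names are illustrative assumptions.

```python
# Hedged sketch: rotate side-chain atoms about a bond axis by a torsion increment.
import numpy as np

def rotate_about_bond(coords: np.ndarray, a: np.ndarray, b: np.ndarray,
                      angle_deg: float) -> np.ndarray:
    """Rotate points `coords` (N x 3) about the axis through atoms a -> b."""
    k = (b - a) / np.linalg.norm(b - a)          # unit axis along the bond
    theta = np.radians(angle_deg)
    v = coords - a
    rotated = (v * np.cos(theta)
               + np.cross(k, v) * np.sin(theta)
               + np.outer(v @ k, k) * (1 - np.cos(theta)))   # Rodrigues' formula
    return rotated + a

# Hypothetical usage: relax a side chain by trying 15-degree steps around CA-CB.
# side_chain_xyz = rotate_about_bond(side_chain_xyz, ca_xyz, cb_xyz, angle_deg=15.0)
```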
448

Price Discovery in the Natural Gas Markets of the United States and Canada

Olsen, Kyle December 2010 (has links)
The dynamics of the U.S. and Canadian natural gas spot markets are evolving through deregulation policies and technological advances. Economic theory suggests that these markets will be integrated; the key question is the extent of integration among them. This thesis characterizes the degree of dynamic integration among 11 major natural gas markets, six from the U.S. and five from Canada, and determines each individual market's role in price discovery. This is the first study to include numerous Canadian markets in a North American natural gas market study. Causal flow modeling using directed acyclic graphs, in conjunction with time series analysis, is used to explain the relationships among the markets. Daily gas price data from 1994 to 2009 are used. The 11 natural gas market prices are tied together by nine long-run cointegrating relationships. All markets are included in the cointegration space, providing evidence that the markets are integrated. Results show the degree of integration varies by region. Further results indicate no clear price leader exists among the 11 markets. The Dawn market is exogenous in contemporaneous time, while the Sumas market is an information sink. Henry Hub plays a significant role in the price discovery of markets in the U.S. Midwest and Northeast, but contributes little to markets in the West. The uncertainty of a market's price depends primarily on markets located in nearby regions. Policy makers may use information on market integration for important policy matters in their efforts to attain efficiency, and gas traders benefit from knowing the price discovery relationships.
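The cointegration finding quoted above comes from multivariate time-series machinery; below is a sketch of one standard way to count long-run cointegrating relations (a Johansen trace test). The hub names, lag order, and deterministic-term choice are placeholders, and the directed-acyclic-graph step of the thesis is not shown.

```python
# Hedged sketch: count cointegrating relations among daily log spot prices.
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def cointegration_rank(prices: pd.DataFrame, k_ar_diff: int = 1) -> int:
    """Number of trace-test nulls rejected at the 5% level."""
    res = coint_johansen(np.log(prices), det_order=0, k_ar_diff=k_ar_diff)
    crit_5pct = res.cvt[:, 1]            # critical-value columns are 90%, 95%, 99%
    return int((res.lr1 > crit_5pct).sum())

# Hypothetical usage with placeholder hub columns:
# rank = cointegration_rank(df[["HenryHub", "Dawn", "Sumas", "AECO"]])
```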
449

Algorithms for Large-Scale Internet Measurements

Leonard, Derek Anthony December 2010 (has links)
As the Internet has grown in size and importance to society, it has become increasingly difficult to generate global metrics of interest that can be used to verify proposed algorithms or monitor performance. This dissertation tackles the problem by proposing several novel algorithms designed to perform Internet-wide measurements using existing or inexpensive resources. We initially address distance estimation in the Internet, which is used by many distributed applications. We propose a new end-to-end measurement framework called Turbo King (T-King) that uses the existing DNS infrastructure and, when compared to its predecessor King, obtains delay samples without bias in the presence of distant authoritative servers and forwarders, consumes half the bandwidth, and reduces the impact on caches at remote servers by several orders of magnitude. Motivated by recent interest in the literature and our need to find remote DNS nameservers, we next address Internet-wide service discovery by developing IRLscanner, whose main design objectives are to maximize politeness at remote networks, allow scanning rates that achieve coverage of the Internet in minutes/hours (rather than weeks/months), and significantly reduce administrator complaints. Using IRLscanner and 24-hour scan durations, we perform 20 Internet-wide experiments using 6 different protocols (i.e., DNS, HTTP, SMTP, EPMAP, ICMP and UDP ECHO). We analyze the feedback generated and suggest novel approaches for reducing the amount of blowback during similar studies, which should enable researchers to collect valuable experimental data in the future with significantly fewer hurdles. We finally turn our attention to Intrusion Detection Systems (IDS), which are often tasked with detecting scans and preventing them; however, it is currently unknown how likely an IDS is to detect a given Internet-wide scan pattern and whether there exist sufficiently fast stealth techniques that can remain virtually undetectable at large scale. To address these questions, we propose a novel model for the window-expiration rules of popular IDS tools (i.e., Snort and Bro), derive the probability that existing scan patterns (i.e., uniform and sequential) are detected by each of these tools, and prove the existence of stealth-optimal patterns.
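The detection-probability question in the last paragraph can also be approached by simulation; the sketch below estimates how often a simplified threshold-within-window rule (flag a source once `threshold` probes hit the monitored address block inside a sliding window) catches a random scanner. This simplified rule and all parameter values are assumptions for illustration, not the Snort/Bro window-expiration model derived in the dissertation.

```python
# Hedged Monte Carlo sketch of scan-detection probability under a toy IDS rule.
import random

def detection_probability(scan_rate_pps: float, monitored_frac: float,
                          threshold: int, window_s: float,
                          scan_duration_s: float, trials: int = 200) -> float:
    detected = 0
    for _ in range(trials):
        hits = []                                    # times of probes seen by the IDS
        t = 0.0
        while t < scan_duration_s:
            t += random.expovariate(scan_rate_pps)   # Poisson probe arrivals
            if random.random() < monitored_frac:     # probe lands in the monitored block
                hits = [h for h in hits if t - h <= window_s] + [t]
                if len(hits) >= threshold:
                    detected += 1
                    break
    return detected / trials

# Hypothetical usage (a /16 monitored out of IPv4, 100 probes/s for 10 minutes):
# p = detection_probability(100, 2**16 / 2**32, threshold=5, window_s=60,
#                           scan_duration_s=600)
```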
450

An Efficient Bitmap-Based Approach to Mining Sequential Patterns for Large Databases

Wu, Chien-Hui 29 July 2004 (has links)
The task of data mining is to find useful information within incredibly large sets of data. One important research area of data mining is mining sequential patterns. For a transaction database, a sequential pattern means that there are relations between the items bought by customers over a period of time. If we can find these relations by mining sequential patterns, we can design better selling strategies to gain more customers' attention. However, since the transaction database contains a lot of data and is scanned again and again during the mining process, improving the running efficiency is an important topic. In the GSP algorithm proposed by Srikant and Agrawal, a complex data structure is used to store and generate candidates. The generated candidates satisfy the property that "the subsets of a frequent itemset are also frequent". This property leads to a smaller number of candidates; however, the algorithm still spends too much time counting candidates. In the SPAM algorithm proposed by Ayres et al., bitwise operations are used to reduce the time for counting candidates. However, it generates too many candidates which will never become frequent itemsets, which decreases efficiency. In this thesis, we propose a new bitmap-based algorithm. By modifying the way candidates are generated in the GSP algorithm and applying the bitwise operations of the SPAM algorithm, the proposed algorithm can mine sequential patterns efficiently. That is, we use a candidate generation method similar to that of the GSP algorithm to reduce the number of candidates, and a counting method similar to that of the SPAM algorithm to reduce the time spent counting candidates. In the proposed algorithm, we classify the itemsets into two cases, simultaneous occurrence (denoted AB) and sequential occurrence (denoted A->B). In the case of simultaneous occurrence, the number of candidates is C(n,k) under the exhaustive method. To prevent too many candidates from being generated, we make use of the property that "the subsets of a frequent itemset are also frequent" to reduce the number of candidates from C(n,k) to C(y,k), k <= y < n. In the case of sequential occurrence, the candidates are generated by a special join operation which can combine, for example, A->B and B->C into A->B->C. Moreover, we have to consider two other cases: (1) combining A->B and A->C into A->BC; (2) combining A->C and B->C into AB->C. The method of counting candidates is similar to that of the SPAM algorithm (i.e., bitwise operations). Our simulation results, based on the same bit representation of the transaction database, show that the proposed algorithm provides better performance than the SPAM algorithm in terms of processing time, since it generates fewer candidates than the SPAM algorithm.
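A toy sketch of the bitwise counting this abstract builds on: give each item a bitmap with one bit per transaction, laid out in fixed-size blocks of `SEQ_LEN` transactions per customer; AND bitmaps for simultaneous occurrence (AB), and transform-then-AND for sequential occurrence (A->B). Real implementations (SPAM and the proposed algorithm) handle variable-length sequences and richer candidate generation; the fixed block size and helper names here are simplifying assumptions.

```python
# Hedged sketch of bitmap support counting for AB and A->B patterns.
SEQ_LEN = 4   # toy assumption: every customer sequence has exactly 4 transactions
MASK = (1 << SEQ_LEN) - 1

def support_AB(bm_a: int, bm_b: int, n_customers: int) -> int:
    """Simultaneous occurrence AB: A and B appear in the same transaction."""
    both = bm_a & bm_b
    return sum(1 for c in range(n_customers) if (both >> (c * SEQ_LEN)) & MASK)

def after_first(bm: int, n_customers: int) -> int:
    """Set every bit strictly after the first set bit within each customer block."""
    out = 0
    for c in range(n_customers):
        block = (bm >> (c * SEQ_LEN)) & MASK
        if block:
            first = (block & -block).bit_length() - 1          # index of lowest set bit
            out |= ((MASK >> (first + 1)) << (first + 1)) << (c * SEQ_LEN)
    return out

def support_A_then_B(bm_a: int, bm_b: int, n_customers: int) -> int:
    """Sequential occurrence A->B: B appears after A within the same sequence."""
    return support_AB(after_first(bm_a, n_customers), bm_b, n_customers)

# Hypothetical usage: two customers, bit i of each 4-bit block = i-th transaction.
# support_A_then_B(0b0001_0011, 0b0100_0100, n_customers=2)  -> 2
```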
