811 |
A Mobile Agent Approach for Global Database Constraint Checking: Using Cpa-Insert Algorithm
Supaneedis, Audsanee 13 May 2005
As global data sharing has become widely used in many corporations, systems that support it have become well known as multidatabases. However, such systems raise an interesting issue: global constraint checking. A capable checking application must be set up inside the system, and global constraint checking requires the following essential characteristics: 1) mobility, 2) heterogeneity, and 3) robustness. An effective way to implement the checking is to use Aglets, which is well recognized as a good mobile agent platform. Aglets is very appropriate because it provides mobility, is 100% Java compatible, and is open source. In this thesis, we construct a global constraint checking application that works in the following steps. First, the user enters an insert statement. The system receives the input and then connects to the Global Metadatabase. It then optimizes the route for checking. The optimized data is sent out with the mobile agents to the remote sites. Finally, the results are collected and shown to the user.
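The steps above can be sketched in a few lines. This is only an in-process illustration, not the thesis's Cpa-Insert algorithm and not Aglets code: the `Site` class, the `GLOBAL_METADATA` mapping, the uniqueness constraint, and the smallest-site-first ordering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Stand-in for a remote database that a mobile agent would visit."""
    name: str
    used_keys: set

    def violates(self, key):
        # Assumed local constraint: the key must not already exist here.
        return key in self.used_keys


# Hypothetical global metadata: which sites hold data for each table.
GLOBAL_METADATA = {"employee": ["site_a", "site_b"]}


def plan_route(table, sites):
    """Visit only the sites listed for the table, smallest site first."""
    relevant = [sites[name] for name in GLOBAL_METADATA.get(table, [])]
    return sorted(relevant, key=lambda s: len(s.used_keys))


def check_insert(table, key, sites):
    """Return (ok, visited), where visited lists the sites checked in order."""
    visited = []
    for site in plan_route(table, sites):
        visited.append(site.name)
        if site.violates(key):
            return False, visited  # stop at the first violation
    return True, visited
```

Calling `check_insert("employee", 7, sites)` with a `sites` dictionary keyed by site name walks the planned route and stops as soon as one site reports a violation.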
|
812 |
Query Optimization for On-Demand Information Extraction Tasks over Text Databases
Farid, Mina H. 12 March 2012
Many modern applications involve analyzing large amounts of data that comes from unstructured text documents. In its original format, data contains information that, if extracted, can give more insight and help in the decision-making process. The ability to answer structured SQL queries over unstructured data allows for more complex data analysis. Querying unstructured data can be accomplished with the help of information extraction (IE) techniques. The traditional way is by using the Extract-Transform-Load (ETL) approach, which performs all possible extractions over the document corpus and stores the extracted relational results in a data warehouse. Then, the extracted data is queried. The ETL approach produces results that are out of date and causes an explosion in the number of possible relations and attributes to extract. Therefore, new approaches to perform extraction on-the-fly were developed; however, previous efforts relied on specialized extraction operators, or particular IE algorithms, which limited the optimization opportunities of such queries.
In this work, we propose an on-line approach that integrates the engine of the database management system with IE systems using a new type of view called extraction views. Queries on text documents are evaluated using these extraction views, which are populated at query time with newly extracted data. Our approach enables the optimizer to apply all well-defined optimization techniques. The optimizer selects the best execution plan using a defined cost model that considers a user-defined balance between the cost and quality of extraction, and we explain the trade-off between the two factors. The main contribution is the ability to run on-demand information extraction that reflects the latest changes in the data while avoiding unnecessary extraction from irrelevant text documents.
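A rough sketch of the two ideas in this abstract follows: extraction that runs only at query time and only on relevant documents, and plan selection under a user-chosen balance of cost and quality. Everything here is an assumption for illustration, including the toy regular-expression extractor, the keyword filter, the function names, and the linear scoring formula. It is not the thesis's extraction-view mechanism or cost model.

```python
import re


def extract_employment(text):
    """Toy IE step: return (person, company) pairs from 'X works at Y'."""
    return re.findall(r"(\w+) works at (\w+)", text)


def query_view(docs, keyword, extractor=extract_employment):
    """Extract rows at query time, skipping documents without the keyword.

    Returns the extracted rows and the number of extractor calls made.
    """
    rows, calls = [], 0
    for text in docs:
        if keyword not in text:
            continue  # irrelevant document: no extraction cost is paid
        calls += 1
        rows.extend(extractor(text))
    return rows, calls


def choose_plan(plans, quality_weight):
    """Pick the plan with the best weighted score.

    Each plan has 'cost' and 'quality' normalized to [0, 1];
    quality_weight in [0, 1] is the user-defined balance.
    """
    def score(plan):
        return (quality_weight * plan["quality"]
                - (1 - quality_weight) * plan["cost"])
    return max(plans, key=score)
```

A user who sets `quality_weight` close to 1 favors thorough but expensive plans, while a value close to 0 favors cheap ones.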
|
813 |
Utilizing the Canadian Long-Term Pavement Performance (C-LTPP) Database for Asphalt Dynamic Modulus Prediction
Korczak, Richard January 2013
In 2007, the Mechanistic-Empirical Pavement Design Guide (MEPDG) was successfully approved as the new American Association of State Highway and Transportation Officials (AASHTO) pavement design standard (Von Quintus et al., 2007). Calibration and validation of the MEPDG is currently in progress in several provinces across Canada. The MEPDG will be used as the standard pavement design methodology for the foreseeable future (Tighe, 2013).
This new pavement design process requires several parameters specific to local conditions of the design location. In order to perform an accurate analysis, a database of parameters, including those specific to local materials, climate, and traffic, is required to calibrate the models in the MEPDG.
In 1989, the Canadian Strategic Highway Research Program (C-SHRP) launched a national full-scale field experiment known as the Canadian Long-Term Pavement Performance (C-LTPP) program. Between 1989 and 1992, a total of 24 test sites were constructed across all ten provinces. Each test site contained multiple monitored sections, for a total of 65 sections. Each of these sites received rehabilitation treatments consisting of asphalt overlays of various thicknesses. The C-LTPP program attempted to design and build the test sections across Canada so as to cover the widest range of experimental factors, such as traffic loading, environmental region, and subgrade type. With planned strategic pavement data collection cycles, it would then be possible to compare results obtained at different test sites (i.e. across traffic levels, environmental zones, soil types) across the country.
The United States Long-Term Pavement Performance (US-LTPP) database is serving as a critical tool in implementing the new design guide. The MEPDG was delivered with the prediction models calibrated to average national conditions. For the guide to be an effective resource for individual agencies, the national models need to be evaluated against local and regional performance. The results of these evaluations are being used to determine if local calibration is required. It is expected that provincial agencies across Canada will use both C-LTPP and US-LTPP test sites for these evaluations. In addition, C-LTPP and US-LTPP sites provide typical values for many of the MEPDG inputs (C-SHRP, 2000).
The scope of this thesis is to examine the existing data in the C-LTPP database and assess its relevance to Canadian MEPDG calibration. Specifically, the thesis examines the dynamic modulus parameter (|E*|) and how it can be computed using existing C-LTPP data and an Artificial Neural Network (ANN) model developed under a Federal Highway Administration (FHWA) study (FHWA, 2011).
The dynamic modulus is an essential property that defines the stiffness characteristics of a Hot Mix Asphalt (HMA) mixture as a function of both its temperature and rate of loading. |E*| is also a primary material property input required for a Level 1 analysis in the MEPDG. In order to perform a Level 1 MEPDG analysis, detailed local material, environmental and traffic parameters are required for the pavement section being analyzed. Additionally, |E*| can be used in various pavement response models based on visco-elasticity.
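For reference, the dynamic modulus is conventionally defined from a sinusoidal loading test as the ratio of the peak stress amplitude to the peak recoverable strain amplitude. The symbols below follow that common convention and are not taken from the thesis itself:

```latex
|E^*| = \frac{\sigma_0}{\varepsilon_0}
```

Here \sigma_0 is the peak dynamic stress amplitude and \varepsilon_0 is the peak recoverable axial strain amplitude, both measured at a given temperature and loading frequency.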
The dynamic modulus values predicted using both Level 2 and Level 3 viscosity-based ANN models in the ANNACAP software showed a good correlation to the measured dynamic modulus values for two C-LTPP test sections and supplementary Ontario mixes. These findings support the findings of previous research conducted during the development of the ANN models. The viscosity-based prediction model requires the least amount of data in order to run a prediction. A Level 2 analysis requires mix volumetric data as well as viscosity testing, and a Level 3 analysis requires only the PG grade of the binder used in the HMA. The ANN models can be used as an alternative to the MEPDG default predictions (Level 3 analysis) and to develop the master curves and determine the parameters needed for a Level 1 MEPDG analysis. In summary, both the Level 2 and Level 3 viscosity-based model results demonstrated strong correlations to measured values, indicating that either would be a suitable alternative to dynamic modulus laboratory testing.
The new MEPDG design methodology is the future of pavement design and research in North America. Current MEPDG analysis practices across the country use default inputs for the dynamic modulus. However, dynamic modulus laboratory characterization of asphalt mixes across Canada is time-consuming and not very cost-effective. This thesis has shown that Level 2 and Level 3 viscosity-based ANN predictions can be used in order to perform a Level 1 MEPDG analysis. Further development and use of ANN models in dynamic modulus prediction have the potential to provide many benefits.
|
814 |
Data Quality By Design: A Goal-oriented Approach
Jiang, Lei 13 August 2010
A successful information system is one that meets its design goals. Expressing these goals and subsequently translating them into a working solution is a major challenge for information systems engineering. This thesis adopts the concepts and techniques from goal-oriented (software) requirements engineering research for conceptual database design, with a focus on data quality issues. Based on a real-world case study, a goal-oriented process is proposed for database requirements analysis and modeling. It spans from analysis of high-level stakeholder goals to detailed design of a conceptual database schema. This process is then extended specifically for dealing with data quality issues: data of low quality may be detected and corrected by performing various quality assurance activities; to support these activities, the schema needs to be revised by accommodating additional data requirements. The extended process therefore focuses on analyzing and modeling quality assurance data requirements.
A quality assurance activity supported by a revised schema may involve manual work and/or rely on some automatic techniques, which often depend on the specification and enforcement of data quality (DQ) rules. To address the constraint aspect in conceptual database design, data quality rules are classified according to a number of domain- and application-independent properties. This classification can be used to guide rule designers and to facilitate building of a rule repository. A quantitative framework is then proposed for measuring and comparing DQ rules according to one of these properties, effectiveness; this framework relies on derivation of formulas that represent the effectiveness of DQ rules under different probabilistic assumptions. A semi-automatic approach is also presented to derive these effectiveness formulas.
|
815 |
Using System Structure and Semantics for Validating and Optimizing Performance of Multi-tier Storage Systems
Soundararajan, Gokul 01 September 2010
Modern persistent storage systems must balance two competing imperatives: they must meet strict application-level performance goals and they must reduce operating costs. The current techniques, manual tuning by administrators or over-provisioning of resources, are either time-consuming or expensive. Therefore, to reduce the costs of management, automated performance-tuning solutions are needed.
To address this need, we develop and evaluate algorithms centered around the key thesis that a holistic semantic-aware view of the application and system is needed for automatically tuning and validating the performance of multi-tier storage systems. We obtain this global system view by leveraging structural and semantic information available at each tier and by making this information available to all tiers. Specifically, we develop two key building blocks: (i) context-awareness, where information about the application structure and semantics is exchanged between the tiers, and (ii) dynamic performance models that use the structure of the system to build lightweight resource-to-performance mappings quickly. We implement a prototype storage system, called Akash, based on commodity components. This prototype enables us to study all of the above scenarios in a realistic rendering of a modern multi-tier storage system. We also develop a runtime tool, Dena, to analyze the performance and behaviour of multi-tier server systems.
We apply these tools and techniques in three real-world scenarios. First, we leverage application context-awareness at the storage server in order to improve the performance of I/O prefetching. Tracking application access patterns per context enables us to improve the prediction accuracy for future access patterns over existing algorithms, where the high interleaving of I/O accesses from different contexts makes access patterns hard to recognize. Second, we build and leverage dynamic performance models for resource allocation, providing consistent and predictable performance corresponding to pre-determined application goals. We show that our dynamic resource allocation algorithms minimize the interference effects between e-commerce applications sharing a common infrastructure. Third, we introduce a high-level paradigm for interactively validating system performance by the system administrator. The administrator leverages existing performance models and other semantic knowledge about the system in order to discover bottlenecks and other opportunities for performance improvements. Our evaluation shows that our techniques enable significant improvements in performance over current approaches.
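The first scenario, per-context tracking of access patterns, can be illustrated with a small sketch. This is not the prefetching algorithm from the thesis: the `ContextPrefetcher` class, its parameters, and the simple sequential-run detector are assumptions chosen only to show why separating interleaved streams by context makes a pattern visible.

```python
from collections import defaultdict


class ContextPrefetcher:
    """Detects sequential runs separately for each application context."""

    def __init__(self, min_run=2, depth=2):
        self.min_run = min_run              # consecutive hits needed to trigger
        self.depth = depth                  # number of blocks to prefetch
        self.last_block = {}                # context -> last block accessed
        self.run_length = defaultdict(int)  # context -> current run length

    def access(self, context, block):
        """Record an access and return the blocks to prefetch (maybe none)."""
        prev = self.last_block.get(context)
        if prev is not None and block == prev + 1:
            self.run_length[context] += 1
        else:
            self.run_length[context] = 0
        self.last_block[context] = block
        if self.run_length[context] >= self.min_run:
            return [block + i for i in range(1, self.depth + 1)]
        return []
```

If two contexts each read sequentially but their requests arrive interleaved, a detector that sees only the merged stream never observes two consecutive blocks, whereas the per-context detector recognizes both runs.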
|
816 |
Investigating the Process of Developing a KDD Model for the Classification of Cases with Cardiovascular Disease Based on a Canadian Database
Liu, Chenyu January 2012
Medicine and health are information-intensive fields in which data volume has been increasing constantly. In order to make full use of the data, the technique of Knowledge Discovery in Databases (KDD) has been developed as a comprehensive pathway to discover valid and unsuspected patterns and trends that are both understandable and useful to data analysts.
The present study aimed to investigate the entire KDD process of developing a classification model for cardiovascular disease (CVD) from a Canadian dataset for the first time. The research data source was the Canadian Heart Health Database, which contains 265 easily collected variables and 23,129 instances from ten Canadian provinces. Many practical issues involved in different steps of the integrated process were addressed, and possible solutions were suggested based on the experimental results. Five specific learning schemes representing five distinct KDD approaches were employed, as they had never been compared with one another. In addition, two improvement approaches, cost-sensitive learning and ensemble learning, were also examined. The performance of the developed models was measured in many aspects. The data set was prepared through data cleaning and missing value imputation. Three pairs of experiments demonstrated that dataset balancing and outlier removal had a positive influence on the classifier, but variable normalization was not helpful. Three combinations of subset generation method and evaluation function were tested in the variable subset selection phase, and the combination of Best-First search and Correlation-based Feature Selection showed comparable performance and was retained for its other benefits.
Among the five learning schemes investigated, the C4.5 decision tree achieved the best performance on the classification of CVD, followed by Multilayer Feed-forward Network, K-Nearest Neighbor, Logistic Regression, and Naïve Bayes. Cost-sensitive learning, exemplified by the MetaCost algorithm, failed to outperform the single C4.5 decision tree when varying the cost matrix from 5:1 to 1:7. In contrast, the models developed from ensemble modeling, especially the AdaBoost M1 algorithm, outperformed other models.
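To make the role of the cost matrix concrete, the sketch below shows the minimum-expected-cost decision rule that cost-sensitive methods such as MetaCost build on. It is not the study's implementation: the function name, the matrix layout, and the reading of "5:1" as the ratio of a missed case to a false alarm are assumptions for illustration.

```python
def min_cost_class(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : cost of predicting class i when the true class is j
    """
    expected = [
        sum(c_ij * p_j for c_ij, p_j in zip(row, probs))
        for row in cost
    ]
    return min(range(len(expected)), key=expected.__getitem__)


# Hypothetical matrix: missing a CVD case (predict 0, truth 1) costs 5,
# while a false alarm (predict 1, truth 0) costs 1.
COST_5_TO_1 = [[0, 5],
               [1, 0]]

# Uniform costs reduce the rule to picking the most probable class.
COST_UNIFORM = [[0, 1],
                [1, 0]]
```

With a 30% estimated probability of CVD, the uniform matrix predicts "no CVD", while the 5:1 matrix predicts "CVD" because the expected cost of a missed case dominates.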
Although the model with the best performance might be suitable for CVD screening in the general Canadian population, it is not ready for use in practice. I propose some criteria to improve the further evaluation of the model. Finally, I describe some of the limitations of the study and propose potential solutions to address such limitations throughout the KDD process. Such possibilities should be explored in further research.
|
817 |
SPIRAL CONSTRUCTION OF SYNTACTICALLY ANNOTATED SPOKEN LANGUAGE CORPUS
Inagaki, Yasuyoshi, Kawaguchi, Nobuo, Matsubara, Shigeki, Ohno, Tomohiro 26 October 2003
No description available.
|
818 |
Flexible Monitoring of Storage I/O
Benke, Tim 17 June 2009
For any computer system, monitoring its performance is vital to understanding and fixing problems and performance bottlenecks. In this work, we present the architecture and implementation of a system for monitoring storage devices that serve virtual machines. In contrast to existing approaches, our system is more flexible because it employs a query language that can capture both specific and detailed information on I/O transfers. Our monitoring solution therefore provides users with enough statistics to find and solve problems without overwhelming them with too much information. Our system monitors I/O activity in virtual machines and supports basic distributed query processing. Experiments show the performance overhead of the prototype implementation to be acceptable in many realistic settings.
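The idea of querying I/O activity, rather than logging everything, can be sketched as a filter followed by a grouped aggregate. The thesis's actual query language is not shown in this abstract, so the `IOEvent` fields and the `query` function below are hypothetical stand-ins.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IOEvent:
    vm: str            # virtual machine that issued the request
    op: str            # "read" or "write"
    nbytes: int        # transfer size in bytes
    latency_ms: float  # observed latency in milliseconds


def query(events, where, group_by):
    """Filter events with `where`, then aggregate each `group_by` group."""
    groups = defaultdict(list)
    for ev in events:
        if where(ev):
            groups[group_by(ev)].append(ev)
    return {
        key: {
            "count": len(evs),
            "bytes": sum(e.nbytes for e in evs),
            "mean_latency_ms": sum(e.latency_ms for e in evs) / len(evs),
        }
        for key, evs in groups.items()
    }
```

For example, `query(events, where=lambda e: e.op == "write", group_by=lambda e: e.vm)` reports write statistics per virtual machine and ignores all reads.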
|
819 |
A Web-Based Database Application as an Analysis Tool for Energy Use and Carbon Dioxide Emission
Turan, Biray Jr January 2009
The aim of this thesis project was to migrate an existing Excel-based application, used to analyze the energy use and carbon dioxide emissions of companies, to a web-based application. Specific development questions concerned which software development process, solution stack, and user interface to use according to the company's needs. The spiral lifecycle model was chosen because it provides a clear view of the process and includes the concept of early prototypes. A solution stack based on Linux, Apache, PHP and MySQL was chosen because this approach met the company's requirements in terms of cost, security, support, and maintenance. As a result, the developed web-based system overcomes the problems normally found in Excel-based applications, such as application deployment and maintenance, and provides a more usable and richer user interface.
|