391
Automatic design of batch processing systems / Dwyer, Barry, 1938- January 1999 (has links)
Bibliography: p. 281-289. Electronic publication; Full text available in PDF format; abstract in HTML format. Electronic reproduction. [Australia] : Australian Digital Theses Program, 2001.
392
Learning from large data : Bias, variance, sampling, and learning curves / Brain, Damien, mikewood@deakin.edu.au January 2003 (has links)
One of the fundamental machine learning tasks is predictive classification. Given that organisations collect an ever-increasing amount of data, predictive classification methods must be able to handle large amounts of data effectively and efficiently. However, present requirements push existing algorithms to, and sometimes beyond, their limits, since many classification algorithms were designed when data sets of today's common sizes were beyond imagination.
This has led to a significant amount of research into ways of making classification learning algorithms more effective and efficient. Although substantial progress has been made, a number of key questions have not been answered.
This dissertation investigates two of these key questions. The first is whether large data sets require different types of algorithms from those currently employed. This is answered by analysing how the bias-plus-variance decomposition of predictive classification error changes as training set size increases. Experiments find that larger training sets do require different types of algorithms from those currently used. Some insight into the characteristics of suitable algorithms is provided, which may give direction for the development of future classification algorithms designed specifically for large data sets.
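
For readers unfamiliar with the decomposition, the bias-plus-variance behaviour referred to above can be estimated empirically by training the same learner on many resampled training sets and measuring, for each test point, how far the consensus prediction is from the truth (bias) and how much individual predictions scatter around that consensus (variance). The sketch below is a minimal illustration of that procedure, assuming a decision-tree learner, bootstrap resampling, and a Domingos-style 0-1 loss decomposition; none of these specifics are taken from the dissertation itself.

    # Empirical bias/variance estimate for 0-1 loss (Domingos-style decomposition).
    # Inputs are numpy arrays; learner and resampling scheme are illustrative assumptions.
    import numpy as np
    from collections import Counter
    from sklearn.tree import DecisionTreeClassifier

    def bias_variance(X_train, y_train, X_test, y_test, n_rounds=50, train_size=500, seed=0):
        rng = np.random.default_rng(seed)
        preds = np.empty((n_rounds, len(X_test)), dtype=y_train.dtype)
        for r in range(n_rounds):
            idx = rng.choice(len(X_train), size=train_size, replace=True)  # bootstrap training set
            model = DecisionTreeClassifier(random_state=r).fit(X_train[idx], y_train[idx])
            preds[r] = model.predict(X_test)
        # Main prediction = most frequent label per test point across the resampled runs.
        main = np.array([Counter(preds[:, j]).most_common(1)[0][0] for j in range(len(X_test))])
        # Bias: main prediction disagrees with the observed label (noise is ignored here).
        bias = np.mean(main != y_test)
        # Variance: individual predictions disagree with the main prediction.
        variance = np.mean(preds != main[None, :])
        return bias, variance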
The second question investigated is the role of sampling in machine learning with large data sets. Sampling has long been used to avoid scaling up algorithms to suit the size of the data set, by instead scaling down the data set to suit the algorithm. However, the costs of performing sampling have not been widely explored. Two popular sampling methods are compared with learning from all available data in terms of predictive accuracy, model complexity, and execution time. The comparison shows that sub-sampling generally produces models with accuracy close to, and sometimes greater than, that obtainable from learning with all available data. This result suggests that it may be possible to develop algorithms that take advantage of the sub-sampling methodology to reduce the time required to infer a model while sacrificing little if any accuracy.
Methods of improving effective and efficient learning via sampling are also investigated, and new sampling methodologies are proposed. These methodologies include using a varying proportion of instances to determine the next inference step and using a statistical calculation at each inference step to determine a sufficient sample size. Experiments show that using a statistical calculation of sample size can not only substantially reduce execution time but can do so with only a small loss, and occasional gain, in accuracy.
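
As an illustration of what such a statistical calculation might look like, a standard confidence-interval bound on an estimated proportion (for example an error rate or a class frequency) gives the number of instances needed before the estimate is trusted to a given tolerance. The sketch below uses the normal approximation to the binomial; the bound, tolerance, and confidence level are assumptions for illustration and are not claimed to be the dissertation's exact formulation.

    import math

    def sufficient_sample_size(p_hat, epsilon=0.01, z=1.96):
        """Sample size for which a proportion estimated as p_hat is within
        +/- epsilon of the true value at ~95% confidence (z = 1.96),
        using the normal approximation to the binomial."""
        p = min(max(p_hat, 1e-6), 1 - 1e-6)   # keep the variance term away from 0
        return math.ceil(z * z * p * (1 - p) / (epsilon * epsilon))

    # Example: if the current estimate of a class proportion is 0.2 and we want it
    # pinned down to +/- 1%, roughly 6147 instances suffice:
    # 1.96^2 * 0.2 * 0.8 / 0.01^2 ~= 6146.6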
One of the common uses of sampling is in the construction of learning curves. Learning curves are often used to attempt to determine the optimal training set size, one which maximally reduces execution time while not being detrimental to accuracy. An analysis of the performance of methods for detecting convergence of learning curves is performed, focusing on methods that calculate the gradient of the tangent to the curve. Given that such methods can be susceptible to local accuracy plateaus, an investigation into the frequency of local plateaus is also performed. It is shown that local accuracy plateaus are a common occurrence, and that ensuring a small loss of accuracy often results in greater computational cost than learning from all available data. These results cast doubt on the applicability of gradient-of-tangent methods for detecting convergence, and on the viability of learning curves for reducing execution time in general.
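
For concreteness, a gradient-of-tangent convergence test amounts to stopping a progressive-sampling run once the finite-difference slope between successive (training size, accuracy) points drops below a threshold, which is exactly the situation a local accuracy plateau can trigger prematurely. A minimal sketch under those assumptions (the threshold, geometric growth schedule, and scoring callback are illustrative, not taken from the dissertation):

    def converged(curve, slope_threshold=1e-5):
        """curve: list of (training_size, accuracy) pairs in increasing-size order.
        Returns True when the most recent finite-difference slope drops below the
        threshold -- the gradient-of-tangent test, which a local accuracy plateau
        can satisfy prematurely."""
        if len(curve) < 2:
            return False
        (n0, a0), (n1, a1) = curve[-2], curve[-1]
        return (a1 - a0) / (n1 - n0) < slope_threshold

    def progressive_sample(train_and_score, n_start=1000, growth=2.0, n_max=1_000_000):
        """Grow the sample geometrically and stop once the curve flattens.
        train_and_score(n) is a user-supplied callable returning accuracy on a
        fixed validation set after training on n instances (hypothetical helper)."""
        curve, n = [], n_start
        while n <= n_max:
            curve.append((n, train_and_score(n)))
            if converged(curve):
                break
            n = int(n * growth)
        return curve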
393
以資料庫為核心之超文件應用系統設計與開發 / Database-Centric Hypertext Applications Design and Development / 王漪萍, Wang, I-Ping. Unknown Date (has links)
As the World Wide Web has evolved from a simple delivery mechanism into a platform for complex business applications, Web-based business computing is already seen as a new competitive business weapon. The potential commercial payoff of the WWW leads more and more organizations and individuals to construct their own web sites and to develop business-to-business or business-to-consumer applications on the Internet. Designing Web applications has therefore become a serious issue. However, current methods of designing and modeling hypertext applications remain chaotic and ad hoc in nature, and some development approaches lack serious system analysis and design. In short, special methods and tools are needed to support the development of hypertext applications. This research therefore presents a structured approach, a step-by-step procedure for developing database-centric hypertext applications, and implements a tool-kit system prototype to support the hypertext application design process.
394
Towards the development of a defensive cyber damage and mission impact methodology / Fortson, Larry W., January 1900 (has links)
Thesis (M.S.)--Air Force Institute of Technology, 2007. / AFIT/GIR/ENV/07-M9. Title from title page of PDF document (viewed on: Nov. 29, 2007). "March 2007." Includes bibliographical references (leaves 226-237).
395
An empirical study of the use of conceptual models for mutation testing of database application programs / Wu, Yongjian, January 2006 (has links)
Thesis (M. Phil.)--University of Hong Kong, 2007. / Title proper from title frame. Also available in printed format.
396
Document management and retrieval for specialised domains: an evolutionary user-based approach / Kim, Mihye, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2003 (has links)
Browsing marked-up documents by traversing hyperlinks has become probably the most important means by which documents are accessed, both via the World Wide Web (WWW) and organisational Intranets. However, there is a pressing demand for document management and retrieval systems to deal appropriately with the massive number of documents available. There are two classes of solution: general search engines, whether for the WWW or an Intranet, which make little use of specific domain knowledge; and hand-crafted specialised systems, which are costly to build and maintain.

The aim of this thesis was to develop a document management and retrieval system suitable for small communities as well as individuals in specialised domains on the Web. The aim was to allow users to easily create and maintain their own organisation of documents while ensuring continual improvement in the retrieval performance of the system as it evolves. The system developed is based on the free annotation of documents by users and is browsed using the concept lattice of Formal Concept Analysis (FCA). A number of annotation support tools were developed to aid the annotation process so that a suitable system evolved.

Experiments were conducted in using the system to assist in finding staff and student home pages at the School of Computer Science and Engineering, University of New South Wales. Results indicated that the annotation tools provided a good level of assistance, so that documents were easily organised, and that a lattice-based browsing structure that evolves in an ad hoc fashion provided good efficiency in retrieval performance. An interesting result suggested that although an established external taxonomy can be useful in proposing annotation terms, users appear to be very selective in their use of the terms proposed. Results also supported the hypothesis that the concept lattice of FCA helped take users beyond a narrow search to find other useful documents. In general, lattice-based browsing was considered a more helpful method than Boolean queries or hierarchical browsing for searching a specialised domain.

We conclude that the concept lattice of Formal Concept Analysis, supported by annotation techniques, is a useful way of supporting the flexible open management of documents required by individuals, small communities and specialised domains. It seems likely that this approach can be readily integrated with other developments, such as further improvements in search engines and the use of semantically marked-up documents, and provide a unique advantage in supporting autonomous management of documents by individuals and groups, in a way that is closely aligned with the autonomy of the WWW.
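
To make the lattice-based browsing concrete: in Formal Concept Analysis every intent is an intersection of the annotation sets of some group of documents (together with the full term set), and the extent of an intent is the set of documents carrying all of its terms. The sketch below computes the concepts of a tiny annotation context by brute force; the documents and terms are invented for illustration, and a real system would use a proper FCA algorithm such as NextClosure rather than this naive closure loop.

    def formal_concepts(context):
        """context: dict mapping document -> set of annotation terms.
        Returns (extent, intent) pairs, i.e. the nodes of the concept lattice."""
        object_intents = [frozenset(t) for t in context.values()]
        all_terms = frozenset().union(*object_intents) if object_intents else frozenset()
        intents = set(object_intents) | {all_terms}
        changed = True
        while changed:                      # close the set of intents under intersection
            changed = False
            for a in list(intents):
                for b in list(intents):
                    c = a & b
                    if c not in intents:
                        intents.add(c)
                        changed = True
        def extent(intent):
            return frozenset(d for d, terms in context.items() if intent <= set(terms))
        return [(extent(i), i) for i in sorted(intents, key=len)]

    # Toy context: home pages annotated with free terms (invented example).
    pages = {
        "alice": {"lecturer", "databases", "xml"},
        "bob":   {"student", "databases"},
        "carol": {"lecturer", "machine learning"},
    }
    for ext, intent in formal_concepts(pages):
        print(sorted(ext), "<->", sorted(intent))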
397
Similarity-based real-time concurrency control protocols / Lai, Chih, 29 January 1999 (has links)
Serializability is unnecessarily strict for real-time systems because most transactions in such systems occur periodically and changes among data values over a few consecutive periods are often insignificant. Hence, data values produced within a short interval can be treated as if they are "similar" and interchangeable. This notion of similarity allows higher concurrency than serializability, and the increased concurrency may help more transactions to meet their deadlines. The similarity stack protocol (SSP) proposed in [25, 26] utilizes the concept of similarity. The rules of SSP are constructed based on prior knowledge of the worst-case execution time (WCET) and data requirements of transactions. As a result, SSP rules need to be re-constructed each time a real-time application is changed. Moreover, if the WCET and data requirements of transactions are over-estimated, the benefits provided by similarity can be quickly overshadowed, causing feasible schedules to be rejected.

The advantages of similarity and the drawbacks of SSP motivate us to design other similarity-based protocols that can better utilize similarity without relying on any prior information. Since optimistic approaches usually do not require prior information about transactions, we explore the idea of integrating optimistic approaches with similarity in this thesis. We develop three different protocols based on either forward-validation or backward-validation mechanisms. We then compare the implementation overheads, number of transaction restarts, length of transaction blocking time, and predictability of these protocols. One important characteristic of our design is that, when similarity is not applicable, our protocols can still accept serializable histories. We also study how to extend our protocols to handle aperiodic transactions and data freshness. Finally, a set of simulation experiments is conducted to compare the deadline miss rates between SSP and one of our protocols. / Graduation date: 1999
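
To illustrate the general idea rather than the specific protocols developed in the thesis, an optimistic backward-validation check can be relaxed with similarity: a committing transaction conflicts with an already-committed writer only if the version it read and the version subsequently written are not similar, for example because their write times differ by more than a per-item similarity bound. The following sketch rests entirely on those assumptions; the data structures and the timestamp-based similarity test are illustrative.

    from dataclasses import dataclass, field

    @dataclass
    class DataItem:
        value: float
        write_ts: float          # time of the last committed write
        similarity_bound: float  # versions written within this interval are interchangeable

    @dataclass
    class Transaction:
        read_versions: dict = field(default_factory=dict)  # item name -> write_ts seen at read time

    def similar(item: DataItem, read_ts: float) -> bool:
        # Two versions of the same item are "similar" if their write times are close enough.
        return abs(item.write_ts - read_ts) <= item.similarity_bound

    def backward_validate(txn: Transaction, db: dict) -> bool:
        """Optimistic backward validation relaxed by similarity: the transaction may
        commit unless some item it read has since been overwritten with a version that
        is NOT similar to the one it read. With similarity_bound == 0 this degenerates
        to the usual serializable check."""
        return all(similar(db[name], read_ts) for name, read_ts in txn.read_versions.items())

    # Example: sensor x is rewritten every 50 ms but values within 200 ms are interchangeable.
    db = {"x": DataItem(value=21.4, write_ts=1000.120, similarity_bound=0.200)}
    txn = Transaction(read_versions={"x": 1000.050})  # read an older, but similar, version
    print(backward_validate(txn, db))                 # True -> commit despite the intervening write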
398
Knowledge Integration to Overcome Ontological Heterogeneity: Challenges from Financial Information Systems / Firat, Aykut; Madnick, Stuart E.; Grosof, Benjamin. 01 1900 (has links)
The shift towards global networking brings with it many opportunities and challenges. In this paper, we discuss key technologies for achieving global semantic interoperability among heterogeneous information systems, including both traditional and web data sources. In particular, we focus on the importance of this capability and on technologies we have designed to overcome ontological heterogeneity, a common type of disparity in financial information systems. Our approach to representing and reasoning with ontological heterogeneities in data sources is an extension of the Context Interchange (COIN) framework, a mediator-based approach for achieving semantic interoperability among heterogeneous sources and receivers. We also analyze the issue of ontological heterogeneity in the context of source selection, and offer a declarative solution that combines symbolic solvers and mixed integer programming techniques in a constraint logic programming framework. Finally, we discuss how these techniques can be coupled with emerging Semantic Web technologies and standards such as Web-Services, DAML+OIL, and RuleML, to offer scalable solutions for global semantic interoperability. We believe that the synergy of database integration and Semantic Web research can make significant contributions to the financial knowledge integration problem, which has implications for financial services and many other e-business tasks. / Singapore-MIT Alliance (SMA)
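
As a simplified illustration of the kind of ontological heterogeneity such a mediator resolves, consider two sources reporting the same financial quantity under different contexts, say scale factor and currency, while the receiver expects plain US dollars; a context mediator lifts each reported value out of its source context using declared context metadata. The source names, context values, and exchange rates below are invented for illustration and are not taken from the paper, and the real COIN framework expresses such conversions declaratively rather than in procedural code.

    # Hypothetical contexts: each source declares the scale factor and currency of its figures.
    contexts = {
        "source_a": {"scale": 1_000_000, "currency": "USD"},  # reports in millions of USD
        "source_b": {"scale": 1_000,     "currency": "GBP"},  # reports in thousands of GBP
        "receiver": {"scale": 1,         "currency": "USD"},  # analyst wants plain USD
    }
    fx_to_usd = {"USD": 1.0, "GBP": 1.25}  # illustrative conversion rates

    def mediate(value, source, receiver="receiver"):
        """Lift a reported value out of its source context into the receiver context."""
        src, dst = contexts[source], contexts[receiver]
        in_usd = value * src["scale"] * fx_to_usd[src["currency"]]
        return in_usd / (dst["scale"] * fx_to_usd[dst["currency"]])

    # The "same" revenue figure from two sources becomes directly comparable:
    print(mediate(5.2, "source_a"))    # 5.2 million USD  -> 5,200,000 USD
    print(mediate(4_160, "source_b"))  # 4,160 thousand GBP -> 5,200,000 USD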
399
Multimedia Data Mining and Retrieval for Multimedia Databases Using Associations and Correlations / Lin, Lin, 23 June 2010 (has links)
With the explosion in the complexity and amount of pervasive multimedia data, there is high demand for multimedia services and applications that allow people in various areas to easily access and distribute multimedia data. Faced with abundant multimedia resources but inefficient and rather old-fashioned keyword-based information retrieval approaches, a content-based multimedia information retrieval (CBMIR) system is required to (i) reduce the dimension space for storage saving and computation reduction; (ii) advance multimedia learning methods to accurately identify target semantics, bridging low-level/mid-level features and high-level semantics; and (iii) effectively search media content for dynamic media delivery and enable extensive applications to be media-type driven.

This research mainly focuses on a multimedia data mining and retrieval system for multimedia databases, addressing main challenges such as data imbalance, data quality, the semantic gap, user subjectivity and searching issues. Therefore, a novel CBMIR system is proposed in this dissertation. The proposed system utilizes both the association rule mining (ARM) technique and the multiple correspondence analysis (MCA) technique, taking into account both pattern discovery and statistical analysis.

First, media content is represented by global and local low-level and mid-level features and stored in the multimedia database. Second, a data filtering component is proposed to improve data quality and reduce data imbalance; specifically, the proposed filtering step is able to vertically select features and horizontally prune instances in multimedia databases. Third, a new learning and classification method mining weighted association rules is proposed in the retrieval system; the MCA-based correlation is used to generate and select the weighted N-feature-value pair rules, where N varies from one to many. Fourth, a ranking method independent of classifiers is proposed to sort the retrieved results and put the most interesting ones at the top of the browsing list. Finally, a user interface is implemented in the CBMIR system that allows the user to choose a concept of interest, searches media based on the target concept, ranks the retrieved segments using the proposed ranking algorithm, and then displays the top-ranked segments to the user.

The system is evaluated on various high-level semantics from the TRECVID benchmark data sets. TRECVID sound and vision data is a large data set that includes various types of videos and has very rich semantics. Overall, the proposed system achieves promising results in comparison with other well-known methods. Moreover, experiments comparing each component with other well-known algorithms are conducted; the results show that all proposed components improve the functionality of the CBMIR system, and that the proposed system achieves effectiveness, robustness and efficiency for a high-dimensional multimedia database.
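
A rough sketch of the weighted feature-value-pair idea follows: discretise each low-level feature into bins (feature-value pairs), weight each pair by how strongly it is associated with the target concept in training data, and rank unseen instances by the summed weights of the pairs they contain. This uses a simple lift-style association score as a stand-in for the MCA-based correlation described above and builds only single-pair rules (N = 1); the bin count, scoring function, and synthetic data are assumptions for illustration.

    import numpy as np

    def discretise(X, n_bins=3):
        """Map each column of X to bin indices (feature-value pairs)."""
        edges = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1]) for j in range(X.shape[1])]
        return np.stack([np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])], axis=1), edges

    def rule_weights(bins, y):
        """Weight for each (feature, bin) pair: how much more often it occurs in
        positive instances than overall (a simple lift-style association score)."""
        weights, base_rate = {}, y.mean()
        for j in range(bins.shape[1]):
            for b in np.unique(bins[:, j]):
                mask = bins[:, j] == b
                weights[(j, int(b))] = y[mask].mean() - base_rate  # > 0 favours the concept
        return weights

    def score(bins_row, weights):
        return sum(weights.get((j, int(b)), 0.0) for j, b in enumerate(bins_row))

    # Usage sketch on synthetic data: rank instances for a target concept by summed weights.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 4)), rng.integers(0, 2, size=200)
    bins, edges = discretise(X)
    w = rule_weights(bins, y)
    ranking = np.argsort([-score(row, w) for row in bins])  # most concept-like first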
400
A data management framework for secure and dependable data grid / Tu, Manghui, January 2006 (has links)
Thesis (Ph. D.)--University of Texas at Dallas, 2006. / Includes vita. Includes bibliographical references (leaves 231-251).