61

Data quality and data cleaning in database applications

Li, Lin January 2012 (has links)
Today, data plays an important role in people's daily activities. With the help of database applications such as decision support systems and customer relationship management (CRM) systems, useful information or knowledge can be derived from large quantities of data. However, investigations show that many such applications fail to work successfully. There are many possible causes of failure, such as poor system infrastructure design or poor query performance, but nothing is more certain to yield failure than a lack of concern for data quality. High-quality data is a key to today's business success. The quality of any large real-world data set depends on a number of factors, among which the source of the data is often the crucial one. It is now recognized that an inordinate proportion of the data in most data sources is dirty. Obviously, a database application with a high proportion of dirty data is not reliable for data mining or for deriving business intelligence, and the quality of decisions made on the basis of such business intelligence is equally unreliable. To ensure high-quality data, enterprises need processes, methodologies and resources to monitor and analyze the quality of their data, and methodologies for preventing, detecting and repairing dirty data. This thesis focuses on improving data quality in database applications with the help of current data cleaning methods. It provides a systematic and comparative description of the research issues related to improving the quality of data, and addresses a number of research issues related to data cleaning. In the first part of the thesis, the literature on data cleaning and data quality is reviewed and discussed. Building on this review, a rule-based taxonomy of dirty data is proposed in the second part of the thesis. The proposed taxonomy not only summarizes the most common dirty data types but also forms the basis of the proposed method for solving the Dirty Data Selection (DDS) problem during the data cleaning process. This supports the design of the DDS process in the data cleaning framework described in the third part of the thesis. The framework retains the most appealing characteristics of existing data cleaning approaches, and improves the efficiency and effectiveness of data cleaning as well as the degree of automation of the cleaning process. Finally, a set of approximate string matching algorithms is studied and experimental work is undertaken. Approximate string matching, which has been studied for many years, is an important part of many data cleaning approaches. The experimental work in the thesis confirms that there is no single best technique: the characteristics of the data, such as the size of a dataset, its error rate, the type of strings it contains and even the type of typo within a string, have a significant effect on the performance of the selected techniques. These characteristics also affect the selection of suitable threshold values for the chosen matching algorithms. The findings from these experiments underpin the design of the 'algorithm selection mechanism' in the data cleaning framework, which enhances the performance of data cleaning systems in database applications.
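As a concrete illustration of the approximate string matching discussed in this abstract, the following is a minimal sketch of threshold-based duplicate detection using edit distance. It is not the thesis's own algorithm selection mechanism; the normalisation, the 0.2 threshold and the example records are assumptions chosen for illustration, and the abstract's point is precisely that suitable thresholds depend on the data.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def is_probable_duplicate(s1: str, s2: str, threshold: float = 0.2) -> bool:
    """Flag two strings as likely duplicates if their normalised edit
    distance falls below a dataset-dependent threshold."""
    if not s1 and not s2:
        return True
    dist = edit_distance(s1.lower(), s2.lower())
    return dist / max(len(s1), len(s2)) <= threshold

# Example: two dirty customer records that differ by a typo.
print(is_probable_duplicate("Jonh Smith", "John Smith"))   # True
print(is_probable_duplicate("John Smith", "Jane Smyth"))   # False
```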
62

A holistic semantic based approach to component specification and retrieval

Li, Chengpu January 2012 (has links)
Component-Based Development (CBD) has been broadly used in software development as it enhances productivity and reduces the costs and risks involved in systems development. It has become a well-understood and widely used technology for developing not only large enterprise applications but a whole spectrum of software applications, as it offers fast and flexible development. However, driven by the continuous expansion of software applications, the increase in component varieties and sizes, and the evolution from local to global component repositories, the so-called component mismatch problem has become an even more severe hurdle for component specification and retrieval. This problem not only prevents CBD from reaching its full potential but also hinders the acceptance of many existing component repositories. To overcome this problem, existing approaches have employed a variety of technologies to support better component specification and retrieval, ranging from the early syntax-based (traditional) approaches to more recent semantic-based approaches. Although different technologies have been proposed to describe component specifications and/or user queries accurately, existing semantic-based approaches still fail to achieve the goals desired for present-day component reuse: precision, automation, semantic awareness and domain capability. This thesis proposes the MVICS-based approach, aimed at holistic, semantic-based and adaptation-aware component specification and retrieval. As its foundation, a Multiple-Viewed and Interrelated Component Specification (MVICS) ontology model is first developed for component specification and repository building. The MVICS model provides an ontology-based architecture for specifying components from a range of perspectives; it integrates knowledge of Component-Based Software Engineering (CBSE) and supports ontology evolution to reflect continuing developments in CBD and components. A formal definition of the MVICS model is presented, which ensures the rigour of the model and supports a high level of automation in retrieval. Furthermore, the MVICS model offers a smooth mechanism for integration with domain-related software system ontologies. Such integration enhances the function and application scope of the MVICS model by bringing more domain semantics into component specification and retrieval. A further feature of the proposed approach is that the effect of possible component adaptation is propagated to the related components. Finally, a comprehensive profile of the retrieved components presents the search results to the user, from a summary down to satisfied and unsatisfied discrepancy details. These features are well integrated, enabling a holistic view of semantic-based component specification and retrieval. A prototype tool was developed to exercise the power of the MVICS model in expressing semantics and automating component specification and retrieval; the tool implements the complete component search process. Three case studies have been undertaken to illustrate and evaluate the usability and correctness of the approach in terms of supporting accurate component specification and retrieval, seamless linkage with a domain ontology, adaptive component suggestion and a comprehensive profile of the resulting components. A conclusion is drawn from an analysis of the feedback from the case studies, which shows that the proposed approach can be deployed in real-life industrial development. The benefits of MVICS include not only improved component search precision and recall and reduced development time and repository maintenance effort, but also decreased human intervention in CBD.
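The abstract stays at the architectural level, so, purely as an illustration of the general idea of ranking candidate components against a multi-faceted query, here is a minimal sketch. The facet names, weights and catalogue entries are invented for the example and are not part of the MVICS model or its ontology.

```python
# Illustrative only: scoring candidate components against a faceted query.
# The facets, weights and example data are assumptions for this sketch,
# not the MVICS ontology itself.
from typing import Dict, Set

Facets = Dict[str, Set[str]]   # facet name -> set of terms

def match_score(query: Facets, component: Facets,
                weights: Dict[str, float]) -> float:
    """Weighted overlap of query terms with a component specification."""
    score = 0.0
    for facet, wanted in query.items():
        have = component.get(facet, set())
        if wanted:
            score += weights.get(facet, 1.0) * len(wanted & have) / len(wanted)
    return score

catalogue = {
    "PdfRenderer": {"function": {"render", "pdf"}, "platform": {"java"}},
    "CsvParser":   {"function": {"parse", "csv"},  "platform": {"java", "dotnet"}},
}
query = {"function": {"render", "pdf"}, "platform": {"java"}}
weights = {"function": 2.0, "platform": 1.0}

ranked = sorted(catalogue.items(),
                key=lambda kv: match_score(query, kv[1], weights),
                reverse=True)
print([name for name, _ in ranked])   # PdfRenderer ranked first
```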
63

Novel hyper-heuristics applied to the domain of bin packing

Sim, Kevin January 2014 (has links)
Central to the ideology behind hyper-heuristic research is the desire to increase the level of generality of heuristic procedures so that they can be applied easily to a wide variety of problems and produce solutions of adequate quality within practical timescales. This thesis examines hyper-heuristics within a single problem domain, bin packing, and considers the benefits to be gained from selecting or generating heuristics for large problem sets with widely differing characteristics. Novel implementations of both selective and generative hyper-heuristics are proposed. The former approach attempts to map the characteristics of a problem to the heuristic that best solves it, while the latter uses Genetic Programming techniques to automate the heuristic design process. Results obtained using the selective approach show that solution quality is improved significantly compared with the best single heuristic applied to large sets of diverse problem instances. While reinforcing the benefits to be gained by selecting from a range of heuristics, the study also highlights the lack of diversity in human-designed algorithms. Using Genetic Programming to automate the heuristic design process allows both single heuristics and collectives of heuristics to be generated, which are shown to perform significantly better than their human-designed counterparts. The thesis concludes by combining the selective and generative hyper-heuristic approaches into a novel immune-inspired system in which heuristics covering distinct areas of the problem space are generated. The system is shown to have a number of advantages over similar cooperative approaches in terms of its plasticity, efficiency and long-term memory. Extensive testing of all the hyper-heuristics developed, on large sets of both benchmark and newly generated problem instances, reinforces the utility of hyper-heuristics in their goal of producing fast, understandable procedures that give good-quality solutions for a range of problems with widely varying characteristics.
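To make the selective hyper-heuristic idea concrete, the toy sketch below runs two classic constructive bin-packing heuristics on an instance and keeps whichever uses fewest bins. The heuristics, instance and capacity are assumptions for illustration, not the algorithms or selection mechanism developed in the thesis.

```python
def first_fit(items, capacity):
    """Place each item into the first bin with room, opening a new bin if none."""
    bins = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

def first_fit_decreasing(items, capacity):
    """First fit applied to items sorted largest first."""
    return first_fit(sorted(items, reverse=True), capacity)

HEURISTICS = {"FF": first_fit, "FFD": first_fit_decreasing}

def select_best(items, capacity):
    """Apply each heuristic and return the name and packing using fewest bins."""
    results = {name: h(items, capacity) for name, h in HEURISTICS.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]

items = [4, 8, 1, 4, 2, 1, 7, 3]
name, packing = select_best(items, capacity=10)
print(name, len(packing), packing)
```

A full selective hyper-heuristic would replace the exhaustive comparison with a learned mapping from instance features to the heuristic expected to do best.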
64

Evaluating book and hypertext : analysis of individual differences

Wilkinson, Simon January 2001 (has links)
This thesis investigates the usability of an 800-page textbook compared with a hypertext version containing the same information. Hypertext is an interesting new medium in that it is seen as possessing advantages both as a delivery technology, influencing the cost of and access to information, and as a design technology, influencing student achievement. Unfortunately, the proclamations of its advocates have usually exceeded the empirical findings. Moreover, rapid advances in both hardware and software necessitate frequent re-evaluation of contemporary hypertext. In addition to an up-to-date evaluation of the relative performance of book and hypertext in supporting set tasks, the research reported in this thesis also sought to analyse the potential role that individual differences could play in media evaluation. To do this, the cognitive styles and spatial ability of 57 postgraduate student volunteers, from two computing-related diplomas, were measured. Half the subjects were then randomly assigned to a Book group and half to a Hypertext group. Each group was allocated the same amount of time to complete two separate tasks: 1) short-answer questions analysing the basic information retrieval potential of each medium, and, one week later, 2) four open-ended short essay questions. Surprisingly, subjects assigned to the Book group performed significantly better than those assigned to the Hypertext group on Task 1. The mean academic performance of subjects (the mean mark obtained over the eight modules of their diploma) predicted most of the variance in Task 1 performance for both groups. However, for Task 2, the cognitively more demanding exercise, none of the measured individual differences could significantly predict subjects' scores. Another surprising finding, given that all subjects were studying computing, was that the amount of prior computing experience approached significance for those subjects assigned to Hypertext in Task 1. Given the ease with which this particular individual difference could be manipulated, it was decided to run a second experiment employing subjects with more experience of the hypertext system used. The results from this second cohort showed no significant differences in score for either task between Book and Hypertext. However, as the more qualitative data from a questionnaire showed, a large number of different factors and issues contribute to the ultimate acceptability of one medium compared with the other. The thesis concludes by recommending a number of possible avenues for future research into the role hypertext has to play in the construction of hyperlibraries and Virtual Learning Environments.
65

Generative aspect-oriented component adaptation

Feng, Yankui January 2008 (has links)
Due to the availability of components and the diversity of target applications, mismatches between pre-qualified existing components and the particular reuse context of an application are often inevitable and have been a major hurdle to component reusability and successful composition. Although component adaptation has acted as a key solution for eliminating these mismatches, existing practices are either capable of adaptation only at the interface level or require too much intervention from software engineers. Another weakness of existing approaches is the lack of reuse of component adaptation knowledge. Aspect-Oriented Programming (AOP) is a methodology that provides separation of crosscutting concerns by introducing a new unit of modularization, the Aspect, which crosscuts other modules. In this way, all the complexity associated with the crosscutting concerns is isolated into the Aspects, and the final system becomes easier to design, implement and maintain. The nature of AOP makes it particularly suitable for addressing non-functional mismatches in component-based systems. However, current AOP techniques are not powerful enough for efficient component adaptation owing to several weaknesses, including the limited reusability of Aspects, platform-specific Aspects, and naive weaving processes. Existing AOP technology therefore needs to be extended before it can be used for efficient component adaptation. This thesis presents a highly automated approach to component adaptation through product-line-based Generative Aspect-Oriented Component adaptation. In the approach, adaptation knowledge is captured in Aspects intended to be reusable in a variety of adaptation circumstances. Automatic generation of adaptation Aspects is developed as a key technology to improve both the level of automation of the approach and the reusability of adaptation knowledge. This generation is realised through a two-dimensional Aspect model that incorporates software product line and generative programming techniques. The adaptability and automation of the approach are achieved in an Aspect-oriented component adaptation framework by generating and then applying the adaptation Aspects, under a designed weaving process, according to specific adaptation requirements. To extend the adaptation power of AOP, advanced Aspect weaving processes have been developed with the support of an enhanced Aspect weaver. To promote the reusability of adaptation Aspects, an expandable repository of reusable adaptation Aspects has been developed based on the proposed two-dimensional Aspect model. A prototype tool leverages the approach and automates the adaptation process. Case studies have been carried out to illustrate and evaluate the approach in terms of its capability to build highly reusable Aspects across various AOP platforms and to provide an advanced weaving process. In summary, the proposed approach applies Generative Aspect-Oriented Adaptation to targeted components to correct mismatch problems so that the components can be integrated into a target application easily. The automation of the adaptation process, the depth of the adaptation, and the reusability of adaptation knowledge are the main advantages of the approach.
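As a rough, language-neutral analogue of wrapping adaptation logic around a mismatched component without touching its source, the sketch below uses a Python decorator in place of a real Aspect weaver. It is not AspectJ, nor the thesis's generative weaving process, and every component, unit and parameter name is hypothetical.

```python
# A decorator acts as a small "adaptation aspect" woven around an existing
# component operation, converting units and logging calls without modifying
# the component's source. All names here are invented for illustration.
import functools, logging

logging.basicConfig(level=logging.INFO)

def adapt_units(factor):
    """Generated 'aspect': scale the first argument before the call proceeds."""
    def aspect(func):
        @functools.wraps(func)
        def woven(*args, **kwargs):
            logging.info("adapting call to %s", func.__name__)
            adapted = (args[0] * factor,) + args[1:]
            return func(*adapted, **kwargs)
        return woven
    return aspect

class LegacyFlowComponent:
    def set_flow_rate(self, litres_per_min):
        print(f"flow rate set to {litres_per_min} l/min")

component = LegacyFlowComponent()
# "Weave" the aspect: the client supplies cubic metres per hour instead.
component.set_flow_rate = adapt_units(1000 / 60)(component.set_flow_rate)
component.set_flow_rate(1.2)   # 1.2 m^3/h -> 20.0 l/min
```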
66

A software framework for the microscopic modelling of pedestrian movement

Kukla, Robert January 2007 (has links)
A town planner, faced with the task of designing attractive walking spaces, needs a tool that allows different designs to be compared in terms of their attractiveness as well as their effectiveness. PEDFLOW is an attempt to create such a tool. It is an agent-based, microscopic model of pedestrian flow in which virtual pedestrians navigate a virtual environment. On their way towards a goal, the agents, representing pedestrians, interact with features of the environment and with other agents. The microscopic, rule-based actions result in emergent behaviour that mimics that of real pedestrians. Pedestrians are subject to a multitude of influences when walking, yet the majority of existing models focus on a single aspect, typically the avoidance of obstructions or of other pedestrians. PEDFLOW uses an implementation of context-mediated behaviour to enable the agents to deal with multiple cause-and-effect relations in a well-defined, flexible and highly efficient manner. A variety of mobile and immobile entities can be modelled as objects in an object-oriented environment. The model is informed by an empirical study of pedestrian behaviour, and the parameters of the agents are derived from measures of observed pedestrian movement. PEDFLOW's suitability for pedestrian modelling in the described context is evaluated in both qualitative and quantitative terms. Typical macroscopic movement patterns from the real world, such as "platooning" and "walking with a partner", are selected and the corresponding emergent model behaviours investigated. Measures of service (MOS) are defined and extracted from the model for comparison with real-world measures. As PEDFLOW was created as an interactive tool to be used in an office environment rather than in a high-performance computing lab, its scalability and performance limitations are explored with regard to the size of the modelled area, the number of modelled pedestrians and the complexity of the interactions between them. It is shown that PEDFLOW can be a useful tool in the urban design process.
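To illustrate the flavour of a microscopic, rule-based agent step, here is a minimal grid-based sketch: each agent moves one cell towards its goal while avoiding obstacles and other agents. The grid, rules and parameters are assumptions for this example and are far simpler than PEDFLOW's context-mediated behaviour.

```python
# A minimal rule-based pedestrian agent step on a grid (illustrative only).
import math

def step(agent_pos, goal, obstacles, others):
    """Move one cell towards the goal, preferring free cells closest to it."""
    blocked = set(obstacles) | set(others)
    x, y = agent_pos
    candidates = [(x + dx, y + dy)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx, dy) != (0, 0) and (x + dx, y + dy) not in blocked]
    if not candidates:
        return agent_pos                      # wait: everything around is occupied
    return min(candidates, key=lambda c: math.dist(c, goal))

pos, goal = (0, 0), (5, 5)
obstacles = {(1, 1), (2, 2)}
others = {(1, 0)}
for _ in range(6):
    pos = step(pos, goal, obstacles, others)
print(pos)   # the agent has skirted the blocked cells on its way to (5, 5)
```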
67

Metaheuristics for university course timetabling

Lewis, Rhydian M. R. January 2006 (has links)
The work presented in this thesis concerns the problem of timetabling at universities, particularly course timetabling, and examines the various ways in which metaheuristic techniques might be applied to these sorts of problems. Using a popular benchmark version of a university course timetabling problem, we examine the implications of using a "two-staged" algorithmic approach, whereby in stage one only the mandatory constraints are considered for satisfaction, with stage two then concerned with satisfying the remaining constraints without re-breaking any of the mandatory constraints in the process. Algorithms for each stage of this approach are proposed and analysed in detail. For the first stage we examine the applicability of the so-called Grouping Genetic Algorithm (GGA). In our analysis of this algorithm we discover a number of scaling-up issues surrounding the general GGA approach and discuss various reasons why this is so. Two separate ways of enhancing general performance are also explored. Secondly, an Iterated Heuristic Search algorithm is proposed for the same problem, and in experiments it is shown to outperform the GGA in almost all cases. Similar observations are witnessed in a second set of experiments, where the analogous problem of colouring equipartite graphs is considered. Two new metaheuristic algorithms are also proposed for the second stage of the two-staged approach: an evolutionary algorithm (with a number of new specialised evolutionary operators) and a simulated annealing-based approach. Detailed analyses of both algorithms are presented and reasons for their relative benefits and drawbacks are discussed. Finally, suggestions are made as to how our best-performing algorithms might be modified to deal with further "real-world" constraints. In our analyses of these modified algorithms, as well as witnessing promising behaviour in some cases, we are also able to highlight some limitations of the two-stage approach in certain cases.
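The two-stage idea can be sketched by viewing clash-free timetabling as graph colouring: stage one greedily builds a feasible assignment of events to timeslots, and stage two would then improve soft-constraint cost without re-breaking the hard constraints. The events, clash graph and number of slots below are invented; this is not the GGA, the Iterated Heuristic Search or the annealing algorithm from the thesis.

```python
# Toy stage-one constructive phase for timetabling-as-graph-colouring.
clashes = {            # events sharing students must not share a timeslot
    "E1": {"E2", "E3"},
    "E2": {"E1"},
    "E3": {"E1", "E4"},
    "E4": {"E3"},
}
SLOTS = 3

def stage_one(clashes, slots):
    """Greedily assign each event (hardest first) the first clash-free slot."""
    timetable = {}
    for event in sorted(clashes, key=lambda e: len(clashes[e]), reverse=True):
        used = {timetable[n] for n in clashes[event] if n in timetable}
        free = [s for s in range(slots) if s not in used]
        if not free:
            raise ValueError(f"no feasible slot for {event}")
        timetable[event] = free[0]
    return timetable

def hard_constraints_ok(timetable, clashes):
    return all(timetable[e] != timetable[n]
               for e in timetable for n in clashes[e])

tt = stage_one(clashes, SLOTS)
assert hard_constraints_ok(tt, clashes)
print(tt)
```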
68

An expert system for the performance control of rotating machinery

Pearson, William N. January 2000 (has links)
The research presented in this thesis examines the application of feed-forward neural networks to the performance control of a gas transmission compressor. It is estimated that a global saving in compressor fuel gas of 1% could avoid the production of 6 million tonnes of CO2 per year. Current compressor control philosophy pivots around the prevention of surge, or anti-surge control. Prevention of damage to high-capital-cost equipment is a key control driver, but other factors, such as environmental emissions restrictions, require the most efficient use of fuel. This requires reliable and accurate performance control. A steady-state compressor model was developed. Actual compressor performance characteristics were used in the model, and correlations were applied to determine the adiabatic head characteristics for changed process conditions. The techniques of neural network function approximation and pattern recognition were investigated. The use of neural networks can avoid the potential difficulties in specifying regression model coefficients, and neural networks can be readily re-trained, once a database is populated, to reflect the changing characteristics of a compressor. Research into the use of neural networks to model compressor performance characteristics is described. A program of numerical testing was devised to assess the performance of the neural networks. Testing was designed to evaluate the effects of training set size, signal noise, extrapolated data, random data and the use of normalised compressor coefficient data on compressor speed estimates. Data sets were generated using the steady-state compressor model, and the results of the numerical testing are discussed. Established control paradigms are reviewed and uses of neural networks in control systems are identified; these are generally found in the areas of adaptive or model predictive control. Algorithms required to implement a novel compressor performance control scheme are described, and a review of plant control hierarchies identifies how the scheme might be implemented. The performance control algorithm evaluates the current process load and either suggests a new compressor speed or updates the neural network model. Compressor speed can be predicted to approximately ±2.5% using a neural-network-based model predictive performance controller. Comparisons with previous work suggest potential global savings of 34 million tonnes of CO2 emissions per year. A generic, rotating machinery performance control expert system is proposed.
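In the spirit of the feed-forward approach described above, the sketch below trains a tiny one-hidden-layer regression network to map normalised (flow, adiabatic head) inputs to a speed estimate. The architecture, synthetic data and hyper-parameters are assumptions for illustration; they are not the model, inputs or training regime used in the thesis.

```python
# Minimal feed-forward regression sketch: (flow, head) -> speed estimate.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))                 # [flow, head], normalised
y = (0.6 * X[:, 0] + 0.4 * X[:, 1] ** 2)[:, None]    # synthetic "speed" target

W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)    # 2 inputs -> 8 hidden units
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)    # 8 hidden -> 1 output

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.1
for epoch in range(2000):                            # plain batch gradient descent
    h, pred = forward(X)
    err = pred - y
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)                 # back-propagate through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
print("RMS error:", float(np.sqrt(np.mean((pred - y) ** 2))))
```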
69

Forensic verification of operating system activity via novel data, acquisition and analysis techniques

Graves, Jamie Robert January 2009 (has links)
Digital Forensics is a nascent field that faces a number of technical, procedural and cultural difficulties that must be overcome if it is to be recognised as a scientific discipline and not just an art. The technical problems include the need to develop standardised tools and techniques for the collection and analysis of digital evidence. This thesis is mainly concerned with the technical difficulties faced by the domain, in particular the exploration of techniques that could form the basis of trusted standards for scientifically verifying data. This study presents a set of techniques and methodologies that can be used to describe the fitness of system calls originating from the Windows NT platform as a form of evidence. It does so in a manner that allows open investigation into how the activities described by this form of evidence can be verified. The performance impact on the Device Under Test (DUT) is explored via the division of the Windows NT system calls into service subsets. Of particular interest to this work is the file subset, as its system calls can be directly linked to user interaction. The quality of the data produced by the collection tool is then examined using the Basic Local Alignment Search Tool (BLAST) sequence alignment algorithm. In doing so, this study asserts that system calls provide a recording, or timeline, of evidence extracted from the operating system, representing the actions undertaken. In addition, it asserts that these interactions can be compared against known profiles (fingerprints) of activity using BLAST, which provides a set of statistics relating to the quality of a match and a measure of the similarity of the sequences under scrutiny. These are based on Karlin-Altschul statistics, which provide, amongst other values, a P-value describing how often a sequence will occur within a search space. The manner in which these statistics are calculated is augmented by the novel generation of the NM1.5_D7326 scoring matrix, based on empirical data gathered from the operating system, which is compared against the de facto, biologically generated BLOSUM62 scoring matrix. The impact on the Windows 2000 and Windows XP DUTs of monitoring most of the service subsets, including the file subset, is statistically insignificant when simple user interactions are performed on the operating system: for the file subset, p = 0.58 on Windows 2000 Service Pack 4 and p = 0.84 on Windows XP Service Pack 1. This study shows that if an event occurred in a sequence that originated on an operating system not subjected to high process load or system stress, a great deal of confidence can be placed in a gapped match, using either the NM1.5_D7326 or BLOSUM62 scoring matrices, indicating that the event occurred, as all fingerprints of interest (FOI) were identified. The worst-case BLOSUM62 P-value is 1.10E-125 and the worst-case NM1.5_D7326 P-value is 1.60E-72, showing that these matrices are comparable in their sensitivity under normal system conditions. This cannot be said for sequences gathered under high process load or system stress: the NM1.5_D7326 scoring matrix failed to identify any FOI, while the BLOSUM62 scoring matrix returned a number of matches that may have been the FOI, as discerned via the supporting statistics, but these were not positively identified within the evaluation criteria. The techniques presented in this thesis are useful, structured and quantifiable. They provide the basis for a set of methodologies that can be used to provide objective data for further studies into this form of evidence, which can explore the details of the calibration and analysis methods in more depth, thereby supplying the basis for a trusted form of evidence that may be described as fit for purpose.
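To show what aligning a trace of system calls against a known fingerprint can look like, here is a tiny Smith-Waterman-style local alignment sketch. It is not BLAST, and the match, mismatch and gap scores are invented stand-ins rather than the NM1.5_D7326 or BLOSUM62 matrices; the example call sequences are likewise illustrative.

```python
def local_alignment_score(seq, fingerprint, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between an observed trace and a fingerprint."""
    rows, cols = len(seq) + 1, len(fingerprint) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq[i - 1] == fingerprint[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align the two calls
                          H[i - 1][j] + gap,     # gap in the fingerprint
                          H[i][j - 1] + gap)     # gap in the observed trace
            best = max(best, H[i][j])
    return best

trace       = ["NtOpenFile", "NtReadFile", "NtQueryInformationFile",
               "NtWriteFile", "NtClose"]
fingerprint = ["NtOpenFile", "NtReadFile", "NtWriteFile", "NtClose"]
print(local_alignment_score(trace, fingerprint))   # high score despite the gap
```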
70

Wild networks : the articulation of feedback and evaluation in a creative inter-disciplinary design studio

Joel, Sian January 2011 (has links)
It is argued that design exists within a collective social network of negotiation, feedback sharing and reflection that is integral to the design process. Encouraging this requires a technological solution that enables designers to access, be aware of and evaluate the work of others, and, crucially, to reflect upon how they are socially influenced. However, in order to develop software that accurately reveals peer valuation, an understanding is required of the sociality at work in an interdisciplinary design studio. This necessitates an acknowledgement of the complexities of the feedback-sharing process, which is not only socially intricate in nature but also potentially unacknowledged. In order to develop software that addresses these issues and makes explicit the dynamics of social interaction at play in a design studio, a 'wild networks' methodological approach is applied to two case studies, one in an educational setting and the other in professional practice. The 'wild networks' approach uses social network analysis through, and in conjunction with, contextual observation, and is used to map the network of the numerous stakeholders, actors, views and perceptions at work. This methodological technique has resulted in an understanding of social networks within a design studio and how they are shaped and formed, and has facilitated the development of prototype network visualisation software based upon the needs and characteristics of real design studios. The findings from this thesis can be interpreted in several ways. Firstly, the findings from the case studies and from the prototype technological representations enhance previous research surrounding the idea of a social model of design: the research identifies and highlights the importance of evolving peer-to-peer feedback and the role of visual evaluation within social networks of feedback sharing. The results can also be interpreted from a methodological viewpoint: the thesis demonstrates the use of network analysis and contextual observation as an effective way of understanding the interactions of designers in a studio and as an appropriate way to inform a software design process that supports creativity. Finally, the results can be interpreted from a software design perspective: through the application of the 'wild networks' methodological process, the research identifies key features (roles, location, levels, graphics and time) for inclusion within a socially translucent network visualisation prototype that is based upon real-world research.
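As an illustration of the kind of social network analysis such a study might apply, the sketch below uses the networkx library to compute two common measures over a feedback-sharing network: who receives feedback most widely and who brokers it between others. The designers and edges are invented, and this is not the thesis's prototype visualisation software.

```python
# Illustrative social-network measures over an invented feedback network.
import networkx as nx

G = nx.DiGraph()                       # edge A -> B means A gave feedback to B
G.add_edges_from([
    ("Ana", "Ben"), ("Ana", "Cleo"), ("Ben", "Cleo"),
    ("Cleo", "Dev"), ("Dev", "Ana"), ("Eve", "Dev"),
])

in_deg = nx.in_degree_centrality(G)        # who receives feedback most widely
between = nx.betweenness_centrality(G)     # who sits on paths between others

for name in G.nodes:
    print(f"{name:5s} receives={in_deg[name]:.2f} brokers={between[name]:.2f}")
```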
