
Detecting worm mutations using machine learning

Sharma, Oliver January 2008 (has links)
Worms are malicious programs that spread over the Internet without human intervention. Since worms generally spread faster than humans can respond, the only viable defence is to automate their detection. Network intrusion detection systems typically detect worms by examining packet or flow logs for known signatures. Not only does this approach mean that new worms cannot be detected until the corresponding signatures are created, but also that mutations of known worms will remain undetected, because each mutation will usually have a different signature. The intuitive and seemingly most effective solution is to write more generic signatures, but this has been found to increase false alarm rates and is thus impractical. This dissertation investigates the feasibility of using machine learning to automatically detect mutations of known worms. First, it investigates whether Support Vector Machines can detect mutations of known worms. Support Vector Machines have been shown to be well suited to pattern recognition tasks such as text categorisation and hand-written digit recognition. Since detecting worms is effectively a pattern recognition problem, this work investigates how well Support Vector Machines perform at this task. The second part of this dissertation compares Support Vector Machines to other machine learning techniques in detecting worm mutations. Gaussian Processes, unlike Support Vector Machines, automatically return confidence values as part of their result. Since confidence values can be used to reduce false alarm rates, this dissertation determines how Gaussian Processes compare to Support Vector Machines in terms of detection accuracy. For further comparison, this work also compares Support Vector Machines to K-nearest neighbours, a technique known for its simplicity and solid results in other domains. The third part of this dissertation investigates the automatic generation of training data. Classifier accuracy depends on good quality training data -- the wider the training data spectrum, the higher the classifier's accuracy. This dissertation describes the design and implementation of a worm mutation generator whose output is fed to the machine learning techniques as training data. It then evaluates whether the training data can be used to train classifiers of sufficiently high quality to detect worm mutations. The findings of this work demonstrate that Support Vector Machines can be used to detect worm mutations, and that the optimal configuration for detection of worm mutations is a linear kernel with unnormalised bi-gram frequency counts. Moreover, the results show that Gaussian Processes and Support Vector Machines exhibit similar accuracy on average in detecting worm mutations, while K-nearest neighbours consistently produces lower quality predictions. The generated worm mutations are shown to be of sufficiently high quality to serve as training data. Combined, the results demonstrate that machine learning is capable of accurately detecting mutations of known worms.
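
The best-performing configuration reported above -- a linear-kernel Support Vector Machine over unnormalised bi-gram frequency counts -- can be illustrated with a minimal sketch. This is not the dissertation's code: the scikit-learn API stands in for whatever implementation was used, and the payload byte strings are hypothetical.

```python
# Minimal sketch: linear-kernel SVM over unnormalised byte bi-gram counts.
# Payloads below are hypothetical stand-ins for real network traffic.
from collections import Counter
import numpy as np
from sklearn.svm import SVC

def bigram_counts(payload: bytes, dim: int = 256 * 256) -> np.ndarray:
    """Unnormalised frequency count of each byte bi-gram in the payload."""
    vec = np.zeros(dim)
    for (a, b), n in Counter(zip(payload, payload[1:])).items():
        vec[a * 256 + b] = n          # raw counts, deliberately not normalised
    return vec

# Hypothetical training data: worm mutations (label 1) vs. benign traffic (label 0).
worm_payloads   = [b"\x90\x90\x31\xc0\x50\x68", b"\x90\x31\xc0\x50\x68\x2f"]
benign_payloads = [b"GET /index.html HTTP/1.1", b"HTTP/1.1 200 OK\r\n"]

X = np.array([bigram_counts(p) for p in worm_payloads + benign_payloads])
y = np.array([1] * len(worm_payloads) + [0] * len(benign_payloads))

clf = SVC(kernel="linear").fit(X, y)   # linear kernel, per the findings above
print(clf.predict([bigram_counts(b"\x90\x90\x31\xc0\x50\x90")]))
```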

Hybrid probabilistic broadcast schemes for mobile ad hoc networks

Mohammed, Aminu January 2009 (has links)
Broadcasting is one of the fundamental data dissemination mechanisms in mobile ad hoc networks (MANETs) and is, for instance, used extensively by many routing protocols during the route discovery process. The dynamic topology and limited communication bandwidth of such networks pose a number of challenges in designing an efficient broadcasting scheme for MANETs. The simplest approach is flooding, where each node retransmits every unique received packet exactly once on each outgoing link. Although flooding ensures that a broadcast packet is received by all network nodes, it generates many redundant transmissions, which can trigger high transmission collision and contention in the network, a phenomenon referred to as the broadcast storm. Several probabilistic broadcast algorithms that incur low communication overhead have been proposed to mitigate the broadcast storm problem, and they tend to show superior adaptability in changing environments when compared to deterministic (i.e., non-probabilistic) schemes. However, most of these schemes reduce redundant broadcasts at the expense of reachability, or require near-global network topological information or support from additional hardware. This research argues that broadcast schemes combining the important features of fixed probabilistic and counter-based schemes can reduce the broadcast storm problem without sacrificing reachability, while still achieving better end-to-end delay. To this end, the first part of this research investigates the effects of forwarding probabilities and counter threshold values on the performance of fixed probabilistic and counter-based schemes. The findings of this investigation are exploited to suggest a new hybrid approach, the Probabilistic Counter-Based Scheme (PCBS), which uses the number of duplicate packets received to estimate neighbourhood density and assigns a forwarding probability value to limit the generation of redundant broadcast packets. The simulation results reveal that, under various network conditions, PCBS significantly reduces the number of redundant transmissions, the collision rate and the end-to-end delay without sacrificing reachability when compared against counter-based, fixed probabilistic and flood broadcasting. Often in MANETs there are regions of different node density due to node mobility. As such, PCBS can suffer from a degree of inflexibility in terms of rebroadcast probability, since each node is assigned the same forwarding probability regardless of its local neighbourhood conditions. To address this shortcoming, the second part of this dissertation proposes an Adjusted Probabilistic Counter-Based Scheme (APCBS) that dynamically assigns the forwarding probability to a node based on its local node density, using a mathematical function. Thus, a node located in a sparse region of the network is assigned a high forwarding probability, while a node located in a denser region is assigned a relatively lower forwarding probability. These combined effects improve end-to-end delay, collision rate and reachability compared to the PCBS variant. The performance of most broadcasting schemes suggested for MANETs, including those presented here, has been analysed in the context of "pure" broadcast scenarios, with relatively little investigation into their impact on specific applications such as the route discovery process. The final part of this thesis evaluates the performance of the well-known AODV routing protocol when augmented with APCBS route discovery. Results indicate that the resulting route discovery approach reduces the routing overhead, collision rate and end-to-end delay without degrading the overall network throughput, compared to existing approaches based on flooding, counter-based and fixed probabilistic route discovery.
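
The hybrid forwarding rule at the heart of PCBS can be sketched briefly: a node counts duplicate receptions of a packet during a random assessment delay, treats the count as a local density estimate, and rebroadcasts with a correspondingly chosen probability. The threshold and probability values below are illustrative assumptions, not the values evaluated in the thesis.

```python
# Minimal sketch of a PCBS-style forwarding decision.
import random

COUNTER_THRESHOLD = 4   # duplicates at or above this imply a dense neighbourhood

def forwarding_probability(duplicates_heard: int) -> float:
    """More duplicates heard => denser neighbourhood => lower rebroadcast probability."""
    if duplicates_heard >= COUNTER_THRESHOLD:
        return 0.4      # dense region: rebroadcast conservatively
    return 0.8          # sparse region: rebroadcast aggressively

def should_rebroadcast(duplicates_heard: int) -> bool:
    """Decide once, at the end of the random assessment delay."""
    return random.random() < forwarding_probability(duplicates_heard)

# A node that heard only one duplicate during its assessment delay:
print(should_rebroadcast(duplicates_heard=1))
```

APCBS would replace the two fixed probabilities with a smooth function of the duplicate count, so the probability falls gradually as the estimated density rises.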

Crossmodal audio and tactile interaction with mobile touchscreens

Hoggan, Eve Elizabeth January 2010 (has links)
Touchscreen mobile devices often use cut-down versions of desktop user interfaces, placing high demands on the visual sense that may prove awkward in mobile settings. The research in this thesis addresses the problems encountered by situationally impaired mobile users by using crossmodal interaction to exploit the abundant similarities between the audio and tactile modalities. By making information available to both senses, users can receive the information in the most suitable way, without having to abandon their primary task to look at the device. This thesis begins with a literature review of related work, followed by a definition of crossmodal icons. Two icons may be considered crossmodal if and only if they provide a common representation of data which is accessible interchangeably via different modalities. Two experiments investigated possible parameters for use in crossmodal icons, with results showing that rhythm, texture and spatial location are effective. A third experiment focused on learning multi-dimensional crossmodal icons and the extent to which this learning transfers between modalities. The results showed identification rates of 92% for three-dimensional audio crossmodal icons when trained with the tactile equivalents, and identification rates of 89% for tactile crossmodal icons when trained with the audio equivalents. Crossmodal icons were then incorporated into a mobile touchscreen QWERTY keyboard. Experiments showed that keyboards with audio or tactile feedback produce fewer errors and greater text entry speeds than standard touchscreen keyboards. The next study examined how environmental variables affect user performance with the same keyboard. The data showed that each modality performs differently under varying levels of background noise or vibration, and the exact levels at which these performance decreases occur were established. The final study involved a longitudinal evaluation of a touchscreen application, CrossTrainer, focusing on longitudinal effects on performance with audio and tactile feedback, the impact of context on performance, and personal modality preference. The results show that crossmodal audio and tactile icons are a valid method of presenting information to situationally impaired mobile touchscreen users, with recognition rates of 100% over time. This thesis concludes with a set of guidelines on the design and application of crossmodal audio and tactile feedback, to enable application and interface designers to employ such feedback in all systems.

Affect-based information retrieval

Arapakis, Ioannis January 2010 (has links)
One of the main challenges Information Retrieval (IR) systems face nowadays originates from the semantic gap problem: the semantic difference between a user's query representation and the internal representation of an information item in a collection. The gap is further widened when the user is driven by an ill-defined information need, often the result of an anomaly in his/her current state of knowledge. The formulated search queries, which are submitted to the retrieval systems to locate relevant items, produce poor results that do not address the users' information needs. To deal with information-need uncertainty, IR systems have in the past employed a range of feedback techniques, varying from explicit to implicit. The first category of feedback techniques necessitates the communication of explicit relevance judgments, in return for better query reformulations and recommendations of relevant results. However, this happens at the expense of users' cognitive resources and, furthermore, introduces an additional layer of complexity to the search process. On the other hand, implicit feedback techniques infer what is relevant from observations of user search behaviour. By doing so, they relieve users of the cognitive burden of document rating and relevance assessment. However, both categories of relevance feedback techniques determine topical relevance with respect to the cognitive and situational levels of interaction, failing to acknowledge the importance of emotions in cognition and decision making. In this thesis I investigate the role of emotions in the information seeking process and develop affective feedback techniques for interactive IR. This novel feedback framework aims to aid the search process and facilitate a more natural and meaningful interaction. I develop affective models that determine topical relevance based on information gathered from various sensory channels, and enhance their performance using personalisation techniques. Furthermore, I present an operational video retrieval system that employs affective feedback to enrich user profiles and offers meaningful recommendations of unseen videos. The use of affective feedback as a surrogate for the information need is formalised as the Affective Model of Browsing: a cognitive model that motivates the use of evidence extracted from the psycho-somatic mobilisation that occurs during cognitive appraisal. Finally, I address some of the ethical and privacy issues that arise from the social-emotional interaction between users and computer systems. This study involves questionnaire data gathered over three user studies, from 74 participants of different educational backgrounds, ethnicities and search experience. The results show that affective feedback is a promising area of research that can improve many aspects of the information seeking process, such as indexing, ranking and recommendation. Eventually, it may be that relevance inferences obtained from affective models will provide a more robust and personalised form of feedback, allowing us to deal more effectively with issues such as the semantic gap.

Evolutionarily stable and fragile modules of yeast biochemical network

Santra, Tapesh January 2011 (has links)
Gene and protein interaction networks have evolved to precisely specify cell fates and functions. Here, we analyse whether the architecture of these networks affects evolvability. We find evidence to suggest that in yeast these networks are mainly acyclic, and that evolutionary changes in the acyclic parts do not affect their global dynamic properties. In contrast, feedback loops strongly influence dynamic behaviour and are often evolutionarily conserved. In the molecular interaction network of yeast, feedback loops are often found to reside in clusters, coupled and nested with one another. Within these clusters, some feedback mechanisms are biologically vital for the operation of the module while others provide auxiliary functional assistance. We find that the biologically vital feedback mechanisms are highly conserved in both the transcription regulation and protein interaction networks of yeast. In particular, long feedback loops and oscillating modules in protein interaction networks are found to be biologically vital and hence highly conserved. These data suggest that biochemical networks evolve differentially depending on their structure, with acyclic parts being permissive to evolution while cyclic parts tend to be conserved.
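
The structural split described above -- acyclic parts versus feedback loops -- can be illustrated with a simple cycle analysis on a directed interaction graph. This is a minimal sketch using networkx on a hypothetical toy network, not the yeast data set or the thesis's analysis pipeline.

```python
# Minimal sketch: partition a directed interaction graph into feedback-loop
# (cyclic) nodes and acyclic nodes. The toy edges are hypothetical.
import networkx as nx

g = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"),   # a feedback loop
                ("C", "D"), ("D", "E")])              # an acyclic tail

cyclic_nodes = {n for cycle in nx.simple_cycles(g) for n in cycle}
acyclic_nodes = set(g) - cyclic_nodes

print("feedback-loop nodes:", cyclic_nodes)    # {'A', 'B', 'C'}: expected to be conserved
print("acyclic nodes:", acyclic_nodes)         # {'D', 'E'}: permissive to evolution
```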

Application of point-process system identification techniques to complex physiological systems

Halliday, David M. January 1986 (has links)
This thesis is concerned with the application of system identification techniques to the analysis of complex physiological systems. The techniques are applied to neuronal spike-train data obtained from elements of the neuromuscular system. A brief description of the neuromuscular system is given in chapter 1, along with a more detailed discussion of the muscle spindle, which is the component of the neuromuscular system with which this study deals. In addition, some possibilities for system identification studies of the muscle spindle are discussed. The identification procedure is based on statistical methods for the treatment of point-process data. The point-process representation of a spike-train is introduced in chapter 2, with definitions of time and frequency domain point-process parameters. Estimates for these parameters are given, along with expressions for their asymptotic distributions. The linear point-process system identification model is introduced, and estimates are described for the model parameters in terms of the previously defined point-process parameters. These point-process and linear parameter estimates are applied to muscle spindle spike-train data. In the analysis of a single spike-train certain important features only show up in the frequency domain, and for input and output spike-trains a linear transfer function type description is constructed in the frequency domain. The mathematical model of this transfer function is used as the basis for an analogue computer simulation of a subsystem of the muscle spindle. This consists of a linear first order filter followed by an encoder which generates output spikes. Data logged from the simulation is processed in the same manner as experimental data, and the effect of varying the simulation parameters on the linear model estimates is examined. It is shown that in general the linear model description reflects the properties of the linear filter in the simulation, and varying the simulation parameters can be used to accurately match results from simulated data with those obtained from real data. Chapter 3 compares the point-process approach with a more conventional filtering and sampled data approach to estimating power spectra. The filtering of spike-trains with broad band spectra is investigated, and this reveals a pitfall in the choice of filter cut-off frequency. It is concluded that the point-process approach is preferable due to shorter computational times and the well documented statistical properties of the point-process estimates. The application of the point-process techniques described in chapter 2 to the analysis of more general spike-train data is considered in chapter 4. Three techniques for measuring the degree of coupling between two spike-trains are compared, and the point-process frequency domain measure is found to be the most sensitive. This measure is also applied to a data set containing a strong single periodicity, and the ability to detect coupling at a single harmonic is demonstrated. The analysis of coupling between spike-trains in the frequency domain is extended to deal with multiple spike-trains, and the ability to distinguish genuine coupling from the effect of a common input is shown to be a powerful tool which can be used to investigate communication pathways in neural systems. Finally, one special feature of the muscle spindle response to a spike-train input is analysed using the simulation. It is demonstrated that the point-process approach can produce results about a particular phenomenon from a single experiment much more rapidly than a repetitive trial and error approach. Chapter 5 considers the extension of the linear point-process identification model introduced in chapter 2. Higher order time and frequency domain point-process parameters are defined and estimates given. In the time domain, a new technique for rapidly generating higher order time domain parameters is developed. The quadratic point-process model is introduced and solutions for its parameters given. These estimates are applied to muscle
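
The frequency-domain coupling measure discussed in chapter 4 can be approximated in a few lines. This is a minimal sketch, not the thesis's point-process estimators: it bins two synthetic spike trains into 0/1 sequences and computes their coherence, where a shared 5 Hz modulation should produce a coherence peak near 5 Hz.

```python
# Minimal sketch: coherence between two binned spike trains sharing a common drive.
import numpy as np
from scipy.signal import coherence

rate_hz, dur_s, bin_s = 20.0, 60.0, 0.001
rng = np.random.default_rng(0)

t = np.arange(0.0, dur_s, bin_s)
drive = 0.5 * (1.0 + np.sin(2.0 * np.pi * 5.0 * t))   # shared 5 Hz input
p_spike = rate_hz * bin_s * drive                      # per-bin spike probability
train_a = (rng.random(t.size) < p_spike).astype(float)
train_b = (rng.random(t.size) < p_spike).astype(float)

f, coh = coherence(train_a, train_b, fs=1.0 / bin_s, nperseg=4096)
band = f > 1.0                                         # skip the DC bin
peak = band.nonzero()[0][coh[band].argmax()]
print(f"peak coherence {coh[peak]:.2f} at {f[peak]:.1f} Hz")   # expect ~5 Hz
```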

Hardware and software aspects of parallel computing

Bissland, Lesley January 1996 (has links)
Part 1 (Chapters 2, 3 and 4) is concerned with the development of hardware for multiprocessor systems. Some of the concepts used in digital hardware design are introduced in Chapter 2. These include the fundamentals of digital electronics, such as logic gates and flip-flops, as well as the more complicated topics of ROM and programmable logic. It is often desirable to change the network topology of a multiprocessor machine to suit a particular application. The third chapter describes a circuit switching scheme that allows the user to alter the network topology prior to computation. To achieve this, crossbar switches are connected to the nodes, and the host processor (a PC) programs the crossbar switches to make the desired connections between the nodes. The hardware and software required for this system are described in detail. Whilst this design allows the topology of a multiprocessor system to be altered prior to computation, the topology is still fixed during program run-time. Chapter 4 presents a system that allows the topology to be altered during run-time. The nodes send connection requests to a control processor, which programs a crossbar switch connected to the nodes. This system allows every node in a parallel computer to communicate directly with every other node. The hardware interface between the nodes and the control processor is discussed in detail, and the software on the control processor is also described. Part 2 (Chapters 5 and 6) of this thesis is concerned with the parallelisation of a large molecular mechanics program. Chapter 5 describes the fundamentals of molecular mechanics, such as the steric energy equation and its components, force field parameterisation and energy minimisation. The implementation of a parallel molecular mechanics (MM) program in a novel programming (COMFORT) and hardware (the BB08) environment is presented in Chapter 6. The structure of the sequential version of the MM program is detailed before discussing the implementation of the parallel version using COMFORT and the BB08.
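
The run-time reconfiguration idea of Chapter 4 -- nodes request connections, and a control processor programs the crossbar -- can be sketched as follows. The Crossbar class, the request stream and the port numbering are hypothetical illustrations, not the interface of the BB08 hardware described in the thesis.

```python
# Minimal sketch: a control processor servicing node connection requests by
# programming a crossbar switch (one driver per output port).
class Crossbar:
    def __init__(self, ports: int):
        self.ports = ports
        self.route = {}                          # output port -> input port

    def connect(self, src: int, dst: int) -> None:
        if not (0 <= src < self.ports and 0 <= dst < self.ports):
            raise ValueError("port out of range")
        if dst in self.route:
            raise RuntimeError(f"output port {dst} is already driven")
        self.route[dst] = src                    # close the crosspoint src -> dst

    def disconnect(self, dst: int) -> None:
        self.route.pop(dst, None)                # open the crosspoint again

# The control processor grants each request in turn.
xbar = Crossbar(ports=8)
for src, dst in [(0, 3), (1, 2), (3, 0)]:        # hypothetical request stream
    xbar.connect(src, dst)
print(xbar.route)                                # {3: 0, 2: 1, 0: 3}
```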

The generation and classification of small leaks in a high pressure water system

Shepherd, Robert January 2011 (has links)
This report investigates the detection of small leaks from the primary system of a nuclear Pressurised Water Reactor. Leak rates of 12 g/s are invariably difficult to detect and locate. The typical leak indicators in a nuclear reactor control room are a drop in pressure and level in the pressuriser, and the air sampler detecting particulate matter. However, in both cases the leak is normally quite substantial by the time any parameters or values are obviously outside the normal operating conditions. Therefore, a small leak could go undetected for a significant amount of time. As part of the reactor safety studies, it is important to have more information about small leaks. Due to the lack of small leak data, the solution was to construct a high pressure water rig producing temperatures and pressures close to those experienced in the primary circuit, these being 200°C and 100 bar respectively. Pressure is maintained by a vane water pump, and heating is achieved by passing a high current through a small diameter, thin walled pipe. To reproduce cracks of different sizes, carburettor jets of various sizes are used. The water, on exiting such a crack, flashes to steam and immediately meets metallic pipe lagging, as is typical of most primary systems. With the typical crack scenario recreated, sensors are added to detect conditions associated with a small leak. These sensors are mounted either on or around the lagging material. The parameters monitored include vibration, acoustics, thermal variations, moisture change, air flow and the pressure at a predetermined outlet. The sensor outputs are pre-processed; the nonlinear data are applied to an artificial neural network, whereas the other data are applied to a digital logic system. The results showed that, with 13 different leak rates separated by only 1.4 g/s, the ANN was able to correctly differentiate and identify the different leak sizes with a certainty of over 97%. The results from all the analysis are further presented graphically through an Operator Advisory System, which informs the operator of the predicted leak size and location. All of the available sensor data relevant to the leak can be viewed, and the location of the leak is presented on a three-dimensional model of the reactor system.
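
The classification stage described above -- a neural network mapping pre-processed sensor features to one of 13 leak-rate classes -- can be sketched minimally. This is not the thesis's system: the scikit-learn classifier, the network size and the synthetic data are all stand-in assumptions for the rig measurements.

```python
# Minimal sketch: a small neural network classifying leak rates from six
# sensor features (vibration, acoustic, thermal, moisture, air flow, pressure).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n_classes, n_features, per_class = 13, 6, 40

# Hypothetical: each leak rate shifts the mean sensor response slightly.
X = np.vstack([rng.normal(loc=c * 0.5, scale=0.3, size=(per_class, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2%}")
```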

Extension to models of coincident failure in multiversion software

Salako, Kizito Oluwaseun January 2012 (has links)
Fault-tolerant architectures for software-based systems have been used in various practical applications, including flight control systems for commercial airliners (e.g. AIRBUS A340, A310) as part of an aircraft's so-called fly-by-wire flight control system [1], the control systems for autonomous spacecraft (e.g. the Cassini-Huygens Saturn orbiter and probe) [2], rail interlocking systems [3] and nuclear reactor safety systems [4, 5]. The use of diverse, independently developed, functionally equivalent software modules in a fault-tolerant configuration has been advocated as a means of achieving highly reliable systems from relatively less reliable system components [6, 7, 8, 9]. In this regard it had been postulated that [6] "The independence of programming efforts will greatly reduce the probability of identical software faults occurring in two or more versions of the program." Experimental evaluation demonstrated that, despite the independent creation of such versions, positive failure correlation between the versions can be expected in practice [10, 11]. The conceptual models of Eckhardt et al [12] and Littlewood et al [13], referred to as the EL model and LM model respectively, were instrumental in pointing out sources of uncertainty that determine both the size and sign of such failure correlation. In particular, there are two important sources of uncertainty: the process of developing software (given sufficiently complex system requirements, the particular software version that will be produced from such a process is not known with certainty; consequently, complete knowledge of what the failure behaviour of the software will be is also unknown), and the occurrence of demands during system operation (it may not be certain which demand a system will receive next from the environment). To explain failure correlation between multiple software versions, the EL model introduced the notion of difficulty: that is, given a demand that could occur during system operation, there is a chance that a given software development team will develop a software component that fails when handling such a demand as part of the system. A demand with an associated high probability of developed software failing to handle it correctly is considered a "difficult" demand for a development team; a low probability of failure would suggest an "easy" demand. In the EL model different development teams, even when isolated from each other, are identical in how likely they are to make mistakes while developing their respective software versions. Consequently, despite the teams possibly creating software versions that fail on different demands, in developing their respective versions the teams find the same demands easy, and the same demands difficult. The implication of this is that the versions developed by the teams do not fail independently; if one observes the failure of one team's version, this could indicate that the version failed on a difficult demand, thus increasing one's expectation that the second team's version will also fail on that demand. Succinctly put, due to correlated "difficulties" between the teams across the demands, "independently developed software cannot be expected to fail independently". The LM model takes this idea a step further by illustrating, under rather general practical conditions, that negative failure correlation is also possible; possible, because the teams may be sufficiently diverse in which demands they find "difficult". This in turn implies better reliability than would be expected under naive assumptions of failure independence between software modules built by the respective teams. Although these models provide such insight, they also pose questions yet to be answered.
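
The EL model's core argument -- shared per-demand difficulty induces positive failure correlation between independently developed versions -- can be reproduced in a few lines of simulation. This is a minimal sketch under an assumed Beta-distributed difficulty function; the distribution is illustrative, not taken from the models' papers.

```python
# Minimal sketch of the Eckhardt-Lee intuition: two versions developed
# independently, but against the same per-demand difficulty theta(x),
# fail together more often than independence would predict.
import numpy as np

rng = np.random.default_rng(2)
n_demands = 100_000

# theta(x): probability that a development effort yields a version failing on x.
theta = rng.beta(0.5, 10.0, size=n_demands)   # a few "difficult" demands

# Two teams develop versions independently, against the same difficulties.
fails_a = rng.random(n_demands) < theta
fails_b = rng.random(n_demands) < theta

p_a, p_b = fails_a.mean(), fails_b.mean()
p_both = (fails_a & fails_b).mean()
print(f"P(A)P(B) = {p_a * p_b:.5f}, P(both) = {p_both:.5f}")
# P(both) = E[theta^2] >= E[theta]^2: coincident failure exceeds independence.
```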

A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives

Gooch, P. January 2012 (has links)
In this thesis, a synthesis is presented of the knowledge models required by clinical information systems that provide decision support for longitudinal processes of care. Qualitative research techniques and thematic analysis are applied, in a novel way, to a systematic review of the literature on the challenges in implementing such systems, leading to the development of an original conceptual framework. The thesis demonstrates how these process-oriented systems make use of a knowledge base derived from workflow models and clinical guidelines, and argues that one of the major barriers to implementation is the need to extract explicit and implicit information from diverse resources in order to construct the knowledge base. Moreover, concepts in both the knowledge base and the electronic health record (EHR) must be mapped to a common ontological model. However, the majority of clinical guideline information remains in text form, and much of the useful clinical information in the EHR resides in the free text fields of progress notes and laboratory reports. In this thesis, it is shown how natural language processing and information extraction techniques provide a means to identify and formalise the knowledge components required by the knowledge base. Original contributions are made in the development of lexico-syntactic patterns and the use of external domain knowledge resources to tackle a variety of information extraction tasks in the clinical domain, such as recognition of clinical concepts, events and temporal relations, term disambiguation and abbreviation expansion. Methods are developed for adapting existing tools and resources in the biomedical domain to the processing of clinical texts, and approaches to improving the scalability of these tools are proposed and evaluated. These tools and techniques are then combined in the creation of a novel approach to identifying processes of care in the clinical narrative. It is demonstrated that resolution of coreferential and anaphoric relations as narratively and temporally ordered chains provides a means to extract linked narrative events and processes of care from clinical notes. Coreference performance in discharge summaries and progress notes is largely dependent on correct identification of protagonist chains (patient, clinician, family relation), pronominal resolution, and string matching that takes account of experiencer, temporal, spatial and anatomical context; for laboratory reports, additional external domain knowledge is required. The types of external knowledge and their effects on system performance are identified and evaluated. Results are compared against existing systems for solving these tasks and are found to improve on them, or to approach the performance of recently reported, state-of-the-art systems. Software artefacts developed in this research have been made available as open-source components within the General Architecture for Text Engineering framework.
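
A lexico-syntactic pattern of the kind the thesis develops can be sketched as a regular expression linking a care event to a date. The pattern and sentence below are illustrative assumptions; the actual components are GATE-based and considerably richer.

```python
# Minimal sketch: a lexico-syntactic pattern recognising a temporal relation
# between a process of care and a date in a discharge-summary sentence.
import re

pattern = re.compile(
    r"(?P<event>\w[\w\s]*?)\s+(?:performed|administered|started)\s+on\s+"
    r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})",
    re.IGNORECASE,
)

sentence = "Appendectomy performed on 12/03/2011; antibiotics started on 13/03/2011."
for m in pattern.finditer(sentence):
    print(m.group("event").strip(), "->", m.group("date"))
# Appendectomy -> 12/03/2011
# antibiotics -> 13/03/2011
```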
