Global ETD Search

471	Gene Set Based Ensemble Methods for Cancer Classification Duncan, William Evans 20 June 2013 (has links) Diagnosis of cancer very often depends on conclusions drawn after both clinical and microscopic examinations of tissues to study the manifestation of the disease in order to place tumors in known categories. One factor which determines the categorization of cancer is the tissue from which the tumor originates. Information gathered from clinical exams may be partial or not completely predictive of a specific category of cancer. Further complicating the problem of categorizing various tumors is that the histological classification of the cancer tissue and description of its course of development may be atypical. Gene expression data gleaned from micro-array analysis provides tremendous promise for more accurate cancer diagnosis. One hurdle in the classification of tumors based on gene expression data is that the data space is ultra-dimensional with relatively few points; that is, there are a small number of examples with a large number of genes. A second hurdle is expression bias caused by the correlation of genes. Analysis of subsets of genes, known as gene set analysis, provides a mechanism by which groups of differentially expressed genes can be identified. We propose an ensemble of classifiers whose base classifiers are ℓ1-regularized logistic regression models with restriction of the feature space to biologically relevant genes. Some researchers have already explored the use of ensemble classifiers to classify cancer but the effect of the underlying base classifiers in conjunction with biologically-derived gene sets on cancer classification has not been explored. Computer Science
472	Dynamic Bayesian Network Based Fault Diagnosis on Nonlinear Dynamic Systems Weng, Jiannian 02 April 2013 (has links) Fault diagnosis approaches for nonlinear real-world systems play a very important role in maintaining dependable, robust operations of safety-critical systems like aircraft, automobiles, power plants and planetary rovers. They require online tracking functions to monitor system behavior and ensure system operations remain within specified safety limits. It is important that such methods are robust to uncertainties, such as modeling errors, disturbance and measurement noise. In this thesis, we employ a temporal Bayesian technique called Dynamic Bayesian Networks (DBNs) to model nonlinear dynamic systems for uncertain probabilistic reasoning in diagnosis application domains. Within the DBN framework, we develop the modeling scheme, model construction process, and the use of the models to build diagnostic models for online diagnosis. This thesis also performs a preliminary comparison of two particle filter algorithms: the generic particle filter (GPF) and the auxiliary particle filter (APF). These are commonly used for tracking and estimating the true system behavior. Our approach to diagnosis includes a DBN model based diagnosis framework that combines the qualitative TRANSCEND scheme with quantitative methods for refining the fault isolation, and uses parameter estimation techniques to provide more precise estimates of fault hypotheses. As a proof of concept, we apply this DBN based diagnosis scheme to the Reverse Osmosis (RO) subsystem of the Advanced Water Recovery System (AWRS). The performance of the two particle filter algorithms is compared across a number of fault scenarios and different levels of noise. The results show our DBN-based scheme is effective for fault isolation and identification of complex nonlinear systems. Computer Science
473	Toward Digitizing the Human Experience: A New Resource for Natural Language Processing Weltman, Jerry Scott 09 April 2013 (has links) A long-standing goal of Artificial Intelligence is to program computers that understand natural language. A basic obstacle is that computers lack the common sense that even small children acquire simply by experiencing life, and no one has devised a way to program this experience into a computer. This dissertation presents a methodology and proof-of-concept software system that enables non-experts, with some training, to create simple experiences. For the purposes of this dissertation, an experience is a series of time-ordered comic frames, annotated with the changing intentional and physical states of the characters and objects in each frame. Each frame represents a small action and the effects of that action. To create an annotated experience, the software interface guides non-experts in identifying facts about experiences that humans normally take for granted. As part of this process, it uses the Socratic Method to help users notice difficult-to-articulate commonsense data. The resulting data is in two forms: specific narrative statements and general commonsense rules. Other researchers have proposed similar narrative data for commonsense modeling, but this project opens up the possibility of non-experts creating these data types. A test on ten subjects suggests that non-experts are able to use this methodology to produce high quality experiential data. The system's inference capability, using forward chaining, demonstrates that the collected data is suitable for automated processing. Computer Science
474	Clustering Rare Event Features to Increase Statistical Power Sivley, Robert Michael 12 April 2013 (has links) Rare genetic variation has been put forward as a major contributor to the development of disease; however, it is inherently difficult to associate rare variants with disease, as the low number of observations greatly reduces statistical power. Binning is a method that groups several variants together and merges them into a single feature, sacrificing resolution to increase statistical power. Binning strategies are applicable to rare variant analysis in any field, though their effectiveness is dependent on the method used to group variants. This thesis presents a flexible workflow for rare variant analysis, comprised of five sequential steps: identification of rare variants, annotation of those variants, clustering the variants, collapsing those clusters, and statistical analysis. There are no restrictions on which clustering algorithms are applied, so a review of the core clustering paradigms is provided as an introduction for readers unfamiliar with the field. Also presented is RVCLUST, an R package that facilitates all stages of the described workflow and provides a collection of interfaces to common clustering algorithms and statistical tests. The utility of RVCLUST is demonstrated in a genetic analysis of rare variants in gene regulatory regions and their effect on gene expression. The results of this analysis suggest that informed clustering is an effective alternative to existing strategies, discovering the same associations while avoiding the statistical complications introduced by other binning methods. Computer Science
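As a rough illustration of the collapsing step described in entry 474, the sketch below merges rare variants into one feature per cluster by summing minor-allele counts for each sample. The genotype data, the cluster labels, and the function name `collapse_variants` are hypothetical and are not taken from RVCLUST (which is an R package); the abstract leaves the choice of clustering algorithm open, so the cluster labels here are simply assumed as given.

```python
from collections import defaultdict


def collapse_variants(genotypes, cluster_of):
    """Collapse per-variant allele counts into per-cluster burden counts.

    genotypes:  sample id -> {variant id -> minor-allele count (0, 1 or 2)}
    cluster_of: variant id -> cluster label (assumed output of any clustering step)
    """
    collapsed = {}
    for sample, variants in genotypes.items():
        burden = defaultdict(int)
        for variant, count in variants.items():
            # Every variant in the same cluster contributes to one merged feature.
            burden[cluster_of[variant]] += count
        collapsed[sample] = dict(burden)
    return collapsed


# Hypothetical toy data: three rare variants grouped into two clusters.
genotypes = {
    "s1": {"v1": 1, "v2": 0, "v3": 0},
    "s2": {"v1": 0, "v2": 1, "v3": 2},
}
cluster_of = {"v1": "A", "v2": "A", "v3": "B"}
features = collapse_variants(genotypes, cluster_of)
```

Each sample ends up with one count per cluster instead of one per variant, which is the trade of resolution for statistical power that the abstract describes.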
475	Program Analysis: Termination Proofs for Linear Simple Loops Chen, Hongyi 24 January 2013 (has links) Termination proof synthesis for simple loops, i.e., loops with only conjoined constraints in the loop guard and variable updates in the loop body, is the building block of termination analysis, as well as liveness analysis, for large complex imperative systems. In particular, we consider a subclass of simple loops which contain only linear constraints in the loop guard and linear updates in the loop body. We call them Linear Simple Loops (LSLs). LSLs are particularly interesting because most loops in practice are indeed linear; more importantly, since we allow the update statements to handle nondeterminism, LSLs are expressive enough to serve as a foundational model for non-linear loops as well. Existing techniques can successfully synthesize a linear ranking function for an LSL if there exists one. When a terminating LSL does not have a linear ranking function, these techniques fail. In this dissertation we describe an automatic method that generates proofs of (universal) termination for LSLs based on the synthesis of disjunctive ranking relations. The method repeatedly finds linear ranking functions on parts of the state space and checks whether the transitive closure of the transition relation is included in the union of the ranking relations. We have implemented the method and have shown experimental evidence of the effectiveness of our method. Computer Science
476	Bayesian Inference Application to Burglary Detection Bhale, Ishan Singh 24 January 2013 (has links) Real-time motion tracking is very important for video analytics, but very little research has been done on identifying the top-level plans behind the atomic activities evident in surveillance footage [61]. Surveillance videos can contain high-level plans in the form of complex activities [61]. These complex activities are usually a combination of articulated activities, such as breaking a windshield or digging, and non-articulated activities, such as walking or running. We have developed a Bayesian framework for recognizing complex activities like burglary. This framework (belief network) is based on an expectation propagation algorithm [8] for approximate Bayesian inference. We provide experimental results showing the application of our framework for automatically detecting burglary from surveillance videos in real time. Computer Science
477	SkypeMorph: Protocol Obfuscation for Censorship Resistance Mohajeri Moghaddam, Hooman January 2013 (has links) The Tor network is designed to provide users with low-latency anonymous communication. Tor clients build circuits with publicly listed relays to anonymously reach their destinations. Low-latency anonymous communication is also an essential property required by censorship circumvention tools and thus Tor has been widely used as a censorship resistance tool. However, since the Tor relays are publicly listed, they can be easily blocked by censoring adversaries. Consequently, the Tor project envisioned the possibility of unlisted entry points to the Tor network, commonly known as bridges. In recent years, there have been attempts to achieve fast and real-time methods to discover Tor, and specifically bridge, connections. In this thesis we address the issue of preventing censors from detecting a certain type of traffic, for instance Tor connections, by observing the communications between a remote node and nodes in their network. We propose a generic model in which the client obfuscates its messages to the bridge in a widely used protocol over the Internet. We investigate using Skype video calls as our target protocol and our goal is to make it difficult for the censoring adversary to distinguish between the obfuscated bridge connections and actual Skype calls using statistical comparisons. Although our method is generic and can be used by any censorship resistance application, we present it for Tor, which has well-studied anonymity properties. We have implemented our model as a proof-of-concept proxy that can be extended to a pluggable transport for Tor, and it is available under an open-source licence. Using this implementation we observed the obfuscated bridge communications and showed their characteristics match those of Skype calls. 
We also compared two methods for traffic shaping and concluded that they perform almost equally in terms of overhead; however, the simpler method makes fewer assumptions about the characteristics of the censorship resistance application’s network traffic, and so this is the one we recommend. Computer Science
478	CLASSIFYING EMOTION USING STREAMING OF PHYSIOLOGICAL CORRELATES OF EMOTION Elmore, Nathan J. 13 February 2013 (has links) The ability of a computer to recognize emotions would have many uses. In the field of human-computer interaction, it would be useful if computers could sense if a user is frustrated and offer help (Lisetti & Nasoz, 2002), or it could be used in cars to predict stress or road rage (Nasoz, Lisetti, & Vasilakos, 2010). It also has uses in the medical field, in emotional therapy or patient monitoring (Rebenitsch, Owen, Brohil, Biocca, & Ferydiansyah, 2010). Emotion recognition is a complex subject that combines psychology and computer science, but it is not a new problem. When the question was first posed, researchers examined physiological signals that could help differentiate an emotion (Schachter & Singer, 1962). As the research progressed, researchers examined ways in which computers could recognize emotions, many of which were successful. Previous research has not yet looked at the emotional data as streaming data, or attempted to classify emotion in real time. This thesis extracts features from a window of simulated streaming data to attempt to classify emotions in real time. As a corollary, this method can also be used to attempt to identify the earliest point at which an emotion can be predicted. The results show that emotions can be classified in real time, and that applying a window and feature extraction leads to better classification success. They also show that this method may be used to determine if an emotion could be predicted before it is cognitively experienced, but it could not predict the emotion transitional state. More research is required before that goal can be achieved. Computer Science
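The windowing idea in entry 478 can be sketched as follows. This is a minimal illustration only: the abstract does not state the window size or the features the thesis actually extracts, so the function name `window_features`, the three summary statistics, and the sample readings are all assumptions.

```python
from collections import deque
from statistics import mean


def window_features(stream, size):
    """Yield (mean, min, max) for each full sliding window over the stream.

    The choice of these three statistics is illustrative, not the thesis's
    feature set.
    """
    window = deque(maxlen=size)  # oldest sample is dropped automatically
    for sample in stream:
        window.append(sample)
        if len(window) == size:
            yield (mean(window), min(window), max(window))


# Hypothetical physiological readings arriving one at a time.
readings = [60, 62, 64, 70, 90]
features = list(window_features(readings, 3))
```

Each feature tuple would then be passed to a classifier as soon as its window fills, so a prediction is available without waiting for the whole recording.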
479	Software Architectural Support for Tangible User Interfaces in Distributed, Heterogeneous Computing Environments Toole, Cornelius 14 June 2012 (has links) This research focuses on tools that support the development of tangible interaction-based applications for distributed computing environments. Applications built with these tools are capable of utilizing heterogeneous resources for tangible interaction and can be reconfigured for different contexts with minimal code changes. Current trends in computing, especially in areas such as computational science, scientific visualization and computer supported collaborative work, foreshadow increasing complexity, distribution and remoteness of computation and data. These trends imply that tangible interface developers must address concerns of both tangible interaction design and networked distributed computing. In this dissertation, we present a software architecture that supports separation of these concerns. Additionally, a tangibles-based software development toolkit based on this architecture is presented that enables the logic of elements within a tangible user interface to be mapped to configurations that vary in the number, type and location of resources within a given tangibles-based system. Computer Science
480	An Extensible and Scalable Pilot-MapReduce Framework for Data Intensive Applications on Distributed Cyberinfrastructure Mantha, Pradeep Kumar 12 July 2012 (has links) The volume and complexity of data that must be analyzed in scientific applications is increasing exponentially. Often, this data is distributed; thus, the ability to analyze data by localizing it will yield limited returns. Therefore, efficient processing of large distributed datasets is required, whilst ideally not introducing fundamentally new programming models or methods. For example, extending MapReduce, a proven effective programming model for processing large datasets, to work more effectively on distributed data and on different infrastructure (such as non-Hadoop, general-purpose clusters) is desirable. We posit that this can be achieved with an effective and efficient runtime environment and without refactoring MapReduce itself. MapReduce on distributed data requires effective distributed coordination of computation (map and reduce) and data, as well as distributed data management (in particular the transfer of intermediate data units). To address these requirements, we design and implement Pilot-MapReduce (PMR), a flexible, infrastructure-independent runtime environment for MapReduce. PMR is based on Pilot abstractions for both compute (Pilot-Jobs) and data (Pilot-Data): it utilizes Pilot-Jobs to couple the map phase computation to the nearby source data, and Pilot-Data to move intermediate data using parallel data transfers to the reduce computation phase. We analyze the effectiveness of PMR over applications with different characteristics (e.g. different volumes of intermediate and output data). Our experimental evaluations show that the Pilot abstraction for data movement across multiple clusters is promising, and can lower the execution time span of the entire MapReduce execution. 
We also investigate the performance of PMR with distributed data using a Word Count and a genome sequencing application over different MapReduce configurations. We find that PMR is a viable tool to support distributed NGS analytics by comparing and contrasting the PMR approach to similar capabilities of Seqal and Crossbow, two Next Generation Sequencing (NGS) Hadoop MapReduce based applications. Our experiments show that PMR provides the desired flexibility in the deployment and configuration of MapReduce runs to address specific application characteristics and achieve an optimal performance, both locally and over wide-area multiple clusters. Computer Science
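For readers unfamiliar with the Word Count workload mentioned in entry 480, the sketch below shows the map, intermediate-data, and reduce stages in a single Python process. It is not PMR and does not model Pilot-Jobs or Pilot-Data; the function names and input chunks are hypothetical, and the `shuffle` step merely stands in for the intermediate data transfer that the abstract says Pilot-Data handles.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical chunks; in a distributed run each could live on a different cluster.
chunks = ["the quick brown fox", "the lazy dog", "The fox"]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(intermediate))
```

In a distributed setting the map calls would run near each chunk's storage and only the intermediate pairs would cross the network, which is the placement concern the abstract addresses.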
