161

An exploration of BMSF algorithm in genome-wide association mapping

Jiang, Dayou January 1900 (has links)
Master of Science / Department of Statistics / Haiyan Wang / Motivation: Genome-wide association studies (GWAS) provide an important avenue for investigating many common genetic variants in different individuals to see if any variant is associated with a trait. GWAS is a great tool for identifying genetic factors that influence health and disease. However, the high dimensionality of the gene expression dataset makes GWAS challenging. Although many promising machine learning methods, such as the Support Vector Machine (SVM), have been investigated in GWAS, the question of how to improve the accuracy of the results has drawn increasing attention from researchers. Many of the studies did not apply feature selection to select a parsimonious set of relevant genes, and those that performed gene selection often failed to consider possible interactions among genes. Here we modify the gene selection algorithm BMSF, originally developed by Zhang et al. (2012) for improving the accuracy of cancer classification with binary responses. A continuous-response version of the BMSF algorithm is provided in this report so that it can be applied to gene selection for continuous gene expression datasets. The algorithm dramatically reduces the dimension of the gene markers under consideration, thereby increasing the efficiency and accuracy of GWAS. Results: We applied the continuous-response version of BMSF to the wheat phenotype dataset to predict two quantitative traits based on the genotype marker data. This wheat dataset was previously studied in Long et al. (2009) for the same purpose, but only through direct application of SVM regression methods. By applying our gene selection method, we filtered out a large portion of less relevant genes and achieved a better prediction result on the test data by building an SVM regression model using only the selected genes on the training data. We also applied our algorithm to simulated datasets generated following the setting of an example in Fan et al. (2011). The continuous-response version of BMSF showed good ability to identify active variables hidden among high-dimensional irrelevant variables. In comparison to the smoothing-based methods in Fan et al. (2011), our method has the advantage of no ambiguity arising from different choices of the smoothing parameter.
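The select-then-predict workflow described here can be sketched in a few lines. The BMSF algorithm itself is not reproduced below; a generic univariate filter (scikit-learn's f_regression) stands in for the gene selection step, and the marker data is synthetic, so this is only a minimal illustration of the pipeline, not the thesis's method.

```python
# Illustrative sketch only: a simple univariate filter stands in for BMSF,
# followed by SVM regression on the selected markers (synthetic data).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))                       # 1000 markers, 200 lines
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

selector = SelectKBest(f_regression, k=50).fit(X_tr, y_tr)   # stand-in for BMSF
model = SVR(kernel="rbf", C=10.0).fit(selector.transform(X_tr), y_tr)

pred = model.predict(selector.transform(X_te))
print("test MSE:", mean_squared_error(y_te, pred))
```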
162

Generating high confidence contracts without user input using Daikon and ESC/Java2

Rayakota, Balaji January 1900 (has links)
Master of Science / Department of Computing and Information Science / Torben Amtoft / Invariants are properties which are asserted to be true at certain program points. Invariants are of paramount importance when proving program correctness and program properties. Method, constructor, and class invariants can serve as contracts which specify program behavior and can lead to more accurate reuse of code; more accurate than comments, because contracts are less error prone and may be proved without testing. Dynamic invariant generation techniques run the program under inspection, observe the values computed at each program point, and report a list of invariants that were observed to be possibly true. Static checkers examine program code and try to prove the correctness of annotated invariants by generating proofs for them. This project attempts to obtain strong invariants for a subset of classes in Java. There are two phases: first, we use Daikon, a tool that suggests invariants using dynamic invariant generation techniques; next, the invariants are checked using ESC/Java2, a static checker for Java. In the first phase an ‘Instrumenter’ program inspects Java classes and generates code so that sufficient information is supplied to Daikon to generate strong invariants. All of this is achieved without any user input. The aim is to be able to understand the behavior of a program using already existing tools.
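The idea behind dynamic invariant detection can be shown with a toy example. The sketch below is not Daikon (it uses none of Daikon's invariant grammar, instrumentation, or output format); it simply runs a function on many inputs, records the values at the exit point, and keeps only the candidate properties that held on every observed execution.

```python
# Toy illustration of dynamic invariant detection (not Daikon itself):
# execute a method many times and keep candidates that were never violated.
def absolute(x):
    return -x if x < 0 else x

candidates = {
    "result >= 0": lambda x, r: r >= 0,
    "result == x": lambda x, r: r == x,
    "result >= x": lambda x, r: r >= x,
}

observations = [(x, absolute(x)) for x in range(-50, 51)]
surviving = [name for name, check in candidates.items()
             if all(check(x, r) for x, r in observations)]

print(surviving)   # ['result >= 0', 'result >= x'] are reported as likely invariants
```

As in the project, such reported invariants are only "likely" true for the observed runs; a static checker is what turns them into proved contracts.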
163

Color based classification of circular markers for the identification of experimental units

Narjala, Lakshmi January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Daniel Andresen / The purpose of this project is to analyze the growth of plants under certain lighting conditions. In order to ensure ideal lighting for all plants under demanding conditions, such as lack of optimal light due to shadowing, side-wall reflections, or overlapping of plants, pots are rotated manually in an irregular fashion. To keep track of the position of these plants over time, a marking system is used for each tray of 16 plants, and the markers are unique to each tray. High-definition surveillance cameras placed above the plants capture plant images periodically. These images then undergo image processing, which must identify and recognize the plants from the identification markers placed within each tray and thereby derive statistics about the growth of the plants. Hence the computing part of this project is about extracting the identity of a plant through image processing. Image processing involves object and color recognition. Fiji, an image processing tool, is used for object recognition, and the Python image module called “Image” is used for color recognition. Object recognition accurately locates the position of the circular objects and measures their size and shape. Color recognition identifies the pixel values of these circular objects. Finally, the code corresponding to three-element groups of these circular units is fetched and stored. This code gives the identity of the tray and, therefore, of each plant. The timestamp stored with each plant image, together with the code fetched through image processing, is used to track the location of a plant in the plant chamber through time.
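A minimal sketch of the color-recognition step is shown below, assuming the circular markers' centers have already been located by the object-recognition step. The file name, center coordinates, and reference colors are hypothetical placeholders, not values from the project.

```python
# Sketch of the color-recognition step using Pillow's Image module.
# Center coordinates and file name are hypothetical placeholders; in the
# project they would come from the object-recognition step (Fiji).
from PIL import Image

REFERENCE = {"red": (200, 40, 40), "green": (40, 170, 60), "blue": (40, 60, 200)}

def nearest_color(rgb):
    # classify a pixel by squared distance to the reference colors
    return min(REFERENCE, key=lambda name: sum((a - b) ** 2
               for a, b in zip(rgb, REFERENCE[name])))

img = Image.open("tray_0001.jpg").convert("RGB")        # hypothetical image file
marker_centers = [(120, 340), (150, 340), (180, 340)]   # one three-marker group

code = [nearest_color(img.getpixel(xy)) for xy in marker_centers]
print("tray code:", "-".join(code))   # e.g. red-green-blue identifies the tray
```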
164

Models and algorithms for cyber-physical systems

Gujrati, Sumeet January 1900 (has links)
Doctor of Philosophy / Department of Computing and Information Sciences / Gurdip Singh / In this dissertation, we propose a cyber-physical system model, and based on this model, present algorithms for a set of distributed computing problems. Our model specifies a cyber-physical system as a combination of cyber-infrastructure, physical-infrastructure, and user behavior specification. The cyber-infrastructure is superimposed on the physical-infrastructure and continuously monitors its (the physical-infrastructure's) changing state. Users operate in the physical-infrastructure and interact with the cyber-infrastructure using hand-held devices and sensors; their behavior is specified in terms of actions they can perform (e.g., move, observe). While in traditional distributed systems users interact solely via the underlying cyber-infrastructure, users in a cyber-physical system may interact directly with one another, access sensor data directly, and perform actions asynchronously with respect to the underlying cyber-infrastructure. These additional types of interactions have an impact on how distributed algorithms for cyber-physical systems are designed. We augment distributed mutual exclusion and predicate detection algorithms so that they can accommodate user behavior and interactions among users and with the physical-infrastructure. The new algorithms have two components: one describing the behavior of the users in the physical-infrastructure and the other describing the algorithms in the cyber-infrastructure. Each combination of users' behavior and an algorithm in the cyber-infrastructure yields a different cyber-physical system algorithm. We have performed extensive simulation studies of our algorithms using the OMNeT++ simulation engine and the Uppaal model checker. We also propose the Cyber-Physical System Modeling Language (CPSML) to specify cyber-physical systems, and a centralized global state recording algorithm.
165

Automated genre classification in literature

Jordan, Emily January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William Hsu / This thesis examines automated genre classification in literature. The approach described uses text-based comparison of book summaries to examine whether word similarity is a feasible method for identifying genre types. Genres help users form impressions of what form a text will take. Knowing the genre of a literary work provides librarians, information scientists, and other users of a text collection with a summative guide to its form, its possible content, and what its members are about, without having to peruse individual titles. This makes automatically generating genre labels a potentially useful tool for sorting unmarked text collections or searching the web. This thesis provides a brief overview of the problems faced by researchers wishing to automate genre classification, as well as the current work in the field. My own methodology is also discussed: I implemented two basic methods for labeling genre, and the results collected using them are covered, along with future work and improvements to the project that I wish to implement.
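The word-similarity idea can be illustrated with a short sketch: summaries with known genres are turned into TF-IDF vectors, and a new summary is labeled with the genre of its most similar neighbor. The summaries and genres below are invented placeholders, and this is not either of the thesis's two methods, only the underlying intuition.

```python
# Minimal sketch of genre labeling by word similarity of book summaries
# (placeholder data; illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

labeled = [
    ("A detective untangles a murder in a locked country house.", "mystery"),
    ("A colony ship drifts between stars while its AI slowly wakes.", "science fiction"),
    ("Two rivals fall in love over a summer by the seaside.", "romance"),
]
new_summary = "A detective hunts the murder suspect hiding among the manor's guests."

vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform([s for s, _ in labeled] + [new_summary])

sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()   # compare to labeled books
print("predicted genre:", labeled[int(sims.argmax())][1])   # -> mystery
```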
166

An application of topic modeling algorithms to text analytics in business intelligence

Alsadhan, Majed January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / William H. Hsu / In this work, we focus on the task of clustering businesses in the state of Kansas based on the content of their websites and their business listing information. Our goal is to cluster the businesses and overcome the challenges facing current approaches, such as data noise, the low number of businesses clustered, and the lack of an evaluation procedure. We propose an LSA-based approach to analyze the businesses’ data and cluster the businesses using the Bisecting K-Means algorithm. In this approach, we analyze the businesses’ data using LSA and produce business representations in a reduced space. We then use these representations to cluster the businesses by applying the Bisecting K-Means algorithm. We also apply an existing LDA-based approach to cluster the businesses and compare its results with those of our proposed LSA-based approach. We evaluate the results using a human-expert-based evaluation procedure, and we visualize the resulting clusters using Google Earth and Tableau. According to our evaluation procedure, the LDA-based approach performed slightly better than the LSA-based approach. However, the LDA-based approach had some limitations: a low number of clustered businesses and the inability to produce a hierarchical tree for the clusters. With the LSA-based approach, we were able to cluster all the businesses and produce a hierarchical tree for the clusters.
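A rough sketch of the LSA-plus-bisecting-k-means pipeline is shown below. It assumes a recent scikit-learn (1.1 or later, which provides BisectingKMeans), and the website snippets are invented placeholders rather than the project's scraped business data.

```python
# Sketch of the LSA + bisecting k-means pipeline on placeholder documents.
# Assumes scikit-learn >= 1.1 for BisectingKMeans.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import BisectingKMeans

sites = [
    "family restaurant serving steaks and burgers in downtown Wichita",
    "auto repair shop offering oil changes and brake service",
    "italian restaurant with fresh pasta and a wine list",
    "collision center and auto body repair for all makes",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(sites)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)  # reduced space

labels = BisectingKMeans(n_clusters=2, random_state=0).fit_predict(lsa)
print(labels)   # restaurants and auto shops should land in separate clusters
```

The recursive splits performed by bisecting k-means are also what make it possible to read off a hierarchical tree of clusters, which the abstract cites as an advantage over the LDA-based approach.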
167

An approach to Natural Language understanding

Marlen, Michael Scott January 1900 (has links)
Doctor of Philosophy / Department of Computing and Information Sciences / David A. Gustafson / Natural language understanding over a set of sentences or a document is a challenging problem. We approach this problem using semantic extraction and an ontology for answering questions based on the data. There is more information in a sentence than is found by extracting the visible terms and their obvious relations to one another. It is this hidden information that gives our solution the advantage over alternatives. The methodology was tested against the FraCaS test suite with near-perfect results (correct answers) for the sections that are the focus of this work (Generalized Quantifiers, Plurals, Adjectives, Comparatives, Verbs, and Attitudes). The results indicate that extracting the visible semantics as well as the unseen semantics and their interrelations, and using an ontology to reason over them, provides reliable and provable answers to questions, validating this technology.
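The role an ontology plays in surfacing "hidden" information can be suggested with a toy example, shown below. This is not the thesis's system or the FraCaS suite: the subclass facts and the breadth-first closure are invented for illustration, showing how a generalized-quantifier question can be answered from relations that the text never states directly.

```python
# Toy illustration of ontology-backed question answering: subclass facts are
# closed transitively so that "Are all Italians human?" is answerable even
# though no single fact states it.
SUBCLASS = {("italian", "european"), ("european", "human"), ("swede", "european")}

def entails_all(sub, sup, facts):
    # breadth-first walk over the subclass hierarchy
    frontier, seen = [sub], {sub}
    while frontier:
        current = frontier.pop()
        if current == sup:
            return True
        for a, b in facts:
            if a == current and b not in seen:
                seen.add(b)
                frontier.append(b)
    return False

print(entails_all("italian", "human", SUBCLASS))   # True: an inferred, unseen relation
print(entails_all("human", "italian", SUBCLASS))   # False
```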
168

Android application for USDA (U.S. Department of Agriculture) structural design software

Addanki, Nikhita January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Mitchell L. Neilsen / The computer industry has seen strong growth in the development of mobile applications over the last few years. Tablet and mobile applications are preferred over their desktop versions due to their increased accessibility and usability. Android is the most popular mobile OS in the world. It not only provides a world-class platform for creating apps, but also offers an open marketplace for distributing them to Android users everywhere. This openness has made it a favorite for consumers and developers alike, leading to strong growth in app consumption. The main objective of this project is to design and develop an Android application for USDA (U.S. Department of Agriculture) structural design that can be used on Android tablets. The different components that can be designed using this application are SingleCell, TwinCell, Cchan, Cbasin, and Drpws3e. The USDA structural design application was previously developed in FORTRAN, which is not supported on Android tablets, so the F2J Translator was used to convert the FORTRAN source files to Java source files, which Android does support. In addition, formatters such as CommonIn, CommonOut, and SwapStreams were used to translate common blocks of FORTRAN code that the F2J Translator cannot handle. The developed software allows users to access all the components of USDA structural design. Users can either enter data directly in the forms provided or upload a file that already has the data stored in it. When the application is run, the output can be accessed as a PDF file. Users can also send the output of a particular component to their personal email address. This output is helpful for design engineers implementing new structural designs.
169

Enhancing evaluation techniques using Mutation Operator Production Rule System and accidental fault methodology

Gupta, Pranshu January 1900 (has links)
Doctor of Philosophy / Department of Computing and Information Sciences / David A. Gustafson / Software testing is an essential component of the software development life cycle, and certain software testing methodologies require enormous amounts of time and expense in order to detect and correct errors in a software system. The two primary goals of any testing methodology are error detection and increased reliability. Each methodology uses a unique technique to achieve these goals and detect faults in the software. In this work, an evaluation approach is presented that can enhance evaluation techniques for software testing methodologies. First, a new framework, the Mutation Operator Production Rule System (MOPRS), is introduced that allows specification of mutation operators that are effective, precise, and focused on object-oriented faults. A new concept of an effective mutation operator is added to this system: an effective mutation operator is a precise set of rules that, when applied to a program, creates a set of mutants such that, if a test suite kills all of them, further seeded or accidental faults of the same fault type are highly likely to be killed by the same test suite. These effective mutation operators focus on fault types specific to object-oriented programming concepts. As a result, object-oriented faults are detected instead of only the traditional faults common to non-object-oriented and object-oriented programming. These mutation operators cover gaps in the existing set of mutation operators. An evaluation method, the Accidental Fault Methodology (AFM), is then described that can enhance evaluation techniques for software testing methodologies. When effective mutation operators are used along with this evaluation technique, it will demonstrate whether the software testing methodology successfully detected the induced faults, as well as any accidental faults of the same object-oriented fault type.
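The kill-or-survive mechanics that MOPRS builds on can be shown with a generic mutation-testing sketch. The operator below is a simple relational-operator replacement, not one of the dissertation's object-oriented operators; it illustrates how a mutant is generated, executed, and judged against a test suite.

```python
# A minimal mutation-testing sketch (generic operators, not MOPRS): apply a
# mutation operator to a function's source, re-execute it, and check whether
# the test suite kills the resulting mutant.
ORIGINAL = """
def is_adult(age):
    return age >= 18
"""

def run_tests(namespace):
    is_adult = namespace["is_adult"]
    # a suite that exercises the boundary value kills boundary mutants
    return is_adult(18) is True and is_adult(17) is False and is_adult(30) is True

mutants = {
    ">= -> >": ORIGINAL.replace(">=", ">"),
    ">= -> <": ORIGINAL.replace(">=", "<"),
}

for name, source in mutants.items():
    ns = {}
    exec(source, ns)                        # build the mutant
    status = "killed" if not run_tests(ns) else "survived"
    print(f"mutant ({name}): {status}")     # both are killed by this suite
```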
170

Recommender system for recipes

Goda, Sai Bharath January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Daniel A. Anderson / Most e-commerce websites, such as Amazon, eBay, hotel booking sites, and TripAdvisor, use recommender systems to recommend products to their users. Some use the history of all users to recommend what kinds of products the current user may like (collaborative filtering), and some use knowledge of the products the user is interested in to make recommendations (content-based filtering); Amazon, for example, uses both kinds of techniques. These recommender systems can be represented as a graph whose nodes are users and products, with edges between users and products. The aim of this project is to build a recommender system for recipes using data from allrecipes.com, a popular website used throughout the world to post, review, and rate recipes. To understand the data set, one needs to know how recipes are posted and rated on allrecipes.com; the details are given in this report. The network of allrecipes.com consists of users, recipes, and ingredients. This project extensively studies two algorithms, adsorption and matrix factorization, which have been evaluated on homogeneous networks, tries them on heterogeneous networks, and analyzes their results. It also studies an algorithm used to propagate influence from one network to another: to learn from one network and propagate the same information to another, we compute flow (the influence of one network on another) as described in [7]. This report introduces a variant of adsorption that takes the flow values into account and makes recommendations in the user-recipe and user-ingredient networks. The results of this variant are analyzed in depth.
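Of the two algorithms studied, matrix factorization is the easier to sketch compactly. The example below runs stochastic gradient descent on a tiny, invented user-recipe rating matrix; it is illustrative only and shows neither the adsorption variant nor the flow computation from the project.

```python
# Small matrix-factorization sketch (SGD on a toy user-recipe rating matrix);
# placeholder data, not the allrecipes.com network.
import numpy as np

ratings = [  # (user, recipe, rating) triples
    (0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 2), (2, 1, 5), (2, 2, 4),
]
n_users, n_recipes, k = 3, 3, 2
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))     # user latent factors
R = rng.normal(scale=0.1, size=(n_recipes, k))   # recipe latent factors

lr, reg = 0.05, 0.02
for _ in range(500):
    for u, r, y in ratings:
        err = y - U[u] @ R[r]
        U[u] += lr * (err * R[r] - reg * U[u])   # gradient steps with L2 penalty
        R[r] += lr * (err * U[u] - reg * R[r])

# predict an unobserved pair: user 0 never rated recipe 2
print("predicted rating of recipe 2 by user 0:", round(float(U[0] @ R[2]), 2))
```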
