
Event recognition in epizootic domains

Bujuru, Swathi January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / In addition to named entities such as persons, locations, organizations, and quantities, which convey factual information, there are other entities and attributes that relate identifiable objects in text and can provide valuable additional information. In the field of epizootics, these include specific properties of diseases such as their name, location, species affected, and current confirmation status. These are important for compiling the spatial and temporal statistics and other information needed to track diseases, with applications such as the detection and prevention of bioterrorism. Toward this objective, we present a Rule-Based Event Extraction System in Epizootic Domains that automatically extracts infectious disease outbreaks from unstructured data using pattern matching. Beyond extracting events, the components of this system provide structured, summarized data that can be used to differentiate confirmed events from suspected ones, answer questions about when and where a disease was prevalent, develop a model for predicting future outbreaks, and support visualization through interfaces such as Google Maps. In developing this system, we consider research issues including document relevance classification, entity extraction, recognition of outbreak events in the disease domain, and visualization support for events. We present a sentence-based approach for extracting outbreak events from the epizootic domain, with tasks that include extracting the disease name, location, species, confirmation status, and date of each event, and classifying events by confirmation status as either confirmed or suspected. The approach shows how confirmation status matters when extracting disease-based events from unstructured data, and a pyramid approach using reference summaries is used to evaluate the extracted events.
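
As a rough illustration of the sentence-based, pattern-matching extraction described above, the sketch below pulls a disease name, location, date, and confirmation status out of a sentence with regular expressions. The disease list, field names, and patterns are illustrative assumptions, not the thesis's actual rule set.

```python
import re

# Illustrative patterns only; the abstract does not give the real rules.
CONFIRMED = re.compile(r"\bconfirm(?:ed|s|ation)\b", re.I)
DISEASES = r"avian influenza|foot-and-mouth disease|rabies"
EVENT = re.compile(
    rf"(?P<disease>{DISEASES})\b.*?\bin\s+(?P<location>[A-Z]\w+)"
    r"(?:\s+on\s+(?P<date>\d{1,2}\s+\w+\s+\d{4}))?"
)

def extract_event(sentence):
    """Return one structured outbreak event per sentence, or None."""
    m = EVENT.search(sentence)
    if m is None:
        return None
    event = m.groupdict()
    event["status"] = "confirmed" if CONFIRMED.search(sentence) else "suspected"
    return event

print(extract_event(
    "Officials confirmed avian influenza in Hokkaido on 12 March 2009."))
# {'disease': 'avian influenza', 'location': 'Hokkaido',
#  'date': '12 March 2009', 'status': 'confirmed'}
```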

Web genre classification using feature selection and semi-supervised learning

Chetry, Roshan January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / As web pages continuously change and their number grows exponentially, the need for genre classification of web pages also increases. One simple reason is the need to group web pages into genre categories in order to reduce the complexity of various web tasks (e.g., search). Experts unanimously agree on the huge potential of genre classification of web pages. However, while everybody agrees that genre classification of web pages is necessary, researchers face problems in finding enough labeled data to perform supervised classification of web pages into various genres. The high cost of skilled manual labor, the rapidly changing nature of the web, and the never-ending growth of web pages are the main reasons for the limited amount of labeled data. By contrast, unlabeled data can be acquired relatively inexpensively. This suggests using semi-supervised learning approaches for genre classification instead of supervised approaches. Semi-supervised learning makes use of both labeled and unlabeled data for training - typically a small amount of labeled data and a large amount of unlabeled data - and has been used extensively in text classification problems. Given the link structure of the web, web-page classification can use link features in addition to the content features used for general text classification. Hence, the feature set corresponding to web pages divides naturally into two views, namely content-based and link-based feature views. Intuitively, the two feature views are conditionally independent given the genre category, and each has the ability to predict the class on its own. The scarcity of labeled data, the availability of large amounts of unlabeled data, and a richer set of features than in conventional text classification tasks (specifically, complementary and sufficient feature views) encouraged us to use co-training as a tool for semi-supervised learning. During co-training, labeled examples represented using the two views are used to learn distinct classifiers, which keep improving at each iteration by sharing their most confident predictions on the unlabeled data. In this work, we classify web pages of the .eu domain, consisting of 1,232 labeled hosts and 20,000 unlabeled hosts (provided by the European Archive Foundation [Benczur et al., 2010]), into six different genres using co-training, and compare our results with those produced by standard supervised methods. We find that co-training can be an effective and cheap alternative to costly supervised learning, mainly due to the two independent and complementary feature sets of the web: content-based features and link-based features.
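
The co-training loop itself is compact. The sketch below, a minimal reconstruction rather than the thesis's implementation, trains one classifier per feature view and lets each view promote its most confident predictions on unlabeled examples into the shared labeled pool. The classifier choice, round counts, and promotion size are assumptions; non-negative count features are assumed for the naive Bayes models.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X_content, X_link, y, labeled_idx, rounds=10, per_round=5):
    """Co-training over two feature views; y uses -1 for unlabeled rows."""
    y = y.copy()
    labeled = set(labeled_idx)
    views = [(MultinomialNB(), X_content), (MultinomialNB(), X_link)]
    for _ in range(rounds):
        for clf, X in views:
            idx = sorted(labeled)
            clf.fit(X[idx], y[idx])
        unlabeled = [i for i in range(len(y)) if i not in labeled]
        if not unlabeled:
            break
        # Each view promotes its most confident unlabeled predictions;
        # the enlarged label pool benefits the other view next round.
        for clf, X in views:
            proba = clf.predict_proba(X[unlabeled])
            for j in np.argsort(proba.max(axis=1))[-per_round:]:
                i = unlabeled[j]
                if i not in labeled:
                    y[i] = clf.classes_[np.argmax(proba[j])]
                    labeled.add(i)
    return views, y
```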

A model-driven development and verification approach for medical devices

Jedryszek, Jakub January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / John Hatcliff / Medical devices are safety-critical systems whose failure may put human life in danger. They are becoming more advanced and thus more complex, which leads to bigger and more complicated code bases that are hard to maintain and verify. Model-driven development provides a high-level, abstract description of the system in the form of models that omit details not relevant during the design phase. This allows certain types of verification and hazard analysis to be performed on the models, which can then be translated into code. However, errors that do not exist in the models may be introduced during the implementation phase; automated translation from verified models to code can prevent such errors to some extent. This thesis proposes an approach for model-driven development and verification of medical devices. Models are created in AADL (Architecture Analysis & Design Language), a language for software and hardware architecture modeling. The AADL models are translated to SPARK Ada, a contract-based programming language suitable for software verification. The generated code base is then extended by developers to implement the internals of specific devices, and the resulting programs can be verified using the SPARK tools. A PCA (Patient Controlled Analgesia) pump medical device is used to illustrate the primary artifacts and process steps. The foundation for this work is the "Integrated Clinical Environment Patient-Controlled Analgesia Infusion Pump System Requirements" document and the AADL models created by Brian Larson. In addition to the proposed model-driven development approach, a PCA pump prototype was created using the BeagleBoard-xM device as a platform; some of its components were verified with the SPARK tools and Bakar Kiasan.
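
The contract-based style that makes SPARK suitable for verification can be suggested with a short stand-in: explicit pre- and postconditions on a pump operation. The names and rate limit below are illustrative assumptions, not drawn from the thesis or the PCA requirements document, and where SPARK proves such contracts statically, this sketch merely checks them at run time.

```python
from dataclasses import dataclass

MAX_RATE_ML_PER_HR = 10.0  # assumed safety limit, not from the requirements doc

@dataclass
class PcaPump:
    rate: float = 0.0

    def set_rate(self, requested: float) -> None:
        # Precondition (SPARK: Pre aspect): request within the safe envelope.
        assert 0.0 <= requested <= MAX_RATE_ML_PER_HR, "unsafe rate requested"
        self.rate = requested
        # Postcondition (SPARK: Post aspect): commanded rate stays in bounds.
        assert 0.0 <= self.rate <= MAX_RATE_ML_PER_HR

pump = PcaPump()
pump.set_rate(4.5)   # ok
# pump.set_rate(25)  # would violate the precondition
```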

Engineering complex systems with multigroup agents

Case, Denise Marie January 1900 (has links)
Doctor of Philosophy / Computing and Information Sciences / Scott A. DeLoach / As sensor prices drop and computing devices continue to become more compact and powerful, computing capabilities are being embedded throughout our physical environment. Connecting these devices in cyber-physical systems (CPS) enables applications with significant societal impact and economic benefit. However, engineering CPS poses modeling, architectural, and engineering challenges that must be addressed before the desired benefits can be fully realized. For the cyber parts of CPS, two decades of work in the design of autonomous agents and multiagent systems (MAS) offer design principles for distributed intelligent systems and formalizations for agent-oriented software engineering (AOSE). MAS foundations are a natural fit for enabling distributed interacting devices. In some cases, complex control structures such as holarchies can be advantageous. These can motivate complex organizational strategies when implementing such systems with a MAS, and some designs may require agents to act in multiple groups simultaneously. Such agents must be able to manage their multiple associations and assignments in a consistent and unambiguous way. This thesis shows how designing agents as systems of intelligent subagents offers a reusable and practical approach to designing complex systems. It presents a set of flexible, reusable components developed for OBAA++, an organization-based architecture for single-group MAS, and shows how these components were used to develop the Adaptive Architecture for Systems of Intelligent Systems (AASIS), which enables multigroup agents suitable for complex, multigroup MAS. The work illustrates the reusability and flexibility of the approach by using AASIS to simulate a CPS for an intelligent power distribution system (IPDS) operating two multigroup MAS concurrently: one providing continuous voltage control and a second conducting discrete power auctions near sources of distributed generation.
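
One way to picture the "agent as a system of intelligent subagents" idea is a wrapper agent that delegates each group affiliation to a dedicated subagent, keeping per-group state and assignments isolated and unambiguous. The sketch below is an illustrative reconstruction; the class names, group names, and messages are assumptions, not AASIS or OBAA++ APIs.

```python
class SubAgent:
    """Handles one group affiliation, keeping per-group state isolated."""
    def __init__(self, group):
        self.group = group
        self.assignments = []

    def handle(self, task):
        self.assignments.append(task)
        return f"[{self.group}] accepted: {task}"

class MultiGroupAgent:
    """Outer agent that delegates each group's work to a dedicated subagent."""
    def __init__(self, groups):
        self.subagents = {g: SubAgent(g) for g in groups}

    def assign(self, group, task):
        return self.subagents[group].handle(task)

agent = MultiGroupAgent(["voltage_control", "power_auction"])
print(agent.assign("voltage_control", "regulate feeder 7"))
print(agent.assign("power_auction", "bid 3 kWh at node 12"))
```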

On improving natural language processing through phrase-based and one-to-one syntactic algorithms

Meyer, Christopher Henry January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Machine Translation (MT) is the practice of using computational methods to convert words from one natural language to another. Several approaches have been created since MT's inception in the 1950s and, with the vast increase in computational resources since then, have continued to evolve and improve. In this thesis I summarize several branches of MT theory and introduce newly developed software applications, parsing techniques that improve Japanese-to-English text translation, and a new key algorithm that corrects translation errors when converting from Japanese kanji to English. The overall translation improvement is measured using the BLEU metric (an objective, numerical standard in Machine Translation quality analysis). The baseline translation system was built by combining Giza++, the Thot Phrase-Based SMT toolkit, the SRILM toolkit, and the Pharaoh decoder. The input and output parsing applications were created as intermediaries to improve the baseline MT system while avoiding artificially high improvement metrics. This baseline was measured with and without the additional parsing provided by the thesis software applications, and with and without the thesis kanji correction utility. The new algorithm corrected many contextual definition mistakes that are common when converting Japanese to English text. By training the new kanji correction utility on an existing dictionary, identifying source text in Japanese with a high number of possible translations, and checking the baseline translation against other translation possibilities, I was able to increase the translation performance of the baseline system from minimum normalized BLEU scores of 0.0273 to maximum normalized scores of 0.081. The preliminary phase of improving Japanese-to-English translation focused on correcting segmentation mistakes that occur when attempting to parse Japanese text into meaningful tokens. The initial increase is not indicative of future potential and is artificially high because the baseline score was so low to begin with, but it was needed to establish a reasonable baseline. The final results of the tests confirmed that a significant, measurable improvement had been achieved by improving the initial segmentation of the Japanese text through parsing the input corpora and by correcting kanji translations after the Pharaoh decoding process had completed.
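
For reference, BLEU's core is modified n-gram precision combined with a brevity penalty. The minimal single-reference implementation below (with crude add-epsilon smoothing) is illustrative only; published scores should come from a standard implementation.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and crude smoothing."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        overlap = sum((cand & ref).values())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)
    brevity = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return brevity * math.exp(log_prec / max_n)

print(bleu("the cat sat on the mat".split(),
           "the cat is on the mat".split()))
```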

A multi-objective GP-PSO hybrid algorithm for gene regulatory network modeling

Cai, Xinye January 1900 (has links)
Doctor of Philosophy / Department of Electrical and Computer Engineering / Sanjoy Das / Stochastic algorithms are widely used in various modeling and optimization problems. Evolutionary algorithms are one class of population-based stochastic approaches inspired by Darwinian evolutionary theory. A population of candidate solutions is initialized at the first generation of the algorithm. Two variation operators, crossover and mutation, which mimic the real-world evolutionary process, are applied to the population to produce new solutions from old ones, and selection based on survival of the fittest preserves parent solutions for the next generation. Examples of such algorithms include the genetic algorithm (GA) and genetic programming (GP). Other stochastic algorithms are inspired by animal behavior, such as particle swarm optimization (PSO), which imitates the cooperation of a flock of birds. In addition, stochastic algorithms can address multi-objective optimization problems by using the concept of dominance, yielding a set of solutions that do not dominate each other instead of just one best solution. This thesis proposes a multi-objective GP-PSO hybrid algorithm to recover gene regulatory network models that take environmental data as stimulus input. The algorithm infers a model from both phenotypic and gene expression data, and it can simultaneously infer network structures and estimate their associated parameters, instead of doing one or the other iteratively as other algorithms must. In addition, a non-dominated sorting approach and an adaptive histogram method based on the hypergrid strategy are adopted to address ‘convergence’ and ‘diversity’ issues in multi-objective optimization. Gene network models obtained from the proposed algorithm are compared, visually and numerically, to a synthetic network that mimics key features of the Arabidopsis flowering control system. Data predicted by the model are compared to synthetic data to verify that the model closely approximates the available phenotypic and gene expression data. Finally, a novel breeding strategy, termed network-assisted selection (NAS), is proposed as an extension of the hybrid approach and an application of the obtained models to plant breeding. Breeding simulations based on network-assisted selection are compared to a common breeding strategy, marker-assisted selection; the results show that NAS is better in terms of both breeding speed and final phenotypic level.
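
The dominance concept behind the multi-objective selection is easy to state in code. The sketch below, assuming minimization on all objectives, filters a set of objective vectors down to its non-dominated front; it illustrates the idea only, not the thesis's full non-dominated sorting and adaptive-hypergrid machinery.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Return the Pareto front of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

objs = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.6)]
print(non_dominated(objs))  # (0.5, 0.6) is dominated by (0.4, 0.4)
```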

Cross-domain sentiment classification using grams derived from syntax trees and an adapted naive Bayes approach

Cheeti, Srilaxmi January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / There is an increasing amount of user-generated information in online documents, including user opinions on various topics and products such as movies, DVDs, kitchen appliances, etc. To make use of such opinions, it is useful to identify their polarity, in other words, to perform sentiment classification. The goal of sentiment classification is to classify a given text/document as positive, negative, or neutral based on the words present in the document. Supervised learning approaches have been used successfully for sentiment classification in domains that are rich in labeled data. Some of these approaches make use of features such as unigrams, bigrams, sentiment words, adjective words, syntax trees (or variations of trees obtained using pruning strategies), etc. However, for some domains the amount of labeled data can be relatively small, and an accurate classifier cannot be trained using the supervised learning approach. It is therefore useful to study domain adaptation techniques that can transfer knowledge from a source domain that has labeled data to a target domain that has little or no labeled data but a large amount of unlabeled data. We address this problem in the context of product reviews, specifically reviews of movies, DVDs, and kitchen appliances. Our approach uses an Adapted Naive Bayes (ANB) classifier on top of the Expectation Maximization (EM) algorithm to predict the sentiment of a sentence. We use grams derived from complete syntax trees or from syntax subtrees as features when training the ANB classifier. More precisely, we extract grams from syntax trees corresponding to sentences in either the source or target domain. To transfer knowledge from source to target, we identify generalized features (grams) using the frequently co-occurring entropy (FCE) method and represent the source instances using these generalized features. The target instances are represented with all grams occurring in the target, or with a reduced gram set obtained by removing infrequent grams. We experiment with different types of grams in a supervised framework to identify the most predictive gram types, and then use those grams in the domain adaptation framework. Experimental results on several cross-domain tasks show that domain adaptation approaches that combine source and target data (a small amount of labeled data and some unlabeled data) can produce classifiers for the target that are better than those learned from the labeled target data alone.
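
The EM wrapper around the classifier can be sketched compactly. The version below is an illustrative reconstruction, not the thesis's Adapted Naive Bayes: it trains on labeled source data, then alternates between soft-labeling target documents (E-step) and refitting with confidence weights (M-step). Dense count features are assumed, and the ANB-specific source/target weighting scheme is not reproduced.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def nb_em(X_src, y_src, X_tgt, iters=5):
    """NB + EM: fit on labeled source, then iterate soft-label/refit on target."""
    clf = MultinomialNB().fit(X_src, y_src)
    for _ in range(iters):
        # E-step: posterior class probabilities for unlabeled target documents.
        post = clf.predict_proba(X_tgt)
        pseudo = clf.classes_[post.argmax(axis=1)]
        # M-step: refit on source plus pseudo-labeled target, weighting target
        # documents by posterior confidence.
        X_all = np.vstack([X_src, X_tgt])
        y_all = np.concatenate([y_src, pseudo])
        weights = np.concatenate([np.ones(len(y_src)), post.max(axis=1)])
        clf = MultinomialNB().fit(X_all, y_all, sample_weight=weights)
    return clf
```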

Continuous-time infinite dynamic topic models

Elshamy, Wesam Samy January 1900 (has links)
Doctor of Philosophy / Department of Computing and Information Sciences / William Henry Hsu / Topic models are probabilistic models for discovering topical themes in collections of documents. In real-world applications, these models provide a means of organizing what would otherwise be unstructured collections: they can help cluster a huge collection into different topics, or find the subset of a collection that resembles the topical theme of an article at hand. The first wave of topic models could discover the prevailing topics in a big collection of documents spanning a period of time. It was later realized that these time-invariant models could not capture 1) the time-varying number of topics they discover and 2) the time-changing structure of those topics. A few models were developed to address these two deficiencies. The online hierarchical Dirichlet process models documents with a time-varying number of topics, and varies the structure of the topics over time as well; however, it relies on document order, not timestamps, to evolve the model. The continuous-time dynamic topic model evolves topic structure in continuous time, but uses a fixed number of topics. In this dissertation, I present the continuous-time infinite dynamic topic model, which combines the advantages of these two models: it is a probabilistic topic model that 1) changes the number of topics over continuous time and 2) changes the topic structure over continuous time. I compared my model with the two other models under different settings. The results obtained were favorable to my model and showed the need for a model with a continuous-time-varying number of topics and topic structure.
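
The "infinite" in such models comes from a Dirichlet-process prior, under which the number of topics grows with the data. A one-screen illustration is the Chinese restaurant process below: each new document joins an existing topic with probability proportional to its size, or opens a new topic with probability proportional to a concentration parameter. This sketch illustrates the prior only, not the dissertation's full model.

```python
import random

def crp(n_docs, alpha=1.0, seed=0):
    """Chinese restaurant process: returns the size of each topic cluster."""
    random.seed(seed)
    topics = []  # topics[k] = number of documents assigned to topic k
    for i in range(n_docs):
        r = random.uniform(0, i + alpha)
        for k, size in enumerate(topics):
            if r < size:
                topics[k] += 1  # join existing topic, proportional to size
                break
            r -= size
        else:
            topics.append(1)    # open a new topic, proportional to alpha
    return topics

print(crp(100))  # the number of topics varies with the data and alpha
```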

The evaluation of software defined networking for communication and control of cyber physical systems

Sydney, Ali January 1900 (has links)
Doctor of Philosophy / Department of Electrical and Computer Engineering / Don Gruenbacher / Caterina Scoglio / Cyber physical systems emerge when physical systems are integrated with communication networks. In particular, communication networks facilitate dissemination of data among components of physical systems to meet key requirements, such as efficiency and reliability, in achieving an objective. In this dissertation, we consider one of the most important cyber physical systems: the smart grid. The North American Electric Reliability Corporation (NERC) envisions a smart grid that aggressively explores advanced communication network solutions to facilitate real-time monitoring and dynamic control of the bulk electric power system. At the distribution level, the smart grid integrates renewable generation and energy storage mechanisms to improve reliability of the grid. Furthermore, dynamic pricing and demand management provide customers with an avenue to interact with the power system and determine electricity usage that satisfies their lifestyle. At the transmission level, efficient communication and a highly automated architecture provide visibility into the power system; hence, faults are mitigated faster than they can propagate. However, higher levels of reliability and efficiency rely on the supporting physical communication infrastructure and the network technologies employed. Conventionally, the topology of the communication network tends to be identical to that of the power network. In this dissertation, however, we employ a Demand Response (DR) application to illustrate that a topology that may be ideal for the power network is not necessarily ideal for the communication network. To develop this illustration, we note that communication network issues, such as congestion, are addressed by protocols, middleware, and software mechanisms, and that a network whose physical topology is designed to avoid congestion achieves an even higher level of performance. For this reason, characterizing the communication infrastructure of smart grids provides mechanisms to improve performance while minimizing cost. Most recently, algebraic connectivity has been used in the ongoing research effort to characterize the robustness of networks to failures and attacks. Therefore, we first derive analytical methods for increasing algebraic connectivity and validate these methods numerically. Secondly, we investigate the impact on topology and traffic characteristics as algebraic connectivity is increased. Finally, we construct a DR application to demonstrate how concepts from graph theory can dramatically improve the performance of a communication network: with a hybrid simulation of both power and communication networks, we illustrate that a topology which may be ideal for the power network may not be ideal for the communication network. To date, utility companies are embracing network technologies such as Multiprotocol Label Switching (MPLS) because of its support for legacy devices, traffic engineering, and virtual private networks (VPNs), which are essential to the functioning of the smart grid. Furthermore, this particular network technology meets the requirement of non-routability stipulated by NERC, but these benefits are costly for the infrastructure that supports the full MPLS specification. More importantly, with MPLS routing and other switching technologies, innovation is restricted to the features provided by the equipment. In particular, no practical method exists for utility consultants or researchers to test new ideas, such as alternatives to IP or MPLS, on a realistic scale in order to gain the experience and confidence necessary for real-world deployments; as a result, novel ideas remain untested. By contrast, OpenFlow, which has gained support from network providers such as Microsoft and Google and equipment vendors such as NEC and Cisco, provides the programmability and flexibility necessary to enable innovation in next-generation communication architectures for the smart grid. This level of flexibility allows OpenFlow to provide all features of MPLS and allows OpenFlow devices to co-exist with existing MPLS devices. Therefore, in this dissertation we explore a low-cost OpenFlow Software Defined Networking solution and compare its performance to that of MPLS. In summary, we develop methods for designing robust networks and evaluate software defined networking for communication and control in cyber physical systems where the smart grid is the system under consideration.
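
Algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian, and the sketch below computes it for a four-node path and for the same nodes arranged in a cycle, showing how one added edge raises it. The example is illustrative; the dissertation's analytical methods for increasing connectivity are not reproduced here.

```python
import numpy as np

def algebraic_connectivity(adj):
    """Second-smallest eigenvalue of the Laplacian L = D - A."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(laplacian))[1]

# 4-node path graph, then the same graph with an extra end-to-end edge.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
ring = path.copy()
ring[0, 3] = ring[3, 0] = 1.0

print(algebraic_connectivity(path))  # ~0.586
print(algebraic_connectivity(ring))  # 2.0 -- the cycle is more robust
```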

A convolutive model for polyphonic instrument identification and pitch detection using combined classification

Weese, Joshua L. January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Pitch detection and instrument identification can be achieved with relatively high accuracy for monophonic signals in music; however, accurately classifying polyphonic signals remains an unsolved research problem. Pitch and instrument classification is a subset of Music Information Retrieval (MIR) and automatic music transcription, both of which have numerous research and real-world applications. Several areas of research are covered in this thesis, including the fast Fourier transform, onset detection, convolution, and filtering. Basic music theory and terms are also presented to explain the context and structure of the data used. The focus of this thesis is the representation of musical signals in the frequency domain. Polyphonic signals with many different voices and frequencies can be exceptionally complex. This thesis presents a new model for representing the spectral structure of polyphonic signals: the Uniform MAx Gaussian Envelope (UMAGE). The new spectral envelope precisely approximates the distribution of frequency parts in the spectrum while remaining resilient to rapid oscillation (noise), and it generalizes well without losing the representation of the original spectrum. When subjectively compared to other spectral envelope methods, such as the linear predictive coding envelope method and the cepstrum envelope method, UMAGE is able to model high-order polyphonic signals without dropping partials (frequencies present in the signal); in other words, UMAGE models a signal independently of the signal's periodicity. The performance of UMAGE is evaluated both objectively and subjectively, and it is shown to be robust at modeling the distribution of frequencies in simple and complex polyphonic signals. Combined classification (combiners), a methodology for learning large concepts, is used to simplify the learning process and boost classification results; the output of each learner is averaged to obtain the final result. UMAGE is less accurate when identifying pitches; however, it achieves accuracy in identifying instrument groups on order-10 polyphonic signals (ten voices) that is competitive with the current state of the field.
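
The abstract does not specify UMAGE's construction, but the "max Gaussian" flavor can be guessed at: center a fixed-width Gaussian on each spectral peak and take the pointwise maximum. The sketch below is an illustrative reconstruction under that assumption, not the UMAGE algorithm itself; the peak-picking rule and width are also assumptions.

```python
import numpy as np

def max_gaussian_envelope(mag, width=4.0):
    """Pointwise max of fixed-width Gaussians centered on local spectral peaks."""
    bins = np.arange(len(mag))
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]]
    if not peaks:
        return mag.copy()
    curves = [mag[k] * np.exp(-0.5 * ((bins - k) / width) ** 2) for k in peaks]
    return np.max(curves, axis=0)

# Toy spectrum: two partials plus a little noise.
rng = np.random.default_rng(0)
spec = np.zeros(64)
spec[8], spec[24] = 1.0, 0.6
spec += 0.02 * np.abs(rng.standard_normal(64))
envelope = max_gaussian_envelope(spec)
print(envelope[8], envelope[24])  # the envelope tracks the partial peaks
```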
