  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
301

Entropy Measurements and Ball Cover Construction for Biological Sequences

Robertson, Jeffrey Alan 01 August 2018 (has links)
As improving technology makes it easier to select or engineer DNA sequences that produce dangerous proteins, it is important to be able to predict whether a novel DNA sequence is potentially dangerous by determining its taxonomic identity and functional characteristics. These tasks can be facilitated by the ever-increasing amounts of available biological data. Unfortunately, though, these growing databases can be difficult to take full advantage of due to the corresponding increase in computational and storage costs. Entropy scaling algorithms and data structures present an approach that can expedite this type of analysis by scaling with the amount of entropy contained in the database instead of scaling with the size of the database. Because sets of DNA and protein sequences are biologically meaningful rather than purely random, they exhibit a degree of structure. As biological databases grow, taking advantage of this structure can be extremely beneficial. The entropy scaling sequence similarity search algorithm introduced here demonstrates this by accelerating the biological sequence search tools BLAST and DIAMOND. Tests of the implementation of this algorithm show that while this approach can lead to improved query times, constructing the required entropy scaling indices is difficult and expensive. To improve performance and remove this bottleneck, I investigate several ideas for accelerating the construction of indices that support entropy scaling searches. The results of these tests identify key tradeoffs and demonstrate that there is potential in using these techniques for sequence similarity searches. / Master of Science / As biological organisms are created and discovered, it is important to compare their genetic information to known organisms in order to detect possible harmful or dangerous properties. However, the collection of published genetic information from known organisms is huge and growing rapidly, making it difficult to search. This thesis shows that it might be possible to use the non-random properties of biological information to increase the speed and efficiency of searches; that is, because genetic sequences are not random but have common structures, an increase in known data does not mean a proportional increase in complexity, known as entropy. Specifically, when comparing a new sequence to a set of previously known sequences, it is important to choose the correct algorithms for comparing the similarity of two sequences, also known as the distance between them. This thesis explores the performance of the entropy scaling algorithm compared to several conventional tools.
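The general idea behind entropy-scaling search can be sketched briefly. The code below is not the thesis's implementation (which builds on BLAST and DIAMOND); it is a minimal, hypothetical ball-cover index and two-stage search over an assumed metric distance function, showing how the triangle inequality lets the coarse stage discard whole clusters at once.

```python
def build_ball_cover(sequences, radius, distance):
    """Greedy sketch of a ball-cover index: each sequence joins the first
    representative within `radius`, otherwise it becomes a new representative.
    `distance` is assumed to be a metric (or near-metric) on sequences."""
    clusters = []  # list of (representative, members)
    for seq in sequences:
        for rep, members in clusters:
            if distance(seq, rep) <= radius:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


def coarse_fine_search(query, clusters, radius, threshold, distance):
    """Two-stage search: the coarse stage scans only representatives; the fine
    stage scans members of clusters that could contain a hit, because the
    triangle inequality gives d(query, member) >= d(query, rep) - radius."""
    hits = []
    for rep, members in clusters:
        if distance(query, rep) <= threshold + radius:  # cluster may hold hits
            hits.extend(m for m in members if distance(query, m) <= threshold)
    return hits
```

Because the coarse stage touches only cluster representatives, its cost grows with the number of clusters, a quantity tied to the database's entropy, rather than with the total number of sequences.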
302

The Search for a Cost Matrix to Solve Rare-Class Biological Problems

Lawson, Mark Jon 10 December 2009 (has links)
The rare-class data classification problem is a common one. It occurs when, in a dataset, the class of interest is far outweighed by other classes, thus making it difficult to classify using typical classification algorithms. These types of problems are found quite often in biological datasets, where data can be sparse and the class of interest has few representatives. A variety of solutions to this problem exist with varying degrees of success. In this paper, we present our solution to the rare-class problem. This solution uses MetaCost, a cost-sensitive meta-classifier that takes in a classification algorithm, training data, and a cost matrix. This cost matrix adjusts the learning of the classification algorithm to classify more of the rare-class data, but it is generally unknown for a given dataset and classifier. Our method uses three different optimization techniques (greedy search, simulated annealing, and a genetic algorithm) to determine an optimal cost matrix. We show how this method can improve classification on a large number of datasets, achieving better results across a variety of metrics. We also show that it improves different classification algorithms and does so more consistently than other rare-class learning techniques such as oversampling and undersampling. Overall, our method is a robust and effective solution to the rare-class problem. / Ph. D.
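The cost-matrix optimization described above can be illustrated with a small, hypothetical sketch. The simulated-annealing variant below tunes only the false-negative cost of a 2x2 matrix [[0, c_fp], [c_fn, 0]]; `evaluate` is an assumed callback that re-runs a MetaCost-style cost-sensitive wrapper around the base classifier and returns a validation score to maximize (for example, F-measure).

```python
import math
import random

def search_cost_matrix(evaluate, initial_c_fn=10.0, steps=200, temp0=1.0, seed=0):
    """Simulated-annealing sketch for tuning the rare-class (false-negative)
    cost; `evaluate(cost_matrix)` is assumed to retrain the cost-sensitive
    classifier and return a score to maximize on validation data."""
    rng = random.Random(seed)
    current = [[0.0, 1.0], [initial_c_fn, 0.0]]
    score = best_score = evaluate(current)
    best = current
    for step in range(steps):
        temp = temp0 * (1.0 - step / steps)              # linear cooling
        cand_c_fn = max(1.0, current[1][0] * rng.uniform(0.5, 1.5))
        candidate = [[0.0, 1.0], [cand_c_fn, 0.0]]
        cand_score = evaluate(candidate)
        accept = cand_score >= score or \
            rng.random() < math.exp((cand_score - score) / max(temp, 1e-9))
        if accept:
            current, score = candidate, cand_score
            if score > best_score:
                best, best_score = current, score
    return best, best_score
```

Greedy search and a genetic algorithm, the other two techniques mentioned, differ only in how candidate matrices are generated and accepted.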
303

Scalability of Stepping Stones and Pathways

Venkatachalam, Logambigai 30 May 2008 (has links)
Information Retrieval (IR) plays a key role in serving large communities of users who are in need of relevant answers for their search queries. IR encompasses various search models to address different requirements and has introduced a variety of supporting tools to improve effectiveness and efficiency. "Search" is the key focus of IR. The classic search methodology takes an input query, processes it, and returns the result as a ranked list of documents. However, this approach is not the most effective method to support the task of finding document associations (relationships between concepts or queries), whether direct or indirect. The Stepping Stones and Pathways (SSP) retrieval methodology supports retrieval of ranked chains of documents that support valid relationships between any two given concepts. SSP has many potential practical and research applications that need a tool for finding connections between two concepts. The early SSP "proof-of-concept" implementation could handle only 6000 documents, whereas commercial search applications have to deal with millions of documents. Hence, addressing this scalability limitation in the current SSP implementation is extremely important for handling large datasets. Research on various commercial search applications and their scalability indicates that the Lucene search toolkit is widely used due to its scalability, performance, and extensibility. Many web-based and desktop applications have used this toolkit to great success, including Wikipedia search, job search sites, digital libraries, e-commerce sites, and the Eclipse Integrated Development Environment (IDE). The goal of this research is to re-implement SSP in a scalable way, so that it can work for larger datasets and can also be deployed commercially. This work explains the approach adopted for the re-implementation, focusing on scalable indexing and searching components, new ways to process citations (references), a new approach to query expansion, document clustering, and document similarity calculation. Experiments testing factors such as runtime and storage showed that the system can scale up to millions of documents. / Master of Science
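In the re-implementation, indexing and searching are delegated to Lucene; the toy Python class below only illustrates the underlying inverted-index idea (postings lists plus a simple tf-idf ranking) and is not SSP code or a substitute for Lucene's scalability features.

```python
import math
from collections import defaultdict

class TinyIndex:
    """Toy inverted index illustrating the indexing/searching split that
    Lucene provides at scale."""

    def __init__(self):
        self.postings = defaultdict(dict)   # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query, top_k=10):
        scores = defaultdict(float)
        for term in query.lower().split():
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(1 + self.doc_count / len(docs))   # simple idf weight
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf                   # tf-idf score
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```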
304

Transparency, trust, and level of detail in user interface design for human autonomy teaming

Wang, Tianzi 03 November 2023 (has links)
Effective collaboration between humans and autonomous agents can improve productivity and reduce risks of human operators in safety-critical situations, with autonomous agents working as complementary teammates and lowering physical and mental demands by providing assistance and recommendations in complicated scenarios. Ineffective collaboration would have drawbacks, such as risks of being out-of-the-loop when switching over controls, increased time and workload due to the additional needs for communication and situation assessment, unexpected outcomes due to overreliance, and disuse of autonomy due to uncertainty and low expectations. Disclosing the information about the agents for communication and collaboration is one approach to calibrate trust for appropriate reliance and overcome the drawbacks in human-autonomy teaming. When disclosing agent information, the level of detail (LOD) needs careful consideration because not only the availability of information but also the demand for information processing would change, resulting in unintended consequences on comprehension, workload, and task performance. This dissertation investigates how visualization design at different LODs about autonomy influences transparency, trust, and, ultimately, the effectiveness of human autonomy teaming (HAT) in search and rescue missions. LOD indicates the amount of information aggregated or organized in communication for the human to perceive, comprehend, and respond, and could be manipulated by changing the granularity of information in a user interface. High LOD delivers less information so that users can identify overview and key information of autonomy, while low LOD delivers information in a more detailed manner. The objectives of this research were (1) to build a simulation platform for a representative HAT task affected by visualizations at different LODs about autonomy, (2) to establish the empirical relationship between LOD and transparency, given potential information overload with indiscriminate exposure, and (3) examine how to adapt LOD in visualization with respect to trust as users interact with autonomy over time. A web-based application was developed for wilderness SAR, which can support different visualizations of the lost-person model, UAV path-planner, and task assignment. Two empirical studies were conducted recruiting human participants to collaborate with autonomous agents, making decisions on search area assignment, unmanned aerial vehicle path planning, and object detection. The empirical data included objective measures of task performance and compliance, subjective ratings of transparency, trust, and workload, and qualitative interview data about the designs with students and search and rescue professionals. The first study revealed that lowering LODs (i.e., more details) does not lead to a proportional increase in transparency (ratings), trust, workload, accuracy, and speed. Transparency increased with decreased LODs up to a point before the subsequent decline, providing empirical evidence for the transparency paradox phenomenon. Further, lowering LOD about autonomy can promote trust with diminishing returns and plateau even with lowering LOD further. This suggests that simply presenting some information about autonomy can build trust quickly, as the users may perceive any reasonable forms of disclosure as signs of benevolence or good etiquette that promote trust. 
Transparency appears more sensitive to LOD than trust, likely because trust is conceptually less connected to the understanding of autonomy than transparency. In addition, the impacts of LODs were not uniform across the human performance measurements. The visualization with the lowest LOD yielded the highest decision accuracy but the worst in decision speed and intermediate levels of workload, transparency, and trust. LODs could induce the speed-accuracy trade-off. That is, as LOD decreases, more cognitive resources are needed to process the increased amount of information; thus, processing speed decreases accordingly. The second study revealed patterns of overall and instantaneous trust with respect to visualization at different LODs. For static visualization, the lowest LOD resulted in higher transparency ratings than the middle and high LOD. The lowest LOD generated the highest overall trust amongst the static and adaptive LODs. For visualizations of all LODs, instantaneous trust increased and then stabilized after a series of interactions. However, the rate of change and plateau for trust varied with LODs and modes between static and adaptive. The lowest, middle, and adaptive LODs followed a sigmoid curve, while the high LOD followed a linear one. Among the static LODs, the lowest LOD exhibits the highest growth rate and plateau in trust. The middle LOD developed trust the slowest and reached the lowest plateau. The high LOD showed a linear growth rate until a level similar to that of the lowest LOD. Adaptive LOD earned the trust of the participants at a very similar speed and plateau as the lowest LOD. Taking these results together, more details about autonomy are effective for expediting the process of building trust, as long as the amount of information is carefully managed to prevent overloading participants' information processing. Further, varying quantities of information in adaptive mode could yield very similar growth and plateau in trust, helping humans to deal with either the minimum or maximum amount of information. This adaptive approach could prevent situations where comprehension is hindered due to insufficient information or where users are potentially overloaded by details. Adapting LODs to instantaneous trust presents a promising technique for managing information exchange that can promote the efficiency of communication for building trust. The contribution of this research to literature is two-fold. The first study provides the first empirical evidence indicating that the impact of LODs on transparency and trust is not linear, which has not been explicitly demonstrated in prior studies about HAT. The impact of LOD on transparency is more sensitive than trust, calling for a more defined and consistent use of the term or concept - "transparency" and a deeper investigation into the relationships between trust and transparency. The second study presents the first examination of how static and dynamic LODs can influence the development of trust toward autonomy. The algorithm for adapting LOD for the adaptive visualization based on user trust is novel, and adaptive LODs in visualization could switch between detailed and abstract information to influence trust without always transmitting all the details about autonomy. Visualizations with different LODs in both static and adaptive modes present their own set of benefits and drawbacks, resulting in trade-offs concerning the speed of promoting trust and information quantity transmitted during communication. 
These findings indicate that LOD is an important factor for designing and analyzing visualization for transparency and trust in HAT. / Doctor of Philosophy / The collaboration between human and autonomous agents in search and rescue (SAR) missions aims to improve the success rate and speed of finding the lost person. In these missions, a human supervisor may coordinate with autonomous agents responsible for estimating lost person behavior, path planning, and unmanned aerial vehicles. The human SAR professional may rely on information from the autonomous agents to reinforce the search plan and make crucial decisions. Balancing the amount of information provided by the autonomous agents to the SAR professionals is critical, as insufficient information can hinder trust, leading to manual intervention, and excessive information can cause information overload, reducing efficiency. Both cases can result in human distrust of autonomy. Effective visualization of information can help study and improve the transmission of information between humans and autonomous agents. This approach can reduce unnecessary information in communication, thus conserving communication resources without sacrificing trust. This dissertation investigates how visualization design at the proper aggregation of details about autonomy, also referred to as level of detail (LOD), influences perceived understanding of the autonomous agents (i.e., transparency), trust, and ultimately, the effectiveness of human autonomy teaming (HAT) for wilderness SAR. A simulation platform was built for proof-of-concept, and two studies were conducted recruiting human participants to use the platform for completing simulated SAR tasks supported by visualizations at different LODs about autonomy. Study 1 results showed that transparency ratings increased with more details about autonomy up to a point and then declined with the most details (i.e., lowest LOD). Trust, workload, and performance also did not linearly improve with more details about autonomy. The non-linear relationships of LODs with transparency, trust, workload, and performance, confirmed the phenomenon of the transparency paradox, which refers to the disclosure of excessive information about autonomy may hinder transparency and subsequent performance. Study 2 results also illustrated that when visualization with LOD adapted to instant trust, the speed of building trust and the plateau of trust on autonomy can achieve the same level as the visualization provided with the most details, which performed the best in building trust. This adaptive approach minimized the amount of information displayed relative to the visualization, constantly presenting the most information, potentially easing the burden of communication. Taken together, this research highlights that the amount of information about autonomy to display must be considered carefully for both research and practice. Further, this dissertation advances the visualization design by illustrating that visualization adapting LODs based on trust is effective at building trust in a manner that minimizes the amount of information presented to the user.
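The dissertation describes its trust-adaptive visualization algorithm as novel, and it is not reproduced in this listing; the fragment below is only a hypothetical, threshold-based sketch of the general idea of adapting LOD to instantaneous trust, with all parameter values invented for illustration.

```python
def adapt_lod(current_lod, instant_trust, low=0.35, high=0.75, min_lod=1, max_lod=3):
    """Illustrative rule: LOD 1 is the most detailed view, LOD 3 the most
    abstract. When instantaneous trust is low, reveal more detail; once trust
    is high, abstract detail away to reduce the information-processing burden."""
    if instant_trust < low:
        return max(min_lod, current_lod - 1)   # lower LOD -> more detail
    if instant_trust > high:
        return min(max_lod, current_lod + 1)   # higher LOD -> less detail
    return current_lod                         # trust in mid-range: keep LOD
```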
305

Auditors’ Information Search and Documentation: Does Knowledge of the Client Preference Or PCAOB Accountability Pressure Matter?

Olvera, Renee M. 05 1900 (has links)
Auditors regularly make judgments regarding whether a client’s chosen accounting policy is appropriate and in accordance with generally accepted accounting principles (GAAP). However, to form this judgment, auditors must either possess adequate topic-specific knowledge or must gain such knowledge through information search. This search is subject to numerous biases, including a bias toward confirmation of a client’s preference. It is important to further our understanding of bias in auditors’ information search to identify its causes and effects. Furthering our understanding is necessary to provide a basis for recommending and evaluating a potential debiaser, such as accountability. The Public Company Accounting Oversight Board (PCAOB) annually inspects the audit files of selected engagements, which introduces a new form of accountability within the auditing profession. This new form of accountability has come at great cost; however, there is little empirical evidence regarding its effects on auditors’ processes. As such, it is important to understand whether the presence of accountability from the PCAOB is effective in modifying auditors’ search behaviors to diminish confirmation bias. Using an online experiment, I manipulate client preference (unknown vs. known) and PCAOB accountability pressure (low vs. high) and measure search type (information-focused or decision-focused), search depth (shallow or deep), and documentation quality. I investigate whether auditors’ information search behaviors differ based on knowledge of the client’s preference and in the presence of accountability from an expected PCAOB inspection. I also investigate whether differences in auditors’ information search behaviors influence documentation quality, which is the outcome of greatest concern to the PCAOB. I hypothesize and find a client preference effect on information search type, such that auditors with knowledge of the client preference consider guidance associated with the client’s preference longer than those without knowledge of the client’s preference. Contrary to expectations, PCAOB accountability pressure does not influence information search depth. With respect to documentation quality, I find that auditors engaged in a more information-focused search have higher documentation quality. Further, as expected, auditors who initially engage in a decision-focused and deep search have higher documentation quality than those who initially engage in a decision-focused but shallow search.
306

DATALOG WITH CONSTRAINTS: A NEW ANSWER-SET PROGRAMMING FORMALISM

East, Deborah J. 01 January 2001 (has links)
Knowledge representation and search are two fundamental areas of artificial intelligence. Knowledge representation is the area of artificial intelligence which deals with capturing, in a formal language, the properties of objects and the relationships between objects. Search is a systematic examination of all possible candidate solutions to a problem that is described as a theory in some knowledge representation formalism. We compare traditional declarative programming formalisms such as PROLOG and DATALOG with answer-set programming formalisms such as logic programming with the stable model semantics. In this thesis we develop an answer-set formalism we call DC. The logic of DC is based on the logic of propositional schemata and a version of the Closed World Assumption. Two important features of the DC logic are that it supports modeling of the cardinalities of sets and Horn clauses. These two features facilitate modeling of search problems. The DC system includes an implementation of a grounder and a solver. The grounder for the DC system grounds instances of problems while retaining the structure of the cardinality of sets; the resulting theories are thus more concise. In addition, the solver for the DC system utilizes the structure of cardinality of sets to perform more efficient search. The second feature, Horn clauses, is used when transitive closure eliminates the need for additional variables. The semantics of the Horn clauses are retained in the grounded theories, which also results in more concise theories. Our goal in developing DC is to provide the computer science community with a system which facilitates modeling of problems, is easy to use, is efficient, and captures the class of NP-search problems. We show experimental results comparing DC to other systems. These results show that DC is always competitive with state-of-the-art answer-set programming systems, and for many problems DC is more efficient.
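The fragment below is plain Python rather than DC syntax; it is a hypothetical illustration of why retaining cardinality-of-sets structure pays off: a search that enumerates only subsets of a fixed cardinality k (here, candidate vertex covers of size k) examines C(n, k) candidates instead of all 2^n subsets.

```python
from itertools import combinations

def vertex_covers_of_size(edges, vertices, k):
    """Enumerate vertex covers of size exactly k: only C(n, k) subsets are
    generated, mirroring how a solver that exploits cardinality constraints
    can prune its search."""
    for subset in combinations(vertices, k):
        chosen = set(subset)
        if all(u in chosen or v in chosen for u, v in edges):
            yield chosen

# Example: a triangle graph has a vertex cover of size 2.
print(next(vertex_covers_of_size([(1, 2), (2, 3), (1, 3)], [1, 2, 3], 2)))
```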
307

Direct Search Methods for Nonsmooth Problems using Global Optimization Techniques

Robertson, Blair Lennon January 2010 (has links)
This thesis considers the practical problem of constrained and unconstrained local optimization. This subject has been well studied when the objective function f is assumed to be smooth. However, nonsmooth problems occur naturally and frequently in practice. Here f is assumed to be nonsmooth or discontinuous, without forcing smoothness assumptions near, or at, a potential solution. Various methods have been presented by others to solve nonsmooth optimization problems; however, only partial convergence results are possible for these methods. In this thesis, an optimization method which uses a series of local and localized global optimization phases is proposed. The local phase searches for a local minimum and gives the method its numerical performance on parts of f which are smooth. The localized global phase exhaustively searches for points of descent in a neighborhood of cluster points, and it is this phase which provides strong theoretical convergence results on nonsmooth problems. Algorithms are presented for solving bound-constrained, unconstrained, and constrained nonlinear nonsmooth optimization problems. These algorithms use direct search methods in the local phase, as such methods can be applied directly to nonsmooth problems because gradients are not explicitly required. The localized global optimization phase uses a new partitioning random search algorithm to direct random sampling into promising subsets of ℝⁿ. The partition is formed using classification and regression trees (CART) from statistical pattern recognition. The CART partition defines desirable subsets where f is relatively low, based on previous sampling, from which further samples are drawn directly. For each algorithm, convergence to an essential local minimizer of f is demonstrated under mild conditions: that is, a point x* for which the set of all feasible points with lower f values has Lebesgue measure zero for all sufficiently small neighborhoods of x*. Stopping rules are derived for each algorithm, giving practical convergence to estimates of essential local minimizers. Numerical results are presented on a range of nonsmooth test problems in 2 to 10 dimensions, showing the methods are effective in practice.
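A simplified stand-in for the localized global phase's partitioning random search is sketched below; it is not the thesis's algorithm. Instead of sampling directly inside the low-valued CART leaves, it fits a regression tree to previously sampled points and filters uniform candidates by the tree's prediction, which biases sampling toward subsets where f appears relatively low.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def cart_biased_samples(X, y, bounds, n_new=20, n_candidates=2000,
                        quantile=0.2, seed=None):
    """Draw new samples biased toward regions where a CART model, fitted to
    previously evaluated points X with objective values y, predicts low f.
    `bounds` has shape (n_dims, 2) giving [low, high] per dimension."""
    rng = np.random.default_rng(seed)
    tree = DecisionTreeRegressor(min_samples_leaf=5).fit(X, y)
    lo, hi = bounds[:, 0], bounds[:, 1]
    cand = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
    pred = tree.predict(cand)
    promising = cand[pred <= np.quantile(pred, quantile)]   # low-f region
    idx = rng.choice(len(promising), size=min(n_new, len(promising)),
                     replace=False)
    return promising[idx]
```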
308

Analýza činnosti záchranářských psů v rámci Integrovaného záchranného systému / Analysis of the activities of rescue dogs within the Integrated Rescue System

Jozífová, Kristýna January 2011 (has links)
Title: The analysis of the activities of rescue dogs in the Integrated Rescue System. Objectives: To analyze the frequency of deploying dogs in actions under the Integrated Rescue System for the period 2009 to 2011, and to describe the reasons for success and failure in deploying dogs in the IRS. Method: Research of available sources, data collection. Results: The use of cynological (handler and dog) teams increases every year, with service dogs being used most often. Requests for debris search come mainly from the Fire and Rescue Service, while requests for area search come mainly from the Police of the Czech Republic. Keywords: Integrated Rescue System, dog, debris search, area search.
309

Advancing large scale object retrieval

Arandjelovic, Relja January 2013 (has links)
The objective of this work is object retrieval in large scale image datasets, where the object is specified by an image query and retrieval should be immediate at run time. Such a system has a wide variety of applications including object or location recognition, video search, near duplicate detection and 3D reconstruction. The task is very challenging because of large variations in the imaged object appearance due to changes in lighting conditions, scale and viewpoint, as well as partial occlusions. A starting point of established systems which tackle the same task is detection of viewpoint invariant features, which are then quantized into visual words and efficient retrieval is performed using an inverted index. We make the following three improvements to the standard framework: (i) a new method to compare SIFT descriptors (RootSIFT) which yields superior performance without increasing processing or storage requirements; (ii) a novel discriminative method for query expansion; (iii) a new feature augmentation method. Scaling up to searching millions of images involves either distributing storage and computation across many computers, or employing very compact image representations on a single computer combined with memory-efficient approximate nearest neighbour search (ANN). We take the latter approach and improve VLAD, a popular compact image descriptor, using: (i) a new normalization method to alleviate the burstiness effect; (ii) vocabulary adaptation to reduce influence of using a bad visual vocabulary; (iii) extraction of multiple VLADs for retrieval and localization of small objects. We also propose a method, SCT, for extremely low bit-rate compression of descriptor sets in order to reduce the memory footprint of ANN. The problem of finding images of an object in an unannotated image corpus starting from a textual query is also considered. Our approach is to first obtain multiple images of the queried object using textual Google image search, and then use these images to visually query the target database. We show that issuing multiple queries significantly improves recall and enables the system to find quite challenging occurrences of the queried object. Current retrieval techniques work only for objects which have a light coating of texture, while failing completely for smooth (fairly textureless) objects best described by shape. We present a scalable approach to smooth object retrieval and illustrate it on sculptures. A smooth object is represented by its imaged shape using a set of quantized semi-local boundary descriptors (a bag-of-boundaries); the representation is suited to the standard visual word based object retrieval. Furthermore, we describe a method for automatically determining the title and sculptor of an imaged sculpture using the proposed smooth object retrieval system.
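The RootSIFT comparison in improvement (i) is commonly described as L1-normalizing each SIFT descriptor and then taking the element-wise square root, so that Euclidean distance between the transformed vectors corresponds to the Hellinger kernel on the original histograms. The snippet below is an illustrative implementation of that idea, not code from the thesis.

```python
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Map SIFT descriptors (rows) to RootSIFT: L1-normalize each row, then
    take the element-wise square root."""
    d = np.array(descriptors, dtype=np.float64)          # copy to avoid mutation
    d /= (np.abs(d).sum(axis=1, keepdims=True) + eps)    # L1 normalization
    return np.sqrt(d)                                    # Hellinger mapping
```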
310

Método beam search aplicado a problemas de programação da produção / Beam search method for scheduling problems

Jesus Filho, José Eurípedes Ferreira de 05 December 2018 (has links)
In this thesis, two different scheduling problems are addressed: the Flexible Job Shop Scheduling Problem with sequencing flexibility and the Flowshop Scheduling Problem with waiting times and sequence permutation. For both problems, a list scheduling (LS) algorithm that exploits features of the problem is first developed and then extended to a Beam Search (BS) method that uses the LS in its main elements: (1) level expansion, (2) local evaluation of candidates, and (3) global evaluation of candidates. All the proposed methods are deterministic, and their pseudocodes are carefully described to ensure the replicability of the reported results. The performance of the proposed methods was evaluated using instances and other heuristic methods from the literature. The computational results show the efficiency of the proposed heuristics, which outperformed the literature methods while requiring little computational time.
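A minimal sketch of the beam search structure described above appears below. It is hypothetical and simplified: the waiting-time constraints, the list-scheduling construction, and the separate local and global candidate evaluations of the thesis are collapsed into a single makespan-based score on a plain permutation flowshop.

```python
def makespan(sequence, proc_times):
    """Completion time of the last job on the last machine of a permutation
    flowshop; proc_times[j][m] is the processing time of job j on machine m."""
    n_machines = len(proc_times[0])
    finish = [0] * n_machines
    for job in sequence:
        for m in range(n_machines):
            start = max(finish[m], finish[m - 1] if m > 0 else 0)
            finish[m] = start + proc_times[job][m]
    return finish[-1]

def beam_search(proc_times, beam_width=3):
    """Level-by-level beam search: each level appends one more job to every
    partial sequence in the beam, candidates are scored, and only the best
    `beam_width` partial solutions survive to the next level."""
    n_jobs = len(proc_times)
    beam = [()]                                   # level 0: empty sequence
    for _ in range(n_jobs):                       # (1) level expansion
        candidates = [seq + (j,) for seq in beam
                      for j in range(n_jobs) if j not in seq]
        # (2)+(3) a single makespan score stands in for the thesis's separate
        # local and global evaluations of candidates
        candidates.sort(key=lambda seq: makespan(seq, proc_times))
        beam = candidates[:beam_width]
    return beam[0], makespan(beam[0], proc_times)
```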
