101

Combinatorial Properties of Periodic Patterns in Compressed Strings

Pape-Lange, Julian 07 November 2023 (has links)
In this thesis, we study the following three types of periodic string patterns and some of their variants. Firstly, we consider maximal d-repetitions. These are substrings that are at least 2+d times as long as their minimum period. Secondly, we consider 3-cadences. These are arithmetic subsequences of three equal characters. Lastly, we consider maximal pairs. These are pairs of identical substrings. Maximal d-repetitions and maximal pairs of uncompressed strings are already well-researched. However, no non-trivial upper bounds for distinct occurrences of these patterns that take the compressed size of the underlying string into account were known prior to this research. We provide upper bounds for several variants of these two patterns that depend on the compressed size of the string, the logarithm of the string's length, the highest allowed power and d. These results also lead to upper bounds and new insights for the compacted directed acyclic word graph and the run-length encoded Burrows-Wheeler transform. We prove that cadences with three elements can be efficiently counted in uncompressed strings and can even be efficiently detected in grammar-compressed binary strings. We also show that even slightly more difficult variants of this problem are already NP-hard on compressed strings. Along the way, we extend the underlying geometry of the convolution from rectangles to arbitrary polygons. We also prove that this non-rectangular convolution can still be efficiently computed. Contents: 1 Introduction; 2 Preliminaries; 3 Non-Rectangular Convolution; 4 Alphabet Reduction; 5 Maximal (Sub-)Repetitions; 6 Cadences; 7 Maximal Pairs; A Propositions
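To make the counted object concrete: following the abstract's definition, a 3-cadence is a triple of equidistant positions i, i+d, i+2d carrying the same character. A minimal brute-force counter in Python, illustrating the definition only (the thesis counts these far more efficiently via non-rectangular convolutions):

```python
def count_3_cadences(s: str) -> int:
    """Count triples (i, i+d, i+2d), d >= 1, with s[i] == s[i+d] == s[i+2d].

    Naive O(n^2) enumeration that only pins down the object being counted;
    the thesis counts these faster via (non-rectangular) convolutions.
    """
    n, total = len(s), 0
    for i in range(n):
        d = 1
        while i + 2 * d < n:
            if s[i] == s[i + d] == s[i + 2 * d]:
                total += 1
            d += 1
    return total

print(count_3_cadences("abcabcabc"))  # 3: the triples (0,3,6), (1,4,7), (2,5,8)
```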
102

On the Complexity of Binary Polynomial Optimization Over Acyclic Hypergraphs

Del Pia, Alberto, Di Gregorio, Silvia 19 March 2024 (has links)
In this work, we advance the understanding of the fundamental limits of computation for binary polynomial optimization (BPO), which is the problem of maximizing a given polynomial function over all binary points. In our main result we identify a novel class of BPO instances that can be solved efficiently both from a theoretical and a computational perspective. In fact, we give a strongly polynomial-time algorithm for instances whose corresponding hypergraph is β-acyclic. We note that the β-acyclicity assumption is natural in several applications, including relational database schemes and the lifted multicut problem on trees. Due to the novelty of our proof technique, we obtain an algorithm which is interesting also from a practical viewpoint. This is because our algorithm is very simple to implement and the running time is a polynomial of very low degree in the number of nodes and edges of the hypergraph. Our result completely settles the computational complexity of BPO over acyclic hypergraphs, since the problem is NP-hard on α-acyclic instances. Our algorithm can also be applied to any general BPO problem that contains β-cycles. For these problems, the algorithm returns a smaller instance together with a rule to extend any optimal solution of the smaller instance to an optimal solution of the original instance.
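To fix the problem statement: a BPO instance is a multilinear polynomial whose monomials correspond to hyperedges, maximized over {0,1}^n. Below is a sketch of the exhaustive O(2^n) baseline, purely as a reference point; it is not the paper's algorithm, which is strongly polynomial when the hyperedge structure is β-acyclic:

```python
from itertools import product

def bpo_brute_force(n, terms):
    """Maximize a multilinear polynomial over {0,1}^n by exhaustive search.

    `terms` maps a frozenset of variable indices (a hyperedge) to its
    coefficient, so the keys form exactly the hypergraph whose
    beta-acyclicity the paper's polynomial-time algorithm requires.
    """
    best_val, best_x = float("-inf"), None
    for x in product((0, 1), repeat=n):
        val = sum(c for edge, c in terms.items() if all(x[i] for i in edge))
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

# maximize 3*x0*x1 - 2*x1*x2 + x2 over binary points; optimum is x = (1, 1, 0)
print(bpo_brute_force(3, {frozenset({0, 1}): 3,
                          frozenset({1, 2}): -2,
                          frozenset({2}): 1}))  # (3, (1, 1, 0))
```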
103

General dynamic Yannakakis: Conjunctive queries with theta joins under updates

Idris, Muhammad, Ugarte, Martín, Vansummeren, Stijn, Voigt, Hannes, Lehner, Wolfgang 17 July 2023 (has links)
The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. In prior work, we have proposed general dynamic Yannakakis (GDYN), a general framework for dynamically processing acyclic conjunctive queries with θ-joins in the presence of data updates. Whereas traditional approaches face a trade-off between materialization of subresults (to avoid inefficient recomputation) and recomputation of subresults (to avoid the potentially large space overhead of materialization), GDYN is able to avoid this trade-off. It intelligently maintains a succinct data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In this paper, we consolidate and extend the development of GDYN. First, we give a full formal proof of GDYN's correctness and complexity. Second, we present a novel algorithm for computing GDYN query plans. Finally, we instantiate GDYN to the case where all θ-joins are inequalities and present an extended experimental comparison against state-of-the-art engines. Our approach consistently outperforms the competitor systems, with improvements of multiple orders of magnitude in both time and memory consumption.
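The materialization/recomputation trade-off the abstract describes can be seen on a toy two-relation equi-join. The sketch below is illustrative only (GDYN itself handles arbitrary acyclic conjunctive queries with θ-joins via generalized join trees): per-relation indexes replace the materialized join result, so updates are cheap and the result is enumerated on demand:

```python
from collections import defaultdict

class JoinView:
    """Toy dynamic view of R(a,b) JOIN S(b,c), in the spirit of (G)DYN:
    rather than materializing the join result, keep one index per relation
    keyed on the join attribute.  Updates then cost O(1), and the full
    result is enumerated on demand."""

    def __init__(self):
        self.r = defaultdict(set)  # b -> set of a
        self.s = defaultdict(set)  # b -> set of c

    def insert_r(self, a, b): self.r[b].add(a)
    def insert_s(self, b, c): self.s[b].add(c)
    def delete_r(self, a, b): self.r[b].discard(a)
    def delete_s(self, b, c): self.s[b].discard(c)

    def enumerate(self):
        # only join keys present on both sides can produce output
        for b in self.r.keys() & self.s.keys():
            for a in self.r[b]:
                for c in self.s[b]:
                    yield (a, b, c)

v = JoinView()
v.insert_r(1, "x"); v.insert_r(2, "x"); v.insert_s("x", 9)
v.delete_r(2, "x")
print(list(v.enumerate()))  # [(1, 'x', 9)]
```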
104

Multiple sequence analysis in the presence of alignment uncertainty

Herman, Joseph L. January 2014 (has links)
Sequence alignment is one of the most intensely studied problems in bioinformatics, and is an important step in a wide range of analyses. An issue that has gained much attention in recent years is the fact that downstream analyses are often highly sensitive to the specific choice of alignment. One way to address this is to jointly sample alignments along with other parameters of interest. In order to extend the range of applicability of this approach, the first chapter of this thesis introduces a probabilistic evolutionary model for protein structures on a phylogenetic tree; since protein structures typically diverge much more slowly than sequences, this allows for more reliable detection of remote homologies, improving the accuracy of the resulting alignments and trees, and reducing sensitivity of the results to the choice of dataset. In order to carry out inference under such a model, a number of new Markov chain Monte Carlo approaches are developed, allowing for more efficient convergence and mixing on the high-dimensional parameter space. The second part of the thesis presents a directed acyclic graph (DAG)-based approach for representing a collection of sampled alignments. This DAG representation allows the initial collection of samples to be used to generate a larger set of alignments under the same approximate distribution, enabling posterior alignment probabilities to be estimated reliably from a reasonable number of samples. If desired, summary alignments can then be generated as maximum-weight paths through the DAG, under various types of loss or scoring functions. The acyclic nature of the graph also permits various other types of algorithms to be easily adapted to operate on the entire set of alignments in the DAG. In the final part of this work, methodology is introduced for alignment-DAG-based sequence annotation using hidden Markov models, and RNA secondary structure prediction using stochastic context-free grammars. Results on test datasets indicate that the additional information contained within the DAG allows for improved predictions, resulting in substantial gains over simply analysing a set of alignments one by one.
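Extracting a summary alignment from the DAG as a maximum-weight path reduces to standard dynamic programming over a topological structure. A hedged sketch, with a hypothetical interface (in the thesis the nodes hold alignment columns and the weights come from estimated posterior probabilities):

```python
def max_weight_path(succ, weight, source, sink):
    """Highest-scoring source->sink path in a DAG via memoized dynamic
    programming.  `succ` maps a node to its successors and `weight` maps
    (u, v) edges to scores; both names are illustrative stand-ins.
    Assumes the graph is acyclic and sink is reachable from source."""
    best = {sink: (0, None)}            # node -> (best score to sink, next hop)

    def solve(u):
        if u not in best:
            best[u] = max(((weight[u, v] + solve(v)[0], v) for v in succ[u]),
                          key=lambda t: t[0],
                          default=(float("-inf"), None))
        return best[u]

    score = solve(source)[0]
    path, u = [source], source
    while u != sink:                    # follow the stored next hops
        u = best[u][1]
        path.append(u)
    return score, path

succ = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
w = {("s", "a"): 9, ("s", "b"): 4, ("a", "t"): 7, ("b", "t"): 9}
print(max_weight_path(succ, w, "s", "t"))  # (16, ['s', 'a', 't'])
```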
105

Real-time Business Intelligence through Compact and Efficient Query Processing Under Updates

Idris, Muhammad 05 March 2019 (has links) (PDF)
Responsive analytics are rapidly taking over the traditional data analytics dominated by post-fact approaches in traditional data warehousing. Recent advancements in analytics demand placing analytical engines at the forefront of the system to react to updates occurring at high speed and detect patterns, trends, and anomalies. These kinds of solutions find applications in financial systems, industrial control systems, business intelligence and online machine learning, among others. These applications are usually associated with Big Data and require the ability to react to constantly changing data in order to obtain timely insights and take proactive measures. Generally, these systems specify the analytical results or their basic elements in a query language, where the main task then is to maintain query results efficiently under frequent updates. The task of reacting to updates and analyzing changing data has been addressed in two ways in the literature: traditional business intelligence (BI) solutions focus on historical data analysis where the data is refreshed periodically and in batches, and stream processing solutions process streams of data from transient sources as flows of data items. Both kinds of systems share the niche of reacting to updates (known as dynamic evaluation); however, they differ in architecture, query languages, and processing mechanisms. In this thesis, we investigate the possibility of a reactive and unified framework to model queries that appear in both kinds of systems. In traditional BI solutions, evaluating queries under updates has been studied under the umbrella of incremental evaluation of queries, which is based on the relational incremental view maintenance model and mostly focuses on queries that feature equi-joins. Streaming systems, in contrast, generally follow automaton-based models to evaluate queries under updates, and they generally process queries that mostly feature comparisons of temporal attributes (e.g. timestamp attributes) along with comparisons of non-temporal attributes over streams of bounded sizes. Temporal comparisons constitute inequality constraints, while non-temporal comparisons can be either equality or inequality constraints. Hence these systems mostly process inequality joins. As a starting point for our research, we postulate the thesis that queries in streaming systems can also be evaluated efficiently based on the paradigm of incremental evaluation, just like in BI systems, in a main-memory model. The efficiency of such a model is measured in terms of runtime memory footprint and update processing cost. To this end, the existing approaches to dynamic evaluation in both kinds of systems present a trade-off between memory footprint and update processing cost. More specifically, systems that avoid materialization of query (sub)results incur high update latency, and systems that materialize (sub)results incur a high memory footprint. We are interested in investigating the possibility of building a model that can address this trade-off. In particular, we overcome this trade-off by investigating the possibility of a practical dynamic evaluation algorithm for queries that appear in both kinds of systems, and we present a main-memory data representation that allows query (sub)results to be enumerated without materialization and that can be maintained efficiently under updates.
We call this representation the Dynamic Constant Delay Linear Representation (DCLR). We devise DCLRs with the following properties: 1) they allow, without materialization, enumeration of query results with bounded delay (and with constant delay for a sub-class of queries); 2) they allow tuple lookup in query results with logarithmic delay (and with constant delay for conjunctive queries with equi-joins only); 3) they take space linear in the size of the database; 4) they can be maintained efficiently under updates. We first study DCLRs with the above-described properties for the class of acyclic conjunctive queries featuring equi-joins with projections, and we present the dynamic evaluation algorithm called the Dynamic Yannakakis (DYN) algorithm. Then, we present the generalization of the DYN algorithm to the class of acyclic queries featuring multi-way θ-joins with projections and call it Generalized DYN (GDYN). We devise DCLRs with the above properties for acyclic conjunctive queries, and the working of DYN and GDYN over DCLRs is based on a particular variant of join trees, called Generalized Join Trees (GJTs), that guarantees the above-described properties of DCLRs. We define GJTs and present algorithms to test a conjunctive query featuring θ-joins for acyclicity and to generate GJTs for such queries. We extend the classical GYO algorithm from testing a conjunctive query with equalities for acyclicity to testing a conjunctive query featuring multi-way θ-joins with projections for acyclicity. We further extend the GYO algorithm to generate GJTs for queries that are acyclic. GDYN is hence a unified framework based on DCLRs that enables processing of queries that appear in streaming systems as well as in BI systems in a unified main-memory model and addresses the space-time trade-off. We instantiate GDYN to the particular case where all θ-joins involve only equalities and inequalities and call this instantiation IEDYN. We implement DYN and IEDYN as query compilers that generate executable programs in the Scala programming language and provide all the necessary data structures, together with their maintenance and enumeration methods, in a continuous stream processing model. We evaluate DYN and IEDYN against state-of-the-art BI and streaming systems on both industrial and synthetically generated benchmarks. We show that DYN and IEDYN outperform the existing systems by over an order of magnitude in both memory footprint and update processing time.
106

Finite element modeling of electromagnetic radiation and induced heat transfer in the human body

Kim, Kyungjoo 24 September 2013 (has links)
This dissertation develops adaptive hp-Finite Element (FE) technology and a parallel sparse direct solver enabling the accurate modeling of the absorption of Electro-Magnetic (EM) energy in the human head. With a large and growing number of cell phone users, the adverse health effects of EM fields have raised public concerns. Most research that attempts to explain the relationship between exposure to EM fields and its harmful effects on the human body identifies temperature changes due to the EM energy as the dominant source of possible harm. The research presented here focuses on determining the temperature distribution within the human body exposed to EM fields, with an emphasis on the human head. The major challenge in accurately determining the temperature changes lies in the dependence of EM material properties on the temperature. This leads to a formulation that couples the BioHeat Transfer (BHT) and Maxwell equations. The mathematical model is formed by the time-harmonic Maxwell equations weakly coupled with the transient BHT equation. This choice of equations reflects the relevant time scales. With a mobile device operating at a single frequency, EM fields arrive at a steady state in the micro-second range. The heat sources induced by EM fields produce a transient temperature field converging to a steady-state distribution on a time scale ranging from seconds to minutes; this necessitates the transient formulation. Since the EM material properties depend upon the temperature, the equations are fully coupled; however, the coupling is realized weakly due to the different time scales for the Maxwell and BHT equations. The BHT equation is discretized in time with a time step reflecting the thermal scales. After multiple time steps, the temperature field is used to determine the EM material properties and the time-harmonic Maxwell equations are solved. The resulting heat sources are recalculated and the process continued. Due to the weak coupling of the problems, the corresponding numerical models are established separately. The BHT equation is discretized with H¹ conforming elements, and the Maxwell equations are discretized with H(curl) conforming elements. The complexity of the human head geometry naturally leads to the use of tetrahedral elements, which are commonly employed by unstructured mesh generators. The EM domain, including the head and a radiating source, is terminated by a Perfectly Matched Layer (PML), which is discretized with prismatic elements. The use of high-order elements of different shapes and discretization types has motivated the development of a general 3D hp-FE code. In this work, we present new generic data structures and algorithms to perform adaptive local refinements on a hybrid mesh composed of elements of different shapes. A variety of isotropic and anisotropic refinements that preserve conformity of the discretization are designed. The refinement algorithms support one-irregular meshes with the constrained approximation technique. The algorithms are experimentally proven to be deadlock-free. A second contribution of this dissertation lies in a new parallel sparse direct solver that targets linear systems arising from hp-FE methods. The new solver interfaces with the hierarchy of a locally refined mesh to build an elimination ordering for the factorization that reflects the h-refinements. By following mesh refinements, not only the computation of element matrices but also their factorization is restricted to new elements and their ancestors.
The solver is parallelized by exploiting two-level task parallelism: tasks are first generated from a parallel post-order traversal of the assembly tree; next, those tasks are further refined by using algorithms-by-blocks to gain fine-grained parallelism. The resulting fine-grained tasks are executed asynchronously after their dependencies are analyzed. This approach effectively reduces scheduling overhead and increases the flexibility to handle irregular tasks. The solver outperforms the conventional general sparse direct solver for a class of problems formulated by high-order FEs. Finally, numerical results for the 3D BHT equation coupled with the Maxwell equations are presented. The solutions of this Maxwell code have been verified against analytic Mie series solutions. Starting with simple spherical geometry, parametric studies are conducted on realistic head models for a typical frequency band (900 MHz) of mobile phones.
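The weak-coupling iteration the abstract describes (march the transient BHT equation in time; refresh the steady, time-harmonic EM solve only on the thermal timescale) can be sketched schematically. The two `solve_*` stand-ins below are hypothetical lumped (0-D) placeholders for the FE solvers, with made-up constants, included only to show the control flow:

```python
T_AMB = 37.0  # ambient/arterial temperature, degrees C

def solve_maxwell(T):
    # hypothetical stand-in for the time-harmonic Maxwell solve: an
    # effective conductivity that grows mildly with temperature gives a
    # SAR-like heat source q = sigma(T) * |E|^2 (|E|^2 fixed at 1 here)
    sigma = 0.8 * (1.0 + 0.02 * (T - T_AMB))
    return sigma * 1.0

def step_bht(T, q, dt, k=0.5):
    # lumped transient bioheat step: perfusion/conduction relax T toward
    # ambient while the EM heat source q drives it up
    return T + dt * (q - k * (T - T_AMB))

def coupled_solve(T0=37.0, dt=0.1, n_steps=200, recouple_every=20):
    T, q = T0, solve_maxwell(T0)
    for step in range(1, n_steps + 1):
        T = step_bht(T, q, dt)
        if step % recouple_every == 0:   # weak coupling: refresh the EM
            q = solve_maxwell(T)         # solve on the thermal timescale
    return T

print(f"quasi-steady temperature: {coupled_solve():.2f} C")  # ~38.65 C
```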
107

Average case analysis of algorithms for the maximum subarray problem

Bashar, Mohammad Ehsanul January 2007 (has links)
The Maximum Subarray Problem (MSP) is to find the consecutive array portion that maximizes the sum of the array elements in it. The goal is to locate the most useful and informative array segment that associates two parameters involved in the data in a 2D array. It is an efficient data mining method which gives an accurate pattern or trend of the data with respect to some associated parameters. Distance Matrix Multiplication (DMM) is at the core of MSP, and DMM and MSP have worst-case complexities of the same order, so improving the algorithm for DMM also improves MSP. The complexity of conventional DMM is O(n³). In the average case, the All Pairs Shortest Path (APSP) problem can be adapted as a fast engine for DMM and can be solved in O(n² log n) expected time. Using this result, MSP can be solved in O(n² log² n) expected time. MSP can be extended to K-MSP. To incorporate DMM into K-MSP, DMM needs to be extended to K-DMM as well. In this research we show how DMM can be extended to K-DMM using the K-Tuple Approach to solve K-MSP in O(Kn² log² n log K) time when K ≤ n/log n. We also present the Tournament Approach, which solves K-MSP in O(n² log² n + Kn²) time and outperforms the K-Tuple Approach.
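For reference, the problem itself is easy to state in code. The conventional cubic solution (the baseline that the abstract's DMM-based average-case methods improve on) fixes a pair of rows, collapses the strip between them into per-column sums, and runs Kadane's 1D algorithm on the result:

```python
def max_subarray_2d(a):
    """Maximum-sum rectangular subarray of a 2D array, O(rows^2 * cols)."""
    rows, cols = len(a), len(a[0])
    best = float("-inf")
    for top in range(rows):
        col_sums = [0] * cols
        for bottom in range(top, rows):
            for j in range(cols):
                col_sums[j] += a[bottom][j]   # extend the strip downwards
            cur = 0
            for s in col_sums:                # Kadane on the collapsed strip
                cur = max(s, cur + s)
                best = max(best, cur)
    return best

print(max_subarray_2d([[1, -2, 3],
                       [-1, 4, -5],
                       [2, -1, 2]]))  # 4 (the single cell holding 4)
```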
108

Dynamic Programming Algorithms for Semantic Dependency Parsing

Axelsson, Nils January 2017 (has links)
Dependency parsing can be a useful tool for allowing computers to parse text. In 2015, Kuhlmann and Jonsson proposed a logical deduction system that parses to non-crossing dependency graphs with an asymptotic time complexity of O(n³), where n is the length of the sentence to parse. This thesis extends the deduction system by Kuhlmann and Jonsson; the extended deduction system introduces certain crossing edges, while maintaining an asymptotic time complexity of O(n⁴). In order to extend the deduction system by Kuhlmann and Jonsson, fifteen logical item types are added to the five proposed by Kuhlmann and Jonsson. These item types allow the deduction system to introduce crossing edges while guaranteeing acyclicity. The number of inference rules in the deduction system is increased from the 19 proposed by Kuhlmann and Jonsson to 172, mainly because of the larger number of combinations of the 20 item types. The results are a modest increase in coverage on test data (by roughly 10 percentage points, i.e. from approx. 70% to 80%), and a placement comparable to that of Kuhlmann and Jonsson under the SemEval 2015 Task 18 metrics. With the method employed to introduce crossing edges, derivational uniqueness is impossible to maintain. The graph class to which the extended algorithm, QAC, parses is hard to define; it is therefore compared empirically to 1-endpoint-crossing graphs and graphs with a page number of two or less, against both of which it achieves lower coverage on test data. The QAC graph class is not limited by page number or number of crossings. The takeaway of the thesis is that extending a very minimal deduction system is not necessarily the best approach, and that it may be better to start off with a strong idea of the graph class to which the extended algorithm should parse. Additionally, several alternative ways of extending Kuhlmann and Jonsson are proposed.
109

Designing a Novel RPL Objective Function & Testing RPL Objective Functions Performance

Mardini, Khalil, Abdulsamad, Emad January 2023 (has links)
The use of Internet of Things (IoT) systems has increased to meet the need for smart systems in various fields, such as smart homes, intelligent industries, medical systems, agriculture, and the military. IoT networks are expanding daily to include hundreds or thousands of IoT devices, which transmit information through other linked devices to reach the network sink or gateway. The information follows different routes to the network sink. Finding an ideal routing solution is a big challenge due to several factors, such as the power, computation, storage, and memory limitations of IoT devices. In 2011, RPL, a new standardized routing protocol for low-power and lossy networks, was released by the Internet Engineering Task Force (IETF). The IETF adopted a distance-vector routing algorithm for the RPL protocol. The RPL protocol utilizes objective functions (OFs) to select paths depending on different metrics. These OFs with different metrics must be evaluated and tested to develop the best routing solution. This project aims to test the performance of the standardized RPL objective functions in a simulation environment. Afterwards, a new objective function with a new metric will be implemented and tested under the same environmental conditions. The performance results of the standard objective functions and the newly implemented objective function will be analyzed and compared to evaluate whether the standard objective functions or the new objective function is better as a routing solution for the IoT device network.
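At its core, an RPL objective function turns per-parent metrics into a path cost and picks the cheapest parent. A toy Python sketch in the general style of the standardized MRHOF (ETX-based) objective function — the field names, scale factor, and tuple interface are illustrative assumptions, not the RFC encoding:

```python
def select_parent(candidates, min_hop_rank_increase=256):
    """Toy RPL objective function: each candidate parent advertises a Rank,
    each link has an ETX estimate, and the node picks the parent that
    minimises advertised_rank + link cost.
    `candidates`: list of (parent_id, advertised_rank, link_etx)."""
    def path_cost(c):
        _, rank, etx = c
        # link cost: scaled ETX, floored by the minimum rank increase
        return rank + max(int(etx * 128), min_hop_rank_increase)
    best = min(candidates, key=path_cost)
    return best[0], path_cost(best)

parents = [("A", 256, 1.2), ("B", 512, 1.0), ("C", 256, 2.5)]
print(select_parent(parents))  # ('A', 512): cheaper than B (768) and C (576)
```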
110

Structure learning of Bayesian networks via data perturbation

Gross, Tadeu Junior 29 November 2018 (has links)
Structure learning of Bayesian Networks (BNs) is an NP-hard problem, and the use of sub-optimal strategies is essential in domains involving many variables. One such strategy is to generate multiple approximate structures and then reduce the ensemble to a representative structure. It is possible to use the occurrence frequency (over the ensemble of structures) as the criterion for accepting a dominant directed edge between two nodes and thus obtaining the single structure. In this doctoral research, an analogy was made with an adapted one-dimensional random walk to analytically deduce an appropriate decision threshold for this occurrence frequency. The obtained closed-form expression has been validated across benchmark datasets, applying the Matthews Correlation Coefficient as the performance metric. In the experiments using a recent medical dataset, the BN resulting from the analytical cutoff frequency captured the expected associations among nodes and also achieved better prediction performance than the BNs learned with thresholds neighbouring the computed one. In the literature, the feature counted across the perturbed structures has been undirected edges, not directed edges (arcs) as in this thesis. This modified strategy was then applied to a dataset of elderly subjects to identify potential relationships between variables of medical interest, but using a threshold higher than the one predicted by the proposed formula; such prudence is due to the possible social implications of the findings. The motivation behind this application is that, although the proportion of elderly individuals in the population has increased substantially in the last few decades, the risk factors that should be managed in advance to ensure a natural process of mental decline due to ageing remain unknown. In the learned structural model, the probabilistic dependence mechanism between two variables of medical interest was investigated graphically: the suspected risk factor known as Metabolic Syndrome and the indicator of mental decline referred to as Cognitive Impairment. In this investigation, the concept known in the context of BNs as D-separation was employed. The results of this study revealed that the dependence between Metabolic Syndrome and the cognitive variables indeed exists and depends on both Body Mass Index and age.
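The ensemble-reduction step described above is simple to state in code: count how often each arc occurs across the sampled structures and keep those clearing the decision threshold. A minimal sketch (here `threshold` is just a parameter; the thesis derives it analytically from the random-walk argument):

```python
from collections import Counter

def consensus_arcs(structures, threshold):
    """Reduce an ensemble of sampled BN structures to a representative one
    by keeping each directed edge (arc) whose occurrence frequency over the
    ensemble reaches the decision threshold.
    `structures`: iterable of arc sets, each arc a (parent, child) pair."""
    structures = list(structures)
    counts = Counter(arc for s in structures for arc in s)
    n = len(structures)
    return {arc for arc, c in counts.items() if c / n >= threshold}

samples = [{("A", "B"), ("B", "C")},
           {("A", "B"), ("C", "B")},
           {("A", "B"), ("B", "C")}]
print(sorted(consensus_arcs(samples, threshold=0.6)))
# [('A', 'B'), ('B', 'C')] -- ('C', 'B') appears in only 1/3 of the samples
```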
