  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

A security architecture for protecting dynamic components of mobile agents

Yao, Ming January 2004
New techniques, languages and paradigms have facilitated the creation of distributed applications in several areas. Perhaps the most promising paradigm is the one that incorporates the mobile agent concept. A mobile agent in a large-scale network can be viewed as a software program that travels through a heterogeneous network, crossing various security domains and executing autonomously at its destination. Mobile agent technology extends the traditional network communication model by including mobile processes, which can autonomously migrate to new remote servers. This basic idea results in numerous benefits, including flexible, dynamic customisation of the behavior of clients and servers and robust interaction over unreliable networks. In spite of its advantages, widespread adoption of the mobile agent paradigm is being delayed by various security concerns. Currently available mechanisms for reducing the security risks of this technology do not efficiently cover all the existing threats. Due to the characteristics of the mobile agent paradigm and the threats to which it is exposed, security mechanisms must be designed to protect both agent-hosting servers and agents. Protecting agent-hosting servers is a reasonably well-researched issue, and many viable mechanisms have been developed to address it. Protecting agents is technically more challenging, and solutions for doing so are far less developed. The primary added complication is that, as an agent traverses multiple servers that are trusted to different degrees, the agent's owner has no control over the behavior of the agent-hosting servers. Consequently, the hosting servers can subvert the computation of the passing agent. 
Since it is infeasible to force remote servers to enact a security policy that would prevent them from corrupting an agent's data, cryptographic mechanisms defined by the agent's owner may be one of the feasible solutions for protecting that data. Hence the focus of this thesis is the development and deployment of cryptographic mechanisms for securing mobile agents in an open environment. Firstly, requirements for securing mobile agents' data are presented. For a sound mobile agent application, the integrity of the data that an agent collects from each visited server must be guaranteed. In some applications where servers intend to remain anonymous and will reveal their identities only under certain circumstances, privacy is also required. Aimed at these properties, four new schemes are designed to achieve different security levels: two schemes are directed at preserving the integrity of the agent's data, and the other two focus on attaining data privacy. Four new security techniques are designed to support these schemes. The first is joint keys, which discourage two servers from colluding to forge a victim server's signature. The second is recoverable key commitment, which enables detection of any illegal operation by hosting servers on an agent's data. The third is conditionally anonymous digital signature schemes, utilising anonymous public-key certificates, which allow any server to digitally sign a document without leaking its identity. The fourth is servers' pseudonyms, analogues of identities that enable servers to be recognised as legitimate while their identities remain unknown to anyone. Pseudonyms can be deanonymised with the assistance of authorities. Apart from these new techniques, other mechanisms such as hash chaining relationships and a mandatory verification process are adopted in the new schemes. 
To enable the interoperability of these mechanisms, a security architecture is developed that integrates compatible techniques to provide a generic solution for securing an agent's data. The architecture can be used independently of the particular mobile agent application under consideration. It can guide and support developers in analysing security issues during the design and implementation of services and applications based on mobile agent technology.
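The abstract names a hash chaining relationship among the mechanisms used to protect data collected from each server. The following is a minimal, generic sketch of that idea, not the thesis's actual scheme: the names `link`, `append`, `verify`, and `GENESIS` are hypothetical, and a plain hash chain like this only detects tampering if the final digest is itself protected (for example, signed by the last server), which this sketch does not do.

```python
import hashlib

# Hypothetical starting value for the chain (an assumption of this sketch).
GENESIS = b"\x00" * 32


def link(prev_digest: bytes, server_id: str, payload: bytes) -> bytes:
    """Return a digest binding this entry to everything collected before it."""
    sid = server_id.encode("utf-8")
    h = hashlib.sha256()
    h.update(prev_digest)
    # Length-prefix the server id so (id, payload) boundaries are unambiguous.
    h.update(len(sid).to_bytes(4, "big"))
    h.update(sid)
    h.update(payload)
    return h.digest()


def append(chain, server_id: str, payload: bytes):
    """Return a new chain with one more (server_id, payload, digest) entry."""
    prev = chain[-1][2] if chain else GENESIS
    return chain + [(server_id, payload, link(prev, server_id, payload))]


def verify(chain) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for server_id, payload, digest in chain:
        if link(prev, server_id, payload) != digest:
            return False
        prev = digest
    return True
```

In use, each visited server would call `append` with its own contribution, and the agent's owner would call `verify` on return. Because every digest depends on the previous one, altering or dropping an earlier entry invalidates all later links.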
42

Making sense of traditional Chinese medicine: a cognitive semantic approach

Altman, Magda Elizabeth 30 June 2004
Cognitive linguists posit that language as a system of meaning is closely related to cognition and to the associated perceptual and physiological structures of the body. From the cognitive semantic viewpoint, cognitive processes underpin and motivate linguistic phenomena such as categorisation, polysemy, metaphor, metonymy and image schemas. The pedagogical implication of the cognitive semantic perspective is that understanding these cognitive motivations facilitates language learning. This dissertation uses an applied cognitive semantic approach to 'make sense' of a traditional knowledge system, Traditional Chinese Medicine (TCM). TCM views human physiology as a holistic and dynamic system that exemplifies the same principles as the cosmos-environment. TCM models result in a categorisation of physiological phenomena based on a complex system of experiential and cosmological correspondences. I suggest that the holistic epistemology of cognitive linguistics is well suited to an understanding of these holistic models. From a pedagogical viewpoint, I argue that an analysis of the cognitive motivations which underpin TCM categorisations and the polysemy of some key TCM terms can help the student make sense of TCM as a meaningful system of thought and practice. Both the theoretical and applied approaches explored in this dissertation should have relevance to other traditional knowledge systems, particularly traditional medical systems. / Linguistics and Modern Languages / M.A. (Linguistics)
43

Scalable cost-efficient placement and chaining of virtual network functions / Posicionamento e encadeamento escalável e baixo custo de funções virtualizados de rede

Luizelli, Marcelo Caggiani January 2017
Network Function Virtualization (NFV) is a new architectural concept that is reshaping the operation of network functions (e.g., firewalls, gateways, and proxies). The core idea of NFV is to decouple network function logic from specialized hardware devices, allowing software images to run on commercial off-the-shelf (COTS) hardware. NFV has the potential to make the operation of network functions more flexible and economical, which is essential in environments where the number of deployed functions can easily reach the order of hundreds. Despite intense research activity in the area, the problem of placing and chaining virtual network functions (VNFs) in a scalable and low-cost manner still has a number of limitations. More specifically, existing strategies in the literature neglect the chaining aspect of VNFs (i.e., they focus mainly on placement), do not scale to the size of NFV infrastructures (i.e., thousands of nodes with computing capacity) and, finally, base the quality of the solutions obtained on unrepresentative operational costs. This thesis addresses virtual network function placement and chaining (VNFPC) as an optimization problem in the intra- and inter-datacenter context. First, the VNFPC problem is formalized and an Integer Linear Programming (ILP) model is proposed to solve it. The objective is to minimize resource allocation while meeting network flow requirements and constraints. Second, the scalability of the VNFPC problem is addressed in order to solve large instances (i.e., thousands of NFV nodes). A fix-and-optimize-based heuristic algorithm is proposed that incorporates the Variable Neighborhood Search (VNS) meta-heuristic to explore the VNFPC solution space efficiently. Third, the performance limitations and operational costs of typical provisioning strategies in real NFV environments are evaluated. Based on the empirical results collected, an analytical model is proposed that estimates with high precision the operational costs for arbitrary VNF requirements. Fourth, a mechanism is developed for deploying VNF chains in the intra-datacenter context. The proposed algorithm (OCM, Operational Cost Minimization) is based on an extension of the well-known reduction from the weighted perfect matching problem to the min-cost flow problem and considers the performance of the VNFs (e.g., CPU requirements) as well as the estimated operational costs. The results show that the proposed ILP model for the VNFPC problem reduces end-to-end delays by up to 25% (compared with the chainings observed in traditional infrastructures) with an acceptable resource over-provisioning, limited to 4%. In addition, the results show that the proposed heuristic (based on fix-and-optimize) is able to find high-quality feasible solutions efficiently, even in scenarios with thousands of VNFs. Furthermore, a better understanding is provided of network performance metrics (e.g., throughput, CPU consumption, and packet processing capacity) for the typical VNF deployment strategies adopted in NFV infrastructures. Finally, the algorithm proposed for the intra-datacenter context (i.e., OCM) significantly reduces operational costs when compared with the typical placement mechanisms … /
Network Function Virtualization (NFV) is a novel concept that is reshaping the middlebox arena, shifting network functions (e.g. firewall, gateways, proxies) from specialized hardware appliances to software images running on commodity hardware. This concept has the potential to make network function provision and operation more flexible and cost-effective, which is paramount in a world where deployed middleboxes may easily reach the order of hundreds. Despite recent research activity in the field, little has been done towards scalable and cost-efficient placement and chaining of virtual network functions (VNFs), a key feature for the effective success of NFV. More specifically, existing strategies have neglected the chaining aspect of NFV (focusing on efficient placement only), failed to scale to hundreds of network functions, and relied on unrealistic operational costs. In this thesis, we approach VNF placement and chaining as an optimization problem in the inter- and intra-datacenter context. First, we formalize the Virtual Network Function Placement and Chaining (VNFPC) problem and propose an Integer Linear Programming (ILP) model to solve it. The goal is to minimize required resource allocation while meeting network flow requirements and constraints. Then, we address the scalability of the VNFPC problem to solve large instances (i.e., thousands of NFV nodes) by proposing a fix-and-optimize-based heuristic algorithm for tackling it. Our algorithm incorporates a Variable Neighborhood Search (VNS) meta-heuristic for efficiently exploring the placement and chaining solution space. Further, we assess the performance limitations of typical NFV-based deployments and the incurred operational costs of commodity servers, and propose an analytical model that accurately predicts the operational costs for arbitrary service chain requirements. Then, we develop a general service chain intra-datacenter deployment mechanism (named OCM, Operational Cost Minimization) that considers both the actual performance of the service chains (e.g., CPU requirements) and the incurred operational cost. Our novel algorithm is based on an extension of the well-known reduction from weighted matching to the min-cost flow problem. Finally, we tackle the problem of monitoring service chains in NFV-based environments. For that, we introduce the DNM (Distributed Network Monitoring) problem and propose an optimization model to solve it. DNM allows service chain segments to be monitored independently, which allows specialized network monitoring requirements to be met in an efficient and coordinated way. Results show that the proposed ILP model for the VNFPC problem leads to a reduction of up to 25% in end-to-end delays (in comparison to chainings observed in traditional infrastructures) and an acceptable resource over-provisioning limited to 4%. Also, we provide strong evidence that our fix-and-optimize-based heuristic is able to find feasible, high-quality solutions efficiently, even in scenarios scaling to thousands of VNFs. Further, we provide in-depth insights into network performance metrics (such as throughput, CPU utilization and packet processing) and their current limitations while considering typical deployment strategies. Our OCM algorithm significantly reduces operational costs when compared to the de facto standard placement mechanisms used in Cloud systems. Last, our DNM model allows finer-grained network monitoring with limited overheads. By coordinating the placement of monitoring sinks and the forwarding of network monitoring traffic, DNM can reduce the number of monitoring sinks and the network resource consumption (54% lower than a traditional method).
44

Zážitek z četby / Enjoyment from reading

Přibylová, Kateřina January 2018
This master's thesis focused on enjoyment from reading. Its aim was to analyse, and then characterise, where enjoyment from reading lies, which factors affect it, and which elements of a title contribute to it for different respondents, based on a qualitative survey. The first part of the thesis was devoted to a theoretical treatise on enjoyment in general and enjoyment of art. It focused on the conception of reader and author, and on reading and the literary work from the point of view of literary theory. The practical part first introduced and commented on the set of questions used as a tool for studying enjoyment, together with the methodology of qualitative research. Next, my own answers to the questions on immediate enjoyment and perception were presented, along with the answers expected from respondents, followed by an elaboration of the respondents' reader profiles and their enjoyment from reading. The analyses of the answers relating to respondents' enjoyment were included in the annexes. Keywords: Enjoyment, enjoyment chaining, reader, reading, readership, reader's biography, self-reflection, analysis, author, literary work, literature
45

Analysis and Modelling of Activity-Travel Behaviour of Non-Workers from an Indian City

Manoj, M January 2015
Indian cities have been witnessing rapid transformation due to the synergistic effect of industrialisation, a flourishing economy, motorisation, population explosion, and migration. The alarming increase in travel demand as an after-effect of this transformation, together with the scarcity of transport infrastructure, has exacerbated urban transport issues such as congestion, pollution, and inequity. Due to the escalating cost of transport infrastructure and the scarcity of resources such as space, there has been increasing interest in promoting sustainable transportation policy measures for the optimum use of existing resources. Such policy measures mostly target the activity-travel behaviour of individuals to bring about desired changes in the transport sector. However, the responses of individuals to most of the measures are complex or unknown. The current ‘commute trip-based’ aggregate travel demand analysis strategy followed in most Indian cities is inadequate for providing basic inputs to understand the activity-travel behaviour of individuals under such policy interventions. Furthermore, the current analysis strategy also ignores the activity-travel behaviour of non-workers (who include homemakers, unemployed, and retired individuals), whose inclusion in transportation planning is relevant when the proposed policies are mostly ‘citizen-centric’. Analysis of the activity-travel behaviour of non-workers provides important inputs to transportation planning, as their activity-travel behaviour and responses to transportation policies differ from those of workers. However, case studies exploring the activity-travel behaviour of non-workers in Indian cities are very limited. Appraising the practical importance of this subject, the current research undertakes a comprehensive analysis of the activity-travel behaviour of non-workers in a developing country’s context. 
To fulfil this goal, a series of empirical analyses is conducted on primary weekday activity-travel survey data collected from Bangalore city. The analysis provides insightful findings and interpretations consistent with a developing country’s perspective. The day-planner format of time use diary, which was observed to perform satisfactorily in developed countries, apparently performs worse in a developing country’s context. Further, the face-to-face method of survey administration is observed to have higher operating and economic efficiencies than the drop-off and pick-up method. The comprehensive analysis of the activity-travel behaviour of non-workers indicates that, compared with their counterparts in the developed world (e.g. the U.S.), non-workers in Bangalore city have a lower activity participation level (in terms of time allocation and number of stops), higher dependency on walking, lower trip chaining tendency, and a distinct time-of-day preference for departing to activity locations. On the other hand, the analysis shows similarities (mode use and trip chaining) and differences (time allocation and departure time choice) with the findings of case studies from the developing world (e.g. China). The activity-travel behaviour of non-workers belonging to low-income households is characterised by a lower activity participation level, higher dependency on sustainable transport modes, and lower trip chaining propensity, compared to the other two income groups (middle and high-income groups). The research also suggests that built environment measures have their highest impacts on non-workers’ travel decisions related to shopping. Finally, the joint analysis of activity participation and travel behaviour of non-workers indicates that in-home maintenance activity duration drives the time allocation and travel behaviour of non-workers, and that non-workers trade in-home discretionary activity duration for travel time. 
The joint analysis also shows that the time spent on children’s and elders’ activity is an important time allocation of its own. Keywords: Activity-travel behaviour, Non-worker, Time Use, Income Groups, India
47

Interconnection Optimization for Dataflow Architectures

Moser, Nico, Gremzow, Carsten, Menge, Matthias 08 June 2007
In this paper we present a dataflow processor architecture based on [1], which is driven by control-flow-generated tokens. We show the special properties of this architecture with regard to scalability, extensibility, and parallelism. In this context we outline the application scope and compare our approach with related work. Advantages and disadvantages are discussed, and we suggest ways to address the disadvantages. Finally, an example implementation of this architecture is given and we look at further developments. We believe the features of this basic approach predestine the architecture especially for embedded systems and systems-on-chip.
48

Influence of Soil Water Repellency on Post-fire Revegetation Success and Management Techniques to Improve Establishment of Desired Species

Madsen, Matthew D. 17 December 2009
The influence of soil water repellency (WR) on vegetation recovery after a fire is poorly understood. This dissertation presents strategies to broaden opportunities for enhanced post-fire rangeland restoration and monitoring of burned piñon and juniper (P-J) woodlands by: 1) mapping the extent and severity of critical and subcritical WR, 2) determining the influence of WR on soil ecohydrologic properties and revegetation success, and 3) evaluating the suitability of a wetting agent composed of alkylpolyglycoside-ethylene oxide/propylene oxide block copolymers as a post-fire restoration tool for ameliorating the effects of soil WR and increasing seedling establishment. Results indicate that:
• Post-fire patterns of soil WR were highly correlated with pre-fire P-J woodland canopy structure. Critical soil WR levels occurred under burned tree canopies, while subcritical WR extended out to approximately two times the canopy radius. At sites where critical soil WR was present, infiltration rate, soil moisture, and vegetation cover were significantly lower than at non-hydrophobic sites. These parameters were also reduced in soils with subcritical WR relative to non-hydrophobic soils (albeit to a lesser extent). Aerial photography coupled with feature extraction software and geographic information systems (GIS) proved to be an effective tool for mapping P-J cover and density, and for scaling up field surveys of soil WR to the fire boundary scale.
• Soil WR impairs seed germination and seedling establishment by decreasing soil moisture availability through reduced infiltration, decreased soil moisture storage capacity, and disconnection of soil surface layers from underlying moisture reserves. Consequently, soil WR appears to be acting as a temporal ecological threshold by impairing establishment of desired species within the first few years after a fire.
• Wetting agents can significantly improve ecohydrologic properties required for plant growth by overcoming soil WR, thus increasing the amount and duration of available water for seed germination and seedling establishment. Success of this technology appears to be the result of the wetting agent increasing soil moisture amount and availability by 1) improving soil infiltration and water holding capacity; and 2) allowing seedling roots to connect to underlying soil moisture reserves.
49

Bridging Language & Data : Optimizing Text-to-SQL Generation in Large Language Models / Från ord till SQL : Optimering av text-till-SQL-generering i stora språkmodeller

Wretblad, Niklas, Gordh Riseby, Fredrik January 2024
Text-to-SQL, which involves translating natural language into Structured Query Language (SQL), is crucial for enabling broad access to structured databases without expert knowledge. However, designing models for such tasks is challenging due to numerous factors, including the presence of ‘noise’, such as ambiguous questions and syntactical errors. This thesis provides an in-depth analysis of the distribution and types of noise in the widely used BIRD-Bench benchmark and of the impact of noise on models. While BIRD-Bench was created to model dirty and noisy database values, it was not created to contain noise and errors in the questions and gold queries. After a manual evaluation, we found that noise in questions and gold queries is highly prevalent in the financial domain of the dataset, and a further analysis of the other domains indicates the presence of noise in other parts as well. The presence of incorrect gold SQL queries, which then generate incorrect gold answers, has a significant impact on the benchmark’s reliability. Surprisingly, when evaluating models on corrected SQL queries, zero-shot baselines surpassed the performance of state-of-the-art prompting methods. The thesis then introduces the concept of classifying noise in natural language questions, aiming to prevent the entry of noisy questions into text-to-SQL models and to annotate noise in existing datasets. Experiments using GPT-3.5 and GPT-4 on a manually annotated dataset demonstrated the viability of this approach, with classifiers achieving up to 0.81 recall and 80% accuracy. Additionally, the thesis explored the use of LLMs for automatically correcting faulty SQL queries. This showed a 100% success rate for specific query corrections, highlighting the potential of LLMs for improving dataset quality. We conclude that informative noise labels and reliable benchmarks are crucial to developing new Text-to-SQL methods that can handle varying types of noise.
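The abstract reports the noise classifiers in terms of recall and accuracy. As a reminder of how those two figures are computed for a binary "noisy / clean" label, here is a small sketch; the function name `recall_and_accuracy` and the example labels are hypothetical and are not data from the thesis.

```python
def recall_and_accuracy(y_true, y_pred):
    """Compute recall and accuracy for binary labels (1 = noisy, 0 = clean)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    # Recall: share of truly noisy questions that the classifier flagged.
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    # Accuracy: share of all questions labelled correctly.
    accuracy = correct / len(pairs)
    return recall, accuracy


# Made-up labels for ten questions, purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
recall, accuracy = recall_and_accuracy(y_true, y_pred)
```

The two numbers answer different questions: recall says how many noisy questions are caught before reaching a text-to-SQL model, while accuracy also counts clean questions that were correctly left alone.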
50

Structural Analysis on Activity-travel Patterns, Travel Demand, Socio-demographics, and Urban Form: Evidence from Cleveland Metropolitan Area

Chen, Yu-Jen 24 August 2017
No description available.
