81

Real-time event detection using Twitter

McMinn, Andrew James January 2018 (has links)
Twitter has become the social network of news and journalism. Monitoring what is said on Twitter is a frequent task for anyone who requires timely access to information: journalists, traders, and the emergency services have all invested heavily in monitoring Twitter in recent years. Given this, there is a need to develop systems that can automatically monitor Twitter to detect real-world events as they happen, and alert users to novel events. However, this is not an easy task due to the noise and volume of data that is produced from social media streams such as Twitter. Although a range of approaches have been developed, many are unevaluated, cannot scale past low volume streams, or can only detect specific types of event. In this thesis, we develop novel approaches to event detection, and enable the evaluation and comparison of event detection approaches by creating a large-scale test collection called Events 2012, containing 120 million tweets and with relevance judgements for over 500 events. We use existing event detection approaches and Wikipedia to generate candidate events, then use crowdsourcing to gather annotations. We propose a novel entity-based, real-time, event detection approach that we evaluate using the Events 2012 collection, and show that it outperforms existing state-of-the-art approaches to event detection whilst also being scalable. We examine and compare automated and crowdsourced evaluation methodologies for the evaluation of event detection. Finally, we propose a Newsworthiness score that is learned in real-time from heuristically labelled data. The score is able to accurately classify individual tweets as newsworthy or noise in real-time. We adapt the score for use as a feature for event detection, and find that it can easily be used to filter out noisy clusters and improve existing event detection techniques. We conclude with a summary of our research findings and answers to our research questions. We discuss some of the difficulties that remain to be solved in event detection on Twitter and propose some possible future directions for research into real-time event detection on Twitter.
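To make the newsworthiness idea concrete, here is a minimal sketch of a scorer learned from heuristically labelled tweets. The class name, the example heuristics mentioned in the comments, and the add-one smoothing are illustrative assumptions, not the thesis's actual implementation.

```python
from collections import Counter
import math

class NewsworthinessScorer:
    """Illustrative sketch: score tweets as newsworthy vs. noise from
    heuristically labelled examples (hypothetical heuristics, not the
    thesis's exact method)."""

    def __init__(self):
        self.news_counts = Counter()
        self.noise_counts = Counter()

    def update(self, tokens, is_newsworthy):
        # Heuristic labels might come from, e.g., tweets posted by known news
        # outlets (newsworthy) versus tweets full of spam markers (noise).
        target = self.news_counts if is_newsworthy else self.noise_counts
        target.update(tokens)

    def score(self, tokens):
        # Log-likelihood-ratio style score with add-one smoothing (a sketch):
        # positive values lean newsworthy, negative values lean noise.
        n_news = sum(self.news_counts.values()) or 1
        n_noise = sum(self.noise_counts.values()) or 1
        s = 0.0
        for t in tokens:
            p_news = (self.news_counts[t] + 1) / (n_news + 1)
            p_noise = (self.noise_counts[t] + 1) / (n_noise + 1)
            s += math.log(p_news / p_noise)
        return s / max(len(tokens), 1)

scorer = NewsworthinessScorer()
scorer.update("earthquake strikes city centre".split(), True)
scorer.update("win a free iphone click here".split(), False)
print(scorer.score("major earthquake reported near city".split()))
```

In a streaming setting the same score could be recomputed as counts arrive, and cluster-level averages of it used to filter out noisy clusters, in the spirit described above.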
82

Analysing political events on Twitter : topic modelling and user community classification

Fang, Anjie January 2019 (has links)
Recently, political events, such as elections or referenda, have raised a great deal of discussion on social media networks, in particular Twitter. This brings new opportunities for social scientists to address social science tasks, such as understanding what communities said, identifying whether a community has an influence on another, or analysing how these communities respond to political events online. However, identifying these communities and extracting what they said from social media data are challenging and non-trivial tasks. In this thesis, we aim to make progress towards understanding 'who' (i.e. communities) said 'what' (i.e. discussed topics) and 'when' (i.e. time) during political events on Twitter. While identifying the 'who' can benefit from Twitter user community classification approaches, 'what' they said and 'when' can be effectively addressed by extracting their discussed topics using topic modelling approaches that also account for the importance of time on Twitter. To evaluate the quality of these topics, it is necessary to investigate how coherent these topics are to humans. Accordingly, we propose a series of approaches in this thesis. First, we investigate how to effectively evaluate the coherence of the topics generated using a topic modelling approach. A topic coherence metric evaluates topical coherence by examining the semantic similarity among words in a topic. We argue that the semantic similarity of words in tweets can be effectively captured by using word embeddings trained on a Twitter background dataset. Through a user study, we demonstrate that our proposed word embedding-based topic coherence metric can assess the coherence of topics in line with human judgements. In addition, inspired by the precision-at-k information retrieval metric, we propose to evaluate the coherence of a topic model (containing many topics) by averaging the coherence of the top-ranked topics within the topic model. Our proposed metrics can not only evaluate the coherence of topics and topic models, but can also help users to choose the most coherent topics. Second, we aim to extract topics with a high coherence from Twitter data. Such topics can be easily interpreted by humans and can help to examine 'what' has been discussed on Twitter and 'when'. Indeed, we argue that topics can be discussed in different time periods and therefore can be effectively identified and distinguished by considering their time periods. Hence, we propose an effective time-sensitive topic modelling approach by integrating the time dimension of tweets (i.e. 'when'). We show that the time dimension helps to generate topics with a high coherence. Hence, we argue that 'what' has been discussed and 'when' can be effectively addressed by our proposed time-sensitive topic modelling approach. Next, to identify 'who' participated in the topic discussions, we propose approaches to identify the community affiliations of Twitter users, including automatic ground-truth generation approaches and a user community classification approach. To generate ground-truth data for training a user community classifier, we show that the hashtags and entities mentioned in users' tweets can indicate which community a Twitter user belongs to. Hence, we argue that they can be used to generate the ground-truth data for classifying users into communities. On the other hand, we argue that different communities favour different topic discussions, and that their community affiliations can be identified by leveraging the discussed topics.
Accordingly, we propose a Topic-Based Naive Bayes (TBNB) classification approach to classify Twitter users based on their words and discussed topics. We demonstrate that our TBNB classifier, together with the ground-truth generation approaches, can effectively identify the community affiliations of Twitter users. Finally, to show the generalisation of our approaches, we apply them to analyse 3.6 million tweets related to the US Election 2016 on Twitter. We show that our TBNB approach can effectively identify the 'who', i.e. classify Twitter users into communities by using hashtags and the discussed topics. To investigate 'what' these communities have discussed, we apply our time-sensitive topic modelling approach to extract coherent topics. We finally analyse the community-related topics evaluated and selected using our proposed topic coherence metrics. Overall, we contribute effective approaches to assist social scientists in analysing political events on Twitter. These approaches include topic coherence metrics, a time-sensitive topic modelling approach and approaches for classifying the community affiliations of Twitter users. Together, they make progress towards studying and understanding the connections and dynamics among communities on Twitter.
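As an illustration of the word embedding-based coherence idea, a minimal sketch is given below. It assumes pre-trained embeddings (for example, trained on a Twitter background corpus) are available as a word-to-vector mapping; the function names, and the use of per-topic coherence itself as the ranking criterion in the precision-at-k-style aggregation, are assumptions rather than the thesis's exact formulation.

```python
import numpy as np

def topic_coherence(topic_words, embeddings):
    """Average pairwise cosine similarity of a topic's top words.
    `embeddings` maps a word to a vector; words without a vector are skipped."""
    vecs = [embeddings[w] for w in topic_words if w in embeddings]
    if len(vecs) < 2:
        return 0.0
    sims = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            a, b = vecs[i], vecs[j]
            sims.append(float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))))
    return sum(sims) / len(sims)

def model_coherence_at_k(topics, embeddings, k=10):
    """Average the coherence of the k top-ranked topics, in the spirit of
    precision-at-k (ranking by per-topic coherence is an assumption here)."""
    scores = sorted((topic_coherence(t, embeddings) for t in topics), reverse=True)
    return sum(scores[:k]) / max(min(k, len(scores)), 1)
```

The same per-topic scores can double as a selection aid, surfacing the most coherent topics for a human analyst, which is the dual use described above.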
83

A framework for technology-assisted sensitivity review : using sensitivity classification to prioritise documents for review

McDonald, Graham January 2019 (has links)
More than a hundred countries implement freedom of information laws. In the UK, the Freedom of Information Act 2000 (FOIA) states that the government's documents must be made freely available, or opened, to the public. Moreover, all central UK government departments' documents that have a historic value, for example the minutes from significant meetings, must be transferred to The National Archives (TNA) within twenty years of the document's creation. However, government documents can contain sensitive information, such as personal information or information that would likely damage the international relations of the UK if it was opened to the public. Therefore, all government documents that are to be publicly archived must be sensitivity reviewed to identify and redact the sensitive information, or to close the document until the information is no longer sensitive. Historically, government documents have been stored in a structured file-plan that can reliably inform a sensitivity reviewer about the subject-matter and the likely sensitivities in the documents. However, the lack of structure in digital document collections and the volume of digital documents that are to be sensitivity reviewed mean that the traditional manual sensitivity review process is not practical for digital sensitivity review. In this thesis, we argue that the automatic classification of documents that contain sensitive information, sensitivity classification, can be deployed to assist government departments and human reviewers to sensitivity review born-digital government documents. However, classifying sensitive information is a complex task, since sensitivity is context-dependent. For example, identifying if information is sensitive or not can require a human to judge the likely effect of releasing the information into the public domain. Moreover, sensitivity is not necessarily topic-oriented, i.e., it is usually dependent on a combination of what is being said and about whom. Furthermore, the vocabulary and entities that are associated with particular types of sensitive information, e.g., confidential information, can vary greatly between different collections. We propose to address sensitivity classification as a text classification task. Moreover, through a thorough empirical evaluation, we show that text classification is effective for sensitivity classification and can be improved by identifying the vocabulary, syntactic and semantic document features that are reliable indicators of sensitive or non-sensitive text. Furthermore, we propose to reduce the number of documents that have to be reviewed to learn an effective sensitivity classifier through an active learning strategy in which a sensitivity reviewer redacts any sensitive text in a document as they review it, to construct a representation of the sensitivities in a collection. With this in mind, we propose a novel framework for technology-assisted sensitivity review that can prioritise the most appropriate documents to be reviewed at specific stages of the review process. Furthermore, our framework can provide the reviewers with useful information to assist them in making their reviewing decisions. Our framework consists of four components, namely the Document Representation, Document Prioritisation, Feedback Integration and Learned Predictions components, which can be instantiated to learn from the reviewers' feedback about the sensitivities in a collection or to provide assistance to reviewers at different stages of the review.
In particular, firstly, the Document Representation component encodes the document features that can be reliable indicators of the sensitivities in a collection. Secondly, the Document Prioritisation component identifies the documents that should be prioritised for review at a particular stage of the reviewing process, for example to provide the sensitivity classifier with information about the sensitivities in the collection or to focus the available reviewing resources on the documents that are the most likely to be released to the public. Thirdly, the Feedback Integration component integrates explicit feedback from a reviewer to construct a representation of the sensitivities in a collection and identify the features of a reviewer's interactions with the framework that indicate the amount of time that is required to sensitivity review a specific document. Finally, the Learned Predictions component combines the information that has been generated by the other three components and, as the final step in each iteration of the sensitivity review process, the Learned Predictions component is responsible for making accurate sensitivity classification and expected reviewing time predictions for the documents that have not yet been sensitivity reviewed. In this thesis, we identify two realistic digital sensitivity review scenarios as user models and conduct two user studies to evaluate the effectiveness of our proposed framework for assisting digital sensitivity review. Firstly, in the limited review user model, which addresses a scenario in which there are insufficient reviewing resources available to sensitivity review all of the documents in a collection, we show that our proposed framework can increase the number of documents that can be reviewed and released to the public with the available reviewing resources. Secondly, in the exhaustive review user model, which addresses a scenario in which all of the documents in a collection will be manually sensitivity reviewed, we show that providing the reviewers with useful information about the documents in the collection that contain sensitive information can increase the reviewers' accuracy, reviewing speed and agreement. This is the first thesis to investigate automatically classifying FOIA sensitive information to assist digital sensitivity review. The central contributions of this thesis are our proposed framework for technology-assisted sensitivity review and our sensitivity classification approaches. Our contributions are validated using a collection of government documents that are sensitivity reviewed by expert sensitivity reviewers to identify two FOIA sensitivities, namely international relations and personal information. The thesis draws insights from a thorough evaluation and analysis of our proposed framework and sensitivity classifier. Our results demonstrate that our proposed framework is a viable technology for assisting digital sensitivity review.
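The active-learning element can be pictured with a small uncertainty-sampling sketch. It assumes a TF-IDF and logistic-regression pipeline and stands in for, rather than reproduces, the framework's Document Prioritisation and Feedback Integration components; the function name and budget handling are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling_review(documents, labels, review_budget, seed_size=10):
    """Illustrative active-learning loop (uncertainty sampling), not the
    thesis's exact prioritisation strategy: repeatedly ask the reviewer to
    label the document the current classifier is least certain about.
    `labels` stands in for the reviewer's sensitive/not-sensitive decisions,
    and the seed is assumed to contain examples of both classes."""
    vec = TfidfVectorizer(max_features=20000)
    X = vec.fit_transform(documents)
    reviewed = list(range(seed_size))            # documents already reviewed
    pool = list(range(seed_size, len(documents)))
    clf = LogisticRegression(max_iter=1000)
    while pool and len(reviewed) < review_budget:
        clf.fit(X[reviewed], [labels[i] for i in reviewed])
        probs = clf.predict_proba(X[pool])[:, 1]
        # Prioritise the pool document whose sensitivity prediction is
        # closest to 0.5, i.e. the one the classifier is least sure about.
        next_idx = pool[int(np.argmin(np.abs(probs - 0.5)))]
        pool.remove(next_idx)
        reviewed.append(next_idx)                # reviewer labels it next
    return clf, reviewed
```

Other prioritisation policies, such as favouring documents most likely to be releasable when reviewing resources are limited, can be obtained by swapping the selection rule inside the loop.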
84

Video popularity metrics and bubble cache eviction algorithm analysis

Weisenborn, Hildebrand J. January 2018 (has links)
Video data is the largest type of traffic on the Internet, currently responsible for over 72% of the total traffic, with over 883 PB of data per month in 2016. Large-scale CDN solutions are available that offer a variety of distributed hosting platforms for the purpose of transmitting video over IP. However, the IP protocol, unlike ICN protocol implementations, does not provide an any-cast architecture from which a CDN would greatly benefit. In this thesis we introduce a novel cache eviction strategy called "Bubble", as well as two variants of Bubble, that can be applied to any-cast protocols to aid in optimising video delivery. Bubble, Bubble-LRU and Bubble-Insert were found to greatly reduce the quantity of video-associated traffic observed in cache-enabled networks. Additionally, an analysis of two video popularity distributions provided by British Telecom (BT) was performed using the Kullback-Leibler divergence and Pearson Chi-Squared testing methods. This was done to assess which model, Zipf or Zipf-Mandelbrot, is better suited to replicating video popularity distributions; the results of these tests conclude that Zipf-Mandelbrot is the more appropriate model. The work concludes that the novel cache eviction algorithms introduced in this thesis provide an efficient caching mechanism for future content delivery networks and that the modelled Zipf-Mandelbrot distribution is a better method for simulating the performance of caching algorithms.
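For reference, the two candidate popularity models and the divergence used to compare fitted and observed distributions can be written in their standard textbook forms; the thesis's exact parameterisation and fitting procedure may differ.

```latex
% Zipf: popularity of the k-th most popular video among N videos
P_{\mathrm{Zipf}}(k) = \frac{k^{-s}}{\sum_{i=1}^{N} i^{-s}}
\qquad
% Zipf-Mandelbrot: the shift q flattens the head of the distribution
P_{\mathrm{ZM}}(k) = \frac{(k+q)^{-s}}{\sum_{i=1}^{N} (i+q)^{-s}}
\qquad
% Kullback-Leibler divergence between observed P and modelled Q
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{k=1}^{N} P(k) \log \frac{P(k)}{Q(k)}
```

The extra parameter q is what lets Zipf-Mandelbrot capture the flattened head often seen in real video catalogues, which is consistent with the fit results reported above.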
85

An ICMetric based framework for secure end-to-end communication

Tahir, Ruhma January 2018 (has links)
Conventional cryptographic algorithms rely on highly sophisticated and well established algorithms to ensure security, while the cryptographic keys are kept secret. However, adversaries can attack the keys of a cryptosystem without targeting the algorithm. This dissertation aims to address this gap in the domain of cryptography, that is, the problem associated with cryptographic key compromise. The thesis accomplishes this by presenting a novel security framework based on the ICMetric technology. The proposed framework provides schemes for a secure end-to-end communication environment based on the ICMetric technology, which is a novel root of trust and can eliminate issues associated with stored keys. The ICMetric technology processes unique system features to establish an identity which is then used as a basis for cryptographic services. Hence the thesis presents a study on the concept of the ICMetric technology and the features suitable for generating the ICMetric of a system. The first contribution of this thesis is the creation of ICMetric keys of sufficient length and entropy that can be used in cryptographic applications. The proposed strong ICMetric key generation scheme follows a two-tier structure, so that the ICMetric keys are resilient to precomputation attacks. The second contribution of this thesis is a symmetric key scheme that can be used for symmetric key applications based on the ICMetric of the system. The symmetric keys are generated based on zero knowledge protocols and the cryptographic services are provided without transmitting the key over the channel. The fourth major contribution of this thesis is the investigation into the feasibility of employing the ICMetric technology for identifying Docker containers employed by cloud service providers for hosting their cloud services.
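A generic sketch of feature-derived key material is shown below. It only illustrates the general idea of turning stable system-feature measurements into a key via a standard key derivation function, and is not the ICMetric scheme itself; the feature names, quantisation step and KDF parameters are assumptions.

```python
import hashlib
import statistics

def derive_feature_based_key(feature_samples, salt, key_len=32, iterations=200_000):
    """Generic illustration (not the thesis's actual scheme): quantise
    repeated measurements of system features into a stable identifier,
    then stretch it into key material with a standard KDF."""
    # feature_samples: {feature_name: [repeated numeric readings]}
    stable = []
    for name in sorted(feature_samples):
        # Quantise the average reading so small measurement noise maps to
        # the same value on every run (the bin width here is an assumption).
        avg = statistics.mean(feature_samples[name])
        stable.append(f"{name}:{round(avg, 1)}")
    identifier = "|".join(stable).encode()
    return hashlib.pbkdf2_hmac("sha256", identifier, salt, iterations, dklen=key_len)

key = derive_feature_based_key(
    {"cpu_temp": [41.02, 40.98, 41.05], "clock_skew_ppm": [12.4, 12.5, 12.4]},
    salt=b"device-enrolment-salt",
)
print(key.hex())
```

The key point mirrored here is that no key is stored: the key material is regenerated on demand from the system's own measurable characteristics.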
86

High speed 802.11ad wireless video streaming

Abe, Adewale January 2018 (has links)
The aim of this thesis is to investigate, both theoretically and experimentally, the capability of IEEE 802.11ad devices (Wireless Gigabit Alliance, or WiGig, devices operating in the 60 GHz band) to handle the rise in data traffic associated with high-speed data transmission, such as bulk data transfer and wireless video streaming. According to Cisco and others, internet video traffic is estimated to account for 82% of all consumer internet traffic in 2020. This research evaluated the feasibility of the 60 GHz band providing a minimum data rate of about 970 Mbps over an Ethernet link limited, or clamped, to 1 Gbps. This translates to 97% efficiency with respect to IEEE 802.11ad system performance. For the first time, the author proposed enhancing millimetre wave propagation through the use of specular reflection in non-line-of-sight environments, providing at least 94% bandwidth utilisation. Additional investigation of the IEEE 802.11ad device in real live streaming of 4K ultra-high-definition (UHD) video shows the feasibility of aggressive frequency reuse in the absence of co-channel interference. Moreover, using a heuristic approach, this work compared material absorption and signal reception at 60 GHz, and the results give better performance than the theoretical values. Finally, this thesis proposes a framework for 802.11ad wireless H.264 video streaming over the 60 GHz band. The work describes the potential and efficiency of the WiGig device in streaming high-definition (HD) video with a high temporal index (TI) and 4K UHD video with no retransmission. A caching point established at the re-transmitter increases coverage and caches multimedia data. The results in this thesis show the growing potential of millimetre wave technology, and of WiGig in particular, for very high-speed bulk data transfer and live video streaming.
87

Efficient magnetic resonance wireless power transfer systems

Thabet, Thabat January 2018 (has links)
This thesis aims to improve the performance of magnetic resonance wireless power transfer systems. Several factors affect the performance of the system and the efficiency of maximum power transfer. These factors are: the resonance frequency; the quality factor of the resonators; the value and shape of the coils; the mutual inductance, including the distance between the coils; and the load. These systems have four potential types of connection in the transmitter and receiver. These types are Serial to Serial (SS), Serial to Parallel (SP), Parallel to Serial (PS) and Parallel to Parallel (PP). Each type suits different applications because its performance differs from the others. Magnetic resonance wireless power systems in some applications consist of one transmitter and one receiver, while in other applications there is a demand to transfer the power to more than one receiver simultaneously. Hence the importance of studying multiple-receiver systems arises. The serial to serial connection type was studied along with the effects of all the other factors on the efficiency, including the existence of multiple receivers. The symmetric capacitance tuning method was presented as a solution to the frequency splitting problem that usually appears in SS wireless power transfer systems with a small gap between the two resonators. Compared to other existing methods, this method provides the advantages of high efficiency while keeping the frequency within the chosen Industrial Scientific Medical (ISM) band. The impact of the connection type on the efficiency of wireless power transfer systems and the effect of the load impedance on each type were studied. Finally, an algorithm for intelligent management and control of received wireless power was proposed to run a load that requires more power than is received.
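For context, the resonance frequency and the usual figure of merit linking the coupling and quality-factor parameters above to the maximum achievable link efficiency can be stated in their standard textbook forms; the thesis's own analysis of the SS/SP/PS/PP topologies is more detailed than this sketch.

```latex
% Resonance frequency of each LC resonator
f_0 = \frac{1}{2\pi\sqrt{LC}}
\qquad
% Coupling coefficient from mutual inductance M and coil inductances
k = \frac{M}{\sqrt{L_1 L_2}}
\qquad
% Standard upper bound on link efficiency for a two-coil resonant link
\eta_{\max} = \frac{k^2 Q_1 Q_2}{\left(1 + \sqrt{1 + k^2 Q_1 Q_2}\right)^2}
```

Because k falls quickly with the distance between the coils while Q depends on the coil design and load, these relations show why the factors listed above, and the choice of connection type that sets the effective load, dominate the achievable efficiency.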
88

A study of the kinematics of probabilities in information retrieval

Crestani, Fabio A. January 1998 (has links)
In Information Retrieval (IR), probabilistic modelling is related to the use of a model that ranks documents in decreasing order of their estimated probability of relevance to a user's information need expressed by a query. In an IR system based on a probabilistic model, the user is guided to examine first the documents that are the most likely to be relevant to his need. If the system performed well, these documents should be at the top of the retrieved list. In mathematical terms the problem consists of estimating the probability P(R | q,d), that is the probability of relevance given a query q and a document d. This estimate should be performed for every document in the collection, and documents should then be ranked according to this measure. For this evaluation the system should make use of all the information available in the indexing term space. This thesis contains a study of the kinematics of probabilities in probabilistic IR. The aim is to gain a better insight into the behaviour of the probabilistic models of IR currently in use and to propose new and more effective models by exploiting different kinematics of probabilities. The study is performed both from a theoretical and an experimental point of view. Theoretically, the thesis explores the use of the probability of a conditional, namely P(d → q), to estimate the conditional probability P(R | q,d). This is achieved by interpreting the term space in the context of the "possible worlds semantics". Previous approaches in this direction had as their basic assumption the consideration that "a document is a possible world". In this thesis a different approach is adopted, based on the assumption that "a term is a possible world". This approach enables the exploitation of term-term semantic relationships in the term space, estimated using an information theoretic measure. This form of information is rarely used in IR at retrieval time. Two new models of IR are proposed, based on two different ways of estimating P(d → q) using a logical technique called Imaging. The first model is called Retrieval by Logical Imaging; the second is called Retrieval by General Logical Imaging, being a generalisation of the first model. The probability kinematics of these two models is compared with that of two other proposed models: the Retrieval by Joint Probability model and the Retrieval by Conditional Probability model. These last two models mimic the probability kinematics of the Vector Space model and of the Probabilistic Retrieval model. Experimentally, the retrieval effectiveness of the above four models is analysed and compared using five test collections of different sizes and characteristics. The results of this experimentation depend heavily on the choice of term weight and term similarity measures adopted. The most important conclusion of this thesis is that theoretically a probability transfer that takes into account the semantic similarity between the probability-donor and the probability-recipient is more effective than a probability transfer that does not take that into account. In the context of IR this is equivalent to saying that models that exploit the semantic similarity between terms in the term space at retrieval time are more effective than models that do not.
Unfortunately, while the experimental investigation carried out using small test collections provides evidence supporting this conclusion, experiments performed using larger test collections do not provide as much supporting evidence (although they do not provide contrasting evidence either). The peculiar characteristics of the term space of different collections play an important role in shaping the effects that different probability kinematics have on the effectiveness of the retrieval process. The above result suggests the necessity and the usefulness of further investigations into more complex and optimised models of probabilistic IR, where probability kinematics follows non-classical approaches. The models proposed in this thesis are just two such approaches; others can be developed using recent results achieved in other fields, such as non-classical logics and belief revision theory.
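Schematically, the imaging-based estimate under the "a term is a possible world" reading can be written as follows; the notation is a simplified sketch rather than the thesis's exact formulation.

```latex
P(d \to q) \;=\; \sum_{t} P(t)\, q(t_d),
\qquad
q(t_d) =
\begin{cases}
1 & \text{if } t_d \text{ occurs in } q,\\
0 & \text{otherwise,}
\end{cases}
```

where t_d is the term occurring in d that is most similar to t, i.e. the "closest world" to t in which d is true. Each term's prior probability is thus transferred to its most semantically similar document term, which is exactly the kind of similarity-aware probability transfer whose effectiveness is discussed above.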
89

Algorithmic skeletons for exact combinatorial search at scale

Archibald, Blair January 2018 (has links)
Exact combinatorial search is essential to a wide range of application areas including constraint optimisation, graph matching, and computer algebra. Solutions to combinatorial problems are found by systematically exploring a search space, either to enumerate solutions, determine if a specific solution exists, or to find an optimal solution. Combinatorial searches are computationally hard both in theory and practice, and efficiently exploring the huge number of combinations is a real challenge, often addressed using approximate search algorithms. Alternatively, exact search can be parallelised to reduce execution time. However, parallel search is challenging due to both highly irregular search trees and sensitivity to search order, leading to anomalies that can cause unexpected speedups and slowdowns. As core counts continue to grow, parallel search becomes increasingly useful for improving the performance of existing searches, and allowing larger instances to be solved. A high-level approach to parallel search allows non-expert users to benefit from increasing core counts. Algorithmic Skeletons provide reusable implementations of common parallelism patterns that are parameterised with user code which determines the specific computation, e.g. a particular search. We define a set of skeletons for exact search, requiring the user to provide, in the minimal case, a single class that specifies how the search tree is generated and a parameter that specifies the type of search required. The five skeletons are: Sequential search; three general-purpose parallel search methods: Depth-Bounded, Stack-Stealing, and Budget; and a specific parallel search method, Ordered, that guarantees replicable performance. We implement and evaluate the skeletons in a new C++ parallel search framework, YewPar. YewPar provides both high-level skeletons and low-level search-specific schedulers and utilities to deal with the irregularity of search and knowledge exchange between workers. YewPar is based on the HPX library for distributed task-parallelism, potentially allowing search to execute on multi-cores, clusters, cloud, and high performance computing systems. Underpinning the skeleton design is a novel formal model, MT^3, a parallel operational semantics that describes multi-threaded tree traversals, allowing reasoning about parallel search, e.g. describing common parallel search phenomena such as performance anomalies. YewPar is evaluated using seven different search applications (and over 25 specific instances): Maximum Clique, k-Clique, Subgraph Isomorphism, Travelling Salesperson, Binary Knapsack, Enumerating Numerical Semigroups, and the Unbalanced Tree Search Benchmark. The search instances are evaluated at multiple scales from 1 to 255 workers, on a 17-host, 272-core Beowulf cluster. The overheads of the skeletons are low, with a mean 6.1% slowdown compared to hand-coded sequential implementations. Crucially, for all search applications YewPar reduces search times by an order of magnitude, i.e. hours/minutes to minutes/seconds, and we commonly see average parallel efficiencies greater than 60% for up to 255 workers. Comparing skeleton performance reveals that no one skeleton is best for all searches, highlighting a benefit of a skeleton approach that allows multiple parallelisations to be explored with minimal refactoring. The Ordered skeleton avoids slowdown anomalies where, due to search knowledge being order dependent, a parallel search takes longer than a sequential search.
Analysis of Ordered shows that, while being 41% slower on average (73% worst-case) than Depth-Bounded, in nearly all cases it maintains the following replicable performance properties: 1) parallel executions are no slower than one-worker sequential executions; 2) runtimes do not increase as workers are added; and 3) variance between repeated runs is low. In particular, where Ordered maintains a relative standard deviation (RSD) of less than 15%, Depth-Bounded suffers from an RSD greater than 50%, showing the importance of carefully controlling search orders for repeatability.
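The division of labour between user code and skeleton can be illustrated with a short sketch. This is deliberately not YewPar's C++/HPX API: the example problem (subset sum), class and function names are hypothetical, and only the Depth-Bounded idea (spawn parallel tasks down to a depth cutoff, search sequentially below it) is mirrored.

```python
from concurrent.futures import ProcessPoolExecutor

class SubsetSumNode:
    """User-supplied search-tree definition for a hypothetical example
    problem: enumerate subsets of `items` that sum exactly to `target`."""
    def __init__(self, items, target, chosen=()):
        self.items, self.target, self.chosen = items, target, chosen

    def is_solution(self):
        return sum(self.chosen) == self.target

    def children(self):
        # Branch on which remaining item to add next (each subset generated once).
        for i, x in enumerate(self.items):
            yield SubsetSumNode(self.items[i + 1:], self.target, self.chosen + (x,))

def count_solutions(node):
    """Plain sequential traversal of the subtree rooted at `node`."""
    total = 1 if node.is_solution() else 0
    for child in node.children():
        total += count_solutions(child)
    return total

def frontier(node, cutoff, depth=0):
    """Collect the subtree roots sitting exactly at the depth cutoff."""
    if depth == cutoff:
        return [node]
    nodes = []
    for child in node.children():
        nodes.extend(frontier(child, cutoff, depth + 1))
    return nodes

def depth_bounded_count(root, cutoff=1):
    """Skeleton sketch of the Depth-Bounded idea: the coordinator expands the
    tree above the cutoff itself; each subtree at the cutoff becomes a task."""
    shallow = 0
    def expand(node, depth):
        nonlocal shallow
        if depth == cutoff:
            return                      # handled by a parallel task instead
        if node.is_solution():
            shallow += 1
        for child in node.children():
            expand(child, depth + 1)
    expand(root, 0)
    tasks = frontier(root, cutoff)
    with ProcessPoolExecutor() as pool:
        return shallow + sum(pool.map(count_solutions, tasks))

if __name__ == "__main__":
    root = SubsetSumNode(items=(3, 5, 7, 11, 13, 17), target=20)
    print(depth_bounded_count(root, cutoff=1))   # 2 subsets: {3, 17} and {7, 13}
```

Swapping the coordination strategy while leaving SubsetSumNode untouched is the benefit the skeleton approach provides: the same user class could be driven by a stack-stealing or budget-based scheduler instead.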
90

From components to compositions : (de-)construction of computer-controlled behaviour with the robot operating system

Lyyra, Antti Kalervo January 2018 (has links)
Robots and autonomous systems play an increasingly important role in modern societies. This role is expected to increase as the computational methods and capabilities advance. Robots and autonomous systems produce goal-directed and context-dependent behaviour with the aim of loosening the coupling between the machines and their operators. These systems are a domain of complex digital innovation that intertwines the physical and digital worlds with computer-controlled behaviour, as robots and autonomous systems render their behaviour from the interaction with the surrounding environment. Complex product and system innovation literature maintains that designers are expected to have detailed knowledge of different components and their interactions. By contrast, digital innovation literature holds that end-product agnostic components can be generatively combined from heterogeneous sources utilising standardised interfaces. An in-depth case study into the Robot Operating System (ROS) was conducted to explore the conceptual tension between the specificity of designs and the distributedness of knowledge and control in the context of complex digital innovation. The thematic analysis of documentary evidence, field notes and interviews produced three contributions. First, the case description presents how ROS has evolved over the past ten years into a global open-source community that is widely used in the development of robots and autonomous systems. Second, a model that conceptualises robots and autonomous systems as contextually bound and embodied chains of transformation is proposed to describe the structural and functional dynamics of complex digital innovation. Third, the generative-integrative mode of development is proposed to characterise the process of innovation that begins from a generative combination of components and subsequently proceeds to the integration phase, during which the system behaviour is experimented with, observed and adjusted. As the initial combination builds upon underspecification and constructive ambiguity, the generative combination is gradually crafted into a more dependable composition through the iterative removal of semantic incongruences.
