101.
PAC'nPost : a peer-to-peer micro-blogging social network. Asthana, H. January 2014
In this thesis we provide a framework for a micro-blogging social network in an unstructured peer-to-peer network. At a user level, a micro-blogging service provides (i) a means for publishing a micro-blog, (ii) a means to follow a micro-blogger, and (iii) a means to search for micro-blogs containing keywords. Since unstructured peer-to-peer networks bind neither the data nor the location of data to the nodes in the network, search in an unstructured network is necessarily probabilistic. Using the probably approximately correct (PAC) search architecture, an unstructured network is searched probabilistically by querying a fixed number of nodes. We provide a mechanism by which information whose creation rate is high, such as micro-blogs, can be disseminated in the network in a rapid yet restrained manner, so that it can be retrieved with high probability. We subject our framework to spikes in the data creation rate, as is common in micro-blogging social networks, and to various types of churn. Since both dissemination and retrieval incur bandwidth costs, we investigate the optimal replication of data, in the sense of minimizing the overall system bandwidth. We explore whether replicating the micro-blog posts of users with a larger number of followers more heavily than the posts of other users can reduce the overall system bandwidth. Finally, we investigate trending keywords in our framework. Trending keywords are important in a micro-blogging social network as they provide users with breaking news they might not get from the users they follow. Whereas identifying trending keywords in a centrally managed system is relatively straightforward, in a distributed, decentralized system the nodes do not have access to global statistics such as the frequency of the keywords and the information creation rate. We describe a two-step algorithm that is capable of detecting multiple trending keywords with a moderate increase in bandwidth.
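As a rough illustration of why retrieval in such a network can be made reliable despite being probabilistic, the sketch below computes the chance that a query reaching a fixed number of nodes finds at least one replica of a post, assuming replicas are placed uniformly at random; the node counts and the 95% target are hypothetical, and the thesis's actual dissemination mechanism is more sophisticated than uniform placement.

```python
def retrieval_probability(n_nodes, n_replicas, n_queried):
    """Probability that at least one queried node holds a replica, assuming
    replicas are placed on nodes uniformly at random (illustrative sketch,
    not the thesis's exact dissemination mechanism)."""
    miss_per_node = 1.0 - n_replicas / n_nodes
    return 1.0 - miss_per_node ** n_queried


def replicas_needed(n_nodes, n_queried, target=0.95):
    """Smallest replication level that reaches the target retrieval probability."""
    for replicas in range(1, n_nodes + 1):
        if retrieval_probability(n_nodes, replicas, n_queried) >= target:
            return replicas
    return n_nodes


# Hypothetical network: 10,000 peers, each query contacts 150 of them.
print(retrieval_probability(10_000, 200, 150))   # ~0.95
print(replicas_needed(10_000, 150))              # ~198 replicas per post
```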
102.
Financial methods for online advertising. Chen, B. January 2015
Online advertising, a form of advertising that reaches consumers through the World Wide Web, has become a multi-billion dollar industry. Using state-of-the-art computing technologies, online auctions have become an important sales mechanism for automating transactions in online advertising markets, where advertisement (ad, for short) inventories, such as impressions or clicks, can be auctioned off within milliseconds of being generated by online users. However, since they provide only non-guaranteed delivery, the current auction mechanisms have a number of limitations, including: uncertainty in the winning payment prices for buyers; volatility in the seller's revenue; and weak loyalty between buyer and seller. To address these issues, this thesis explores methods and techniques from finance to evaluate and allocate ad inventories over time and to design new sales models. Finance, as a sub-field of microeconomics, studies how individuals and organisations make decisions regarding the allocation of resources over time as well as the handling of risk. Therefore, we believe that financial methods can provide novel solutions to the non-guaranteed delivery problem in online advertising. This thesis has three major contributions. We first study an optimal dynamic model for unifying programmatic guarantee and real-time bidding in display advertising. This study solves the problem of algorithmic pricing and allocation of guaranteed contracts. We then propose a multi-keyword multi-click ad option. This work provides a flexible way of offering guaranteed deliveries in the sponsored search context; its valuation follows the no-arbitrage principle and is based on the assumption that the underlying winning payment prices of candidate keywords for specific positions follow a geometric Brownian motion. However, according to our data analysis and other previous research, the same underlying assumption is not empirically valid for display ads. We therefore study a lattice framework to price an ad option based on a stochastic volatility underlying model. This research extends the usage of ad options to display advertising in a more general setting.
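To make the option-style pricing concrete, the following sketch values a simple single-keyword ad option by Monte Carlo under the geometric Brownian motion assumption mentioned above; the contract form, function name and numerical parameters are illustrative assumptions, not the multi-keyword multi-click model or the data used in the thesis.

```python
import numpy as np

def ad_option_price_mc(c0, strike, r, sigma, T, n_paths=100_000, seed=0):
    """Monte Carlo value of a simple ad option that lets an advertiser buy one
    click at a fixed price `strike` at time T, assuming the winning
    cost-per-click follows a geometric Brownian motion (illustrative sketch,
    not the multi-keyword multi-click model developed in the thesis)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    cT = c0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    payoff = np.maximum(cT - strike, 0.0)      # exercise only if it saves money
    return np.exp(-r * T) * payoff.mean()

# Hypothetical numbers: spot CPC 1.80, fixed price 2.00, 3-month horizon.
print(ad_option_price_mc(c0=1.80, strike=2.00, r=0.01, sigma=0.6, T=0.25))
```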
103.
Systematic trading : calibration advances through machine learning. Alvarez Teleña, S. January 2015
Systematic trading in finance uses computer models to define trade goals, risk controls and rules that execute trade orders in a methodical way. This thesis investigates how performance in systematic trading can be crucially enhanced by i) persistently reducing the bid-offer spread quoted by the trader through optimized and realistically backtested strategies and ii) improving the out-of-sample robustness of the selected strategy through the injection of theory into the typically data-driven calibration processes. In doing so, it brings to the foreground sound scientific reasons that, for the first time to my knowledge, technically underpin popular academic observations about the recent nature of the financial markets. The thesis conducts consecutive experiments across strategies within the three important building blocks of systematic trading: a) execution, b) quoting and c) risk-reward, allowing me to progressively generate more complex and accurate backtested scenarios, as recently demanded in the literature (Cahan et al. (2010)). The three experiments conducted are: 1. Execution: an execution model based on support vector machines. The first experiment is deployed to improve the realism of the other two. It analyses a popular model of execution: the volume-weighted average price (VWAP). The VWAP algorithm aims to split the size of an order along the trading session according to the expected intraday volume profile, since activity in the markets typically exhibits convex seasonality, with more activity around the opening and closing auctions than during the rest of the day. In doing so, the main challenge is to provide the model with a reasonable expected profile. After showing on my data sample that two simple static approaches to the profile outperform the PCA-ARMA model of Bialkowski et al. (2008) (a popular two-fold model composed of a dynamic component around an unsupervised learning structure), a further combination of both through an index based on supervised learning is proposed. The resulting Sample Sensitivity Index allows the expected volume profile to be estimated more accurately by selecting, through the identification of patterns via support vector machines, those ranges of time where the model should be less sensitive to past data. Only once the intraday execution risk has been defined can the quoting policy of a mid-frequency (in general, up to a week) hedging strategy be accurately analysed. 2. Quoting: a quoting model built upon particle swarm optimization. The second experiment analyses, for the first time to my knowledge, how to achieve the disruptive 50% bid-offer spread discount observed in Menkveld (2013) without increasing the risk profile of a trading agent. The experiment depends crucially on a series of variables of which market impact and slippage are typically the most difficult to estimate. By adapting the market impact model of Almgren et al. (2005) to the VWAP developed in the previous experiment and by estimating its slippage through the distribution of its errors, a framework within which the bid-offer spread can be assessed is generated. First, a full-replication spread (one set out following the strict definition of a product in order to hedge it completely) is calculated and fixed as a benchmark. Then, by allowing the agent to benefit from a lower market impact at the cost of assuming deviation risk (tracking error and tail risk), a non-full-replication spread is calibrated through particle swarm optimization (PSO) as in Diez et al.
(2012) and compared with the benchmark. Finally, it is shown that the latter can reach a discount of 50% with respect to the benchmark if a certain number of trades is granted; this typically occurs in the most liquid securities. This result not only underpins Menkveld's observations but also points out that there is room for further reductions. When seeking additional performance, once the quoting policy has been defined, a further layer with a calibrated risk-reward policy should be deployed. 3. Risk-Reward: a calibration model defined within a Q-learning framework. The third experiment analyses how the calibration process of a risk-reward policy can be enhanced to achieve a more robust out-of-sample performance, a cornerstone in quantitative trading. It successfully responds to the literature that has recently focused on the detrimental role of overfitting (Bailey et al. (2013a)). The experiment was motivated by the assumption that techniques underpinned by financial theory should show better behaviour (a lower deviation between in-sample and out-of-sample performance) than classical, purely data-driven processes. As such, both approaches are compared within a framework of active trading upon a novel indicator. The indicator, called the Expectations' Shift, is rooted in the expectations of the markets' evolution embedded in the dynamics of prices. The crucial challenge of the experiment is the injection of theory into the calibration process. This is achieved through the use of reinforcement learning (RL). RL is an area of machine learning, inspired by behaviourist psychology, concerned with how software agents take decisions in a specific environment incentivised by a policy of rewards. By analysing the Q-learning matrix that collects the set of states and actions learnt by the agent within the environment, defined by each combination of parameters considered within the calibration universe, the rationale that an autonomous agent would have learnt in terms of risk management can be generated. Finally, by then selecting the combination of parameters whose attached rationale is closest to that of the portfolio manager, a data-driven solution that converges to the theory-driven solution can be found, and this is shown to successfully outperform out-of-sample the classical approaches followed in finance. The thesis contributes to science by addressing what techniques could underpin recent academic findings about the nature of the trading industry for which a scientific explanation had not yet been given: • A novel agent-based approach that allows for a robust out-of-sample performance by crucially providing the trader with a way to inject financial insights into the generally data-driven-only calibration processes. In this way it benefits from surpassing the generic model limitations present in the literature (Bailey et al. (2013b), Schorfheide and Wolpin (2012), Van Belle and Kerr (2012) or Weiss and Kulikowski (1991)) by finding a point where theory-driven patterns (the trader's priors, which tend to enhance out-of-sample robustness) merge with data-driven ones (those that allow latent information to be exploited). • The provision of a technique that, to the best of my knowledge, explains for the first time how to reduce the bid-offer spread quoted by a traditional trader without modifying her risk appetite; a reduction not previously addressed in the literature, in spite of the fact that increasing regulation against the assumption of risk by market makers (e.g.
Dodd–Frank Wall Street Reform and Consumer Protection Act) coincides with the aggressive discounts observed by Menkveld (2013). As a result, this thesis could further contribute to science by serving as a framework for future analyses in the context of systematic trading. • The completion of a mid-frequency trading experiment with high-frequency execution information. It is shown how the latter can have a significant effect on the former, not only through the erosion of its performance but, more subtly, by changing its entire strategic design (both optimal composition and parameterization); an effect that tends to be highly disregarded by the financial literature. More importantly, the methodologies disclosed herein have been crucial in underpinning the setup of a new unit in the industry, BBVA's Global Strategies & Data Science. This disruptive, global and cross-asset team gives an enhanced role to science by becoming primarily responsible for the risk management of the Bank's strategies in both electronic trading and electronic commerce. Other contributions include: the provision of a novel risk measure (flowVaR); the proposal of a novel trading indicator (Expectations' Shift); and the definition of a novel index that improves the estimation of the intraday volume profile (Sample Sensitivity Index).
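As a minimal illustration of the VWAP execution model discussed in the first experiment, the sketch below splits a parent order across intraday buckets in proportion to an expected volume profile; the U-shaped profile and order size are hypothetical, and the thesis's contribution lies in estimating that profile (via the Sample Sensitivity Index), not in the splitting step itself.

```python
import numpy as np

def vwap_schedule(order_size, expected_profile):
    """Split a parent order across intraday buckets in proportion to the
    expected volume profile, as a plain VWAP algorithm would (sketch; the
    hard part in practice is estimating the profile, not this split)."""
    weights = np.asarray(expected_profile, dtype=float)
    weights = weights / weights.sum()
    child_sizes = np.round(order_size * weights).astype(int)
    child_sizes[-1] += order_size - child_sizes.sum()   # fix rounding drift
    return child_sizes

# Hypothetical U-shaped (convex) intraday profile over 8 half-hour buckets.
profile = [0.22, 0.12, 0.08, 0.07, 0.07, 0.09, 0.13, 0.22]
print(vwap_schedule(100_000, profile))
```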
104.
Investigating the challenges of data, pricing and modelling to enable agent based simulation of the Credit Default Swap market. Zangeneh, L. January 2014
The Global Financial Crisis of 2007-2008 is considered by three top economists to be the worst financial crisis since the Great Depression of the 1930s [Pendery, 2009]. The crisis played a major role in the failure of key businesses, declines in consumer wealth, and a significant downturn in economic activity, leading to the 2008-2012 global recession and contributing to the European sovereign-debt crisis [Baily and Elliott, 2009] [Williams, 2012]. More importantly, the serious limitations of existing conventional tools and models, as well as a vital need for complementary tools to improve the robustness of the overall framework, immediately became apparent. This thesis details three proposed solutions, drawn from three main subject areas: Statistics, Genetic Programming (GP), and Agent-Based Modeling (ABM), to help enable agent-based simulation of the Credit Default Swap (CDS) market. This is accomplished by tackling three challenges faced by designers of CDS investigation tools: the lack of sufficient data to support research, the lack of an efficient CDS pricing technique that can be integrated into an agent-based model, and the lack of a practical experimental model of the CDS market. In particular, a general generative model is presented for simulating financial data, a novel price calculator is proposed for pricing CDS contracts, and a unique CDS agent-based model is designed to enable investigation of the market. The solutions presented can be seen as modular building blocks that can be applied to a variety of applications. Ultimately, a unified general framework is presented for integrating these three solutions. The motivation for the methods is to suggest viable tools that address these challenges and thus enable the future realistic simulation of the CDS market using the limited real data in hand. A series of experiments were carried out, and a comparative evaluation and discussion are provided. In particular, we present the advantages of realistic artificial data in enabling open-ended simulation and in designing various scenarios, the effectiveness of Cartesian Genetic Programming (CGP) as a bio-inspired evolutionary method for a complex real-world financial problem, and the capability of Agent-Based (AB) models for investigating the CDS market. These experiments demonstrate the efficiency and viability of the proposed approaches and highlight interesting directions for future research.
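For orientation, the sketch below prices a CDS par spread with a textbook reduced-form model (flat hazard rate, flat discount curve); it is not the CGP-evolved price calculator proposed in the thesis, and all numerical inputs are hypothetical.

```python
import math

def cds_par_spread(hazard_rate, recovery, r, maturity, freq=4):
    """Par CDS spread under a flat hazard rate and flat discount curve
    (textbook reduced-form sketch, not the pricer developed in the thesis)."""
    dt = 1.0 / freq
    premium_leg = 0.0      # risky PV of running payments per unit of spread
    protection_leg = 0.0   # PV of loss payments on default
    for i in range(1, int(maturity * freq) + 1):
        t_prev, t = (i - 1) * dt, i * dt
        df = math.exp(-r * t)
        surv_prev = math.exp(-hazard_rate * t_prev)
        surv = math.exp(-hazard_rate * t)
        premium_leg += dt * df * surv
        protection_leg += (1 - recovery) * df * (surv_prev - surv)
    return protection_leg / premium_leg

# Hypothetical inputs: 2% hazard rate, 40% recovery, 1% rates, 5-year contract.
print(cds_par_spread(0.02, 0.40, 0.01, 5.0))   # ~0.012, i.e. about 120bp
```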
105.
Employing variation in the object of learning for the design-based development of serious games that support learning of conditional knowledge. Ruskov, M. P. January 2014
Learning how to cope with tasks that do not have optimal solutions is a life-long challenge. In particular, when such education and training needs to be scalable, technologies are needed to support teachers and facilitators in providing the feedback and discussion necessary for quality learning. In this thesis, I conduct design-based research by following a typical game development cycle to develop a serious game. I propose a framework that derives learning and motivational principles and includes them in the design of serious games. My exploration starts with project management as a learning domain and, for practical reasons, shifts towards information security. The first (concept) phase of the development includes an in-depth study: a simulation game of negotiation (Study 1: class study, n=60). In the second (design) phase I used rapid prototyping to develop a gamified web toolkit, embodying the CCO framework from crime prevention, with five small-scale formative evaluations (Study 2, n=17) and a final lab evaluation (Study 3, n=28). In the final (production) stage the toolkit was used in two class studies (Study 4, n=34 and Study 5, n=20), exploring its adoption in a real-world environment. This thesis makes three main contributions. One contribution is the adaptation of the iterative method of the phenomenographic learning study to the study of the efficiency of serious games. This employs open questioning, analysed with three different means of analysis, to demonstrate four distinct types of evidence of deep learning. Another contribution is the partial evidence provided for the positive effects of introducing variation on engagement and learning. The third contribution is the development of four design-based research principles: i) the importance of being agile; ii) feedback from interpretation of the theory; iii) particular needs for facilitation; and iv) reusing user-generated content.
106.
Improving the quality and security of probabilistic search in peer-to-peer information retrieval systems. Richardson, S. A. January 2014
Commercial web search engines typically use a centralised architecture, where machines are kept at centralised server facilities under the control of a central authority. This requires high capital and operating costs, potentially deterring new entrants from offering competing services. A promising alternative is to host the search engine on a peer-to-peer (P2P) network. Volunteers can add their machines as peers, essentially providing computing resources for free. However, search quality and security against malicious behaviour are lower than for a centralised system. This thesis addresses these issues for an unstructured P2P architecture. To improve search quality, we develop techniques to accurately estimate the global collection statistics that are required by modern retrieval models but may be unavailable to peers. To improve search quality further, we introduce the measure of rank-accuracy to better model human perception of queries, and propose rank-aware document replication policies to increase overall rank-accuracy. These policies assume the query distribution can be inferred from prior queries. For cases where this is not feasible, we propose a rank-aware dynamic replication technique that distributes documents as queries are performed. To improve security, we first develop theoretical models to show how an adversary can use malicious nodes to (i) censor a document, (ii) increase the rank of a document, or (iii) disrupt overall search results. We then develop defences that can detect and evict adversarial nodes. We also study how an adversary may perform these attacks by manipulating the estimates of global statistics at each node, and we develop a defence that is effective when up to 40% of nodes behave maliciously.
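As a simple illustration of the global-statistics problem, the sketch below estimates a term's collection-wide document frequency by querying a uniform sample of peers and scaling up; the data layout and sample size are hypothetical assumptions, and the estimators developed in the thesis are more accurate than plain uniform sampling.

```python
import random

def estimate_global_df(peers, term, sample_size=50, seed=0):
    """Estimate the collection-wide document frequency of a term by querying
    a uniform sample of peers and scaling up (illustrative sketch; the thesis
    develops more accurate estimators than plain uniform sampling)."""
    rng = random.Random(seed)
    sample = rng.sample(peers, min(sample_size, len(peers)))
    sampled_df = sum(peer["df"].get(term, 0) for peer in sample)
    return sampled_df * len(peers) / len(sample)

# Hypothetical peers, each holding local document-frequency statistics.
peers = [{"df": {"bitcoin": random.randint(0, 5)}} for _ in range(1_000)]
print(estimate_global_df(peers, "bitcoin"))
```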
107.
Videos in context for telecommunication and spatial browsing. Pece, F. January 2015
The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing a visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the use of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed with dedicated tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surround representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first, telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more complex, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
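As a small illustration of how a video can be anchored inside panoramic context, the sketch below maps a viewing direction to pixel coordinates in an equirectangular panorama; the coordinate convention and resolution are assumptions for illustration, not the rendering pipeline used in the thesis.

```python
def equirect_pixel(yaw_deg, pitch_deg, pano_width, pano_height):
    """Map a viewing direction (yaw, pitch in degrees) to pixel coordinates in
    an equirectangular panorama - a minimal sketch of anchoring a video frame
    in panoramic context, not the thesis's rendering pipeline."""
    u = (yaw_deg % 360.0) / 360.0 * pano_width        # longitude -> column
    v = (90.0 - pitch_deg) / 180.0 * pano_height      # latitude  -> row
    return int(u) % pano_width, min(max(int(v), 0), pano_height - 1)

# Hypothetical case: anchor a webcam feed 30 degrees right and 10 degrees up
# of the panorama centre, in an 8192x4096 equirectangular image.
print(equirect_pixel(30.0, 10.0, 8192, 4096))
```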
108.
Mining text and time series data with applications in finance. Staines, J. January 2015
Finance is a field extremely rich in data, and has great need of methods for summarizing and understanding these data. Existing methods of multivariate analysis allow the discovery of structure in time series data but can be difficult to interpret. Often there exists a wealth of text data directly related to the time series. In this thesis it is shown that this text can be exploited to aid interpretation of, and even to improve, the structure uncovered. To this end, two approaches are described and tested. Both serve to uncover structure in the relationship between text and time series data, but do so in very different ways. The first model comes from the field of topic modelling. A novel topic model is developed, closely related to an existing topic model for mixed data. Improved held-out likelihood is demonstrated for this model on a corpus of UK equity market data, and the discovered structure is qualitatively examined. To the author's knowledge this is the first attempt to combine text and time series data in a single generative topic model. The second method is a simpler, discriminative method based on a low-rank decomposition of time series data with constraints determined by word frequencies in the text data. This is compared to topic modelling using both the equity data and a second corpus comprising foreign exchange rate time series and text describing global macroeconomic sentiment, showing further improvements in held-out likelihood. One example of an application of the inferred structure is also demonstrated: the construction of carry trade portfolios. The superior results of this second method serve as a reminder that methodological complexity does not guarantee performance gains.
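As a rough sketch of the second, discriminative approach, the code below fits factor time series to asset returns using factors built from word frequencies; the least-squares formulation, array shapes and random data are illustrative assumptions rather than the exact decomposition or the topic model developed in the thesis.

```python
import numpy as np

def text_constrained_factors(returns, word_freqs):
    """Fit factor time series coupling text and returns:
    returns (T x N assets) ~ factors (T x K) @ word_freqs (K x N).
    A minimal least-squares sketch of constraining a decomposition with text,
    not the thesis's exact model."""
    H = np.asarray(word_freqs, dtype=float)              # K text factors x N assets
    X = np.asarray(returns, dtype=float)                 # T days x N assets
    coeffs, *_ = np.linalg.lstsq(H.T, X.T, rcond=None)   # solve H.T @ coeffs = X.T
    return coeffs.T                                      # T x K factor time series

# Hypothetical data: 250 days of returns for 10 assets, 3 text-derived factors.
rng = np.random.default_rng(0)
X = rng.standard_normal((250, 10)) * 0.01
H = rng.random((3, 10))
print(text_constrained_factors(X, H).shape)              # (250, 3)
```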
109.
Supply side optimisation in online display advertising. Yuan, S. January 2015
On the Internet there are publishers (the supply side) who provide free content (e.g., news) and services (e.g., email) to attract users. Publishers get paid by selling ad display opportunities (i.e., impressions) to advertisers. Advertisers then sell products to users who are converted by ads. Better supply-side revenue allows more free content and services to be created, thus benefiting the entire online advertising ecosystem. This thesis addresses several optimisation problems for the supply side. When a publisher creates an ad-supported website, he first needs to decide the percentage of ads. The thesis reports a large-scale empirical study of Internet ad density over the past seven years, then presents a model that includes many factors, especially the competition among similar publishers, and gives an optimal dynamic ad density that generates the maximum revenue over time. This study also unveils a tragedy of the commons in online advertising, where users' attention has been overgrazed, resulting in a global sub-optimum. After deciding the ad density, the publisher retrieves ads from various sources, including contracts, ad networks, and ad exchanges. This forms an exploration-exploitation problem, as ad sources are typically unknown before trial. The problem is modelled using a Partially Observable Markov Decision Process (POMDP), and exploration efficiency is increased by utilising the correlation between ads. The proposed method performs 23.4% better than the best-performing baseline in experiments based on real-world data. Since some ad networks allow (or expect) an input of keywords, the thesis also presents an adaptive keyword extraction system using the BM25F algorithm and the multi-armed bandit model. This system has been tested by a domain service provider in crowdsourcing-based experiments. If the publisher selects a Real-Time Bidding (RTB) ad source, he can use the reserve price to manipulate auctions for a better payoff. This thesis proposes a simplified game model that considers the competition between seller and buyer to be one-shot instead of repeated, and gives heuristics that can be easily implemented. The model has been evaluated in a production environment and reported a 12.3% average increase in revenue. The documentation of a prototype system for reserve price optimisation is also presented in the appendix of the thesis.
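To illustrate the exploration-exploitation trade-off in ad-source selection, the sketch below uses a plain UCB1 bandit to choose among a few hypothetical ad sources by estimated revenue per impression; the thesis itself models the problem as a POMDP and exploits correlations between ads, so this is only a simplified stand-in.

```python
import math
import random

def ucb_select_ad_source(stats, t):
    """Pick the ad source with the highest upper confidence bound on revenue
    per impression (plain UCB1 sketch; the thesis models source selection as
    a POMDP and additionally exploits correlations between ads)."""
    best_source, best_score = None, float("-inf")
    for source, (pulls, total_revenue) in stats.items():
        if pulls == 0:
            return source                              # try every source once
        score = total_revenue / pulls + math.sqrt(2.0 * math.log(t) / pulls)
        if score > best_score:
            best_source, best_score = source, score
    return best_source

# Hypothetical simulation: three ad sources with unknown mean eCPMs.
true_ecpm = {"network_a": 1.2, "exchange_b": 1.5, "contract_c": 0.9}
stats = {source: [0, 0.0] for source in true_ecpm}
for t in range(1, 5_000):
    chosen = ucb_select_ad_source(stats, t)
    revenue = random.gauss(true_ecpm[chosen], 0.3)     # noisy observed revenue
    stats[chosen][0] += 1
    stats[chosen][1] += revenue
print(max(stats, key=lambda s: stats[s][0]))           # usually "exchange_b"
```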
110.
Management and visualisation of non-linear history of polygonal 3D models. Dobos, J. January 2015
The research presented in this thesis concerns the problems of maintenance and revision control of large-scale three-dimensional (3D) models over the Internet. As the models grow in size and the authoring tools grow in complexity, standard approaches to collaborative asset development become impractical. The prevalent paradigm of sharing files on a file system poses serious risks with regard to, but not limited to, ensuring consistency and concurrency of multi-user 3D editing. Although modifications might be tracked manually using naming conventions or automatically in a version control system (VCS), understanding the provenance of a large 3D dataset is hard because revision metadata is not associated with the underlying scene structures. Some tools and protocols enable seamless synchronisation of file and directory changes in remote locations. However, the existing web-based technologies are not yet fully exploiting modern design patterns for access to and management of alternative shared resources online. Therefore, four distinct but highly interconnected conceptual tools are explored. The first is the organisation of 3D assets within recent document-oriented No Structured Query Language (NoSQL) databases. These "schemaless" databases, unlike their relational counterparts, do not represent data in rigid table structures. Instead, they rely on polymorphic documents composed of key-value pairs that are much better suited to the diverse nature of 3D assets. Hence, a domain-specific non-linear revision control system, 3D Repo, is built around a NoSQL database to enable asynchronous editing similar to traditional VCSs. The second concept is that of visual 3D differencing and merging. The accompanying 3D Diff tool supports interactive conflict resolution at the level of scene graph nodes, which are de facto the delta changes stored in the repository. The third is the utilisation of the HyperText Transfer Protocol (HTTP) for the purposes of 3D data management. The XML3DRepo daemon application exposes the contents of the repository and the version control logic in a Representational State Transfer (REST) style of architecture. At the same time, it demonstrates the effects of various 3D encoding strategies on file sizes and download times in modern web browsers. The fourth and final concept is the reverse-engineering of an editing history. Even if the models are being version controlled, the extracted provenance is limited to additions, deletions and modifications. The 3D Timeline tool therefore infers a plausible history of common modelling operations such as duplications, transformations, etc. Given a collection of 3D models, it estimates a part-based correspondence and visualises it in a temporal flow. The prototype tools developed as part of the research were evaluated in pilot user studies suggesting that they are usable by end users and well suited to their respective tasks. Together, the results constitute a novel framework that demonstrates the feasibility of domain-specific 3D version control.
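As a small illustration of storing scene-graph nodes as key-value documents in a document-oriented database, the sketch below builds one hypothetical node-revision document; the field names and structure are assumptions for illustration and are not 3D Repo's actual schema.

```python
import datetime
import uuid

def make_scene_node_revision(name, vertices, parent_ids, author):
    """Build one simplified key-value document describing a scene-graph node
    revision, in the spirit of storing 3D assets in a document-oriented NoSQL
    database; field names are hypothetical, not 3D Repo's actual schema."""
    return {
        "_id": uuid.uuid4().hex,            # unique id of this node revision
        "shared_id": "mesh-42",             # stable id shared across revisions
        "type": "mesh",
        "name": name,
        "parents": parent_ids,              # scene-graph edges as id references
        "vertices": vertices,               # geometry kept as plain arrays
        "author": author,
        "timestamp": datetime.datetime.utcnow().isoformat(),
    }

doc = make_scene_node_revision("chair_leg", [[0, 0, 0], [0, 0, 1]], ["root"], "jd")
print(doc["shared_id"], len(doc["vertices"]))
```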