641

High Dimensional Data Methods in Industrial Organization Type Discrete Choice Models

Lopez Gomez, Daniel Felipe 11 August 2022 (has links)
No description available.
642

Shaping the Future City - Ethical Considerations of Implementation of Technology in Urban Design

Makhotina Gudnason, Daria January 2022 (has links)
Reliable data is the strongest driving force of the modern smart city. To gain access to the relevant information needed to build a long-term management policy, the city is filled with sensors and video cameras connected to a high-speed data transmission infrastructure. Today even the inhabitants themselves are turning into sources of information: users' mobile devices - there are 7.7 billion registered mobile subscribers in the world - produce terabyte streams of data about the location, purchases, and interests of individuals in real time. The city management system thus gets a chance to become truly human-centered. Yet the fact that smart cities depend on tracking and analyzing vast amounts of previously untapped data on the movements and activities of urban populations means that, as attractive as smart city visions may be, they remain largely utopian and can quickly become dystopian in practice. The benefits of smart cities have been praised loud and clear, but the truth is that the introduction of fully integrated, data-driven technologies into urban infrastructure and governance poses a serious threat to human rights. Smart city developers therefore regularly face a number of ethical issues. Based on an analysis of the H22 project of the city of Helsingborg and the digitalization policy of Copenhagen, this research aims to reveal these issues and to provide possible solutions for a number of them within a theoretical framework that considers surveillance capitalism, the implications of big data, and citizen involvement.
643

API Design and Middleware Optimization for Big Data and Machine Learning Applications

Guo, Jia January 2021 (has links)
No description available.
644

Towards a Data Analytics Culture : An Exploratory Study on the Role of Organizational Culture for Data Analytics Practices

Roschlau, Elisabeth, Märkle, Lisa January 2022 (has links)
Background: Over the years, Data Analytics (DA) has gained much attention by enabling the extraction of valuable insights from the massive amount of data produced every day. Successfully exploiting DA practices involves various requirements. Organizational Culture (OC) has proven to be one critical intangible resource required for DA practices. However, existing research says little about which factors and values of OC facilitate DA practices. Purpose: The purpose of this study is to explore what role OC plays for DA practices and how OC can support the effective use of DA in a company. This research is guided by the research question: What are facilitating factors and underlying values of OC for DA practices? By exploring and linking the two concepts of DA and OC, the study aims to provide a greater understanding of OC for DA practices. This offers insights for DA practitioners and managers to handle their specific OC and guide DA in a more targeted way. Method: Following an inductive, qualitative approach with an exploratory research design, the authors conducted 12 semi-structured interviews. The interviewees were selected through purposive sampling and represent two different perspectives: DA experts and DA collaborators. A grounded analysis created a deeper understanding of OC factors and values, leading to a final framework of an OC for DA practices. Conclusion: The results illustrate various OC factors that facilitate DA practices. These factors differ between subcultures, which are represented by four groups of actors. Three factors were identified as superior, as they had an enabling effect on DA practices in the investigated OCs. Finally, the study derived five underlying values, representing a shared cultural mindset among organizational members, that facilitate DA practices.
645

Asynchronous Algorithms for Large-Scale Optimization : Analysis and Implementation

Aytekin, Arda January 2017 (has links)
This thesis proposes and analyzes several first-order methods for convex optimization, designed for parallel implementation in shared and distributed memory architectures. The theoretical focus is on designing algorithms that can run asynchronously, allowing computing nodes to execute their tasks with stale information without jeopardizing convergence to the optimal solution. The first part of the thesis focuses on shared memory architectures. We propose and analyze a family of algorithms to solve an unconstrained, smooth optimization problem consisting of a large number of component functions. Specifically, we investigate the effect of information delay, inherent in asynchronous implementations, on the convergence properties of the incremental prox-gradient descent method. Contrary to related proposals in the literature, we establish delay-insensitive convergence results: the proposed algorithms converge under any bounded information delay, and their constant step-size can be selected independently of the delay bound. Then, we shift focus to solving constrained, possibly non-smooth, optimization problems in a distributed memory architecture. This time, we propose and analyze two important families of gradient descent algorithms: asynchronous mini-batching and incremental aggregated gradient descent. In particular, for asynchronous mini-batching, we show that, by suitably choosing the algorithm parameters, one can recover the best-known convergence rates established for delay-free implementations, and expect a near-linear speedup with the number of computing nodes. Similarly, for incremental aggregated gradient descent, we establish global linear convergence rates for any bounded information delay. Extensive simulations and actual implementations of the algorithms on different platforms and representative real-world problems validate our theoretical results.
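A minimal sketch of the kind of asynchronous incremental gradient iteration the abstract describes: several workers read a possibly stale copy of a shared iterate, compute the gradient of one component function, and write back an update without coordination. The quadratic component functions and all parameter values are illustrative assumptions, not the thesis implementation.

```python
import threading
import numpy as np

# Toy problem: minimize sum_i 0.5 * ||A_i x - b_i||^2 over x.
rng = np.random.default_rng(0)
n, dim, iters_per_worker = 20, 5, 2000
A = [rng.standard_normal((3, dim)) for _ in range(n)]
b = [rng.standard_normal(3) for _ in range(n)]

x = np.zeros(dim)   # shared iterate, read and written with stale information
step = 1e-3         # constant step size, chosen independently of the delay

def worker(seed):
    """Each worker repeatedly updates x from a possibly outdated snapshot."""
    global x
    local_rng = np.random.default_rng(seed)
    for _ in range(iters_per_worker):
        i = local_rng.integers(n)
        x_stale = x.copy()                    # snapshot may lag other workers
        g = A[i].T @ (A[i] @ x_stale - b[i])  # gradient of one component
        x = x - step * g                      # asynchronous write-back

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()

obj = sum(0.5 * np.linalg.norm(A[i] @ x - b[i]) ** 2 for i in range(n))
print(f"objective after asynchronous run: {obj:.4f}")
```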
646

Hive, Spark, Presto for Interactive Queries on Big Data

Gureev, Nikita January 2018 (has links)
Traditional relational database systems cannot be efficiently used to analyze data with large volume and varied formats, i.e. big data. Apache Hadoop is one of the first open-source tools that provides a distributed data storage system and resource manager. The space of big data processing has been growing fast over the past years, and many technologies have been introduced in the big data ecosystem to address the problem of processing large volumes of data; some of the early tools have become widely adopted, with Apache Hive being one of them. However, with the recent advances in technology, there are other tools better suited for interactive analytics of big data, such as Apache Spark and Presto. In this thesis these technologies are examined and benchmarked in order to determine their performance for the task of interactive business intelligence queries. The benchmark is representative of interactive business intelligence queries and uses a star-shaped schema. The performance of Hive on Tez, Hive LLAP, Spark SQL, and Presto is examined with text, ORC, and Parquet data at different volumes and concurrency levels. A short analysis and conclusions are presented with reasoning about the choice of framework and data format for a system that would run interactive queries on big data.
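A sketch of the kind of interactive star-schema BI query such a benchmark runs, expressed here in Spark SQL over Parquet via PySpark. The file paths, table names, and columns (a sales fact table joined to date and product dimensions) are illustrative assumptions, not the thesis's benchmark schema.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-schema-demo").getOrCreate()

# Register the fact and dimension tables from columnar Parquet files.
spark.read.parquet("/data/sales.parquet").createOrReplaceTempView("sales")
spark.read.parquet("/data/dim_date.parquet").createOrReplaceTempView("dim_date")
spark.read.parquet("/data/dim_product.parquet").createOrReplaceTempView("dim_product")

# A typical interactive aggregation: join the fact table against two
# dimensions, filter, group, and order -- the pattern that differentiates
# Hive, Spark SQL, and Presto under volume and concurrency.
result = spark.sql("""
    SELECT d.year, p.category, SUM(s.revenue) AS total_revenue
    FROM sales s
    JOIN dim_date d    ON s.date_id = d.date_id
    JOIN dim_product p ON s.product_id = p.product_id
    WHERE d.year BETWEEN 2015 AND 2017
    GROUP BY d.year, p.category
    ORDER BY total_revenue DESC
""")
result.show()
```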
647

Handling Big Data using a Distributed Search Engine : Preparing Log Data for On-Demand Analysis

Ekman, Niklas January 2017 (has links)
Big data are datasets that are very large and computationally complex. With an increasing volume of data, even a trivial processing task can become challenging. Companies collect data at a fast rate, but knowing what to do with the data can be hard. A search engine is a system that indexes data, making it efficiently queryable by users. When a bug occurs in a computer system, log data is consulted in order to understand why, but processing big log data can take a long time. The purpose of this thesis is to investigate, compare, and implement a distributed search engine that can prepare log data for analysis, making it easier for a developer to investigate bugs. There are three popular search engines: Apache Lucene, Elasticsearch, and Apache Solr. Elasticsearch and Apache Solr are built as distributed systems, making them capable of handling big data. Requirements were established through interviews. Big log data totaling 40 GB was provided to be indexed in the selected search engine. The log data was generated in a proprietary binary format and had to be decoded first. The distributed search engines were evaluated based on: distributed architecture, text analysis, indexing, and querying. Elasticsearch was selected for implementation. A cluster was set up on Amazon Web Services, and tests were executed to determine how different configurations performed. Indexing software was written to transfer data to the cluster. Results were verified through a case study with participants from the stakeholder.
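A minimal sketch of the indexing step such a pipeline performs, assuming the official Elasticsearch 8.x Python client: decoded log records are pushed to the cluster in bulk, then queried on demand. The index name, field layout, and the `decode_records()` helper are hypothetical stand-ins; the thesis's proprietary binary decoder is not reproduced here.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def decode_records(path):
    """Placeholder for decoding the proprietary binary log format;
    here it just yields synthetic log entries."""
    for i in range(1000):
        yield {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": "ERROR" if i % 50 == 0 else "INFO",
            "message": f"event {i}",
        }

# Bulk indexing: batch documents instead of one HTTP request per record.
actions = ({"_index": "logs", "_source": rec} for rec in decode_records("app.log"))
ok, _errors = helpers.bulk(es, actions)
print(f"indexed {ok} documents")

# On-demand analysis: full-text query for error-level entries.
hits = es.search(index="logs", query={"match": {"level": "ERROR"}})
print(hits["hits"]["total"])
```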
648

Graph-based features for machine learning driven code optimization / Maskinlärnings-driven programkodsoptimering med graf-baserad datautvinning

Kindestam, Anton January 2017 (has links)
In this paper we present a method of using the Shortest-Path Graph Kernel on graph-based features of computer programs to train a Support Vector Regression model which, given an unseen program and a point in optimization space, predicts execution-time speedup over baseline, based on a method proposed in Using Graph-Based Program Characterization for Predictive Modeling by Park et al. The optimization space is represented by command-line parameters to the polyhedral C-to-C compiler PoCC, and PolyBench is used to generate the data set of speedups over baseline. The model is found to produce results reasonable by some metrics, but due to the large error and the pseudo-random behaviour of the output, the method, in its current form, must reluctantly be rejected.
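A sketch of the pipeline the abstract describes: compare graphs with a shortest-path kernel, then feed the precomputed kernel matrix to Support Vector Regression. The kernel here is a simplified, label-free variant (a dot product of shortest-path-length histograms), and the random toy graphs and speedup targets stand in for the PolyBench programs and PoCC optimization points.

```python
import networkx as nx
import numpy as np
from sklearn.svm import SVR

def sp_histogram(g, max_len=10):
    """Histogram of pairwise shortest-path lengths in a graph."""
    hist = np.zeros(max_len)
    for _, lengths in nx.all_pairs_shortest_path_length(g):
        for d in lengths.values():
            if 0 < d < max_len:
                hist[d] += 1
    return hist

def sp_kernel_matrix(graphs_a, graphs_b):
    """Simplified shortest-path kernel: dot product of path-length histograms."""
    H_a = np.array([sp_histogram(g) for g in graphs_a])
    H_b = np.array([sp_histogram(g) for g in graphs_b])
    return H_a @ H_b.T

# Toy "program graphs" and speedup-over-baseline targets.
train_graphs = [nx.gnm_random_graph(12, 20, seed=s) for s in range(30)]
train_speedup = np.random.default_rng(0).uniform(0.5, 2.0, size=30)
test_graphs = [nx.gnm_random_graph(12, 20, seed=100 + s) for s in range(5)]

# SVR over a precomputed kernel matrix, as in the graph-kernel setting.
model = SVR(kernel="precomputed")
model.fit(sp_kernel_matrix(train_graphs, train_graphs), train_speedup)
pred = model.predict(sp_kernel_matrix(test_graphs, train_graphs))
print("predicted speedups:", np.round(pred, 2))
```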
649

How smart are smart cities? : How can big data influence the planning of sustainable urban mobility?

Teixeira Betarelli Cabrera, Luiza, Marie Woda, Angela January 2018 (has links)
The opportunities we have to move around the city, otherwise known as urban mobility, are intrinsically linked both to local industrial output and to the physical possibilities of movement provided by the landscape of the urban environment. With automotive congestion solidifying into reality as populations in urban centres continue to rise, the conspicuous question is: if we continue the way we're going, what happens next? This thesis explores the influence of new technologies, and the data that drives them, on the mechanisms of planning mobility in urban centres, as well as the potential for adoption of innovative mobility solutions that address environmental concerns through the insights that higher-resolution data can provide. Statistical data sets have been used in the past to justify urban interventions and shifts in the established landscape, collectively moving citizens and their production towards spatial outcomes that hinge on directives of governance. More recently these shifts have been a bid to address the overarching awareness of changes in the natural environment due to industrial (and therefore human) intervention. Moving into a future with a higher resolution of quantitative data harvested from a broad consumer base, it becomes possible to enhance the city planning process by linking close-to-real-time supply and demand. The central proposition of this thesis is that unique value propositions of mobility consumer markets should be driven by the needs of people, rather than the capabilities of technology and industry. There are obvious real-world ramifications for changes in the way citizens move around the city: the sizing of streets, noise levels of automobiles, access and egress points, the distance between points of interest, and the capabilities of the fixed built infrastructure to accommodate change. This body of research focuses on the connection enabled by putting people, rather than technical solutions, at the centre of the sustainability debate.
650

Efficient Wearable Big Data Harnessing and Mining with Deep Intelligence

Elijah J Basile (13161057) 27 July 2022 (has links)
Wearable devices and their ubiquitous use and deployment across multiple areas of health provide key insights into patient and individual status via big data through sensor capture at key parts of the individual's body. While small and low cost, their limitations rest in their computational and battery capacity. One key use of wearables has been in individual activity capture. For accelerometer and gyroscope data, oscillatory patterns exist between the daily activities that users may perform. While leveraging spatial and temporal learning via CNN and LSTM layers to capture both the intra- and inter-oscillatory patterns that appear during these activities, we deployed data sparsification via autoencoders to extract the key topological properties from the data and transmit the compressed data via BLE to a central device for later decoding and analysis. Several autoencoder designs were developed to determine the principles of system design, comparing encoding overhead on the sensor device with signal reconstruction accuracy. By leveraging an asymmetric autoencoder design, we were able to offload much of the computational and power cost of signal reconstruction from the wearable to the central devices, while still providing robust reconstruction accuracy at several compression efficiencies. Via our high-precision Bluetooth voltmeter, the integrated sparsified data transmission configuration was tested for all quantization and compression efficiencies, generating lower power consumption than the setup without data sparsification for all autoencoder configurations.

Human activity recognition (HAR) is a key facet of lifestyle and health monitoring. Effective HAR classification mechanisms and tools can provide healthcare professionals, patients, and individuals key insights into activity levels and behaviors without the intrusive use of human or camera observation. We leverage both spatial and temporal learning mechanisms via CNN- and LSTM-integrated architectures to derive an optimal classification architecture that provides robust classification performance for raw activity inputs, and determine that an LSTM-CNN utilizing a stacked bidirectional LSTM layer provides superior classification performance to the CNN-LSTM (also utilizing a stacked bidirectional LSTM) at all input widths. All inertial data classification frameworks are based on sensor data drawn from wearable devices placed at key sections of the body. Because wearable devices lack computational and battery power, data compression techniques have been employed to limit the quantity of transmitted data and reduce on-board power consumption. While this compression methodology has been shown to reduce overall device power consumption, it comes at the cost of some degree of information loss in the reconstructed signals. By employing an asymmetric autoencoder design and training the LSTM-CNN classifier with the reconstructed inputs, we minimized the classification performance degradation due to the wearable signal reconstruction error. The classifier is further trained against the autoencoder for several input widths and with quantized and unquantized models. The performance of the classifier trained on reconstructed data ranged between 93.0% and 86.5% accuracy, depending on input width and autoencoder quantization, showing the promising potential of deep learning with wearable sparsification.
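A minimal PyTorch sketch of the asymmetric autoencoder idea described above: a deliberately shallow encoder sparsifies a window of IMU samples on the power-constrained wearable, while a deeper decoder reconstructs the signal on the central device. The layer sizes, window length, and compression ratio are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

WINDOW, CHANNELS, CODE = 128, 6, 16   # 6-axis IMU window -> 16-dim code

class WearableEncoder(nn.Module):
    """Deliberately shallow: the encoding cost stays on the wearable."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(WINDOW * CHANNELS, CODE),  # single cheap projection
        )
    def forward(self, x):
        return self.net(x)

class CentralDecoder(nn.Module):
    """Deeper: reconstruction cost is offloaded to the central device."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CODE, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, WINDOW * CHANNELS),
        )
    def forward(self, z):
        return self.net(z).view(-1, CHANNELS, WINDOW)

encoder, decoder = WearableEncoder(), CentralDecoder()
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

batch = torch.randn(32, CHANNELS, WINDOW)   # stand-in for real IMU windows
for _ in range(5):                          # tiny illustrative training loop
    recon = decoder(encoder(batch))
    loss = nn.functional.mse_loss(recon, batch)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"reconstruction MSE: {loss.item():.4f}")
```

In deployment, only the 16-dimensional code would cross the BLE link, which is where the power savings over transmitting raw windows would come from.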
