71

Middleware for online scientific data analytics at extreme scale

Zheng, Fang 22 May 2014 (has links)
Scientific simulations running on High End Computing machines in domains like Fusion, Astrophysics, and Combustion now routinely generate terabytes of data in a single run, and these data volumes are only expected to increase. Since such massive simulation outputs are key to scientific discovery, the ability to rapidly store, move, analyze, and visualize data is critical to scientists' productivity. Yet there are already serious I/O bottlenecks on current supercomputers, and movement toward the Exascale is further accelerating this trend. This dissertation is concerned with the design, implementation, and evaluation of middleware-level solutions to enable high performance and resource efficient online data analytics to process massive simulation output data at large scales. Online data analytics can effectively overcome the I/O bottleneck for scientific applications at large scales by processing data as it moves through the I/O path. Online analytics can extract valuable insights from live simulation output in a timely manner, better prepare data for subsequent deep analysis and visualization, and achieve improved performance and reduced data movement cost (both in time and in power) compared to the conventional post-processing paradigm. The thesis identifies the key challenges for online data analytics based on the needs of a variety of large-scale scientific applications, and proposes a set of novel and effective approaches to efficiently program, distribute, and schedule online data analytics along the critical I/O path. In particular, its solution approach i) provides a high performance data movement substrate to support parallel and complex data exchanges between simulation and online data analytics, ii) enables placement flexibility of analytics to exploit distributed resources, iii) for co-placement of analytics with simulation codes on the same nodes, uses fine-grained scheduling to harvest idle resources for running online analytics with minimal interference to the simulation, and finally, iv) supports scalable and efficient online spatial indices to accelerate data analytics and visualization on the deep memory hierarchies of high end machines. The middleware approach is evaluated with leadership scientific applications in domains like Fusion, Combustion, and Molecular Dynamics, and on different High End Computing platforms. Substantial improvements are demonstrated in end-to-end application performance and in resource efficiency at scales of up to 16,384 cores, for a broad range of analytics and visualization codes. The outcome is a useful and effective software platform for online scientific data analytics that facilitates large-scale scientific data exploration.
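The central idea, processing simulation output while it is in flight rather than after it lands on disk, can be illustrated with a toy producer-consumer sketch. This is not the dissertation's middleware; it is a minimal Python illustration, with a random field standing in for real simulation output, of how an online analytics process can reduce data to compact summaries as it moves along the I/O path.

```python
import multiprocessing as mp
import numpy as np

def simulation(queue, steps=100, grid=(64, 64)):
    """Toy 'simulation' that emits one field per timestep instead of writing it to disk."""
    rng = np.random.default_rng(0)
    for step in range(steps):
        field = rng.normal(size=grid)   # stand-in for one simulation output variable
        queue.put((step, field))
    queue.put(None)                     # sentinel: no more output

def online_analytics(queue):
    """Consumes data as it moves along the I/O path and keeps only a reduced summary."""
    while True:
        item = queue.get()
        if item is None:
            break
        step, field = item
        print(f"step {step}: min={field.min():.3f} mean={field.mean():.3f} max={field.max():.3f}")

if __name__ == "__main__":
    q = mp.Queue(maxsize=8)             # bounded queue so analytics back-pressures the producer
    consumer = mp.Process(target=online_analytics, args=(q,))
    consumer.start()
    simulation(q)
    consumer.join()
```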
72

Big data e concorrência: uma avaliação dos impactos da exploração de big data para o método antitruste tradicional de análise de concentrações econômicas

Monteiro, Gabriela Reis Paiva January 2017 (has links)
A feature of digital markets is the generation and analysis of a "torrent" of data, which is considered a key element of many businesses emerging in the context of the "Internet of Things". The term big data reflects this trend towards collecting, acquiring, storing and processing great volumes of digital data to create economic value. Online platforms' business models are frequently based on the exploitation of data, in particular personal data, which are used as an input to improve and personalize the services and products they offer. Until recently, antitrust authorities had not carefully analyzed the implications of the use of big data for competition policy, but this situation has been changing with the emergence of discussions about the anticompetitive concerns raised by the exploitation of this capability. In light of this, this dissertation investigates whether, and to what extent, the exploitation of big data in digital markets may be considered a comparative advantage that raises antitrust risks and, if so, how the analysis of this competitive variable should be incorporated into the traditional antitrust method for merger control. The investigation found that, under certain conditions, big data capabilities may constitute a relevant competitive advantage, giving rise to several anticompetitive concerns in the context of mergers and acquisitions. In general, these concerns can be analyzed within the stages of the classic antitrust method, and there is, at this time, no need for a new methodological framework specifically applicable to the review of transactions involving firms whose business models are predominantly based on data. Nevertheless, certain tools of that method will need to be adapted or broadened by the Brazilian competition authority, mainly so that non-price dimensions of competition, such as quality, innovation and privacy, as well as the particular features of big data and the ecosystem in which it is exploited, are taken into account in the assessment of a transaction's effects and efficiencies, and of potential remedies.
73

Evaluating the Performance of Leadership in Energy and Environmental Design (LEED) Certified Facilities using Data-Driven Predictive Models for Energy and Occupant Satisfaction with Indoor Environmental Quality (IEQ)

January 2015 (has links)
abstract: Given the importance of buildings as major consumers of resources worldwide, several organizations are working avidly to ensure the negative impacts of buildings are minimized. The U.S. Green Building Council's (USGBC) Leadership in Energy and Environmental Design (LEED) rating system is one such effort to recognize buildings that are designed to achieve a superior performance in several areas including energy consumption and indoor environmental quality (IEQ). The primary objectives of this study are to investigate the performance of LEED certified facilities in terms of energy consumption and occupant satisfaction with IEQ, and to introduce a framework to assess the performance of LEED certified buildings. This thesis attempts to achieve the research objectives by examining the LEED certified buildings on the Arizona State University (ASU) campus in Tempe, AZ, from two complementary perspectives: the Macro-level and the Micro-level. Heating, cooling, and electricity data were collected from the LEED-certified buildings on campus, and their energy use intensity was calculated in order to investigate the buildings' actual energy performance. Additionally, IEQ occupant satisfaction surveys were used to investigate users' satisfaction with the space layout, space furniture, thermal comfort, indoor air quality, lighting level, acoustic quality, water efficiency, cleanliness, and maintenance of the facilities they occupy. From a Macro-level perspective, the results suggest ASU LEED buildings consume less energy than regional counterparts and exhibit higher occupant satisfaction than national counterparts. The occupant satisfaction results are in line with the literature on LEED buildings, whereas the energy results contribute to the inconclusive body of knowledge on energy performance improvements linked to LEED certification. From a Micro-level perspective, the data analysis suggests an inconsistency between the LEED points earned for the Energy & Atmosphere and IEQ categories, on the one hand, and the respective levels of energy consumption and occupant satisfaction on the other. Accordingly, this study showcases the variation in the performance results when approached from different perspectives. This contribution highlights the need to consider the Macro-level and Micro-level assessments in tandem, and to assess LEED building performance from these two distinct but complementary perspectives in order to develop a more comprehensive understanding of actual building performance. / Dissertation/Thesis / Masters Thesis Engineering 2015
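The energy use intensity (EUI) mentioned above is a straightforward ratio of annual energy use to gross floor area. A small sketch, using entirely hypothetical numbers (the energy total, floor area, and benchmark below are not from the thesis), shows the calculation and a comparison against a regional benchmark:

```python
def energy_use_intensity(annual_energy_kbtu, gross_floor_area_ft2):
    """Site EUI in kBtu per square foot per year."""
    return annual_energy_kbtu / gross_floor_area_ft2

# Hypothetical building: 4.2 million kBtu of combined heating, cooling and
# electricity over one year, across 60,000 ft2 of conditioned floor area.
eui = energy_use_intensity(4_200_000, 60_000)
print(f"Site EUI: {eui:.1f} kBtu/ft2/yr")   # 70.0

# Compare against a (hypothetical) regional benchmark for the same building type.
regional_benchmark = 85.0
print("Below benchmark" if eui < regional_benchmark else "At or above benchmark")
```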
74

A Study of Text Mining Framework for Automated Classification of Software Requirements in Enterprise Systems

January 2016 (has links)
abstract: Text classification is a rapidly evolving area of data mining, while requirements engineering is a less-explored area of software engineering that deals with the process of defining, documenting, and maintaining a software system's requirements. When researchers began to blend these two streams, work emerged on automating the classification of software requirement statements into categories easily comprehensible to developers, enabling faster development and delivery; until now this has mostly been done manually by software engineers, which is a tedious job. However, most of that research focused on the classification of non-functional requirements, those pertaining to intangible features such as security, reliability, and quality. It is a more challenging task to automatically classify functional requirements, those describing how the system will function, especially when they belong to different, large enterprise systems; this requires exploiting text mining capabilities. This thesis investigates the results of text classification applied to functional software requirements by creating a framework in R and making use of algorithms and techniques such as k-nearest neighbors and support vector machines, along with boosting, bagging, maximum entropy, neural networks, and random forests in an ensemble approach. The study was conducted by collecting and visualizing relevant enterprise data that had previously been classified manually and was subsequently used to train the model. Key factors in training included the frequency of terms in the documents and the level of cleanliness of the data. The model was applied to test data and validated by studying and comparing metrics such as precision, recall, and accuracy. / Dissertation/Thesis / Masters Thesis Engineering 2016
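The thesis builds its classification framework in R; purely as an illustration of the general pipeline it describes (term-frequency features feeding supervised classifiers, scored with precision, recall, and accuracy), here is a minimal sketch in Python with scikit-learn. The example requirement statements and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Hypothetical functional requirement statements with hand-assigned categories.
requirements = [
    "The system shall allow users to reset their password via email.",
    "The billing module shall generate invoices at the end of each month.",
    "The system shall export monthly sales reports as PDF.",
    "Users shall be able to update their shipping address.",
]
labels = ["account", "billing", "reporting", "account"]

# Term-frequency features (TF-IDF here) feeding a linear support vector classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LinearSVC(),
)
model.fit(requirements, labels)

# Classify an unseen requirement; in a real study the held-out predictions
# would be evaluated with precision, recall and accuracy, as in the thesis.
print(model.predict(["The system shall email a copy of the invoice to the customer."]))
```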
75

Evaluation of Storage Systems for Big Data Analytics

January 2017 (has links)
abstract: Recent trends in big data storage systems show a shift from disk-centric models to memory-centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance, and it is interesting to investigate the performance of the two models with respect to big data applications. This thesis studies the performance of Ceph (a disk-centric model) and Alluxio (a memory-centric model) and evaluates whether a hybrid model provides any performance benefits for big data applications. To this end, an application, TechTalk, is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis, and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques; this training dataset provides the knowledge to construct the index of an online stream. The indexed metadata enables students to search, view, and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model. / Dissertation/Thesis / Masters Thesis Computer Science 2017
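No Ceph or Alluxio client code appears in the abstract, so the sketch below only illustrates the shape of such a benchmark: timing sequential reads of the same file from two mount points that stand in for a disk-centric tier and a memory-centric tier. The paths are hypothetical.

```python
import os
import time

def time_read(path, block=1 << 20):
    """Time a full sequential read of one file, returning seconds elapsed."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block):
            pass
    return time.perf_counter() - start

# Hypothetical mount points standing in for the two tiers of a hybrid deployment:
# a disk-centric store (e.g. a Ceph-backed mount) and a memory-centric cache
# (e.g. an Alluxio mount, or simply tmpfs for a rough comparison).
disk_copy = "/mnt/disk_store/lecture_001.mp4"
memory_copy = "/mnt/mem_cache/lecture_001.mp4"

for label, path in [("disk tier", disk_copy), ("memory tier", memory_copy)]:
    if os.path.exists(path):
        print(f"{label}: {time_read(path):.3f} s")
```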
76

How to Think About Resilient Infrastructure Systems

January 2018 (has links)
abstract: Resilience is emerging as the preferred way to improve the protection of infrastructure systems beyond established risk management practices. Massive damages experienced during tragedies like Hurricane Katrina showed that risk analysis is incapable of preventing unforeseen infrastructure failures and shifted expert focus towards resilience to absorb and recover from adverse events. Recent exponential growth in research is now producing consensus on how to think about infrastructure resilience, centered on definitions and models from influential organizations like the US National Academy of Sciences. Despite widespread efforts, massive infrastructure failures in 2017 demonstrate that resilience is still not working, raising the question: are the ways people think about resilience producing resilient infrastructure systems? This dissertation argues that established thinking harbors misconceptions about infrastructure systems that diminish attempts to improve their resilience. Widespread efforts based on the current canon focus on improving data analytics, establishing resilience goals, reducing failure probabilities, and measuring cascading losses. Unfortunately, none of these pursuits changes the resilience of an infrastructure system, because none of them results in knowledge about how data is used, goals are set, or failures occur. Through the examination of each misconception, this dissertation arrives at practical, new approaches for infrastructure systems to respond to unforeseen failures via sensing, adapting, and anticipating processes. Specifically, infrastructure resilience is improved by sensing when data analytics include the modeler-in-the-loop, adapting to stress contexts by switching between multiple resilience strategies, and anticipating crisis coordination activities prior to experiencing a failure. Overall, the results demonstrate that current resilience thinking needs to change because it does not differentiate resilience from risk. The majority of research treats resilience as a property that a system has, like a noun, when resilience is really an action a system does, like a verb. Treating resilience as a noun only strengthens commitment to risk-based practices that do not protect infrastructure from unknown events. Instead, thinking about resilience as a verb overcomes prevalent misconceptions about data, goals, systems, and failures, and may bring a necessary, radical change to the way infrastructure is protected in the future. / Dissertation/Thesis / Doctoral Dissertation Civil, Environmental and Sustainable Engineering 2018
77

The impact of Big Data on companies and a lack of skills as the origin of the challenges they are facing : An investigation aimed to understand the origin of the challenges companies are facing with Big Data

Ishac, Patrick, Dussoulier, Hannah January 2018 (has links)
The 21st century saw the rise of the internet and, with it, the digitalization of our world. Today, many companies rely on technology to run their businesses, and Big Data is one of the latest phenomena to arise from this technological evolution. As the amount of data constantly increases, ranging from business intelligence to personal information, Big Data has become a major source of competitive advantage for companies that are able to implement it efficiently. However, as with every new technology, challenges and issues arise. What is more, the learning curve is steep, and companies need to adapt quickly to follow the pace of innovation and develop the skill-set of their employees in order to remain competitive in their respective industries. This paper investigates how Big Data is impacting companies and the main challenges they face in its implementation, and seeks to determine whether these challenges originate from a lack of skills in the current workforce. A qualitative study was conducted, interviewing nine respondents over eight interviews of 54 minutes on average. Three main ideas emerged from the interviews conducted by the authors. The first is the impact of Big Data on companies: its benefits, challenges, and regulations, as well as the cohabitation of human beings and technology. The second and third are the optimal profile of a decision-maker and the ideal profile of an employee in companies working with Big Data; both profiles are composed of characteristics, skills, and experience. The decision-maker was found to be a key actor in the success or failure of a company and a strong influence on the profile of the employee. The decision-maker's skills, such as strategic, basic, analytical, communication, and decision-making skills, were examined and their correlation demonstrated. Ultimately, the lack of skills in companies today, often regarded as a challenge by numerous scholars, was shown to be the origin of many of the challenges companies are facing, mainly through poor decision-making and a lack of communication. The authors finally outline steps for a successful implementation of Big Data in companies, as well as future trends, such as regulation and continued technological evolution, for people and businesses alike to pursue carefully and actively.
78

Utilização de big data analytics nos sistemas de medição de desempenho: estudos de caso

Mello, Raquel Gama Soares de 12 February 2015 (has links)
Big data is associated with large volumes of data of different types, arriving at high velocity from diverse sources, with veracity, and capable of adding value to the business. Nowadays, many companies are looking for ways to extract useful information from this huge amount of data, which can be done by applying analytical techniques. The application of these techniques to big data is called big data analytics. It can influence how managers make decisions and manage the company's business, and this in turn affects the use of performance measurement systems (PMSs). These systems are composed of a multidimensional set of performance measures capable of supporting decision making and business planning. In this way, performance measurement systems and big data analytics can together support decision making and the implementation of actions. There is evidence in the literature that big data analytics can be used within performance measurement systems. In this context, this study investigates how companies apply big data analytics in the use of performance measurement systems. To achieve this objective, a systematic literature review was first carried out to identify existing studies on the relationship between big data analytics and performance measurement systems, followed by an exploratory multiple-case study. The empirical findings showed that big data analytics supports the decision-making process, making it more efficient and effective. The results also showed that big data analytics helps the PMS identify, through its analyses, how past actions can influence future performance; these analyses are essentially descriptive and predictive and were applied to the sales process. The case studies showed that big data analytics contributes mainly to the uses of PMSs related to planning and to influencing behavior. It is therefore possible to conclude that big data analytics makes a real contribution when used within a performance measurement system.
79

Big Data analytics for the forest industry : A proof-of-concept built on cloud technologies

Sellén, David January 2016 (has links)
Large amounts of data in various forms are generated at a fast pace in today's society. This is commonly referred to as "Big Data". Making use of Big Data has become increasingly important for both business and research. The forest industry generates large amounts of data during the different processes of forest harvesting. In Sweden, forest information is sent to SDC, the information hub for the Swedish forest industry. In 2014, SDC received reports on 75.5 million m3fub from harvester and forwarder machines. These machines use a global standard called StanForD 2010 for communication and to create reports about harvested stems. The arrival of scalable cloud technologies that combine Big Data with machine learning makes it interesting to develop an application to analyze the large amounts of data produced by the forest industry. In this study, a proof-of-concept has been implemented to analyze harvest production reports from the StanForD 2010 standard. The system consists of a back-end and a front-end application and is built using cloud technologies such as Apache Spark and Hadoop. System tests have shown that the concept can successfully handle storage, processing, and machine learning on gigabytes of HPR files. It is capable of extracting information from raw HPR data into datasets and supports a machine learning pipeline with pre-processing and K-Means clustering. The proof-of-concept provides a code base for further development of a system that could be used to find valuable knowledge for the forest industry.
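As an illustration of the kind of pipeline the abstract describes (pre-processing followed by K-Means clustering in Spark), here is a minimal PySpark sketch. The per-stem features and their values are invented stand-ins; in the actual system they would be extracted from parsed StanForD 2010 HPR reports rather than created inline.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("hpr-clustering").getOrCreate()

# Hypothetical per-stem measurements; real input would come from HPR XML files.
stems = spark.createDataFrame(
    [(1, 28.3, 14.2, 0.42),
     (2, 31.1, 15.0, 0.51),
     (3, 19.7, 10.8, 0.18),
     (4, 35.4, 16.9, 0.66)],
    ["stem_id", "dbh_cm", "height_m", "volume_m3"],
)

# Pre-processing (assemble and scale features) followed by K-Means clustering.
assembler = VectorAssembler(inputCols=["dbh_cm", "height_m", "volume_m3"],
                            outputCol="features_raw")
scaler = StandardScaler(inputCol="features_raw", outputCol="features")
kmeans = KMeans(featuresCol="features", k=2, seed=42)

model = Pipeline(stages=[assembler, scaler, kmeans]).fit(stems)
model.transform(stems).select("stem_id", "prediction").show()
```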
80

Data Masking, Encryption, and their Effect on Classification Performance: Trade-offs Between Data Security and Utility

Asenjo, Juan C. 01 January 2017 (has links)
As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these policies and regulations often employ technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking and substitution and/or data encryption and suppression of sensitive attributes can limit access to important details, and it is believed that their use can affect the quality of data mining results. This dissertation investigated and compared the causal effects of data masking and encryption on classification performance as a measure of the quality of knowledge discovery. A review of the literature found a gap in the body of knowledge, indicating that this problem had not been studied before in an experimental setting. The objective of this dissertation was to gain an understanding of the trade-offs between data security and utility in the field of analytics and data mining. The research used a nationally recognized cancer incidence database to show how masking and encryption of potentially sensitive demographic attributes such as patients' marital status, race/ethnicity, origin, and year of birth could have a statistically significant impact on the patients' predicted survival. Performance parameters measured by four different classifiers showed sizable variations, in the range of 9% to 10%, between a control group, where the selected attributes were untouched, and two experimental groups, where the attributes were substituted or suppressed to simulate the effects of the data protection techniques. In practice, this corroborates the potential risk involved in basing medical treatment decisions on data mining applications where attributes in the data sets are masked or encrypted for patient privacy and security reasons.
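The experimental design, training the same classifier with and without the potentially sensitive attributes and comparing performance, can be sketched as follows. The data here is synthetic and the classifier choice arbitrary; the dissertation used a cancer incidence registry and four different classifiers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Columns: age group, marital status, race/ethnicity, tumor stage (all synthetic codes).
X = np.column_stack([
    rng.integers(0, 8, n),   # age group
    rng.integers(0, 5, n),   # marital status  ("sensitive" attribute)
    rng.integers(0, 6, n),   # race/ethnicity  ("sensitive" attribute)
    rng.integers(0, 4, n),   # tumor stage
])
# Synthetic survival label, loosely driven by age group and stage.
y = ((X[:, 0] + 2 * X[:, 3] + rng.normal(0, 1.5, n)) > 7).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def accuracy(columns):
    """Accuracy of the same classifier restricted to the given feature columns."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr[:, columns], y_tr)
    return accuracy_score(y_te, clf.predict(X_te[:, columns]))

print("all attributes:      ", round(accuracy([0, 1, 2, 3]), 3))
print("sensitive suppressed:", round(accuracy([0, 3]), 3))
```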
