131 |
Microservices in data intensive applications / Remeika, Mantas; Urbanavicius, Jovydas. January 2018
The volumes of data that Big Data applications have to process are constantly increasing, which requires the development of highly scalable systems. The microservices architecture is considered one of the solutions to the scalability problem. However, the literature on practices for building scalable data-intensive systems is still lacking. This thesis investigates and presents the benefits and drawbacks of using a microservices architecture in big data systems. It also presents other practices used to increase scalability: containerization, shared-nothing architecture, data sharding, load balancing, clustering, and stateless design. Finally, an experiment comparing the performance of a monolithic application and a microservices-based application was performed. The results show that as the load increases, the microservices-based application performs better than the monolith. However, to cope with the constantly increasing amount of data, additional techniques should be used together with microservices.
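To make one of the listed practices concrete, the sketch below shows hash-based data sharding, the kind of key-to-shard routing a shared-nothing deployment relies on so that each service instance owns a disjoint slice of the data. This is an illustrative sketch only, not code from the thesis; all names are invented.

```python
import hashlib

def shard_for_key(key: str, num_shards: int) -> int:
    """Route a record key to a shard with a stable hash.

    A cryptographic digest (rather than Python's built-in hash(),
    which varies between processes) keeps the mapping stable across
    service restarts -- a requirement in a shared-nothing design.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Toy usage: spread user records across four service instances.
for user_id in ["alice", "bob", "carol", "dave"]:
    print(user_id, "-> shard", shard_for_key(user_id, 4))
```

A fixed modulo like this reassigns most keys whenever the shard count changes; production systems typically use consistent hashing to limit that movement.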
|
132 |
Inteligência cibernética e uso de recursos semânticos na detecção de perfis falsos no contexto do Big Data (Cyber intelligence and the use of semantic resources in the detection of fake profiles in the context of Big Data) / Oliveira, José Antonio Maurilio Milagre de. January 2016
Advisor: José Eduardo Santarem Segundo / Committee: Ricardo César Gonçalves Sant'Ana / Committee: Mário Furlaneto Neto / Abstract: The development of the Internet has turned the virtual world into an endless repository of information. Every day in the information society, people interact with, capture, and pour data into a wide range of social networking tools and Web environments. We face Big Data: an endless quantity of data of inestimable value that is nonetheless difficult to process. The amount of information that could be extracted from these large Web data repositories is beyond measure. One of the great current challenges of the "Big Data" Internet is dealing with falsehoods and fake profiles in social tools, which cause alarm, upheaval, and significant financial damage worldwide. Cyber intelligence and computer forensics aim to investigate events and verify information by extracting data from the network. Information Science, in turn, concerned with questions involving the retrieval, processing, interpretation, and presentation of information, offers elements that, when applied in this context, can improve the collection and processing of large volumes of data for the detection of fake profiles. Thus, through the present literature-review, documentary, and exploratory research, international studies on the detection of fake profiles in social networks were reviewed, investigating the techniques and technologies applied and, above all, their limitations. The work also presents contributions from areas of Information Science and criteria… / Master's
|
133 |
Using a Scalable Feature Selection Approach for Big Data Regressions / Qingdong Cheng. 13 August 2019
Logistic regression is a widely used statistical method in data analysis and machine learning. When the volume of data is large, it is time-consuming and even infeasible to perform machine learning at this scale using the traditional approach. It is therefore crucial to find an efficient way to evaluate feature combinations and update learning models. With the approach proposed by Yang, Wang, Xu, and Zhang (2018), a system can be represented using matrices small enough to be hosted in memory. These working sufficient statistics matrices can be used to update the models in logistic regression. This study applies the working sufficient statistics approach to logistic regression machine learning and investigates the difference in performance between this new approach and the traditional approach in Spark's machine learning package. The experiments showed that the working sufficient statistics method could improve the performance of training logistic regression models when the input size was large.
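As a rough illustration of why sufficient statistics help here (a minimal sketch of the general idea, not the algorithm of Yang et al.; all names below are invented): each Newton/IRLS step of logistic regression needs only the p-by-p matrix X'WX and the p-vector X'(y - mu), both of which can be accumulated block by block, so the full n-by-p data matrix never has to fit in memory.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step_from_blocks(blocks, beta):
    """Accumulate the sufficient statistics X'WX and X'(y - mu)
    one block at a time, then solve for the Newton update."""
    p = beta.shape[0]
    XtWX = np.zeros((p, p))   # p x p, fits in memory regardless of n
    Xtr = np.zeros(p)
    for X, y in blocks:       # each block is a small chunk of the data
        mu = sigmoid(X @ beta)
        w = mu * (1.0 - mu)   # IRLS weights
        XtWX += X.T @ (X * w[:, None])
        Xtr += X.T @ (y - mu)
    return beta + np.linalg.solve(XtWX, Xtr)

# Toy usage: stream two blocks of a small synthetic dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (sigmoid(X @ np.array([0.5, -1.0, 2.0])) > rng.random(1000)).astype(float)
beta = np.zeros(3)
for _ in range(8):            # a few Newton iterations
    beta = newton_step_from_blocks([(X[:500], y[:500]), (X[500:], y[500:])], beta)
print(beta)
```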
|
134 |
Att skapa och fånga värde med stöd av Big data / To create and capture value through the use of Big data. Mazuga, Alicja; Jurevica, Kristine. January 2019
Big data is a much-discussed topic, and a growing number of organizations want to invest in and use Big data for decision support. The purpose of this study has therefore been to gain a deeper understanding of the concept of Big data and, more concretely, to examine how Big data affects value creation and value capture in business models. To that end, the following research question was posed: How does Big data affect value creation and value capture in business models? A multiple case study design of a qualitative character was applied to fulfil the study's purpose and answer the research question. Data were collected through semi-structured interviews with four respondents from three different organizations and then analyzed using thematic analysis. The results show that Big data analytics can create substantial business value. Big data is a powerful asset for organizations, which can use data analyses as a basis for well-founded decisions, leading to organizational development and greater efficiency. Working with Big data, however, brings challenges concerning communication between data-driven units and people, as well as finding which sources actually deliver value. Value creation and value capture in business models is a continuous process that organizations strive for. The study has shown that value creation supported by Big data is about satisfying customer needs and creating long-term customer relationships by using data analysis to identify customer behavior, find new trends, and develop new products. To increase value creation with the support of Big data, organizations must carry out various types of activities, including finding sources that deliver essential information and investing in the right tools and in employees who can handle and process the large data volumes. How much value organizations can capture, that is, how much revenue the sales generate, depends on the organization's access to resources and knowledge and on whether all employees in the organization are integrated into the use of Big data. Integrating employees into data analysis can increase value creation, since their knowledge and experience improve the chances of finding hidden patterns and new trends and of predicting things that have not yet happened, which over time will satisfy customers, create value, and generate revenue.
|
135 |
GDPR - Så påverkas detaljhandelns datahantering : en studie av hur GDPR påverkat detaljhandelns datahantering (GDPR - the impact on retail data management: a study of how the GDPR has affected retail data management) / Stafilidis, Dennis; Sjögren, Ludwig. January 2019
The advance of digitalization has led to increased use of Big Data analytics and, in step with it, growing demands for data security and the protection of personal privacy. The GDPR came into force in 2018 and imposes stricter requirements on the handling of personal data and customer data; it aims to protect people's privacy and personal information. Retail is one of the industries that handles large amounts of data and uses Big Data analytics to gain insights about its customers. But for Big Data analytics to reach its full potential, the data must be reused for different ends, whereas the GDPR prescribes that customer data may not be used for purposes other than those for which it was collected. The purpose of the study is to examine and describe how retail companies adapt their data management to the requirements imposed by the GDPR. To investigate the research question, we used a qualitative approach, collecting data through interviews that were then examined using thematic analysis. The results of our study show that the GDPR has affected retail's handling of customer data. Data collection has become stricter, and the purchase of external customer data has ceased. The analysis of customer data has been affected through additional steps in the processing pipeline and through more restrictive access to the databases and data used for analysis. The aggregation of customer data has changed in that the data sources used have changed. The storage of customer data has changed through the creation of an integration solution that makes it possible to erase stored customer data across different databases.
|
136 |
Interpreting "Big Data": Rock Star Expertise, Analytical Distance, and Self-QuantificationWillis, Margaret Mary January 2015 (has links)
Thesis advisor: Natalia Sarkisian / The recent proliferation of technologies to collect and analyze "Big Data" has changed the research landscape, making it easier for some to use unprecedented amounts of real-time data to guide decisions and build 'knowledge.' In the three articles of this dissertation, I examine what these changes reveal about the nature of expertise and the position of the researcher. In the first article, "Monopoly or Generosity? 'Rock Stars' of Big Data, Data Democrats, and the Role of Technologies in Systems of Expertise," I challenge the claims of recent scholarship, which frames the monopoly of experts and the spread of systems of expertise as opposing forces. I analyze video recordings (N = 30) of the proceedings of two professional conferences about Big Data Analytics (BDA), and I identify distinct orientations towards BDA practice among presenters: (1) those who argue that BDA should be conducted by highly specialized "Rock Star" data experts, and (2) those who argue that access to BDA should be "democratized" to non-experts through the use of automated technology. While the "data democrats" argue that automating technology enhances the spread of the system of BDA expertise, they ignore the ways that it also enhances, and hides, the monopoly of the experts who designed the technology. In addition to its implications for practitioners of BDA, this work contributes to the sociology of expertise by demonstrating the importance of focusing on both monopoly and generosity in order to study power in systems of expertise, particularly those relying extensively on technology. Scholars have discussed several ways that the position of the researcher affects the production of knowledge. In "Distance Makes the Scholar Grow Fonder? The Relationship Between Analytical Distance and Critical Reflection on Methods in Big Data Analytics," I pinpoint two types of researcher "distance" that have already been explored in the literature (experiential and interactional), and I identify a third type of distance—analytical distance—that has not been examined so far. Based on an empirical analysis of 113 articles that utilize Twitter data, I find that the analytical distance that authors maintain from the coding process is related to whether the authors include explicit critical reflections about their research in the article. Namely, articles in which the authors automate the coding process are significantly less likely to reflect on the reliability or validity of the study, even after controlling for factors such as article length and author's discipline. These findings have implications for numerous research settings, from studies conducted by a team of scholars who delegate analytic tasks, to "big data" or "e-science" research that automates parts of the analytic process. Individuals who engage in self-tracking—collecting data about themselves or aspects of their lives for their own purposes—occupy a unique position as both researcher and subject. In the sociology of knowledge, previous research suggests that low experiential distance between researcher and subject can lead to more nuanced interpretations but also blind the researcher to his or her underlying assumptions. However, these prior studies of distance fail to explore what happens when the boundary between researcher and subject collapses in "N of one" studies.
In “The Collapse of Experiential Distance and the Inescapable Ambiguity of Quantifying Selves,” I borrow from art and literary theories of grotesquerie—another instance of the collapse of boundaries—to examine the collapse of boundaries in self-tracking. Based on empirical analyses of video testimonies (N=102) and interviews (N=7) with members of the Quantified Self community of self-trackers, I find that ambiguity and multiplicity are integral facets of these data practices. I discuss the implications of these findings for the sociological study of researcher distance, and also the practical implications for the neoliberal turn that assigns responsibility to individuals to collect, analyze, and make the best use of personal data. / Thesis (PhD) — Boston College, 2015. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Sociology.
|
137 |
Outlier Detection in Big Data / Cao, Lei. 29 March 2016
The dissertation focuses on scaling outlier detection to work both on huge static and on dynamic streaming datasets. Outliers are patterns in the data that do not conform to the expected behavior. Outlier detection techniques are broadly applied in applications ranging from credit fraud prevention and network intrusion detection to stock investment tactical planning. For such mission-critical applications, a timely response is often of paramount importance, yet the processing of outlier detection requests has high algorithmic complexity and is resource-consuming. In this dissertation we investigate the challenges of detecting outliers in big data -- in particular those caused by the high velocity of streaming data, the big volume of static data, and the large cardinality of the input parameter space for tuning outlier mining algorithms. Effective optimization techniques are proposed to ensure the responsiveness of outlier detection in big data. We first propose a novel optimization framework called LEAP to continuously detect outliers over data streams. The continuous discovery of outliers is critical for a large range of online applications that monitor high-volume, continuously evolving streaming data. LEAP encompasses two general optimization principles that exploit the rarity of outliers and the temporal priority relationships among stream data points. Leveraging these two principles, LEAP not only continuously delivers outliers with respect to a set of popular outlier models, but also provides near real-time support for processing powerful outlier analytics workloads composed of large numbers of outlier mining requests with various parameter settings. Second, we develop a distributed approach to efficiently detect outliers over massive-scale static datasets. In this big data era, as the volume of data advances to new levels, the power of distributed compute clusters must be employed to detect outliers within a short turnaround time. Our approach optimizes the key factors determining the efficiency of distributed data analytics, namely communication costs and load balancing. In particular, we prove that the traditional frequency-based load balancing assumption is not effective. We thus design a novel cost-driven data partitioning strategy that achieves load balancing. Furthermore, we abandon the traditional approach of running one detection algorithm on all compute nodes and instead propose a novel multi-tactic methodology that adaptively selects the most appropriate algorithm for each node based on the characteristics of the data partition assigned to it. Third, traditional outlier detection systems process each individual outlier detection request, instantiated with a particular parameter setting, one at a time. This is not only prohibitively time-consuming for large datasets, but also tedious for analysts as they explore the data to hone in on the most appropriate parameter setting or the desired results. We thus design an interactive outlier exploration paradigm that not only answers traditional outlier detection requests in near real-time, but also offers innovative outlier analytics tools to help analysts quickly extract, interpret, and understand the outliers of interest. Our experimental studies, including performance evaluations and user studies conducted on real-world datasets (stock, sensor, moving object, and geolocation data), confirm both the effectiveness and efficiency of the proposed approaches.
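As a point of reference for the kind of computation being optimized, here is a minimal, unoptimized sketch of a classic distance-threshold outlier model over one window of a stream: a point is an outlier if fewer than k neighbors lie within radius r. This is not the LEAP framework itself; names and parameters are illustrative.

```python
import numpy as np

def window_outliers(window, r, k):
    """Flag points with fewer than k neighbors within radius r.

    A naive O(n^2) pass over one sliding window; frameworks like
    LEAP exist precisely to avoid recomputing this from scratch
    every time the window slides.
    """
    n = len(window)
    outliers = []
    for i in range(n):
        dists = np.linalg.norm(window - window[i], axis=1)
        neighbors = np.sum(dists <= r) - 1  # exclude the point itself
        if neighbors < k:
            outliers.append(i)
    return outliers

# Toy usage: 200 inliers plus two injected extreme points.
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 2))
data = np.vstack([data, [[8.0, 8.0], [-7.0, 9.0]]])
print(window_outliers(data, r=1.0, k=5))
```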
|
138 |
The City as Data Machine: Local Governance in the Age of Big Data / Baykurt, Burcu. January 2019
This dissertation is a study of the social dimensions and implications of the smart city, a new kind of urbanism that augments the city's existing infrastructures with sensors, wireless communication, and software algorithms to generate unprecedented reams of real-time data. It investigates how smartness reshapes civic ties and transforms the ways of seeing and governing urban centers long plagued by racial and economic divides. How does the uneven adoption of smart technologies and data-driven practices affect the relationship between citizens and local government? What mediates the understanding and experience of urban inequalities in a data-driven city? In what ways does data-driven local governance address or exacerbate pervasive divides? The dissertation addresses these questions through three years of ethnographic fieldwork in Kansas City, where residents and public officials have partnered with Google and Cisco to test a gigabit internet service and a smart city program, respectively.
I show that the foray of tech companies into cities not only changes how urban problems are identified, but also reproduces civic divides. Young, middle-class, white residents embrace the smart city with the goal of turning the city’s problems into an economic opportunity, while already-vulnerable residents are reluctant to adopt what they perceive as surveillance technologies. This divide widens when data-driven practices of the smart city compel public officials and entrepreneurial residents to feign deliberate ignorance against longstanding issues and familiar solutions, or explore spurious connections between different datasets due to their assumptions about how creative breakthroughs surface in the smart city. These enthusiasts hope to discover connections they did not know existed, but their practices perpetuate existing stereotypes and miss underlying patterns in urban inequalities.
By teasing out the intertwined relationships among tech giants, federal/local governments, local entrepreneurial groups, civic tech organizations, and nonprofits, this research demonstrates how the interests and cultural techniques of the contemporary tech industry seep into age-old practices of classification, record keeping, and commensuration in governance. I find that while these new modes of knowledge production in local government restructure the ways public officials and various publics see the city, seeing like a city also shapes the possibilities and limits of governing by data.
|
139 |
Comparative Geospatial Analysis of Twitter Sentiment Data during the 2008 and 2012 U.S. Presidential Elections / Gordon, Josef. 10 October 2013
The goal of this thesis is to assess and characterize the representativeness of sampled data that is voluntarily submitted through social media. The case study uses Twitter data associated with the 2012 Presidential election, compared against similarly collected Twitter data from the 2008 Presidential election to ascertain statewide changes in the pro-Democrat bias of sentiment-derived Twitter data mentioning either the Republican or the Democratic Presidential candidate.
The results of the comparative analysis show that the mean absolute error (MAE) fell by nearly half, from 13.1% in 2008 to 7.23% in 2012, which would initially suggest a less biased sample. However, the increase in the strength of the positive correlation between tweets per county and population density actually suggests a much more geographically biased sample.
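For reference, a minimal sketch of the error metric in question: mean absolute error between sentiment-derived per-state Democrat shares and actual vote shares. The numbers below are invented for illustration, not the thesis's data.

```python
def mean_absolute_error(predicted, actual):
    """MAE between sentiment-derived shares and actual vote shares."""
    pairs = list(zip(predicted, actual))
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# Hypothetical per-state Democrat shares (fractions, not real data).
tweet_share = [0.61, 0.48, 0.55, 0.52]
vote_share = [0.54, 0.45, 0.50, 0.47]
print(f"MAE: {mean_absolute_error(tweet_share, vote_share):.3f}")
```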
|
140 |
Is operational research in UK universities fit-for-purpose for the growing field of analytics? / Mortenson, Michael J. January 2018
Over the last decade, considerable interest has been generated in the use of analytical methods in organisations. Along with this, many have reported a significant gap between organisational demand for analytically trained staff and the number of potential recruits qualified for such roles. This interest is of high relevance to the operational research discipline, both in terms of raising the profile of the field and in the teaching and training of graduates to fill these roles. However, what is less clear is the extent to which operational research teaching in universities, or indeed teaching on the various courses labelled as 'analytics', offers a curriculum that can prepare graduates for these roles. It is within this space that this research is positioned, specifically seeking to analyse the suitability of current provision, limited to master's education in UK universities, and to make recommendations on how curricula may be developed. To do so, a mixed-methods research design, in the pragmatic tradition, is presented. This includes a variety of research instruments. Firstly, a computational literature review of analytics is presented, assessing (amongst other things) the amount of research into analytics from a range of disciplines. Secondly, a historical analysis is performed of the literature regarding elements that can be seen as the precursors of analytics, such as management information systems, decision support systems, and business intelligence. Thirdly, an analysis of job adverts is included, utilising an online topic model and correlation analyses. Fourthly, online materials from UK universities concerning relevant degrees are analysed using a bagged support vector classifier and a bespoke module analysis algorithm. Finally, interviews with both potential employers of graduates and academics involved in analytics courses are presented. The results of these separate analyses are synthesised and contrasted. The outcome is an assessment of the current state of the market, some reflections on the role operational research may have, and a framework for the development of analytics curricula. The principal contribution of this work is practical: providing tangible recommendations on curriculum design and development, as well as to the operational research community in general in respect of how it may react to the growth of analytics. Additional contributions are made in respect of methodology, with a novel mixed-method approach employed, and theory, with insights as to the nature of how trends develop in both the jobs market and in academia. It is hoped that the insights here may be of value to course designers seeking to react to similar trends in a wide range of disciplines and fields.
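By way of illustration of one of the instruments mentioned, here is a minimal sketch of a bagged support vector classifier for labelling course-module descriptions, using scikit-learn. The texts and labels are invented, and this is not the thesis's actual pipeline.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy data: module descriptions labelled 1 if "analytics".
analytics = [
    "statistical modelling and machine learning for business data",
    "data mining and predictive analytics with big data platforms",
    "forecasting methods and decision support systems",
    "optimisation and simulation for operational research",
    "data visualisation and statistical programming",
] * 2
other = [
    "organisational behaviour and human resource management",
    "marketing strategy and consumer brand theory",
    "corporate finance and accounting principles",
    "business ethics and corporate social responsibility",
    "leadership theory and change management",
] * 2
texts = analytics + other
labels = [1] * len(analytics) + [0] * len(other)

# TF-IDF features feeding an ensemble of linear SVMs, each trained
# on a bootstrap sample of the corpus (bagging). Assumes
# scikit-learn >= 1.2 for the `estimator` keyword.
model = make_pipeline(
    TfidfVectorizer(),
    BaggingClassifier(estimator=LinearSVC(), n_estimators=10, random_state=0),
)
model.fit(texts, labels)
print(model.predict(["predictive analytics and optimisation methods"]))
```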
|