Global ETD Search

511	Large-Scale Matrix Completion Using Orthogonal Rank-One Matrix Pursuit, Divide-Factor-Combine, and Apache Spark January 2014 abstract: As the size and scope of valuable datasets have exploded across many industries and fields of research in recent years, an increasingly diverse audience has sought out effective tools for large-scale data analytics. Over the same period, machine learning researchers have been very prolific in designing improved algorithms capable of finding the hidden structure within these datasets. As users of popular Big Data frameworks have sought to apply and benefit from these improved learning algorithms, the problems they encountered with the frameworks have motivated a new generation of Big Data tools that address the shortcomings of the previous generation. One important example is the newer tools' improved performance on the large class of machine learning algorithms that are highly iterative in nature. In this thesis project, I set out to implement a low-rank matrix completion algorithm (as an example of a highly iterative algorithm) within a popular Big Data framework and to evaluate its performance on the Netflix Prize dataset. I begin by describing several approaches that I attempted but that did not perform adequately. These include an implementation of the Singular Value Thresholding (SVT) algorithm within the Apache Mahout framework, which runs on top of the Apache Hadoop MapReduce engine. I then describe an approach that uses the Divide-Factor-Combine (DFC) algorithmic framework to parallelize the state-of-the-art low-rank completion algorithm Orthogonal Rank-One Matrix Pursuit (OR1MP) within the Apache Spark engine. I describe the results of a series of tests running this implementation on the Netflix dataset on clusters of various sizes, with various degrees of parallelism. For these experiments, I used the Amazon Elastic Compute Cloud (EC2) web service.
In the final analysis, I conclude that the Spark DFC + OR1MP implementation does indeed produce competitive results, in both accuracy and performance. In particular, the Spark implementation performs nearly as well as the MATLAB implementation of OR1MP without any parallelism, and improves performance to a significant degree as the parallelism increases. In addition, the experience demonstrates how Spark's flexible programming model makes it straightforward to implement this parallel and iterative machine learning algorithm. / Dissertation/Thesis / M.S. Computer Science 2014 Computer science Artificial intelligence Big Data Hadoop Machine Learning Mahout Matrix Completion Spark
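The abstract names OR1MP but does not spell out its steps. As a rough illustration only, the following minimal pure-Python sketch follows the orthogonal rank-one pursuit idea as it is commonly described: on each iteration, take the leading singular vector pair of the residual (restricted to the observed entries) as a new rank-one basis matrix, then refit the weights of all basis matrices by least squares on the observed entries. It is not the thesis's Spark/DFC implementation; the function names, the fixed power-iteration count and the tiny ridge term are assumptions made for the sketch.

```python
import math
import random


def _top_singular_pair(r, iters=200, seed=0):
    """Approximate the leading left/right singular vectors of r by power iteration."""
    m, n = len(r), len(r[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]  # positive random start (assumption)
    u = [0.0] * m
    for _ in range(iters):
        u = [sum(r[i][j] * v[j] for j in range(n)) for i in range(m)]
        nu = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / nu for x in u]
        v = [sum(r[i][j] * u[i] for i in range(m)) for j in range(n)]
        nv = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / nv for x in v]
    return u, v


def _solve(a, b):
    """Solve the small system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, k):
            f = aug[i][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (aug[i][k] - sum(aug[i][j] * x[j] for j in range(i + 1, k))) / aug[i][i]
    return x


def or1mp(y, observed, rank):
    """Estimate the full matrix y from its entries at the (i, j) pairs in `observed`."""
    m, n = len(y), len(y[0])
    obs = list(observed)
    bases = []
    estimate = [[0.0] * n for _ in range(m)]
    for _ in range(rank):
        # Residual on observed entries only; unobserved entries are treated as zero.
        resid = [[0.0] * n for _ in range(m)]
        for i, j in obs:
            resid[i][j] = y[i][j] - estimate[i][j]
        bases.append(_top_singular_pair(resid))
        # Orthogonal step: refit every basis weight by least squares over observed entries.
        vals = [[bu[i] * bv[j] for i, j in obs] for bu, bv in bases]
        target = [y[i][j] for i, j in obs]
        k = len(bases)
        gram = [[sum(p * q for p, q in zip(vals[a], vals[b])) + (1e-12 if a == b else 0.0)
                 for b in range(k)] for a in range(k)]
        rhs = [sum(p * t for p, t in zip(vals[a], target)) for a in range(k)]
        weights = _solve(gram, rhs)
        estimate = [[sum(w * bu[i] * bv[j] for w, (bu, bv) in zip(weights, bases))
                     for j in range(n)] for i in range(m)]
    return estimate
```

Because each refit solves least squares over a span that contains all earlier basis matrices, the error on the observed entries cannot increase from one iteration to the next; a fully observed rank-one matrix is recovered after a single iteration.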
512	Framgångsfaktorer i en datadriven kultur / Success factors in a data driven culture Stein, Daniel January 2017 Data-driven culture is not a new concept, but it has become increasingly common as organizations want to produce analyses and reports based on their own data. Many organizations have invested heavily in becoming more data-driven but have not managed to make the most of that investment. This has created difficulties for many organizations, which struggle to identify where within the organization the problem lies. Being data-driven is often not a matter of technical difficulty; many other factors come into play. The purpose of this study is to find out how organizations work to establish a data-driven culture. The research question the study seeks to answer is: "How do organizations work to establish a data-driven culture?" Two data collection methods are used to answer it. The primary method is a systematic literature review, and the secondary method consists of qualitative interviews. The systematic literature review searches the literature to answer the research question, and the qualitative interviews build on its results. The results show that there are several different ways to establish a data-driven culture. No single factor can be singled out; several factors play a part. Much of the existing literature indicates how organizations should work to establish a data-driven culture, but only a few sources describe how organizations actually do so. An important step in establishing a data-driven culture is to determine how the business works today and how it should proceed to make better use of its data. Successfully establishing a data-driven culture often requires many changes throughout the organization, from senior management down to the individual employee.
Data-driven culture Big Data Business Intelligence Organizational culture Information Systems
513	Condicionantes do uso efetivo de big data e business analytics em organizações privadas: atitudes, aptidão e resultados / Determinants of the effective use of big data and business analytics in private organizations: attitudes, aptitude and results SANTOS, Ijon Augusto Borges dos 31 May 2016 This dissertation seeks to explain the factors that determine the effective adoption of Big Data and Business Analytics by private organizations in Pernambuco, in terms of attitudes, aptitude and results. To this end, it assembles a theoretical and conceptual overview of the growth of data traffic in the era of the Digital Revolution and of organizations' willingness to adopt the compatible information and communication technologies that are transforming society's modus faciendi and modus pensandi. Two theories underpin the research corpus: the Cognitive Mediation Networks Theory and Structuration Theory (the basis of the Structurational Model of Technology). Both are explored at the heart of the technology-use duality, in which living with technological artifacts that interact with human actions starts a process of mutual influence between these elements, giving rise to a new form of mediation called Hyperculture.
Using a quantitative research method, these constructs are related to one another and investigated among 183 strategic leaders from Pernambuco, and compared with equivalent individuals of other places of birth and nationalities, by means of a specially prepared questionnaire. The results indicate the companies' level of readiness on this topic and its relationship with success or failure, considering the levels of hyperculture, analytical capacity and the information and communication technology conditions existing in the companies. The study closes by outlining possible further developments, implications and applications of the concepts introduced.
514	Inteligência competitiva e modelos de séries temporais para previsão de consumo: o estudo de uma empresa do setor metalúrgico / Competitive intelligence and time series models for consumption forecasting: a study of a company in the metallurgical sector Espíndola, André Mauro Santos de 30 August 2013 The world is undergoing a continuous, accelerating process of transformation that involves every area of knowledge. The speed of this process is arguably directly related to how quickly technological change occurs. These changes have made relationships increasingly global, altered commercial transactions and led companies to rethink how they compete. In this context, knowledge, derived from the growing volume of data and information, takes on the role of a new input, often more important than labor, capital and land. These changes, and the importance of information, lead companies to seek a new position by scanning the external environment for signals that may indicate future events. The major challenge for companies lies in obtaining data, extracting information from it and turning that information into knowledge useful for decision making. Against this background, the objective of this study was to identify a consumption forecasting model for analyzing information in the Competitive Intelligence process of a metallurgical company located in the state of Rio Grande do Sul. The study drew on the themes of Big Data, Data Mining, Demand Forecasting and Competitive Intelligence to answer the question: which steel consumption forecasting model can be used to analyze information in the Competitive Intelligence process? Internal and external data were analyzed to identify correlations between the company's steel consumption and economic variables, which were then used to identify the consumption forecasting model.
Two models were identified: a univariate model without intervention, built with the Box-Jenkins methodology, and a forecasting model with a transfer function. Both models described the historical steel consumption series well, but the univariate model produced better forecasts.
Big data Information technology Steel Competitive intelligence
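The abstract does not report the orders of the fitted models. As a deliberately simplified stand-in for a univariate Box-Jenkins model, the sketch below fits a first-order autoregressive model x_t = c + phi * x_{t-1} by ordinary least squares and iterates it forward to produce forecasts. A full Box-Jenkins workflow would also cover identification (ACF/PACF), differencing, moving-average terms and residual diagnostics; the function names are hypothetical.

```python
def fit_ar1(series):
    """Fit x_t = c + phi * x_{t-1} by ordinary least squares; return (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    xs, ys = series[:-1], series[1:]  # lagged values and their successors
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(series, steps):
    """Forecast `steps` values ahead by iterating the fitted AR(1) recursion."""
    c, phi = fit_ar1(series)
    last, out = series[-1], []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```

For example, a series generated exactly by x_t = 2 + 0.5 * x_{t-1}, such as `[10, 7, 5.5, 4.75, 4.375]`, yields c = 2 and phi = 0.5, and `forecast_ar1(..., 2)` continues it with approximately 4.1875 and 4.09375.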
515	Estudo comparativo entre algoritmos de árvores de decisão baseados em ensembles de classificadores aplicados a Big Data / A comparative study of decision tree algorithms based on classifier ensembles applied to Big Data Alves, Melina Brilhadori January 2017 Advisor: Prof. Dr. Patrícia Belfiore Fávero / Co-advisor: Prof. Dr. Marcelo de Souza Lauretto / Master's dissertation, Universidade Federal do ABC, Graduate Program in Information Engineering, Santo André, 2017. / Big data has posed several challenges to the design of data mining algorithms, starting with memory and time limitations and with data whose nature and distribution change constantly. This mass of data interests diverse audiences because of the information it contains, and data analysis is an important strategic resource for knowledge, development and planning. In recent years, many methods based on classifier ensembles have been proposed. Their central idea is to build several "weak" classifiers and combine them into a "robust" classifier, which predicts the class receiving the largest (weighted) sum of the sub-classifiers' votes. The objective of this work was to compare the performance of classification-tree algorithms on Big Data when combined into bagging and boosting ensembles (metaclassifiers). A test environment was implemented that applies classification-tree algorithms to public datasets in order to examine three fundamental questions: a. for a given classification algorithm, which ensemble configuration (Bagging or Boosting) yields the highest accuracy; b. for a given type of ensemble, which classification algorithm performs best; c. whether families of Big Data datasets (grouped by a set of characteristics) can be identified in which each type of classifier performs best.
The results indicated that the Boosting ensemble achieved higher accuracy on a larger number of the tested samples than the other approaches. Among the classifiers, the results suggest that decision-tree algorithms are sensitive to the choice of ensemble method and, above all, to the sample. Analysis of the ensembles across samples and dataset characteristics produced highly variable results, although performance improved when the classification task was binary.
BIG DATA ENSEMBLES CLASSIFICATION ALGORITHMS SUPERVISED LEARNING DECISION TREES
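The "(weighted) sum of votes" rule described in this abstract can be sketched in a few lines. In this illustration, bagging corresponds to equal weights (a plain majority vote), while boosting methods such as AdaBoost give each weak classifier a weight derived from its weighted training error, commonly alpha = 0.5 * ln((1 - error) / error). The helper names are hypothetical, and each classifier's prediction is passed in as a plain label rather than computed by a real tree model.

```python
import math
from collections import defaultdict


def adaboost_weight(error):
    """AdaBoost-style weight for a weak learner with weighted error rate 0 < error < 1."""
    return 0.5 * math.log((1.0 - error) / error)


def weighted_vote(predictions, weights=None):
    """Return the label with the largest total vote weight; no weights means majority vote."""
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)
```

With predictions `['a', 'b', 'b']`, an unweighted vote returns `'b'`, but weights `[0.9, 0.3, 0.3]` make `'a'` win (0.9 against 0.6), which is how a boosted ensemble lets a few accurate learners outvote many weaker ones.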
516	O uso das informações de big data na gestão de crise de marca / The use of big data information in brand crisis management Alexandre Borba Salvador 06 August 2015 Brand crises are not only growing in number but are also becoming more visible through social networks. The repercussions of a brand image crisis hurt both brand equity and short-term sales, and they also lead to costly campaigns to minimize the negative effects. While technological progress increases the visibility of a crisis, it also provides access to a range of internal and external information that can help define an action plan. Big Data is a recently coined term for the growth of information that is large in volume, diverse in format and received at high velocity, to the point that traditional processing systems cannot store and analyze it. In marketing, the marketing information system (MIS) aims to provide information to the marketing decision maker. Relevant, reliable information made available quickly is essential for decisions to be taken rapidly, ensuring leadership of the crisis management process. Starting from the question "how is information from big data used in brand crisis management?" and with the objective of "verifying how managers use information from big data in crisis management", this exploratory, empirical, qualitative study was carried out through in-depth interviews with marketing executives experienced in brand crisis management. Interviews with six managers with crisis experience and two experts revealed a large difference in how big data information is used across the stages of a brand crisis identified in the theoretical framework: identification and prevention, crisis management, recovery, and improvements and learning.
Big data Brand Information Marketing information system Public opinion
517	Big data, meio e linguagem: novas tecnologias e práticas linguísticas / Big data, medium and language: new technologies and linguistic practices Santos, Vinícius Vargas Vieira dos 29 April 2016 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Big data, Meio e Linguagem: Novas tecnologias e práticas linguísticas aims to grasp possible relationships between new digital media and certain conceptual aspects of language, such as meaning and performativity. Big data refers to the accumulation of digital data that has characterized mass communication media over the past two decades, and it is directly related to the current configuration of the Web 2.0 technology services platform. Complex contemporary objects such as big data call for methodologies suited to their super-diverse nature. This research therefore sought to expand disciplinary boundaries, drawing on technology scholars for a theoretical basis to understand the nature of new media supports.
Devices such as computers and mobile phones with access to the World Wide Web are rapidly transforming the landscape of linguistic exchange, as more and more communication practices take place through them. This is why the computational structure itself is understood as the medium through which language takes place; from that point on, its design features (affordances) come to stimulate semantic anchoring and linguistic performativity. The excessive volume and variety of digital data and the high velocity that characterize big data change the landscapes of social context and, consequently, bring about updates in the scales of language. After all, contexts in virtual environments collapse: in taking on the characteristics of the medium, they turn out to be super-diverse, simultaneous, fragmented, unstructured, lacking familiar markers, and exceeding traditional scales of time, space and social reach.
Big data Medium Context Affordance Performativity LINGUISTICS, LETTERS AND ARTS::LETTERS
518	A literature study of bottlenecks in 2D and 3D Big Data visualization Hassan, Mohamed January 2017 Context. Big data visualization is a vital part of today's technological advancement. It involves displaying different variables on graphs, maps or other means, often in real time. Objectives. This study aims to determine the challenges in big data visualization, whether large amounts of data affect visualization, and which existing solutions address these problems. Methods. The databases used in this systematic literature review include Inspec, IEEE Xplore and BTH Summon. Papers were included in the review if they met the review's inclusion criteria. Results. Six solutions were found that reduce large data sets and reduce latency when viewing 2D and 3D graphs. Conclusions. Many solutions exist in various forms to improve the visualization of graphs of different dimensions. However, future growth in data may change this and require new solutions. Human perception Big data N-dimensional Data Visualization. Computer Sciences
519	Big Data analytics for the forest industry: A proof-of-concept built on cloud technologies Sellén, David January 2016 Large amounts of data in various forms are generated at a fast pace in today's society. This is commonly referred to as "Big Data". Making use of Big Data has become increasingly important for both business and research. The forest industry generates large amounts of data during the various forest harvesting processes. In Sweden, forest information is sent to SDC, the information hub for the Swedish forest industry. In 2014, SDC received reports on 75.5 million m³fub from harvester and forwarder machines. These machines use a global standard called StanForD 2010 for communication and for creating reports about harvested stems. The arrival of scalable cloud technologies that combine Big Data with machine learning makes it interesting to develop an application to analyze the large amounts of data produced by the forest industry. In this study, a proof-of-concept has been implemented to analyze harvest production reports (HPR) from the StanForD 2010 standard. The system consists of a back-end and a front-end application and is built using cloud technologies such as Apache Spark and Hadoop. System tests showed that the concept can successfully handle storage, processing and machine learning on gigabytes of HPR files. It can extract information from raw HPR data into datasets and support a machine learning pipeline with pre-processing and K-Means clustering. The proof-of-concept provides a code base for further development of a system that could be used to find valuable knowledge for the forest industry. Big Data analytics Apache Spark StanForD 2010 forest industry harvest production report Computer Engineering
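The abstract mentions K-Means clustering in the Spark pipeline but gives no details of the features or parameters used. As an illustration of what that step computes, the sketch below implements Lloyd's algorithm in plain Python on points of any dimension; in the thesis the clustering ran on Apache Spark, whose MLlib library provides its own K-Means implementation. The deterministic initialization (the first k points) and the function name are assumptions chosen to keep the example reproducible.

```python
def kmeans(points, k, max_iter=100):
    """Cluster points (tuples of equal length) with Lloyd's algorithm; return (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]  # naive deterministic initialization
    labels = [0] * len(points)

    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid in place
        if new == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = new
    return centroids, labels
```

On two well-separated groups such as `[(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]` with k = 2, the points alternate between the two clusters and the centroids settle at (1/3, 1/3) and (31/3, 31/3).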
520	Real-Time Magnetohydrodynamic Space Weather Visualization Carlbaum, Oskar, Novén, Michael January 2017 This work describes the design and implementation of visualizations of space weather-related phenomena within the interactive astro-visualization software OpenSpace. Data sets from the Community Coordinated Modeling Center (CCMC) at the National Aeronautics and Space Administration (NASA) were used to implement time-varying high-resolution solar imagery from space observatory spacecraft and time-varying field lines from the different models produced at the CCMC. The results were used to take an audience on an interactive journey through the solar system at the world's first-ever live planetarium show about space weather. Computer Graphics Data Visualization Space Weather Big Data Computer Engineering Media and Communication Technology
