41 |
Crunch the market: a Big Data approach to trading system optimization. Mauldin, Timothy Allan, 23 April 2014
Due to the size of the data involved, running software to analyze and tune intraday trading strategies can take large amounts of time away from analysts, who would like to evaluate strategies and optimize strategy parameters very quickly, ideally in the blink of an eye. Fortunately, Big Data technologies are evolving rapidly and can be leveraged for these purposes. These technologies include software systems for distributed computing, parallel hardware, and on-demand computing resources in the cloud. This report presents a distributed software system for trading strategy analysis. It also demonstrates the effectiveness of machine learning techniques in decreasing the parameter optimization workload. The results from tests run on two different commercial cloud service providers show linear scalability when analyzing intraday trading strategies.
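As a rough illustration of why such workloads scale near-linearly, the sketch below (not the report's actual system) shows an embarrassingly parallel parameter sweep in Python; the moving-average strategy, parameter grid, and scoring function are all hypothetical stand-ins.

```python
# Hypothetical sketch of a parallel strategy-parameter sweep: each
# (fast, slow) moving-average pair is backtested independently, so
# adding workers scales throughput roughly linearly.
import random
from itertools import product
from multiprocessing import Pool

def backtest(params):
    """Toy stand-in for an intraday backtest: returns (params, score)."""
    fast, slow = params
    if fast >= slow:                      # invalid combination
        return params, float("-inf")
    random.seed(hash(params))             # deterministic placeholder P&L
    score = random.gauss(slow - fast, 5.0)
    return params, score

if __name__ == "__main__":
    grid = list(product(range(2, 20), range(10, 120, 10)))  # parameter grid
    with Pool() as pool:                  # one worker per CPU core
        results = pool.map(backtest, grid)
    best_params, best_score = max(results, key=lambda r: r[1])
    print(f"best params: {best_params}, score: {best_score:.2f}")
```

The same map-then-reduce shape is what lets such a sweep move unchanged onto a distributed cluster or cloud instances.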
|
42 |
Storing and structuring big data with business intelligence in mind. Andersson, Fredrik, January 2015
Sectra has a customer database with approximately 1,600 customers across the world. This system holds not only medical information but also information about the environment in which each system runs, usage patterns, and much more. This report is about storing data received from log files in a suitable database. Sectra wants to be able to analyze this information so that it can make strategic decisions and gain a better understanding of its customers' needs. The tested databases are MongoDB, Cassandra, and MySQL. The results show that MySQL is not suitable for storing large amounts of data with the current configuration. On the other hand, both MongoDB and Cassandra performed well as the amount of data grew.
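A minimal sketch, assuming a local MongoDB instance and illustrative field names (not Sectra's actual log schema), of the kind of bulk log loading such a comparison involves:

```python
# Hedged sketch of loading parsed log records into MongoDB with pymongo;
# database, collection, and field names are assumptions for illustration.
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
logs = client["customer_logs"]["events"]

# Index the fields the analyses would filter on most often.
logs.create_index([("customer_id", ASCENDING), ("timestamp", ASCENDING)])

# Schema-free documents let heterogeneous log lines coexist, which is
# awkward to model in a fixed relational schema such as MySQL's.
docs = [
    {
        "customer_id": "site-0042",
        "timestamp": datetime.now(timezone.utc),
        "component": "archive",
        "level": "INFO",
        "message": f"event {i}",
        "extra": {"duration_ms": 100 + i},
    }
    for i in range(100)
]
logs.insert_many(docs, ordered=False)  # unordered bulk load
```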
|
43 |
Companies' handling of consumer data: do users' electronic footprints reflect their attitudes towards privacy? Spinelli Scala, Robin, January 2014
The essay concerns corporations' usage of, and consumers' attitudes towards, the phenomenon of Big Data. The aim is to analyse whether corporate management practices and future ambitions concerning consumer data coincide with the attitudes and opinions of today's consumers, and how these opinions are reflected in consumer behaviour online. The essay is limited to how companies use consumer data to better understand and improve the efficiency of their marketing. A bespoke analysis model was created from previous studies of Internet usage in Sweden and previous analyses of corporations' use of Big Data, combined with the theory of cognitive dissonance. A questionnaire was then constructed on the basis of this model. The results show that the respondents were not fully aware of their electronic footprints, and that increased awareness leads to respondents not fully trusting the Internet services they use. In summary, when society builds its functions around the Internet, or when companies create services that foster a feeling of dependence, consumers tend to set aside their opinions in favour of their behaviour. Concern over what happens to their consumer data arises only when a future scenario is presented on which the respondents are not yet dependent.
|
44 |
Data - a new raw material: a qualitative study of internet users' awareness of cookies. Hedberg, Samuel; Åberg, Natalie, January 2015
Aim and research questions: The aim of the study is to examine internet users' awareness of, perceptions of, and attitudes towards surveillance on the internet, in the form of the collection and mapping of data through cookies. Method and material: A quantitative survey and qualitative in-depth interviews were conducted. Main results: Internet users are aware of the collection of data on the internet, but not of the mapping. The survey respondents and interviewees perceived and associated cookies with personalised advertising, but rarely reflected on it. Participants were predominantly positive towards the collection and mapping, although some concern about it remained.
|
45 |
The role of curation in promoting the flow of news in information spaces oriented towards knowledge production. Castilho, Carlos Albano Volkmer de, January 2015
Doctoral thesis, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-graduação em Engenharia e Gestão do Conhecimento, Florianópolis, 2015.
Content curation is a new area of academic research whose growing relevance results from the combination of two interconnected phenomena: information overload and Big Data. By filtering, selecting, adding value, and disseminating recommendations, content curation increases the structuring and diffusion of information and strengthens the flow of content that feeds the production of socially relevant knowledge. (The English academic literature uses the expression content curation, while the Portuguese equivalent is information curation; the two are used interchangeably here.) Because this area is still scarcely researched, especially in Brazil, it was necessary to define terms such as data, news, information, and knowledge, as well as processes such as information curation, information flow, and information spaces. Information curation was taken as the central theme of the thesis because of its role in disseminating information (recommendations) for knowledge production in communities of users. The methodological procedures were guided by Grounded Theory and by John Creswell's work on the organization of academic research. The analysis of the curation process was based on six months of monitoring of the blog Content Curation World; the collected data were then interpreted within the framework of Max Boisot's theories of information flows. The analysis of 17,907 reactions from readers to the 167 recommendations posted by the curator revealed the importance of the data flow in feeding knowledge production within a social network of 4,120 users drawn together by a shared interest in content curation.
|
46 |
Design and Implementation of a MongoDB Solution on a Software as a Service Platform. Frenoy, Remy, January 2013
"NoSQL solution" is today a term that represents a wide spectrum of ways of storing and requesting data. From graph-oriented databases to key-value databases, each solution has been developed to be the best choice in a specific case and for given parameters. As NoSQL solutions are new, there is no guide explaining which solution is best depending on someone's use case. In the first part of this document, we give an overview of each type of solution, explaining when and why a certain type of solution would be a good or a poor choice for a given use case. Once a company has chosen the technology that seems to fit well its need, it faces another problem : how to deploy this new type of data store. Directly deploying a production store would certainly result in poor performances, because some pieces of knowledge are absolutely necessary to implement an efficient NoSQL store. However, there is no "best practices" guide to get this knowledge. This is the reason why building a prototype, with fewer resources, can improve everyone's knowledge. Then, with the experience retrieved from this first experience, engineers and technicians can deploy a store that will be much more efficient than the store they would have deployed without the prototype experience. For this reason, we decided to implement a MongoDB prototype store. Building this prototype, we have tested several configurations that resulted in different levels of performance. In the second part of this document, we explain the main result we got from our experiments. These results could be useful for other companies that are willing to use MongoDB, but they mostly show why a specific knowledge is essential to deploy a good NoSQL store.
|
47 |
Design and pilot implementation of people analytics at Antofagasta Minerals S.A. Arteaga Páez, Jocelyn Scarlett, January 2016
Master's in Management for Globalization / The objective of this work is the implementation of the People Analytics methodology in the Chilean market, through a consulting engagement carried out at Antofagasta Minerals S.A. (hereafter AMSA). The study provides AMSA with quantitative information about a specific process within its people management.
The methodology began with the collection of theoretical background and knowledge of the company's processes. Once the focus of the study had been defined as the relationship between performance evaluation results and the results of the engagement survey conducted at AMSA in 2014, several meetings were held to understand the performance management processes. In addition, information was extracted from AMSA's human resources system, and all the analyses required to obtain the results were carried out. The purpose was to provide solutions and recommendations for the company's current problem: how to maintain high levels of employee engagement after reductions in both staff and financial resources, given the difficult situation of mining in Chile.
After all the analyses had been carried out, the result was that no clear, objective correlation existed between the objects of study, namely the engagement results versus the performance evaluation results. This may be due to several reasons discussed in detail in the body of the work. The analyzed information was nevertheless used to identify the best measures to implement in order to address the problem.
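The core statistical question can be sketched as follows; the CSV file and column names are hypothetical, not AMSA's actual data.

```python
# Hedged sketch of the study's central test: is there a correlation
# between engagement survey scores and performance ratings?
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("employees.csv")  # hypothetical extract from the HR system

rho, p_value = spearmanr(df["engagement_2014"], df["performance_rating"])
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")

# A near-zero rho with a large p-value would match the finding that no
# clear, objective correlation exists between the two measures.
```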
The main conclusion of this consulting engagement is that the objectives set at the outset were met, yielding recommendations that should bring benefits such as increased productivity and reduced costs in the people management process. These recommendations will make it possible to replicate the study and move closer to the high-performance organization model that AMSA is pursuing.
The next step, once the recommendations have been implemented, is to internalize the People Analytics approach to information management within the organization and to choose a new focus for applying the method. It would also be of great value for AMSA to have a member of the Human Resources area in charge of managing and analyzing this information.
|
48 |
Information Visualization in the Big Data Era: tackling scalability issues using multiscale abstractions. Perrot, Alexandre, 27 November 2017
With the advent of the Big Data era come new challenges for information visualization. First, the amount of data to be visualized exceeds the available screen space, causing occlusion. Second, the data cannot be stored and processed on a conventional computer. A Big Data visualization system must therefore provide both perceptual and performance scalability. In this thesis, we propose multi-scale abstractions as a solution to both of these issues. Several levels of detail are precomputed on a Big Data infrastructure in order to visualize large datasets of up to several billion points. To that end, we propose two approaches to implementing the canopy clustering algorithm on a distributed computation platform. We present applications of our method to geolocated data visualized as a heatmap, and to large graphs. Both applications are built with the Fatum dynamic visualization library, which is also presented in this thesis.
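For reference, a single-machine sketch of canopy clustering, the algorithm the thesis adapts to a distributed platform; the thresholds and random data here are illustrative, not the thesis's.

```python
# Canopy clustering: cheap, overlapping pre-clusters built with two
# distance thresholds t1 > t2. Points within t2 of a center are removed
# from the candidate pool; points between t2 and t1 may join several
# canopies. This is the single-machine version of the distributed step.
import math
import random

def canopy(points, t1, t2):
    """Group 2-D points into canopies; requires t1 > t2."""
    assert t1 > t2
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(random.randrange(len(remaining)))
        members, keep = [center], []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)   # loosely bound to this canopy
            if d >= t2:
                keep.append(p)      # still eligible as a future center
        remaining = keep
        canopies.append((center, members))
    return canopies

points = [(random.random() * 100, random.random() * 100) for _ in range(1000)]
for center, members in canopy(points, t1=25.0, t2=10.0):
    print(center, len(members))
```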
|
49 |
"I apologise for my poor blogging": Searching for Apologies in the Birmingham Blog CorpusLutzky, Ursula, Kehoe, Andrew 15 February 2017 (has links) (PDF)
This study addresses a familiar challenge in corpus pragmatic research: the search for functional phenomena in large electronic corpora. Speech acts are one area of research that falls into this functional domain, and the question of how to identify them in corpora has occupied researchers over the past 20 years. This study focuses on apologies as a speech act characterised by a standard set of routine expressions, making it easier to search for with corpus linguistic tools. Nevertheless, even for a comparatively formulaic speech act such as apologies, the polysemous nature of forms (cf. e.g. I am sorry vs. a sorry state) impacts the precision of the search output, so that previous studies of smaller data samples had to resort to manual microanalysis. In this study, we introduce an innovative methodological approach that demonstrates how the combination of different types of collocational analysis can facilitate the study of speech acts in larger corpora. By first establishing a collocational profile for each of the Illocutionary Force Indicating Devices associated with apologies and then scrutinising their shared and unique collocates, unwanted hits can be discarded and the amount of manual intervention reduced. Thus, this article introduces new possibilities in the field of corpus-based speech act analysis and encourages the study of pragmatic phenomena in large corpora.
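As a toy illustration of the collocational-profile idea (not the authors' actual tooling), the sketch below builds windowed collocate counts for a few apology IFIDs and intersects them; the mini-corpus and IFID list are invented stand-ins for the Birmingham Blog Corpus.

```python
# Toy collocational profiling: count words within a +/-3 token window of
# each apology IFID, then compare shared vs. unique collocates to decide
# which hits to keep and which to discard.
from collections import Counter

IFIDS = {"sorry", "apologise", "apologies", "pardon", "forgive"}

def collocates(tokens, node, window=3):
    """Count words within +/-window of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for t in tokens[lo:hi] if t != node)
    return counts

tokens = ("i am sorry for the late reply . what a sorry state of affairs . "
          "i apologise for my poor blogging .").split()

profiles = {ifid: collocates(tokens, ifid) for ifid in IFIDS}
shared = set(profiles["sorry"]) & set(profiles["apologise"])
print("shared collocates:", shared)  # e.g. {'i', 'for', ...}
```

On real data, a collocate like "state" would surface as unique to the non-apologetic sense of sorry and flag those hits for exclusion.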
|
50 |
Leveraging big data resources and data integration in biology: applying computational systems analyses and machine learning to gain insights into the biology of cancers. Sinkala, Musalula, 24 February 2021
Recently, many "molecular profiling" projects have yielded vast amounts of genetic, epigenetic, transcription, protein expression, metabolic, and drug response data for cancerous tumours, healthy tissues, and cell lines. We aim to facilitate a multi-scale understanding of these high-dimensional biological data and of the complexity of the relationships between the different data types taken from human tumours. Further, we intend to identify molecular disease subtypes of various cancers, uncover subtype-specific drug targets, and identify sets of therapeutic molecules that could potentially be used to inhibit these targets. We collected data from over 20 publicly available resources and then leveraged integrative computational systems analyses, network analyses, and machine learning to gain insights into the pathophysiology of pancreatic cancer and 32 other human cancer types. Here, we uncover aberrations in multiple cell signalling and metabolic pathways that implicate regulatory kinases and the Warburg effect as the likely drivers of the distinct molecular signatures of three established pancreatic cancer subtypes. We then apply an integrative clustering method to four different types of molecular data to reveal that pancreatic tumours can be segregated into two distinct subtypes, and we define sets of proteins, mRNAs, miRNAs, and DNA methylation patterns that could serve as biomarkers to accurately differentiate between them. We confirm the biological relevance of the identified biomarkers by showing that, together with pattern-recognition algorithms, they can be used to accurately infer the drug sensitivity of pancreatic cancer cell lines. Further, we evaluate alterations of metabolic pathway genes across 32 human cancers. We find that while alterations of metabolic genes are pervasive across all human cancers, the extent of these alterations varies between them. Based on these gene alterations, we define two distinct cancer supertypes that tend to be associated with different clinical outcomes, and we show that these supertypes are likely to respond differently to anticancer drugs. Overall, we show that the time has arrived when we can leverage available data resources to elicit more precise and personalised cancer therapies that would yield better clinical outcomes at a much lower cost than is currently achieved.
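As a schematic of the final step described above, the sketch below trains a pattern-recognition algorithm on synthetic biomarker features to classify drug sensitivity; the data, feature counts, and classifier choice are illustrative assumptions, not the thesis's pipeline.

```python
# Hedged illustration (synthetic data) of biomarker-based drug sensitivity
# prediction: fit a classifier on molecular features of cell lines and
# estimate its accuracy by cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_cell_lines, n_biomarkers = 60, 40

X = rng.normal(size=(n_cell_lines, n_biomarkers))  # e.g. mRNA/protein levels
y = rng.integers(0, 2, size=n_cell_lines)          # sensitive vs. resistant
X[y == 1, :5] += 1.5                               # inject a weak signal

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```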
|