21 |
Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system / Chrimes, Dillon. 28 November 2016
Background: Big data analytics (BDA) is important for reducing healthcare costs, but it faces many challenges. The objective of this study was to establish a high-performance, interactive BDA platform for a hospital system.
Methods: A Hadoop/MapReduce framework formed the BDA platform, with HBase (a NoSQL database) using hospital-specific metadata and file ingestion. Query performance was tested with Apache tools in Hadoop’s ecosystem.
Results: At the optimized iteration, ingestion into the Hadoop Distributed File System (HDFS) required three seconds, but HBase required four to twelve hours to complete the Reducer phase of MapReduce. HBase bulk loads took a week for one billion (10 TB) and over two months for three billion (30 TB). Simple and complex queries returned results in about two seconds for one and three billion, respectively.
Interpretations: The BDA platform of HBase distributed by Hadoop performed successfully, with high performance at large volumes representing the Province’s entire data. Inconsistencies in MapReduce limited operational efficiency. The importance of Hadoop/MapReduce for the representation of health informatics is further discussed. / Graduate / 0566 / 0769 / 0984 / dillon.chrimes@viha.ca
|
22 |
Transformer les big social data en prévisions - méthodes et technologies : Application à l'analyse de sentiments / Transforming big social data into forecasts - methods and technologies / El alaoui, Imane. 04 July 2018
Extraire l'opinion publique en analysant les Big Social data a connu un essor considérable en raison de leur nature interactive, en temps réel. En effet, les données issues des réseaux sociaux sont étroitement liées à la vie personnelle que l’on peut utiliser pour accompagner les grands événements en suivant le comportement des personnes. C’est donc dans ce contexte que nous nous intéressons particulièrement aux méthodes d’analyse du Big data. La problématique qui se pose est que ces données sont tellement volumineuses et hétérogènes qu’elles en deviennent difficiles à gérer avec les outils classiques. Pour faire face aux défis du Big data, de nouveaux outils ont émergé. Cependant, il est souvent difficile de choisir la solution adéquate, car la vaste liste des outils disponibles change continuellement. Pour cela, nous avons fourni une étude comparative actualisée des différents outils utilisés pour extraire l'information stratégique du Big Data et les mapper aux différents besoins de traitement. La contribution principale de la thèse de doctorat est de proposer une approche d’analyse générique pour détecter de façon automatique des tendances d’opinion sur des sujets donnés à partir des réseaux sociaux. En effet, étant donné un très petit ensemble de hashtags annotés manuellement, l’approche proposée transfère l'information du sentiment connue des hashtags à des mots individuels. La ressource lexicale qui en résulte est un lexique de polarité à grande échelle dont l'efficacité est mesurée par rapport à différentes tâches de l’analyse de sentiment. La comparaison de notre méthode avec différents paradigmes dans la littérature confirme l'impact bénéfique de notre méthode dans la conception des systèmes d’analyse de sentiments très précis. En effet, notre modèle est capable d'atteindre une précision globale de 90,21%, dépassant largement les modèles de référence actuels sur l'analyse du sentiment des réseaux sociaux.
/ Extracting public opinion by analyzing big social data has grown substantially because of the interactive, real-time nature of such data. Our actions on social media generate digital traces that are closely related to our personal lives and can be used to follow major events by analysing people's behavior. It is in this context that we are particularly interested in big data analysis methods. The volume of these daily-generated traces increases exponentially, creating massive loads of information known as big data. Such a large volume of information can be neither stored nor processed with conventional tools, so new tools have emerged to help cope with big data challenges. The aim of the first part of this manuscript is therefore to go through the pros and cons of these tools, compare their respective performance, and highlight some of their applications in areas such as health, marketing and politics. We also introduce the general context of big data, Hadoop and its different distributions, and provide a comprehensive overview of big data tools and their related applications. The main contribution of this PhD thesis is a generic analysis approach to automatically detect trends on given topics from big social data. Given a very small set of manually annotated hashtags, the proposed approach transfers sentiment information (positive or negative) from the hashtags to individual words. The resulting lexical resource is a large-scale polarity lexicon whose effectiveness is measured on different sentiment analysis tasks. Comparing our method with different paradigms in the literature confirms its benefit for designing accurate sentiment analysis systems. Indeed, our model reaches an overall accuracy of 90.21%, significantly exceeding current models for social media sentiment analysis.
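The transfer of sentiment from annotated hashtags to individual words can be illustrated with a minimal sketch. This is not the thesis's method, which the abstract does not detail; it assumes a simple co-occurrence average, and the function name `build_polarity_lexicon`, the whitespace tokenization, and the +1/-1 labels are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_polarity_lexicon(posts, seed_hashtags):
    """Assign each word the mean polarity of the labeled posts it appears in.

    posts: list of strings (e.g. social media messages).
    seed_hashtags: dict mapping a hashtag to +1 (positive) or -1 (negative).
    This averaging scheme is an assumption, not the author's algorithm.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for post in posts:
        tokens = post.lower().split()
        labels = [seed_hashtags[t] for t in tokens if t in seed_hashtags]
        if not labels:
            continue  # post carries no annotated hashtag, so no signal
        post_polarity = sum(labels) / len(labels)
        for tok in tokens:
            if tok.startswith("#"):
                continue  # only plain words enter the lexicon
            totals[tok] += post_polarity
            counts[tok] += 1
    return {word: totals[word] / counts[word] for word in totals}


seeds = {"#happy": 1, "#angry": -1}
posts = [
    "great day #happy",
    "awful service #angry",
    "service was great #happy",
    "no hashtag here",
]
lexicon = build_polarity_lexicon(posts, seeds)
```

In this toy run, words seen only with positive hashtags score +1, words seen only with negative hashtags score -1, and a word seen equally often with both averages to 0.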
|
23 |
Indexing and analysis of very large masses of time series / Indexation et analyse de très grandes masses de séries temporelles / Yagoubi, Djamel edine. 19 March 2018
Les séries temporelles sont présentes dans de nombreux domaines d'application tels que la finance, l'agronomie, la santé, la surveillance de la Terre ou la prévision météorologique, pour n'en nommer que quelques-uns. En raison des progrès de la technologie des capteurs, de telles applications peuvent produire des millions, voire des milliards, de séries temporelles par jour, ce qui nécessite des techniques rapides d'analyse et de synthèse. Le traitement de ces énormes volumes de données a ouvert de nouveaux défis dans l'analyse des séries temporelles. En particulier, les techniques d'indexation ont montré de faibles performances lors du traitement des grands volumes de données. Dans cette thèse, nous abordons le problème de la recherche de similarité dans des centaines de millions de séries temporelles. Pour cela, nous devons d'abord développer des opérateurs de recherche efficaces, capables d'interroger une très grande base de données distribuée de séries temporelles avec de faibles temps de réponse. L'opérateur de recherche peut être implémenté en utilisant un index avant l'exécution des requêtes. L'objectif des indices est d'améliorer la vitesse des requêtes de similitude. Dans les bases de données, l'index est une structure de données basée sur des critères de recherche comme la localisation efficace de données répondant aux exigences. Les index rendent souvent le temps de réponse de l'opération de recherche sous-linéaire dans la taille de la base de données. Les systèmes relationnels ont été principalement supportés par des structures de hachage, B-tree et des structures multidimensionnelles telles que R-tree, avec des vecteurs binaires jouant un rôle de support. De telles structures fonctionnent bien pour les recherches, et de manière adéquate pour les requêtes de similarité. Nous proposons trois solutions différentes pour traiter le problème de l'indexation des séries temporelles dans des grandes bases de données.
Nos algorithmes nous permettent d'obtenir d'excellentes performances par rapport aux approches traditionnelles. Nous étudions également le problème de la détection de corrélation parallèle de toutes paires sur des fenêtres glissantes de séries temporelles. Nous concevons et implémentons une stratégie de calcul incrémental des sketchs dans les fenêtres glissantes. Cette approche évite de recalculer les sketchs à partir de zéro. En outre, nous développons une approche de partitionnement qui projette des sketchs vecteurs de séries temporelles dans des sous-vecteurs et construit une structure de grille distribuée. Nous utilisons cette méthode pour détecter les séries temporelles corrélées dans un environnement distribué. / Time series arise in many application domains such as finance, agronomy, health, earth monitoring and weather forecasting, to name a few. Because of advances in sensor technology, such applications may produce millions to trillions of time series per day, requiring fast analytical and summarization techniques. The processing of these massive volumes of data has opened up new challenges in time series data mining. In particular, indexing techniques have shown poor performance when processing large databases. In this thesis, we focus on the problem of parallel similarity search in such massive sets of time series. For this, we first need to develop efficient search operators that can query a very large distributed database of time series with low response times. The search operator can be implemented by using an index constructed before executing the queries. The objective of indices is to improve the speed of data retrieval operations. In databases, the index is a data structure that, based on search criteria, efficiently locates data entries satisfying the requirements.
Indexes often make the response time of the lookup operation sublinear in the database size. After reviewing the state of the art, we propose three novel approaches for parallel indexing and querying of large time series datasets. First, we propose DPiSAX, a novel and efficient parallel solution that includes a parallel index construction algorithm that takes advantage of distributed environments to build iSAX-based indices over vast volumes of time series efficiently. Our solution also involves a parallel query processing algorithm that, given a similarity query, exploits the available processors of the distributed system to efficiently answer the query in parallel by using the constructed parallel index. Second, we propose RadiusSketch, a random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. RadiusSketch includes a parallel index construction algorithm that takes advantage of distributed environments to efficiently build sketch-based indices over very large databases of time series, and then query the databases in parallel. Third, we propose ParCorr, an efficient parallel solution for detecting similar time series across distributed data streams. ParCorr uses the sketch principle for representing the time series. Our solution includes a parallel approach for incremental computation of the sketches in sliding windows and a partitioning approach that projects sketch vectors of time series into subvectors and builds a distributed grid structure. Our solutions have been evaluated using real and synthetic datasets, and the results confirm their high efficiency compared to the state of the art.
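The sketch-and-grid idea described above can be illustrated on a single machine. The following is only a rough sketch under stated assumptions, not the RadiusSketch or ParCorr implementation: it assumes random ±1 projection vectors, and the names `make_random_vectors`, `sketch`, and `grid_cells`, as well as the `sub_size` and `cell_width` parameters, are hypothetical.

```python
import math
import random


def make_random_vectors(length, sketch_size, seed=0):
    """Draw sketch_size random +1/-1 vectors of the given length (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(length)]
            for _ in range(sketch_size)]


def sketch(series, random_vectors):
    """Sketch = vector of dot products of the series with each random vector."""
    return [sum(x * r for x, r in zip(series, vec)) for vec in random_vectors]


def grid_cells(sketch_vec, sub_size, cell_width):
    """Split a sketch into subvectors and map each one to a grid cell.

    Series whose subvectors land in the same cell become candidate
    similar pairs; a distributed system could assign grids to workers.
    """
    cells = []
    for start in range(0, len(sketch_vec), sub_size):
        sub = sketch_vec[start:start + sub_size]
        cells.append(tuple(math.floor(v / cell_width) for v in sub))
    return cells


vectors = make_random_vectors(length=8, sketch_size=4)
series_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
series_b = list(series_a)  # an identical series must share every cell
cells_a = grid_cells(sketch(series_a, vectors), sub_size=2, cell_width=5.0)
cells_b = grid_cells(sketch(series_b, vectors), sub_size=2, cell_width=5.0)
```

Because the sketch is a linear projection, identical series always fall in the same cells, while nearby series do so with a probability that depends on the chosen cell width.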
|
24 |
INVESTIGATING MACHINE LEARNING ALGORITHMS WITH IMBALANCED BIG DATA / Unknown Date
Recent technological developments have engendered a rapid production of big data and have also enabled machine learning algorithms to produce high-performance models from such data. Nonetheless, class imbalance (in binary classification) between the majority and minority classes in big data can skew the predictive performance of classification algorithms toward the majority (negative) class, whereas the minority (positive) class usually holds greater value for decision makers. Such bias may lead to adverse consequences, some of them even life-threatening, when false negatives are generally costlier than false positives. The size of the minority class can vary from fair to extraordinarily small, which can lead to different performance scores for machine learning algorithms. Class imbalance is a well-studied area for traditional data, i.e., not big data. However, there is limited research focusing on both rarity and severe class imbalance in big data. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
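The skew toward the majority class can be made concrete with a small worked example. The numbers below are invented for illustration (990 negatives, 10 positives) and the helper `accuracy_and_recall` is hypothetical; the dissertation's own data and metrics are not given in the abstract.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall on the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Illustrative imbalanced data: 1% minority (positive) class.
y_true = [0] * 990 + [1] * 10
# A degenerate model that always predicts the majority (negative) class.
y_pred = [0] * 1000

accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

Such a model is right on 990 of 1,000 cases yet finds none of the 10 positives, so every minority case becomes a false negative. This is why accuracy alone can hide the bias the abstract describes.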
|
25 |
Battle of Big Bend / Applen, Jeffery A. 03 December 1997
The Battle of Big Bend was the last significant battle of the Rogue River Indian
Wars. The battle occurred 27-28 May 1856 in the Oregon Territory. The location of
the battle was along the Rogue River at a place known as the Big Bend, approximately
eight miles up river from the modern town of Agness, in Curry County, Oregon. The
battle was fought between one reinforced Army company, Company "C", 1st
Dragoons, and a large group of Indians from many different bands. Captain Andrew
Jackson Smith was the commanding officer of Company "C" during the battle, and
Chief John, a member of the Dakubetede Indian band, led all the warriors. After the
first few hours of fighting, the soldiers had suffered so many casualties that they could
not break out of their surrounded position without abandoning their dead and
wounded. But on the other hand, the soldiers had established their defensive position
on a ridge line which provided them a strong tactical advantage which the Indians could
not overcome in spite of their early battle success. After thirty hours of combat,
Company "G", 1st Infantry, under the command of Captain Christopher C. Augur,
came to the aid of the surrounded soldiers. When Company "G" entered the fight, the Indian warriors elected to quit fighting, and under moderate pressure moved off the field of battle into the surrounding mountains. The purpose of this research was to definitively identify the location of the defensive position used by Company "C", and to perform data recovery for the Forest Service using archaeological field methods. The field strategy relied heavily on metal detectors to locate battle-related artifacts over the battle area. Using data collected during fieldwork, and correlating it with primary reference sources and materials, the battle position of Company "C" was located for the United States Forest Service. / Graduation date: 1998
|
26 |
Adult's perceptions of children's self-confidence, social competence, and caring behavior as a function of participation in the Big Brothers Big Sisters Program / Post, Deanne. January 1998
Thesis--PlanB (M.S.)--University of Wisconsin--Stout, 1998. / Includes bibliographical references.
|
27 |
Fletcher Henderson, king of swing: a summary of his career, his music and his influences / Garner, Charles. January 1991
Thesis (Ed.D.) -- Teachers College, Columbia University, 1991. / Typescript; issued also on microfilm. Sponsor: Harold Abeles. Dissertation Committee: Lenore Pogonowski. Discography: p. 228-230. Includes bibliographical references: (leaves 211-217).
|
28 |
Comparing government : big business relations in South Korea and Taiwan / Leung, Lai-sheung. January 1997
Thesis (M.A.)--University of Hong Kong, 1997. / Includes bibliographical references (leaf 112-115).
|
29 |
Comparing government big business relations in South Korea and Taiwan / Leung, Lai-sheung. January 1997
Thesis (M.A.)--University of Hong Kong, 1997. / Includes bibliographical references (leaf 112-115). Also available in print.
|
30 |
Big data v sociologii / Big Data in Sociology / Lančová, Táňa. January 2018
The aim of this work is to provide a holistic view of Big Data in sociology and thereby to reflect on a current topic that has not yet been systematically elaborated. This thesis summarizes approaches to defining Big Data, which provides insight into the complexity of this phenomenon. It describes the attitudes of contemporary sociology towards Big Data. It identifies the specifics of Big Data that explain why it has not yet been fully accepted by sociology. It provides a comprehensive description of Big Data sources sorted by owner and an overview of methods for Big Data analysis. It sorts and reflects on Critical Data Studies and brings new topics. Key words: Big data, Big data analysis, methodology
|