201 |
Fast Data Analysis Methods For Social Media Data
Nhlabano, Valentine Velaphi, 07 August 2018
The advent of Web 2.0 technologies, which support the creation and publishing of social media content in a collaborative and participatory way in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of all sizes worldwide, eager to take advantage of this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, whose systems are normally geared towards handling and analysing structured data from business transactions. The research reported in this dissertation investigated fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data, called tweets. Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect to and collect public data sets from the Twitter database.
A Twitter application was created and used to collect streams of real-time public data via the Twitter source provided by Apache Flume, and to store this data efficiently in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system used to efficiently collect, aggregate and move large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open-source software library that runs on low-cost commodity hardware and can store, manage and analyse large amounts of both structured and unstructured data quickly, reliably and flexibly at low cost. A lexicon-based sentiment analysis approach was taken and the AFINN-111 lexicon was used for scoring. The Twitter data was analysed from HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time. / Dissertation (MSc)--University of Pretoria, 2019. / National Research Foundation (NRF) - Scarce skills / Computer Science / MSc / Unrestricted
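As a rough illustration of the lexicon-based scoring step described above, a Hadoop mapper along the following lines could assign each tweet an AFINN score. The class name, the hard-coded lexicon entries and the whitespace tokenisation are assumptions for illustration only, not the implementation used in the dissertation.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal sketch of a lexicon-based scoring mapper: each input line is assumed
// to hold one tweet; the AFINN word list is assumed to be loaded into a map
// (word -> score between -5 and +5). The mapper emits (tweet, total score).
public class SentimentMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Map<String, Integer> afinn = new HashMap<>();

    @Override
    protected void setup(Context context) {
        // In a real job the AFINN-111 file would be read from the distributed
        // cache; a few entries are hard-coded here purely for illustration.
        afinn.put("good", 3);
        afinn.put("bad", -3);
        afinn.put("awesome", 4);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String tweet = value.toString();
        int score = 0;
        // Naive whitespace tokenisation; a real implementation would normalise
        // punctuation and handle hashtags, mentions, emoticons, etc.
        for (String token : tweet.toLowerCase().split("\\s+")) {
            score += afinn.getOrDefault(token, 0);
        }
        context.write(new Text(tweet), new IntWritable(score));
    }
}
```

A reducer, or a follow-up job, could then aggregate the per-tweet scores, for example into counts of positive, negative and neutral tweets.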
|
202 |
Improving Efficiency of Data Compaction by Creating & Evaluating a Random Compaction Strategy in Apache Cassandra
KATIKI REDDY, RAHUL REDDY, January 2020
Background: Cassandra is a NoSQL database in which data is stored on disk in immutable tables called SSTables. These SSTables are subjected to a process called compaction in order to reclaim disk space and to improve read performance. Size Tiered Compaction Strategy and Leveled Compaction Strategy are the most widely used generic compaction strategies for different use cases; space amplification and write amplification are their main limitations, respectively. This research aims to address the limitations of the existing generic compaction strategies. Objectives: A new random compaction strategy is created to improve the efficiency and effectiveness of compaction. This newly created random compaction strategy is evaluated by comparing its read, write and space amplification with those of the existing generic compaction strategies for different use cases. Methods: In this study, design science has been used as the research method to answer both research questions. Focus group meetings were conducted to gain knowledge of the limitations of the existing compaction strategies, of the newly created random compaction strategy, and of appropriate solutions. During the evaluation, metrics were collected from a Prometheus server and visualized in Grafana. The compaction strategies were compared by performing statistical significance tests. Results: The results of this study show that the random compaction strategy performs almost on par with Leveled Compaction Strategy. The random compaction strategy addresses the space amplification problem of Size Tiered Compaction Strategy and the write amplification problem of Leveled Compaction Strategy. Eight important metrics were analyzed for all three compaction strategies. Conclusions: The main artefact of this research is a new random compaction strategy. After two design iterations, a stable random compaction strategy was produced. The results were analyzed by comparing Size Tiered Compaction Strategy, Leveled Compaction Strategy and the random compaction strategy on two different use cases. The new random compaction strategy performed well for the Ericsson buffer management use case.
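A production compaction strategy has to plug into Cassandra's internal compaction interfaces, but the core idea of a random strategy, picking a random group of SSTables as the next compaction task, can be sketched in isolation. The SSTableInfo type, the method name and the maxPerCompaction parameter below are hypothetical; this is a conceptual sketch of the selection step only, not the strategy evaluated in the thesis.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Conceptual sketch of the candidate-selection step of a "random" compaction
// strategy: choose a random subset of the live SSTables and compact them together.
// SSTableInfo stands in for Cassandra's SSTable metadata; a real strategy would
// work with Cassandra's internal reader objects instead.
public class RandomCompactionSketch {

    record SSTableInfo(String path, long sizeBytes) {}

    private final Random random = new Random();

    // Choose up to maxPerCompaction SSTables uniformly at random as the next
    // compaction task; returns an empty list if there is nothing worth compacting.
    List<SSTableInfo> nextCompactionCandidates(List<SSTableInfo> liveSSTables,
                                               int maxPerCompaction) {
        if (liveSSTables.size() < 2) {
            return List.of(); // a single SSTable gains nothing from compaction
        }
        List<SSTableInfo> shuffled = new ArrayList<>(liveSSTables);
        Collections.shuffle(shuffled, random);
        int count = Math.min(maxPerCompaction, shuffled.size());
        return new ArrayList<>(shuffled.subList(0, count));
    }
}
```

Compared with Size Tiered Compaction Strategy (which groups similarly sized SSTables) and Leveled Compaction Strategy (which maintains non-overlapping levels), random selection trades targeted grouping for simplicity, which is what makes its measured read, write and space amplification interesting to compare.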
|
203 |
Dimensionality Reduction in Healthcare Data Analysis on Cloud Platform
Ray, Sujan, January 2020
No description available.
|
204 |
Implementation and Evaluation of a Data Pipeline for Industrial IoT Using Apache NiFi
Vilhelmsson, Lina; Sjöberg, Pontus, January 2020
In the last few years, the popularity of Industrial IoT has grown considerably, and it is expected to have an impact of over 14 trillion USD on the global economy by 2030. One application of Industrial IoT is using data pipelining tools to move raw data from industrial machines to data storage, where the data can be processed by analytical instruments to help optimize industrial operations. This thesis analyzes and evaluates a data pipeline setup for Industrial IoT built with the tool Apache NiFi. A data flow was designed in NiFi which connected an SQL database, a file system, and a Kafka topic to a distributed file system. To evaluate the NiFi data pipeline setup, tests were conducted to see how the system performed under different workloads. The first test determined which size FlowFiles should be merged into to get the lowest latency; the second test examined whether data from the different data sources should be kept separate or merged together; the third test compared the NiFi setup with an alternative setup that used a Kafka topic as an intermediary between NiFi and the endpoint. The first test showed that the lowest latency was achieved when merging FlowFiles into 10 kB files. In the second test, merging FlowFiles from all three sources gave a lower latency than keeping them separate for larger merge sizes. Finally, it was shown that there was no significant difference between the two test setups.
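NiFi's merging is configured in the flow itself (for example with a merge processor) rather than written as code, but the latency trade-off tested above can be illustrated with a simple size-based batcher: records are buffered until roughly a target number of bytes (10 kB in the best-performing setup above) has accumulated, then flushed downstream in one write. Everything below, including the Consumer-based sink, is an illustrative assumption rather than NiFi's API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Buffers incoming records and emits them as one merged chunk once the
// configured size threshold is reached, mimicking size-based FlowFile merging.
public class SizeBatcher {
    private final int thresholdBytes;
    private final Consumer<byte[]> sink;          // e.g. a file or HDFS writer
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    public SizeBatcher(int thresholdBytes, Consumer<byte[]> sink) {
        this.thresholdBytes = thresholdBytes;
        this.sink = sink;
    }

    public void add(String record) {
        buffer.writeBytes(record.getBytes(StandardCharsets.UTF_8));
        buffer.write('\n');
        if (buffer.size() >= thresholdBytes) {
            flush();
        }
    }

    public void flush() {
        if (buffer.size() == 0) return;
        sink.accept(buffer.toByteArray());
        buffer.reset();
    }

    public static void main(String[] args) {
        // 10 kB threshold, matching the merge size that gave the lowest latency.
        SizeBatcher batcher = new SizeBatcher(10 * 1024,
                chunk -> System.out.println("flushed " + chunk.length + " bytes"));
        for (int i = 0; i < 2000; i++) {
            batcher.add("sensor-reading," + i);
        }
        batcher.flush(); // emit any remainder
    }
}
```

Larger thresholds mean fewer, bigger writes (better throughput, worse per-record latency); smaller thresholds the opposite, which is exactly the trade-off the first test measured.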
|
205 |
Presepsina: biomarcador asociado al seguimiento y pronóstico en sepsis de pacientes hospitalizados en la UCI del Hospital Arzobispo Loayza / Presepsin: a biomarker associated with follow-up and prognosis in sepsis among patients hospitalized in the ICU of Hospital Arzobispo Loayza
Chumacero Ortiz, Jenner Erwin, January 2012
Determines the usefulness of presepsin for the prognosis and follow-up of septic patients hospitalized in the ICU of Hospital Arzobispo Loayza. An analytical, descriptive, retrospective, longitudinal, observational, in-hospital study was carried out. The mean APACHE II score was 17.93 and the mean SOFA score was 7.57. The mean procalcitonin (PCT) was 0.147 ng/ml in sepsis, 2.61 ng/ml in severe sepsis and 13.77 ng/ml in septic shock. The mean presepsin was 332.67 pg/ml in sepsis, 600 pg/ml in severe sepsis and 2399.3 pg/ml in septic shock. ROC analysis of presepsin, procalcitonin, SOFA and APACHE II against mortality at ICU discharge showed areas under the curve (AUC) of 0.730 for presepsin, 0.791 for PCT, 0.947 for the APACHE II score and 0.955 for the SOFA score, allowing the conclusion, at a 95% significance level, that there are no significant differences between the predictive capacity of the four variables. ROC analysis against 30-day mortality showed AUCs of 0.765 for presepsin, 0.803 for PCT, 0.865 for APACHE II and 0.888 for SOFA, again allowing the conclusion, at a 95% significance level, that there are no significant differences between the predictive capacity of the four variables. Mortality at discharge and at 30 days was related to age, the number of vasopressors used, a greater number of days on vasopressors, the need for mechanical ventilation, and the number of days on mechanical ventilation. The APACHE II and SOFA scores were associated with mortality at discharge and at 30 days, as were the biomarkers procalcitonin and presepsin. / Trabajo académico
|
206 |
Distributed graph decomposition algorithms on Apache Spark
Mandal, Aritra, 20 April 2018
Indiana University-Purdue University Indianapolis (IUPUI) / Structural analysis and mining of large and complex graphs to describe the characteristics of a vertex or an edge has widespread use in graph clustering, classification, and modeling. There are various methods for the structural analysis of graphs, including the discovery of frequent subgraphs or network motifs, counting triangles or graphlets, spectral analysis of networks using eigenvectors of the graph Laplacian, and finding highly connected subgraphs such as cliques and quasi-cliques. Unfortunately, the algorithms for solving most of the above tasks are quite costly, which makes them not scalable to large real-life networks. Two very popular decompositions, the k-core and the k-truss of a graph, give very useful insight about the graph's vertices and edges, respectively. These decompositions have been applied to reasoning about protein functions on protein-protein networks, fraud detection, and missing-link prediction. k-core decomposition, with its linear time complexity, is scalable to large real-life networks as long as the input graph fits in main memory. k-truss, on the other hand, is computationally more intensive because its definition relies on triangles, and no linear-time algorithm is available for it. In this paper, we propose distributed algorithms on Apache Spark for k-truss and k-core decomposition of a graph. We also compare the performance of our algorithms with state-of-the-art MapReduce and parallel algorithms using openly available real-world network data. Our proposed algorithms have shown substantial performance improvement.
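For readers unfamiliar with k-core decomposition, the sequential "peeling" idea that distributed algorithms parallelise is easy to state: repeatedly remove a vertex of minimum remaining degree, and record the running maximum of those degrees as the removed vertex's core number. The following single-machine Java sketch (not the Spark implementation from the thesis) illustrates this on a toy graph.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.Collections;

public class KCore {
    // Computes the core number of every vertex by min-degree peeling.
    static Map<Integer, Integer> coreNumbers(Map<Integer, Set<Integer>> adj) {
        // Work on a mutable copy of the adjacency structure.
        Map<Integer, Set<Integer>> g = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : adj.entrySet()) {
            g.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        Map<Integer, Integer> core = new HashMap<>();
        int k = 0;
        while (!g.isEmpty()) {
            // Pick a vertex of minimum remaining degree and peel it off.
            int v = Collections.min(g.keySet(),
                    Comparator.comparingInt(u -> g.get(u).size()));
            k = Math.max(k, g.get(v).size());
            core.put(v, k);
            for (int u : g.get(v)) g.get(u).remove(v);
            g.remove(v);
        }
        return core;
    }

    public static void main(String[] args) {
        // Toy graph: a triangle (1,2,3) plus a pendant vertex 4 attached to 3.
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        int[][] edges = {{1, 2}, {2, 3}, {1, 3}, {3, 4}};
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], x -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], x -> new HashSet<>()).add(e[0]);
        }
        // Expected core numbers: vertices 1,2,3 -> 2, vertex 4 -> 1.
        System.out.println(coreNumbers(adj));
    }
}
```

Distributed k-core algorithms partition the vertices and perform this peeling iteratively in rounds; k-truss is analogous but peels edges by their triangle support, which is why it is the more expensive of the two.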
|
207 |
Možnosti optimalizace výkonu LAMP (linux/apache/mysql/php) / Optimization of LAMP (linux/apache/mysql/php)
Kotlář, Pavel, January 2009
This work deals with performance optimization of the LAMP software bundle. Step by step, it tries to discover performance problems in all four parts of LAMP (Linux, the Apache HTTP server, the MySQL database and the PHP interpreter). A model web application is created for these testing purposes. When a problem is found, the configuration files are changed or a performance-improving technology is applied to the corresponding part. A set of optimization recommendations is compiled and verified on a server running a real web application.
|
208 |
Internetový obchod s lyžařským vybavením / Internet Shop with Skiing Equipment
Štrbík, Zdeněk, January 2007
The objective of this project is to design and create an internet shop offering the functions that are common and essential for this type of application. It will also offer functions that ensure trouble-free and safe operation. The design is based on UML models (ER diagram, use case diagram), created to give a compact design of the database and of the structure of the whole application. The shop will offer standard administrative functions that allow administrators to manage it. Users will not only be offered goods, but will also have their own accounts, where they can check the history of their operations. The system will use PHP, HTML, JavaScript and SQL, and will be based on a MySQL database, the Apache server and Apache's Rewrite module (mod_rewrite).
|
209 |
Optimization of the Photovoltaic Time-series Analysis Process Through Hybrid Distributed Computing
Hwang, Suk Hyun, 01 June 2020
No description available.
|
210 |
Performance comparison between Apache and NGINX under slow rate DoS attacks
Al-Saydali, Josef; Al-Saydali, Mahdi, January 2021
One of the novel threats to the internet is the slow HTTP Denial of Service (DoS) attack, an application-level attack targeting web server software. The slow HTTP attack can have a high impact on web server availability for normal users, and it is cheap to mount compared to other types of attacks, which makes it one of the most feasible attacks against web servers. This project investigates the impact of the slow HTTP attack on the Apache and NGINX servers comparatively, and reviews the configurations available for mitigating such an attack. The performance of the Apache and NGINX servers under slow HTTP attack has been compared, as these two are the most widely used web server software globally. Identifying the web server software most resilient to this attack, and knowing the configurations suited to defeating it, play a key role in securing web servers against one of the major threats on the internet. Comparing the results of the experiments conducted on the two web servers, it was found that NGINX performs better than the Apache server under a slow-rate DoS attack when no defense mechanism is configured. However, when defense mechanisms were applied to both servers, the Apache server behaved similarly to NGINX and successfully defeated the slow-rate DoS attack.
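As a concrete example of the kind of defense configuration the comparison refers to, slow-header and slow-body attacks are commonly mitigated by bounding how long a client may take to send its request. The snippets below are illustrative starting points only (the timeout values are arbitrary, and the Apache part assumes mod_reqtimeout is loaded); they are not necessarily the exact settings used in these experiments.

```
# Apache (httpd.conf / apache2.conf), assuming mod_reqtimeout is enabled:
# give clients 20-40 s to finish the request headers and 20 s for the body,
# requiring at least 500 bytes/s once data starts flowing.
RequestReadTimeout header=20-40,MinRate=500 body=20,MinRate=500
KeepAliveTimeout 5

# NGINX (nginx.conf, http/server context): drop clients that send the
# request header or body too slowly.
client_header_timeout 10s;
client_body_timeout   10s;
send_timeout          10s;
```

With limits of this kind in place, connections held open by deliberately slow clients are closed early, which is the behaviour that let the hardened Apache setup match NGINX in the experiments above.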
|