241 |
Um framework de testes unitários para procedimentos de carga em ambientes de business intelligenceSantos, Igor Peterson Oliveira 30 August 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Business Intelligence (BI) relies on a Data Warehouse (DW), a historical data repository designed to support the decision-making process. Despite the potential benefits of a DW, data quality issues prevent users from realizing the benefits of a BI and Data Analytics environment. Problems related to data quality can arise in any stage of the ETL (Extract, Transform and Load) process, especially in the loading phase. This thesis presents an approach to automate the selection and execution of previously identified test cases for loading procedures in BI and Data Analytics environments based on a DW. To verify and validate the approach, a unit test framework was developed. The overall goal is to achieve data quality improvement. The specific aim is to reduce test effort and, consequently, promote test activities in the DW process. The experimental evaluation was performed through two controlled experiments in industry. The first was carried out to investigate the adequacy of the proposed method for DW procedure development. The second compared the proposed method against a generic framework for DW procedure development. Both results showed that our approach clearly reduces test effort and coding errors during the testing phase in decision-support environments. / The quality of a software product is directly related to the testing employed during its development. Although testing processes for application software and transactional systems already exhibit a high degree of maturity, they must be re-examined for a Business Intelligence (BI) and Data Analytics environment. The differences between this environment and other types of systems mean that existing testing processes and tools need to be adjusted to a new reality.
In this context, most effective Business Intelligence (BI) applications depend on a Data Warehouse (DW), a historical data repository designed to support decision-making processes. The data loads into the DW deserve special attention with respect to testing, since they comprise procedures that are critical to quality. This work proposes a testing approach, based on a unit test framework, for loading procedures in a BI and Data Analytics environment. The proposed framework, drawing on metadata about the loading routines, automatically executes test cases by generating initial states and analyzing final states, and also selects the test cases to be applied. The goal is to improve the quality of the data loading procedures and to reduce the time spent on the testing process. The experimental evaluation was carried out through two controlled experiments run in industry. The first evaluated the use of test cases for the loading routines, comparing the effectiveness of the framework with a manual approach. The second performed a comparison with a similar generic framework from the market. The results indicated that the framework can contribute to increased productivity and fewer coding errors during the testing phase in decision-support environments.
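The testing pattern the abstract describes — generate an initial state, run the loading routine, analyze the final state — can be sketched as a minimal unit test. The procedure name, row layout, and data-quality rule below are invented for illustration; they are not taken from the thesis's framework.

```python
# Minimal sketch of a unit test for a DW loading procedure: build an
# initial source state, run the load, then assert on the final DW state.
# The routine and its data-quality rule are illustrative assumptions.

def load_customers(source_rows):
    """Toy loading routine: trim names and reject rows without a key."""
    target = []
    for row in source_rows:
        if row.get("id") is None:  # data-quality rule: key is required
            continue
        target.append({"id": row["id"], "name": row["name"].strip()})
    return target

def test_load_customers():
    initial_state = [
        {"id": 1, "name": "  Alice "},
        {"id": None, "name": "orphan"},  # violates the key rule
    ]
    final_state = load_customers(initial_state)
    assert final_state == [{"id": 1, "name": "Alice"}]

test_load_customers()
print("load-procedure test passed")
```

A metadata-driven framework would generate many such initial/final state pairs automatically rather than hand-coding them.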
|
242 |
Big Data Analytics for Agriculture Input Supply Chain in Ethiopia : Supply Chain Management Professionals PerspectiveHassen, Abdurahman, Chen, Bowen January 2020 (has links)
In Ethiopia, agriculture accounts for 85% of total employment, and the country's exports rely entirely on agricultural commodities. The country is continuously affected by chronic food shortages. In the last 40 years, the country's population has almost tripled, and more agricultural productivity is required to support the livelihoods of millions of citizens. As reported by various studies, Ethiopia needs to address a number of policy and strategic priorities to improve agriculture; however, an inefficient supply chain for agricultural inputs is identified as one of the significant obstacles to developing agricultural productivity in the country. The research problem that interests this thesis is understanding the potential of Big Data Analytics (BDA) for achieving a better agriculture input supply chain in Ethiopia. Based on this, we conducted a basic qualitative study to understand the expectations of Supply Chain Management (SCM) professionals, the requirements for potential applications of Big Data Analytics, and the implications of applying it, from the perspective of SCM professionals in Ethiopia. The findings suggest that BDA may bring operational and strategic benefits to the agriculture input supply chain in Ethiopia, and that its application may have positive implications for agricultural productivity and food security in the country. The findings of this study are not generalizable beyond the participants interviewed.
|
243 |
Sistema para el control y monitoreo de alteraciones hipertensivas en el embarazo / Wearable technology model to control and monitor hypertension during pregnancyBalbin Lopez, Betsy Diamar, Reyes Coronado, Diego Antonio 31 January 2019 (has links)
In Peru, according to studies conducted in 2010, 42% of hypertensive patients are treated, but only 14% are successfully controlled. This is because the current hypertension-control process is not fully efficient: patients do not fully adhere to treatment, and blood pressure checks are only occasional, separated by long intervals during which no reliable information about the patient's progress is available.
A system is proposed for the control and monitoring of hypertensive disorders in pregnancy using non-invasive biomedical sensors. In this way we ensure that continuous measurement provides accurate and reliable information so that pregnant women can detect a hypertensive disorder in time. In addition, the system alerts family members and the attending physician about blood pressure levels in case of emergency.
The project's contribution is to curb the rising prevalence of chronic diseases by integrating health services with technology, and to manage information from data collection through the wearable to its presentation. Based on tests carried out with pregnant patients, 38.64% were controlled and monitored 75% of the time. These results indicate that the use of technology can positively influence the reduction of hypertension in general and of similar chronic diseases. / In Peru, according to studies conducted in 2010, 42% of hypertensive patients are treated, but only 14% of patients are successfully controlled. This is because the current hypertension-control process is not completely efficient: patients do not completely adhere to treatment, and blood pressure checks are only occasional, taking place after long intervals during which there is no reliable information on the patient's progress.
A system is proposed for the control and monitoring of hypertensive disorders in pregnancy using non-invasive biomedical sensors. In this way we ensure that continuous measurement provides accurate and reliable information so that pregnant women can detect any hypertensive disorder in time. In addition, the system alerts family members and the attending physician about blood pressure levels in case of emergency.
The contribution of the project is to reduce the rising prevalence of chronic diseases by integrating health services with technology, and to manage information from data collection through the wearable to its presentation. Based on the tests carried out with pregnant patients, 38.64% were controlled and monitored 75% of the time. These results indicate that the use of technology can positively influence the reduction of hypertension in general and of similar chronic diseases. / Tesis
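The alerting behavior described above can be sketched with simple thresholds. The limits below follow common hypertension guidance and are assumptions for illustration; the actual system's thresholds and sensor pipeline are not reproduced here.

```python
# Illustrative sketch (not the thesis's actual system): classify a blood
# pressure reading and decide whether the emergency alert should fire.
# Thresholds are assumed values based on common clinical guidance.

def classify_bp(systolic, diastolic):
    if systolic >= 160 or diastolic >= 110:
        return "severe"      # emergency: notify family and physician
    if systolic >= 140 or diastolic >= 90:
        return "elevated"    # flag for the physician's review
    return "normal"

def should_alert(readings):
    """Alert if any continuous-monitoring reading is severe."""
    return any(classify_bp(s, d) == "severe" for s, d in readings)

print(should_alert([(118, 76), (165, 95)]))  # True
```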
|
244 |
Feedback-Driven Data ClusteringHahmann, Martin 28 October 2013 (has links)
The acquisition of data and its analysis has become a common yet critical task in many areas of modern economy and research. Unfortunately, the ever-increasing scale of datasets has long outgrown the capacities and abilities humans can muster to extract information from them and gain new knowledge. For this reason, research areas like data mining and knowledge discovery steadily gain importance. The algorithms they provide for the extraction of knowledge are mandatory prerequisites that enable people to analyze large amounts of information. Among the approaches offered by these areas, clustering is one of the most fundamental. By finding groups of similar objects inside the data, it aims to identify meaningful structures that constitute new knowledge. Clustering results are also often used as input for other analysis techniques like classification or forecasting.
As clustering extracts new and unknown knowledge, it obviously has no access to any form of ground truth. For this reason, clustering results have a hypothetical character and must be interpreted with respect to the application domain. This makes clustering very challenging and leads to an extensive and diverse landscape of available algorithms. Most of these are expert tools that are tailored to a single narrowly defined application scenario. Over the years, this specialization has become a major trend that arose to counter the inherent uncertainty of clustering by including as much domain specifics as possible into algorithms. While customized methods often improve result quality, they become more and more complicated to handle and lose versatility. This creates a dilemma especially for amateur users whose numbers are increasing as clustering is applied in more and more domains. While an abundance of tools is offered, guidance is severely lacking and users are left alone with critical tasks like algorithm selection, parameter configuration and the interpretation and adjustment of results.
This thesis aims to solve this dilemma by structuring and integrating the necessary steps of clustering into a guided and feedback-driven process. In doing so, users are provided with a default modus operandi for the application of clustering. Two main components constitute the core of said process: the algorithm management and the visual-interactive interface. Algorithm management handles all aspects of actual clustering creation and the involved methods. It employs a modular approach for algorithm description that allows users to understand, design, and compare clustering techniques with the help of building blocks. In addition, algorithm management offers facilities for the integration of multiple clusterings of the same dataset into an improved solution. New approaches based on ensemble clustering not only allow the utilization of different clustering techniques, but also ease their application by acting as an abstraction layer that unifies individual parameters. Finally, this component provides a multi-level interface that structures all available control options and provides the docking points for user interaction.
The visual-interactive interface supports users during result interpretation and adjustment. For this, the defining characteristics of a clustering are communicated via a hybrid visualization. In contrast to traditional data-driven visualizations that tend to become overloaded and unusable with increasing volume/dimensionality of data, this novel approach communicates the abstract aspects of cluster composition and relations between clusters. This aspect orientation allows the use of easy-to-understand visual components and makes the visualization immune to scale related effects of the underlying data. This visual communication is attuned to a compact and universally valid set of high-level feedback that allows the modification of clustering results. Instead of technical parameters that indirectly cause changes in the whole clustering by influencing its creation process, users can employ simple commands like merge or split to directly adjust clusters.
The orchestrated cooperation of these two main components creates a modus operandi, in which clusterings are no longer created and disposed as a whole until a satisfying result is obtained. Instead, users apply the feedback-driven process to iteratively refine an initial solution. Performance and usability of the proposed approach were evaluated with a user study. Its results show that the feedback-driven process enabled amateur users to easily create satisfying clustering results even from different and not optimal starting situations.
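The high-level merge and split feedback described above can be sketched with assumed data structures: a clustering as a dict mapping object id to cluster label, and two commands that adjust it directly. The structures and the split rule are illustrative, not the thesis's implementation.

```python
# Sketch of direct merge/split feedback on a clustering result.
# A clustering is assumed to be a dict: object id -> cluster label.

def merge(labels, a, b):
    """Merge cluster b into cluster a."""
    return {obj: (a if lab == b else lab) for obj, lab in labels.items()}

def split(labels, cluster, predicate, new_label):
    """Move members of `cluster` satisfying `predicate` to `new_label`."""
    return {obj: (new_label if lab == cluster and predicate(obj) else lab)
            for obj, lab in labels.items()}

labels = {"x1": 0, "x2": 0, "x3": 1, "x4": 2}
labels = merge(labels, 1, 2)                       # clusters 1 and 2 become one
labels = split(labels, 0, lambda o: o == "x2", 3)  # carve x2 out of cluster 0
print(labels)  # {'x1': 0, 'x2': 3, 'x3': 1, 'x4': 1}
```

The point of such commands is that users adjust clusters directly instead of re-tuning technical parameters and re-running the whole algorithm.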
|
245 |
Energieffektivisering inom fordonsindustrin : Hur energianvändning inom fordonsindustrin kan bli mer hållbar / Energy management in the automotive industry : How energy use in the automotive industry can become more sustainableThoong, John, Belzacq, Johanna January 2022 (has links)
About a third of the energy in Sweden is used for production in industry, where a few energy-intensive industries account for a large proportion of the energy use. When energy efficiency is improved here, the positive environmental effects will be substantial, so there is often great potential to reduce energy use in these industries. The purpose of this study is to investigate the company's energy use in order to present proposals for cost-effective improvement measures that reduce the company's energy consumption, which in turn reduces its environmental impact. The study used sustainable development, total quality management and Kotter's 8-step process for leading change as its theoretical background. The study is a qualitative and quantitative case study, and data was collected through semi-structured and unstructured interviews, observations and document analysis. The study's results show that the company wastes energy in three consumption areas: compressed air, heating and electricity. To reduce energy use, the company needs to establish shut-down routines, appoint an energy coordinator, prioritize preventive work, repair broken equipment and introduce preventive maintenance, optimize the ovens, involve employees in continuous improvement, and have committed leadership. By reducing energy consumption, the company can reduce its impact on the environment, and by implementing the improvement measures, the company can save several million SEK each year. / About a third of the energy in Sweden is used for production in industry, where a few energy-intensive sectors account for a large share of industry's energy use. When energy-efficiency measures are taken here, the positive environmental effects are substantial. There is therefore often great potential to reduce energy use in these sectors.
The purpose of this study is to examine the company's energy use and then present proposals for cost-effective improvement measures that reduce the company's energy consumption, which in turn reduces its environmental impact. The study used sustainable development, the cornerstone model and Kotter's 8-step model for change management as its theoretical background. The study is a qualitative and quantitative case study, and data was collected through semi-structured and unstructured interviews, observations and document studies. The results show that the company uses energy unnecessarily in three consumption areas: compressed air, heating and electricity. To reduce energy use, the company needs to introduce shut-down routines, appoint an energy coordinator, prioritize preventive work, repair broken equipment and introduce preventive maintenance, optimize the ovens, involve employees in improvement work, and have committed and engaged leadership. By reducing energy consumption the company can reduce its environmental impact, and by implementing the improvement measures it can save several million SEK each year.
|
246 |
Avatar Playing Style : From analysis of football data to recognizable playing stylesEdberger Persson, Jakob, Danielsson, Emil January 2022 (has links)
Football analytics is a rapidly growing area which applies conventional data analysis and computational methods to data gathered from football matches. The results can give insights into the performance levels of individual football players, teams and clubs. A daily difficulty for football analytics is translating analysis results into actual football qualities and knowledge that the wider public can understand. In this master thesis we therefore take the ball-event data collected from football matches and develop a model which classifies individual football players' playing styles, where the playing styles are well known among football followers. This is carried out by first detecting the playing positions 'Strikers', 'Central midfielders', 'Outer wingers', 'Full backs', 'Centre backs' and 'Goalkeepers' using K-Means clustering, with an accuracy of 0.89 (for Premier League 2021/2022) and 0.84 (for Allsvenskan 2021). Secondly, we create a simplified binary model which only classifies a player's playing style as "Offensive"/"Defensive". The poor results of this model show that there exist more than just these two playing styles. Finally, we use an unsupervised modelling approach where Principal component analysis (PCA) is applied iteratively. For the playing position 'Striker' we find the playing styles 'The Target', 'The Artist', 'The Poacher' and 'The Worker', which, when compared with a created validation data set, give a total accuracy of 0.79 (the best of all positions and the only one covered in detail in the report due to delimitations). The playing styles can, for each player, be presented visually, showing how well a particular player fits each style.
Ultimately, the results of the master thesis indicate that it is easier to find playing styles with clear and obvious on-the-ball actions distinguishing them from other players within their respective position. Such easier-to-find playing styles include "The Poacher" and "The Target", while harder-to-find playing styles include "The Box-to-box" and "The Inverted". We conclude that the results will come to good use and that the goals of the thesis are met, although much room for improvement and future work remains. The developed models can be found in a simplified form on the GitHub repository: https://github.com/Sommarro-Devs/avatar-playing-style. The report can be read stand-alone, but parts of it are highly connected to the models and code in the GitHub repository.
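The position-detection step can be illustrated with a bare-bones k-means on invented player features. The features (average pitch x-position, shots per match) and the player data are assumptions for illustration; the thesis clustered real ball-event data into six positions.

```python
# Minimal k-means sketch of position detection from per-player features.
# Features and data are invented; the initialization is deliberately naive.

def kmeans(points, k, iters=20):
    centroids = points[:k]  # naive initialization: first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            groups[i].append(p)
        # recompute each centroid as the mean of its group
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*g)) if g
                     else centroids[i] for i, g in enumerate(groups)]
    return centroids, groups

# (avg. x-position on pitch, shots per match) -- assumed features
players = [(0.9, 3.1), (0.85, 2.8), (0.1, 0.0), (0.15, 0.1)]
centroids, groups = kmeans(players, k=2)
print(len(groups[0]), len(groups[1]))  # 2 2
```

With two clear groups (attack-leaning vs. defense-leaning players), the two clusters separate cleanly; real position detection works the same way in a higher-dimensional feature space.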
|
247 |
From data collection to electric grid performance : How can data analytics support asset management decisions for an efficient transition toward smart grids?Koziel, Sylvie Evelyne January 2021 (has links)
Physical asset management in the electric power sector encompasses the scheduling of the maintenance and replacement of grid components, as well as decisions about investments in new components. Data plays a crucial role in these decisions, and its importance is increasing with the transformation of the power system and its evolution toward smart grids. This thesis deals with questions related to data management as a way to improve the performance of asset management decisions. Data management is defined as the collection, processing, and storage of data; here, the focus is on collection and processing. First, the influence of data on decisions related to assets is explored. In particular, the impact of data quality on the replacement time of a generic component (a line, for example) is quantified using a scenario approach and failure modeling. Decisions based on data of poor quality are most likely not optimal: in this case, faulty data about the age of the component leads to non-optimal scheduling of its replacement. The corresponding costs are calculated for different levels of data quality, and a framework has been developed to evaluate the amount of investment needed in data quality improvement and its profitability. Then, ways to use available data efficiently are investigated, in particular the possibility of applying machine learning algorithms to real-world datasets. New approaches are developed that use only available data for component ranking and failure prediction, two important concepts often used to prioritize components and schedule maintenance and replacement. A large part of the scientific literature assumes that the future of smart grids lies in big data collection and in developing algorithms to process huge amounts of data.
On the contrary, this work shows how automation and machine learning techniques can actually be used to reduce the need to collect huge amounts of data, by using the available data more efficiently. One major challenge is the trade-off between the precision of modeling results and the costs of data management. / QC 20210330
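The idea of quantifying the cost of poor data quality can be sketched with a back-of-the-envelope model. All numbers and the linear risk-growth rule below are invented for illustration; the thesis uses scenario-based failure modeling, not this formula.

```python
# Toy sketch: cost of scheduling a component replacement on faulty age
# data. An understated age delays replacement past the optimum, raising
# the probability of an in-service failure. All parameters are assumed.

def replacement_cost(age_error_years, base_cost=100.0,
                     failure_penalty=400.0):
    """Expected cost grows when faulty data delays replacement."""
    delay = max(0, age_error_years)        # years of delayed replacement
    failure_prob = min(1.0, 0.05 * delay)  # assumed linear risk growth
    return base_cost + failure_penalty * failure_prob

cost_good_data = replacement_cost(0)   # replacement on time
cost_bad_data = replacement_cost(10)   # age understated by 10 years
print(cost_good_data, cost_bad_data)   # 100.0 300.0
```

Comparing the two costs against the price of improving the data gives the kind of profitability assessment the framework aims at.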
|
248 |
Community Detection of Anomaly in Large-Scale NetworkAdefolarin Alaba Bolaji (10723926) 29 April 2021 (has links)
The detection of anomalies in real-world networks is applicable in many domains, including, but not limited to, credit card fraud detection, malware identification and classification, cancer detection from diagnostic reports, abnormal traffic detection, and identification of fake media posts. Much ongoing research provides tools for analyzing labeled and unlabeled data; however, the challenge of finding anomalies and patterns in large-scale datasets persists because of rapid changes in the threat landscape.
In this study, I implemented a novel and robust solution that combines data science and cybersecurity to solve complex network security problems. I used a Long Short-Term Memory (LSTM) model, the Louvain algorithm, and the PageRank algorithm to identify and group anomalies in large-scale real-world networks comprising billions of packets. The developed model used different visualization techniques to provide further insight into how the anomalies in the network are related.
Mean absolute error (MAE) and root mean square error (RMSE) were used to validate the anomaly detection models; the results obtained were 5.1813e-04 and 1e-03, respectively. The low loss from the training phase confirmed the low RMSE (loss: 5.1812e-04, mean absolute error: 5.1813e-04, validation loss: 3.9858e-04, validation mean absolute error: 3.9858e-04). The community detection yielded an overall modularity value of 0.914, which is evidence of very strong communities among the anomalies. The largest sub-community of the anomalies connects 10.42% of their total nodes.
The broader aim of this study was to provide sophisticated, AI-assisted countermeasures to cyber-threats in large-scale networks. To close the gaps created by the shortage of skilled and experienced cybersecurity specialists and analysts, solutions based on out-of-the-box thinking are needed; this research aimed to yield one such solution. It was built to detect specific and collaborating threat actors in large networks and to help curtail the activities of anomalies in any given large-scale network in time.
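The modularity figure reported above can be illustrated by computing Newman modularity for a toy graph and partition. The graph and community assignment below are invented stand-ins; the thesis applied the Louvain algorithm to far larger anomaly graphs.

```python
# Sketch of the Newman modularity Q behind the community-detection result,
# for an undirected graph given as an edge list and a node->community map.

def modularity(edges, communities):
    """Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * delta(c_i, c_j)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    nodes = list(degree)
    for i in nodes:
        for j in nodes:
            a = sum(1 for u, v in edges if {u, v} == {i, j})  # A_ij
            q += ((a - degree[i] * degree[j] / (2 * m))
                  * (communities[i] == communities[j]))
    return q / (2 * m)

# Two obvious communities: a triangle and a separate edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
part = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
print(round(modularity(edges, part), 3))  # 0.375
```

Values near 1 indicate very strong community structure, which is why the reported 0.914 supports the existence of tightly knit groups among the anomalies.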
|
249 |
Improving data-driven decision making through data democracy : Case study of a Swedish bankAmerian, Irsa January 2021 (has links)
Nowadays, becoming data-driven is the vision of almost all organizations. However, achieving this vision is not as easy as it may look, and many factors affect, enable, support and sustain the data-driven ecosystem in an organization. Among these factors, this study focuses on data democracy, which can be defined as intra-organizational open data that aims to empower employees with faster and easier access to data, so that they can obtain the business insight they need without external help. While the existing literature widely discusses the importance of becoming data-driven, there is a noticeable gap when it comes to data democracy within organizations. This master's thesis therefore aims to justify the importance and role of data democracy in becoming a data-driven organization, focusing on the case of a Swedish bank. Additionally, it investigates the role of data analytics tools in achieving data democracy. The results of the study show a strong connection between empowering different actors of the organization with the needed data knowledge and speeding up the data-driven transformation journey. Based on the study, shared data and the availability of data to a larger number of stakeholders inside an organization result in a better understanding of different aspects of problems, simplify data-driven decision making and make the organization more data-driven. In the process of becoming data-driven, organizations should provide analytics tools not only to data specialists but also to non-specialists. By offering the needed support, training and collaboration possibilities between the two groups of employees (data specialists and non-data specialists), the latter should be enabled to extract insight from data independently of the data scientists.
An organization can succeed on the path to becoming data-driven when it invests in the reusable capabilities of its employees, by discovering data science skills across various departments and turning its domain experts into citizen data scientists.
|
250 |
An analysis of new functionalities enabled by the second generation of smart meters in Sweden / Analys av nya funktioner möjliggjort av andra generationen smarta mätare i SverigeDrummond, Jose January 2021 (has links)
It is commonly agreed among energy experts that smart meters (SMs) are the key component that will facilitate the transition towards the smart grid. Fast-paced innovation in the advanced metering infrastructure (AMI) is exposing countless benefits that network operators can obtain by integrating SM applications into their daily operations. Following the 2017 amendment, in which the Swedish government dictated that all SMs must include new features such as remote control, higher time resolution for energy readings and a friendly interface for customers to access their own data, network operators in Sweden are currently replacing their SMs with a new model, also called the second generation of SMs. While the replacement of meters is in progress, many utilities like Hemab are trying to determine which technical and financial benefits the new generation of SMs will bring to their operations. As a first step, this thesis presents the results of a series of interviews carried out with different network operators in Sweden. It studies which functionalities have the potential to succeed in the near future, as well as those that are already being tested or fully implemented by some utilities in Sweden. Furthermore, this thesis analyses the obstacles and barriers that utilities encounter when trying to implement new applications using the new SMs. In a second stage, an alarm system for power interruptions and voltage-quality events (e.g., overvoltage and undervoltage) using VisionAir software and OMNIPOWER 3-phase meters is evaluated. The results of the evaluation are divided into three sections: a description of the settings and functionalities of the alarm, the outcomes of the test, and a final discussion of potential applications.
This study has revealed that alarm functions, data analytics (including methods such as load forecasting, customer segmentation and non-technical-losses analysis), power quality monitoring, dynamic pricing, and load shedding have the biggest potential to succeed in Sweden in the coming years. Furthermore, the lack of time, the prioritization of other grid projects and the integration of new applications into the current system appear to be the main barriers for Swedish utilities today. Regarding the alarm system, it was found that the real benefits for network operators emerge when the information from an alarm system is combined with a topology interface of the network and a customer notification server. Together, these applications could improve customer satisfaction by significantly reducing outage time and providing customers with real-time, precise information about problems in the grid.
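The alarm logic evaluated above can be sketched with simple thresholds. The ±10% band around a 230 V nominal is a common assumption and is used here for illustration only; the actual VisionAir/OMNIPOWER alarm settings are not reproduced.

```python
# Illustrative threshold alarm for power interruptions and
# voltage-quality events. Limits are assumed (±10% of 230 V nominal).

NOMINAL_V = 230.0

def classify_reading(voltage):
    if voltage == 0.0:
        return "power interruption"
    if voltage > 1.10 * NOMINAL_V:   # above the assumed upper band
        return "overvoltage"
    if voltage < 0.90 * NOMINAL_V:   # below the assumed lower band
        return "undervoltage"
    return "ok"

def raise_alarms(readings):
    """Return (timestamp, event) pairs for every abnormal reading."""
    return [(t, classify_reading(v)) for t, v in readings
            if classify_reading(v) != "ok"]

print(raise_alarms([("10:00", 231.0), ("10:15", 258.0), ("10:30", 0.0)]))
# [('10:15', 'overvoltage'), ('10:30', 'power interruption')]
```

Feeding such events into a network topology view and a customer notification server is what turns raw meter alarms into the operational benefits discussed above.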
|