Global ETD Search

1	Towards a framework for engineering big data: An automotive systems perspective Byrne, Thomas J., Campean, Felician, Neagu, Daniel 05 1900 (has links) Yes / Demand for more sophisticated models to meet big data expectations require significant data repository obligations, operating concurrently in higher-level applications. Current models provide only disjointed modelling paradigms. The proposed framework addresses the need for higher-level abstraction, using low-level logic in the form of axioms, from which higher-level functionality is logically derived. The framework facilitates definition and usage of subjective structures across the cyber-physical system domain, and is intended to converge the range of heterogeneous data-driven objects. Knowledge management Big data analysis Design models Automotive systems
2	Data collection for digitalization of the Stockholm Metro : A study of data sources needed to digitalize the Stockholm Metro / Datainsamling för digitalisering av Stockholms tunnelbana Feng, Benny January 2019 (has links) Many organizations are looking to implement data-driven technologies such as big data analytics, artificial intelligence and machine learning in their operations due to their rapid development and increased usefulness in recent years. With technology changing fast, it is difficult for managers to determine which sources of data are relevant in the context of these technologies. This paper aims to explore opportunities to implement data-driven technologies in the Stockholm metro. The technologies are assessed based on their usefulness and feasibility. The assessment is also done in regards to the current state of the organization in charge of the Stockholm metro, Trafikförvaltningen, and its internal capabilities. The study has been conducted through interviews aimed at understanding Trafikförvaltningen as an organization, as well as literary reviews of state-of-the-art technologies aimed at understanding what is technically possible. By aligning the state of the organization with current technologies, it was concluded that big data for preventive maintenance and smart grids for minimizing energy consumption were the most relevant data-driven technologies to implement. / Många organisationer vill implementera datadrivna teknologier som stordataanalys, artificiell intelligens och maskininlärning i sina verksamheter på grund av de senaste årens dess snabba utvecklingstakt och ökade användbarhet. I och med den snabba teknologiska utvecklingstakten är det svårt för beslutsfattare att avgöra vilka datakällor som är relevanta för dessa teknologier. Den här uppsatsen syftar till att undersöka möjligheterna att implementera datadrivna teknologier i Stockholms tunnelbanesystem. Dessa teknologier är bedömda efter användbarhet och möjlighet för lyckad implementation. Bedömningen tar även hänsyn till det nuvarande tillståndet av organisationen som är ansvarig för Stockholm tunnelbana, Trafikförvaltningen, och dess interna färdigheter. Studien har genomförts via intervjuer som syftat till att förstå Trafikförvaltningen som organisation, tillsammans med en litteraturstudie av den senaste tekniken som syftat till att förstå vad som är tekniskt möjligt. Genom en analys av organisationens nuvarande tillstånd och nuvarande teknologier drogs slutsatsen att stordataanalys för preventivt underhåll och smarta elnät för minskad energikonsumtion är de mest relevanta datadrivna teknologierna att implementera. Digitalization big data analysis dynamic capabilities Digitalisering stordataanalys dynamiska förmågor Engineering and Technology Teknik och teknologier
3	AN EMPIRICAL STUDY OF AN INNOVATIVE CLUSTERING APPROACH TOWARDS EFFICIENT BIG DATA ANALYSIS Bowers, Jacob Robert 01 May 2024 (has links) (PDF) The dramatic growth of big data presents formidable challenges for traditional clustering methodologies, which often prove unwieldy and computationally expensive when processing vast quantities of data. This study explores a novel clustering approach exemplified by Sow & Grow, a density-based clustering algorithm akin to DBSCAN developed to address the issues inherent to big data by enabling end-users to strategically allocate computational resources toward regions of noted interest. Achieved through a unique procedure of seeding points and subsequently fostering their growth into coherent clusters, this method significantly reduces computational waste by ignoring insignificant segments of the dataset and provides information relevant to the end user. The implementation of this algorithm developed as part of this research showcases promising results in various experimental settings, exhibiting notable speedup over conventional clustering methods. Additionally, the incorporation of dynamic load balancing further enhances the algorithm's performance, ensuring optimal resource utilization across parallel processing threads when handling superclusters or unbalanced data distributions. Through a detailed study of the theoretical underpinnings of this innovative clustering approach and the limitations of traditional clustering techniques, this research demonstrates the practical utility of the Sow & Grow algorithm in expediting the clustering processes while providing results pertinent to end users. Big Data Analysis Big Data Clustering Cluster Analysis User-Centered Clustering
4	Big data analýzy a statistické zpracování metadat v archivu obrazové zdravotnické dokumentace / Big Data Analysis and Metadata Statistics in Medical Images Archives Pšurný, Michal January 2017 (has links) This Diploma thesis describes issues of big data in healthcare focus on picture archiving and communication system. DICOM format are store images with header where it could be other valuable information. This thesis mapping data from 1215 studies.
5	Food Industry Sales Prediction : A Big Data Analysis & Sales Forecast of Bake-off Products Lindström, Maja January 2021 (has links) In this thesis, the sales of bread and coffee bread at Coop Värmland AB have been studied. The aim was to find what factors that are important for the sales and then make predictions of how the sales will look like in the future to reduce waste and increase profits. Big data analysis and data exploration was used to get to know the data and find the factors that affect the sales the most. Time series forecasting and supervised machine learning models were used to predict future sales. The main focus was five different models that were compared and analysed, they were; Decision tree regression, Random forest regression, Artificial neural networks, Recurrent neural networks and a time series model called Prophet. Comparing the observed values to the predictions made by the models indicated that using a model based on the time series is to be preferred, that is, Prophet and Recurrent neural network. These two models gave the lowest errors and by that, the most accurate results. Prophet yielded mean absolute percentage errors of 8.295% for bread and 9.156% for coffee bread. The Recurrent neural network gave mean absolute percentage errors of 7.938% for bread and 13.12% for coffee bread. That is about twice as good as the models they are using today at Coop which are based on the mean value of the previous sales. / I denna avhandling har försäljningen av matbröd och fikabröd på Coop Värmland AB studerats. Målet var att hitta vilka faktorer som är viktiga för försäljningen och sedan förutsäga hur försäljningen kommer att se ut i framtiden för att minska svinn och öka vin- ster. Big data- analys och explorativ dataanalys har använts för att lära känna datat och hitta de faktorer som påverkar försäljningen mest. Tidsserieprediktion och olika mask- ininlärningsmodeller användes för att förutspå den framtida försäljningen. Huvudfokus var fem olika modeller som jämfördes och analyserades. De var Decision tree regression, Random forest regression, Artificial neural networks, Recurrent neural networks och en tidsseriemodell som kallas Prophet. Jämförelse mellan de observerade värdena och de värden som predicerats med modellerna indikerade att de modeller som är baserade på tidsserierna är att föredra, det vill säga Prophet och Recurrent neural networks. Dessa två modeller gav de lägsta felen och därmed de mest exakta resultaten. Prophet gav genomsnittliga absoluta procentuella fel på 8.295% för matbröd och 9.156% för fikabröd. Recurrent neural network gav genomsnittliga absoluta procentuella fel på 7.938% för matbröd och 13.12% för fikabröd. Det är ungefär dubbelt så korrekt som de modeller de använder idag på Coop som baseras på medelvärdet av tidigare försäljning. Machine Learning Big Data Analysis Data Exploration Prediction Statistics Mathematical Analysis Matematisk analys Probability Theory and Statistics Sannolikhetsteori och statistik
6	Delayed Transfer Entropy applied to Big Data / Delayed Transfer Entropy aplicado a Big Data Dourado, Jonas Rossi 30 November 2018 (has links) Recent popularization of technologies such as Smartphones, Wearables, Internet of Things, Social Networks and Video streaming increased data creation. Dealing with extensive data sets led the creation of term big data, often defined as when data volume, acquisition rate or representation demands nontraditional approaches to data analysis or requires horizontal scaling for data processing. Analysis is the most important Big Data phase, where it has the objective of extracting meaningful and often hidden information. One example of Big Data hidden information is causality, which can be inferred with Delayed Transfer Entropy (DTE). Despite DTE wide applicability, it has a high demanding processing power which is aggravated with large datasets as those found in big data. This research optimized DTE performance and modified existing code to enable DTE execution on a computer cluster. With big data trend in sight, this results may enable bigger datasets analysis or better statistical evidence. / A recente popularização de tecnologias como Smartphones, Wearables, Internet das Coisas, Redes Sociais e streaming de Video aumentou a criação de dados. A manipulação de grande quantidade de dados levou a criação do termo Big Data, muitas vezes definido como quando o volume, a taxa de aquisição ou a representação dos dados demanda abordagens não tradicionais para analisar ou requer uma escala horizontal para o processamento de dados. A análise é a etapa de Big Data mais importante, tendo como objetivo extrair informações relevantes e às vezes escondidas. Um exemplo de informação escondida é a causalidade, que pode ser inferida utilizando Delayed Transfer Entropy (DTE). Apesar do DTE ter uma grande aplicabilidade, ele possui uma grande demanda computacional, esta última, é agravada devido a grandes bases de dados como as encontradas em Big Data. Essa pesquisa otimizou e modificou o código existente para permitir a execução de DTE em um cluster de computadores. Com a tendência de Big Data em vista, esse resultado pode permitir bancos de dados maiores ou melhores evidências estatísticas. Análise de Big Data Big Data analysis Causalidade Causality Cluster heterogêneo de computadores Delayed Transfer Entropy Delayed Transfer Entropy Estratégias de paralelismo Heterogeneous computer cluster Parallelism strategies Surrogate Surrogate
7	Delayed Transfer Entropy applied to Big Data / Delayed Transfer Entropy aplicado a Big Data Jonas Rossi Dourado 30 November 2018 (has links) Recent popularization of technologies such as Smartphones, Wearables, Internet of Things, Social Networks and Video streaming increased data creation. Dealing with extensive data sets led the creation of term big data, often defined as when data volume, acquisition rate or representation demands nontraditional approaches to data analysis or requires horizontal scaling for data processing. Analysis is the most important Big Data phase, where it has the objective of extracting meaningful and often hidden information. One example of Big Data hidden information is causality, which can be inferred with Delayed Transfer Entropy (DTE). Despite DTE wide applicability, it has a high demanding processing power which is aggravated with large datasets as those found in big data. This research optimized DTE performance and modified existing code to enable DTE execution on a computer cluster. With big data trend in sight, this results may enable bigger datasets analysis or better statistical evidence. / A recente popularização de tecnologias como Smartphones, Wearables, Internet das Coisas, Redes Sociais e streaming de Video aumentou a criação de dados. A manipulação de grande quantidade de dados levou a criação do termo Big Data, muitas vezes definido como quando o volume, a taxa de aquisição ou a representação dos dados demanda abordagens não tradicionais para analisar ou requer uma escala horizontal para o processamento de dados. A análise é a etapa de Big Data mais importante, tendo como objetivo extrair informações relevantes e às vezes escondidas. Um exemplo de informação escondida é a causalidade, que pode ser inferida utilizando Delayed Transfer Entropy (DTE). Apesar do DTE ter uma grande aplicabilidade, ele possui uma grande demanda computacional, esta última, é agravada devido a grandes bases de dados como as encontradas em Big Data. Essa pesquisa otimizou e modificou o código existente para permitir a execução de DTE em um cluster de computadores. Com a tendência de Big Data em vista, esse resultado pode permitir bancos de dados maiores ou melhores evidências estatísticas. Análise de Big Data Causalidade Cluster heterogêneo de computadores Delayed Transfer Entropy Estratégias de paralelismo Surrogate Big Data analysis Causality Delayed Transfer Entropy Heterogeneous computer cluster Parallelism strategies Surrogate
8	Using Statistical Methods to Determine Geolocation Via Twitter Wright, Christopher M. 01 May 2014 (has links) With the ever expanding usage of social media websites such as Twitter, it is possible to use statistical inquires to form a geographic location of a person using solely the content of their tweets. According to a study done in 2010, Zhiyuan Cheng, was able to detect a location of a Twitter user within 100 miles of their actual location 51% of the time. While this may seem like an already significant find, this study was done while Twitter was still finding its ground to stand on. In 2010, Twitter had 75 million unique users registered, as of March 2013, Twitter has around 500 million unique users. In this thesis, my own dataset was collected and using Excel macros, a comparison of my results to that of Cheng’s will see if the results have changed over the three years since his study. If found to be that Cheng’s 51% can be shown more efficiently using a simpler methodology, this could have a significant impact on Homeland Security and cyber security measures. Large Data Set Linguistics Big Data Analysis Human Intelligence Twitter Facebook Social Networking Software Engineering Internet Communication Technology and New Media OS and Networks Software Engineering Theory and Algorithms
9	Real-time anomaly detection with in-flight data : streaming anomaly detection with heterogeneous communicating agents / Détection des anomalies sur les données en vol en temps réel avec des agents communicants hétérogènes Aussel, Nicolas 21 June 2019 (has links) Avec l'augmentation du nombre de capteurs et d'actuateurs dans les avions et le développement de liaisons de données fiables entre les avions et le sol, il est devenu possible d'améliorer la sécurité et la fiabilité des systèmes à bord en appliquant des techniques d'analyse en temps réel. Cependant, étant donné la disponibilité limité des ressources de calcul embarquées et le coût élevé des liaisons de données, les solutions architecturelles actuelles ne peuvent pas exploiter pleinement toutes les ressources disponibles, limitant leur précision.Notre but est de proposer un algorithme distribué de prédiction de panne qui pourrait être exécuté à la fois à bord de l'avion et dans une station au sol tout en respectant un budget de communication. Dans cette approche, la station au sol disposerait de ressources de calcul rapides et de données historiques et l'avion disposerait de ressources de calcul limitées et des données de vol actuelles.Dans cette thèse, nous étudierons les spécificités des données aéronautiques et les méthodes déjà existantes pour produire des prédictions de pannes à partir de ces dernières et nous proposerons une solution au problème posé. Notre contribution sera détaillé en trois parties.Premièrement, nous étudierons le problème de prédiction d'événements rares créé par la haute fiabilité des systèmes aéronautiques. Beaucoup de méthodes d'apprentissage en classification reposent sur des jeux de données équilibrés. Plusieurs approches existent pour corriger le déséquilibre d'un jeu de donnée et nous étudierons leur efficacité sur des jeux de données extrêmement déséquilibrés.Deuxièmement, nous étudierons le problème d'analyse textuelle de journaux car de nombreux systèmes aéronautiques ne produisent pas d'étiquettes ou de valeurs numériques faciles à interpréter mais des messages de journaux textuels. Nous étudierons les méthodes existantes basées sur une approche statistique et sur l'apprentissage profond pour convertir des messages de journaux textuels en une forme utilisable en entrée d'algorithmes d'apprentissage pour classification. Nous proposerons notre propre méthode basée sur le traitement du langage naturel et montrerons comment ses performances dépassent celles des autres méthodes sur un jeu de donnée public standard.Enfin, nous offrirons une solution au problème posé en proposant un nouvel algorithme d'apprentissage distribué s'appuyant sur deux paradigmes d'apprentissage existant, l'apprentissage actif et l'apprentissage fédéré. Nous détaillerons notre algorithme, son implémentation et fournirons une comparaison de ses performances avec les méthodes existantes / With the rise of the number of sensors and actuators in an aircraft and the development of reliable data links from the aircraft to the ground, it becomes possible to improve aircraft security and maintainability by applying real-time analysis techniques. However, given the limited availability of on-board computing and the high cost of the data links, current architectural solutions cannot fully leverage all the available resources limiting their accuracy.Our goal is to provide a distributed algorithm for failure prediction that could be executed both on-board of the aircraft and on a ground station and that would produce on-board failure predictions in near real-time under a communication budget. In this approach, the ground station would hold fast computation resources and historical data and the aircraft would hold limited computational resources and current flight's data.In this thesis, we will study the specificities of aeronautical data and what methods already exist to produce failure prediction from them and propose a solution to the problem stated. Our contribution will be detailed in three main parts.First, we will study the problem of rare event prediction created by the high reliability of aeronautical systems. Many learning methods for classifiers rely on balanced datasets. Several approaches exist to correct a dataset imbalance and we will study their efficiency on extremely imbalanced datasets.Second, we study the problem of log parsing as many aeronautical systems do not produce easy to classify labels or numerical values but log messages in full text. We will study existing methods based on a statistical approach and on Deep Learning to convert full text log messages into a form usable as an input by learning algorithms for classifiers. We will then propose our own method based on Natural Language Processing and show how it outperforms the other approaches on a public benchmark.Last, we offer a solution to the stated problem by proposing a new distributed learning algorithm that relies on two existing learning paradigms Active Learning and Federated Learning. We detail our algorithm, its implementation and provide a comparison of its performance with existing methods Apprentissage automatique Analyse temps réel Architecture logicielle répartie Analyse de grands volumes de données Détéction d'anomalies Machine learning Real-time analysis Distributed software architecture Big data analysis Anomaly detection
10	Essays on Business Cycles Fluctuations and Forecasting Methods Pacce, Matías José 03 July 2017 (has links) This doctoral dissertation proposes methodologies which, from a linear or a non-linear approach, accommodate to the information flow and can deal with a large amount of data. The empirical application of the proposed methodologies contributes to answer some of the questions that have emerged or that it has potentiated after the 2008 global crisis. Thus, essential aspects of the macroeconomic analysis are studied, like the identification and forecast of business cycles turning points, the business cycles interactions between countries or the development of tools able to forecast the evolution of key economic indicators based on new data sources, like those which emerge from search engines. Business Cycles Mixed Frequency Data Bayesian Estimation Markov-switching International Business Cycles Spillover effects Tourism Big Data Analysis Time Series Fundamentos del Análisis Económico

Search results