221

Making Sense of Big (Kinematic) Data: A Comprehensive Analysis of Movement Parameters in a Diverse Population

Nunis, Naomi Wilma 01 January 2023 (has links) (PDF)
OBJECTIVE The purpose of this study was to determine how big kinematic data can be evaluated through a computational, comprehensive analysis of movement parameters in a diverse population. METHODS Retrospective data were collected, cleaned, and reviewed for further analysis of biomechanical movement in an active population using 3D collinear resistance loads. Participants ranged in age from 7 to 82 years and reported being active in 13 different sports. Each participant performed a series of exercises across multiple sessions. Exercises were measured and recorded on eight distinct biometric movement parameters: explosiveness, velocity, power, deceleration, braking, consistency, endurance, and range of motion. Analysis and data visualization portrayed how 3D collinear resistance load impacted specific muscles and performance metrics. RESULTS The models with the highest accuracy were Naive Bayes and Fast Large Margin, each at 58.3%, for future predictions of the impact on specific muscles, movement parameters, and performance metric data. The data visualization involved a proof-of-concept human-computer interface and presented each component in relation to the others within the active population database, movement parameters, and performance metrics. DISCUSSION The findings on 3D collinear resistance set a precedent for future development for the active population and for research in the sports analytics field. Additionally, the visual proof-of-concept interface promotes future development for a diverse, active population.
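The classification step described in the results above can be sketched in a few lines. This is a hypothetical illustration only: it trains a Gaussian Naive Bayes classifier (one of the two model families named in the abstract) on synthetic data over the eight movement parameters; the feature values, group structure, and sport labels are invented, not the study's dataset.

```python
# Hedged sketch: Gaussian Naive Bayes on synthetic movement-parameter data.
# All data here are simulated stand-ins for the study's kinematic records.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)
features = ["explosiveness", "velocity", "power", "deceleration",
            "braking", "consistency", "endurance", "range_of_motion"]

# Two synthetic groups with shifted means standing in for two sports.
group_a = rng.normal(loc=0.0, scale=1.0, size=(200, len(features)))
group_b = rng.normal(loc=0.8, scale=1.0, size=(200, len(features)))
X = np.vstack([group_a, group_b])
y = np.array([0] * 200 + [1] * 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = GaussianNB().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

On real kinematic data the reported 58.3% accuracy suggests heavily overlapping classes; the synthetic separation here is chosen only so the pipeline shape is visible.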
222

Data-driven Strategies for Systemic Risk Mitigation and Resilience Management of Infrastructure Projects

Gondia, Ahmed January 2021 (has links)
Public infrastructure systems are crucial components of modern urban communities, as they play major roles in elevating countries’ socio-economic development. However, the inherent complexity and systemic interdependence of infrastructure construction/renewal projects have left sites hindered by multiple forms of performance disruption (e.g., schedule delays, cost overruns, workplace injuries) that result in long-term consequences such as claims, disputes, and stakeholder dissatisfaction. The evolution of advanced data-driven tools (e.g., machine learning and complex network analytics) can play a pivotal role in driving improvements in the management strategies of complex projects due to such tools’ usefulness in applications related to interdependent systems. In this respect, the research presented in this dissertation is aimed at developing data-driven strategies geared towards a resilience-based approach to managing complex infrastructure projects. Such strategies can support project managers and stakeholders with data-informed decision-making to mitigate the impacts of systemic interdependence-induced risks at different levels of their projects. Specifically, the developed data-driven resilience-based strategies can empower decision-makers with the ability to: i) predict potential performance disruptions based on real-time and dynamic project conditions, such that proactive response/mitigation strategies and/or contingencies can be deployed ahead of time; and ii) develop adaptive solutions against potential interdependence-induced cascade project disruptions, such that the most important set of performance targets can be rapidly restored. It is important to note that data-driven strategies and other analytics-based approaches are not proposed herein to replace, but rather to complement, the expertise and sensible judgment of project managers and the capabilities of available analysis tools. 
Specifically, the enriched predictive and analytical insights together with the proactive and rapid adaptation capabilities facilitated by the developed strategies can empower the new paradigm of resilience-guided management of complex dynamic infrastructure projects. / Thesis / Doctor of Philosophy (PhD)
223

Node Centric Community Detection and Evolutional Prediction in Dynamic Networks

Oluwafolake A Ayano (13161288) 27 July 2022 (has links)
Advances in technology have led to the availability of data from different platforms such as the web and social media. Much of this data can be represented in the form of a network consisting of a set of nodes connected by edges, where the nodes represent the items in the network and the edges represent the interactions between them. Community detection methods have been used extensively in analyzing these networks. However, community detection in evolving networks has been a significant challenge because of the frequent changes to the networks and the need for real-time analysis. Static community detection methods are not appropriate for analyzing dynamic networks because they do not retain a network’s history and cannot provide real-time information about the communities in the network.

Existing incremental methods treat changes to the network as a sequence of edge additions and/or removals; however, in many real-world networks, changes occur when a node is added with all of its edges connecting simultaneously.

For efficient and timely processing of such large networks, there is a need for an adaptive analytical method that can process large networks without recomputing the entire network after it evolves and that treats all the edges involved with a node equally.

We propose a node-centric community detection method that incrementally updates the community structure using the already known structure of the network, avoiding recomputation of the entire network from scratch and consequently achieving a high-quality community structure. The results from our experiments suggest that our approach is efficient for incremental community detection in node-centric evolving networks.
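The node-centric idea above can be illustrated with a minimal sketch: when a node arrives with all of its edges at once, it is placed in the community most common among its neighbors instead of re-clustering the whole network. The data structures and tie-breaking below are illustrative assumptions, not the thesis's actual algorithm.

```python
# Hedged sketch of node-centric incremental community assignment.
# A new node with all its edges is labeled by majority vote over its
# neighbors' existing community labels; no global recomputation occurs.
from collections import Counter

def add_node(adjacency, community, new_node, neighbors):
    """Incrementally insert new_node and assign it a community label."""
    adjacency[new_node] = set(neighbors)
    for n in neighbors:
        adjacency.setdefault(n, set()).add(new_node)
    votes = Counter(community[n] for n in neighbors if n in community)
    # Majority vote over neighbor communities; an isolated newcomer
    # starts a fresh community instead.
    if votes:
        community[new_node] = votes.most_common(1)[0][0]
    else:
        community[new_node] = max(community.values(), default=-1) + 1
    return community[new_node]

adjacency = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
community = {1: 0, 2: 0, 3: 1, 4: 1}
label = add_node(adjacency, community, 5, [1, 2, 3])
print(label)  # node 5 joins community 0, the majority among its neighbors
```

A production method would also re-evaluate nearby labels when the vote is close; this sketch only shows why the incremental update avoids touching the rest of the network.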
224

Data governance in big data : How to improve data quality in a decentralized organization / Datastyrning och big data

Landelius, Cecilia January 2021 (has links)
The use of the internet has increased the amount of data available and gathered. Companies are investing in big data analytics to gain insights from this data. However, the value of the analysis, and of the decisions made based on it, depends on the quality of the underlying data. For this reason, data quality has become a prevalent issue for organizations. Additionally, failures in data quality management are often due to organizational aspects. Given the growing popularity of decentralized organizational structures, there is a need to understand how a decentralized organization can improve data quality. This thesis conducts a qualitative single case study of an organization in the logistics industry that is currently shifting towards becoming data driven and struggling with maintaining data quality. The purpose of the thesis is to answer the questions: • RQ1: What is data quality in the context of logistics data? • RQ2: What are the obstacles to improving data quality in a decentralized organization? • RQ3: How can these obstacles be overcome? Several data quality dimensions were identified and categorized as critical issues, issues, and non-issues. From the gathered data, the dimensions completeness, accuracy, and consistency were found to be critical data quality issues. The three most prevalent obstacles to improving data quality were data ownership, data standardization, and understanding the importance of data quality. To overcome these obstacles, the most important measures are creating data ownership structures, implementing data quality practices, and shifting employees to a data-driven mindset. The generalizability of a single case study is low; however, the results of this thesis offer insights and trends that can inform further studies and companies undergoing similar transformations.
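The critical dimensions named above, completeness, accuracy, and consistency, are typically operationalized as simple scores over a table. The sketch below is an assumption-laden toy: the shipment fields and the positive-weight business rule are invented for illustration and are not the case company's actual definitions.

```python
# Hedged sketch: scoring completeness and consistency on a toy logistics
# table. Field names and the validity rule are hypothetical.
records = [
    {"shipment_id": "S1", "weight_kg": 12.0, "origin": "SE", "dest": "SE"},
    {"shipment_id": "S2", "weight_kg": None, "origin": "SE", "dest": "DE"},
    {"shipment_id": "S3", "weight_kg": -4.0, "origin": "DE", "dest": "DE"},
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def consistency(rows, rule):
    """Share of populated rows satisfying a business rule."""
    checked = [r for r in rows if r["weight_kg"] is not None]
    return sum(rule(r) for r in checked) / len(checked)

weight_completeness = completeness(records, "weight_kg")
weight_consistency = consistency(records, lambda r: r["weight_kg"] > 0)
print(weight_completeness, weight_consistency)  # 2/3 complete, 1/2 consistent
```

Scores like these give a decentralized organization a shared, ownable metric per dimension, which is one way the data-ownership measure above can be made concrete.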
225

Big Data Analytics for Assessing Surface Transportation Systems

Jairaj Chetas Desai (12454824) 25 April 2022 (has links)
Most new vehicles manufactured in the last two years are connected vehicles (CVs) that transmit data back to the original equipment manufacturer at near real-time fidelity. These CVs generate billions of data points on an hourly basis, which can provide valuable data to agencies to improve the overall mobility experience for users. However, with this growing scale of CV big data, stakeholders need efficient and scalable methodologies that allow agencies to draw actionable insights from this large-scale data for daily operational use. This dissertation presents a suite of applications, illustrated through case studies, that use CV data for assessing and managing mobility and safety on surface transportation systems.

A systematic review of construction zone CV data and crashes on Indiana’s interstates for the calendar year 2019 found a strong correlation between crashes and the hard-braking events reported by CVs. Trajectory-level CV data analyzed for a construction zone on Interstate 70 provided valuable insights into travel time and traffic signal performance impacts on the surrounding road network. An 11-state analysis of electric and hybrid vehicle usage in proximity to public charging stations highlighted regions underserved and overserved by charging infrastructure, providing quantitative support for infrastructure investment allocations informed by real-world usage trends. CV data were further leveraged to document route choice behavior during active freeway incidents, providing stakeholders with a historical record of observed routing patterns to inform future alternate route planning strategies. CV trajectory data analysis also facilitated the identification of trip-chaining activities, resulting in improved outlier curation and more realistic estimates of travel time metrics.

The overall contribution of this dissertation is the development of analytical big data procedures that process billions of CV data records to inform engineering and public policy investments in infrastructure capacity, highway safety improvements, and new EV infrastructure. These scalable and efficient analysis techniques will help agencies at the federal, state, and local levels, in addition to private sector stakeholders, assess transportation system performance at scale and enable informed, data-driven decision making.
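The crash versus hard-braking correlation mentioned above is, at its core, a correlation between two monthly count series. The sketch below uses invented numbers to show the shape of that check; real analyses would use agency crash records and CV telemetry, and the dissertation's actual statistic may differ.

```python
# Hedged sketch: Pearson correlation between monthly CV hard-braking
# event counts and crash counts for a hypothetical work zone.
# All values are invented for illustration.
import numpy as np

hard_braking = np.array([120, 340, 510, 290, 150, 430])   # events/month
crashes      = np.array([  3,   9,  14,   7,   4,  11])   # crashes/month

r = np.corrcoef(hard_braking, crashes)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A high r on such series is what would motivate using hard-braking data as a near-real-time proxy for crash risk, since CV events arrive continuously while crash reports lag by weeks or months.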
226

Predicting the Effects of Sedative Infusion on Acute Traumatic Brain Injury Patients

McCullen, Jeffrey Reynolds 09 April 2020 (has links)
Healthcare analytics has traditionally relied upon linear and logistic regression models to address clinical research questions, mostly because they produce highly interpretable results [1, 2]. These results contain valuable statistics such as p-values, coefficients, and odds ratios that provide healthcare professionals with knowledge about the significance of each covariate and exposure for predicting the outcome of interest [1]. Thus, they are often favored over newer deep learning models that are generally more accurate but less interpretable and scalable. However, the statistical power of linear and logistic regression is contingent upon satisfying modeling assumptions, which usually requires altering or transforming the data, thereby hindering interpretability. Generalized additive models are useful for overcoming this limitation while still preserving interpretability and accuracy. The major research question in this work is whether particular sedative agents (fentanyl, propofol, Versed, Ativan, and Precedex) are associated with different discharge dispositions for patients with acute traumatic brain injury (TBI). To address this, we compare the effectiveness of various models (traditional linear regression (LR), generalized additive models (GAMs), and deep learning) in providing guidance for sedative choice. We evaluated the performance of each model using metrics for accuracy, interpretability, scalability, and generalizability. Our results show that the deep learning models were the most accurate, while the traditional LR and GAM models maintained better interpretability and scalability. The GAMs provided enhanced interpretability through pairwise interaction heat maps and generalized well to other domains and class distributions, since they do not require satisfying the modeling assumptions used in LR. 
By evaluating the model results, we found that Versed was associated with better discharge dispositions while Ativan was associated with worse discharge dispositions. We also identified other significant covariates, including age, the Northeast region, the Acute Physiology and Chronic Health Evaluation (APACHE) score, the Glasgow Coma Scale (GCS), and ethanol level. The versatility of Versed may account for its association with better discharge dispositions, while Ativan may have negative effects when used to facilitate intubation. Additionally, most of the significant covariates pertain to the clinical state of the patient (APACHE, GCS, etc.), whereas most non-significant covariates were demographic (gender, ethnicity, etc.). Though we found that deep learning slightly improved over LR and generalized additive models after fine-tuning the hyperparameters, the deep learning results were less interpretable and therefore not ideal for making the aforementioned clinical insights. However, deep learning may be preferable in cases with greater complexity and more data, particularly in situations where interpretability is not as critical. Further research is necessary to validate our findings, investigate alternative modeling approaches, and examine other outcomes and exposures of interest. / Master of Science / Patients with traumatic brain injury (TBI) often require sedative agents to facilitate intubation and to prevent further brain injury by reducing anxiety and decreasing level of consciousness. It is important for clinicians to choose the sedative that is most conducive to optimizing patient outcomes. Hence, the purpose of our research is to provide guidance to aid this decision. Additionally, we compare different modeling approaches to provide insights into their relative strengths and weaknesses. 
To achieve this goal, we investigated whether exposure to particular sedatives (fentanyl, propofol, Versed, Ativan, and Precedex) was associated with different hospital discharge locations for patients with TBI. From best to worst, these discharge locations are home, rehabilitation, nursing home, remains hospitalized, and death. Our results show that Versed was associated with better discharge locations and Ativan with worse discharge locations. The fact that Versed is often used for alternative purposes may account for its association with better discharge locations. Further research is necessary to investigate this and the possible negative effects of using Ativan to facilitate intubation. We also found that other variables influencing discharge disposition include age, the Northeast region, and variables pertaining to the clinical state of the patient (severity-of-illness metrics, etc.). By comparing the different modeling approaches, we found that the new deep learning methods were difficult to interpret but provided a slight improvement in performance after optimization. Traditional methods such as linear regression allowed us to interpret the model output and make the aforementioned clinical insights. However, generalized additive models (GAMs) are often more practical because they can better accommodate other class distributions and domains.
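The interpretability argument above rests on the fact that regression coefficients exponentiate into odds ratios. The sketch below shows that mechanic on simulated data; it is not the study's model, and the covariates, effect sizes, and outcome encoding are invented assumptions.

```python
# Hedged sketch: logistic regression on simulated TBI-style data to show
# how coefficients become odds ratios. Data and effects are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=1)
n = 500
age = rng.normal(55, 15, n)
gcs = rng.integers(3, 16, n).astype(float)       # Glasgow Coma Scale
# Simulated ground truth: higher GCS -> better odds of a good disposition.
logit = -2.0 + 0.25 * gcs - 0.01 * age
p = 1 / (1 + np.exp(-logit))
good_disposition = rng.random(n) < p

X = np.column_stack([age, gcs])
model = LogisticRegression(max_iter=1000).fit(X, good_disposition)
odds_ratios = np.exp(model.coef_[0])
print(dict(zip(["age", "gcs"], odds_ratios.round(3))))
```

An odds ratio above 1 for GCS reads directly as "each additional GCS point multiplies the odds of a good disposition by that factor", which is the kind of clinician-facing statement a deep network cannot offer as readily.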
227

Tillförlitlighet hos Big Social Data : En fallstudie om upplevd problematik kopplat till beslutfattande i en organisationskontext / Reliability of Big Social Data: A case study on perceived problems connected to decision making in an organisational context

Rangnitt, Eric, Wiljander, Louise January 2020 (has links)
The growing use of social media generates enormous amounts of online social data, called Big Social Data (BSD). Previous research highlights that BSD often lacks the reliability needed as a basis for decision making, and that reliability is strongly connected to data quality and information quality. However, there is a lack of research focused on practitioners’ perspectives on this matter. To address this gap, this study investigates what is perceived as problematic when transforming BSD into reliable information for decision making in an organisational context, and how this differs between theory and practice. A case study was conducted of the software company SAS Institute (SAS). Data collection was done through interviews and gathering of documents, and the results were analysed qualitatively. The study made many interesting findings regarding perceived problems connected to the transformation of BSD, e.g. a high risk of biased data and low analytics maturity, as well as several differences between theory and practice. Furthermore, previous research makes no distinction between the terms data quality and information quality, but this distinction is made in practice.
228

透過Spark平台實現大數據分析與建模的比較:以微博為例 / Accomplish Big Data Analytic and Modeling Comparison on Spark: Weibo as an Example

潘宗哲, Pan, Zong Jhe Unknown Date (has links)
The rapid growth and change of data, together with ever-evolving analysis tools, increase the challenge of data analytics. Through a complete machine learning pipeline, this thesis offers a reference blueprint for academia and companies considering the adoption of big data analytics. We use Apache Spark as the big data computing framework and build machine learning models with the two MLlib packages, Spark.ml and Spark.mllib, to address problems that can arise in traditional data analysis. Along the pipeline we compare which Spark modules suit which situations, first developing on a local cluster and then submitting jobs to Amazon EC2 clusters to accelerate modelling and analysis. The pipeline is demonstrated on Weibo, using the 2012 mainland-China Weibo dataset provided by the Journalism and Media Studies Centre of the University of Hong Kong. We use RDDs, Spark SQL, and GraphX to extract feature values from Weibo users’ posts, and build a Random Forest prediction model for the binary classification of whether a user is officially verified.
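The prediction task above, a binary verified/unverified label learned from post-derived features with a random forest, can be sketched compactly. To keep the sketch self-contained it uses scikit-learn in place of Spark MLlib (the pipeline shape is the same), and the features and data are invented stand-ins for the Weibo-derived ones.

```python
# Hedged sketch: random forest for a verified-account label, with
# scikit-learn standing in for Spark MLlib. Features and data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=2)
n = 400
posts_per_day = rng.exponential(2.0, n)
followers     = rng.lognormal(6.0, 2.0, n)
mentions      = rng.poisson(1.5, n).astype(float)
# Simulated label: verified accounts tend to have many followers.
verified = (followers + rng.normal(0, 200, n)) > 1500

X = np.column_stack([posts_per_day, followers, mentions])
X_tr, X_te, y_tr, y_te = train_test_split(X, verified, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

In the Spark version the same steps map onto a `VectorAssembler` plus `RandomForestClassifier` stage in a Spark.ml `Pipeline`, which is what lets the job scale out to an EC2 cluster.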
229

Security Analytics: Using Deep Learning to Detect Cyber Attacks

Lambert, Glenn M, II 01 January 2017 (has links)
Security attacks are becoming more prevalent as cyber attackers exploit system vulnerabilities for financial gain. The resulting loss of revenue and reputation can have deleterious effects on governments and businesses alike. Signature recognition and anomaly detection are the most common security detection techniques in use today. These techniques provide a strong defense; however, they fall short of detecting complicated or sophisticated attacks. Recent literature suggests using security analytics to differentiate between normal and malicious user activities. The goal of this research is to develop a repeatable process for detecting cyber attacks that is fast, accurate, comprehensive, and scalable. A model was developed and evaluated using several production log files provided by the University of North Florida Information Technology Security department. This model uses security analytics to complement existing security controls and detect suspicious user activity occurring in real time by applying machine learning algorithms to multiple heterogeneous server-side log files. The process is linearly scalable and comprehensive; as such, it can be applied to any enterprise environment. The process is composed of three steps. The first step is data collection and transformation, which involves identifying the source log files and selecting a feature set from those files. The resulting feature set is then transformed into a time series dataset using a sliding time window representation. Each instance of the dataset is labeled as green, yellow, or red using three different unsupervised learning methods, one of which is Partitioning Around Medoids (PAM). The final step uses deep learning to train and evaluate the model used for detecting abnormal or suspicious activities. Experiments using datasets of varying sizes and time granularities resulted in very high accuracy and performance. 
Training and testing the model was surprisingly fast, even for large datasets. This is the first research paper to develop a model that detects cyber attacks using security analytics; hence this research builds a foundation on which future work in this subject area can expand.
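The sliding-time-window transformation in the first step above can be sketched as follows: raw timestamped log events are grouped into fixed-width windows and each window becomes one feature vector of per-type event counts. The window width, event types, and log contents below are assumptions for illustration, not the study's actual configuration.

```python
# Hedged sketch: turning a raw event log into per-window count vectors,
# the representation the abstract's detector would then label.
from collections import Counter

events = [  # (timestamp_seconds, event_type) from a hypothetical log
    (1, "login"), (2, "login"), (3, "error"), (11, "login"),
    (12, "error"), (13, "error"), (14, "error"), (21, "login"),
]

def windowize(events, width=10):
    """Group events into consecutive fixed-width windows, counting types."""
    windows = {}
    for ts, kind in events:
        windows.setdefault(ts // width, Counter())[kind] += 1
    return {w: dict(c) for w, c in sorted(windows.items())}

vectors = windowize(events)
print(vectors)
# {0: {'login': 2, 'error': 1}, 1: {'login': 1, 'error': 3}, 2: {'login': 1}}
```

A window with an unusual count profile (like window 1's error burst here) is exactly the kind of instance the unsupervised labeling step would flag yellow or red before the deep learning stage ever sees it.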
230

Reliable Information Exchange in IIoT : Investigation into the Role of Data and Data-Driven Modelling

Lavassani, Mehrzad January 2018 (has links)
The concept of the Industrial Internet of Things (IIoT) is the tangible building block for the realisation of the fourth industrial revolution. It should improve the productivity, efficiency, and reliability of industrial automation systems, leading to revenue growth in industrial scenarios. IIoT needs to encompass various disciplines and technologies to constitute an operable and harmonious system. One essential requirement for a system to exhibit such behaviour is reliable exchange of information. In industrial automation, the information life-cycle starts at the field level, with data collected by sensors, and ends at the enterprise level, where that data is processed into knowledge for business decision making. In IIoT, the process of knowledge discovery is expected to start in the lower layers of the automation hierarchy and to cover the data exchange between connected smart objects performing collaborative tasks. This thesis aims to aid comprehension of the processes of information exchange in IIoT-enabled industrial automation: in particular, how reliable exchange of information can be performed by communication systems at the field level given an underlying wireless sensor technology, and how data analytics can complement the processes at various levels of the automation hierarchy. Furthermore, this work explores how an IIoT monitoring system can be designed and developed. Communication reliability is addressed by proposing a redundancy-based medium access control protocol for mission-critical applications and analysing its performance regarding real-time and deterministic delivery. The importance of data and the benefits of data analytics for various levels of the automation hierarchy are examined by suggesting data-driven methods for visualisation, centralised system modelling, and distributed data stream modelling. 
The design and development of an IIoT monitoring system are addressed by proposing a novel three-layer framework that incorporates wireless sensor, fog, and cloud technologies. Moreover, an IIoT testbed system is developed to realise the proposed framework. The outcome of this study suggests that redundancy-based mechanisms improve communication reliability. However, they can also introduce drawbacks, such as poor link utilisation and limited scalability, in the context of IIoT. Data-driven methods result in enhanced readability of visualisation, and reduced necessity of the ground truth in system modelling. The results illustrate that distributed modelling can lower the negative effect of the redundancy-based mechanisms on link utilisation, by reducing the up-link traffic. Mathematical analysis reveals that introducing fog layer in the IIoT framework removes the single point of failure and enhances scalability, while meeting the latency requirements of the monitoring application. Finally, the experiment results show that the IIoT testbed works adequately and can serve for the future development and deployment of IIoT applications. / SMART (Smarta system och tjänster för ett effektivt och innovativt samhälle)
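The trade-off noted above, that redundancy improves delivery but hurts link utilisation, follows from a simple probability argument: with k independent copies of a frame, delivery fails only if all k attempts fail, while channel usage grows linearly in k. The per-attempt success probability below is an assumed figure, not a measurement from the thesis.

```python
# Hedged back-of-envelope sketch of the redundancy/utilisation trade-off:
# k redundant copies give 1 - (1 - p)^k delivery probability at k-fold
# channel usage. p is an assumed single-attempt delivery probability.
def delivery_probability(p_attempt, k):
    """Probability that at least one of k redundant copies arrives."""
    return 1 - (1 - p_attempt) ** k

p = 0.9  # assumed per-attempt delivery probability
for k in (1, 2, 3):
    print(f"k={k}: delivery={delivery_probability(p, k):.4f}, "
          f"channel usage x{k}")
```

The diminishing returns are visible immediately: the second copy buys nine percentage points of reliability, the third only 0.9, while each copy costs a full extra slot, which is why the thesis's distributed modelling (reducing up-link traffic) matters for keeping redundancy affordable.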
