About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
601

Assessing Anonymized System Logs Usefulness for Behavioral Analysis in RNN Models

Vargis, Tom Richard, Ghiasvand, Siavash 06 August 2024 (has links)
Tom Richard Vargis (1, *), Siavash Ghiasvand (1, 2)
(1) Technische Universität Dresden, Germany; (2) Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Germany
Abstract: System logs are a common source of monitoring data for analyzing the behavior of computing systems. Due to the complexity of modern computing systems and the large volume of collected monitoring data, automated analysis mechanisms are required. Numerous machine learning and deep learning methods have been proposed to address this challenge. However, because system logs contain sensitive data, their analysis and storage raise serious privacy concerns. Anonymization methods can be used to cleanse the monitoring data before analysis. However, anonymized system logs are generally not useful enough for most behavioral analyses. Content-aware anonymization mechanisms such as PαRS preserve the correlations within system logs even after anonymization. This work evaluates the usefulness of system logs from the Taurus HPC cluster, anonymized using PαRS, for behavioral analysis via recurrent neural network models. To facilitate the reproducibility and further development of this work, the implemented prototype and monitoring data are publicly available [12].
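PαRS itself is not specified in this record, so the following is only a minimal sketch of the general idea behind content-aware anonymization, assuming a keyed-hash pseudonymization scheme. Each sensitive token (here, hypothetically, IPv4 addresses and the value after `user=`) is replaced by a pseudonym derived with HMAC-SHA256, so the same user or host maps to the same pseudonym in every log line. That consistency is what keeps cross-line correlations learnable by a sequence model such as an RNN. The patterns, the key, and the `anon_` prefix are illustrative assumptions, not part of PαRS.

```python
import hashlib
import hmac
import re

# Hypothetical sensitive-field patterns: IPv4 addresses and the value after "user=".
PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    re.compile(r"(?<=user=)\w+"),
]


def pseudonym(token: str, key: bytes) -> str:
    """Derive a stable pseudonym for a token with HMAC-SHA256 under a secret key."""
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon_" + digest[:10]


def anonymize(line: str, key: bytes) -> str:
    """Replace each sensitive match; identical tokens always get identical pseudonyms."""
    for pattern in PATTERNS:
        line = pattern.sub(lambda m: pseudonym(m.group(0), key), line)
    return line


if __name__ == "__main__":
    key = b"site-secret"  # hypothetical key; never publish it alongside the logs
    for entry in [
        "sshd: accepted login user=alice from 10.0.0.7",
        "slurmd: job 42 started user=alice",
        "sshd: accepted login user=bob from 10.0.0.9",
    ]:
        print(anonymize(entry, key))
```

Because the pseudonym depends only on the token and the key, "alice" receives the same replacement in the login line and the job line, while the raw name never appears in the output.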
602

Machine Learning Algorithms to Study Multi-Modal Data for Computational Biology

Ahmed, Khandakar Tanvir 01 January 2024 (has links) (PDF)
Advancements in high-throughput technologies have led to an exponential increase in the generation of multi-modal data in computational biology. These datasets, comprising diverse biological measurements such as genomics, transcriptomics, proteomics, metabolomics, and imaging data, offer a comprehensive view of biological systems at various levels of complexity. However, integrating and analyzing such heterogeneous data presents significant challenges due to differences in data modalities, scales, and noise levels. Another challenge for multi-modal analysis is the complex interaction network that the modalities share. Understanding the intricate interplay between different biological modalities is essential for unraveling the mechanisms underlying complex biological processes, including disease pathogenesis, drug response, and cellular function. Machine learning algorithms have emerged as indispensable tools for studying multi-modal data in computational biology, enabling researchers to extract meaningful insights, identify biomarkers, and predict biological outcomes. In this dissertation, we first propose a multi-modal integration framework that takes two interconnected data modalities and their interaction network and iteratively updates the modalities into new representations with better predictive ability for disease outcomes. This deep learning-based model underscores the importance of, and the performance gains achieved by, incorporating network information into the integration process. Additionally, a multi-modal framework is developed to estimate protein expression from mRNA and microRNA (miRNA) expression, along with the mRNA-miRNA interaction network. The proposed network propagation model simulates in vivo miRNA regulation of mRNA translation, offering a cost-effective alternative to experimental protein quantification.
Analysis reveals that predicted protein expression correlates more strongly with ground-truth protein expression than mRNA expression does. Moreover, the effectiveness of integrative models depends on the quality of the input data modalities and the completeness of the interaction networks; missing values and network noise adversely affect downstream tasks. To address these challenges, two multi-modal imputation models are proposed for imputing missing values in time series data. The first model imputes missing values in time series gene expression using single nucleotide polymorphism (SNP) data for children at high risk of type 1 diabetes. The imputed gene expression allows us to predict progression towards type 1 diabetes at birth with a six-year prediction horizon. A follow-up study then introduces a generalized multi-modal imputation framework capable of imputing missing values in time series data using either another time series or cross-sectional data collected from the same set of samples. These models excel at imputation whether values are randomly missing or an entire time step in the series is absent. Moreover, by leveraging the additional modality, they can estimate a completely missing time series without any prior values. Finally, to mitigate noise in the interaction network, a link prediction framework for drug-target interaction prediction is developed. This study demonstrates exceptional performance in cold-start predictions and investigates the efficacy of large language models for such predictions. Through a comprehensive review and evaluation of state-of-the-art algorithms, this dissertation aims to provide researchers with valuable insights, methodologies, and tools for harnessing the rich information embedded within multi-modal biological datasets.
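To make the idea of network-based protein estimation concrete, here is a toy, hand-written sketch; it is not the dissertation's learned propagation model. It assumes that every miRNA linked to a gene in the interaction network multiplicatively represses that gene's translation, with a hypothetical repression strength `alpha`. Genes without miRNA links keep their mRNA level as the protein estimate.

```python
import numpy as np


def estimate_protein(mrna, mirna, interactions, alpha=0.5):
    """Toy protein estimate from mRNA, miRNA, and an mRNA-miRNA interaction matrix.

    mrna:         (n_genes,) mRNA expression
    mirna:        (n_mirnas,) miRNA expression
    interactions: (n_genes, n_mirnas) 0/1 matrix, 1 if the miRNA targets the gene
    alpha:        hypothetical repression strength
    """
    mrna = np.asarray(mrna, dtype=float)
    load = np.asarray(interactions, dtype=float) @ np.asarray(mirna, dtype=float)
    # Each gene's translation is damped by the total expression of its targeting miRNAs.
    return mrna * np.exp(-alpha * load)


if __name__ == "__main__":
    links = np.array([[0, 0], [1, 1], [0, 1]])  # gene 0 untargeted, gene 1 targeted by both
    print(estimate_protein([2.0, 2.0, 1.0], [1.0, 3.0], links))
```

A learned model would replace the fixed exponential damping with trainable weights, but the data flow (mRNA in, miRNA signal propagated along network edges, protein estimate out) is the same.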
603

Estimating partial group delay

Zhang, Nien-fan January 1985 (has links)
Partial group delay is a spectral parameter that measures the time lag between two time series in a system after the spurious effects of the other series in the system have been eliminated. For weakly stationary processes, estimators for partial group delay are proposed based on indirect and direct approaches. Conditions for weak consistency and asymptotic normality of the proposed estimators are obtained. Applications to a multiple test of partial group delay are investigated. The time lag interpretation of partial group delay is justified, which provides insight into the nature of linear relationships among weakly stationary processes. Extensions are made to group delay estimation and partial group delay estimation for non-stationary "oscillatory" processes. / Ph. D.
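As a hedged illustration of the quantity being estimated (not the estimators proposed in the dissertation), the sketch below computes a partial cross-spectrum from segment-averaged periodograms, S_xy|z = S_xy - S_xz S_zy / S_zz, and takes the group delay as the negative slope of its unwrapped phase with respect to angular frequency. It assumes a constant delay, so a single straight-line fit summarizes what is in general a frequency-dependent group delay. The segment length, the absence of windowing and overlap, and the simulated series are illustrative choices.

```python
import numpy as np


def cross_spectrum(a, b, nperseg):
    """Average of conj(FFT(a)) * FFT(b) over non-overlapping, unwindowed segments."""
    nseg = len(a) // nperseg
    fa = np.fft.rfft(np.reshape(a[: nseg * nperseg], (nseg, nperseg)), axis=1)
    fb = np.fft.rfft(np.reshape(b[: nseg * nperseg], (nseg, nperseg)), axis=1)
    return np.mean(np.conj(fa) * fb, axis=0)


def partial_group_delay(x, y, z, nperseg=256):
    """Delay of y relative to x (in samples) after removing the linear effect of z."""
    s_xy = cross_spectrum(x, y, nperseg)
    s_xz = cross_spectrum(x, z, nperseg)
    s_zy = cross_spectrum(z, y, nperseg)
    s_zz = cross_spectrum(z, z, nperseg).real
    partial = s_xy - s_xz * s_zy / s_zz           # partial cross-spectrum of x, y given z
    omega = 2 * np.pi * np.fft.rfftfreq(nperseg)   # angular frequency, radians per sample
    phase = np.unwrap(np.angle(partial))
    # Group delay is -d(phase)/d(omega); a straight-line fit averages it over frequency.
    return -np.polyfit(omega, phase, 1)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2**16
    e, z, noise = rng.standard_normal((3, n))
    x = e + z                                  # x shares the series z with y
    y = np.roll(e, 3) + z + 0.1 * noise        # y lags e by 3 samples, z enters undelayed
    print(partial_group_delay(x, y, z))
```

In the simulated check, y lags the e component of x by three samples but also contains z with no lag; conditioning on z removes that shared, undelayed path, so the partial estimate should come out close to 3.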
604

Empirical Bayes methods in time series analysis

Khoshgoftaar, Taghi M. January 1982 (has links)
In repeated experiments of a similar type, where the parameters vary randomly from experiment to experiment, the empirical Bayes method often leads to estimators with smaller mean squared errors than the classical estimators. Suppose there is an unobservable random variable θ with distribution G(θ), usually called the prior distribution. In general, the Bayes estimator of θ cannot be obtained unless G(θ) is known. The empirical Bayes method does not assume that G(θ) is known; instead, the sequence of past estimates is used to estimate θ. This dissertation derives empirical Bayes estimates of various time series parameters: the autoregressive model, the moving average model, the mixed autoregressive-moving average model, regression with time series errors, regression with unobservable variables, serial correlation, multiple time series, and the spectral density function. In each case, empirical Bayes estimators are obtained using the asymptotic distributions of the usual estimators. A Monte Carlo simulation showed that the empirical Bayes estimator of the first-order autoregressive parameter, ρ, has a smaller mean squared error than the conditional maximum likelihood estimator when 11 past experiments are used. / Doctor of Philosophy
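A minimal parametric empirical Bayes sketch for the first-order autoregressive parameter, assuming a normal prior for ρ and the usual asymptotic sampling variance (1 - ρ²)/n of the conditional maximum likelihood estimator. The prior mean and variance are estimated by the method of moments from the estimates of repeated experiments, and each estimate is then shrunk toward the pooled mean. This is one common textbook variant, not necessarily the estimator derived in the dissertation, and the simulation settings (200 experiments of length 30, true ρ drawn from a normal distribution with mean 0.5 and standard deviation 0.1) are hypothetical.

```python
import numpy as np


def ar1_cmle(x):
    """Conditional MLE (least squares) of rho in x_t = rho * x_{t-1} + e_t."""
    x = np.asarray(x, dtype=float)
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])


def empirical_bayes_ar1(estimates, n):
    """Shrink AR(1) estimates from repeated experiments toward their pooled mean.

    Assumes rho ~ Normal(mu, tau^2) across experiments and
    rho_hat ~ Normal(rho, (1 - rho^2) / n) within each experiment.
    """
    est = np.asarray(estimates, dtype=float)
    clipped = np.clip(est, -0.99, 0.99)                  # keep sampling variances positive
    samp_var = (1.0 - clipped**2) / n
    mu = est.mean()
    tau2 = max(est.var(ddof=1) - samp_var.mean(), 0.0)   # method-of-moments prior variance
    weight = tau2 / (tau2 + samp_var)                    # 0 means full shrinkage to mu
    return weight * est + (1.0 - weight) * mu


def simulate_ar1(rho, n, rng):
    """Simulate an AR(1) path of length n started at zero with unit-variance noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 30, 200
    rhos = rng.normal(0.5, 0.1, m)
    est = np.array([ar1_cmle(simulate_ar1(r, n, rng)) for r in rhos])
    eb = empirical_bayes_ar1(est, n)
    print("MSE CMLE:", np.mean((est - rhos) ** 2))
    print("MSE EB:  ", np.mean((eb - rhos) ** 2))
```

The shrinkage weight grows with the estimated spread of the true parameters relative to the sampling noise, so experiments that genuinely differ are shrunk less.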
605

Enhancing Computational Efficiency in Anomaly Detection with a Cascaded Machine Learning Model

Yu, Teng-Sung January 2024 (has links)
This thesis presents and evaluates a new cascading machine learning framework for anomaly detection, a task essential for modern industrial applications where computing efficiency is crucial. Traditional deep learning algorithms are often difficult to deploy effectively in edge computing because of limited processing power and memory. This study addresses the challenge by creating a cascading model framework that strategically combines lightweight and more complex models to improve inference efficiency while maintaining detection accuracy. We propose a cascading model framework consisting of a One-Class Support Vector Machine (OCSVM) for rapid initial anomaly detection and a Variational Autoencoder (VAE) for more precise prediction in uncertain cases. The cascade between the OCSVM and the VAE enables the system to handle regular data instances efficiently, assigning more complex analyses only when required. The framework was tested in real-world scenarios, including anomaly detection in air pressure systems in the automotive industry, as well as on the MNIST dataset. These tests demonstrate the framework's practical applicability and effectiveness across diverse settings, underscoring its potential for broad implementation in industrial applications.
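The routing logic of such a cascade can be sketched independently of the two models. In this sketch, which is assumed rather than taken from the thesis, the cheap first stage returns a score where higher means more normal (as an OCSVM decision value would), confident scores are decided immediately, and only scores inside a hypothetical uncertainty band (`low`, `high`) are escalated to the expensive second stage, for example a VAE reconstruction-error check.

```python
from typing import Callable, List, Sequence, Tuple


def cascade_detect(
    samples: Sequence[float],
    fast_score: Callable[[float], float],
    slow_is_anomaly: Callable[[float], bool],
    low: float,
    high: float,
) -> Tuple[List[bool], int]:
    """Two-stage anomaly detection; returns (is_anomaly labels, number escalated).

    Scores >= high are accepted as normal, scores <= low are flagged as anomalous,
    and only the uncertain band (low, high) is passed to the expensive model.
    """
    labels: List[bool] = []
    escalated = 0
    for s in samples:
        score = fast_score(s)
        if score >= high:
            labels.append(False)           # confidently normal: stop at stage one
        elif score <= low:
            labels.append(True)            # confidently anomalous: stop at stage one
        else:
            escalated += 1
            labels.append(slow_is_anomaly(s))  # uncertain: defer to stage two
    return labels, escalated


if __name__ == "__main__":
    # Toy stand-ins: distance from 0 as the cheap score, a stricter rule as the "slow" model.
    labels, n_slow = cascade_detect(
        [0.0, 0.5, 2.0, 3.0, 10.0],
        fast_score=lambda x: -abs(x),
        slow_is_anomaly=lambda x: abs(x) > 2.5,
        low=-4.0,
        high=-1.0,
    )
    print(labels, n_slow)
```

The width of the band trades inference cost against accuracy: a wider band sends more samples to the expensive model.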
606

Towards a Polyalgorithm for Land Use and Land Cover Change Detection

Saxena, Rishu 23 February 2018 (has links)
Earth observation satellites (EOS) such as Landsat provide image datasets that can be immensely useful in numerous application domains. One way of analyzing satellite images for land use and land cover change (LULCC) is time series analysis (TSA). Several algorithms for time series analysis have been proposed by various groups in remote sensing; more algorithms (that can be adapted) are available in the general time series literature. However, despite this abundance of algorithms, which algorithm to use for analyzing an image stack is presently an open question. A concurrent issue is the prohibitive size of Landsat datasets, currently of the order of petabytes and growing, which makes them computationally unwieldy in both storage and processing. An EOS image stack typically consists of multiple images of a fixed area on the Earth's surface (same latitudes and longitudes) taken at different time points. Experiments on multicore servers indicate that carrying out meaningful time series analysis on one such interannual, multitemporal stack with existing state-of-the-art codes can take several days. This work proposes analyzing a given image stack with multiple algorithms in a polyalgorithmic framework. A polyalgorithm combines several basic algorithms, each meant to solve the same problem, into a strategy that unites the strengths and circumvents the weaknesses of the constituent algorithms. The foundation of the proposed TSA-based polyalgorithm is laid using three algorithms (LandTrendR, EWMACD, and BFAST). These algorithms are precisely described mathematically and were chosen to be fundamentally distinct from each other in design and in the phenomena they capture. Analysis of results representing success, failure, and parameter sensitivity for each algorithm is presented. Scalability issues, important for real simulations, are also discussed, along with scalable implementations and speedup results.
For a given pixel, the Hausdorff distance is used to measure the distance between the change times (breakpoints) obtained from two different algorithms. Timesync validation data, a dataset based on human interpretation of Landsat time series in concert with historical aerial photography, is used for validation. The polyalgorithm yields more accurate results than EWMACD and LandTrendR alone but, counterintuitively, is not better than BFAST alone. This nascent work will be directly useful in land use and land cover change studies, of interest to terrestrial science research, especially regarding anthropogenic impacts on the environment, and in much broader applications such as health monitoring and urban transportation. / M. S. / Numerous man-made satellites orbiting the Earth regularly take pictures (images) of the Earth’s surface from above. These images naturally provide information about the land cover of any given piece of land at the moment of capture (e.g., whether the land area in the picture is covered with forest, agriculture, or housing). Therefore, for a fixed land area, a person looking at a chronologically arranged series of images can identify any significant changes in land use. Identifying such changes is of critical importance, especially in an era when deforestation, urbanization, and global warming are major concerns. The goal of this thesis is to investigate the design of methodologies (algorithms) that can efficiently and accurately use satellite images to answer questions about land cover trends and change. Experience shows that state-of-the-art methodologies produce great results for the regions they were originally designed for, but their performance on other regions is unpredictable. In this work, therefore, a ‘polyalgorithm’ is proposed. A ‘polyalgorithm’ utilizes multiple simple methodologies and strategically combines them so that the outcome is better than that of the individual components.
In this introductory work, three component methodologies are utilized; each is capable of capturing phenomena different from those captured by the other two. The mathematical formulation of each component methodology is presented, and an initial strategy for combining the three component algorithms is proposed. The outcomes of each component methodology as well as of the polyalgorithm are tested on human-interpreted data. The strengths and limitations of each methodology are also discussed. The efficiency of the code used to implement the polyalgorithm is also discussed; this matters because the satellite data that needs to be processed is known to be huge (already petabytes in size and growing). This nascent work will be directly useful especially in understanding the impact of human activities on the environment. It will also be useful in other applications such as health monitoring and urban transportation.
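The per-pixel breakpoint comparison described in this record can be written directly from the definition of the Hausdorff distance between two finite sets: the largest distance from any point in one set to its nearest neighbor in the other, taken in both directions. The function name and the choice to reject empty breakpoint sets are assumptions of this sketch; the record does not say how pixels with no detected change are handled.

```python
from typing import Sequence


def hausdorff(a: Sequence[float], b: Sequence[float]) -> float:
    """Hausdorff distance between two non-empty sets of breakpoint times."""
    if not a or not b:
        raise ValueError("both breakpoint sets must be non-empty")

    def directed(p: Sequence[float], q: Sequence[float]) -> float:
        # Worst case, over points in p, of the distance to the nearest point in q.
        return max(min(abs(x - y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    # Hypothetical breakpoint years detected by two algorithms for one pixel.
    print(hausdorff([2001, 2005], [2002]))
```

Here 2001 is one year from 2002, but 2005 is three years from it, so the distance is 3: a single unmatched breakpoint dominates the score, which makes this metric strict about spurious or missed changes.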
607

Eavesdropping-Driven Profiling Attacks on Encrypted WiFi Networks: Unveiling Vulnerabilities in IoT Device Security

Alwhbi, Ibrahim A 01 January 2024 (has links) (PDF)
This dissertation investigates the privacy implications of WiFi communication in Internet-of-Things (IoT) environments, focusing on the threat posed by out-of-network observers. Recent research has shown that in-network observers can glean information about IoT devices, user identities, and activities. However, the potential for information inference by out-of-network observers, who do not have WiFi network access, has not been thoroughly examined. The first study provides a detailed summary dataset and uses Random Forest to classify the summarized data. This study highlights the significant privacy threat that out-of-network observers pose to WiFi networks and IoT applications. Building on this investigation, the second study extends the research using a new set of monitored WiFi data frames organized as time series, together with advanced machine learning algorithms, specifically XGBoost, for time series classification. This extension achieved accuracy of up to 94% in identifying IoT devices and their working status, profiling IoT devices faster while maintaining classification accuracy. Furthermore, the study underscores how easily outside intruders can harm IoT devices without joining a WiFi network, launching attacks quickly and leaving no detectable footprints. Additionally, the dissertation presents a comprehensive survey of recent advancements in machine-learning-driven encrypted traffic analysis and classification. Because encryption hampers traditional packet and traffic inspection, understanding and classifying encrypted traffic is crucial. The survey provides insights into using machine learning for encrypted network traffic analysis and classification, reviewing state-of-the-art techniques and methodologies.
This survey serves as a valuable resource for network administrators, cybersecurity professionals, and policy enforcement entities, offering insights into current practices and future directions in encrypted traffic analysis and classification.
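An out-of-network observer sees only frame metadata such as timing and size, so a natural first step is to turn sniffed frames into fixed-window time series features for a classifier (XGBoost in the dissertation). The feature set below (frame count, mean size, size standard deviation, mean inter-arrival time) and the one-second window are hypothetical choices for illustration, not the features used in the study. Windows with no frames are simply skipped in this sketch.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def window_features(
    frames: Sequence[Tuple[float, int]], window: float = 1.0
) -> List[list]:
    """Summarize sniffed (timestamp_s, size_bytes) frames into per-window feature rows.

    Each row is [frame count, mean size, size std, mean inter-arrival time].
    """
    ordered = sorted(frames)
    if not ordered:
        return []
    start = ordered[0][0]
    buckets: Dict[int, List[Tuple[float, int]]] = {}
    for ts, size in ordered:
        buckets.setdefault(int((ts - start) // window), []).append((ts, size))
    rows = []
    for idx in sorted(buckets):
        times = [t for t, _ in buckets[idx]]
        sizes = [s for _, s in buckets[idx]]
        gaps = [b - a for a, b in zip(times, times[1:])]
        rows.append([len(sizes), mean(sizes), pstdev(sizes), mean(gaps) if gaps else 0.0])
    return rows


if __name__ == "__main__":
    sniffed = [(0.0, 100), (0.5, 300), (1.2, 200)]  # hypothetical encrypted frames
    for row in window_features(sniffed):
        print(row)
```

Because only timing and sizes are used, the same features remain available when payloads are encrypted, which is the premise of the out-of-network threat model.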
608

Out-of-distribution Recognition and Classification of Time-Series Pulsed Radar Signals / Out-of-distribution Igenkänning och Klassificering av Pulserade Radar Signaler

Hedvall, Paul January 2022 (has links)
This thesis investigates out-of-distribution recognition for time-series data of pulsed radar signals. The classifier is a naive Bayesian classifier based on Gaussian mixture models and Dirichlet process mixture models. In the mixture models, we model the distribution of three pulse features in the time series, namely the radio frequency of the pulse, the duration of the pulse, and the pulse repetition interval, which is the time between pulses. We found that simple thresholds on the likelihood can effectively determine whether samples are out-of-distribution or belong to one of the classes trained on. In addition, we present a simple method that can be used for deinterleaving/pulse classification and show that it can robustly classify 100 interleaved signals and simultaneously determine whether pulses are out-of-distribution. / This degree project investigates how a machine learning model can be adapted to recognize when pulsed radar signals do not belong to the distribution the model was trained on, while also recognizing whether a signal belongs to a previously known class. The classification model used here is a naive Bayesian classifier that uses Gaussian mixture models and Dirichlet process mixture models. The model builds a distribution of the time-series data for pulsed radar signals, specifically of each pulse's frequency, the pulse length, and the time until the next pulse. By setting thresholds on the likelihood of each pulse or of a sequence, we can recognize whether the data are unknown or belong to a previously known class. We also present a simple method for classifying individual pulses when several signals overlap, and show that the method can be used to robustly determine whether pulses are unknown.
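A simplified sketch of the likelihood-threshold idea, assuming a single Gaussian per feature and class instead of the Gaussian mixture or Dirichlet process mixture models used in the thesis. Each class is fit as a naive Bayes model over the three pulse features (radio frequency, pulse duration, pulse repetition interval), and a pulse whose best class log-likelihood falls below a hypothetical threshold is reported as out-of-distribution (`None`). The class names, feature values, and threshold in the demo are made up.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Params = Tuple[List[float], List[float]]


def fit_class(pulses: Sequence[Sequence[float]]) -> Params:
    """Per-feature mean and variance for one class (naive Bayes: features independent)."""
    n, dims = len(pulses), len(pulses[0])
    means = [sum(p[d] for p in pulses) / n for d in range(dims)]
    variances = [
        max(sum((p[d] - means[d]) ** 2 for p in pulses) / n, 1e-9)  # floor avoids zero variance
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(pulse: Sequence[float], params: Params) -> float:
    """Sum of per-feature Gaussian log-densities."""
    means, variances = params
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(pulse, means, variances)
    )


def classify(pulse: Sequence[float], models: Dict[str, Params], threshold: float) -> Optional[str]:
    """Most likely class name, or None if even the best log-likelihood is below threshold."""
    scores = {name: log_likelihood(pulse, params) for name, params in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    # Hypothetical (RF, pulse width, PRI) training pulses for two emitters.
    models = {
        "A": fit_class([(1.0, 10.0, 100.0), (1.2, 11.0, 102.0), (0.8, 9.0, 98.0)]),
        "B": fit_class([(5.0, 50.0, 500.0), (5.2, 52.0, 505.0), (4.8, 48.0, 495.0)]),
    }
    for pulse in [(1.0, 10.0, 100.0), (5.0, 50.0, 500.0), (20.0, 200.0, 2000.0)]:
        print(pulse, classify(pulse, models, threshold=-50.0))
```

Replacing `fit_class` and `log_likelihood` with a mixture model would let each class capture multimodal feature distributions (for example, agile or staggered PRIs) without changing the thresholding step.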
609

Time-series analysis of the relationship between influenza-like illness and mortality due to respiratory and cardiovascular diseases in Hong Kong

Lau, Siu-pik, 劉少碧 January 2005 (has links)
published_or_final_version / Community Medicine / Master / Master of Public Health
610

A Study of Time Series Analysis and Its Applications (時間數列分析及其應用的研究)

詹正基 Unknown Date (has links)
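In a modern economy subject to wide-ranging change, every economic system must plan its future activities systematically and in an organized way; national economic policy, as well as the business decisions of industries and individual firms, all require accurate and thorough plans for future activity. Such outlooks and plans are based on past activities, facts, and experience, using what has happened in the past to anticipate what may happen in the future. The simplest method for understanding past events and, from them, anticipating the future is time series analysis. Traditional time series analysis assumes that a series is composed of four components: secular trend, seasonal variation, cyclical fluctuation, and irregular movement. Business managers and economists apply time series methods to understand past business activity and economic phenomena so that future activity can be appropriately forecast and controlled. Following traditional time series methods, this thesis measures and analyzes the secular trend, seasonal variation, cyclical fluctuation, and irregular movement of economic and business activity. For completeness, Chapter 2 first describes the components and properties of traditional time series and tests the series itself for randomness (independence) and for serial correlation. Because time series data are ordered in time, unlike samples drawn at random from a population, the original series must first undergo a simple randomness test and a serial correlation test before the trend is measured. The secular trend is the most important component in time series analysis: it not only aids the planning of general economic and business activity but also helps in studying the other variations affecting the series. The trend is usually expressed as a trend equation. Chapter 3 explains and compares in detail the choice of the period over which the trend is measured, the selection of a trend model that fits the series, and the methods for computing it. Seasonal variation typically appears in monthly, quarterly, weekly, or daily series; because it is a regular rhythmic movement within a year, it is generally expressed as an index. Chapter 4 therefore analyzes methods for measuring seasonal indices, and their relative merits, in order to obtain a reasonable and accurate estimate of the seasonal effect. The cyclical and irregular components are determined as residuals, so their measured values are estimates. Estimating cyclical fluctuation serves to understand the general state of economic or business activity, which makes it quite important for business management. Chapter 5 compares methods for estimating cyclical and irregular movements and evaluates the combined cyclical-irregular estimate. The direct practical value of trend analysis lies in long-term forecasting, while seasonal analysis serves short-term forecasting. Although the analysis of cyclical and irregular estimates has uncertain forecasting value for business managers, it helps them understand the state of economic and business activity. Time series analysis can therefore be applied to forecasting general business activity. When traditional methods are used for forecasting, predictions of future activity are approximate estimates. To minimize the bias of these estimates, managers and researchers must analyze the trend, the seasonal variation, and the cyclical-irregular component reasonably and accurately, and must fully understand the properties and methods of time series analysis. Because such analysis can serve only as a reference for business decisions, the general business environment must still be analyzed continuously in order to adapt to dynamic economic activity. This is the central theme of this thesis.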
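The classical multiplicative decomposition described in this record can be sketched as follows, assuming an even seasonal period (for example, quarterly data with period 4). Seasonal indices come from the ratio-to-moving-average method, a straight-line trend is fitted to the deseasonalized series, and the cyclical-irregular component is left as the residual ratio. The `lag1_autocorrelation` function is a simple serial-correlation check of the kind the thesis applies before measuring the trend; the thesis's exact tests and trend models may differ.

```python
import numpy as np


def centered_moving_average(y, period):
    """2 x period centered moving average for an even period; NaN where undefined."""
    y = np.asarray(y, dtype=float)
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    core = np.convolve(y, weights, mode="valid")
    pad = np.full(period // 2, np.nan)
    return np.r_[pad, core, pad]


def seasonal_indices(y, period):
    """Ratio-to-moving-average seasonal indices, normalized to average 1."""
    y = np.asarray(y, dtype=float)
    ratios = y / centered_moving_average(y, period)
    idx = np.array([np.nanmean(ratios[s::period]) for s in range(period)])
    return idx * period / idx.sum()


def decompose(y, period):
    """Multiplicative decomposition y = trend * seasonal * cyclical_irregular."""
    y = np.asarray(y, dtype=float)
    seasonal = np.resize(seasonal_indices(y, period), len(y))  # repeat indices by position
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y / seasonal, 1)          # linear trend on deseasonalized data
    trend = intercept + slope * t
    return trend, seasonal, y / (trend * seasonal)


def lag1_autocorrelation(y):
    """Lag-1 serial correlation coefficient of a series."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    return np.dot(d[1:], d[:-1]) / np.dot(d, d)


if __name__ == "__main__":
    t = np.arange(24)                                   # six hypothetical years of quarters
    y = (100 + 2 * t) * np.resize([0.8, 1.1, 1.3, 0.8], 24)
    trend, seasonal, ci = decompose(y, 4)
    print(seasonal[:4], trend[1] - trend[0], lag1_autocorrelation(y))
```

On this synthetic series the recovered indices should be close to the true pattern (0.8, 1.1, 1.3, 0.8) and the cyclical-irregular ratios close to 1, since no cycle or noise was added.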
