AIOPS–Driven Adaptive Anomaly Detection in Evolving Cloud Environments Using Transfer Learning
Shivakumar, Mayur, 01 April 2025
As cloud-based microservice architectures have become the foundation of contemporary enterprise solutions, performance interference, wherein co-located services compete for shared resources, remains a significant challenge. This phenomenon, often referred to as the noisy neighbor problem, manifests when one workload unexpectedly increases the CPU, memory, disk I/O, or network consumption, resulting in latency spikes or throughput degradation for other services. While existing isolation mechanisms (e.g., cgroups and QoS policies) provide some mitigation, they rarely prevent contention entirely, particularly in dynamic, rapidly evolving environments with frequent code deployments.
This thesis proposes an AIOps-driven adaptive anomaly detection framework that integrates drift detection and Transfer Learning for real-time monitoring of containerized workloads. By focusing on operational metrics, including CPU, memory, I/O, and network usage, rather than solely application-level features, the system detects early performance interference signals. A key innovation is the application of the Kolmogorov-Smirnov (KS) test to identify statistically significant shifts in these resource distributions and automatically differentiate between benign updates and potentially disruptive anomalies. When the drift threshold is exceeded, selective retraining is initiated, ensuring that the models maintain accuracy without incurring the overhead of indiscriminate or excessively frequent retraining.
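The KS-based drift check with selective retraining described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the 0.05 significance level, the large-sample critical-value approximation, and the `retrain` callback are all assumptions made for illustration. The sketch compares a reference window of one resource metric (for example CPU utilization) against a recent window and calls the retraining hook only when the two-sample KS statistic exceeds the critical value.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed sample values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / n
        cdf_cur = bisect_right(cur, x) / m
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS test (approximation)."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


def drift_detected(reference, current, alpha=0.05):
    """True when the windows differ significantly at level alpha (assumed 0.05)."""
    d = ks_statistic(reference, current)
    return d > ks_critical_value(len(reference), len(current), alpha)


def check_window(reference, current, retrain, alpha=0.05):
    """Call the hypothetical `retrain` hook only when drift is detected."""
    if drift_detected(reference, current, alpha):
        retrain(current)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical CPU utilization samples (percent), for illustration only.
    baseline = [float(x) for x in range(100)]
    shifted = [x + 80.0 for x in baseline]
    triggered = check_window(baseline, shifted, retrain=lambda w: print("retraining"))
    print("drift:", triggered)
```

In this sketch an unchanged window never triggers retraining because its KS statistic is zero, while a window whose distribution has moved well away from the baseline does. A production system would likely run the check per metric and per service, and would need to handle small windows, where the large-sample approximation is less reliable.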
Experimental validation employs Acme Air and the DeathStarBench Social Network, two diverse microservice platforms that exhibit varied resource consumption patterns. By systematically introducing new functionalities and monitoring workload evolution, the framework demonstrates how KS-based drift detection surpasses conventional static-threshold methods in identifying early-stage noisy neighbor scenarios. Transfer Learning preserves prior knowledge while adapting rapidly to novel resource-usage profiles, offering a cost-effective approach for continuous, high-precision anomaly detection.
In conclusion, this thesis bridges data drift detection and system-level performance monitoring within an MLOps context. It presents a scalable and proactive strategy for mitigating performance interference in multi-tenant, cloud-native environments, ultimately enhancing reliability and preserving service-level objectives.