  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

EXPLORING HEALTH WEBSITE USERS BY WEB MINING

Kong, Wei 07 1900
Indiana University-Purdue University Indianapolis (IUPUI) / With the continuous growth of health information on the Internet, providing user-oriented health services online has become a great challenge for health providers. Understanding the information needs of users is the first step towards providing tailored health services. The purpose of this study is to examine the navigation behavior of different user groups by extracting their search terms, and to make suggestions for restructuring a website for more customized Web services. This study analyzed five months of daily access weblog files from one local health provider's website, discovered the most popular general and health-related topics, and compared the information search strategies of the patient/consumer and doctor groups. Our findings show that users do not search for health information as much as was thought. The top two health topics of concern to patients are children's health and occupational health; another topic that both user groups are interested in is medical records. Patients and doctors also use different search strategies on this website: patients go back to the previous page more often, while doctors usually go directly to the final page and then leave without coming back. Based on these results, suggestions for redesigning and improving the website are discussed; a more intuitive portal and more customized links for both user groups are proposed.
42

Discover patterns within train log data using unsupervised learning and network analysis

Guo, Zehua January 2022
With the development of information technology in recent years, log analysis has gradually become a hot research topic. However, manual log analysis requires specialized knowledge and is time-consuming, so more and more researchers are searching for ways to automate it. In this project, we explore methods for train log analysis using natural language processing and unsupervised machine learning. Multiple language models are used to extract word embeddings: the traditional TF-IDF model and three popular transformer-based models, BERT and its variants DistilBERT and RoBERTa. In addition, we compare two unsupervised clustering algorithms, DBSCAN and Mini-Batch k-means. The silhouette coefficient and the Davies-Bouldin score are used to evaluate clustering performance, and the metadata of the train logs is used to verify the effectiveness of the unsupervised methods. Apart from unsupervised learning, network analysis is applied to the train log data to explore the connections between the patterns, which are identified by train control system experts. Network visualization and centrality analysis are used to examine the relationships between the patterns and, in terms of graph theory, their importance. In general, this project provides a feasible direction for log analysis and processing in the future.
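The pipeline this abstract describes — vectorize log lines, cluster them, score the clustering — can be sketched in a few lines. This is an illustrative sketch only: the log lines are invented stand-ins for real train logs, TF-IDF stands in for the BERT-family embeddings, and the cluster count is arbitrary.

```python
# Hedged sketch: cluster synthetic "train log" lines with TF-IDF features
# and Mini-Batch k-means, then score with the two internal metrics named
# in the abstract. All log lines here are fabricated examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

logs = [
    "door sensor fault detected on car 3",
    "door sensor fault detected on car 1",
    "brake pressure low warning unit 7",
    "brake pressure low warning unit 2",
    "traction converter restarted after overcurrent",
    "traction converter restarted after undervoltage",
]

# Word-level TF-IDF embeddings (one of the four embedding choices compared).
X = TfidfVectorizer().fit_transform(logs)

km = MiniBatchKMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Cluster-quality metrics used in the abstract.
sil = silhouette_score(X, labels)                # higher is better, in [-1, 1]
dbi = davies_bouldin_score(X.toarray(), labels)  # lower is better, >= 0
```

Swapping `TfidfVectorizer` for sentence embeddings from a BERT variant, or `MiniBatchKMeans` for DBSCAN, changes only two lines, which is what makes the comparison described above straightforward to set up.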
43

Integrating Telecommunications-Specific Language Models into a Trouble Report Retrieval Approach / Integrering av telekommunikationsspecifika språkmodeller i en metod för hämtning av problemrapporter

Bosch, Nathan January 2022
In the development of large telecommunications systems, it is imperative to identify, report, analyze and, thereafter, resolve both software and hardware faults. This resolution process often relies on written trouble reports (TRs), which contain information about the observed fault and, after analysis, information about why the fault occurred and the decision to resolve it. Due to the scale and number of TRs, it is possible that a newly written fault is very similar to previously written faults, e.g., a duplicate fault. In this scenario, it can be beneficial to retrieve similar previously created TRs to aid the resolution process. Previous work at Ericsson [1] introduced a multi-stage BERT-based approach to retrieve similar TRs given a newly written fault observation. This approach significantly outperformed simpler models like BM25, but suffered from two major challenges: 1) it did not leverage the vast non-task-specific telecommunications data at Ericsson, something that had seen success in other work [2], and 2) the model did not generalize effectively to TRs outside of the telecommunications domain it was trained on. In this thesis, we 1) investigate three different transfer learning strategies to attain stronger performance on a downstream TR duplicate retrieval task, notably focusing on effectively integrating existing telecommunications-specific language data into the model fine-tuning process, 2) investigate the efficacy of catastrophic forgetting mitigation strategies when fine-tuning the BERT models, and 3) identify how well the models perform on out-of-domain TR data. We find that integrating existing telecommunications knowledge, in the form of a pretrained telecommunications-specific language model, into our fine-tuning strategies allows us to outperform a domain adaptation fine-tuning strategy.
In addition, we find that Elastic Weight Consolidation (EWC) is an effective strategy for mitigating catastrophic forgetting and attaining strong downstream performance on the duplicate TR retrieval task. Finally, we find that the models generalize well enough to perform reasonably effectively on out-of-domain TR data, indicating that the approaches may be viable in a real-world deployment.
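The retrieval core of such an approach — rank stored TRs against a new fault observation by similarity in an embedding space — can be sketched as follows. The TR texts are invented, and TF-IDF stands in for the pretrained telecommunications-specific encoder; the multi-stage re-ranking and EWC fine-tuning discussed above are out of scope here.

```python
# Hedged sketch of the retrieval step only: embed stored trouble reports
# and a new observation, then rank by cosine similarity. The corpus and
# query are fabricated examples, not Ericsson data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "handover fails intermittently under high load on baseband node",
    "packet loss spikes when uplink scheduler queue overflows",
    "handover failure rate increases during peak traffic hours",
]
query = "handover fails under high load on the node"

vec = TfidfVectorizer().fit(corpus + [query])
scores = cosine_similarity(vec.transform([query]), vec.transform(corpus))[0]
ranking = sorted(range(len(corpus)), key=lambda i: -scores[i])
print(ranking[0])  # index of the most similar stored TR → 0
```

Replacing the vectorizer with a fine-tuned BERT-style encoder leaves the ranking logic unchanged, which is why transfer learning strategies can be compared on the same downstream retrieval task.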
44

Extraction automatique de protocoles de communication pour la composition de services Web / Automatic extraction of communication protocols for web services composition

Musaraj, Kreshnik 13 December 2010
Business process management, service-oriented architectures and their reverse engineering rely heavily on mining business process models and Web service business protocols from log files. Model extraction and mining aim at the (re)discovery of the behavior of a running model implementation using solely its interaction and activity traces, with no a priori information on the target model. Our preliminary study shows that: (i) only a minority of interaction data is recorded by process- and service-aware architectures, (ii) a limited number of methods achieve model extraction without knowledge of either positive process and protocol instances or the information to infer them, and (iii) the existing approaches rely on restrictive assumptions that only a fraction of real-world Web services satisfy. Enabling the extraction of these interaction models from activity logs under realistic hypotheses requires: (i) approaches that abstract away the business context to allow extended and generic usage, and (ii) tools for assessing the mining result through implementation of the process and service life cycle. Moreover, since interaction logs are often incomplete, uncertain and error-prone, the mining approaches proposed in this work must handle these imperfections properly. We propose a set of mathematical models that encompass the different aspects of process and protocol mining. The extraction approaches we present, drawn from linear algebra, extract the business protocol while merging the classic process mining stages. Our protocol representation, based on time series of flow-density variations, makes it possible to recover the temporal order of execution of events and messages in the process. In addition, we propose the concept of proper timeouts to identify timed transitions, and provide a method for extracting them despite their being invisible in logs. Finally, we present a multitask framework that supports all steps of the process workflow and business protocol life cycle, from design to optimization. The approaches presented in this manuscript have been implemented in prototype tools and validated experimentally on scalable datasets and real-world process and Web service models. The discovered business protocols can then be used to perform a multitude of tasks in an organization or enterprise.
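The basic idea behind mining a protocol from traces alone — recovering a model using no a priori information — can be illustrated with a toy transition-count sketch. The traces below are invented, and this deliberately ignores the noise, incompleteness and invisible timed transitions that the linear-algebra approaches above are designed to handle.

```python
# Hedged illustration: recover a message-transition graph purely from
# execution traces. Each trace is a fabricated sequence of service messages.
from collections import Counter

traces = [
    ["login", "search", "addToCart", "checkout", "logout"],
    ["login", "search", "search", "checkout", "logout"],
    ["login", "addToCart", "checkout", "logout"],
]

# Count every observed pair of consecutive messages.
transitions = Counter()
for t in traces:
    for a, b in zip(t, t[1:]):
        transitions[(a, b)] += 1

# The mined "protocol": the set of observed successors of each message.
protocol = {}
for (a, b), n in transitions.items():
    protocol.setdefault(a, set()).add(b)
print(protocol["login"])  # → {'search', 'addToCart'}
```

Real logs rarely permit such direct counting, which is precisely why the thesis resorts to density time series and algebraic reconstruction rather than raw successor statistics.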
45

Cooperative security log analysis using machine learning : Analyzing different approaches to log featurization and classification / Kooperativ säkerhetslogganalys med maskininlärning

Malmfors, Fredrik January 2022
This thesis evaluates the performance of different machine learning approaches to log classification, based on a dataset derived from simulating intrusive behavior towards an enterprise web application. The first experiment performs attacks on the web app in correlation with the logs to create a labeled dataset. The second experiment compares one unsupervised model based on a variational autoencoder with four supervised models: conventional feature-engineering techniques feeding deep neural networks, and embedding-based feature techniques followed by long short-term memory architectures and convolutional neural networks. On this dataset, the embedding-based approaches performed much better than the conventional one, while the autoencoder did not perform well compared to the supervised models. To conclude, embedding-based approaches show promise even on datasets whose characteristics differ from natural language.
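As a rough illustration of why featurization choice matters for security log classification, the sketch below uses character n-gram features, which capture attack syntax (`../`, quotes, %-encoding) that word tokens miss. The log lines and labels are fabricated, and a linear classifier replaces the thesis's LSTM/CNN models purely to keep the example minimal.

```python
# Hedged sketch: classify fabricated HTTP-style log lines as benign (0)
# or intrusive (1) using character n-gram features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

lines = [
    "GET /index.html 200", "GET /about.html 200",
    "GET /../../etc/passwd 404", "GET /admin' OR '1'='1 403",
    "GET /contact.html 200", "GET /%3Cscript%3Ealert(1) 403",
]
labels = [0, 0, 1, 1, 0, 1]  # 0 = benign, 1 = intrusive

# Character 2- and 3-grams as the featurization under comparison.
feats = CountVectorizer(analyzer="char", ngram_range=(2, 3)).fit(lines)
clf = LogisticRegression().fit(feats.transform(lines), labels)

# An unseen traversal attempt shares many n-grams with the passwd attack.
pred = clf.predict(feats.transform(["GET /../../etc/shadow 404"]))[0]
print(pred)
```

Swapping the vectorizer for learned embeddings, as the thesis does, keeps the surrounding training and evaluation code identical.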
46

Subsurface Depositional Systems Analysis of the Cambrian Eau Claire Formation in Western Ohio

Laneville, Michael Warren 26 November 2018
No description available.
47

Analysing User Viewing Behaviour in Video Streaming Services

Markou, Ioannis January 2021
The user experience offered by a video streaming service plays a fundamental role in customer satisfaction. This experience can be degraded by poor playback quality and buffering issues, which can be caused by user demand exceeding the service's capacity. Resource scaling methods can increase the available resources to cover the need. However, most resource scaling systems are reactive, scaling up automatically when a certain demand threshold is exceeded. During popular live-streamed content, demand can be so high that even by scaling up at the last minute the system may still be momentarily under-provisioned, resulting in a bad user experience. The solution to this problem is proactive scaling, which is event-based and uses content-related information to scale up or down according to knowledge from past events. Proactive resource scaling is therefore a key factor in providing reliable video streaming services. Users' viewing habits heavily affect demand, so to provide an accurate model for proactive resource scaling tools these habits need to be modelled. This thesis provides such a forecasting model for user views that can be used by a proactive resource scaling mechanism. The model is created by applying machine learning algorithms to data from both live TV and over-the-top streaming services. To produce a model with satisfactory accuracy, numerous data attributes were considered, relating to users, content and content providers. The findings of this thesis show that user viewing demand can be modelled with high accuracy without relying heavily on user-related attributes, but instead by analysing past event logs together with knowledge of the content provider's schedule, whether for live TV or a video streaming service.
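A minimal version of the core finding — that demand can be forecast from past event logs plus the provider's schedule, without user-level attributes — might look like this. The programmes and viewer counts are invented, and a historical per-programme mean stands in for the machine learning models the thesis actually trains.

```python
# Hedged toy of proactive demand forecasting: aggregate past event logs
# per programme, then forecast tomorrow's published schedule from the
# historical averages. All numbers are fabricated.
from collections import defaultdict
from statistics import mean

# (programme, viewers) observations extracted from past event logs.
past_events = [
    ("news", 120), ("news", 130), ("news", 110),
    ("football", 900), ("football", 1100),
    ("movie", 300), ("movie", 340),
]

by_programme = defaultdict(list)
for prog, viewers in past_events:
    by_programme[prog].append(viewers)

# The content provider's schedule is the only extra input needed.
schedule = ["news", "football", "movie"]
forecast = {p: mean(by_programme[p]) for p in schedule}
print(forecast["football"])  # → 1000
```

A proactive scaler would read such a forecast ahead of time and provision capacity before the demand spike, instead of reacting after a threshold is crossed.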
48

How to Estimate Local Performance using Machine learning Engineering (HELP ME) : from log files to support guidance / Att estimera lokal prestanda med hjälp av maskininlärning

Ekinge, Hugo January 2023
As modern systems become increasingly complex, they also become more and more cumbersome to diagnose and fix when things go wrong. One domain where it is very important for machinery and equipment to stay functional is medical IT, where technology is used to improve healthcare for people all over the world. This thesis aims to help reduce downtime on critical life-saving equipment by implementing automatic analysis of system logs that, without any domain experts involved, can give an indication of the state the system is in. First, a literature study was performed, in which three candidate neural network architectures were identified. Next, the networks were implemented and a data pipeline for collecting and labeling training data was set up. After training the networks and testing them on a separate data set, the best-performing model of the three was based on the GRU (Gated Recurrent Unit). Lastly, this model was tested on real-world system logs from two different sites, one without known issues and one with slow image import due to network issues. The results showed that it is feasible to build such a system, one that can give indications of external parameters such as network speed, latency and packet loss percentage using only raw system logs as input data. GRU, 1D-CNN (one-dimensional convolutional neural network) and the Transformer encoder are the three models that were tested, and the best-performing model was shown to produce correct patterns even on the real-world system logs.
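The gating mechanism that makes the GRU the best-performing architecture above can be shown in a single NumPy cell update. The weights are random and the shapes are toy-sized; this is purely an illustration of the update/reset gates, not the thesis's trained model.

```python
# Hedged sketch of one GRU step: update gate z decides how much of the old
# hidden state to keep; reset gate r decides how much of it feeds the
# candidate state. This is what lets GRUs track long log sequences.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """One GRU step: x is the input (d_in,), h the previous hidden (d_h,)."""
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
    return (1 - z) * h + z * h_tilde                # interpolated new state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = [rng.standard_normal((d_h, d_in)) for _ in range(3)]
U = [rng.standard_normal((d_h, d_h)) for _ in range(3)]
b = [np.zeros(d_h) for _ in range(3)]

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):  # a 5-step "log token" sequence
    h = gru_cell(x, h, W, U, b)
print(h.shape)  # → (3,)
```

In the thesis's setting, the final hidden state of a sequence of encoded log tokens would feed a regression head predicting network speed, latency, or packet loss.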
49

Petrophysics and fluid mechanics of selected wells in Bredasdorp Basin South Africa

Ile, Anthony January 2013
Magister Scientiae - MSc / Pressure drop within a field can be attributed to several factors; it occurs when frictional forces resist fluid flowing through a porous medium. In this thesis, the sciences of petrophysics and rock physics were employed to develop an understanding of the physical processes that occur in reservoirs. The study focused on the physical properties of rock and fluid in order to understand the system and the mechanisms controlling its behaviour. The change in production capacity of wells E-M 1, 2, 3, 4 and 5 prompted further research into why there was a pressure drop across the suite of wells and which well was contributing to the drop in production pressure. The E-M wells are located in the Bredasdorp Basin, and the reservoirs have stratigraphic and structural trapping mechanisms in a moderate- to good-quality turbidite channel sandstone. The basin is a predominantly elongated northwest-southeast channel inherited from the synrift sub-basin and was open to relatively free marine circulation. To the southwest the basin is enclosed by the southern Outeniqua Basin and the Indian Ocean. Sedimentation into the Bredasdorp Basin thus occurred predominantly down the axis of the basin, with the main input direction from the west. Five wells (E-M1 to E-M5) were studied to identify which wells were susceptible to flow. Against the discriminator criteria that were set, four wells qualified; only E-M1 did not. The failure of the E-M1 reservoir interval was consistent with the results of the log, pressure and rock-physics analyses for E-M1. Various rock-physics methods were used to identify sediments and their conditions, and by applying inverse modelling (elastic impedance) the interval properties were better reflected. Elastic impedance also proved to be an economical and quicker method for describing the lithology and depositional environment in the absence of a seismic trace.
50

Katalyzátory pro kladnou elektrodu kyslíko-vodíkového palivového článku / Catalysts for positive electrode of hydrogen-oxygen fuel cell

Kováč, Martin January 2010
This master's thesis deals with new methods of preparing catalytic materials for the positive electrode of an oxygen-hydrogen fuel cell, and with the influence of changes in the molar mass of potassium permanganate or the doping agent on their properties. It further studies suitable measurement methods for qualifying these properties and presents the achieved results: in particular, linear-sweep and cyclic voltammetry, with data processed using Koutecky-Levich and Tafel plots and wave log analysis. Values of the half-wave potential, onset potential and kinetic coefficient were measured and calculated.
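The Koutecky-Levich processing mentioned above amounts to a linear fit of 1/i against ω^(-1/2): the intercept gives the kinetic current i_k and the slope gives 1/B. A numeric sketch with synthetic data follows; i_k = 2.0 and B = 0.5 are assumed values, not measurements from the thesis.

```python
# Hedged numeric sketch of a Koutecky-Levich analysis:
#   1/i = 1/i_k + (1/B) * omega**-0.5
# Fit the line to synthetic rotating-disk currents and recover i_k and B.
def linfit(xs, ys):
    """Ordinary least-squares line y = a*x + c (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

i_k, B = 2.0, 0.5                          # assumed "true" values
omegas = [400, 900, 1600, 2500]            # rotation rates
currents = [1 / (1 / i_k + 1 / (B * w ** 0.5)) for w in omegas]

xs = [w ** -0.5 for w in omegas]           # Koutecky-Levich x-axis
ys = [1 / i for i in currents]             # Koutecky-Levich y-axis
slope, intercept = linfit(xs, ys)
print(round(1 / intercept, 3), round(1 / slope, 3))  # → 2.0 0.5
```

With real voltammetry data the points scatter around the line, but the same intercept/slope reading yields the kinetic current and the Levich coefficient.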
