1

Influential factors affecting the undesired fault correction outcomes in large-scaled companies

Selvi, Mehmet; Büyükcan, Güral. January 2014
Context. Fault correction is one of the two main activities in the software evolution model. Because it is central to software maintainability, the software industry, and large-scale global companies in particular, aims for mature fault correction processes that detect and correct faults continuously and efficiently. Considerable effort and deliberate measures are needed to succeed. This master's thesis is concerned with the fault correction process and with finding possible improvements to it.

Objectives. The main aim of this study is to identify the influential factors behind undesired fault correction outcomes. The study has three main stages: 1) identify factors in company data that affect the target factors, 2) elicit influential factors from interviews and a literature review, and 3) prioritize the influential factors by significance. A further aim is to derive recommendations for the company and for the software industry from these outcomes.

Methods. The study is an empirical investigation of the software fault correction process and its undesired outcomes. Both quantitative and qualitative data analyses were performed. A case study was conducted with Ericsson AB in which archival data was analyzed using several methods, including machine learning and the Apriori association-rule algorithm. Surveys and semi-structured interviews served as data collection instruments, and a literature review was performed to collect further influential factors. The factors were prioritized using hierarchical cumulative voting.

Results. The quantitative data analysis, interviews, and literature review together yielded 45 influential factors. These were prioritized by 26 practitioners (4 internal and 22 external) to determine which factors are most a) significant and b) relevant to undesired fault correction outcomes. Based on the prioritization, a cause-effect diagram covering all the important factors was drawn.

Conclusions. The research showed that many factors influence the fault correction process. The practitioners mostly complained that fault corrections are not analyzed in depth: corrections do not feed into new requirements and are not used for process improvement. Limited resources (work force, vacations, and sickness), unbalanced assignment of fault correction tasks, and too many fault reports arriving at once also cause problems. Moreover, the priorities of faults and customers affect the lead time of fault correction, since critical faults are fixed first.
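The abstract names Apriori among the analysis methods but, as is usual for an abstract, gives no implementation detail. The following minimal pure-Python sketch shows the kind of frequent-itemset mining Apriori performs; the fault-report attributes and the support threshold are invented for illustration and are not from the Ericsson data.

```python
from itertools import combinations

# Hypothetical fault-report attribute sets; values are invented for
# illustration and do not come from the thesis's archival data.
reports = [
    {"priority=high", "component=radio", "outcome=reopened"},
    {"priority=high", "component=radio", "outcome=closed"},
    {"priority=low", "component=ui", "outcome=reopened"},
    {"priority=high", "component=radio", "outcome=reopened"},
]

MIN_SUPPORT = 0.5  # fraction of reports an itemset must appear in


def apriori(transactions, min_support):
    """Return frequent itemsets as a {frozenset: support} mapping."""
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while level:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: cnt / n for c, cnt in counts.items()
                     if cnt / n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-item candidates.
        keys = list(survivors)
        level = {a | b for a, b in combinations(keys, 2)
                 if len(a | b) == len(a) + 1}
    return frequent


for itemset, support in sorted(apriori(reports, MIN_SUPPORT).items(),
                               key=lambda kv: -kv[1]):
    print(f"{support:.2f}  {set(itemset)}")
```

Association rules derived from such itemsets would point to attribute combinations that co-occur with undesired outcomes. The hierarchical cumulative voting used for prioritization is a separate, survey-based step: each practitioner distributes a fixed budget of points across the factors, and the aggregated point totals across the hierarchy give the ranking.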
2

Integrating Telecommunications-Specific Language Models into a Trouble Report Retrieval Approach

Bosch, Nathan. January 2022
In the development of large telecommunications systems, it is imperative to identify, report, analyze and, thereafter, resolve both software and hardware faults. This resolution process often relies on written trouble reports (TRs), which contain information about the observed fault and, after analysis, about why the fault occurred and the decision taken to resolve it. Because of the scale and number of TRs, a newly written fault report may be very similar to previously written ones, e.g., a duplicate fault. In this scenario, it can be beneficial to retrieve similar, previously created TRs to aid the resolution process. Previous work at Ericsson [1] introduced a multi-stage BERT-based approach to retrieve similar TRs given a newly written fault observation. This approach significantly outperformed simpler models like BM25, but suffered from two major challenges: 1) it did not leverage the vast non-task-specific telecommunications data at Ericsson, something that had seen success in other work [2], and 2) the model did not generalize effectively to TRs outside of the telecommunications domain it was trained on. A minimal sketch of the retrieval setting follows.
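The thesis does not reproduce its retrieval code here; as an illustrative sketch only, the snippet below shows the core of a dense-retrieval stage such as the BERT-based approach described above, using a generic publicly available sentence encoder as a stand-in for the telecommunications-specific models the thesis studies. The model name and TR texts are assumptions for the example.

```python
from sentence_transformers import SentenceTransformer, util

# Generic pretrained encoder as a stand-in for the telecom-specific
# BERT models evaluated in the thesis (an assumption, not the actual model).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical TR fault observations, invented for illustration.
historical_trs = [
    "Cell throughput drops after software upgrade on baseband unit.",
    "UE attach failures observed during handover between sectors.",
    "Alarm flood from power supply module after site restart.",
]
new_tr = "Throughput degradation on baseband after latest upgrade."

# Embed the historical corpus once, then score the new report against it.
corpus_emb = model.encode(historical_trs, convert_to_tensor=True)
query_emb = model.encode(new_tr, convert_to_tensor=True)
scores = util.cos_sim(query_emb, corpus_emb)[0]

# The highest-scoring TRs become the duplicate candidates shown to engineers.
for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx].item():.3f}  {historical_trs[idx]}")
```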
In this thesis, we 1) investigate three different transfer learning strategies to attain stronger performance on a downstream TR duplicate retrieval task, focusing in particular on effectively integrating existing telecommunications-specific language data into the model fine-tuning process, 2) investigate the efficacy of catastrophic-forgetting mitigation strategies when fine-tuning the BERT models, and 3) identify how well the models perform on out-of-domain TR data. We find that integrating existing telecommunications knowledge, in the form of a pretrained telecommunications-specific language model, into our fine-tuning strategies allows us to outperform a domain-adaptation fine-tuning strategy. In addition, we find that Elastic Weight Consolidation (EWC) is an effective strategy for mitigating catastrophic forgetting while attaining strong downstream performance on the duplicate TR retrieval task; a sketch of the EWC penalty follows. Finally, we find that the models generalize well enough to perform reasonably effectively on out-of-domain TR data, indicating that the approaches may be viable in a real-world deployment.
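Elastic Weight Consolidation penalizes movement of the parameters that mattered for the earlier task, weighted by an estimate of the diagonal Fisher information. Below is a minimal PyTorch sketch of that penalty, assuming a generic model, data loader, and penalty coefficient rather than the thesis's actual setup.

```python
import torch


def fisher_diagonal(model, data_loader, loss_fn):
    """Estimate the diagonal Fisher information from the old task's data."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(data_loader) for n, f in fisher.items()}


def ewc_penalty(model, old_params, fisher, lam=1000.0):
    """Quadratic penalty pulling parameters toward the old task's optimum."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty


# During fine-tuning on the new task (placeholder training loop):
#   loss = task_loss_fn(model(batch_x), batch_y) \
#          + ewc_penalty(model, old_params, fisher)
```

Here `old_params` is a snapshot of the parameters taken before fine-tuning on the new task begins, and the coefficient `lam` trades plasticity on the new task against retention of the old one.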
3

Duplicate detection of multimodal and domain-specific trouble reports when having few samples : An evaluation of models using natural language processing, machine learning, and Siamese networks pre-trained on automatically labeled data

Karlstrand, Viktor. January 2022
Trouble and bug reports are essential in software maintenance and for identifying faults, a challenging and time-consuming task. When a fault and its report are similar or identical to previous, already resolved ones, the effort can be reduced significantly, which makes automatic duplicate detection very compelling. In this work, common methods and techniques from the literature are evaluated and compared on domain-specific and multimodal trouble reports from Ericsson software. The number of samples is small, a case not well studied in the area. On this basis, both traditional techniques and more recent ones based on deep learning are considered, with the goal of detecting duplicates accurately. First, the more traditional approach based on natural language processing and machine learning is evaluated using different vectorization techniques and similarity measures adapted and customized to the domain-specific trouble reports. The multimodality and the many fields of the trouble reports call for a wide range of techniques, including term frequency-inverse document frequency (TF-IDF), BM25, and latent semantic analysis. A pipeline is proposed that processes each data field of the trouble reports independently and automatically weighs the importance of each field; a sketch of the idea follows.
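As a hedged illustration of the per-field idea, not the thesis's pipeline, the scikit-learn sketch below gives each field its own TF-IDF vectorizer, computes per-field cosine similarities, and fuses them with a weighted sum. The field names, example texts, and fixed fusion weights are invented; the thesis weighs the fields automatically.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented example TRs with two text fields; real reports carry many more
# fields (code parameters, dates, logs, ...).
corpus = [
    {"heading": "Throughput drop after upgrade",
     "description": "Cell throughput degrades after baseband software upgrade."},
    {"heading": "Attach failures during handover",
     "description": "UEs fail to attach when handing over between sectors."},
]
query = {"heading": "Throughput degradation post upgrade",
         "description": "Baseband upgrade followed by reduced cell throughput."}

# Fixed fusion weights for the sketch; the thesis learns the weighting.
weights = {"heading": 0.4, "description": 0.6}

scores = None
for field, w in weights.items():
    # One vectorizer per field keeps the vocabularies independent.
    vec = TfidfVectorizer().fit([tr[field] for tr in corpus])
    field_sim = cosine_similarity(
        vec.transform([query[field]]),
        vec.transform([tr[field] for tr in corpus]),
    )[0]
    scores = w * field_sim if scores is None else scores + w * field_sim

# Rank historical TRs as duplicate candidates for the query.
for i in scores.argsort()[::-1]:
    print(f"{scores[i]:.3f}  {corpus[i]['heading']}")
```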
The best-performing model achieves a recall rate of 89% for a duplicate-candidate list of size 10. Further, which types of data matter most for duplicate detection is explored through Shapley values. The results indicate that using all types of data does improve performance, and that date and code parameters are strong indicators. Second, a Siamese network based on Transformer encoders is evaluated on the data fields believed to carry an underlying semantic or sequentially important structure that a deep model can capture; a sketch of such a network follows this paragraph. To alleviate the problems of having few samples, pre-training through automatic data labeling is studied. The results show an increase in performance compared to not pre-training the Siamese network. Compared to the more traditional model, however, it performs only on par, indicating that traditional models may perform equally well when samples are few, besides also being simpler, more robust, and faster.
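A minimal PyTorch sketch of the Siamese setup, assuming a toy vocabulary, mean pooling, and a standard contrastive pair loss; the thesis's actual architecture, pooling, and loss may differ.

```python
import torch
import torch.nn as nn


class SiameseEncoder(nn.Module):
    """Shared Transformer encoder; both reports of a pair pass through it."""

    def __init__(self, vocab_size=10000, dim=128, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))
        return hidden.mean(dim=1)  # mean-pool to one vector per report


def contrastive_loss(emb_a, emb_b, is_duplicate, margin=1.0):
    """Pull duplicate pairs together, push non-duplicates past the margin."""
    dist = nn.functional.pairwise_distance(emb_a, emb_b)
    pos = is_duplicate * dist.pow(2)
    neg = (1 - is_duplicate) * (margin - dist).clamp(min=0).pow(2)
    return (pos + neg).mean()


# Toy usage with random token ids standing in for tokenized TR fields.
model = SiameseEncoder()
a = torch.randint(0, 10000, (8, 32))        # batch of 8 reports, 32 tokens
b = torch.randint(0, 10000, (8, 32))
labels = torch.randint(0, 2, (8,)).float()  # 1 = duplicate pair
loss = contrastive_loss(model(a), model(b), labels)
loss.backward()
```

Pre-training on automatically labeled data, as studied in the thesis, would run the same objective over pairs whose duplicate labels come from a heuristic rather than from engineer triage.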
