About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world.
51

Database Tuning using Evolutionary and Search Algorithms

Raneblad, Erica January 2023 (has links)
Achieving optimal performance of a database can be crucial for many businesses, and tuning its configuration parameters is a necessary step in this process. Many existing tuning methods involve complex machine learning algorithms and require large amounts of historical data from the system being tuned. However, training machine learning models can be problematic if a considerable amount of computational resources and data storage is required. This paper investigates the possibility of using less complex search algorithms or evolutionary algorithms to tune database configuration parameters, and presents a framework that employs Hill Climbing and Particle Swarm Optimization. The performance of the algorithms is tested on a PostgreSQL database using read-only workloads. Particle Swarm Optimization displayed the largest improvement in query response time, improving it by 26.09% compared to using the configuration parameters' default values. Given the improvement shown by Particle Swarm Optimization, evolutionary algorithms may be promising in the field of database tuning.
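As an illustration of the approach, here is a minimal Particle Swarm Optimization sketch in Python. The two tuned parameters and the simulated response-time function are hypothetical stand-ins for real configuration knobs and a real benchmark run, not the thesis's framework.

```python
import random

def pso_minimize(objective, bounds, n_particles=10, iters=40,
                 w=0.7, c1=1.5, c2=1.5, seed=1):
    """Minimize `objective` over the box `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # per-particle best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Clamp each parameter to its allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]),
                                bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val

# Hypothetical stand-in for "apply the configuration, run the workload,
# measure mean response time": a smooth bowl whose optimum sits at
# shared_buffers = 2048 MB, work_mem = 64 MB (made-up values).
def simulated_response_time(params):
    sb, wm = params
    return 100 + 0.00001 * (sb - 2048) ** 2 + 0.002 * (wm - 64) ** 2

best, t = pso_minimize(simulated_response_time,
                       bounds=[(128, 8192), (4, 512)])
```

In a real tuner the objective would restart the database with the candidate settings and benchmark it, which is exactly why a cheap derivative-free optimizer like PSO is attractive here.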
52

Characterization and Enhancement of Data Locality and Load Balancing for Irregular Applications

Niu, Qingpeng 14 May 2015 (has links)
No description available.
53

A study of trilateral flash cycles for low-grade waste heat recovery-to-power generation

Ajimotokan, Habeeb A. 10 1900 (has links)
There has been renewed interest in innovative energy conversion technologies, particularly heat recovery-to-power technologies for sustainable power generation from renewable energies and waste heat. This is due to increasing concern over the high demand for electricity, energy shortages, global warming and thermal pollution. Among the innovative heat recovery-to-power technologies, the proposed trilateral flash cycle (TFC) is a promising option that presents great potential for development. Unlike the Rankine cycles, the TFC starts the working-fluid expansion from the saturated liquid condition rather than the saturated, superheated or supercritical vapour phase, bypassing the isothermal boiling phase. The challenges associated with the need to establish a system design basis and facilitate design-supporting analysis of system configurations, from proof of concept towards a market-ready TFC technology, are significant. Thus, there is a great need for research to improve the understanding of its operation, behaviour and performance. The objective of this study is to develop and establish simulation tools for the TFCs to improve the understanding of their operation and the physics of their performance metrics, and to evaluate novel system configurations for low-grade heat recovery-to-power generation. This study examined modelling and process simulation of the TFC engines in order to evaluate their performance metrics and to provide predictions for guiding system design and parameter estimation. A detailed thermodynamic analysis, performance optimization and parametric analysis of the cycles were conducted, and their optimized performance metrics compared. These were aimed at evaluating the effects of the key parameters on system performance and improving the understanding of performance behaviour. Four distinct system configurations of the TFC were examined: the simple TFC, the TFC with internal heat exchanger (IHE), the reheat TFC and the TFC with feed fluid-heating (or regenerative TFC). 
Steady-state, steady-flow models of the TFC power plants, corresponding to their thermodynamic processes, were modelled and implemented using Engineering Equation Solver (EES). These models were used to determine the optimum synthesis/design parameters of the cycles and to evaluate their performance metrics at the subcritical operating conditions and design criteria. Thus, they can be valuable tools in the preliminary prototype design of the power plants. The results show that the thermal efficiencies of the simple TFC, TFC with IHE, reheat TFC and regenerative TFC employing n-pentane are 11.85 - 21.97%, 12.32 - 23.91%, 11.86 - 22.07% and 12.01 - 22.9% respectively over the cycle high-temperature limit of 393 - 473 K. These suggest that integrating an IHE, feed fluid-heating and reheating in an optimized design of the TFC engine enhances the heat-exchange efficiencies and system performance. The effects of varying the expander inlet pressure at the cycle high temperature, and the expander isentropic efficiency, on the performance metrics of the cycles were significant, and assisted in selecting the optimum operating limits for maximum performance. The thermal efficiencies of all the cycles increased as the inlet pressures increased from 2 - 3 MPa and as the expander isentropic efficiencies increased from 50 - 100%, as did their exergy efficiencies. This is due to increased net work outputs, which suggest an optimal value of the pressure ratio between the expander inlet and outlet. A comprehensive evaluation showed that the TFC with IHE attained the best performance metrics among the cycles, followed by the regenerative TFC, whereas the simple TFC and reheat TFC had the lowest at the same subcritical operating conditions. The results show that the performance metrics of the cycles depend on the system configuration and on the operating conditions of the cycles, heat source and heat sink. 
The results also illustrate how system configuration design and sizing might be altered for improved performance, and inform experimental measurements for preliminary prototype development.
54

A study of trilateral flash cycles for low-grade waste heat recovery-to-power generation

Ajimotokan, Habeeb A. January 2014 (has links)
There has been renewed interest in innovative energy conversion technologies, particularly heat recovery-to-power technologies for sustainable power generation from renewable energies and waste heat. This is due to increasing concern over the high demand for electricity, energy shortages, global warming and thermal pollution. Among the innovative heat recovery-to-power technologies, the proposed trilateral flash cycle (TFC) is a promising option that presents great potential for development. Unlike the Rankine cycles, the TFC starts the working-fluid expansion from the saturated liquid condition rather than the saturated, superheated or supercritical vapour phase, bypassing the isothermal boiling phase. The challenges associated with the need to establish a system design basis and facilitate design-supporting analysis of system configurations, from proof of concept towards a market-ready TFC technology, are significant. Thus, there is a great need for research to improve the understanding of its operation, behaviour and performance. The objective of this study is to develop and establish simulation tools for the TFCs to improve the understanding of their operation and the physics of their performance metrics, and to evaluate novel system configurations for low-grade heat recovery-to-power generation. This study examined modelling and process simulation of the TFC engines in order to evaluate their performance metrics and to provide predictions for guiding system design and parameter estimation. A detailed thermodynamic analysis, performance optimization and parametric analysis of the cycles were conducted, and their optimized performance metrics compared. These were aimed at evaluating the effects of the key parameters on system performance and improving the understanding of performance behaviour. Four distinct system configurations of the TFC were examined: the simple TFC, the TFC with internal heat exchanger (IHE), the reheat TFC and the TFC with feed fluid-heating (or regenerative TFC). 
Steady-state, steady-flow models of the TFC power plants, corresponding to their thermodynamic processes, were modelled and implemented using Engineering Equation Solver (EES). These models were used to determine the optimum synthesis/design parameters of the cycles and to evaluate their performance metrics at the subcritical operating conditions and design criteria. Thus, they can be valuable tools in the preliminary prototype design of the power plants. The results show that the thermal efficiencies of the simple TFC, TFC with IHE, reheat TFC and regenerative TFC employing n-pentane are 11.85 - 21.97%, 12.32 - 23.91%, 11.86 - 22.07% and 12.01 - 22.9% respectively over the cycle high-temperature limit of 393 - 473 K. These suggest that integrating an IHE, feed fluid-heating and reheating in an optimized design of the TFC engine enhances the heat-exchange efficiencies and system performance. The effects of varying the expander inlet pressure at the cycle high temperature, and the expander isentropic efficiency, on the performance metrics of the cycles were significant, and assisted in selecting the optimum operating limits for maximum performance. The thermal efficiencies of all the cycles increased as the inlet pressures increased from 2 - 3 MPa and as the expander isentropic efficiencies increased from 50 - 100%, as did their exergy efficiencies. This is due to increased net work outputs, which suggest an optimal value of the pressure ratio between the expander inlet and outlet. A comprehensive evaluation showed that the TFC with IHE attained the best performance metrics among the cycles, followed by the regenerative TFC, whereas the simple TFC and reheat TFC had the lowest at the same subcritical operating conditions. The results show that the performance metrics of the cycles depend on the system configuration and on the operating conditions of the cycles, heat source and heat sink. 
The results also illustrate how system configuration design and sizing might be altered for improved performance, and inform experimental measurements for preliminary prototype development.
55

High Performance by Exploiting Information Locality through Reverse Computing

Bahi, Mouad 21 December 2011 (has links) (PDF)
The main resources for computation are time, space and energy. Reducing them is the main challenge in the field of processor performance. In this thesis, we are interested in a fourth factor: information. Information has an important and direct impact on these three resources, and we show how it contributes to performance optimization. Landauer suggested that, independently of the hardware on which a computation runs, erasing information dissipates energy; this is a fundamental result of thermodynamics. Therefore, under this hypothesis, only reversible computations, where no information is ever lost, are likely to be thermodynamically adiabatic and dissipate no power. Reversibility means that data can always be retrieved from any point of the program. Information may be carried not only by a datum but also by the process and input data that generate it. When a computation is reversible, information can also be retrieved from other already-computed data by reverse computation. Hence reversible computing improves information locality. This thesis develops these ideas in two directions. In the first part, we address the issue of making a computation DAG (directed acyclic graph) reversible in terms of spatial complexity. We define energetic garbage as the additional number of registers needed for the reversible computation with respect to the original computation. We propose a reversible register allocator and show empirically that the garbage size is never more than 50% of the DAG size. In the second part, we apply this approach to the trade-off between recomputing (direct or reverse) and storage in the context of supercomputers such as recent vector and parallel coprocessors, graphics processing units (GPUs), the IBM Cell processor, etc., where the gap between processor cycle time and memory access time is increasing. 
We show that recomputing in general, and reverse computing in particular, helps reduce register requirements and memory pressure. This approach of reverse rematerialization also contributes to increasing instruction-level parallelism (Cell) and thread-level parallelism in multicore processors with a shared register/memory file (GPU). On the latter architecture, the number of registers required by a kernel limits the number of running threads and affects performance; reverse rematerialization generates additional instructions, but their cost can be hidden by the parallelism gain. Experiments on the highly memory-demanding Lattice QCD simulation code on an Nvidia GPU show a performance gain of up to 11%.
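The storage-versus-recompute trade-off at the heart of reverse rematerialization can be shown with a deliberately tiny sketch (an invented toy example, not the thesis's allocator): a value consumed by a reversible update is later rematerialized from already-computed data instead of being kept live in a register or spilled to memory.

```python
# Forward step: the reversible update a -> a + x overwrites `a` in place,
# so after it only (b, x) need to stay live. No information is erased,
# because `a` is still recoverable from what remains.
def forward(a, x):
    b = a + x
    return b

# Reverse step: rematerialize `a` by running the update backwards,
# trading one extra subtraction for one fewer live register/memory slot.
def reverse(b, x):
    return b - x

a, x = 17, 5
b = forward(a, x)        # `a` is no longer stored anywhere
a_again = reverse(b, x)  # ...yet it can be recomputed on demand
```

On a GPU-like target, the point is that the extra subtraction costs one instruction slot, while the freed register can admit another resident thread, which is the parallelism gain the abstract describes.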
56

Analysis and optimization of heterogeneous avionics networks

Ayed, Hamdi 27 November 2014 (has links)
The complexity of avionics communication architectures keeps growing with the increasing number of interconnected end systems and the expanding amount of exchanged data. To meet emerging needs in terms of bandwidth, latency and modularity, the current avionics communication architecture uses the AFDX (Avionics Full DupleX Switched Ethernet) network to connect the computing modules, and input/output buses (for example the CAN (Controller Area Network) bus) to connect sensors and actuators. The resulting networks are connected using specific interconnection devices called RDCs (Remote Data Concentrators), standardized as ARINC 655. RDCs are modular communication gateways distributed throughout the aircraft to handle the heterogeneity between the AFDX backbone network and the I/O buses. RDCs improve the modularity of the avionics system and reduce its maintenance cost; however, these devices have become one of the major challenges in the design of avionics architectures when it comes to guaranteeing the required system performance. Existing RDC implementations often perform a direct frame-by-frame translation and implement no resource-management mechanism. Yet efficient use of resources is an important need in the avionics context, to ease system evolution and the addition of new functions. The objective of this thesis is therefore the design and validation of an optimized RDC implementing resource-management mechanisms, to improve the performance of the avionics communication architecture while respecting the system's real-time constraints. 
To reach this goal, an RDC for CAN-AFDX network architectures is designed, integrating the following functions: (i) frame packing applied to upstream flows, i.e., flows generated by sensors and destined for the AFDX, to minimize communication overhead on the AFDX; (ii) traffic shaping of downstream flows, i.e., flows generated by AFDX end systems and destined for actuators, to reduce contention on the CAN bus. Moreover, our RDC can connect several CAN buses at once while guaranteeing isolation between flows. Next, to analyze the impact of this new RDC on the performance of the avionics system, we model the CAN-AFDX architecture, and in particular the RDC and its new functions. We then introduce a timing-analysis method to compute upper bounds on end-to-end delays and verify that the real-time constraints are met. Several RDC configurations can meet the avionics system's requirements while saving resources. We therefore tune the RDC's parameters to minimize bandwidth consumption on the AFDX while respecting the timing constraints. This optimization problem is NP-complete, and suitable heuristics proved necessary to find the best possible RDC configuration. Finally, the performance of this new RDC is validated on a realistic CAN-AFDX architecture, with several CAN buses and hundreds of exchanged flows. Different CAN bus utilization levels were considered, and the results showed the effectiveness of our RDC in improving the resource management of the avionics system while respecting the communication timing constraints. In particular, our RDC reduces AFDX bandwidth by up to 40% compared with the RDC currently in use. 
/ The aim of my thesis is to provide a resource-efficient gateway to connect input/output (I/O) CAN buses to a backbone network based on AFDX technology in modern avionics communication architectures. Currently, the Remote Data Concentrator (RDC) is the main standard for gateways in avionics, and existing implementations do not integrate any resource-management mechanism. To handle these limitations, we design an enhanced CAN-AFDX RDC integrating new functions: (i) Frame Packing (FP), to reduce communication overheads compared with the currently used "1 to 1" frame-conversion strategy; (ii) Hierarchical Traffic Shaping (HTS), to reduce contention on the CAN bus. Furthermore, our proposed RDC allows the connection of multiple I/O CAN buses to AFDX while guaranteeing isolation between different criticality levels, using a software partitioning mechanism. To analyze the performance guarantees offered by our proposed RDC, we considered two metrics: end-to-end latency and induced AFDX bandwidth consumption. Furthermore, an optimization process was proposed to achieve an optimal configuration of our proposed RDC, i.e., minimizing bandwidth utilization while meeting the real-time communication constraints. Finally, the capacity of our proposed RDC to meet emerging avionics requirements has been validated through a realistic avionics case study.
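The frame-packing function (i) can be sketched as follows: small upstream CAN payloads are greedily packed into one AFDX-sized payload, cutting the per-frame overhead of a 1-to-1 conversion. The 3-byte entry header (2-byte CAN id, 1-byte length) and the 64-byte payload limit are hypothetical choices for illustration, not the ARINC encoding.

```python
def pack_can_frames(can_payloads, afdx_max_payload=64):
    """Greedily pack small CAN payloads (id, data) into AFDX-sized payloads.

    Each packed entry is prefixed with a 2-byte CAN id and a 1-byte length,
    a made-up encoding used only to make the overhead arithmetic concrete.
    """
    frames, current = [], bytearray()
    for can_id, data in can_payloads:
        entry = can_id.to_bytes(2, "big") + bytes([len(data)]) + data
        if len(current) + len(entry) > afdx_max_payload:
            frames.append(bytes(current))   # flush the full AFDX payload
            current = bytearray()
        current += entry
    if current:
        frames.append(bytes(current))
    return frames

# Ten 8-byte CAN messages: 1-to-1 conversion would emit ten AFDX frames,
# while packing (11 bytes per entry, 5 entries per 64-byte payload) emits two.
payloads = [(i, bytes(8)) for i in range(10)]
frames = pack_can_frames(payloads)
```

A real RDC would additionally bound how long an entry may wait for its frame to fill, since packing trades a little latency for the bandwidth saving.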
57

Performance optimization of a class of deterministic timed Petri nets: weighted marked graphs

He, Zhou 09 June 2017 (has links)
In the last decades, there has been a constant increase in the awareness of company management about the importance of formal techniques in industrial settings to address problems related to monitoring and reliability, fault diagnosis, and optimal use of resources, during the management of plants. 
Of particular relevance in this setting are the so-called Automated Manufacturing Systems (AMSs), which are characterized by complex technological cycles that must adapt to changing demands. Modern AMSs consist of interconnected subsystems such as numerically controlled machines, assembly stations, automated guided vehicles, robots, conveyors and computer control systems. Manufacturers use automated machines and controls to produce quality products faster and more efficiently. Meanwhile, these automated systems can provide critical information to help managers make good business decisions. However, due to the high flexibility of AMSs, failures such as a wrong assembly or a part put in the wrong buffer may happen during the operation of the system. Such failures may decrease the productivity of the system, which has economic consequences and can cause a series of disturbances. As a result, performance optimization in AMSs is imperative. This thesis focuses on the performance evaluation and performance optimization of automated manufacturing systems using timed Petri net models.
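A weighted marked graph of the kind named in the title can be simulated with a simple token game, where a transition fires once every input place holds at least the arc weight in tokens. The two-transition circuit below is an invented toy example (each place has exactly one producer and one consumer, as marked graphs require), not a model from the thesis.

```python
# places: dict place -> token count
# transitions: dict name -> (input arc weights, output arc weights)
def enabled(t, places, transitions):
    return all(places[p] >= w for p, w in transitions[t][0].items())

def fire(t, places, transitions):
    ins, outs = transitions[t]
    for p, w in ins.items():
        places[p] -= w
    for p, w in outs.items():
        places[p] += w

# A weighted circuit: t1 consumes 1 token from p1 and produces 2 on p2;
# t2 consumes 2 from p2 and returns 1 to p1. The marking is conserved
# over a full cycle, so the system can run forever (it is live).
places = {"p1": 1, "p2": 0}
transitions = {
    "t1": ({"p1": 1}, {"p2": 2}),
    "t2": ({"p2": 2}, {"p1": 1}),
}
trace = []
for _ in range(4):                    # play four rounds of the token game
    for t in ("t1", "t2"):
        if enabled(t, places, transitions):
            fire(t, places, transitions)
            trace.append(t)
```

Adding a firing delay per transition to such a simulation is the usual first step toward estimating the cycle time that performance optimization then tries to minimize.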
58

Performance Optimization of Signal Processing Algorithms for SIMD Architectures

Yagneswar, Sharan January 2017 (has links)
Digital Signal Processing (DSP) algorithms are widely implemented in real-time systems. In fields such as digital music technology, many of these algorithms are implemented, often in combination, to achieve the desired functionality. When it comes to implementation, DSP algorithms are performance-critical as they have tight deadlines. In this thesis, performance optimization using the Single Instruction Multiple Data (SIMD) vectorization technique is performed on the ARM Cortex-A15 architecture for six commonly used DSP algorithms: Gain, Mix, Gain and Mix, Complex Number Multiplication, Envelope Detection and Cascaded IIR Filter. To ensure optimal performance, the instructions should be scheduled with minimal pipeline stalls. This requires execution time to be measured with fine time granularity. First, a technique for accurately measuring execution time using the cycle counter of the processor's Performance Monitoring Unit (PMU) along with synchronization barriers is developed. It was found that execution times measured using operating system calls have high variation and very low time granularity, whereas the cycle counter method was accurate and produced reliable results. The cost associated with the cycle counter method is 75 clock cycles. Using this technique, the contribution of each SIMD instruction to the execution time is measured and used to schedule the instructions. This thesis also presents a guideline on how to schedule instructions that have data dependencies, using the cycle-counter execution time measurement technique, to ensure that pipeline stalls are minimized. The algorithms are also modified, where needed, to favor vectorization, and are implemented using ARM architecture-specific SIMD instructions. These implementations are then compared to those automatically produced by the compiler's auto-vectorization feature. 
The execution times of the SIMD implementations were much lower than those of the compiler-produced code, with speedups ranging from 2.47 to 5.11. The performance increase is also significant when the instructions are scheduled in an optimal way. This thesis concludes that auto-vectorized code does poorly for complex algorithms and produces code with many data dependencies causing pipeline stalls, even with full optimizations enabled. Using the guidelines presented in this thesis for scheduling instructions, the DSP algorithms show significant performance improvements compared to their auto-vectorized counterparts.
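The measurement idea, calibrating the fixed cost of reading the timer itself and subtracting it from every measurement, carries over to any clock source. This sketch uses Python's `perf_counter_ns` in place of the PMU cycle counter, so the calibrated overhead is in nanoseconds rather than the 75 cycles reported in the abstract.

```python
import time

def calibrate_overhead(samples=1000):
    """Estimate the fixed cost of a back-to-back timer read pair,
    the analogue of the thesis's 75-cycle PMU read overhead."""
    costs = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        costs.append(t1 - t0)
    return min(costs)  # the minimum best approximates the fixed cost

def measure_ns(fn, overhead):
    """Time one call of `fn`, subtracting the calibrated timer overhead."""
    t0 = time.perf_counter_ns()
    fn()
    t1 = time.perf_counter_ns()
    return max(t1 - t0 - overhead, 0)

overhead = calibrate_overhead()
elapsed = measure_ns(lambda: sum(range(10_000)), overhead)
```

On the Cortex-A15 the same structure would read the PMU cycle counter between synchronization barriers, which is what gives the per-instruction resolution the scheduling guideline relies on.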
59

Analysis of web performance optimization and its impact on user experience

Marang, Ah Zau January 2018 (has links)
User experience (UX) is one of the most popular subjects in the industry nowadays and plays a significant role in business success. As the growth of a business depends on its customers, it is essential to emphasize the UX aspects that can help enhance customer satisfaction. It has been claimed that the overall end-user experience is to a great extent influenced by page load time, and that UX is primarily associated with application performance. This paper analyzes the effectiveness of performance optimization techniques and their impact on user experience. The web performance optimization techniques used in this study were caching data, making fewer HTTP requests, Web Workers and prioritizing content. A profiling method, manual logging, was used to measure performance improvements. A UX survey consisting of the User Experience Questionnaire (UEQ) and three qualitative questions was conducted before and after the performance improvements. Quantitative and qualitative methods were used to analyze the collected data. The implementations and experiments in this study are based on an existing tool, a web-based application. Evaluation results show a 45% improvement in app load time but no significant impact on user experience after the performance optimizations, suggesting that web performance alone does not determine user experience. Limitations of the performance techniques, and other factors that influence performance, were found during the study.
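The caching technique can be shown with a minimal sketch: a memoized fetch stands in for an HTTP GET (the URLs and page contents are made up), so repeated views of the same resource hit the backend only once each.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the "network" is actually hit

@lru_cache(maxsize=128)
def fetch(url):
    """Stand-in for an HTTP GET; a real app would call the network here."""
    global backend_calls
    backend_calls += 1
    return f"<body of {url}>"

# Ten page views of the same two resources reach the backend only twice;
# the other eight requests are served from the cache.
for _ in range(5):
    fetch("https://example.com/app.js")
    fetch("https://example.com/style.css")
```

The same shape, keyed by URL with a bounded cache size, is what browser caches and service workers implement, which is why "caching data" and "fewer HTTP requests" tend to go together in the study's technique list.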
60

Optimising Machine Learning Models for Imbalanced Swedish Text Financial Datasets: A Study on Receipt Classification : Exploring Balancing Methods, Naive Bayes Algorithms, and Performance Tradeoffs

Hu, Li Ang, Ma, Long January 2023 (has links)
This thesis investigates imbalanced Swedish-text financial datasets, specifically receipt classification using machine learning models. The study explores the effectiveness of under-sampling and over-sampling methods for Naive Bayes algorithms, in a controlled experiment conducted in collaboration with Fortnox. Evaluation metrics compare the balancing methods in terms of accuracy, Matthews correlation coefficient (MCC), F1 score, precision, and recall. The findings contribute to Swedish text classification, providing insights into balancing methods. The report examines balancing methods and parameter tuning for machine learning models on imbalanced datasets. Multinomial Naive Bayes (MultiNB) algorithms for natural language processing (NLP) are studied, with potential application in image classification for assessing the deformation of thin industrial components. Experiments show that balancing methods significantly affect MCC and recall, with a recall-MCC-accuracy tradeoff, and that smaller alpha values generally improve accuracy. Two balancing pipelines combine the Synthetic Minority Oversampling Technique (SMOTE) with Tomek-link removal, an under-sampling method developed in 1976 by Ivan Tomek. Applying Tomek first and then SMOTE (TomekSMOTE) yields promising accuracy improvements; due to time constraints, training with SMOTE first and then Tomek (SMOTETomek) is incomplete. The best MCC on the imbalanced datasets is achieved when alpha is 0.01.
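The balancing step can be sketched with plain random over-sampling, a simpler stand-in for SMOTE (which interpolates new synthetic samples between minority-class neighbours rather than duplicating existing ones). The tiny two-class receipt/invoice dataset is invented for illustration.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class is as
    large as the biggest one. SMOTE would instead synthesize new points
    between minority neighbours; the class counts come out the same."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_s, out_l = list(samples), list(labels)
    for cls, n in counts.items():
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        for _ in range(target - n):          # top the class up to `target`
            i = rng.choice(idx)
            out_s.append(samples[i])
            out_l.append(labels[i])
    return out_s, out_l

# Invented imbalanced data: one invoice text against four receipt texts.
texts = ["faktura", "kvitto", "kvitto", "kvitto", "kvitto"]
labels = ["invoice", "receipt", "receipt", "receipt", "receipt"]
bal_texts, bal_labels = random_oversample(texts, labels)
```

Balancing only the training split this way, then fitting a MultiNB classifier on the result, is the usual shape of the experiment; balancing before the train/test split would leak duplicated samples into evaluation.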
