Global ETD Search

141	CArDIS: A Swedish Historical Handwritten Character and Word Dataset for OCR Thummanapally, Shivani, Rijwan, Sakib January 2022 Background: To preserve valuable sources and cultural heritage, digitization of handwritten characters is crucial. For this purpose, Optical Character Recognition (OCR) systems were introduced and are widely used to recognize digital characters. In the case of ancient or historical characters, automatic transcription is more challenging due to the lack of data, high complexity, and low quality of the source material. To address these problems, multiple image-based handwritten datasets have been collected from historical and modern document images, but these datasets also have limitations. To overcome them, we were inspired to create a new image-based historical handwritten character and word dataset and to evaluate its performance using machine learning algorithms. Objectives: The main objective of this thesis is to create the first-ever Swedish historical handwritten character and word dataset, named CArDIS (Character Arkiv Digital Sweden), which will be publicly available for further research. In addition, we verify the correctness of the dataset and perform a quantitative analysis using different machine learning methods. Methods: Initially, we searched for existing character datasets to understand how modern character datasets differ from historical handwritten datasets. We performed a literature review to learn about the datasets most commonly used for OCR. We also studied different machine learning algorithms and their applications. Finally, we trained six machine learning methods, namely Support Vector Machine, k-Nearest Neighbor, Convolutional Neural Network, Recurrent Neural Network, Random Forest, and SVM-HOG, on existing datasets and on the newly created dataset to evaluate their performance and efficiency in recognizing ancient handwritten characters. 
Results: The evaluation results show that the machine learning classifiers struggle to recognise the ancient handwritten characters, yielding low recognition accuracy. Among them, the CNN achieves the highest recognition accuracy. Conclusions: This thesis introduces CArDIS, the first-ever historical handwritten character and word dataset in Swedish. The character dataset contains 101,500 Latin and Swedish character images belonging to 29 classes, while the word dataset contains 10,000 word images of ten popular Swedish names belonging to 10 classes in RGB color space. The performance of six machine learning classifiers on CArDIS and existing datasets is also reported. The thesis concludes that classifiers trained on existing datasets and tested on the CArDIS dataset show low recognition accuracy, proving that the CArDIS dataset has unique characteristics and features compared with existing handwritten datasets. Finally, this research provides the first Swedish character and word dataset, which is robust, has a proven accuracy, and is publicly available for further research. Handwritten Text Recognition Optical Character Recognition Machine learning methods handwritten character dataset Computer Sciences Datavetenskap (datalogi)
142	Investigating techniques for improving accuracy and limiting overfitting for YOLO and real-time object detection on iOS Güven, Jakup January 2019 This paper features the creation of a real-time object detection system for iOS using YOLO, a state-of-the-art one-stage object detector and convolutional neural network that far surpasses other real-time object detectors in speed and accuracy. In this process, an object detection model is trained to detect doors. The machine learning process is outlined, and practices for combating overfitting and increasing accuracy and speed are discussed. 
A series of experiments is conducted, the results of which suggest that data augmentation, including negative data in a dataset, hyperparameter optimisation, and transfer learning are viable techniques for improving the performance of an object detection model. The author is able to increase mAP, a measurement of accuracy for object detectors, from 63.76% to 86.73% based on the results of the experiments. The tendency for overfitting is also explored, and the results suggest that training beyond 300 epochs is likely to produce an overfitted model. YOLO object detection overfitting dataset composition hyperparameter optimisation transfer learning iOS real-time improving accuracy Engineering and Technology Teknik och teknologier
143	Using Machine Learning techniques to understand glucose fluctuation in response to breathing signals Karamichalis, Nikolaos January 2021 Blood glucose (BG) prediction and classification play a big role in diabetic patients' daily lives. According to the International Diabetes Federation (IDF), 463 million people were diabetic globally in 2019, and the number is projected to rise to 700 million by 2045. Continuous glucose monitor (CGM) systems assist diabetic patients daily by continuously alerting them to fluctuations in their BG levels. The history of CGM systems began in 1999, when the Food and Drug Administration (FDA) approved the first CGM system; since then, the systems' reading accuracy and reporting delay have continuously improved. CGM systems are key elements in closed-loop systems, which use BG monitoring to calculate and automatically deliver the needed insulin to the patient under the patient's supervision. Data quality and feature variation are essential for CGM systems; therefore, many studies are being conducted to support the development and improvement of CGM systems and of diabetics' daily lives. This thesis aims to show that physiological signals retrieved from various sensors can assist the classification and prediction of BG levels and, more specifically, that breathing rate can enhance the accuracy of CGM systems for diabetic patients as well as healthy individuals. The results showed that physiological data can improve the accuracy of prediction and classification of BG levels and improve the performance of CGM systems during classification and prediction tasks. Finally, future improvements could include the use of a predictive horizon (PH) for the data as well as the selection and use of different models. diabetes machine learning classification prediction cgm physiological signals glucose type 1 d1namo dataset Computer Sciences Datavetenskap (datalogi)
144	Parallel Coordinates Diagram Implementation in 3D Geometry Suma, Christopher G. January 2018 No description available. Computer Science parallel coordinates diagram three-dimensional brushing WPF ParallelCoordinates3D XDAT 3D Archimedean spiral multivariate data large dataset
145	Distribution-based Summarization for Large Scale Simulation Data Visualization and Analysis Wang, Ko-Chih 11 July 2019 No description available. Computer Science Computer Engineering
146	Text simplification in Swedish using transformer-based neural networks / Textförenkling på Svenska med transformer-baserade neurala nätverk Söderberg, Samuel January 2023 Text simplification involves modifying text to make it easier to read by replacing complex words, altering sentence structure, and/or removing unnecessary information. It can be used to make text accessible to a wider audience. While research on text simplification exists for Swedish, the use of neural networks in the field is limited. Neural networks require large-scale, high-quality datasets, but such datasets are scarce for text simplification in Swedish. This study investigates the acquisition of datasets through paraphrase mining from web snapshots and through translation of existing English text simplification datasets into Swedish, and assesses the performance of neural network models trained on the acquired datasets. 
Three datasets with complex-to-simple sequence pairs were created: one by mining paraphrases from web data, another by translating a dataset from English to Swedish, and a third by combining the mined and translated datasets into one. These datasets were then used to fine-tune a BART neural network model pre-trained on large amounts of Swedish data. The models were evaluated through manual examination and categorization of output, and through automated assessment using the SARI and LIX metrics. Two different test sets were used, one translated from English and one manually constructed from Swedish texts. The automatic evaluation produced SARI scores close to, but not as good as, those reported in similar research on text simplification in English. In terms of LIX scores, the models perform on par with or better than existing research on automatic text simplification in Swedish. The manual evaluation revealed that the model trained on the mined paraphrases generally produced short sequences with many alterations compared to the original, while the model trained on the translated dataset often produced unchanged sequences and sequences with few alterations. However, the model trained on the mined dataset produced many more unusable sequences, either with corrupted Swedish or with altered meaning, than the model trained on the translated dataset. The model trained on the combined dataset reached a middle ground in these two regards, producing fewer unusable sequences than the model trained on the mined dataset and fewer unchanged sequences than the model trained on the translated dataset. Many sequences were successfully simplified by the three models, but the manual evaluation revealed that a significant portion of the generated sequences remained unchanged or unusable, highlighting the need for further research, exploration of methods, and tool refinement. 
Machine learning Natural language processing Text simplification Datasets Maskininlärning Neurolingvistisk programmering Textförenkling Dataset Computer and Information Sciences Data- och informationsvetenskap
147	Context-aware Swedish Lexical Simplification : Using pre-trained language models to propose contextually fitting synonyms / Kontextmedveten lexikal förenkling på svenska : Användningen av förtränade språkmodeller för att föreslå kontextuellt passande synonymer. Graichen, Emil January 2023 This thesis presents the development and evaluation of context-aware Lexical Simplification (LS) systems for the Swedish language. In total, three versions of LS models, LäsBERT, LäsBERT-baseline, and LäsGPT, were created and evaluated on a newly constructed Swedish LS evaluation dataset. The LS systems demonstrated promising potential in aiding audiences with reading difficulties by providing context-aware word replacements. While there were areas for improvement, particularly in complex word identification, the systems showed agreement with human annotators on word replacements. The effects of fine-tuning a BERT model for substitution generation on easy-to-read texts were explored, indicating no significant difference in the number of replacements between fine-tuned and non-fine-tuned versions. Both versions performed similarly in terms of synonymous and simplifying replacements, although the fine-tuned version exhibited slightly reduced performance compared to the baseline model. An important contribution of this thesis is the creation of an evaluation dataset for Lexical Simplification in Swedish. The dataset was automatically collected and manually annotated. Evaluators assessed the quality, coverage, and complexity of the dataset. Results showed that the dataset had high quality and a perceived good coverage. Although the complexity of the complex words was perceived to be low, the dataset provides a valuable resource for evaluating LS systems and advancing research in Swedish Lexical Simplification. Finally, a more transparent and reader-empowering approach to Lexical Simplification is proposed. 
This new approach embraces the challenges with contextual synonymy and reduces the number of failure points in the conventional LS pipeline, increasing the chances of developing a fully meaning-preserving LS system. Links to different parts of the project can be found here: The Lexical Simplification dataset: https://github.com/emilgraichen/SwedishLSdataset The lexical simplification algorithm: https://github.com/emilgraichen/SwedishLexicalSimplifier automatic text simplification lexical simplification Swedish BERT GPT-3 evaluation dataset synonymy
148	Polarimetric Imagery for Object Pose Estimation Siefring, Matthew D. 15 May 2023 No description available. Electrical Engineering Optics Polarimetric Imagery visible-spectrum deep-learning object pose estimation CNN late-fusion Stokes-products dataset
149	Enhancing Efficiency and Trustworthiness of Deep Learning Algorithms Isha Garg (15341896) 24 April 2023 This dissertation explores two major goals in Deep Learning algorithm design: efficiency and trustworthiness. We motivate these concerns in Chapter 1 and give relevant background in Chapter 2. We then discuss six works that target these two goals. The first of these, covered in Chapter 3, discusses how to make the model compression methodology more efficient, so that it can be done in a single shot. This allows us to create models with reduced size and fewer layers, giving faster and more efficient inference. We then extend this to target efficiency in continual learning in Chapter 4, while mitigating the problem of catastrophic forgetting. The method discussed also allows us to circumvent the potential for data leakage by avoiding the need to store any data from past tasks. Next, we consider brain-inspired computing as an alternative to traditional neural networks to improve the compute efficiency of networks. The spiking neural networks discussed, however, have large inference latency due to the need to accumulate spikes over many timesteps. In Chapter 5, we tackle this by introducing a new scheme that distributes an image over time by breaking it down into a sum of its ranked sinusoidal bases. This results in networks that are faster and more efficient to deploy. Chapter 6 targets both the communication expense and the potential for data leakage in federated learning by distilling the gradients to be communicated into a small number of images that resemble noise. Communicating these images is more efficient and circumvents the potential for data leakage, as they resemble noise. We then explore the applications of studying the curvature of the loss with respect to input data points in the last two chapters. 
In Chapter 7, we first use curvature to create performant coresets that reduce the size of datasets and make training more efficient. In Chapter 8, we use curvature as a metric for overfitting and use it to expose dataset integrity issues arising from memorization. Computer vision Deep learning Model Compression Efficiency Continual Learning Privacy Federated Learning Neuromorphic Computing Dataset Integrity CNN models Coresets
150	Development of Artificial Intelligence-based In-Silico Toxicity Models. Data Quality Analysis and Model Performance Enhancement through Data Generation. Malazizi, Ladan January 2008 Toxic compounds, such as pesticides, are routinely tested against a range of aquatic, avian and mammalian species as part of the registration process. The need to reduce dependence on animal testing has led to an increasing interest in alternative methods such as in silico modelling. QSAR (Quantitative Structure Activity Relationship)-based models are already in use for predicting physicochemical properties, environmental fate, eco-toxicological effects, and specific biological endpoints for a wide range of chemicals. Data plays an important role in modelling QSARs and also in result analysis for toxicity testing processes. This research addresses a number of issues in predictive toxicology. One issue is the problem of data quality. Although a large amount of toxicity data is available from online sources, this data may contain unreliable samples and may be considered low quality. Its presentation may also be inconsistent across sources, which makes access to, interpretation of, and comparison of the information difficult. To address this issue, we started with a detailed investigation of and experimental work on DEMETRA data. The DEMETRA datasets were produced by the EC-funded project DEMETRA. Based on the investigation, the experiments, and the results obtained, the author identified a number of data quality criteria in order to provide a solution for data evaluation in the toxicology domain. An algorithm has also been proposed to assess data quality before modelling. Another issue considered in the thesis was missing values in datasets for the toxicology domain. The Least Square Method for a paired dataset and Serial Correlation for a single-version dataset provided solutions to the problem in two different situations. 
A procedural algorithm using these two methods has been proposed in order to overcome the problem of missing values. Another issue addressed in this thesis was the modelling of multi-class datasets in which the distribution of class samples is severely imbalanced. Imbalanced data affects the performance of classifiers during the classification process. We have shown that as long as we understand how class members are arranged in dimensional space in each cluster, we can reform the distribution and provide more domain knowledge for the classifier. Predictive toxicology Toxicity data Pesticides Artificial intelligence Data quality Data generation Model performance QSAR Classification algorithm Clustering Imbalanced dataset Endpoints
