1 |
Measurement of machine learning performance with different condition and hyperparameter. Yin, Jiaqi, 08 October 2020.
No description available.
|
2 |
Djupinlärning för kameraövervakning / Deep learning for camera surveillance. Blomqvist, Linus, January 2020.
Allt fler misshandelsbrott sker i Sverige enligt Brå. För att reducera detta kan det som fångats på övervakningskameror användas i brottsutredningar och senare som bevismaterial för att döma den eller de skyldiga till brottet. Genom att optimera övervakningen kan företag använda sig av automatiserad igenkänning. Automatiserad igenkänning av normala kontra onormala beteenden går att lösa med djupinlärning. Syftet med denna undersökning är att finna en lämplig modell som kan identifiera det onormala beteendet (till exempel ett slagsmål). Modellarkitekturen som användes under projektet var 3D ResNet, eftersom den klarar av en djupare arkitektur. Ett djupare nätverk innebär bättre prediktion av problemet. 3D ResNet-34 var den modellarkitektur som gav högst noggrannhet, 93,33 %. Implementeringen av projektet utfördes i ramverket PyTorch. Undersökningen har visat att det med hjälp av överförd inlärning går att återanvända kunskap från förtränade modeller och applicera den på det aktuella problemet. Detta bidrar till en mer pålitlig modell med noggrann prediktion på nytt övervakningsmaterial. / According to Brå, an increasing number of assault crimes take place in Sweden. To reduce this, footage captured by surveillance cameras can be used in criminal investigations and later as evidence to convict the perpetrator or perpetrators of the crime. By optimizing their surveillance, companies can make use of automated recognition. Automated recognition of normal versus abnormal activities can be solved with deep learning. The purpose of this study is to find a suitable model that can identify the abnormal activity (for example, a fight). The model architecture used during the project was 3D ResNet, because it can handle deeper architectures. A deeper network means better prediction of the problem. 3D ResNet-34 was the model architecture that gave the highest accuracy, 93.33%. The project was implemented in the PyTorch framework. The study has shown that, with the help of transfer learning, it is possible to reuse knowledge from pre-trained models and apply it to the problem at hand. This contributes to a more reliable model with accurate predictions on new surveillance footage.
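The abstract above describes transfer learning with a pretrained 3D ResNet in PyTorch for a two-class video task (normal versus abnormal activity). A minimal sketch of that general setup follows, assuming torchvision's 18-layer r3d_18 (the 34-layer variant reported in the thesis is not bundled with torchvision) and dummy tensors in place of the surveillance clips; it is not the thesis code.

```python
# Minimal transfer-learning sketch, not the thesis implementation:
# fine-tune a Kinetics-pretrained 3D ResNet for two classes (normal / abnormal).
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

model = r3d_18(pretrained=True)                 # Kinetics-400 pretrained weights
for p in model.parameters():                    # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)   # new head: normal vs. abnormal

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

clips = torch.randn(4, 3, 16, 112, 112)         # (batch, channels, frames, H, W) dummy clips
labels = torch.randint(0, 2, (4,))              # dummy labels
loss = criterion(model(clips), labels)
loss.backward()
optimizer.step()
```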
|
3 |
Additional Classes Effect on Model Accuracy using Transfer Learning. Kazan, Baran, January 2020.
This empirical study investigates how much a model's accuracy changes when a new image class is added using a pre-trained model with the same labels, measuring the precision of the previous classes to observe the changes. The purpose is to determine whether transfer learning is beneficial for users who do not have enough data to train a model from scratch. The pre-trained model used to create the new model was Inception V3, with labels corresponding to the eight different classes used to train the model. To test the model, classes of wild and non-wild animals were taken as samples. The training algorithm was implemented in a single class written in the Python programming language with the PyTorch and TensorBoard libraries. TensorBoard was used to collect and present the results. The results showed that the accuracy of the first two classes was 94.96% in training and 97.07% in validation. When training the model with a total of eight classes, the accuracy was 91.89% in training and 95.40% in validation. The precision of both classes was 100% when the model only had the cat and dog classes. After adding six additional classes to the model, the precision changed to 95.82% for cats and 97.16% for dogs.
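As a rough illustration of the transfer-learning setup described above, the sketch below loads torchvision's pretrained Inception V3, replaces its classification heads for eight classes, and logs a metric to TensorBoard. The class count and the logged value mirror the abstract; everything else (layer handling, run names, tags) is an assumption, not the thesis code.

```python
# Hedged sketch: adapt a pretrained Inception V3 to 8 classes and log to TensorBoard.
import torch
import torch.nn as nn
from torchvision import models
from torch.utils.tensorboard import SummaryWriter

NUM_CLASSES = 8                                   # e.g. cats, dogs plus six added animal classes
model = models.inception_v3(pretrained=True, aux_logits=True)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)                        # main head
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, NUM_CLASSES)    # auxiliary head

model.train()
x = torch.randn(2, 3, 299, 299)                   # Inception V3 expects 299x299 inputs
logits, aux_logits = model(x)                     # training mode returns both heads

writer = SummaryWriter("runs/additional_classes")            # hypothetical run name
writer.add_scalar("train/accuracy", 0.9189, global_step=1)   # value taken from the abstract
writer.close()
```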
|
4 |
Benchmarking a machine learning model in the transformation from PyTorch to CoreML. Ahremark, Jens; Bazso, Simon, January 2022.
Due to rapid development in the field of machine learning and large increases in the capabilities of mobile devices, utilizing machine learning on these is becoming increasingly popular. One method of deployment is to develop a machine learning model in well-established deep learning frameworks like PyTorch. However, to be able to run these models on mobile devices, specific frameworks are usually needed. In this thesis, we investigate the performance of a popular object detection model, YOLOv5, while being converted from PyTorch to CoreML. This includes measuring the performance of the model while running on different hardware. To accomplish this, we put forward several common benchmarking metrics and compare the different stages. Our results show that CoreML greatly reduces the latency of a machine learning model and has comparable detection accuracy, within a few percent in the metrics chosen. For iOS devices with the ANE chipset, we also found that the ANE (Apple Neural Engine) has significantly faster latency compared to running the model on the GPU and CPU, while detection accuracy is maintained. We discuss what could be the root cause of the small loss of accuracy in the model, and a foundation is laid for future work.
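For readers curious about the conversion step itself, the sketch below shows the general PyTorch-to-CoreML flow with coremltools: trace the model with TorchScript and convert the trace. It is a simplified, assumption-laden example (hub model name, input size, scaling), not the benchmark code; YOLOv5's own export.py handles model-specific details such as the detection head.

```python
# Hedged sketch of a PyTorch -> CoreML export via TorchScript tracing and coremltools.
import torch
import coremltools as ct

# Load the raw detection model (autoshape=False) so it can be traced.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", autoshape=False)
model.eval()

example = torch.zeros(1, 3, 640, 640)                       # assumed input resolution
traced = torch.jit.trace(model, example, strict=False)      # TorchScript trace

mlmodel = ct.convert(
    traced,
    inputs=[ct.ImageType(name="image", shape=example.shape, scale=1 / 255.0)],
)
mlmodel.save("yolov5s.mlmodel")                             # ready for the CoreML runtime
```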
|
5 |
QPLaBSE: Quantized and Pruned Language-Agnostic BERT Sentence Embedding Model: Production-ready compression for multilingual transformers / QPLaBSE: Kvantiserad och prunerad LaBSE: Produktionsklar komprimering för flerspråkiga transformer-modeller. Langde, Sarthak, January 2021.
Transformer models perform well on Natural Language Processing and Natural Language Understanding tasks. Training and fine-tuning these models, however, consume a large amount of data and computing resources. Fast inference also requires high-end hardware for user-facing products. While distillation, quantization, and head-pruning for transformer models are well-explored domains in academia, the practical application is not straightforward. Currently, to obtain good accuracy from the optimized models, it is necessary to fine-tune them for a particular task. This makes generalization of the model difficult. If the same model has to be used for multiple downstream tasks, the optimization process with fine-tuning would have to be applied for each task. This thesis explores quantization and pruning techniques for optimizing the Language-Agnostic BERT Sentence Embedding (LaBSE) model without fine-tuning for a downstream task. This should make the model general enough for any downstream task. The techniques explored in this thesis are dynamic quantization, static quantization, quantization-aware training, and head-pruning. The downstream performance is evaluated using sentiment classification, intent classification, and language-agnostic classification tasks. The results show that LaBSE can be accelerated on the CPU to 2.6x its original inference speed without any loss of accuracy. Pruning 50% of the heads from each layer leads to a 1.2x speedup, while removing all heads but one leads to a 1.32x speedup. A speedup of almost 9x is achieved by combining quantization with head-pruning, with an average 8% drop in accuracy on the downstream evaluation tasks. / Transformer-modeller ger bra resultat i uppgifter som rör behandling och förståelse av naturligt språk. Träning och finjustering av dessa modeller kräver dock en stor mängd data och datorresurser. Snabb inferens kräver också högkvalitativ hårdvara för användarvänliga produkter och tjänster. Även om destillering, kvantisering och head-pruning för transformer-modeller är väl utforskade områden inom den akademiska världen är den praktiska tillämpningen inte okomplicerad. För närvarande är det nödvändigt att finjustera de optimerade modellerna för en viss uppgift för att uppnå god noggrannhet. Detta gör det svårt att generalisera modellerna. Om samma modell ska användas för flera nedströmsuppgifter måste optimeringsprocessen med finjustering tillämpas för varje uppgift. I den här uppsatsen undersöks tekniker för kvantisering och prunering för optimering av LaBSE-modellen (Language-Agnostic BERT Sentence Embedding) utan finjustering för en nedströmsuppgift. Detta bör göra modellen tillräckligt generell för alla efterföljande uppgifter. De tekniker som undersöks är dynamisk kvantisering, statisk kvantisering, kvantiseringsmedveten träning samt head-pruning. Prestandan i efterföljande led utvärderas med hjälp av klassificering av känslor, avsiktsklassificering och språkagnostiska klassificeringsuppgifter. Resultaten visar att LaBSE kan snabbas upp på CPU till 2,6 gånger sin ursprungliga inferenshastighet utan någon förlust av noggrannhet. Att ta bort 50 % av huvudena i varje lager ger 1,2 gånger högre hastighet, medan det ger 1,32 gånger högre hastighet om alla huvuden utom ett tas bort. Genom att kombinera kvantisering med head-pruning uppnås en hastighetsökning på nästan 9x, med en genomsnittlig minskning av noggrannheten med 8 % i utvärderingsuppgifterna nedströms.
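To make two of the optimization techniques concrete, here is a small sketch combining head-pruning and post-training dynamic quantization on a BERT-style encoder with Hugging Face Transformers and PyTorch. It uses bert-base-multilingual-cased as a stand-in checkpoint and an arbitrary choice of heads to prune; it is not the thesis pipeline, which also covers static and quantization-aware variants.

```python
# Sketch: prune half the attention heads per layer, then dynamically quantize the Linear layers.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-multilingual-cased")  # stand-in for LaBSE

# Remove heads 0-5 in every layer (an arbitrary 50% choice for illustration).
heads_to_prune = {layer: list(range(6)) for layer in range(model.config.num_hidden_layers)}
model.prune_heads(heads_to_prune)

# Dynamic quantization: nn.Linear weights stored as int8, activations quantized on the fly.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# Quick sanity check on dummy token ids (CPU).
input_ids = torch.randint(0, model.config.vocab_size, (1, 64))
with torch.no_grad():
    outputs = quantized(input_ids)
print(outputs.last_hidden_state.shape)   # (1, 64, hidden_size)
```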
|
6 |
Tolkning av handskrivna siffror i formulär: Betydelsen av datauppsättningens storlek vid maskininlärning / Interpretation of handwritten digits in forms: the importance of dataset size in machine learning. Kirik, Engin, January 2021.
Forskningen i denna studie har gått ut på att ta reda på hur stor betydelse datauppsättningens storlek har för resultatet inom objektigenkänning. Forskningen genomfördes genom att träna en modell inom datorseende som ska kunna identifiera och konvertera handskrivna siffror från fysiska formulär till digitaliserat format. Till denna process användes två olika ramverk, TensorFlow och PyTorch. Modellen tränades i två olika miljöer: den ena i CPU-miljö och den andra i Google Clouds GPU-miljö. Tanken med studien är att förbättra resultaten från tidigare examensarbeten och utöka utvecklingen genom att skapa en modell som identifierar och digitaliserar flera handskrivna siffror samtidigt på ett helt formulär, så att den i fortsättningen kan användas i applikationer som med hjälp av en mobilkamera räknar ihop till exempel poängskörden på ett formulär. Projektet visade ett felfritt igenkännande av flera siffror samtidigt när datauppsättningen utökades. För enskilda siffror lyckades modellen identifiera alla siffror från 0 till 9 med både ramverket TensorFlow och PyTorch. / This study investigates how much the size of the dataset affects the results in object recognition. The research was carried out by training a computer vision model that can identify and convert handwritten digits from physical forms to a digitized format. Two different frameworks, TensorFlow and PyTorch, were used for this process. The model was trained in two different environments: one in a CPU environment and the other in a Google Cloud GPU environment. The idea of the study is to improve on the results of previous degree projects and extend the development by creating a model that identifies and digitizes several handwritten digits simultaneously on a complete form, so that it can later be used in applications that, for example, sum up the scores on a form using a mobile camera for recognition. The project showed error-free recognition of several digits at the same time as the dataset was steadily expanded. For individual digits, the model managed to identify all digits from 0 to 9 with both the TensorFlow and PyTorch frameworks.
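A minimal sketch of the kind of experiment described above is given below: train a small PyTorch digit classifier on progressively larger subsets of an MNIST-style dataset and compare accuracy. The network, subset size and batch size are illustrative assumptions, not the thesis setup.

```python
# Sketch: study the effect of dataset size by training on subsets of MNIST.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
subset = Subset(train_full, range(10_000))        # vary this size to repeat the experiment

model = nn.Sequential(                            # small CNN for 28x28 grayscale digits
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 10),      # 10 classes: digits 0-9
)

loader = DataLoader(subset, batch_size=64, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images, labels = next(iter(loader))               # one illustrative training step
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```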
|
7 |
Developing a Neural Network Model for Semantic Segmentation / Utveckling av en neural nätverksmodell för semantisk segmentering. Westphal, Ronny, January 2023.
This study details the development of a neural network model designed for real-time semantic segmentation, specifically to distinguish sky pixels from other elements within an image. The model is incorporated into a feature for an Augmented Reality application in Unity, leveraging Unity Barracuda, a versatile neural network inference library. While Barracuda offers cross-platform compatibility, it poses challenges due to its lack of support for certain layers and operations. Consequently, it lacks support for most state-of-the-art models, and this study aims to provide a model that works within Barracuda. Given Unity's absence of a framework for model development, the development and training of the model were conducted in an open-source machine learning library. The model is continuously evaluated to optimize the trade-off between prediction accuracy and operational speed. The resulting model is able to predict and classify each pixel in an image at around 137 frames per second. While its predictions might not be on par with some of the top-performing models in the industry, it effectively meets its objectives, particularly the real-time classification of sky pixels within Barracuda. / Denna rapport beskriver utvecklingen av en neural nätverksmodell avsedd för semantisk segmentering i realtid, specifikt för att särskilja himlen från andra element i en bild. Modellen integreras i en funktion för en applikation med augmenterad verklighet i Unity, med hjälp av Unity Barracuda, ett mångsidigt bibliotek för inferens med neurala nätverk. Även om Barracuda erbjuder kompatibilitet över olika plattformar medför det utmaningar på grund av bristande stöd för vissa lager och operationer. Följaktligen saknas stöd för de flesta av de bäst presterande modellerna, och denna studie syftar till att erbjuda en modell som fungerar inom Barracuda. Eftersom Unity saknar ett ramverk för modellutveckling utvecklades och tränades modellen i ett maskininlärningsbibliotek med öppen källkod. Modellen utvärderas kontinuerligt för att optimera avvägningen mellan förutsägelseprecision och hastighet. Den resulterande modellen kan förutsäga och klassificera varje pixel i en bild med en hastighet på cirka 137 bilder per sekund. Även om dess förutsägelser inte är i nivå med de bäst presterande modellerna inom branschen uppfyller den effektivt sina mål, särskilt när det gäller realtidsklassificering av himlen inom Barracuda.
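Barracuda imports models in ONNX format, so one workable route is to build the segmentation network in an open-source library such as PyTorch and export it using an operator set Barracuda understands. The sketch below assumes PyTorch (the abstract does not name the library), a toy encoder-decoder, and a conservative ONNX opset; it only illustrates the export step, not the thesis model.

```python
# Sketch: export a tiny binary sky-segmentation network to ONNX for Unity Barracuda.
import torch
import torch.nn as nn

class TinySkySegNet(nn.Module):                   # hypothetical minimal encoder-decoder
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="nearest"),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))              # per-pixel sky probability in [0, 1]

model = TinySkySegNet().eval()
dummy = torch.randn(1, 3, 256, 256)               # assumed input resolution
torch.onnx.export(model, dummy, "sky_seg.onnx",
                  opset_version=9,                # assumed conservative opset for Barracuda import
                  input_names=["image"], output_names=["mask"])
```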
|
8 |
Detekce pohybujících se objektů ve videu s využitím neuronových sítí / Object detection in video using neural networks. Mikulský, Petr, January 2021.
This diploma thesis deals with the detection of moving objects in video recordings using neural networks. The aim of the thesis was to detect road users in video recordings. A pre-trained YOLOv5 object detection model was used for the practical part of the thesis. As part of the solution, a custom dataset of road traffic video recordings was created and annotated with the following classes: car, bus, van, motorcycle, truck and trailer truck. The final version of this dataset comprises 5404 frames and 6467 annotated objects in total. After training, the YOLOv5 model achieved 0.995 mAP, 0.995 precision and 0.986 recall on the dataset. All steps leading to the final form of the dataset are described in the conclusion chapter.
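For illustration, the snippet below shows how a YOLOv5 checkpoint trained on such a custom dataset is typically loaded through torch.hub and run on a single frame. The weights filename, image path and threshold are placeholders, not artifacts from the thesis.

```python
# Sketch: run a custom-trained YOLOv5 checkpoint on one video frame.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")  # placeholder weights file
model.conf = 0.4                        # confidence threshold (assumed value)

results = model("frame_0001.jpg")       # placeholder frame; accepts paths, URLs or arrays
results.print()                         # per-class detection counts and timing
detections = results.pandas().xyxy[0]   # bounding boxes, confidences and class names as a DataFrame
print(detections[["name", "confidence"]].head())
```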
|
9 |
Self-supervised učení v aplikacích počítačového vidění / Self-supervised learning in computer vision applications. Vančo, Timotej, January 2021.
The aim of the diploma thesis is to research self-supervised learning in computer vision applications, choose a suitable test task with an extensive dataset, apply self-supervised methods and evaluate them. The theoretical part of the work focuses on the description of methods in computer vision, a detailed description of neural and convolutional networks, and an extensive explanation and categorization of self-supervised methods. The conclusion of the theoretical part is devoted to applications of self-supervised methods in practice. The practical part of the diploma thesis describes the creation of code for working with datasets and the application of the SSL methods Rotation, SimCLR, MoCo and BYOL to classification and semantic segmentation. Each application of a method is explained in detail and evaluated with various parameters on the large STL10 dataset. Subsequently, the success of the methods is evaluated for different datasets, and the limiting conditions in the classification task are identified. The practical part concludes with the application of SSL methods for pre-training the encoder for semantic segmentation on the Cityscapes dataset.
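As a concrete taste of the simplest of the four SSL methods mentioned, the sketch below implements the rotation pretext task on STL10's unlabeled split: each image is rotated by 0, 90, 180 or 270 degrees and the encoder learns to predict the rotation without any labels. The encoder choice and batch size are assumptions for illustration, not the thesis configuration.

```python
# Sketch of the rotation pretext task for self-supervised pre-training on STL10.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

unlabeled = datasets.STL10("data", split="unlabeled", download=True,
                           transform=transforms.ToTensor())
encoder = models.resnet18(num_classes=4)          # 4 outputs: one per rotation

def rotate_batch(x):
    """Rotate every image by a random multiple of 90 degrees; return images and targets."""
    ks = torch.randint(0, 4, (x.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2)) for img, k in zip(x, ks)])
    return rotated, ks

criterion = nn.CrossEntropyLoss()
loader = DataLoader(unlabeled, batch_size=32, shuffle=True)

x, _ = next(iter(loader))                         # STL10 unlabeled targets are unused (-1)
rotated, targets = rotate_batch(x)
loss = criterion(encoder(rotated), targets)       # predict the applied rotation
loss.backward()
```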
|
10 |
Detekce a klasifikace poškození otisku prstu s využitím neuronových sítí / Detection and Classification of Damage in Fingerprint Images Using Neural Nets. Vican, Peter, January 2021.
The aim of this diploma thesis is to study and design an experimental improvement of a convolutional neural network for disease detection. Another goal is to extend the classifier with a new type of detection: fingerprint damage caused by pressure. The experimentally improved convolutional network is implemented in PyTorch. The network detects which part of the fingerprint is damaged and marks this part in the fingerprint image. Synthetic fingerprints are used when training the network, with real fingerprints added to the synthetic ones.
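One simple way to localize damage, sketched below, is to slide a window over the fingerprint, classify each patch as damaged or undamaged with a small CNN, and overlay the positive patches on the image. The network, patch size and class convention are illustrative assumptions; the thesis network and its training on synthetic plus real fingerprints are not reproduced here.

```python
# Sketch: patch-wise damage localization with a small CNN, drawn back onto the fingerprint.
import torch
import torch.nn as nn

patch_classifier = nn.Sequential(                  # stand-in for the thesis CNN
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 2),                # class 0 = undamaged, class 1 = damaged
)

fingerprint = torch.rand(1, 1, 256, 256)           # dummy grayscale fingerprint in [0, 1]
mask = torch.zeros(256, 256)
P = 64                                             # assumed patch size

with torch.no_grad():
    for y in range(0, 256, P):
        for x in range(0, 256, P):
            patch = fingerprint[:, :, y:y + P, x:x + P]
            if patch_classifier(patch).argmax(dim=1).item() == 1:
                mask[y:y + P, x:x + P] = 1.0       # mark the region as damaged

overlay = fingerprint[0, 0] * 0.7 + mask * 0.3     # highlight damaged areas in the image
```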
|