91 |
Impact of data augmentations when training the Inception model for image classification
Barai, Milad; Heikkinen, Anthony. January 2017
Image classification is the task of identifying which class a previously unobserved object belongs to. Classifying images is a common task in companies, and many currently perform this classification manually. Automated classification, however, has a lower expected accuracy. This thesis examines how automated classification could be improved by adding augmented data to the classifier's learning process. We conduct a quantitative empirical study of the effects of two image augmentations: random horizontal/vertical flips and random rotations (<180°). The data comes from an auction house search engine operating under the commercial name Barnebys. The data sets contain 700 000, 50 000 and 28 000 images, each covering 28 classes. In this bachelor's thesis, we re-trained a convolutional neural network, the Inception-v3 model, on the two larger data sets and used the remaining set to obtain class-specific accuracies. To obtain more reliable estimates of the effects, we used ten-fold cross-validation. The results of our quantitative study show that the Inception-v3 model reaches a baseline mean accuracy of 64.5% (700 000 data set) and a mean accuracy of 51.1% (50 000 data set). Overall accuracy decreased with augmentations on our data sets, but our results show an increase in accuracy for some classes. The largest observed increase was in the class "Wine & Spirits" in the small data set, which went from 42.3% to 72.7% correctly classified images.
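The two augmentations studied above can be sketched as a simple per-image pipeline. This is an illustrative sketch, not the thesis code: the list-of-rows image format is an assumption, and a real pipeline would delegate arbitrary-angle rotation to an image library, so here the sampled angle is only recorded.

```python
import random

def augment(image):
    """Apply the two augmentations studied above to one image.

    `image` is a list of pixel rows. Flips are done in place via
    slicing; the rotation angle (<180 degrees) is sampled and
    recorded, with the actual resampling left to an image library.
    """
    ops = []
    if random.random() < 0.5:                  # random horizontal flip
        image = [row[::-1] for row in image]
        ops.append("hflip")
    if random.random() < 0.5:                  # random vertical flip
        image = image[::-1]
        ops.append("vflip")
    angle = random.uniform(0, 180)             # random rotation (<180 deg)
    ops.append(("rotate", angle))
    return image, ops
```

In a framework-based setup the same two operations would typically be applied on the fly during each training epoch rather than stored on disk.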
|
92 |
Real-time object detection robot control : Investigating the use of real-time object detection on a Raspberry Pi for robot control
Ryberg, Simon; Jansson, Jonathan. January 2022
The field of autonomous robots has been explored more and more over the last decade. Advances in machine learning combined with increases in computational power have made it possible to explore the use of machine learning models on edge devices. Object detection on edge devices is bottlenecked by their limited computational power, which imposes constraints compared to running machine learning models on other devices. This project explored the possibility of using real-time object detection on a Raspberry Pi as input to different control systems. With the help of a Coral USB accelerator, the Raspberry Pi was able to find a specified object and drive to it, and it did so successfully with all the control systems tested. Since the robot was able to navigate to the specified object with all control systems, the use of real-time object detection in faster-paced situations can now be explored.
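A minimal sketch of how a detected bounding box could drive such a robot: the horizontal offset of the box centre from the image centre sets the turn rate of a differential drive. The bbox format, the gain and the clamping are assumptions for illustration; the thesis does not specify its control laws here.

```python
def steer(bbox, frame_width, gain=0.002):
    """Map a detected bounding box to differential wheel speeds.

    A proportional controller: `bbox` is (x_min, x_max) in pixels,
    and the returned (left, right) speeds are clamped to [-1, 1].
    All numeric choices are illustrative assumptions.
    """
    x_min, x_max = bbox
    center = (x_min + x_max) / 2.0
    error = center - frame_width / 2.0    # pixels off-centre
    turn = gain * error                   # proportional term
    base = 0.5                            # constant forward speed
    left = max(-1.0, min(1.0, base + turn))
    right = max(-1.0, min(1.0, base - turn))
    return left, right
```

An object left of centre yields a slower left wheel, turning the robot toward it; a centred object drives both wheels equally.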
|
93 |
Comparing the Cost-effectiveness of Image Recognition for Elastic Cloud Computing : A cost comparison between Amazon Web Services EC2 instances
Gauffin, Christopher; Rehn, Erik. January 2021
With the rise of AI, the need for computing power has grown exponentially. This has made cloud computing a popular option thanks to its cost-effective and highly scalable capabilities. However, its popularity means there are thousands of possible services to choose from, making it hard to find the right tool for the job. The purpose of this thesis is to provide a methodological approach for evaluating which alternative is best for machine learning applications deployed in the cloud. Nine different instances on a major cloud provider were evaluated and compared on performance relative to cost. This was accomplished by developing a cost evaluation model together with a test environment for image recognition models. The environment can be used on any type of cloud instance to aid decision-making. The results, derived from the specific premises of this study, indicate that the higher an instance's hourly cost, the less cost-effective it was. However, when making the same comparison within an instance family of similar machines, the same conclusion cannot be drawn. Regardless of the conclusions reached here, the underlying problem remains, as the domain is too large to cover in one report. The methodology, however, holds value as guidance for similar evaluations under a different set of premises.
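The ranking criterion described above, performance relative to cost, can be sketched as a simple images-per-dollar ratio. The instance names and numbers below are made up for illustration; they merely reproduce the reported tendency that pricier instances can be less cost-effective.

```python
def cost_effectiveness(throughput_per_hour, usd_per_hour):
    """Images processed per dollar: the ratio used to rank instances."""
    return throughput_per_hour / usd_per_hour

# Illustrative (made-up) numbers for three hypothetical EC2-like instances.
instances = {
    "small-cpu": {"throughput": 18_000, "price": 0.10},
    "large-cpu": {"throughput": 60_000, "price": 0.40},
    "gpu":       {"throughput": 400_000, "price": 3.00},
}

ranked = sorted(
    instances,
    key=lambda n: cost_effectiveness(instances[n]["throughput"],
                                     instances[n]["price"]),
    reverse=True,
)
```

With these toy figures the cheapest instance wins on images per dollar even though the GPU instance has by far the highest raw throughput, mirroring the study's headline observation.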
|
94 |
Programming with shapes
Webb, Jack. January 2024
This thesis investigated how shapes can be mapped to programming constructs, offering a new way to compose and understand code, with the long-term goal of creating a tactile programming tool. In doing so, it delved into the challenges of translating shapes into abstract programming concepts. Existing programming tools rely heavily on visual interfaces, making them inaccessible to individuals with visual impairments. Similar endeavours to create tactile programming tools were analysed and shown to be domain-specific rather than Turing-complete, which greatly limits their usefulness. The solution was to map a set of shapes to a set of Brainfuck (BF) instructions and to classify these shapes with a Support Vector Machine (SVM). Results are promising but as yet untested in less-than-ideal conditions, such as a real-world application. More work is needed to reach the goal of a tactile programming tool accessible to individuals with visual impairments.
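The target language of the shape mapping, Brainfuck, has only eight instructions, which is what makes a one-shape-per-instruction alphabet feasible. A minimal sketch of a BF interpreter shows the executable core the shapes compile to; the shape classification itself is out of scope here.

```python
def run_bf(code, steps=10_000):
    """Execute a Brainfuck program and return its output string.

    `steps` caps the number of executed instructions so malformed
    programs cannot loop forever in this sketch.
    """
    tape, ptr, out, pc = [0] * 30_000, 0, [], 0
    jumps, stack = {}, []
    for i, c in enumerate(code):          # pre-match bracket pairs
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(code) and steps:
        c = code[pc]
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
        elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
        pc += 1
        steps -= 1
    return "".join(out)
```

For example, `++++++[>+++++++<-]>+++.` computes 6 × 7 + 3 = 45 and prints the character with that code.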
|
95 |
Identification of sensitive anatomical structures in robot-assisted colorectal surgery using machine learning methods
Rinner, Franziska Maria. 05 February 2025
Colorectal cancer (CRC) is one of the most common malignancies and is associated with a high mortality rate. The treatment of CRC often involves surgical approaches, which in recent years have increasingly evolved towards minimally invasive and robot-assisted procedures. Surgical treatment entails a risk of injury to various anatomical structures, in part due to the narrow surgical field and close anatomical relationships. In addition to rare organ injuries, these include the more common nerve lesions, which often result in urogenital dysfunction that severely affects patients' quality of life. The aim is always to avoid such complications. In recent years, machine learning techniques have become increasingly popular and are being successfully applied in a growing variety of fields, medicine among them. Since inadequate detection of anatomical structures is a relevant risk factor in various surgical procedures, with a high impact on patient prognosis and quality of life, there is great potential here. AI-based assistance aims to fill a gap whose practical relevance has not yet been sufficiently investigated. Although machine learning has already been shown to identify optically differentiable structures in a surgical context, its clinical significance remains unclear to date. This work examines the possibility of using image recognition algorithms to identify anatomical structures in intraoperative image material. This research is fundamental to the development of future technologies that facilitate surgical interventions, reduce the likelihood of complications or identify morphologically visible organ pathologies.
This technique is intended to be used in the widespread field of minimally invasive surgery, rather than being limited to robot-assisted rectal and sigmoid resections. This could reduce the cost and duration of surgery as well as the cognitive demands on the surgeons, and could lead to improvements in post-operative outcomes and quality of life for patients. Between February 2019 and March 2021, 43 robot-assisted rectal and sigmoid resections and extirpations performed at the Clinic for Visceral, Thoracic and Vascular Surgery of the University Hospital Carl Gustav Carus in Dresden were included in this work. The surgery recordings were examined with regard to the visibility of 6 different anatomical structures. For the categories liver, stomach, spleen, nerves, pancreas and ureter, between 1023 and 1754 individual frames from 18 to 23 surgeries were used in each case. Thus, anatomical structures of both lower and highest complexity were considered. After temporal annotation and extraction of single frames, each frame underwent semantic segmentation: all areas displaying the respective structure were marked with their exact boundaries. The resulting segmented images provided the input to the subsequent deep learning process using a CNN. As a result, we obtained for each structure an image recognition algorithm capable of automatic detection and semantic segmentation. The recognition performance was evaluated using Intersection over Union (IoU), F1-Score, Precision, Recall, Specificity and Accuracy. Additionally, the algorithm's performance was compared with the organ recognition skills of 28 individuals with varying levels of medical knowledge and training. This clinical evaluation was performed using the example of the pancreas, which had to be highlighted by bounding boxes.
For this purpose, a sample of 35 images, 16 of which included the pancreas, was used to examine the IoU for this segment. The developed image recognition algorithms produced average IoU values, which describe the degree of overlap between two segments, ranging from 0.744 ± 0.275 to 0.255 ± 0.147 for the 6 anatomical structures investigated. In the clinical validation, the algorithm's bounding boxes for the pancreas achieved an IoU of 0.31, placing it second among the 28 participants, surpassed only by a single person with the highest level of expertise. The average IoU obtained by the participants was 0.100 ± 0.097. The results of this work provide a good starting point for the development of further AI-based assistance functionalities for everyday surgical practice. Despite some limitations regarding the generalisability of a rather small and monocentric study, and potential for improvement in the generation of the segmentations, a high-quality dataset has been compiled and published. Overall, the outcomes of the resulting image recognition algorithms indicate that AI techniques already have the potential to detect and delineate many organs accurately and reliably. As a consequence, they can offer significant assistance in the surgical routine. Before moving to clinical application, several additional steps need to be taken, with particular emphasis on transferring what has been achieved to moving image material. Furthermore, it has become evident that the established metrics do not fully capture the clinical value of the predictions. It is therefore necessary to continue discussing appropriate metrics for practical implementation and to work on the development of new measures.
However, even now these algorithms can provide valuable assistance, particularly for individuals without extensive surgical training. This provides an excellent foundation for advancing the algorithms and implementing them as the basis for future technologies in the field of intra-operative assistance systems.
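The central evaluation metric above, Intersection over Union, can be sketched for pixel masks. The set-of-pixels representation is an illustrative simplification of the dense segmentation masks used in practice.

```python
def iou(pred, truth):
    """Intersection over Union between two collections of (row, col) pixels.

    IoU = |pred AND truth| / |pred OR truth|; by convention two empty
    masks are treated here as perfectly overlapping.
    """
    pred, truth = set(pred), set(truth)
    union = pred | truth
    if not union:
        return 1.0
    return len(pred & truth) / len(union)
```

An IoU of 0.744 as reported above means roughly three quarters of the combined predicted and true area is shared by both masks.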
|
96 |
APPLICATIONS OF 4-STATE NANOMAGNETIC LOGIC USING MULTIFERROIC NANOMAGNETS POSSESSING BIAXIAL MAGNETOCRYSTALLINE ANISOTROPY AND EXPERIMENTS ON 2-STATE MULTIFERROIC NANOMAGNETIC LOGIC
D'Souza, Noel. 01 January 2014
Nanomagnetic logic, which encodes logic bits in the magnetization orientations of single-domain nanomagnets, has garnered attention as an alternative to transistor-based logic due to its non-volatility and unprecedented energy efficiency. The energy efficiency of this scheme is determined by the method used to flip the magnetization orientations of the nanomagnets in response to one or more inputs and produce the desired output. Unfortunately, the large dissipative losses that occur when nanomagnets are switched with a magnetic field or spin-transfer torque inhibit the promised energy efficiency. Another technique offering superior energy efficiency, "straintronics", involves applying a voltage to a piezoelectric layer to generate a strain, which is transferred to an elastically coupled magnetostrictive layer and causes magnetization rotation. The functionality of this scheme can be enhanced further by introducing magnetocrystalline anisotropy in the magnetostrictive layer, thereby generating four stable magnetization states (instead of the two stable directions produced by shape anisotropy in ellipsoidal nanomagnets). Numerical simulations were performed to implement a low-power universal logic gate (NOR) using such 4-state magnetostrictive/piezoelectric nanomagnets (Ni/PZT) by clocking the piezoelectric layer with a small electrostatic potential (~0.2 V) to switch the magnetization of the magnetic layer. Unidirectional and reliable logic propagation in this system was also demonstrated theoretically. Besides doubling the logic density (4-state versus 2-state), these four-state nanomagnets can be exploited for higher-order applications such as image reconstruction and recognition in the presence of noise, associative memory and neuromorphic computing. Experimental work in strain-based switching has been limited to magnets that are multi-domain or magnets where strain moves domain walls.
In this work, we also demonstrate strain-based switching in 2-state single-domain ellipsoidal magnetostrictive nanomagnets of lateral dimensions ~200 nm fabricated on a piezoelectric substrate (PMN-PT) and studied using Magnetic Force Microscopy (MFM). A nanomagnetic Boolean NOT gate and unidirectional bit information propagation through a finite chain of dipole-coupled nanomagnets are also shown through strain-based "clocking". This is the first experimental demonstration of strain-based switching in nanomagnets and clocking of nanomagnetic logic (Boolean NOT gate), as well as logic propagation in an array of nanomagnets.
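Since each 4-state magnet stores two bits, one NOR operation acts on both bits at once, which is where the doubled logic density comes from. A minimal logical sketch of that idea; the angle labels are an illustrative encoding, not the one used in the dissertation.

```python
# Illustrative mapping of the four stable magnetization states to 2-bit
# symbols (the physical angles here are assumptions for the sketch).
STATES = {0: "0 deg", 1: "90 deg", 2: "180 deg", 3: "270 deg"}

def nor4(a, b):
    """Bitwise NOR of two 2-bit logic states (values 0..3).

    One gate evaluation processes two Boolean NORs in parallel,
    one per bit of the 2-bit symbol.
    """
    return (~(a | b)) & 0b11
```

Because NOR is functionally complete, composing such gates suffices to build any Boolean circuit, which is why the simulated gate is called universal.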
|
97 |
REGTEST - an Automatic & Adaptive GUI Regression Testing Tool
Forsgren, Robert; Petersson Vasquez, Erik. January 2018
Software testing is very common and is done to increase the quality of, and confidence in, software. In this report, an idea is proposed to create a tool for GUI regression testing that uses image recognition to perform the steps of test cases. The problem with such a solution is that if a GUI has been changed, many test cases might break. For this reason, REGTEST was created: a GUI regression testing tool able to handle a single change to a GUI component, such as a change in color, shape, location or text. This type of solution is interesting because setting up tests with such a tool can be very fast and easy, but one previously big drawback of using image recognition for GUI testing is that it has not handled changes well. It can be compared to tools that use IDs to perform a test, where the actual visualization of a GUI component does not matter; it only matters that the ID stays the same. However, such tools require either underlying knowledge of the GUI component naming conventions or tools that automatically construct XPath queries for the components. To verify that REGTEST can work as well as existing tools, a comparison was made against two professional tools, Ranorex and Kantu. In those tests, REGTEST proved very successful and performed close to, or better than, the other tools.
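The location-change case described above can be sketched as an exhaustive template search over the screenshot. This is a toy sketch, not REGTEST's algorithm: it uses exact pixel equality, whereas tolerating a color or text change would additionally need a similarity threshold.

```python
def find_template(image, template):
    """Locate `template` inside `image` (both 2-D lists of pixel values).

    Returns the (row, col) of the top-left corner of the first exact
    match, or None. Exhaustive search makes the test robust to a pure
    location change of the GUI component.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            if all(image[r + i][c + j] == template[i][j]
                   for i in range(h) for j in range(w)):
                return (r, c)
    return None
```

A test step would capture the target widget once as a template, then search each new screenshot for it before clicking at the returned coordinates.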
|
98 |
Video Recommendation Based on Object Detection
Nyberg, Selma. January 2018
In this thesis, several machine learning domains have been combined to build a video recommender system based on object detection. The work combines two extensively studied research fields, recommender systems and computer vision, which are also rapidly growing and popular techniques on commercial markets. To investigate the performance of the approach, three different content-based recommender systems have been implemented at Spotify, based on the following video features: object detections, titles and descriptions, and user preferences. These systems have then been evaluated and compared against each other, together with their hybridized result. Two algorithms have been implemented, the prediction algorithm and the top-N algorithm, where the former is the more reliable source for evaluating the system's performance. The evaluation shows that the overall performance scores for predicting the users' liked and disliked videos range from about 40% to 70% for the prediction algorithm and from about 15% to 70% for the top-N algorithm. The approach based on object detection performs worse than the other approaches. Hence, there seems to be a low correlation between user preferences and video content in terms of object detection data, so this data is not well suited for describing the content of videos in the recommender system. However, the results of this study cannot be generalized to other systems before the approach has been evaluated in other environments and on various data sets. Moreover, there is plenty of room for refinements and improvements to the system, and many interesting research areas remain for future work.
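The prediction-algorithm evaluation described above can be sketched as thresholded accuracy over liked/disliked labels. The 0.5 threshold and the score format are assumptions for this sketch, not details from the thesis.

```python
def prediction_accuracy(predictions, labels, threshold=0.5):
    """Fraction of liked/disliked labels the recommender predicts correctly.

    `predictions` are scores in [0, 1]; `labels` are 1 (liked) or
    0 (disliked). A score at or above the threshold counts as a
    predicted "like".
    """
    hits = sum((p >= threshold) == bool(y)
               for p, y in zip(predictions, labels))
    return hits / len(labels)
```

Computing this per recommender (object detections, titles/descriptions, user preferences) gives exactly the kind of 40% to 70% comparison reported above.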
|
99 |
Frequency methods for color image recognition : An approach based on Clifford algebras
Mennesson, José. 18 November 2011
In this thesis, we focus on color image recognition using a new geometric approach in the frequency domain. Most existing methods only process grayscale images, through descriptors defined from the usual Fourier transform. The extension of these methods to multichannel images, such as color images, usually consists in reproducing the same processing for each channel. To avoid this marginal processing, we study and compare the different generalizations of the Fourier transform to color. This work leads us to use the Clifford Fourier transform for color images, defined in the framework of geometric algebra. A detailed study of it leads us to define a fast algorithm and to propose a phase correlation for color images. In a second step, with the aim of generalizing the Fourier descriptors of the literature with this Fourier transform, we study their properties, including invariance to translation, rotation and scale. This work leads us to propose three new descriptors, called "generalized color Fourier descriptors" (GCFD), invariant to translation and rotation. The proposed methods are evaluated on standard image databases to estimate the contribution of color frequency content compared with grayscale and marginal methods. The results obtained using an SVM classifier show the potential of the proposed methods; the GCFD are more compact, have lower computational complexity and give better recognition rates. We also propose heuristics for choosing the parameter of the color Clifford Fourier transform. This thesis is a first step towards a generalization of frequency methods to multichannel images.
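The classical grayscale construction that the GCFD generalise can be sketched with a plain DFT of a contour: a translation only changes the DC coefficient, and a global rotation multiplies every coefficient by the same unit phase, so the magnitudes of the non-DC coefficients are invariant to both. A stdlib-only sketch (a real implementation would use an FFT):

```python
import cmath

def fourier_descriptors(contour, n=8):
    """Translation- and rotation-invariant descriptors of a 2-D contour.

    The contour is a list of complex numbers x + iy. The magnitudes
    of up to `n` non-DC DFT coefficients are returned; dropping the
    DC term removes the translation, taking magnitudes removes the
    rotation phase.
    """
    N = len(contour)
    coeffs = [sum(z * cmath.exp(-2j * cmath.pi * k * t / N)
                  for t, z in enumerate(contour)) / N
              for k in range(N)]
    return [abs(c) for c in coeffs[1:n + 1]]
```

Dividing the remaining magnitudes by the first one would additionally give scale invariance, the third property studied in the thesis.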
|
100 |
Pedestrian Attribute Analysis
Studená, Zuzana. January 2019
This work deals with obtaining information about pedestrians captured by static external cameras located in public outdoor or indoor spaces. The aim is to obtain as much information as possible. Attributes such as gender, age, type of clothing, accessories, fashion style and overall appearance are obtained using convolutional neural networks. One part of the work consists of creating a new dataset that captures pedestrians and includes information about each person's gender, age and fashion style. Another part of the thesis is the design and implementation of convolutional neural networks that classify the mentioned pedestrian characteristics. The networks evaluate pedestrian input images from the PETA, FashionStyle14 and BUT Pedestrian Attributes datasets. Experiments on the PETA and FashionStyle datasets compare the results to various convolutional neural networks described in publications. Further experiments are shown on the created BUT pedestrian attribute dataset.
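Per-attribute evaluation of such a multi-attribute classifier can be sketched as column-wise accuracy over binarised predictions. The 0/1 row format is an assumption for illustration, not the encoding used in the thesis.

```python
def per_attribute_accuracy(predictions, labels):
    """Mean accuracy for each pedestrian attribute (column-wise).

    `predictions` and `labels` are equally shaped lists of 0/1 rows,
    one row per image and one column per attribute (e.g. gender,
    age group, fashion style).
    """
    n_attr = len(labels[0])
    totals = [0] * n_attr
    for pred_row, true_row in zip(predictions, labels):
        for j in range(n_attr):
            totals[j] += pred_row[j] == true_row[j]
    return [t / len(labels) for t in totals]
```

Reporting accuracy per attribute rather than one pooled number makes it visible when, say, gender is predicted well while fashion style is not.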
|