1

The Link Between Image Segmentation and Image Recognition

Sharma, Karan 01 January 2012 (has links)
A long-standing debate in the computer vision community concerns the link between segmentation and recognition. The question I am trying to answer here is: does image segmentation as a preprocessing step help image recognition? In spite of a plethora of literature to the contrary, some authors have suggested that recognition driven by high-quality segmentation is the most promising approach to image recognition, because the recognition system sees only the relevant features on the object and not redundant features outside it (Malisiewicz and Efros 2007; Rabinovich, Vedaldi, and Belongie 2007). This thesis explores the following question: if segmentation precedes recognition and the segments are fed directly to the recognition engine, does it help the recognition machinery? Another question I address in this thesis is the scalability of recognition systems: any computer vision system, concept, or algorithm, without exception, must address scalability if it is to stand the test of time.
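As a rough illustration only (not the author's pipeline), the sketch below shows the "segment-then-recognize" idea the thesis questions: a hypothetical segmentation mask is applied so that the feature extractor sees only pixels on the object. The `extractor` and `classifier` callables are placeholders, not components from the thesis.

```python
import numpy as np

def recognize_with_segmentation(image, mask, extractor, classifier):
    """Segmentation as preprocessing: zero out everything outside the
    segmentation mask before extracting features, so the recognizer
    never sees background clutter (hypothetical pipeline sketch)."""
    # image: H x W x 3 array; mask: H x W binary array from some segmenter
    object_only = image * mask[..., None]
    features = extractor(object_only)   # placeholder feature extractor
    return classifier(features)         # placeholder trained classifier
```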
2

Algorithmes d'extraction de modèles géométriques discrets pour la représentation robuste des formes / Recognition algorithms of digital geometric patterns for robust shape representation

Roussillon, Tristan 19 November 2009 (has links)
This thesis lies at the interface of image analysis, which aims at automatically describing the visual content of a digital image, and discrete geometry, which provides tools devoted to digital image processing. To be stored and handled on a computer, an observed signal is regularly sampled; this acquisition process yields a digital image made up of a finite set of discrete elements. Discrete geometry studies the geometric properties of such spaces, which lack continuity. In this work, we consider homogeneous, meaningful regions of an image, with the objective of representing their digital contour by means of geometric patterns and of describing them with measures. The range of applications in image analysis is wide; for instance, the results are of interest for segmentation or object recognition. We focus on three discrete geometric patterns defined by Gauss digitization: the convex or concave part, the digital straight segment, and the digital circular arc. We present several algorithms that detect or recognize these patterns on a digital contour.
These algorithms are on-line (the decision and the parameters are updated on the fly), exact (integer-only computations without any approximation error), and fast (computations simplified by arithmetic properties, with linear-time complexity). Running them along a digital contour yields reversible decompositions or polygonalizations. Moreover, we define a measure of convexity, a measure of straightness, and a measure of circularity. These measures fulfil a set of fundamental properties: they are robust to rigid transformations, they can be applied to any part of a digital contour, and they reach their maximal value for the template shape against which the data are compared, and only for that template. From these measures, we introduce new patterns with a parameter ranging from 0 to 1. The parameter is set to 1 when the localisation of the digital contour is reliable, and to a lower value when the contour may have been shifted by acquisition noise. This measure-based approach provides a way of robustly decomposing a digital contour into straight segments or into convex and concave parts.
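As a loose illustration of the measure idea only (and not the thesis's exact definitions, which are integer-only, incremental, and linear-time), the sketch below computes a simple area-based convexity score for a set of 2D lattice points: the ratio of the number of points in the region to the number of lattice points inside its convex hull.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def discrete_convexity(points):
    """Illustrative convexity score for a finite set S of distinct 2D lattice
    points: |S| divided by the number of lattice points inside conv(S).
    Equals 1 exactly when S contains every lattice point of its convex hull
    (i.e. S is digitally convex); assumes the points are not all collinear."""
    pts = np.asarray(points, dtype=float)
    hull = ConvexHull(pts)
    tri = Delaunay(pts[hull.vertices])           # triangulation of the hull
    xmin, ymin = pts.min(axis=0).astype(int)
    xmax, ymax = pts.max(axis=0).astype(int)
    grid = np.array([(x, y)
                     for x in range(xmin, xmax + 1)
                     for y in range(ymin, ymax + 1)])
    inside = np.count_nonzero(tri.find_simplex(grid) >= 0)
    return len(pts) / inside
```

For a binary region mask, calling `discrete_convexity(np.argwhere(mask))` gives a value roughly in (0, 1], with 1 only for digitally convex regions; it is invariant to rigid motions in spirit, but it is a baseline, not the robust parameterized measure developed in the thesis.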
3

Identifikation sensibler anatomischer Strukturen in der roboterassistierten Kolorektalchirurgie mittels maschineller Lernverfahren / Identification of sensitive anatomical structures in robot-assisted colorectal surgery using machine learning methods

Rinner, Franziska Maria 05 February 2025 (has links)
Colorectal cancer (CRC) is one of the most common malignancies and is associated with a high mortality rate. Its treatment often involves surgery, which in recent years has increasingly shifted towards minimally invasive and robot-assisted procedures. Surgical treatment entails a risk of injury to various anatomical structures, in part due to the narrow surgical field and the close anatomical relationships. Besides rare organ injuries, these include the more common nerve lesions, which can result in urogenital dysfunction that severely affects patients' quality of life; avoiding such complications is a constant goal. In recent years, machine learning techniques have become increasingly popular and are being applied successfully in a growing variety of fields, including medicine. Since the inadequate recognition of abdominal structures is a relevant risk factor in various surgical procedures, with a high impact on patient prognosis and quality of life, there is great potential here. AI-based assistance aims to fill a gap whose practical relevance has not yet been sufficiently investigated. Although machine learning has already been shown to identify optically differentiable structures in a surgical context, its clinical significance remains unclear to date. This work examines to what extent image recognition algorithms can identify anatomical structures on intraoperative image material. This is a necessary foundation for the development of future technologies that facilitate surgical interventions, reduce the likelihood of complications, or identify morphologically visible pathologies. The technique is intended to be applicable across minimally invasive surgery in general, rather than being limited to robot-assisted rectal and sigmoid resections. This could reduce the cost and duration of surgery as well as the cognitive demands on surgeons, and improve post-operative outcomes and quality of life for patients.
Forty-three robot-assisted rectal and sigmoid resections and extirpations, performed between February 2019 and March 2021 at the Clinic for Visceral, Thoracic and Vascular Surgery of the University Hospital Carl Gustav Carus in Dresden, were included in this work. The surgery recordings were examined with regard to the visibility of six anatomical structures. For the categories liver, stomach, spleen, nerves, pancreas, and ureter, between 1023 and 1754 individual frames from 18 to 23 surgeries were used in each case, covering anatomical structures of both low and very high complexity. After temporal annotation and extraction of single frames, each frame underwent semantic segmentation: all areas showing the respective structure were marked with their exact boundaries. The segmented images served as input to the subsequent deep learning process using a convolutional neural network (CNN). The result was, for each structure, an image recognition algorithm capable of automatic detection and semantic segmentation. Recognition performance was evaluated using Intersection over Union (IoU), F1 score, precision, recall, specificity, and accuracy. Additionally, the algorithm's performance was compared with the organ recognition skills of 28 individuals with varying levels of medical knowledge and training. This clinical evaluation used the example of the pancreas, which had to be highlighted with bounding boxes, on a sample of 35 images, 16 of which included the pancreas, and examined the IoU for this segment. The developed image recognition algorithms achieved average IoU values, describing the degree of overlap between two segments, ranging from 0.744 ± 0.275 to 0.255 ± 0.147 across the six anatomical structures investigated. In the clinical evaluation, the algorithm's bounding boxes for the pancreas reached an IoU of 0.31, ranking second among the 28 participants and surpassed only by a single person with the highest level of expertise; the participants' average IoU was 0.100 ± 0.097.
The results of this work provide a good starting point for the development of further AI-based assistance functionalities for everyday surgical practice. Despite some limitations regarding the generalisability of a rather small, monocentric study and potential for improvement in the generation of the segmentations, a high-quality dataset was compiled and published. Overall, the resulting image recognition algorithms indicate that AI methods already have the potential to detect and delineate many organs reliably, and thus to offer meaningful assistance in the surgical routine. Before clinical application, several additional steps remain, above all the transfer of these results to moving image material. Furthermore, it became evident that the established metrics do not fully capture the clinical value of the predictions, so appropriate metrics for practical use need further discussion and new measures need to be developed. Even now, however, such algorithms can provide valuable assistance, particularly for individuals without extensive surgical training. This is an excellent foundation for advancing the algorithms and implementing them as the basis for future technologies in the field of intra-operative assistance systems.
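Since the evaluation centres on Intersection over Union, here is a minimal sketch (not the code used in the thesis) of how IoU is typically computed for a pair of binary segmentation masks:

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over Union of two binary masks (H x W arrays of 0/1):
    area of overlap divided by the area covered by either mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                      # both masks empty: define IoU as 1
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```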
4

A Study Of Utility Of Smile Profile For Face Recognition

Bhat, Srikrishna K K 08 1900 (has links)
Face recognition is one of the most natural activities performed by human beings. It has a wide range of applications in areas such as Human Computer Interaction, Surveillance, and Security. Face information can be obtained in a non-intrusive manner, without violating privacy. However, robust face recognition that is invariant under varying pose, illumination, etc. is still a challenging problem. The main aim of this thesis is to explore the usefulness of the smile profile of human beings as an extra aid in recognizing people by their faces. The smile profile of a person is the sequence of images captured by a camera when the person voluntarily smiles. Using a sequence of images instead of a single image significantly increases the required computational resources. The challenge here is to design a feature extraction technique for a smile sample that is useful for authentication and is also efficient in terms of storage and computation. There is some experimental evidence supporting the claim that facial expressions carry person-specific information, but, to the best of our knowledge, a systematic study of a particular facial expression for biometric purposes has not been done so far. The smile profile of human beings, captured under a reasonably controlled setup, is used here for the first time for face recognition. As a first step, we applied two recent subspace-based face classifiers to the smile samples, but were not able to obtain any conclusive results from this experiment. Next, we extracted features using only the difference vectors obtained from smile samples. The difference vectors depend only on the variations which occur in the corresponding smile profile, so any characterization obtained from such features can be fully attributed to the smiling action. The feature extraction technique we employed is very similar to PCA. The resulting smile signature is named the Principal Direction of Change (PDC). The PDC is a unit vector (in some high-dimensional space) representing the direction in which the major changes occurred during the smile. We obtained a reasonable recognition rate by applying a Nearest Neighbor Classifier (NNC) to these features. In addition, these features turn out to be less sensitive to the speed of the smiling action and to minor variations in face detection and head orientation, while still capturing the pattern of variations in various regions of the face due to the smiling action. Using a set of experiments on PDC-based features, we establish that the smile has some person-specific characteristics. However, the recognition rates of PDC-based features are lower than those of recent conventional techniques. Next, we used PDC-based features to aid a conventional face classifier, employing smile signatures to reject some candidate faces. Our experiments show that, using smile signatures, we can reject some of the potential false candidate faces that would have been accepted by the conventional face classifier. With this smile-signature-based rejection, the performance of the conventional classifier improves significantly. This improvement suggests that the biometric information available in smile profiles does not exist in still images. Hence, the usefulness of smile profiles for biometric applications is established through this experimental investigation.
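As an illustrative sketch (assuming each smile sample is a sequence of aligned, vectorized face frames and using frame-to-frame differences — not necessarily the thesis's exact construction), the Principal Direction of Change described above can be approximated as the dominant principal component of the difference vectors, and identities can be matched with a nearest-neighbour rule:

```python
import numpy as np

def principal_direction_of_change(frames):
    """frames: T x D array of vectorized face images from one smile sample.
    Returns a unit vector along which most of the change occurs."""
    diffs = np.diff(frames, axis=0)              # (T-1) x D difference vectors
    diffs = diffs - diffs.mean(axis=0)           # centre, as in PCA
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]                                 # first right singular vector

def nearest_neighbor_identity(query_pdc, gallery):
    """gallery: dict mapping person id -> enrolled PDC (unit vector).
    Match by largest |cosine similarity| (the sign of a principal
    direction is arbitrary)."""
    return max(gallery, key=lambda pid: abs(np.dot(query_pdc, gallery[pid])))
```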
5

Joint Evaluation Of Multiple Speech Patterns For Speech Recognition And Training

Nair, Nishanth Ulhas 01 1900 (has links)
Improving speech recognition performance in the presence of noise and interference continues to be a challenging problem. Automatic Speech Recognition (ASR) systems work well when the test and training conditions match, but in real-world environments there is often a mismatch between them. Various factors such as additive noise, acoustic echo, and speaker accent affect speech recognition performance. Since ASR is a statistical pattern recognition problem, if the test patterns are unlike anything used to train the models, errors are bound to occur due to feature-vector mismatch. Various approaches to robustness have been proposed in the ASR literature, contributing mainly to two topics: (i) reducing the variability in the feature vectors, or (ii) modifying the statistical model parameters to suit the noisy condition. While some of those techniques are quite effective, we would like to examine robustness from a different perspective. Consider the analogy of human communication over telephones: it is quite common to ask the person speaking to us to repeat certain portions of their speech because we do not understand them, and this happens more often in the presence of background noise, where the intelligibility of speech is significantly affected. Although the exact way humans decode multiple repetitions of speech is not known, it is quite possible that we use the combined knowledge of the multiple utterances to decode the unclear parts. The majority of ASR algorithms do not address this issue, except in very specific cases such as pronunciation modeling. We recognize that under very high noise conditions, or over bursty error channels such as packet communication where packets get dropped, it would be beneficial to take the approach of repeated utterances for robust ASR. In this thesis, we formulate a set of algorithms both for joint evaluation/decoding to recognize noisy test utterances and for using the same formulation for selective training of Hidden Markov Models (HMMs), again for robust performance. We first address the joint recognition of multiple speech patterns given that they belong to the same class, formulating the problem for isolated words. If there are K test patterns (K ≥ 2) of a word by a speaker, we show that it is possible to improve speech recognition accuracy over independent single-pattern evaluation of the test speech, for both clean and noisy speech. We also find the state sequence which best represents the K patterns. This formulation can be extended to connected-word or continuous speech recognition as well. Next, we consider the benefits of the joint multi-pattern likelihood for HMM training. In the usual HMM training, all the training data is utilized to arrive at the best possible parametric model. However, the training data may not all be genuine: it may contain labeling errors, noise corruptions, or plain outlier exemplars. Such outliers result in poorer models and affect speech recognition performance, so it is important to train selectively so that outliers receive a lower weight. Giving a lower weight to an entire outlier pattern has been addressed before in the speech recognition literature. However, it is possible that only some portions of a training pattern are corrupted, so it is important that only the corrupted portions of the speech are given a lower weight during HMM training, and not the entire pattern.
Since HMM training uses multiple patterns of speech from each class, we show that it is possible to use joint evaluation methods to selectively train HMMs such that only the corrupted portions of speech are given a lower weight, and not the entire speech pattern. Thus, we address all three main tasks of an HMM while jointly exploiting the availability of multiple patterns belonging to the same class. We evaluated the new algorithms on isolated word recognition for both clean and noisy speech and obtained significant improvements in recognition performance, especially for speech affected by transient/burst noise.
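As a simplified sketch of the joint-evaluation idea (assuming independent repetitions and a plain forward-algorithm likelihood, rather than the thesis's joint state-sequence decoding), K repetitions of an unknown isolated word can be scored together by summing their log-likelihoods under each word HMM and choosing the best word:

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood(obs, log_pi, log_A, log_B):
    """Forward algorithm in the log domain for a discrete-emission HMM.
    obs: sequence of symbol indices; log_pi: (N,); log_A: (N, N); log_B: (N, M)."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # sum over previous states for each current state
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp(alpha)

def joint_recognize(repetitions, word_models):
    """repetitions: K observation sequences of the same (unknown) word.
    word_models: dict word -> (log_pi, log_A, log_B).  Returns the word whose
    HMM best explains all K repetitions jointly (sum of log-likelihoods)."""
    scores = {w: sum(log_likelihood(obs, *m) for obs in repetitions)
              for w, m in word_models.items()}
    return max(scores, key=scores.get)
```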
6

Learning from biometric distances: Performance and security related issues in face recognition systems

Mohanty, Pranab 01 June 2007 (has links)
We present a theory for constructing linear, black-box approximations to face recognition algorithms and empirically demonstrate that a surprisingly diverse set of face recognition approaches can be approximated well using a linear model. Constructing the linear model of a face recognition algorithm involves embedding a training set of face images constrained by the distances between them, as computed by the face recognition algorithm being approximated. We accomplish this embedding by iterative majorization, initialized by classical multi-dimensional scaling (MDS). We empirically demonstrate the adequacy of the linear model using six face recognition algorithms, spanning both template-based and feature-based approaches, on standard face recognition benchmarks such as the Facial Recognition Technology (FERET) and Face Recognition Grand Challenge (FRGC) data sets. The experimental results show that the average Error in Modeling for the six algorithms is 6.3% at a 0.001 False Acceptance Rate (FAR) on the FERET fafb probe set, which contains the largest number of subjects among all the probe sets. We demonstrate the usefulness of the linear model for algorithm-dependent indexing of face databases and find that it yields a more than 20-fold reduction in face comparisons for the Bayesian Intra/Extra-class person classifier (BAY), the Elastic Bunch Graph Matching algorithm (EBGM), and the commercial face recognition algorithms. We also propose a novel paradigm to reconstruct face templates from match scores using the linear model, and use the reconstructed templates to explore security breaches in a face recognition system. We evaluate the proposed template reconstruction scheme using three fundamentally different face recognition algorithms: Principal Component Analysis (PCA), the Bayesian Intra/Extra-class person classifier (BAY), and a feature-based commercial algorithm. With an operating point set at 1% False Acceptance Rate (FAR) and 99% True Acceptance Rate (TAR) for 1196 enrollments (the FERET gallery), we show that at most 600 attempts (score computations) are required to achieve a 73%, 72%, and 100% chance of breaking in as a randomly chosen target subject for the commercial, BAY, and PCA-based face recognition systems, respectively. We also show that the proposed reconstruction scheme has a 47% higher probability of breaking in as a randomly chosen target subject for the commercial system than a hill-climbing approach with the same number of attempts.
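As a minimal sketch of the embedding step mentioned above (only the classical-MDS initialization, not the iterative-majorization refinement or the full linear model), a pairwise distance matrix produced by a black-box face matcher can be turned into Euclidean coordinates as follows:

```python
import numpy as np

def classical_mds(D, dim):
    """Classical multi-dimensional scaling: embed n items into `dim`
    Euclidean dimensions from an n x n matrix D of pairwise distances
    (e.g. distances reported by a black-box face recognition algorithm)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:dim]        # keep the `dim` largest
    L = np.sqrt(np.clip(eigvals[idx], 0, None))  # clip tiny negative values
    return eigvecs[:, idx] * L                   # n x dim coordinates
```

The resulting coordinates preserve the matcher's distances as well as a `dim`-dimensional Euclidean space allows, which is the usual starting point before an iterative refinement such as majorization.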
