251 |
Facing the Hard Problems in FGVC. Anderson, Connor Stanley, 29 July 2020
In fine-grained visual categorization (FGVC), there is a near-singular focus on attaining state-of-the-art (SOTA) accuracy. This work carefully analyzes the performance of recent SOTA methods, quantitatively but, more importantly, qualitatively. We show that these models universally struggle with certain "hard" images, while also making complementary mistakes. We underscore the importance of such analysis, and demonstrate that combining complementary models can improve accuracy on the popular CUB-200 dataset by over 5%. In addition to detailed analysis and characterization of the errors made by these SOTA methods, we provide a clear set of recommended directions for future FGVC researchers.
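To make the complementary-models idea concrete, here is a minimal sketch of combining classifiers by averaging their softmax outputs. The toy stand-in models, 32-dimensional inputs, and the 200-class setting (matching CUB-200's class count) are illustrative assumptions, not the thesis's actual networks:

```python
import torch

def ensemble_predict(models, inputs):
    """Average the softmax outputs of complementary models, then take the argmax."""
    probs = None
    with torch.no_grad():
        for model in models:
            p = torch.softmax(model(inputs), dim=1)
            probs = p if probs is None else probs + p
    return (probs / len(models)).argmax(dim=1)

# Two toy stand-in "models" over 200 classes; real FGVC backbones would go here.
toy_models = [torch.nn.Linear(32, 200) for _ in range(2)]
preds = ensemble_predict(toy_models, torch.randn(4, 32))
print(preds.shape)  # torch.Size([4])
```

Averaging probabilities is only the simplest combination rule; the point is that models making different mistakes can correct each other.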
|
252 |
Image Embedding into Generative Adversarial Networks. Abdal, Rameen, 14 April 2020
We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. This embedding enables semantic image editing operations that can be applied to existing photographs. Taking the StyleGAN trained on the FFHQ dataset as an example, we show results for image morphing, style transfer, and expression transfer. Studying the results of the embedding algorithm provides valuable insights into the structure of the StyleGAN latent space. We propose a set of experiments to test what class of images can be embedded, how they are embedded, what latent space is suitable for embedding, and if the embedding is semantically meaningful.
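As an illustration of optimization-based embedding, the following sketch fits a latent code by gradient descent on a pixel reconstruction loss. The stand-in generator, the latent size, and the omission of a perceptual loss term are simplifying assumptions; this is not the thesis's actual algorithm, only the general recipe:

```python
import torch

# Hypothetical stand-in for a pretrained generator G: latent -> flattened image.
G = torch.nn.Sequential(torch.nn.Linear(512, 3 * 64 * 64), torch.nn.Tanh())

def embed_image(target, steps=200, lr=0.01):
    """Optimize a latent code so G(latent) reproduces `target` (pixel loss only)."""
    latent = torch.zeros(1, 512, requires_grad=True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = G(latent).view_as(target)
        loss = torch.nn.functional.mse_loss(recon, target)
        loss.backward()
        opt.step()
    return latent.detach()

target = torch.rand(1, 3 * 64 * 64) * 2 - 1  # toy "image" in [-1, 1]
w = embed_image(target)
```

Once such a code is recovered, editing operations amount to arithmetic in latent space before re-synthesis.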
|
253 |
Deep Learning for Crack-Like Object Detection. Zhang, Kaige, 01 August 2019
Cracks are common defects on the surfaces of man-made structures such as pavements, bridges, walls of nuclear power plants, and ceilings of tunnels. Timely discovery and repair of cracks is of great importance for keeping infrastructure healthy and preventing further damage. Traditionally, crack inspection was conducted manually, which is labor-intensive, time-consuming, and costly. For example, statistics from the Central Intelligence Agency show that the world's road network length has reached 64,285,009 km, of which the United States has 6,586,610 km; maintaining and upgrading such an immense road network is hugely expensive. Thus, fully automatic crack detection has received increasing attention.
With the development of artificial intelligence (AI), deep learning has achieved great success and is viewed as the most promising approach to crack detection. Building on deep learning, this research addresses four important issues in crack-like object detection. First, the noise problem caused by textured backgrounds is solved by using a deep classification network to remove non-crack regions before conducting crack detection. Second, computational efficiency is greatly improved. Third, crack localization accuracy is improved. Fourth, the proposed model is very stable and can handle a wide range of crack detection tasks. In addition, this research presents a preliminary study of a future AI system, outlining a concept with the potential to realize fully automatic crack detection without human intervention.
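A hedged sketch of the first idea, filtering background regions with a classifier before running detection. Both networks, the patch size, and the tiling scheme are hypothetical stand-ins for the thesis's models:

```python
import torch

# Stand-ins for the two stages described in the abstract: a patch classifier
# (crack vs. textured background) and a pixel-level crack detector.
patch_classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 2))
detector = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

def detect_cracks(image, patch=32):
    """Stage 1 discards background patches; stage 2 runs only on the survivors."""
    _, H, W = image.shape
    mask = torch.zeros(1, H, W)
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            tile = image[:, y:y + patch, x:x + patch].unsqueeze(0)
            if patch_classifier(tile).argmax(dim=1).item() == 1:  # classified as crack
                heat = torch.sigmoid(detector(tile))[0]
                mask[:, y:y + patch, x:x + patch] = heat
    return mask

crack_map = detect_cracks(torch.rand(3, 128, 128))
```

Skipping the detector on background patches is also one plausible source of the efficiency gains the abstract mentions.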
|
254 |
Contributions à l'analyse d'images médicales pour la reconnaissance du cancer du sein / Contributions to medical images analysis for breast cancer recognition. Goubalan, Sègbédji Rethice Théophile Junior, 09 December 2016
Computer-aided diagnosis of breast cancer is attracting growing interest because of the ever-increasing quantity of mammographic images produced by screening campaigns. The focus is on breast masses because of the high risk of cancer associated with them. The variability of the shapes encountered, and the difficulty of discerning masses, especially when they are embedded in dense tissue, call for a new strategy adapted to the most complex cases, namely masses in the BI-RADS IV and V classes, i.e. spiculated malignant masses and architectural distortions. In this work, a fully automatic computer-aided diagnosis system is designed for segmenting masses and classifying them into benign/malignant or fatty/dense categories, specifically for the BI-RADS IV and V types. First, we developed a pre-processing approach based on learning a sparse dictionary over the image databases, combined with dimensionality reduction, to remove the digitization noise in the mammographic images more effectively and quickly than existing approaches. Once the images are pre-processed, we set up an unsupervised mass segmentation procedure based on Markov random fields, which is at once faster, more effective, and more robust than the best segmentation techniques in the state of the art; moreover, the proposed method is insensitive to the variability of the masses, whatever the image density. To describe spiculated malignant lesions properly, we designed a spicule segmentation method that, notably, does not rely on manually extracted descriptors, whose performance can vary with their quality. The approach rests on assumptions we formulated about the appearance of spicules in mammograms: the mammogram is divided into patches, in which curvilinear structures are discretized into segments via a local Radon transform; Markov modeling and contextual information then refine the segment positions and group segments into curvilinear structures; finally, spicules are extracted from the set of detected structures using an a contrario model. This phase concludes the first part of the system, which can extract either spiculated masses or architectural distortions. To finalize the design, we created a decision-support model that, unlike previous state-of-the-art work on mass discrimination, performs unsupervised descriptor extraction with a deep learning method, namely convolutional neural networks; the extracted descriptors are then used to train an SVM classifier, and the resulting model serves for breast cancer recognition. The results obtained for each stage of the diagnosis system are very encouraging and fill an important gap in the classification of masses in general, and in distinguishing malignant masses from one another in particular, based on three decision levels: shape, density, and spicules.
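The final stage, CNN feature extraction followed by an SVM, can be sketched generically. The backbone choice (ResNet-18), input sizes, and random stand-in data are assumptions for illustration, not the thesis's configuration:

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

# A CNN used purely as a feature extractor; weights=None keeps the sketch
# offline, but in practice a pretrained checkpoint would be loaded.
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()  # drop the classification head -> 512-d features
backbone.eval()

with torch.no_grad():
    X = backbone(torch.rand(20, 3, 224, 224)).numpy()  # stand-in image batch
y = [0] * 10 + [1] * 10  # benign / malignant stand-in labels

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:3]))
```

The division of labor is the point: the network supplies the representation, while the SVM supplies the decision boundary.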
|
255 |
Apprentissage neuronal profond pour l'analyse de contenus multimodaux et temporels / Deep learning for multimodal and temporal contents analysis. Vielzeuf, Valentin, 19 November 2019
Our perception is by nature multimodal, i.e. it draws on several of our senses. To solve certain tasks, it is therefore relevant to use different modalities, such as sound or images. This thesis explores this notion in the context of deep learning, seeking to answer one question in particular: how should the different modalities be fused within a deep neural network? We first study a concrete application: automatic emotion recognition in audio-visual content. This leads to several considerations concerning the modeling of emotions, and of facial expressions in particular; we propose an analysis of the facial-expression representations learned by a deep neural network. We also observe that each multimodal problem appears to require a different fusion strategy. We therefore propose and validate two methods for automatically obtaining an efficient fusion architecture for a given multimodal problem: the first is based on a central fusion network and aims to preserve an easy interpretation of the adopted fusion strategy, while the second adapts neural architecture search to multimodal fusion, exploring a greater number of strategies and thereby achieving better performance. Finally, we take a multimodal view of knowledge transfer, detailing a non-traditional method for transferring knowledge from several sources, i.e. several pre-trained models. A more general neural representation is obtained from a single model that brings together the knowledge contained in the pre-trained models, leading to state-of-the-art performance on a variety of face analysis tasks.
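To illustrate the fusion question, here is a minimal two-branch network that merges modalities by concatenation at a single layer. All dimensions, the seven-class output, and the fusion point itself are illustrative choices; the thesis's methods search over precisely such design decisions:

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """Toy two-branch network: audio and image features meet at one fusion layer."""
    def __init__(self, audio_dim=40, image_dim=128, n_classes=7):
        super().__init__()
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.image_branch = nn.Sequential(nn.Linear(image_dim, 64), nn.ReLU())
        self.fusion = nn.Linear(64 + 64, n_classes)  # where the modalities merge

    def forward(self, audio, image):
        return self.fusion(torch.cat([self.audio_branch(audio),
                                      self.image_branch(image)], dim=1))

net = LateFusionNet()
logits = net(torch.rand(8, 40), torch.rand(8, 128))  # e.g. emotion classes
```

Moving the fusion layer earlier or later, or replacing concatenation with another operator, yields the space of strategies the thesis's search methods explore.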
|
256 |
Assessment of acute vestibular syndrome using deep learning: Classification based on head-eye positional data from a video head-impulse test. Johansson, Hugo, January 2021
The field of medicine is always evolving, and one step in this evolution is the use of decision-support systems such as artificial intelligence. These systems open the possibility of minimizing human error in diagnostics, as practitioners can use objective measurements and analysis to assist with the diagnosis. In this study the focus has been to explore the possibility of using deep learning models to classify stroke, vestibular neuritis, and control groups based on data from a video head-impulse test (vHIT). This was done by pre-processing vHIT data into features that could be used as input to an artificial neural network. Three different models were designed: the first two used mean motion data describing the motion of the head and eyes, together with their standard deviations, and the last model used extracted parameters. The models were trained on vHIT data from 76 control cases, 37 vestibular neuritis cases, and 46 stroke cases. To better grasp the differences between the groups, the parameters and the mean curves were compared. The resulting models performed to varying degrees: the first model correctly classified 77.8% of the control cases, 55.6% of the stroke cases, and 80% of the vestibular neuritis cases; the second model correctly classified 100% of the control cases, 11.1% of the stroke cases, and 80.0% of the vestibular neuritis cases; and the third model correctly classified 77.8% of the control cases, 22.2% of the stroke cases, and 100% of the vestibular neuritis cases. The results are still insufficient for clinical use, as stroke classification requires a higher sensitivity, so that stroke cases are correctly identified and receive the urgent care they need. However, with more data and research, these methods could improve further and provide a valuable service as decision-support systems.
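A minimal sketch of the third model's setup: a small feed-forward network over extracted vHIT parameters with three output classes. The feature count, layer sizes, and training snippet are assumptions, not the study's architecture:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(12, 32), nn.ReLU(),
    nn.Linear(32, 3),  # outputs: control / vestibular neuritis / stroke
)
features = torch.rand(16, 12)            # e.g. gain, latency, saccade statistics
labels = torch.randint(0, 3, (16,))      # stand-in class labels
loss = nn.functional.cross_entropy(model(features), labels)
loss.backward()                          # one training step's gradients
print(loss.item())
```

The clinical requirement translates into the loss design: weighting the stroke class more heavily is one common way to trade specificity for the higher sensitivity the abstract calls for.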
|
257 |
Developing Bottom-Up, Integrated Omics Methodologies for Big Data Biomarker Discovery. Kechavarzi, Bobak David, 11 1900
Indiana University-Purdue University Indianapolis (IUPUI) / The availability of highly-distributed computing complements the proliferation of next-generation sequencing (NGS) and genome-wide association study (GWAS) datasets. These datasets are often complex, poorly annotated, or require deep domain knowledge to manage sensibly. They provide a rare, multi-dimensional omics (proteomics, transcriptomics, and genomics) view of a single sample or patient.
Previously, biologists assumed strict adherence to the central dogma: replication, transcription, and translation. Recent studies in genomics and proteomics emphasize that this is not the case. We must employ big-data methodologies not only to understand the biogenesis of these molecules, but also their disruption in disease states. The Cancer Genome Atlas (TCGA) provides high-dimensional patient data and illustrates the trends that occur in expression profiles and their alteration in many complex disease states.
I will ultimately create a bottom-up multi-omics approach to observe biological systems using big data techniques. I hypothesize that big data and systems biology approaches can be applied to public datasets to identify important subsets of genes in cancer phenotypes. By exploring these signatures, we can better understand the role of amplification and transcript alterations in cancer.
|
258 |
Surgical Workflow Anticipation. Yuan, Kun, 12 January 2022
As a non-robotic, minimally invasive surgery, endoscopic surgery is widely used in medicine to reduce the risk of infection, the number of incisions, and patient discomfort. The endoscopic surgical procedure, also called the surgical workflow in this work, can be divided into sub-phases. During the procedure, the surgeon inserts a thin, flexible tube with a video camera through a small incision or a natural orifice such as the mouth or nostrils, and can manipulate tiny surgical instruments while viewing the organs on a computer monitor. Only a limited number of instruments can appear in the body at the same time, which calls for an effective instrument-preparation method. Surgical workflow anticipation, comprising surgical instrument and phase anticipation, is therefore essential for an intra-operative decision-support system: it deciphers the surgeon's behavior and the patient's status to forecast instrument and phase occurrence before they appear, supporting instrument preparation and computer-assisted intervention (CAI) systems. In this work, we investigate this unexplored surgical workflow anticipation problem by proposing an Instrument Interaction Aware Anticipation Network (IIA-Net). Spatially, it utilizes rich visual features describing the context around the instrument, i.e., the instrument's interaction with its surroundings. Temporally, it allows a large receptive field to capture long-term dependencies in long, untrimmed surgical videos through a causal dilated multi-stage temporal convolutional network. Our model performs online inference with reliable predictions even under severe noise and artifacts in the recorded videos. Extensive experiments on the Cholec80 dataset demonstrate that our proposed method exceeds the state of the art by a large margin (1.40 vs. 1.75 for inMAE and 2.14 vs. 2.68 for eMAE; lower is better).
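A sketch of the causal dilated temporal convolution idea: left-padding each layer ensures an output frame sees only past frames, which is what makes online inference possible, while stacking increasing dilations grows the receptive field. Channel sizes, depth, and kernel size here are illustrative, not IIA-Net's configuration:

```python
import torch
import torch.nn as nn

class CausalDilatedBlock(nn.Module):
    """One causal dilated conv layer: pad only on the left so no future leaks in."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = (3 - 1) * dilation  # kernel_size = 3
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        return torch.relu(self.conv(nn.functional.pad(x, (self.pad, 0))))

# Dilations 1, 2, 4, 8: receptive field grows exponentially with depth.
tcn = nn.Sequential(*[CausalDilatedBlock(64, 2 ** i) for i in range(4)])
out = tcn(torch.rand(1, 64, 500))  # same temporal length in, same out
```

A multi-stage design, as the abstract describes, would stack several such networks so later stages refine the predictions of earlier ones.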
|
259 |
Benchmarking and Accelerating TensorFlow-based Deep Learning on Modern HPC Systems. Biswas, Rajarshi, 12 October 2018
No description available.
|
260 |
A Deep Learning Approach to Seizure Prediction with a Desirable Lead Time. Huang, Yan, 23 May 2019
No description available.
|