  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
691

Learning Based Food Image Analysis - Detection, Recognition and Segmentation

Runyu Mao (15340783) 25 April 2023 (has links)
<p>Advances in mobile and wearable technologies have enabled a wide range of new methods for dietary assessment and monitoring, such as active or passive capturing of food images of an eating scene. Compared to traditional methods, these approaches are less burdensome and can reduce biased measurements. Food image analysis, which consists of food region detection, food category classification, and food segmentation, can benefit subsequent nutrient analysis. However, due to the varying complexity of food images and the inter-class similarity of food categories, it is challenging for an image-based food analysis system to achieve high performance outside of a lab setting.</p> <p>In this thesis, we investigate four research topics related to image-based dietary assessment: (1) construction of the VIPER-FoodNet dataset, (2) food recognition, (3) nutrient-integrated hierarchical food classification, and (4) weakly supervised segmentation. For topic (1), we developed a learning-based method to automatically remove non-food images during dataset construction. For topic (2), we proposed a novel two-step food recognition system that consists of food localization and hierarchical food classification. For topic (3), we developed a cross-domain food classification framework that integrates nutrition information to help the classification system make better mistakes. Finally, for topic (4), we developed a weakly supervised segmentation system that requires only image-level supervision during training.</p> <p>In addition, we developed a high-quality Photoplethysmography (PPG) signal selection method for a wearable device worn by subjects undergoing daily life activities, which could be used to inform the health status of the individual.</p>
692

SINGLE MOLECULE ANALYSIS AND WAVEFRONT CONTROL WITH DEEP LEARNING

Peiyi Zhang (15361429) 27 April 2023 (has links)
<p>Analyzing single molecule emission patterns plays a critical role in retrieving the structural and physiological information of their tagged targets and, further, in understanding their interactions and cellular context. These emission patterns of tiny light sources (i.e. point spread functions, PSFs) simultaneously encode information such as the molecule's location, orientation, the environment within the specimen, and the paths the emitted photons took before being captured by the camera. However, retrieving multiple classes of information beyond the 3D position from complex or high-dimensional single molecule data remains challenging, due to the difficulties in perceiving and summarizing a comprehensive yet succinct model. We developed smNet, a deep neural network that can extract multiplexed information near the theoretical limit from both complex and high-dimensional point spread functions. Through simulated and experimental data, we demonstrated that smNet can be trained to efficiently extract both molecular and specimen information, such as molecule location, dipole orientation, and wavefront distortions, from complex and subtle features of the PSFs that are otherwise considered too complex for established algorithms.</p> <p>Single molecule localization microscopy (SMLM) forms super-resolution images with a resolution of several to tens of nanometers, relying on accurate localization of molecules' 3D positions from isolated single molecule emission patterns. However, inhomogeneous refractive indices distort and blur single molecule emission patterns, reduce the information content carried by each detected photon, and increase localization uncertainty, causing significant resolution loss that is irreversible by post-processing. To compensate for tissue-induced aberrations, conventional sensorless adaptive optics methods rely on iterative mirror changes guided by image-quality metrics. However, these metrics produce inconsistent, and sometimes opposite, responses, which fundamentally limits the efficacy of such approaches for aberration correction in tissues. Bypassing this iterative trial-then-evaluate process, we developed deep-learning-driven adaptive optics (DL-AO) for SMLM, which directly infers wavefront distortion and compensates for it in near real time during data acquisition. Our trained deep neural network monitors the individual emission patterns from single molecule experiments, infers their shared wavefront distortion, feeds the estimates through a dynamic filter (Kalman), and drives a deformable mirror to compensate for sample-induced aberrations. We demonstrated that DL-AO restores single molecule emission patterns to conditions approaching those unaffected by the specimen, and improves the resolution and fidelity of 3D SMLM through more than 130 µm of brain tissue with as few as 3-20 mirror changes.</p>
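The dynamic (Kalman) filtering step mentioned in the abstract, which smooths successive wavefront estimates before they drive the mirror, can be illustrated with a minimal scalar sketch. This is an illustrative assumption, not the authors' implementation; `kalman_update`, the identity dynamics, and the noise variances `R` and `Q` are all hypothetical:

```python
def kalman_update(x, P, z, R, Q):
    """One scalar Kalman step with identity dynamics: predict
    (uncertainty grows by process noise Q), then correct with a
    measurement z of variance R. Returns new estimate and variance."""
    P = P + Q               # predict: prior uncertainty grows
    K = P / (P + R)         # Kalman gain: trust placed in the measurement
    x = x + K * (z - x)     # correct the estimate toward z
    P = (1 - K) * P         # posterior uncertainty shrinks
    return x, P
```

Fed a stream of noisy estimates of one wavefront coefficient, repeated updates pull the state toward the underlying value while the variance `P` shrinks.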
693

Deep Learning Based Crop Row Detection

Doha, Rashed Mohammad 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Detecting crop rows from video frames in real time is a fundamental challenge in precision agriculture. U-net, a deep learning based semantic segmentation method, performs poorly on this task despite its success in many other precision agriculture tasks. The reasons include the paucity of large-scale labeled datasets in this domain, the diversity of crops, and the varying appearance of the same crop at different stages of its growth. In this work, we discuss the development of a practical real-life crop row detection system in collaboration with an agricultural sprayer company. Our proposed method takes the output of semantic segmentation using U-net and then applies a clustering-based probabilistic temporal calibration that can adapt to different fields and crops without retraining the network. Experimental results validate that our method can be used both to refine the results of the U-net, reducing errors, and to interpolate frames of the input video stream. Upon the availability of more labeled data, we switched our approach from a semi-supervised model to a fully supervised end-to-end crop row detection model using a Feature Pyramid Network (FPN). Central to the FPN is a pyramid pooling module that extracts features from the input image at multiple resolutions, enabling the network to use both local and global features when classifying pixels as crop rows. After training the FPN on the labeled dataset, our method obtained a mean IoU (Jaccard Index) score of over 70% on the test set. We trained our method on only a subset of the corn dataset and tested its performance on multiple variations of weed pressure and crop growth stage to verify that the performance translates across these variations and is consistent over the entire dataset.
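The mean IoU (Jaccard Index) used above to score segmentation quality can be computed from predicted and ground-truth label masks. A generic sketch (not the authors' code), assuming integer class labels per pixel:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union (Jaccard index), averaged over
    classes that appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks: skip it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

For a binary crop-row mask this reduces to averaging the IoU of the row pixels and of the background pixels.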
694

Predicting Game Level Difficulty Using Deep Neural Networks / Uppskattning av spelbanors svårighetsgrad med djupa neurala nätverk

Purmonen, Sami January 2017 (has links)
We explored the use of Monte Carlo tree search (MCTS) and deep learning to predict game level difficulty in Candy Crush Saga (Candy), measured as the number of attempts per success. A deep neural network (DNN) was trained to predict moves from game states using large amounts of game play data. The DNN played a diverse set of levels in Candy, and a regression model was fitted to predict human difficulty from bot difficulty. We compared our results to an MCTS bot. Our results show that the DNN can estimate game level difficulty comparably to MCTS in substantially less time.
695

Reconstruction and recommendation of realistic 3D models using cGANs / Rekonstruktion och rekommendation av realistiska 3D-modeller som använder cGANs

Villanueva Aylagas, Mónica January 2018 (has links)
Three-dimensional modeling is the process of creating a representation of a surface or object in three dimensions via specialized software, in which the modeler scans a real-world object into a point cloud, creates a completely new surface, or edits the selected representation. This process can be challenging due to factors like the complexity of the 3D creation software or the number of dimensions in play. This work proposes a framework that recommends three types of reconstruction of an incomplete or rough 3D model using Generative Adversarial Networks (GANs). These reconstructions, respectively, follow the distribution of real data, resemble the user's model, and stay close to the dataset while keeping features of the input. The main advantage of this approach is that the GAN accepts 3D models as input instead of latent vectors, which avoids training an extra network to project the model into the latent space. The systems are evaluated both quantitatively and qualitatively. The quantitative evaluation relies on the Intersection over Union (IoU) metric, while the qualitative evaluation is carried out through a user study. Experiments show that it is hard to create a system that generates realistic models following the distribution of the dataset, since users have different opinions on what is realistic. However, similarity between the user's input and the reconstruction is well accomplished and is, in fact, the feature modelers value most.
696

Deep Active Learning for Short-Text Classification / Aktiv inlärning i djupa nätverk för klassificering av korta texter

Zhao, Wenquan January 2017 (has links)
In this paper, we propose a novel active learning algorithm for short-text (Chinese) classification applied to a deep learning architecture; the topic thus spans a cross-research area between active learning and deep learning. One of the bottlenecks of deep learning for classification is its reliance on large numbers of labeled samples, which are expensive and time-consuming to obtain. Active learning aims to overcome this disadvantage by asking the most useful queries, in the form of unlabeled samples to be labeled. In other words, active learning intends to achieve precise classification accuracy using as few labeled samples as possible. Such ideas have been investigated in conventional machine learning algorithms, such as the support vector machine (SVM) for image classification, and in deep neural networks, including convolutional neural networks (CNNs) and deep belief networks (DBNs) for image classification. Yet research on combining active learning with recurrent neural networks (RNNs) for short-text classification is rare. We demonstrate results for short-text classification on datasets from Zhuiyi Inc. Importantly, to achieve better classification accuracy with less computational overhead, the proposed algorithm shows large reductions in the number of labeled training samples compared to random sampling. Moreover, the proposed algorithm performs slightly better than the conventional method, uncertainty sampling. The proposed active learning algorithm dramatically decreases the number of labeled samples without significantly affecting the test classification accuracy of the original RNN classifier trained on the whole data set. In some cases, the proposed algorithm even achieves better classification accuracy than the original RNN classifier.
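The baseline uncertainty sampling strategy compared against above can be sketched as least-confidence selection over the classifier's predicted probabilities. This is a generic illustration, not the paper's algorithm; `select_uncertain` and the example probabilities are hypothetical:

```python
import numpy as np

def select_uncertain(probs, k):
    """Least-confidence uncertainty sampling: pick the k unlabeled
    samples whose top predicted class probability is smallest."""
    confidence = probs.max(axis=1)     # top-class probability per sample
    return np.argsort(confidence)[:k]  # least confident samples first
```

The selected samples are then sent to an annotator, added to the labeled pool, and the classifier is retrained for the next round.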
697

Sentiment Classification with Deep Neural Networks

Kalogiras, Vasileios January 2017 (has links)
Sentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text. It is a complex problem that entails many challenges, and for this reason it has been studied extensively. In past years, traditional machine learning algorithms and handcrafted methodologies provided state-of-the-art results. However, the recent deep learning renaissance has shifted interest towards end-to-end deep learning models. On the one hand this results in more powerful models, but on the other hand clear mathematical reasoning or intuition behind these models is still lacking. This thesis therefore attempts to shed some light on recently proposed deep learning architectures for sentiment classification. A study of their differences is performed, providing empirical results on how changes in the structure or capacity of a model can affect its accuracy and the way it represents and "comprehends" sentences.
698

Homography Estimation using Deep Learning for Registering All-22 Football Video Frames / Homografiuppskattning med deep learning för registrering av bildrutor från video av amerikansk fotboll

Fristedt, Hampus January 2017 (has links)
Homography estimation is a fundamental task in many computer vision applications, but many estimation techniques rely on complicated feature extraction pipelines. We extend research in direct homography estimation (i.e. without explicit feature extraction) by implementing a convolutional network capable of estimating homographies. Previous work in deep learning based homography estimation calculates homographies between pairs of images, whereas our network takes a single image as input and registers it to a reference view for which no image data is available. The application of this work is registering frames from American football video to a top-down view of the field. Our model manages to register frames in a test set with an average corner error equivalent to less than 2 yards.
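Once a 3x3 homography H has been estimated, registering image points to the top-down reference view is a matter of applying H in homogeneous coordinates. A minimal sketch (the function name and the example matrix are hypothetical, not from the thesis):

```python
import numpy as np

def warp_points(H, pts):
    """Map 2D points into the reference view with a 3x3 homography H,
    using homogeneous coordinates (append 1, multiply, divide by w)."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T  # (N, 3)
    return homog[:, :2] / homog[:, 2:3]                     # back to 2D

# A pure scaling homography simply doubles every coordinate:
H = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
```

The corner error metric in the abstract can then be read as the distance between the warped frame corners and their ground-truth positions on the field template.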
699

Decoding Electrocorticography Signals by Deep Learning for Brain-Computer Interface / Deep learning-baserad avkodning av elektrokortikografiska signaler för ett hjärn-datorsgränssnitt

JUBIEN, Guillaume January 2019 (has links)
Brain-Computer Interfaces (BCIs) offer paralyzed patients the opportunity to control movements without any neuromuscular activity. Signal processing of neuronal activity makes it possible to decode movement intentions, and a patient's ability to control an effector is closely linked to this decoding performance. In this study, I tackle a recent approach to decoding neuronal activity: deep learning. The study is based on public data extracted by Schalk et al. for BCI Competition IV. Electrocorticogram (ECoG) data from three epileptic patients were recorded. During the experiment, the team asked subjects to move their fingers and recorded the finger movements using a data glove. An artificial neural network (ANN) was built based on a common BCI feature extraction pipeline made of successive convolutional layers. The network first mimics spatial filtering with a spatial reduction of sources, then performs a time-frequency analysis and extracts the log power of the band-pass-filtered signals. The first investigation concerned the optimization of the network. The same architecture was then used on each subject, and decoding performance was computed for a 6-class classification task; I especially investigated the spatial and temporal filtering. This study demonstrated that deep learning can be an effective way to decode brain signals: for 6-class classification, the results showed performance similar to traditional decoding algorithms. As trained spatial and temporal weights are seldom described in the literature, we focused on their interpretation. The spatial weight study demonstrated that the network is able to select the specific ECoG channels identified in the literature as the most informative, and that it converges to the same spatial solution independently of the initialization. Finally, a preliminary study on predicting finger movement position gave encouraging results.
700

Domain-Independent Moving Object Depth Estimation using Monocular Camera / Domän-oberoende djupestimering av objekt i rörelse med monokulär kamera

Nassir, Cesar January 2018 (has links)
Today, automotive companies across the world strive to create vehicles with fully autonomous capabilities. Developing autonomous vehicles has many benefits, such as reduced traffic congestion, increased safety, and reduced pollution. To achieve that goal there are many challenges ahead, one of which is visual perception. Estimating depth from a 2D image has been shown to be a key component for 3D recognition, reconstruction, and segmentation. Estimating depth in an image from a monocular camera is an ill-posed problem, since the mapping from colour intensity to depth value is ambiguous. Depth estimation from stereo images, which the field initially relied on, has come far compared to monocular depth estimation. However, exploiting monocular cues is necessary in scenarios where stereo depth estimation is not possible. We present BiNet, a novel CNN inspired by ENet, to tackle real-time depth estimation of moving objects using only a monocular camera. It performs better than ENet on the Cityscapes dataset while adding only a small overhead in complexity.
