  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Development of Novel Reconstruction Methods Based on ℓ1-Minimization for Near Infrared Diffuse Optical Tomography

Shaw, Calbvin B January 2012
Diffuse optical tomography uses near-infrared (NIR) light as the probing medium to recover the distributions of tissue optical properties. It has the potential to become an adjunct imaging modality for breast and brain imaging, capable of providing functional information about the tissue under investigation. Because NIR light propagation in tissue is dominated by scattering, the image reconstruction problem (inverse problem) tends to be non-linear and ill-posed, requiring advanced computational methods to compensate for this. Traditional image reconstruction methods in diffuse optical tomography employ ℓ2-norm-based regularization, which is known to remove high-frequency noise from the reconstructed images and make them appear smooth. The contrast recovered by these methods typically depends on the iterative nature of the method employed: non-linear iterative techniques are known to perform better than linear ones. However, the computational complexity of non-linear iterative techniques makes them prohibitive in real-time use, especially in dynamic imaging. In rapid dynamic diffuse optical imaging, assuming a linear dependency between the solutions of successive frames results in a linear inverse problem. This new framework, combined with ℓ1-norm-based regularization, can provide better robustness to noise and better contrast recovery than conventional ℓ2-based techniques. Moreover, the proposed ℓ1-based technique is shown to be computationally more efficient than its ℓ2-based counterpart. The proposed framework requires a reasonably close estimate of the actual solution for the initial frame; any suboptimal estimate leads to erroneous reconstruction results for the subsequent frames.
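A minimal sketch of the contrast between the two regularizers on a linear inverse problem y = Jx. It assumes a small, hypothetical matrix J standing in for a linearized forward model, and solves the ℓ1 problem with ISTA (iterative soft-thresholding), a generic solver chosen here for illustration rather than the method used in the thesis; the ℓ2 (Tikhonov) problem is solved by plain gradient descent.

```python
import math


def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def ista_l1(J, y, lam, step, iters=500):
    """Minimize 0.5*||Jx - y||^2 + lam*||x||_1 by iterative soft-thresholding.
    step must not exceed 1 / (largest eigenvalue of J^T J)."""
    Jt = transpose(J)
    x = [0.0] * len(J[0])
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(J, x), y)]  # residual Jx - y
        g = matvec(Jt, r)                             # gradient of the data term
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, g)], step * lam)
    return x


def tikhonov_l2(J, y, lam, step, iters=500):
    """Minimize 0.5*||Jx - y||^2 + 0.5*lam*||x||^2 by plain gradient descent."""
    Jt = transpose(J)
    x = [0.0] * len(J[0])
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(J, x), y)]
        g = matvec(Jt, r)
        x = [xi - step * (gi + lam * xi) for xi, gi in zip(x, g)]
    return x


# Hypothetical toy problem: one strongly supported entry and one weak one
J = [[2.0, 0.0], [0.0, 1.0]]
y = [4.0, 0.3]
x_l1 = ista_l1(J, y, lam=0.5, step=0.2)
x_l2 = tikhonov_l2(J, y, lam=0.5, step=0.2)
print(x_l1)  # second entry is driven exactly to zero
print(x_l2)  # second entry is only shrunk (to about 0.2)
```

With these toy values the ℓ1 solution zeroes the weakly supported entry while the ℓ2 solution shrinks every entry without zeroing any, illustrating in miniature why ℓ2 regularization tends to smooth and ℓ1 favours sparse solutions.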
Modern diffuse optical imaging systems are multi-modal in nature: diffuse optical imaging is combined with traditional imaging modalities such as MRI, CT, and ultrasound. A novel approach based on a prior-image-constrained ℓ1-minimization scheme is introduced, which can use the structural information provided by the traditional imaging modalities in these scenarios more effectively. This method is motivated by recent progress in sparse image reconstruction techniques. The ℓ1-based framework is shown to be more effective than traditional methods that use structural information, both in localizing the tumor region and in recovering the optical property values, in numerical as well as gelatin phantom cases.
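The abstract does not spell out the objective, so the following is a simplified sketch assuming a prior-image term of the form λ‖x − x_prior‖₁, where x_prior is a hypothetical image derived from the structural modality (e.g. an MRI or CT segmentation). Its proximal step soft-thresholds the deviation from the prior instead of x itself. Published prior-image-constrained schemes usually also add a sparsity term on x and a sparsifying transform; both are omitted here.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def soft_threshold(v, t):
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def prior_constrained_ista(J, y, x_prior, lam, step, iters=500):
    """Minimize 0.5*||Jx - y||^2 + lam*||x - x_prior||_1 (simplified prior-image term)."""
    Jt = transpose(J)
    x = list(x_prior)  # start from the structural prior
    for _ in range(iters):
        r = [a - b for a, b in zip(matvec(J, x), y)]
        v = [xi - step * gi for xi, gi in zip(x, matvec(Jt, r))]
        # Proximal step: soft-threshold the deviation from the prior, not x itself
        d = soft_threshold([vi - pi for vi, pi in zip(v, x_prior)], step * lam)
        x = [pi + di for pi, di in zip(x_prior, d)]
    return x


# Hypothetical toy setup: pixel 0 disagrees strongly with the prior, pixel 1 only
# slightly, and pixel 2 is not measured at all (its column in J is zero).
J = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = [2.0, 1.05]
x_prior = [1.0, 1.0, 5.0]
x_hat = prior_constrained_ista(J, y, x_prior, lam=0.1, step=1.0)
print(x_hat)  # approximately [1.9, 1.0, 5.0]
```

Large, well-supported departures from the prior survive (pixel 0), small ones are treated as noise (pixel 1), and unmeasured pixels keep the prior value (pixel 2).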
152

Locality and compositionality in representation learning for complex visual tasks

Sylvain, Tristan 03 1900
The use of deep neural architectures coupled with specific innovations such as adversarial methods, pre-training on large datasets, and mutual information estimation has in recent years allowed rapid progress in many complex vision tasks such as zero-shot learning, scene generation, and multi-modal classification. Despite such progress, it is still not clear whether current representation learning methods will be enough to attain human-level performance on arbitrary visual tasks, and if not, what direction future research should take. In this thesis, we focus on two aspects of representations that seem necessary to achieve good downstream performance for representation learning: locality and compositionality. Locality can be understood as a representation's ability to retain local information. This is relevant in many cases and specifically benefits computer vision, where natural images inherently feature local information, e.g. relevant patches of an image or multiple objects present in a scene. On the other hand, a compositional representation can be understood as one that arises from a combination of simpler parts. Convolutional neural networks are inherently compositional, and many complex images can be seen as compositions of relevant sub-components: individual objects and attributes in a scene, and semantic attributes in zero-shot learning, are two examples. We believe both properties hold the key to designing better representation learning methods. In this thesis, we present three articles dealing with locality and/or compositionality and their application to representation learning for complex visual tasks.
In the first article, we introduce ways of measuring locality and compositionality for image representations, and demonstrate that local and compositional representations perform better at zero-shot learning. We also use these two notions as the basis for designing class-matching deep info-max, a novel representation learning algorithm that achieves state-of-the-art performance on our proposed "zero-shot from scratch" setting, a harder zero-shot setting where external information, e.g. pre-training on other image datasets, is not allowed. In the second article, we show that by encouraging a generator to retain local object-level information, using a scene-graph similarity module, we can improve scene generation performance. This model also showcases the importance of compositionality, as many components operate individually on each object present. To fully demonstrate the reach of our approach, we perform a detailed analysis and propose a new framework for evaluating scene generation models. Finally, in the third article, we show that encouraging high mutual information between local and global multi-modal representations of 2D and 3D medical images can lead to improvements in image classification and segmentation. This general framework can be applied to a wide variety of settings, and demonstrates the benefits not only of locality but also of compositionality, as multi-modal representations are combined to obtain a more general one.
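To make "encouraging high mutual information between local and global representations" concrete, here is a sketch of an InfoNCE-style local-global objective in the spirit of Deep InfoMax. The dot-product score, the toy vectors, and the function names are assumptions for illustration, not the thesis's exact estimator; in the multi-modal setting the global and local vectors could come from different modalities.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def local_global_infonce(globals_, locals_):
    """globals_[i]: global vector of sample i; locals_[i]: list of its local vectors.
    Each local vector must pick out its own sample's global vector among the batch."""
    total, count = 0.0, 0
    for i, local_list in enumerate(locals_):
        for l in local_list:
            scores = [dot(g, l) for g in globals_]      # score against every global vector
            total += -(scores[i] - logsumexp(scores))   # -log softmax of the positive pair
            count += 1
    return total / count


# Hypothetical toy batch of two samples, each with two local feature vectors
globals_ = [[1.0, 0.0], [0.0, 1.0]]
locals_ = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
swapped = [locals_[1], locals_[0]]  # local features paired with the wrong images

aligned_loss = local_global_infonce(globals_, locals_)
swapped_loss = local_global_infonce(globals_, swapped)
print(aligned_loss, swapped_loss)  # aligned pairs give the lower loss
```

Minimizing this loss pushes each local feature to identify its own sample's global vector; by the standard InfoNCE argument, log N minus the expected loss lower-bounds the local-global mutual information for a batch of N samples.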
153

A Multi-modal Emotion Recognition Framework Through The Fusion Of Speech With Visible And Infrared Images

Siddiqui, Mohammad Faridul Haque 29 August 2019
No description available.
154

Data Collection and Layout Analysis on Visually Rich Documents using Multi-Modular Deep Learning.

Stahre, Mattias January 2022
The use of deep learning methods for document understanding has been embraced by the research community in recent years. Deep learning methods, and especially transformer networks, require access to large datasets. The objective of this thesis was to evaluate a state-of-the-art model for document layout analysis on a public and a custom dataset. An additional objective was to build a pipeline for creating a dataset specifically of visually rich documents. The research methodology consisted of a literature study to find the state-of-the-art model for document layout analysis and a relevant dataset for evaluating the chosen model. The literature study also covered how existing datasets in the domain were collected and processed. Finally, an evaluation framework was created. The evaluation showed that the chosen multi-modal transformer network, LayoutLMv2, performed well on the DocBank dataset. The custom-built dataset was limited by class imbalance, although performance was good for the larger classes. The annotator tool and its auto-tagging feature performed well, and the proposed pipeline showed great promise for creating datasets of visually rich documents. In conclusion, this thesis project answers the research questions and suggests two main opportunities. The first is to encourage others to build datasets of visually rich documents using a pipeline similar to the one presented in this thesis. The second is to evaluate the possibility of creating the visual token information for LayoutLMv2 as part of the transformer network rather than using a separate CNN.
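The abstract does not describe the data format of its pipeline, so the following is a hypothetical sketch of the last step: turning word-level annotations into LayoutLMv2-style inputs and measuring class balance. The `Word` record and the label names are assumptions; the 0-1000 bounding-box scale follows the convention documented for LayoutLM-family models.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    box: tuple  # (x0, y0, x1, y1) in page pixels
    label: str  # layout class, e.g. "title" or "paragraph" (hypothetical labels)


def normalize_box(box, width, height):
    """Scale pixel coordinates to the 0-1000 range LayoutLM-family models expect."""
    x0, y0, x1, y1 = box
    return [int(1000 * x0 / width), int(1000 * y0 / height),
            int(1000 * x1 / width), int(1000 * y1 / height)]


def to_example(words, width, height):
    """Turn one annotated page into parallel word / box / label lists."""
    return {
        "words": [w.text for w in words],
        "bboxes": [normalize_box(w.box, width, height) for w in words],
        "labels": [w.label for w in words],
    }


def class_shares(examples):
    """Fraction of word-level labels per class, to spot class imbalance early."""
    counts = Counter(label for ex in examples for label in ex["labels"])
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


page = [
    Word("Introduction", (80, 100, 240, 130), "title"),
    Word("Deep", (80, 150, 130, 170), "paragraph"),
    Word("learning", (135, 150, 210, 170), "paragraph"),
    Word("methods", (215, 150, 290, 170), "paragraph"),
]
example = to_example(page, width=800, height=1000)
shares = class_shares([example])
print(example["bboxes"][0], shares)
```

Checking class shares right after annotation would surface the kind of imbalance reported for the custom dataset before any training is run.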
155

THE IMPACT OF PLANS, POLICIES AND PRACTICES OF METROPOLITAN PLANNING ORGANIZATIONS ON THE DESIGN AND IMPLEMENTATION OF STREETS FOR ALL USERS

Riemann, Deborah 14 May 2013
No description available.
156

Multi-modal Neural Representations for Semantic Code Search

Gu, Jian January 2020
In recent decades, software systems have gradually become a foundation of our society, and programmers regularly search for existing code snippets in their daily work. Better solutions for semantic code search, the task of finding the most semantically relevant code snippets for a given query, would therefore be beneficial. Our approach introduces tree representations through multi-modal learning. The core idea is to enrich the semantic information of code snippets by preparing data in different modalities, while ignoring syntactic information. We design a novel tree structure named Simplified Semantic Tree and extract RootPath representations from it. We use the RootPath representation to complement the conventional sequential representation, namely the token sequence of the code snippet. Our multi-modal model receives a code-query pair as input and computes a similarity score as output, following a pseudo-siamese architecture. For each pair, besides the ready-made code sequence and query sequence, we extract an extra tree sequence from the Simplified Semantic Tree. The model has three encoders, which respectively encode these three sequences as vectors of the same length. We then combine the code vector with the tree vector into one joint vector, still of the same length, which serves as the multi-modal representation of the code snippet. We introduce a triplet loss to ensure that the code and query vectors of the same pair are close in the shared vector space. We conduct experiments on a large-scale multi-language corpus, comparing against strong baseline models using specified performance metrics. Among the baseline models, the simplest one, Neural Bag-of-Words, has the most satisfying performance. This indicates that syntactic information is likely to distract complex models from critical semantic information.
Results show that our multi-modal representation approach performs better, surpassing the baseline models by a wide margin in most cases. The key to our multi-modal model is that it focuses entirely on semantic information and learns from data of multiple modalities.
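A minimal sketch of the pseudo-siamese scoring and triplet loss described above. The abstract does not specify the encoders, the fusion operator, or the similarity function, so this assumes NBoW-style mean encoders with separate (hypothetical, untrained) embedding tables, an element-wise mean to fuse the code and tree vectors, and cosine similarity with a hinge margin of 0.5.

```python
import math

# Hypothetical toy embedding tables. In a pseudo-siamese model each encoder has
# its own parameters, so code, tree and query tokens use separate tables here.
CODE_EMB = {"sort": [1.0, 0.0], "list": [0.8, 0.2], "open": [0.0, 1.0], "file": [0.1, 0.9]}
TREE_EMB = {"call:sorted": [1.0, 0.1], "call:open": [0.1, 1.0]}
QUERY_EMB = {"sort": [1.0, 0.0], "list": [0.9, 0.1], "read": [0.0, 1.0], "file": [0.1, 0.9]}
DIM = 2


def encode(tokens, table):
    """NBoW-style encoder: mean of the known token embeddings."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def combine(code_vec, tree_vec):
    """Assumed fusion: element-wise mean, so the joint vector keeps length DIM."""
    return [(a + b) / 2 for a, b in zip(code_vec, tree_vec)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def triplet_loss(code_vec, pos_query_vec, neg_query_vec, margin=0.5):
    """Hinge on cosine similarity: the matching query must beat the
    non-matching one by at least `margin`."""
    return max(0.0, margin - cosine(code_vec, pos_query_vec) + cosine(code_vec, neg_query_vec))


code_vec = combine(encode(["sort", "list"], CODE_EMB), encode(["call:sorted"], TREE_EMB))
pos = encode(["sort", "list"], QUERY_EMB)
neg = encode(["read", "file"], QUERY_EMB)
loss_good = triplet_loss(code_vec, pos, neg)  # correct pair already well separated
loss_bad = triplet_loss(code_vec, neg, pos)   # roles swapped: loss becomes positive
print(loss_good, loss_bad)
```

The element-wise mean is just one way to keep the joint vector the same length as its inputs; a learned gate or weighted sum would satisfy the same constraint.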
157

Multi-modal Reading For Low Level Readers

O'Neal, Jamie 01 January 2010
The value of this research hinges on the idea that replacing descriptive text with illustrations can provide appropriate schemas for students with reading difficulties and thereby improve their comprehension and vocabulary acquisition. The research in this dissertation draws on theories and earlier research in psychology, education, reading, and narratology. A review of these fields offers a variety of perspectives on the processes involved in reading and comprehension. These processes range from the physical systems involved in reading (e.g., early childhood development, eye movement) to the psychological systems, which include cognitive load theory as well as image and text processing models. This study compares two reading methods by analyzing students' vocabulary and comprehension gains. Both groups read the same text and completed the same pre- and post-tests. The control group read the text-only version from the book. The experimental group read a modified text on a computer screen, in which some sentences were replaced with images designed to convey the same information (e.g., descriptions of the setting, vocabulary items) in graphic form. The images were in line with the text and designed to be read as part of the story, not as additional illustrations. The final analysis shows that the experimental format performed as well as the control format for most students. However, students with learning disabilities, particularly language learners with learning disabilities, made no gains with the text-only control format. These same students did show statistically significant gains with the experimental format, particularly in the section of the reading where the vocabulary words were explicitly presented in the images. The disparate, non-homogeneous groupings of students reflect the actual teaching and learning circumstances in the school, as required by the school system.
This situation thus represents the status quo faced by teachers in our school. We leave it to future researchers to work with more homogeneous groups of students in order to attain clearer, stronger, and more plainly useful results.
158

Robust Multi-Modal Fusion for 3D Object Detection: Using multiple sensors of different types to robustly detect, classify, and position objects in three dimensions.

Kårefjärd, Viktor January 2023
The computer vision task of 3D object detection is fundamental to autonomous driving perception systems. Such vehicles typically feature a multitude of sensors, such as cameras, radars, and light detection and ranging (LiDAR) sensors. One neural network approach to making use of these sensor modalities is a multi-modal 3D object detection network with a fusion step that combines the information from multiple data streams to jointly predict bounding boxes of detected objects. How this step should be performed, however, remains largely an open question because the literature in this area is so recent. Thus, the question arises: how can information from different sensors be combined to perform 3D object detection for a real-world application, such as a mobile delivery robot with robustness requirements, and how should a fusion step be performed as part of a larger multi-modal fusion network? This work explores state-of-the-art multi-modal fusion models by testing them with sub-optimal sensor data augmentations, including LiDAR point cloud subsampling and low-resolution LiDAR data, to quantify robustness. Sensor-to-sensor misalignments from poor calibration, decalibration, or spatio-temporal mis-synchronization are also simulated, and a set of fusion steps is compared and evaluated. Three novel fusion steps are proposed; the best-performing one is a convolutional fusion with an encoder-decoder and a squeeze-and-excitation block. The results indicate that early and late fusion methods are sensitive to sub-optimal LiDAR sensor conditions and are therefore unsuitable for an application that requires robust detection; deep-fusion-based models are preferred instead.
Furthermore, a bird's-eye-view fusion model is shown not to be overly sensitive to small sensor-to-sensor misalignments, and the proposed fusion step with an encoder-decoder structure and a squeeze-and-excitation block is shown to further limit misalignment-related performance deficits. Introducing sensor misalignment as a training augmentation is also shown to alleviate these deficits and to help the fusion step generalize under heavy misalignment.
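The robustness tests above can be approximated with two simple point-cloud augmentations. This sketch assumes that calibration error is modelled as one small rigid perturbation (a yaw rotation plus a translation) applied to the whole LiDAR cloud before fusion, and that sparse LiDAR is modelled by random point dropout; the function names and parameter values are hypothetical, and temporal mis-synchronization is not covered.

```python
import math
import random


def subsample(points, keep_ratio, rng):
    """Randomly keep about keep_ratio of the points (simulated sparse LiDAR)."""
    return [p for p in points if rng.random() < keep_ratio]


def misalign(points, max_yaw_deg, max_shift_m, rng):
    """Apply one random rigid perturbation (yaw about z plus an xyz shift) to the
    whole cloud, mimicking an error in the LiDAR-to-camera extrinsic calibration."""
    yaw = math.radians(rng.uniform(-max_yaw_deg, max_yaw_deg))
    tx, ty, tz = (rng.uniform(-max_shift_m, max_shift_m) for _ in range(3))
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


rng = random.Random(0)
cloud = [(float(i), 0.5 * i, 0.1 * (i % 5)) for i in range(100)]
sparse = subsample(cloud, keep_ratio=0.5, rng=rng)
shifted = misalign(cloud, max_yaw_deg=1.0, max_shift_m=0.05, rng=rng)
print(len(sparse), shifted[0])
```

Because the perturbation is rigid, distances within the cloud are preserved; only its registration against the camera changes, which is the kind of error a fusion step has to tolerate.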
159

Biometric Multi-modal User Authentication System based on Ensemble Classifier

Assaad, Firas Souhail January 2014
No description available.
160

Multimodal Composing In Support of Disciplinary Literacy: A Search For Context In ELA and History Classrooms

Walsh-Moorman, Elizabeth A. 02 May 2018
No description available.
