About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
321

Towards Scalable Deep 3D Perception and Generation

Qian, Guocheng 11 October 2023 (has links)
Scaling up 3D deep learning systems has emerged as a paramount issue, comprising two primary facets: (1) Model scalability: designing 3D networks that are scale-friendly, i.e., that achieve improved performance with increasing parameter counts and can run efficiently. Unlike 2D convolutional networks, 3D networks must accommodate the irregularities of 3D data, such as respecting permutation invariance in point clouds. (2) Data scalability: high-quality 3D data is conspicuously scarce. 3D data acquisition and annotation are both complex and costly, hampering the development of scalable 3D deep learning. This dissertation addresses these scalability challenges across both 3D perception and 3D generation. To address model scalability in perception, I introduce ASSANet, an approach for efficient 3D point cloud representation learning that allows the model to scale up at a low computational cost while achieving substantial accuracy gains. I further introduce the PointNeXt framework, which focuses on data augmentation and architectural scalability and outperforms state-of-the-art 3D point cloud perception networks. To address data scalability, I present Pix4Point, which explores the use of abundant 2D images to enhance 3D understanding. For scalable 3D generation, I propose Magic123, which leverages a joint 2D and 3D diffusion prior for zero-shot image-to-3D content generation without requiring 3D supervision. Together, these efforts provide pivotal solutions to model and data scalability in 3D deep learning.
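As a concrete illustration of the permutation-invariance constraint the abstract mentions, here is a minimal PointNet-style sketch (illustrative only, not the ASSANet or PointNeXt architecture): a shared per-point MLP followed by symmetric max pooling, so reordering the input points leaves the global feature unchanged.

```python
import torch
import torch.nn as nn

class PointFeatureExtractor(nn.Module):
    """Minimal PointNet-style encoder: a shared per-point MLP followed by
    max pooling, making the output invariant to point ordering."""
    def __init__(self, in_dim=3, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points):          # points: (batch, num_points, 3)
        per_point = self.mlp(points)    # (batch, num_points, feat_dim)
        # Max over the point axis: permuting the points leaves this unchanged.
        global_feat, _ = per_point.max(dim=1)
        return global_feat              # (batch, feat_dim)

x = torch.randn(2, 1024, 3)
perm = torch.randperm(1024)
enc = PointFeatureExtractor()
assert torch.allclose(enc(x), enc(x[:, perm]), atol=1e-6)
```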
322

New Approaches to Optical Music Recognition

Alfaro-Contreras, María 22 September 2023 (has links)
Optical Music Recognition (OMR) is a research field that studies how to computationally read the music notation in documents and store it in a structured digital format. Traditional OMR approaches are typically structured as a multi-stage pipeline: (i) image preprocessing, which addresses issues related to the scanning process and paper quality; (ii) symbol segmentation and classification, where the individual elements of the image are detected and labeled; (iii) music notation reconstruction, a post-processing phase of the recognition process; and (iv) output encoding, where the recognized elements are stored in a suitable symbolic format. These systems achieve competitive recognition rates at the cost of relying on specific heuristics tailored to the cases for which they were designed. Scalability therefore becomes a major limitation, since a new set of heuristics must be designed for each collection or notation type. Another drawback of these traditional approaches is the need for detailed labeling, often obtained manually: because each symbol is recognized individually, the exact position of every symbol is required, together with its corresponding musical label.

The adoption of Deep Learning (DL) in OMR has produced a shift toward holistic, end-to-end neural systems for the symbol segmentation and classification stage, treating recognition as a single step rather than dividing it into separate subtasks. By learning feature extraction and classification jointly, these solutions remove the need to design case-specific processes: the features required for classification are inferred directly from the data. To achieve this, only training pairs consisting of the input image and its corresponding transcription are needed. In other words, this approach avoids annotating the exact positions of symbols, which further simplifies the transcription process.

The end-to-end approach has recently been explored in the literature, but always under the assumption that a preprocessing step has already segmented the individual staves of a score. The goal is therefore to recover the sequence of music symbols that appear in a staff image. In this context, Convolutional Recurrent Neural Networks (CRNN) represent the state of the art: the convolutional block extracts relevant features from the input image, while the recurrent layers interpret those features as sequences of music symbols. CRNNs are mainly trained with the Connectionist Temporal Classification (CTC) loss function, which enables training without explicit information about the location of symbols in the image. At inference time, a greedy decoding policy is generally used, i.e., the most probable sequence is retrieved.

This thesis presents a series of contributions, organized into three distinct but interconnected groups, that advance the development of more robust and generalizable staff-level OMR systems. The first group focuses on reducing the human effort involved in using OMR systems. Transcription times with and without the aid of an OMR system are compared, showing that OMR speeds up the process but requires a sufficient amount of labeled data, which itself demands human effort. Self-Supervised Learning (SSL) techniques are therefore proposed to pretrain a symbol classifier, achieving over 80% accuracy with only one training example per class; this classifier can accelerate the data labeling process. The second group improves the performance of OMR systems in two ways. On one hand, a music encoding is proposed that enables the recognition of both monophonic and homophonic music. On the other hand, performance is improved by exploiting the two-dimensional nature of the agnostic representation, introducing three changes to the standard approach: (i) a new architecture with dedicated branches for capturing features related to the shape (event duration) or the height (pitch) of music symbols; (ii) a split-sequence representation that requires the model to predict shape and height attributes sequentially; and (iii) a custom greedy decoding algorithm that guarantees the predicted sequence conforms to that representation. The third and final group explores the synergies between OMR and its audio counterpart, Automatic Music Transcription (AMT). These contributions confirm that synergies between the two fields exist and evaluate different late-fusion approaches for multimodal transcription, yielding significant improvements in transcription accuracy. Finally, the thesis concludes by comparing early-fusion and late-fusion approaches, finding that late fusion offers more flexibility and better performance. / This thesis was funded by the Spanish Ministerio de Universidades through the university faculty training grant program (Ref. FPU19/04957).
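As a rough sketch of the staff-level CRNN-plus-CTC setup described above (layer sizes, vocabulary size, and image dimensions are illustrative assumptions, not the thesis configuration): a convolutional block turns the staff image into a left-to-right sequence of frames, a bidirectional LSTM reads them, and the CTC loss trains the model without symbol-position annotations.

```python
import torch
import torch.nn as nn

class StaffCRNN(nn.Module):
    """Sketch of a staff-level CRNN: a small convolutional block extracts
    features from the staff image, a BiLSTM reads them as a left-to-right
    sequence, and a linear head emits per-frame symbol probabilities for CTC."""
    def __init__(self, n_symbols, img_height=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat = 64 * (img_height // 4)
        self.rnn = nn.LSTM(feat, 128, bidirectional=True, batch_first=True)
        self.head = nn.Linear(256, n_symbols + 1)   # +1 for the CTC blank

    def forward(self, x):                # x: (batch, 1, H, W)
        f = self.conv(x)                 # (batch, 64, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one frame per column
        out, _ = self.rnn(f)
        return self.head(out).log_softmax(-1)   # (batch, W/4, n_symbols + 1)

model = StaffCRNN(n_symbols=100)
logp = model(torch.randn(2, 1, 64, 256)).permute(1, 0, 2)  # (T, batch, C) for CTC
targets = torch.randint(1, 101, (2, 20))                   # label 0 is the blank
loss = nn.CTCLoss()(logp, targets,
                    input_lengths=torch.full((2,), logp.size(0), dtype=torch.long),
                    target_lengths=torch.full((2,), 20, dtype=torch.long))
```

Greedy decoding then takes the argmax symbol per frame, collapses repeats, and drops blanks, matching the inference policy described above.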
323

FOCALSR: REVISITING IMAGE SUPER-RESOLUTION TRANSFORMERS WITH FFT-ENABLED CROSS ATTENTION LAYERS

Botong Ou (17536914) 06 December 2023 (has links)
<p dir="ltr">Motion blur arises from camera instability or swift movement of subjects within a scene. The objective of image deblurring is to eliminate these blur effects, thereby enhancing the image's quality. This task holds significant relevance, particularly in the era of smartphones and portable cameras. Yet, it remains a challenging issue, notwithstanding extensive research undertaken over many years. The fundamental concept in deblurring an image involves restoring a blurred pixel back to its initial state.</p><p dir="ltr">Deep learning (DL) algorithms, recognized for their capability to identify unique and significant features from datasets, have gained significant attention in the field of machine learning. These algorithms have been increasingly adopted in geoscience and remote sensing (RS) for analyzing large volumes of data. In these applications, low-level attributes like spectral and texture features form the foundational layer. The high-level feature representations derived from the upper layers of the network can be directly utilized in classifiers for pixel-based analysis. Thus, for enhancing the accuracy of classification using RS data, ensuring the clarity and quality of each collected data in the dataset is crucial for the effective construction of deep learning models.</p><p dir="ltr">In this thesis, we present the FFT-Cross Attention Transformer, an innovative approach amalgamating channel-focused and window-centric self-attention within a state-of-the-art(SOTA) Vision Transformer model. Augmented with a Fast Fourier Convolution Layer, this approach extends the Transformer's capability to capture intricate details in low-resolution images. Employing unified task pre-training during model development, we confirm the robustness of these enhancements through comprehensive testing, resulting in substantial performance gains. Notably, we achieve a remarkable 1dB improvement in the PSNR metric for remote sensing imagery, underscoring the transformative potential of the FFT-Cross Attention Transformer in advancing image processing and domain-specific vision tasks.</p>
324

Advanced Deep-Learning Methods For Automatic Change Detection and Classification of Multitemporal Remote-Sensing Images

Bergamasco, Luca 09 June 2022 (has links)
Deep-Learning (DL) methods have been widely used for Remote Sensing (RS) applications in recent years, improving the analysis of temporal information in bi-temporal and multi-temporal RS images. DL methods use RS data to classify geographical areas or to find changes occurring over time, and they can exploit multi-sensor or multi-temporal data to retrieve results more accurately than single-source or single-date processing. However, state-of-the-art DL methods exploit the heterogeneous information provided by these data by focusing either on the spatial information of multi-sensor, multi-resolution images using multi-scale approaches or on the temporal component of image time series. Most DL RS methods are supervised, so they require a large amount of labeled data, which is challenging to gather. Nowadays we have access to abundant unlabeled RS data, making the creation of long image time series feasible; but since labels are expensive to gather over image time series, multi-temporal RS methods usually follow unsupervised approaches. In this thesis, we propose DL methodologies that address these open issues. We propose unsupervised DL methods that exploit multi-resolution deep feature maps derived by a Convolutional Autoencoder (CAE); these models automatically learn spatial features from the input during training without any labeled data. We then combine the high temporal resolution of image time series with the high spatial detail of Very-High-Resolution (VHR) images to perform a multi-temporal and multi-scale analysis of the scene, merging the geometrical detail of VHR images with the temporal information of the time series to improve RS application tasks. We tested the proposed methods on detecting changes in bi-temporal RS images acquired by various sensors (Landsat-5, Landsat-8, and Sentinel-2), representing burned and deforested areas, and on detecting types of pasture impurities using VHR orthophotos and Sentinel-2 image time series. The results proved the effectiveness of the proposed methods.
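A minimal sketch of the unsupervised CAE feature extraction described above (channel counts, depth, and band count are illustrative, not the thesis architecture): the network is trained only to reconstruct its input, and the encoder's multi-resolution feature maps are then reused for change detection.

```python
import torch
import torch.nn as nn

class CAE(nn.Module):
    """Minimal convolutional autoencoder: trained to reconstruct unlabeled
    images so the encoder's multi-resolution feature maps can be reused."""
    def __init__(self, in_ch=4):  # e.g. 4 spectral bands (assumed)
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, in_ch, 3, padding=1),
        )

    def forward(self, x):
        f1 = self.enc1(x)              # full-resolution feature map
        f2 = self.enc2(f1)             # half-resolution feature map
        return self.dec(f2), (f1, f2)  # reconstruction + reusable features

x = torch.randn(1, 4, 64, 64)
recon, feats = CAE()(x)
loss = nn.functional.mse_loss(recon, x)  # unsupervised reconstruction loss
```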
325

Numerical Modeling and Inverse Design of Complex Nanophotonic Systems

Baxter, Joshua Stuart Johannes 10 January 2024 (has links)
Nanophotonics is the study and technological application of the interaction of electromagnetic waves (light) and matter at the nanometer scale. The field's extensive research focuses on generating, detecting, and controlling light using nanoscale features such as nanoparticles, waveguides, resonators, and nanoantennas. Exploration in the field depends heavily on computational methods that simulate how light interacts with matter in specific situations; as nanophotonics advances, so must the computational techniques. In this thesis, I present my work on various numerical studies in nanophotonics, sorted into three categories: plasmonics, inverse design, and deep learning. In plasmonics, I developed methods for solving advanced material models (including nonlinearities) for small metallic and epsilon-near-zero features and validated them against other theoretical and experimental results. For inverse design, I introduce new methods for designing optical pulse shapes and metalenses for focusing high-harmonic generation. Finally, I used deep learning to model plasmonic colour generation from structured metal surfaces and to predict the multipolar responses of plasmonic nanoparticles.
326

Multiscale Modeling with Meshfree Methods

Xu, Wentao January 2023 (has links)
Multiscale modeling has become an important tool in material mechanics because material behavior can exhibit different properties across length scales, and capturing these characteristics accurately is essential for predicting material behavior. Mesh-free methods have been gaining attention in recent years due to their innate ability to handle complex geometries and large deformations. These methods provide greater flexibility and efficiency in modeling complex material behavior, especially for problems involving discontinuities such as fractures and cracks. Moreover, mesh-free methods can easily be extended to multiple length and time scales, making them particularly suitable for multiscale modeling. The thesis focuses on two specific problems of multiscale modeling with mesh-free methods. The first is an atomistically informed constitutive model for studying the high-pressure-induced densification of silica glass. Molecular Dynamics (MD) simulations are carried out to study the atomistic-level response of fused silica under different pressure and strain-rate levels. Based on the data obtained from the MD simulations, a novel continuum-based multiplicative hyper-elasto-plasticity model that accounts for the anomalous densification behavior is developed and then parameterized using polynomial regression and deep learning techniques. To incorporate dynamic damage evolution, a plasticity-damage variable that controls the shrinkage of the yield surface is introduced and integrated into the elasto-plasticity model. The resulting coupled elasto-plasticity-damage model is reformulated as a non-ordinary state-based peridynamics (NOSB-PD) model for computational efficiency in impact simulations. The developed peridynamics (PD) model reproduces the coarse-scale quantities of interest found in MD simulations and can simulate at the component level. Finally, the proposed atomistically informed multiplicative hyper-elasto-plasticity-damage model is validated against the limited available experimental results for hyper-velocity impacts of projectiles on silica glass targets. The second problem addressed in the thesis is an upscaling approach for multi-porosity media, analyzed using the so-called MultiSPH method, a sequential SPH (Smoothed Particle Hydrodynamics) solver across multiple scales. Multi-porosity media are common in natural and industrial materials, and their behavior is not easily captured with traditional numerical methods. The upscaling approach is demonstrated on a porous medium spanning three scales: SPH characterizes the behavior of individual pores at the microscopic scale, and a homogenization technique then upscales the results to the meso- and macroscopic levels. The accuracy of the MultiSPH approach is confirmed by comparison with analytical solutions for simple microstructures, as well as with detailed single-scale SPH simulations and experimental data for more complex microstructures.
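As a toy illustration of the polynomial-regression parameterization step (the data values and the fitted quantity below are hypothetical stand-ins, not the thesis's MD results):

```python
import numpy as np

# Hypothetical MD-derived samples: pressure (GPa) vs. permanent densification.
pressure = np.linspace(8.0, 40.0, 12)
densification = 0.002 * (pressure - 8.0) ** 2 / (1 + 0.05 * (pressure - 8.0))
densification += np.random.default_rng(0).normal(0, 0.002, pressure.size)

# Fit a cubic polynomial as a simple stand-in for calibrating one
# constitutive relation of the continuum model from atomistic data.
coeffs = np.polyfit(pressure, densification, deg=3)
model = np.poly1d(coeffs)
print(model(25.0))  # predicted densification at 25 GPa
```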
327

Detection and Classification of Diabetic Retinopathy using Deep Learning Models

Olatunji, Aishat 01 May 2024 (has links) (PDF)
Healthcare analytics leverages extensive patient data for data-driven decision-making, enhancing patient care and outcomes. Diabetic Retinopathy (DR), a complication of diabetes, stems from damage to the retina's blood vessels and can affect both type 1 and type 2 diabetes patients. Ophthalmologists use retinal images for accurate DR diagnosis and severity assessment, and early detection is crucial for preserving vision and minimizing risk. Our research focuses on DR detection using deep learning techniques: we applied our proposed neural network and transfer-learning models to a publicly available Kaggle dataset of patient retinal images, classifying each image into one of five DR stages. Python libraries such as TensorFlow facilitated data preprocessing, model development, and evaluation. Rigorous cross-validation and hyperparameter tuning optimized model accuracy, demonstrating the models' effectiveness for early risk identification, personalized healthcare recommendations, and improved patient outcomes.
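A minimal transfer-learning sketch in the spirit of the approach described (the backbone choice, input size, and hyperparameters are illustrative assumptions, not the thesis configuration):

```python
import tensorflow as tf

# Frozen ImageNet-pretrained backbone with a small 5-class head for the
# five DR severity stages; hyperparameters here are illustrative only.
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze pretrained features first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 DR stages
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # hypothetical datasets
```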
328

AUTOMATIC EXTRACTION OF COMPUTER SCIENCE CONCEPT PHRASES USING A HYBRID MACHINE LEARNING PARADIGM

S. M. Abrar Jahin (14300654) 31 May 2023 (has links)
With the proliferation of computer science in modern society, the number of computer science-related jobs is expanding quickly. Software engineer was ranked the best job for 2023, based on pay, stress level, opportunity for professional growth, and work-life balance, by rankings from various news outlets, journals, and publications. Computer science occupations are anticipated to be in high demand not just in 2023 but for the foreseeable future, so it is not surprising that the number of computer science students at universities is growing and will continue to grow. This enormous increase in student enrolment across the many subdisciplines of computing has presented some distinct issues. If computer science is to be incorporated into the K-12 curriculum, it is vital that K-12 educators be competent, but one of the biggest obstacles to this plan is the shortage of trained computer science teachers. Computer science is constantly adding new fields and applications, and schools find it difficult to recruit skilled computer science instructors for a variety of reasons, including low salaries. The most effective strategy is therefore to draw on the K-12 teachers who are already in the schools, have a love for teaching, and consider teaching a vocation. If we want these teachers to grasp computer science topics quickly, we need to give them an easy way to learn about the field.

To simplify and expedite the study of computer science, we must acquaint schoolteachers with the terminology associated with computer science concepts so that they know what they need to learn according to their profile. Ideally, we would provide them with a tree of words and phrases from which they can see where each phrase originates and which phrases are connected to it, so the material can be learned effectively. To find a good concept word or phrase, we must first identify concepts and then establish their connections. Because computer science is a fast-developing field, its nomenclature is expanding at a frenetic rate, so manually adding all concepts and terms to a knowledge graph would be a challenging endeavor; a system that automatically adds computer science domain terms to the knowledge graph is a straightforward solution. We have identified knowledge-graph use cases for the schoolteacher training program, which motivated the development of the knowledge graph, and we analyzed both the use cases and the graph's ideal characteristics. We designed a web-based system for adding, editing, and removing words from the knowledge graph; in addition, a term or phrase can be represented with its children list, parent list, and synonym list for enhanced comprehension. We also developed an automated extraction system that can pull computer science concept phrases from any supplied text, thereby enriching the knowledge graph. The resulting knowledge graph is designed for use in teacher education, so that schoolteachers can teach K-12 students computer science topics effectively.
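A small sketch of how such a concept node might be represented with its parent, child, and synonym lists (the structure and helper below are hypothetical, for illustration only):

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    """Illustrative node for a concept knowledge graph: each term keeps
    parent, child, and synonym lists for navigation and learning paths."""
    term: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    synonyms: list = field(default_factory=list)

graph = {}

def add_concept(term, parent=None, synonyms=()):
    # Create or fetch the node, then record the parent/child link.
    node = graph.setdefault(term, ConceptNode(term))
    node.synonyms.extend(synonyms)
    if parent:
        graph.setdefault(parent, ConceptNode(parent)).children.append(term)
        node.parents.append(parent)

add_concept("machine learning", parent="computer science")
add_concept("neural network", parent="machine learning", synonyms=["NN"])
print(graph["machine learning"].children)  # ['neural network']
```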
329

Song Popularity Prediction with Deep Learning : Investigating predictive power of low level audio features

Holst, Gustaf, Niia, Jan January 2023 (has links)
Today, streaming services are the most popular way to consume music, and with this the field of Music Information Retrieval (MIR) has exploded. Tangy market is a music investment platform that wants to use MIR techniques to estimate the value of not-yet-released songs. In this thesis, conducted in collaboration with them, we investigate how a song's financial success can be predicted using machine learning models. Previous research has shown that well-known algorithms used for tasks such as image recognition and machine translation can also be used for audio analysis and prediction. Much prior work covers different aspects of audio analysis and prediction, but most of it concerns genre classification and hit-song prediction; popularity prediction from audio is still quite new, and this is where we contribute by researching whether low-level audio features can be used to predict streams. We use an existing dataset of more than 100 000 songs with low-level features, which we extend with streaming information. The features come in two shapes, summarized and full; since the dataset contains only the summarized digital representation, we use Librosa to extend it with the full version of each audio feature. A previous study by Martín-Gutiérrez et al. [1] successfully combined low-level and high-level audio features with non-musical features such as the number of social media followers. The aim of this thesis is to explore five of the low-level features used in [1] in order to assess the predictive power these features have on their own: Chromagram, Mel Spectrogram, Tonnetz, Spectral Contrast, and MFCC. These features were selected specifically because they were used in [1], and we want to investigate to what extent they contribute to the final predictions made by that model. Our conclusion is that none of these features could be used for prediction with any accuracy, which indicates that other high-level and external features are of more importance. However, Chromagram and Mel Spectrogram in their full-feature form show some potential and warrant further research.
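A short sketch of extracting the five low-level features with Librosa, together with a simple summarized form (the file path and the mean/std summary are illustrative assumptions):

```python
import librosa
import numpy as np

# Load an audio file (path is illustrative) and compute the five
# low-level features investigated in the thesis.
y, sr = librosa.load("song.mp3", duration=30.0)

chroma   = librosa.feature.chroma_stft(y=y, sr=sr)       # Chromagram
mel      = librosa.feature.melspectrogram(y=y, sr=sr)    # Mel Spectrogram
tonnetz  = librosa.feature.tonnetz(y=y, sr=sr)           # Tonnetz
contrast = librosa.feature.spectral_contrast(y=y, sr=sr) # Spectral Contrast
mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # MFCC

# One possible "summarized" form: per-coefficient mean and standard
# deviation over time, concatenated into a single fixed-length vector.
summary = np.hstack([np.r_[f.mean(axis=1), f.std(axis=1)]
                     for f in (chroma, mel, tonnetz, contrast, mfcc)])
```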
330

Accuracy Considerations in Deep Learning Using Memristive Crossbar Arrays

Paudel, Bijay Raj 01 May 2023 (has links) (PDF)
Deep neural networks (DNNs) are receiving immense attention because of their ability to solve complex problems. However, running a DNN requires a very large number of computations; hence, dedicated hardware optimized for running deep learning algorithms, known as neuromorphic architectures, is often utilized. This dissertation focuses on evaluating and enhancing the accuracy of these neuromorphic architectures, considering component design, process variations, and adversarial attacks.

The first contribution (Chapter 2) proposes design enhancements in analog Memristive Crossbar Array (MCA)-based neuromorphic architectures to improve classification accuracy. It introduces an analog Winner-Take-All (WTA) architecture and an on-chip training architecture. WTA ensures that the classification of the analog MCA is correct at the final selection level and that the highest probability is selected. In particular, this dissertation presents the design of a highly scalable and precise current-mode WTA circuit with digital address generation, based on current mirrors and comparators that use a cross-coupled latch structure. A post-silicon calibration circuit is also presented to handle process variations. On-chip training ensures consistency in classification accuracy among different all-analog MCA-based neuromorphic chips. Finally, the analog on-chip training architecture is enhanced by implementing a Convolutional Neural Network (CNN) on the MCA, with software considerations to accelerate training.

The second focus (Chapter 3) is on producing correct classifications in the presence of malicious inputs known as adversarial attacks. This dissertation shows that MCA-based neuromorphic architectures ensure correct classification when the input is compromised using existing adversarial attack models, and that adversarial robustness can be further improved by compression-based preprocessing steps that can be implemented on MCAs. It also evaluates the architecture of Chapter 2 under adversarial attacks, showing that such attacks do not uniformly affect the classification accuracy of different MCA-based chips. Experimental evidence using a variety of datasets and attack models supports the role of MCA-based neuromorphic architectures and compression-based preprocessing implemented on MCAs in mitigating adversarial attacks, and shows that on-chip training improves consistency in this mitigation among chips.

The final contribution (Chapter 4) enhances the method of Chapter 3: input preprocessing using compression, followed by rescale and rearrange operations implemented using MCAs, further improves robustness against adversarial attacks. The rescale and rearrange operations are implemented using a DNN consisting of fully connected and convolutional layers. Experimental results show improved defense compared to similar input-preprocessing techniques on MCAs.
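An idealized software sketch of what an MCA computes, with a final winner-take-all selection (the conductance range, noise model, and sizes are illustrative assumptions, not the dissertation's circuit design):

```python
import numpy as np

def crossbar_mvm(weights, v_in, g_min=1e-6, g_max=1e-4, sigma=0.05):
    """Idealized memristive-crossbar matrix-vector multiply: weights map to
    device conductances (a differential pair encodes sign), lognormal noise
    stands in for process variations, and output currents follow Ohm's law
    with Kirchhoff summation along each column."""
    w = weights / np.abs(weights).max()           # normalize to [-1, 1]
    g_pos = g_min + (g_max - g_min) * np.clip(w, 0, 1)
    g_neg = g_min + (g_max - g_min) * np.clip(-w, 0, 1)
    rng = np.random.default_rng(0)
    g_pos *= rng.lognormal(0.0, sigma, g_pos.shape)   # device variation
    g_neg *= rng.lognormal(0.0, sigma, g_neg.shape)
    return (g_pos - g_neg) @ v_in                 # output currents

W = np.random.randn(10, 784)      # 10 classes, 784 inputs (illustrative)
x = np.random.rand(784)
scores = crossbar_mvm(W, x)
winner = int(np.argmax(scores))   # WTA stage: select the largest column current
```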
