311 |
On robustness and explainability of deep learning / Le, Hieu, 06 February 2024 (has links)
There has been tremendous progress in machine learning, and specifically deep learning, in the last few decades. However, due to the inherent nature of deep neural networks, many questions regarding explainability and robustness remain open. In particular, since deep learning models are known to be brittle against malicious changes, understanding when models fail and how to construct models that are robust against such attacks is of high interest. This work addresses questions of explainability and robustness in deep learning across four topics. First, real-world datasets often contain noise that can badly impact classification performance; furthermore, adversarial noise can be crafted to alter classification results. Geometric multi-resolution analysis (GMRA) can capture and recover manifolds while preserving geometric features. We showed that GMRA can be applied to retrieve low-dimensional representations that are more robust to noise and simplify classification models. Secondly, I showed that adversarial defense in the image domain can be partially achieved, without knowing the specific attack method, by employing a preprocessing model trained on a denoising task. Next, I tackled the problem of adversarial text generation in the context of real-world applications. I devised a new method of crafting adversarial text using filtered unlabeled data, which is usually more abundant than labeled data. Experimental results showed that the new method created more natural and relevant adversarial texts than current state-of-the-art methods. Lastly, I presented my work on referring expression generation, aiming at a more explainable natural language model.
The proposed method decomposes the referring expression generation task into two subtasks, and experimental results showed that the generated expressions are more comprehensible to human readers. I hope that the approaches proposed here can further our understanding of the explainability and robustness of deep learning models.
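The denoising-as-defense idea from the second contribution can be illustrated with a minimal sketch: a toy 1-D "image", a spiky perturbation standing in for adversarial noise, and a median filter standing in for the trained denoising model. The nearest-centroid "classifier" and all names below are illustrative assumptions, not the thesis's actual architecture.

```python
def median_filter(x, k=3):
    """Toy stand-in for a learned denoiser: sliding-window median."""
    half = k // 2
    out = []
    for i in range(len(x)):
        window = sorted(x[max(0, i - half):i + half + 1])
        out.append(window[len(window) // 2])
    return out

def nearest_centroid(x, centroids):
    """Toy stand-in for a classifier: pick the closest class prototype."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda c: dist(x, centroids[c]))

# Two class prototypes: a flat signal and a ramp.
prototypes = {"flat": [0.0] * 16, "ramp": [i / 15 for i in range(16)]}

# Adversarial-style perturbation: a few large spikes on the flat signal.
attacked = list(prototypes["flat"])
for i in (3, 8, 13):
    attacked[i] += 2.5

raw_pred = nearest_centroid(attacked, prototypes)  # the spikes fool the classifier
defended_pred = nearest_centroid(median_filter(attacked), prototypes)
```

The point of the sketch is that the defense needs no knowledge of the attack: the preprocessing step simply removes structure that clean data would not contain, after which the unchanged classifier recovers the correct label.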
|
312 |
Foundations of Radio Frequency Transfer Learning / Wong, Lauren Joy, 06 February 2024 (has links)
The introduction of Machine Learning (ML) and Deep Learning (DL) techniques into modern radio communications systems, a field known as Radio Frequency Machine Learning (RFML), has the potential to provide increased performance and flexibility compared to traditional signal processing techniques, and has broad utility in both the commercial and defense sectors. Existing RFML systems predominantly utilize supervised learning solutions in which the training process is performed offline, before deployment, and the learned model remains fixed once deployed. The inflexibility of these systems means that, while they are appropriate for the conditions assumed during offline training, they show limited adaptability to changes in the propagation environment and transmitter/receiver hardware, leading to significant performance degradation. Given the fluidity of modern communication environments, this rigidity has limited the widespread adoption of RFML solutions to date.
Transfer Learning (TL) is a means to mitigate such performance degradations by re-using prior knowledge learned from a source domain and task to improve performance on a "similar" target domain and task. However, the benefits of TL have yet to be fully demonstrated and integrated into RFML systems. This dissertation begins by clearly defining the problem space of RF TL through a domain-specific TL taxonomy for RFML that provides common language and terminology with concrete and Radio Frequency (RF)-specific example use-cases. Then, the impacts of the RF domain, characterized by the hardware and channel environment(s), and task, characterized by the application(s) being addressed, on performance are studied, and methods and metrics for predicting and quantifying RF TL performance are examined. In total, this work provides the foundational knowledge to more reliably use TL approaches in RF contexts and opens directions for future work that will improve the robustness and increase the deployability of RFML. / Doctor of Philosophy / The field of Radio Frequency Machine Learning (RFML) introduces Machine Learning (ML) and Deep Learning (DL) techniques into modern radio communications systems, and is expected to be a core component of 6G technologies and beyond. While RFML provides a myriad of benefits over traditional radio communications systems, existing approaches are generally incapable of adapting to changes that will inevitably occur over time, which causes severe performance degradation. Transfer Learning (TL) offers a solution to the inflexibility of current RFML systems, through techniques for re-using and adapting existing models for new, but similar, problems. TL is an approach often used in image and language-based ML/DL systems, but has yet to be commonly used by RFML researchers.
This dissertation aims to provide the foundational knowledge necessary to reliably use TL in RFML systems, from the definition and categorization of RF TL techniques to practical guidelines for when to use RF TL in real-world systems. The unique elements of RF TL not present in other modalities are exhaustively studied, and methods and metrics for measuring and predicting RF TL performance are examined.
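The core TL recipe the dissertation builds on can be sketched in a domain-agnostic way: learn a representation on a data-rich source problem, then reuse it on the target problem and fit only a small "head" on the scarce target labels. The 1-D features and threshold head below are illustrative assumptions, not RF-specific components of the dissertation.

```python
def fit_normalizer(xs):
    """'Representation' learned on the source domain: center and scale."""
    mean = sum(xs) / len(xs)
    scale = max(abs(x - mean) for x in xs) or 1.0
    return lambda x: (x - mean) / scale

# Source domain: plentiful unlabeled data to learn the representation.
source_data = [10.0, 12.0, 14.0, 16.0, 18.0]
extract = fit_normalizer(source_data)

# Target task: only four labeled examples; reuse `extract`, fit only a head.
target_labeled = [(11.0, 0), (13.0, 0), (17.0, 1), (18.0, 1)]
feats = [(extract(x), y) for x, y in target_labeled]
mean0 = sum(f for f, y in feats if y == 0) / 2  # two class-0 examples
mean1 = sum(f for f, y in feats if y == 1) / 2  # two class-1 examples
threshold = (mean0 + mean1) / 2

def predict(x):
    """Transferred feature extractor + newly fitted threshold head."""
    return int(extract(x) > threshold)
```

The transfer question studied in the dissertation is exactly when this reuse helps: if the source and target domains (hardware, channel) or tasks differ too much, the frozen representation can hurt rather than help.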
|
313 |
Burns Depth Assessment Using Deep Learning Features / Abubakar, Aliyu, Ugail, Hassan, Smith, K.M., Bukar, Ali M., Elmahmudi, Ali, 20 March 2022 (has links)
Yes / Burn depth evaluation is a lifesaving and very challenging task that requires objective techniques. While visual assessment is the method most commonly used by surgeons, its accuracy ranges between 60 and 80% and it is subjective, lacking any standard guideline. Currently, the only standard adjunct to clinical evaluation of burn depth is Laser Doppler Imaging (LDI), which measures microcirculation within the dermal tissue and provides the burn's potential healing time, corresponding to the depth of the injury, with up to 100% accuracy. However, the use of LDI is limited by many factors: high equipment and diagnostic costs, accuracy that is affected by movement (which makes it difficult to assess paediatric patients), the high level of human expertise required to operate the device, and the fact that 100% accuracy is only possible after 72 h. These shortfalls necessitate an objective and affordable technique. Method: In this study, we leverage deep transfer learning, using two pretrained models, ResNet50 and VGG16, to extract image features (ResFeat50 and VggFeat16) from a burn dataset of 2080 RGB images composed of healthy skin, first-degree, second-degree and third-degree burns, evenly distributed. We then use one-versus-one Support Vector Machines (SVM) for multi-class prediction, trained with 10-fold cross-validation to achieve an optimal trade-off between bias and variance. Results: The proposed approach yields a maximum prediction accuracy of 95.43% using ResFeat50 and 85.67% using VggFeat16. The average recall, precision and F1-score are 95.50%, 95.50% and 95.50% for ResFeat50, and 85.75%, 86.25% and 85.75% for VggFeat16. Conclusion: The proposed pipeline achieves state-of-the-art prediction accuracy and, interestingly, indicates that a decision on whether the injury requires surgical intervention such as skin grafting can be made in less than a minute.
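The one-versus-one scheme named in the Method can be sketched in a few lines: for four classes, 6 pairwise binary classifiers are trained, and a test sample receives the class with the most pairwise votes. The nearest-centroid stand-ins and 1-D "features" below are assumptions for brevity; the study itself trains SVMs on deep features.

```python
from itertools import combinations

def train_pairwise(data):
    """One binary 'classifier' (here a nearest-centroid rule) per class pair."""
    classes = sorted(data)
    models = {}
    for a, b in combinations(classes, 2):
        ca = sum(data[a]) / len(data[a])
        cb = sum(data[b]) / len(data[b])
        models[(a, b)] = (ca, cb)
    return models

def predict_ovo(models, x):
    """Each pairwise model casts one vote; the majority class wins."""
    votes = {}
    for (a, b), (ca, cb) in models.items():
        winner = a if abs(x - ca) <= abs(x - cb) else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Four toy classes mirroring the paper's label set, with 1-D features.
data = {"healthy": [0.0, 1.0], "first": [4.0, 5.0],
        "second": [8.0, 9.0], "third": [12.0, 13.0]}
models = train_pairwise(data)  # 4 classes -> 6 pairwise models
```

One-versus-one trains k(k-1)/2 small binary problems instead of one k-way problem, which is why it pairs naturally with SVMs, whose training cost grows quickly with dataset size.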
|
314 |
Design Methods and Processes for ML/DL models / John, Meenu Mary, January 2021 (has links)
Context: With the advent of Machine Learning (ML) and especially Deep Learning (DL) technology, companies are increasingly using Artificial Intelligence (AI) in systems, along with electronics and software. Nevertheless, the end-to-end process of developing, deploying and evolving ML and DL models in companies brings challenges related to the design and scaling of these models. For example, access to and availability of data is often challenging, and activities such as collecting, cleaning, preprocessing and storing data, as well as training, deploying and monitoring the model(s), are complex. Regardless of their level of expertise and/or access to data scientists, companies across the embedded systems domain struggle to build high-performing models due to a lack of established and systematic design methods and processes. Objective: The overall objective is to establish systematic and structured design methods and processes for the end-to-end process of developing, deploying and successfully evolving ML/DL models. Method: To achieve the objective, we conducted our research in close collaboration with companies in the embedded systems domain, using different empirical research methods such as case study, action research and literature review. Results and Conclusions: This research provides six main results. First, it identifies the activities that companies undertake in parallel to develop, deploy and evolve ML/DL models, and the challenges associated with them. Second, it presents a conceptual framework for the continuous delivery of ML/DL models to accelerate AI-driven business in companies. Third, it presents a framework based on current literature to accelerate the end-to-end deployment process and advance knowledge on how to integrate, deploy and operationalize ML/DL models. Fourth, it develops a generic framework with five architectural alternatives for deploying ML/DL models at the edge.
These architectural alternatives range from a centralized architecture that prioritizes (re)training in the cloud to a decentralized architecture that prioritizes (re)training at the edge. Fifth, it identifies key factors to help companies decide which architecture to choose for deploying ML/DL models. Finally, it explores how MLOps, as a practice that brings together data scientist teams and operations, ensures the continuous delivery and evolution of models. / Due to copyright reasons, the articles are not included in the fulltext online.
|
315 |
Automated Pre-Play Analysis of American Football Formations Using Deep Learning / Newman, Jacob DeLoy, 29 June 2022 (has links)
Annotation and analysis of sports video is a time-consuming task that, once automated, will benefit coaches, players, and spectators. American football, as the most watched sport in the United States, could especially benefit from this automation. Manual annotation and analysis of recorded video of American football games is an inefficient and tedious process. Currently, most college football programs focus on annotating the offensive formation. As a first step toward further research in this unique application, we use computer vision and deep learning to analyze an overhead image of a football play immediately before the play begins. This analysis consists of locating and labeling individual football players, as well as identifying the formation of the offensive team. We obtain greater than 90% accuracy on both player detection and labeling, and 84.8% accuracy on formation identification. These results demonstrate the feasibility of building a complete American football strategy analysis system using artificial intelligence.
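Detection accuracy in work like this is typically scored by matching predicted player boxes to ground-truth boxes via intersection-over-union (IoU). The metric itself is standard; the thesis's exact matching protocol is not specified here, so the sketch below is only the generic computation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)
```

A prediction is then usually counted as correct when its IoU with some ground-truth box exceeds a threshold such as 0.5.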
|
316 |
Analysis and Applications of Deep Learning Features on Visual Tasks / Shi, Kangdi, January 2022 (has links)
Benefiting from advances in hardware, deep learning (DL) has become a popular research area in recent decades. The convolutional neural network (CNN) is a critical deep learning tool that has been utilized in many computer vision problems. Moreover, the data-driven approach has unleashed the CNN's potential to acquire impressive learning ability with minimal human supervision. As a result, many computer vision problems have been brought into the spotlight again. In this thesis, we investigate the application of deep-learning-based methods, particularly the role of deep learning features, in two representative visual tasks: image retrieval and image inpainting.
Image retrieval aims to find in a dataset images similar to a query image.
In the proposed image retrieval method, we use canonical correlation analysis to explore the relationship between matching and non-matching features from a pre-trained CNN and to generate compact transformed features. The level of similarity between two images is determined by a hypothesis test on the joint distribution of transformed image feature pairs. The proposed approach is benchmarked against three popular statistical analysis methods: Linear Discriminant Analysis (LDA), Principal Component Analysis with whitening (PCAw), and Supervised Principal Component Analysis (SPCA). Our approach achieves competitive retrieval performance on the Oxford5k, Paris6k, rOxford, and rParis datasets.
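The retrieval setting itself can be sketched in a few lines: rank gallery images by the similarity of their features to the query's. Plain cosine similarity is used below as an illustrative baseline; the thesis's method instead transforms the features with canonical correlation analysis and scores pairs with a hypothesis test.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query, gallery, top_k=2):
    """Return the names of the top_k gallery items most similar to the query."""
    ranked = sorted(gallery, key=lambda name: cosine(query, gallery[name]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical 3-D "CNN features" for three gallery images and one query.
gallery = {
    "img_a": [1.0, 0.0, 0.2],
    "img_b": [0.9, 0.1, 0.3],
    "img_c": [0.0, 1.0, 0.8],
}
query = [1.0, 0.05, 0.25]
```

Real CNN features are high-dimensional, which is why the thesis's compact transformed features matter: they reduce both storage and comparison cost while preserving the matching/non-matching structure.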
Moreover, an image inpainting framework is proposed to reconstruct the corrupted region of an image progressively. Specifically, we design a feature extraction network inspired by the Gaussian and Laplacian pyramids, which are commonly used to decompose an image into different frequency components. We then use a two-branch iterative inpainting network to progressively recover the corrupted region on high- and low-frequency features respectively, fusing both sets of features at each iteration. An enhancement model is also introduced that employs the features of neighbouring iterations to further improve those of intermediate iterations. The proposed network is evaluated on popular image inpainting datasets such as Paris StreetView, CelebA, and Places2.
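The Gaussian/Laplacian pyramid that inspired the feature extraction network decomposes a signal into a low-frequency base plus per-level high-frequency residuals, from which the original can be reconstructed exactly. A 1-D sketch of the classical pyramid follows; the network itself operates on learned 2-D features, so everything here is just the underlying decomposition idea.

```python
def blur(x):
    """Simple [1, 2, 1]/4 smoothing with edge replication."""
    padded = [x[0]] + x + [x[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4
            for i in range(len(x))]

def downsample(x):
    return x[::2]

def upsample(x, n):
    """Nearest-neighbour upsampling back to length n."""
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]

def build_pyramid(x, levels=2):
    """Laplacian pyramid: high-frequency residuals plus a low-frequency base."""
    residuals = []
    for _ in range(levels):
        low = downsample(blur(x))
        up = upsample(low, len(x))
        residuals.append([a - b for a, b in zip(x, up)])
        x = low
    return residuals, x

def reconstruct(residuals, base):
    """Invert the pyramid: upsample and add the residual at each level."""
    x = base
    for res in reversed(residuals):
        x = [a + b for a, b in zip(upsample(x, len(res)), res)]
    return x

signal = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
residuals, base = build_pyramid(signal)
restored = reconstruct(residuals, base)
```

Because each residual stores exactly what the blur-and-downsample step discarded, reconstruction is lossless, which is what makes the pyramid a natural scaffold for recovering high- and low-frequency content separately.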
Extensive experiments demonstrate the validity of the proposed methods and their competitive performance against the state of the art. / Thesis / Doctor of Philosophy (PhD)
|
317 |
Predicting Transcription Factor Binding in Humans with Context-Specific Chromatin Accessibility Profiles Using Deep Learning / Cazares, Tareian, January 2022 (has links)
No description available.
|
318 |
Contributions to Document Image Analysis: Application to Music Score Images / Castellanos, Francisco J., 25 November 2022 (has links)
This thesis pushes the boundary of knowledge in several relevant processes within the typical workflow of optical music recognition (OMR) systems. Document analysis is a key early stage in that workflow, whose goal is to provide a simplified version of the incoming information, i.e., the images of music documents. The remaining processes involved in OMR can exploit this simplification to solve their corresponding tasks more easily, focusing only on the information they need. A clear example is the process devoted to recognizing the regions where the staves are located. After obtaining their coordinates, the individual staves can be processed to recover the symbolic music sequence they contain and thus build a digital version of their content. The research carried out for this thesis is supported by a series of contributions published in high-impact journals and international conferences. Specifically, this thesis comprises 4 articles published in journals indexed in the Journal Citation Reports and placed in the first quartiles by impact factor, with a total of 58 citations according to Google Scholar. It also includes 3 papers presented at different editions of a Class A international conference according to the GII-GRIN-SCIE classification. The publications address closely related topics, focusing mainly on document analysis oriented to OMR, with touches of music sequence transcription and domain adaptation techniques.
There are also publications showing that some of these techniques can be applied to other types of document images, making the proposed solutions more interesting for their capacity to generalize and adapt to other contexts. Besides document analysis, the thesis also studies how these processes affect the final transcription of the music notation, which is, after all, the ultimate goal of OMR systems, but which had not been investigated so far. Finally, given the vast amount of information that neural networks require to build a sufficiently robust model, the use of domain adaptation techniques is also studied, in the hope that their success will open the door to the future applicability of OMR systems in real-world environments. This is especially interesting in the OMR context because of the large number of documents lacking the ground-truth data needed to train neural network models; a solution that exploits the limited labeled collections to process documents of other kinds would allow a more practical use of these automatic transcription tools. After completing this thesis, it is clear that OMR research has not reached the limit of what the technology can achieve, and several avenues remain to be explored. Indeed, thanks to the work carried out, new horizons have opened that could be studied so that one day these systems can be used to automatically digitize and transcribe the written and printed musical heritage at large scale and in reasonable time. Among these new lines of research, the following stand out:
· This thesis published contributions that use a domain adaptation technique to perform document analysis with good results. Exploring new domain adaptation techniques could be key to building robust neural network models without the need to manually label a portion of every musical work to be digitized.
· Applying domain adaptation techniques to other processes, such as the transcription of the music sequence, could ease the training of models capable of this task. Supervised learning algorithms require qualified personnel to manually transcribe part of the collections, but the time and economic costs associated with this process represent a large effort if the final goal is to transcribe all of this cultural heritage. It would therefore be interesting to study the applicability of these techniques in order to drastically reduce this need.
· During the thesis, the effect of document scale on the performance of several OMR processes was studied. Besides scale, another important factor to address is orientation, since document images will not always be perfectly aligned and may suffer some rotation or deformation that causes errors in detecting the information. It would therefore be interesting to study how these deformations affect transcription and to find solutions viable for the applicable context.
· As a general and more basic case, it was studied how staves could be extracted for later processing using different general-purpose object detection models. These elements have been considered rectangular and without rotation, but this will not always be the case. Another possible research avenue would therefore be to study other types of models that can detect polygonal, not only rectangular, elements, as well as the possibility of detecting objects with some inclination without introducing overlap between consecutive elements, as happens in some manual labeling tools such as the one used in this thesis to obtain labeled data for experimentation: MuRET.
These research lines are, a priori, feasible, but an exploration process is needed to identify techniques that can usefully be adapted to the OMR field. The results obtained during the thesis indicate that these lines may yield new contributions to this field and thus move one step closer to the practical, real-world application of these systems at scale.
|
319 |
Morphing architectures for pose-based image generation of people in clothing / Morphing-arkitekturer för pose-baserad bildgeneration av människor i kläder / Baldassarre, Federico, January 2018 (has links)
This project investigates the task of conditional image generation from misaligned sources, with an example application in the context of content creation for the fashion industry. The problem of spatial misalignment between images is identified, the related literature is discussed, and different approaches are introduced to address it. In particular, several non-linear differentiable morphing modules are designed and integrated into current architectures for image-to-image translation. The proposed method for conditional image generation is applied to a clothes-swapping task, using a real-world dataset of fashion images provided by Zalando. In comparison to previous methods for clothes swapping and virtual try-on, the results achieved with our method are of high visual quality and precisely reconstruct the details of the garments.
|
320 |
Bi-directional Sampling in Partial Fourier Reconstruction / Ma, Zizhong, 28 October 2022 (has links)
No description available.
|