Global ETD Search

471	Learning Sampling-Based 6D Object Pose Estimation Krull, Alexander 31 August 2018 The task of 6D object pose estimation, i.e., estimating an object's position (three degrees of freedom) and orientation (three degrees of freedom) from images, is an essential building block of many modern applications, such as robotic grasping, autonomous driving, or augmented reality. Automatic pose estimation systems have to overcome a variety of visual ambiguities, including texture-less objects, clutter, and occlusion. Since many applications demand real-time performance, the efficient use of computational resources is an additional challenge. In this thesis, we take a probabilistic stance on overcoming these issues. We build on a highly successful automatic pose estimation framework based on predicting pixel-wise correspondences between the camera coordinate system and the local coordinate system of the object. These dense correspondences are used to generate a pool of hypotheses, which in turn serve as the starting point for a final search procedure. We present three systems that each use probabilistic modeling and sampling to improve upon different aspects of the framework. The goal of the first system, System I, is to enable pose tracking, i.e., estimating the pose of an object in a sequence of frames instead of a single image. By including information from previous frames, tracking systems can resolve many visual ambiguities and reduce computation time. System I is a particle filter (PF) approach. The PF represents its belief about the pose in each frame by propagating a set of samples through time. Our system uses the hypothesis generation process of the original framework as part of a proposal distribution that efficiently concentrates samples in the appropriate areas. In System II, we focus on the problem of evaluating the quality of pose hypotheses. This task plays an essential role in the final search procedure of the original framework.
We use a convolutional neural network (CNN) to assess the quality of a hypothesis by comparing rendered and observed images. To train the CNN, we view it as part of an energy-based probability distribution in pose space. This probabilistic perspective allows us to train the system under the maximum likelihood paradigm. We use a sampling approach to approximate the required gradients. The resulting pose estimation system yields superior results, in particular for highly occluded objects. In System III, we take the idea of machine learning a step further. Instead of learning to predict a hypothesis quality measure to be used in a search procedure, we present a way of learning the search procedure itself. We train a reinforcement learning (RL) agent, termed PoseAgent, to steer the search process and make optimal use of a given computational budget. PoseAgent dynamically decides which hypothesis should be refined next and which one should ultimately be output as the final estimate. Since the search procedure includes discrete, non-differentiable choices, training the system via gradient descent is not easily possible. To solve this problem, we model the behavior of PoseAgent as a stochastic policy, which is ultimately governed by a CNN. This allows us to use a sampling-based stochastic policy gradient training procedure. We believe that some of the ideas developed in this thesis, such as the sampling-driven, probabilistically motivated training of a CNN for image comparison, or the search procedure implemented by PoseAgent, have the potential to be applied in fields beyond pose estimation as well. info:eu-repo/classification/ddc/004 ddc:004
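System III trains PoseAgent with a sampling-based stochastic policy gradient because choosing which hypothesis to refine is a discrete, non-differentiable decision. The sketch below shows that general idea. It is the standard REINFORCE score-function estimator, not the thesis's training code. It assumes a softmax policy over a handful of hypotheses, with fixed logits standing in for the CNN's outputs. The logits, rewards, and function names are hypothetical. Averaging sampled gradients should approximate the exact gradient of the expected reward.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample a hypothesis index from the categorical (softmax) policy."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def score_function_gradient(logits, action, reward):
    """REINFORCE estimate of d E[reward] / d logits from one sampled action.

    For a softmax policy, d log pi(a) / d logit_j = 1[j == a] - pi(j).
    """
    probs = softmax(logits)
    return [reward * ((1.0 if j == action else 0.0) - p) for j, p in enumerate(probs)]


def exact_gradient(logits, rewards):
    """Exact gradient of sum_a pi(a) * r(a); used only to sanity-check the estimator."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.5, -1.0, 2.0]   # hypothetical per-hypothesis scores from a CNN
    rewards = [0.2, 0.9, 0.4]   # hypothetical reward for choosing each hypothesis
    n = 20000
    est = [0.0] * len(logits)
    for _ in range(n):
        a = sample_action(logits, rng)
        g = score_function_gradient(logits, a, rewards[a])
        est = [e + gi / n for e, gi in zip(est, g)]
    print("Monte Carlo:", [round(v, 3) for v in est])
    print("Exact:      ", [round(v, 3) for v in exact_gradient(logits, rewards)])
```

The estimator only needs the log-probability of the action that was actually taken. This is why it works even though the selection step itself has no useful derivative.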
472	Deep Learning with Vision-based Technologies for Structural Damage Detection and Health Monitoring Bai, Yongsheng 08 December 2022 No description available. Civil Engineering Computer Science Mechanics deep learning structural damage classification structural damage detection crack detection spalling detection ResNet U-Net cascaded networks Mask R-CNN structural health monitoring shaking table tests Lucas-Kanade tracker displacement subtraction frequency subtraction progressive collapse LiDAR camera drones.
473	Dynamic Network Modeling from Temporal Motifs and Attributed Node Activity Giselle Zeno (16675878) 26 July 2023 The most important networks from different domains, such as Computing, Organization, Economic, Social, Academic, and Biology, are networks that change over time. For example, an organization has email and collaboration networks (e.g., different people or teams working on a document). Apart from their connectivity changing over time, these networks can contain attributes such as the topic of an email or message, the contents of a document, or the interests of a person in an academic citation or social network. Analyzing these dynamic networks can be critical in decision-making processes. For instance, in an organization, insight into how people from different teams collaborate provides important information that can be used to optimize workflows. Network generative models provide a way to study and analyze networks. For example, model performance and generalization in tasks like node classification can be benchmarked by evaluating models on synthetic networks generated with varying structure and attribute correlation. In this work, we begin by presenting our systematic study of the impact that graph structure and attribute auto-correlation have on the task of node classification using collective inference. This is the first time such an extensive study has been done. We take advantage of a recently developed method that samples attributed networks (although static) with varying network structure jointly with correlated attributes.
We find that the graph connectivity that contributes to network auto-correlation (i.e., the local relationships of nodes) and density have the highest impact on the performance of collective inference methods. Most of the literature to date has focused on static representations of networks, partially due to the difficulty of finding readily available datasets of dynamic networks. Dynamic network generative models can bridge this gap by generating synthetic graphs similar to observed real-world networks. Given that motifs have been established as building blocks for the structure of real-world networks, modeling them can help generate the observed graph structure and capture correlations in node connections and activity. Therefore, we continue with a study of motif evolution in dynamic temporal graphs. Our key insight is that, in fast-changing dynamic networks, motifs rarely change configuration (e.g., wedges into triangles, and vice versa) but rather keep reappearing at different times in the same configuration. This finding motivates the generative process of our proposed models, which use temporal motifs as building blocks to generate dynamic graphs with links that appear and disappear over time. Our first proposed model generates dynamic networks based on motif activity and the roles that nodes play in a motif. For example, a wedge is sampled based on the likelihood of one node having the role of hub with the two other nodes being the spokes. Our model learns all parameters from observed data, with the goal of producing synthetic graphs with similar graph structure and node behavior.
We find that using motifs and node roles helps our model generate the more complex structures and the temporal node behavior seen in real-world dynamic networks. After observing that motif node roles help capture the changing local structure and behavior of nodes, we extend our work to also consider the attributes generated by nodes' activities. We propose a second generative model for attributed dynamic networks that (i) captures network structure dynamics through temporal motifs, and (ii) extends the structural roles of nodes in motifs to roles that generate content embeddings. Our new model is the first to generate synthetic dynamic networks and sample content embeddings based on motif node roles. To the best of our knowledge, it is the only attributed dynamic network model that can generate new content embeddings, i.e., embeddings not observed in the input graph but still similar to those of the input graph. Our results show that modeling the network attributes with higher-order structures (e.g., motifs) improves the quality of the generated networks. The proposed generative models address the difficulty of finding readily available datasets of dynamic networks, attributed or not. This work will also allow others to (i) generate networks that they can share without divulging individuals' private data, (ii) benchmark model performance, and (iii) explore model generalization over a broader range of conditions, among other uses.
Finally, the proposed evaluation measures will help elucidate models, allowing fellow researchers to push forward in these domains. Modelling and simulation Data mining and knowledge discovery Graph, social and multimedia data Neural networks Graph Machine Learning network evolution model temporal graph model Dynamic Networks, Attributed Graphs Social network analysis tools convolutional neural network (CNN) graph convolutional network (GCN) node embeddings language model bert Collective classification Collective inference Node classification model evaluation techniques synthetic networks BERT models pre-trained language models
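The role-based sampling described above distinguishes wedges (one hub, two spokes) from triangles. The sketch below scans one snapshot of an undirected temporal graph for these configurations and counts how often each node acts as a hub. It is illustrative only. It assumes the links in a single time window are given as node pairs, and it does not reproduce the parameter learning of the proposed models. Function names are hypothetical.

```python
from itertools import combinations


def classify_triple(edges, triple):
    """Return ("triangle", None), ("wedge", hub) or ("other", None) for three nodes.

    edges: set of frozenset({u, v}) undirected links observed in one time window.
    """
    pairs = list(combinations(triple, 2))
    present = [frozenset(p) in edges for p in pairs]
    count = sum(present)
    if count == 3:
        return "triangle", None
    if count == 2:
        # The hub is the node shared by both present links; the other two are spokes.
        linked = [set(p) for p, ok in zip(pairs, present) if ok]
        hub = (linked[0] & linked[1]).pop()
        return "wedge", hub
    return "other", None


def motif_census(edge_list):
    """Count wedges and triangles in one snapshot and tally hub roles per node."""
    edges = {frozenset(e) for e in edge_list}
    nodes = sorted({n for e in edge_list for n in e})
    counts = {"triangle": 0, "wedge": 0}
    hub_counts = {n: 0 for n in nodes}
    for triple in combinations(nodes, 3):
        kind, hub = classify_triple(edges, triple)
        if kind in counts:
            counts[kind] += 1
        if hub is not None:
            hub_counts[hub] += 1
    return counts, hub_counts


if __name__ == "__main__":
    snapshot = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
    print(motif_census(snapshot))
    # -> ({'triangle': 1, 'wedge': 2}, {'a': 0, 'b': 0, 'c': 2, 'd': 0})
```

Running the census per time window and comparing the counts across windows is one simple way to examine the observation that motifs tend to reappear in the same configuration rather than switch between wedge and triangle.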
474	Towards a Nuanced Evaluation of Voice Activity Detection Systems : An Examination of Metrics, Sampling Rates and Noise with Deep Learning / Mot en nyanserad utvärdering av system för detektering av talaktivitet Joborn, Ludvig, Beming, Mattias January 2022 Recently, deep learning has revolutionized many fields, one of which is Voice Activity Detection (VAD). VAD is of great interest to sectors of society concerned with detecting speech in sound signals. One such sector is the police, where criminal investigations regularly involve the analysis of audio material. Convolutional Neural Networks (CNNs) have recently become the state-of-the-art method for detecting speech in audio. So far, however, the impact of noise and sampling rate on such methods is not fully understood. Additionally, some evaluation metrics from neighboring fields have not yet been integrated into VAD. We trained on four different sampling rates and found that changing the sampling rate could have dramatic effects on the results. We therefore recommend explicitly evaluating CNN-based VAD systems on pertinent sampling rates. Further, with increasing amounts of white Gaussian noise, we observed that increasing the capacity of our Gated Recurrent Unit (GRU) improved performance. Finally, we discuss why careful consideration is necessary when choosing a main evaluation metric, which leads us to recommend the Polyphonic Sound Detection Score (PSDS). voice activity detection VAD deep learning machine learning ML artificial intelligence AI convolutional neural network CNN deep neural network DNN sound event detection SED mel spectrogram audio processing polyphonic sound detection score PSDS signal processing signal to noise ratio SNR RCRNN sampling rate Gaussian noise Computer Sciences Datavetenskap (datalogi)
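The VAD experiments add increasing amounts of white Gaussian noise, but the abstract does not say how the noise was scaled. A common approach, sketched here under that assumption, is to scale a Gaussian noise sequence so that the signal-to-noise ratio matches a target in decibels, using SNR_dB = 10·log10(P_signal / P_noise). The sine test tone and all function names are hypothetical.

```python
import math
import random


def signal_power(x):
    """Mean power (mean of squares) of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def add_white_gaussian_noise(x, snr_db, seed=None):
    """Return (noisy, noise), with noise scaled so the SNR equals snr_db.

    SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(snr_db / 10)
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in x]
    target_power = signal_power(x) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_power / signal_power(raw))
    noise = [scale * n for n in raw]
    return [s + n for s, n in zip(x, noise)], noise


def measured_snr_db(x, noise):
    """SNR in dB computed from the clean signal and the added noise."""
    return 10.0 * math.log10(signal_power(x) / signal_power(noise))


def sine(freq_hz, fs_hz, seconds):
    """Hypothetical test tone sampled at fs_hz."""
    n = int(fs_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


if __name__ == "__main__":
    for fs in (8000, 16000):  # the same tone at two sampling rates
        clean = sine(440.0, fs, 0.1)
        noisy, noise = add_white_gaussian_noise(clean, snr_db=5.0, seed=0)
        print(fs, len(noisy), round(measured_snr_db(clean, noise), 6))
```

Rescaling the drawn noise to an exact power makes the achieved SNR deterministic, so the noise power is no longer itself random. The other common choice is to draw noise directly with the target variance.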
475	Segmentation and Depth Estimation of Urban Road Using Monocular Camera and Convolutional Neural Networks / Segmentering och djupskatting av stadsväg med monokulär kamera Djikic, Addi January 2018 Deep learning for safe autonomous transport is rapidly emerging. Fast and robust perception for autonomous vehicles will be crucial for future navigation in urban areas with heavy traffic and human interplay. Previous work focuses on extracting full-image depth maps or finding specific road features such as lanes. However, in urban environments lanes are not always present, and sensors such as LiDAR, with 3D point clouds, provide rather sparse depth perception of the road and require demanding algorithmic approaches. In this thesis, we derive a novel convolutional neural network that we call AutoNet. It is designed as an encoder-decoder network for pixel-wise depth estimation of the drivable free-space road in urban environments, using only a monocular camera; this task is handled as a supervised regression problem. AutoNet is also constructed as a classification network that solely classifies and segments the drivable free space in real time with monocular vision. This is handled as a supervised classification problem and proves to be a simpler and more robust solution than the regression approach. For comparison, we also implement the state-of-the-art neural network ENet, which is designed for fast real-time semantic segmentation and fast inference. The evaluation shows that AutoNet outperforms ENet on every performance metric but is slower in terms of frame rate. However, optimization techniques are proposed for future work on how to increase the frame rate of the network while maintaining its robustness and performance. All training and evaluation is done on the Cityscapes dataset. New ground-truth labels for road depth perception are created for training with a novel approach of fusing pre-computed depth maps with semantic labels.
Data collection is conducted with a Scania vehicle mounted with a monocular camera to test the final derived models. The proposed AutoNet shows promising state-of-the-art performance for road depth estimation as well as road classification. / [SV] Deep learning for safe autonomous transport systems is appearing more and more in research and development. Fast and robust perception of the environment will be crucial for autonomous vehicles navigating urban areas with a high degree of traffic interaction. In this thesis, we derive a new form of neural network that we call AutoNet. The network is designed as an autoencoder for pixel-wise depth estimation of the free drivable road surface in urban areas, using only a monocular camera and its images. The proposed depth estimation network is treated as a regression problem. AutoNet is also constructed as a classification network that only classifies and segments the drivable road surface in real time with monocular vision. This is treated as a supervised classification problem, which also proves to be a simpler and more robust solution for finding the road surface in urban areas. For comparison, we also implement ENet, one of the leading neural networks. ENet is designed for fast real-time semantic segmentation with high prediction speed. The evaluation shows that AutoNet outperforms ENet on every accuracy measure but is slower in terms of frames per second. Various optimization solutions are proposed for future work on how to increase the network model's frame rate while maintaining robustness. All training and evaluation is done on the Cityscapes dataset. New data for training and evaluating road depth estimation is created with a new approach that combines pre-computed depth maps with semantic road labels.
Data collection is also carried out with a Scania vehicle mounted with a monocular camera to test the final derived model. The proposed AutoNet network proves to be a promising top-performing model for road depth estimation and road classification in urban areas. AI ANN CNN semantic segmentation autonomous Scania driving road pixel classification regression real time monocular depth estimation convolutional neural networks deep learning perception camera vehicles supervised tensorflow Cityscapes machine learning autoencoder decoder encoder
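The road-depth ground truth is described as a fusion of pre-computed depth maps with semantic labels. One straightforward reading is to keep depth values only where the semantic label is road. This is sketched below as an assumption, not the thesis's exact procedure. The road label value 7 assumes Cityscapes' `labelId` encoding, and the fill value and function names are hypothetical.

```python
ROAD_LABEL = 7  # assumption: Cityscapes labelId for "road"; adjust to your label encoding


def _check_same_shape(a, b):
    """Raise if two nested-list maps do not have identical shapes."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("depth map and label map must have the same shape")


def fuse_road_depth(depth, labels, road_label=ROAD_LABEL, fill=0.0):
    """Regression target: keep depth on road pixels, put `fill` everywhere else."""
    _check_same_shape(depth, labels)
    return [
        [d if lab == road_label else fill for d, lab in zip(drow, lrow)]
        for drow, lrow in zip(depth, labels)
    ]


def road_mask(labels, road_label=ROAD_LABEL):
    """Classification target: 1 for drivable road pixels, 0 otherwise."""
    return [[1 if lab == road_label else 0 for lab in row] for row in labels]


if __name__ == "__main__":
    depth = [[5.0, 6.0], [7.0, 8.0]]  # hypothetical depth in metres
    labels = [[7, 11], [7, 7]]        # 11 is a non-road class in this toy example
    print(fuse_road_depth(depth, labels))
    print(road_mask(labels))
```

Deriving both targets from the same label map keeps the regression variant and the classification variant of the network aligned on the same road pixels.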
476	Reconstruction of Accelerated Cardiovascular MRI data Khalid, Hussnain January 2023 Magnetic resonance imaging (MRI) is a noninvasive medical imaging technique used to produce detailed images of the internal structures of the human body, including bones, muscles, organs, and blood vessels. MRI scanners use large magnets and radio waves to create images of the body. Cardiac MRI scans help doctors detect and monitor cardiac conditions such as blood clots, artery blockages, and scar tissue. Cardiovascular disease is a type of disease that affects the heart or the blood vessels. This thesis explores the reconstruction of accelerated cardiovascular MRI data, i.e., under-sampled MRI data acquired with acceleration techniques. The focus of this research is to study and implement deep learning techniques that overcome the aliasing artifacts caused by accelerated imaging. The results of this study are compared with fully sampled data acquired with traditional existing techniques such as Parallel Imaging (PI) and Compressed Sensing (CS). The primary findings show that the proposed deep learning network can effectively reconstruct under-sampled cardiovascular MRI data acquired using accelerated imaging techniques. Many experiments were performed to handle 4D Flow data with limited memory for training the network. The network's performance was found to be comparable to that of the fully sampled data acquired using traditional imaging techniques such as PI and CS. This study also aimed to investigate the generalizability of the proposed deep learning network, specifically FlowVN, when applied to different datasets. To explore this aspect, two different models were employed: a pretrained model using previous research data and configurations, and a model trained from scratch using CMIV data, with experiments performed to address the limited-memory issues associated with 4D Flow data.
medical imaging deep learning CNN Magnetic resonance imaging MRI Cardiac MRI Cardiac Cardiovascular reconstruction 4D flow MRI Parallel Imaging Compressed Sensing FlowVN Flow Variational Network K-space Reference images sensitivity maps Respiratory motion undersampled images Radiology and Image Processing
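Accelerated acquisitions skip k-space samples, and a naive zero-filled inverse Fourier transform then shows aliasing. The sketch below is a deliberately simplified 1D toy: single coil, one dimension, a uniform Cartesian mask, and a naive DFT. It keeps every r-th k-space sample. For r = 2, each reconstructed sample becomes the average of the true sample and the sample half a field of view away. This is the fold-over artifact that learned reconstructions such as FlowVN aim to remove. Neither PI nor CS is implemented here, and all names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (signal -> "k-space")."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)) for k in range(n)]


def idft(X):
    """Naive inverse DFT ("k-space" -> signal)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n for m in range(n)]


def undersample(X, r):
    """Keep every r-th k-space sample and zero-fill the rest (uniform Cartesian mask)."""
    return [v if k % r == 0 else 0 for k, v in enumerate(X)]


def zero_filled_recon(x, r):
    """Simulate an r-fold accelerated acquisition and reconstruct it naively."""
    return idft(undersample(dft(x), r))


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]  # hypothetical 1D "image" profile
    recon = zero_filled_recon(x, r=2)
    print([round(v.real, 3) for v in recon])  # the bump appears twice, at half amplitude
```

Because the artifact has this deterministic structure, a network can learn to undo it. Non-uniform masks, as used in CS, instead spread the aliasing out as incoherent, noise-like energy.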
477	Unsupervised Detection of Interictal Epileptiform Discharges in Routine Scalp EEG : Machine Learning Assisted Epilepsy Diagnosis Shao, Shuai January 2023 Epilepsy affects more than 50 million people, is one of the most prevalent neurological disorders, and has a high impact on the quality of life of those suffering from it. However, 70% of epilepsy patients can live seizure-free with proper diagnosis and treatment. Patients are evaluated using scalp EEG recordings, which are cheap and non-invasive. However, the diagnostic yield is low, and qualified personnel need to process large amounts of data in order to accurately assess patients. MindReader is an unsupervised classifier that detects spectral anomalies and generates a hypothesis of the underlying patient state over time. The aim is to highlight abnormal, potentially epileptiform states, which could expedite the analysis of patients and let qualified personnel confirm the results. It was used to evaluate 95 scalp EEG recordings from healthy adults and adult patients with epilepsy. Interictal epileptiform discharges (IEDs) occurring in the samples had been retroactively annotated, along with the patient state and maneuvers performed by personnel, to enable characterization of the classifier's detection performance. The performance was slightly worse than previous benchmarks on pediatric scalp EEG recordings, with a 7% and 33% drop in specificity and sensitivity, respectively. Electrode positioning and the partial spatial extent of events had a notable impact on performance. However, no correlation between annotated disturbances and reduced performance could be found. Additional exploratory analysis was performed on serialized intermediate data to evaluate the analysis design. Hyperparameters and electrode montage options were exposed in order to optimize the average Matthews correlation coefficient (MCC) per electrode per patient on a subset of the patients with epilepsy.
An increased window length and a reduced amount of training, along with a common average montage, proved most successful. The Euclidean distance of cumulative spectra (ECS), a metric suitable for spectral analysis, and the homologous L2 and L1 loss functions were implemented; of these, ECS further improved the average performance for all samples. Four additional analyses, featuring new time-frequency transforms and multichannel convolutional autoencoders, were evaluated. An analysis using the continuous wavelet transform (CWT) and a convolutional autoencoder (CNN) performed best, with an average MCC of 0.19 and 56.9% sensitivity at approximately 13.9 false positives per minute. EEG electroencephalography IED interictal epileptiform discharges spike detection epilepsy unsupervised Fourier transform STFT short-time Fourier transform CWT continuous wavelet transform DWT discrete wavelet transform ML machine learning ANN artificial neural network CNN convolutional neural network autoencoder HMM hidden Markov model ECS Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Neurology Neurologi
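The exploratory analysis optimized the average Matthews correlation coefficient per electrode per patient. The sketch below computes that metric for one patient. It assumes window-level binary labels (1 = IED/abnormal) for each electrode. Averaging the per-patient results would give the per-patient mean. Returning 0 when a marginal count is zero is a common convention, not necessarily the one used in the thesis. Names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = IED/abnormal, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def sensitivity(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn, fp):
    return tn / (tn + fp) if tn + fp else 0.0


def mean_mcc_per_electrode(per_electrode):
    """Average MCC over electrodes for one patient; maps name -> (y_true, y_pred)."""
    scores = [mcc(*confusion_counts(t, p)) for t, p in per_electrode.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    patient = {"Fp1": ([1, 0, 0, 1], [1, 0, 1, 1]), "T3": ([0, 0, 1, 0], [0, 0, 1, 0])}
    print(round(mean_mcc_per_electrode(patient), 3))
```

Unlike accuracy, MCC stays near 0 for a detector that labels almost every window as normal. This matters when IEDs occupy only a small fraction of a recording.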
478	Through the Blur with Deep Learning : A Comparative Study Assessing Robustness in Visual Odometry Techniques Berglund, Alexander January 2023 In this thesis, the robustness of deep learning techniques in the field of visual odometry is investigated, with a specific focus on the impact of motion blur. A comparative study is conducted, evaluating the performance of state-of-the-art deep convolutional neural network methods, namely DF-VO and DytanVO, against ORB-SLAM3, a well-established non-deep-learning technique for visual simultaneous localization and mapping. The objective is to quantitatively assess the performance of these models as a function of motion blur. The evaluation is carried out on a custom synthetic dataset that simulates a camera navigating through a forest environment. The dataset includes trajectories with varying degrees of motion blur caused by camera translation and, optionally, pitch and yaw rotational noise. The results demonstrate that the deep learning-based methods maintained robust performance despite the challenging conditions in the test data, while excessive blur led to tracking failures in the geometric model. This suggests that the ability of deep neural network architectures to automatically learn hierarchical feature representations and capture complex, abstract features may make deep learning-based visual odometry more robust in challenging conditions than its geometric counterparts. artificial intelligence AI machine learning ML deep learning DL computer vision neural networks NN convolutional neural networks CNN visual odometry VO robustness motion blur AirForestry localization navigation ego-motion pose estimation SLAM DF-VO DytanVO ORB-SLAM3 artificiell intelligens maskininlärning datorseende Computer Sciences Datavetenskap (datalogi)
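Motion blur from camera translation during exposure is often approximated by averaging the image along the direction of motion. The blur length in pixels grows with the image-plane speed and the exposure time. The sketch below applies such a horizontal box kernel to a grayscale image stored as nested lists. It illustrates that assumption only and is not the rendering pipeline used for the synthetic forest dataset. The function name is hypothetical.

```python
def motion_blur_horizontal(image, length):
    """Approximate linear motion blur along x by averaging `length` consecutive pixels.

    image: list of rows (lists of numbers). Borders are handled by clamping indices,
    and the window is centred on each pixel when `length` is odd.
    """
    if length < 1:
        raise ValueError("length must be >= 1")
    half = length // 2
    out = []
    for row in image:
        w = len(row)
        blurred = []
        for x in range(w):
            idx = [min(max(x + k - half, 0), w - 1) for k in range(length)]
            blurred.append(sum(row[i] for i in idx) / length)
        out.append(blurred)
    return out


if __name__ == "__main__":
    impulse = [[0, 0, 9, 0, 0]]  # a single bright pixel, e.g. a thin branch edge
    for L in (1, 3, 5):          # increasing blur lengths
        print(L, motion_blur_horizontal(impulse, L))
```

Sweeping the kernel length gives a controlled blur axis, similar in spirit to evaluating odometry "as a function of motion blur". The rotational noise mentioned in the abstract would instead need spatially varying kernels.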
479	From Pixels to Predators: Wildlife Monitoring with Machine Learning / Från Pixlar till Rovdjur: Viltövervakning med Maskininlärning Eriksson, Max January 2024 This master's thesis investigates the application of advanced machine learning models for the identification and classification of Swedish predators in camera trap images. With growing threats to biodiversity, there is an urgent need for innovative and non-intrusive monitoring techniques. This study focuses on the development and evaluation of object detection models, including YOLOv5, YOLOv8, YOLOv9, and Faster R-CNN, with the aim of improving the monitoring of Swedish predatory species such as bears, wolves, lynxes, foxes, and wolverines. The research uses a dataset from the NINA database and applies data preprocessing and augmentation techniques to ensure robust model training. The models were trained and evaluated using various dataset sizes and conditions, including day and night images. Notably, YOLOv8 and YOLOv9 underwent extended training for 300 epochs, leading to significant improvements in performance metrics. The models were evaluated using metrics such as mean Average Precision (mAP), precision, recall, and F1-score. YOLOv9, with its Programmable Gradient Information (PGI) and GELAN architecture, demonstrated superior accuracy and reliability, achieving an F1-score of 0.98 on the expanded dataset. Training models jointly on day and night images, rather than separately, resulted in only minor differences in performance. However, models trained exclusively on daytime images performed slightly better due to more consistent and favorable lighting conditions. The study also revealed a positive correlation between the size of the training dataset and model performance, with larger datasets yielding better results across all metrics.
However, the marginal gains decreased as the dataset size increased, suggesting diminishing returns. Among the species studied, foxes were the least challenging for the models to detect and identify, while wolves presented more significant challenges, likely because their complex fur patterns and coloration blend with the background. Machine Learning Project Ngulia YOLO YOLOv9 YOLOv8 YOLOv5 Faster R-CNN Wildlife Monitoring Deep Learning Camera Traps Object Detection Image Processing Animal Detection Neural Networks Transfer Learning Maskininlärning Viltövervakning Djupinlärning Kamerafällor Objektdetektion Neurala Nätverk Media and Communication Technology Medieteknik
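Detection metrics such as precision, recall, and F1 depend on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below makes three assumptions: axis-aligned boxes (x1, y1, x2, y2), an IoU threshold of 0.5, and greedy same-class matching in order of confidence. The mAP reported in the thesis additionally sweeps the confidence threshold and is not computed here. Names and example labels are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(preds, gts, thr=0.5):
    """Greedily match predictions (highest confidence first) to same-class ground truths.

    preds: list of (box, label, score); gts: list of (box, label). Returns (tp, fp, fn).
    """
    used = set()
    tp = 0
    for box, label, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best, best_iou = None, thr
        for j, (gbox, glabel) in enumerate(gts):
            if j in used or glabel != label:
                continue
            overlap = iou(box, gbox)
            if overlap >= best_iou:
                best, best_iou = j, overlap
        if best is not None:
            used.add(best)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gts = [((0, 0, 10, 10), "fox"), ((20, 20, 30, 30), "wolf")]
    preds = [((1, 1, 10, 10), "fox", 0.9), ((20, 20, 30, 30), "lynx", 0.8)]
    print(precision_recall_f1(*match_detections(preds, gts)))
```

A correctly placed box with the wrong species counts as both a false positive and a missed ground truth. This is one way species confusions, such as the reported difficulty with wolves, show up in these metrics.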
480	Investigación y desarrollo de metodología avanzada de segmentación de la médula espinal cervical a partir de imágenes RM para la ayuda al diagnóstico y seguimiento de pacientes de esclerosis múltiple [Research and development of an advanced methodology for cervical spinal cord segmentation from MR images to aid the diagnosis and follow-up of multiple sclerosis patients] Bueno Gómez, América 01 July 2024 [ES] Multiple Sclerosis (MS) is an inflammatory and autoimmune disease of the central nervous system (CNS) with features of demyelination and axonal degeneration over time, characterized by great heterogeneity in symptoms and disease course. Magnetic Resonance Imaging (MRI) is one of the most sensitive clinical tools for evaluating inflammatory and neurodegenerative processes. In recent years, evaluation of the spinal cord has attracted growing clinical interest as a way to improve the diagnosis and phenotyping of the disease. However, unlike for the brain, no artificial intelligence (AI) algorithms have been developed and certified for clinical practice for the cervical spinal cord. Our objective is therefore to research and develop an automatic method for cervical cord segmentation in MRI, thus enabling an automatic and improved assessment of spinal cord atrophy, which can provide valuable information on disease progression and its clinical consequences. The algorithm was developed using real-world data collected retrospectively from 121 MS patients. Of these, 96 were used for model training, 25 for testing, and 13 for model validation. The thesis worked with 3D axial T1-weighted MRI sequences acquired on a 3T scanner (SignaHD, GEHC), given their better resolution and contrast for identifying small anatomical structures such as the spinal cord. Manual labeling of the data was carried out under the advice and supervision of two experienced radiologists, finally yielding the ground truth.
Several architectures, hyperparameters, and preprocessing approaches were applied to the dataset in search of the optimal solution. Given its well-known importance in medical image segmentation, the U-Net architecture was the starting point. After a lack of good results and further research in the field, the problem of data imbalance was identified. Finally, to obtain the desired segmentation, a 2D convolutional neural network composed of a residual attention mechanism and connections based on the U-Net architecture was implemented and trained. The attention mechanism allowed the model to focus on the image locations that are important for classifying the voxels belonging to the cervical cord, while retaining information about the remaining anatomical structures. The residual blocks allowed us to overcome the vanishing-gradient problems common in deep neural networks. Training was designed with a local loss function based on the Tversky index, in order to control the data imbalance typical of medical images, together with an automatic optimal learning-rate finder that improved the convergence and performance of the model. Finally, our method provided segmentation with a high success rate, obtaining an MCC of 0.95 as the training metric and a Dice coefficient of 0.904±0.101 in validation, with manual segmentation as the reference. In addition to a tool for automatic segmentation of the cord, we also created a module for calculating its dimensions as an imaging biomarker, which will be useful and effective for assessing atrophy. In this way, clinicians can evaluate the degree of neurological damage and follow its evolution over time.
As imaging biomarkers, we calculated the dimensions of our patients' spinal cords as volume (mm³) and mean cross-sectional area (mm²). We then studied the relationship between the mean cross-sectional area of the cervical spinal cord and both the distribution of the different clinical forms and the patients' Kurtzke Expanded Disability Status Scale (EDSS) levels. / [CA] Multiple Sclerosis (MS) is an inflammatory and autoimmune disease of the central nervous system (CNS) with features of demyelination and axonal degeneration over time. It is characterized by great heterogeneity in symptoms and disease course. Magnetic Resonance Imaging (MRI) is one of the most sensitive tools for evaluating inflammatory and neurodegenerative processes. In recent years, evaluation of the spinal cord has attracted growing clinical interest as a way to improve the diagnosis and phenotyping of the disease. However, unlike for the brain, no artificial intelligence (AI) algorithms have been developed and certified for the cervical spinal cord. This motivates the present study, which focuses on researching and developing an automatic method for cervical cord segmentation in MRI. Automating and improving the assessment of spinal cord atrophy could provide valuable information on disease progression and its clinical consequences. The algorithm proposed in this work was developed using real-world data collected retrospectively from 121 MS patients. Of these samples, 96 were used to train the AI model, 13 for validation during training, and the remaining 25 as the evaluation set. The MRI sequences used were 3D axial T1-weighted images acquired on a 3T scanner, given their better resolution and contrast for identifying small anatomical structures such as the spinal cord.
Data labeling was performed under the supervision and advice of two experienced radiologists. The end result was a reference MRI dataset (ground truth dataset) with the corresponding cervical spinal cord segmentation masks defined by the radiologists. Several architectures, hyperparameters, and preprocessing techniques were applied to the dataset in search of the optimal solution. Given its well-known importance in medical image segmentation, the U-Net architecture was the starting point. Another turning point was solving the problem of unequal class representation in the dataset (dataset imbalance). Finally, to obtain the desired segmentation, a 2D convolutional neural network composed of a residual attention mechanism and connections based on the U-Net architecture was implemented and trained. The attention mechanism allowed the model to focus on the image locations most important for classifying the voxels corresponding to the cervical cord, while retaining information about the remaining anatomical structures. At the same time, the residual blocks made it possible to overcome the vanishing-gradient problems common when training deep neural networks. Training was designed with a local cost function based on the Tversky index, in order to control the dataset imbalance, together with an automatic optimal learning-rate finder that enabled better convergence and model performance. Our automatic segmentation method achieved a high success rate, with a Matthews correlation coefficient of 0.95 as the training metric and a Dice coefficient of 0.904±0.101 in validation, with manual segmentation as the reference.
In addition to the automatic segmentation tool, we also developed a module for calculating the cord's dimensions, which will be useful for an effective assessment of atrophy. As imaging biomarkers, we calculated the dimensions of our patients' spinal cords in terms of volume (mm³) and mean cross-sectional area (mm²), and studied the relationship between the mean cross-sectional area of the cervical spinal cord and the distribution of the different clinical forms and the Kurtzke Expanded Disability Status Scale. / [EN] Multiple sclerosis (MS) is an inflammatory and autoimmune disease of the central nervous system (CNS), marked by demyelination and axonal degeneration over time and highly heterogeneous in symptoms, disease course and outcome. Magnetic resonance imaging (MRI) is one of the most sensitive clinical tools for evaluating inflammatory and neurodegenerative processes. In recent years, evaluation of the spinal cord has attracted increasing clinical interest as a way to improve the diagnosis and phenotyping of the disease; however, unlike for the brain, no artificial intelligence (AI) algorithms have been developed and certified for clinical practice in the cervical spinal cord. Our aim was therefore to investigate and develop an automatic method for cervical cord segmentation in MRI, enabling an automatic and improved assessment of spinal cord atrophy, which can provide valuable information on disease progression and its clinical consequences. The algorithm was developed using real-world data collected retrospectively from 121 MS patients. Of these, 96 were used for model training, 25 for testing and 13 for validation of the proposed model. Throughout the thesis, 3D axial T1-weighted MRI sequences acquired on a 3T scanner (SignaHD, GEHC) were used, given their better resolution and contrast for identifying small anatomical structures such as the spinal cord.
Manual labelling of the data was performed under the advice and supervision of two experienced radiologists, with any discrepancies between them resolved by a third radiologist, resulting in a set of ground-truth cervical spinal cord masks. Several architectures, hyperparameters and forms of preprocessing were applied to the dataset in search of the optimal solution. Given its known importance in medical image segmentation, the U-Net architecture was the starting point. After initial results proved unsatisfactory and further research in the field, data imbalance was identified as the underlying problem. To obtain the desired segmentation, a 2D convolutional neural network (CNN) combining a residual attention mechanism with U-Net-style connections was finally implemented and trained. The attention mechanism allowed the model to focus on the image locations that are important for classifying voxels as cervical cord, while retaining information about the remaining anatomical structures. Residual blocks allowed us to mitigate the vanishing-gradient problem common in deep neural networks. Training was designed with a local loss function based on the Tversky index, to control the class imbalance typical of medical image data, and an automatic optimal learning-rate finder, which improved the convergence and performance of the model. Our method achieved a high segmentation accuracy, with a Matthews correlation coefficient (MCC) of 0.95 as the training metric and a Dice coefficient of 0.904±0.101 in validation, using manual segmentation as the reference. In addition to the tool for automatic segmentation of the spinal cord, we also created a module for calculating its dimensions, which will be useful and effective for assessing atrophy. Atrophy is a direct indicator of neuronal damage and tissue loss in both the brain and spinal cord, and is a key risk factor for disability in MS.
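The two figures reported above measure different things: Dice depends only on true positives, false positives and false negatives, whereas the Matthews correlation coefficient also uses true negatives, which dominate in an imbalanced segmentation volume. A minimal Python sketch of both metrics on flattened binary masks follows; the function names and the conventions for degenerate cases (Dice = 1 when both masks are empty, MCC = 0 when its denominator vanishes) are assumptions, not taken from the thesis.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for two flattened binary masks (1 = cord voxel)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, g in zip(pred, target) if p and g)
    fp = sum(1 for p, g in zip(pred, target) if p and not g)
    fn = sum(1 for p, g in zip(pred, target) if not p and g)
    tn = len(pred) - tp - fp - fn
    return tp, fp, fn, tn


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2TP / (2TP + FP + FN); true negatives are ignored."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # assumed: empty vs empty = 1


def mcc(pred: Sequence[int], target: Sequence[int]) -> float:
    """Matthews correlation coefficient; uses all four confusion counts."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom  # assumed convention
```

With `pred = [1, 1, 0, 0, 1, 0]` and `target = [1, 0, 0, 0, 1, 1]` there are 2 TP, 1 FP, 1 FN and 2 TN, giving Dice = 4/6 ≈ 0.667 and MCC = 3/9 ≈ 0.333.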
By accurately quantifying atrophy, clinicians can assess the degree of neurological damage and follow its evolution over time. In our study, we calculated the dimensions of our patients' spinal cords, as possible imaging biomarkers, in terms of volume (mm³) and mean cross-sectional area (mm²), and studied the relationship between the mean cross-sectional area of the cervical spinal cord and the distribution of the different clinical forms and the Kurtzke Expanded Disability Status Scale (EDSS) levels in our study group. / Bueno Gómez, A. (2024). Investigación y desarrollo de metodología avanzada de segmentación de la médula espinal cervical a partir de imágenes RM para la ayuda al diagnóstico y seguimiento de pacientes de esclerosis múltiple [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/205742 Multiple sclerosis; Esclerosis múltiple; Segmentation; Segmentación; MRI; IRM; Deep Learning; Aprendizaje profundo; Residual attention-aware; Biomarcadores de imagen; Image biomarkers; Inteligencia Artificial; Artificial Intelligence; Convolutional Neural Network (CNN); TEORÍA DE LA SEÑAL Y COMUNICACIONES
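The volume and mean cross-sectional area biomarkers described in the abstract can be computed from a binary segmentation mask and the voxel spacing. The sketch below is a plain-Python illustration under stated assumptions: the mask is indexed [slice][row][col] with axial slices, spacing is given in millimetres, the mean area is averaged over slices that contain cord, and no correction is made for slices oblique to the cord. The function name and the averaging rule are hypothetical; the abstract does not describe how the thesis module computes these values.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical layout: mask[slice][row][col], 1 = cord voxel, axial slices.
Mask3D = List[List[List[int]]]


def cord_dimensions(mask: Mask3D, spacing_mm: Tuple[float, float, float]) -> Tuple[float, float]:
    """Return (volume in mm^3, mean cross-sectional area in mm^2).

    spacing_mm is (row, col, slice) voxel size. Slices without cord are
    skipped when averaging, and oblique-slice correction is ignored.
    """
    dr, dc, ds = spacing_mm
    pixel_area = dr * dc  # in-plane area of one voxel, mm^2
    areas = []
    for axial_slice in mask:
        n_voxels = sum(sum(row) for row in axial_slice)
        if n_voxels:
            areas.append(n_voxels * pixel_area)
    volume = sum(areas) * ds  # stack slice areas along the slice axis
    mean_csa = mean(areas) if areas else 0.0
    return volume, mean_csa
```

For a three-slice mask with 4, 2 and 0 cord voxels at 0.5 × 0.5 × 1.0 mm spacing, this returns a volume of 1.5 mm³ and a mean area of 0.75 mm². Under this averaging rule, the mean area equals the volume divided by the cord length covered (number of cord slices × slice thickness).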
