Global ETD Search

71	On Non-Convex Splitting Methods For Markovian Information Theoretic Representation Learning Teng Hui Huang (12463926) 27 April 2022 In this work, we study a class of Markovian information theoretic optimization problems motivated by recent interest in using mutual information as a performance metric, which has proven successful in representation learning, feature extraction and clustering problems. In particular, we focus on the information bottleneck (IB) and privacy funnel (PF) methods and their recent multi-view, multi-source generalizations, which have gained attention because performance improves significantly with multi-view, multi-source data. Nonetheless, the generalized problems challenge existing IB and PF solvers in terms of complexity and their ability to handle large-scale data.
To address this, we study both the IB and PF under a unified framework and propose solving it through splitting methods, which include renowned algorithms such as the alternating direction method of multipliers (ADMM), Peaceman-Rachford splitting (PRS) and Douglas-Rachford splitting (DRS) as special cases. Our convergence analysis and locally linear rate of convergence results give rise to new splitting-method-based IB and PF solvers that can be easily generalized to multi-view IB and multi-source PF. We implement the proposed methods with gradient descent and empirically evaluate the new solvers on both synthetic and real-world datasets. Our numerical results demonstrate improved performance over the state-of-the-art approach with a significant reduction in complexity. Furthermore, we consider the practical scenario where there is a distribution mismatch between the training and testing data generating processes under a known bounded divergence constraint. In analyzing the generalization error, we develop new techniques inspired by the input-output mutual information approach and tighten the existing generalization error bounds. Optimisation Coding and Information Theory Information Engineering and Theory Rate distortion optimization non-convex optimization information bottleneck theory multi-view data representation learning generalization errors splitting methods ADMM algorithm
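As a reading aid for the abstract above, the following is a minimal sketch of the quantity an information bottleneck solver works with, assuming the common Lagrangian form I(X;Z) - beta * I(Z;Y) over discrete variables with the Markov chain Y - X - Z. It only evaluates the objective for a given encoder; it is not the splitting-method solver proposed in the thesis, and the function names, the toy distribution and the value of beta are illustrative assumptions.

```python
import math

def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as a nested list joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (pa[a] * pb[b]))
    return mi

def ib_objective(p_xy, enc, beta):
    """Assumed IB Lagrangian I(X;Z) - beta * I(Z;Y) for an encoder enc[x][z] = p(z|x).

    The Markov chain Y - X - Z gives p(z, y) = sum_x p(z|x) p(x, y).
    """
    n_x, n_y, n_z = len(p_xy), len(p_xy[0]), len(enc[0])
    p_x = [sum(row) for row in p_xy]
    p_xz = [[p_x[x] * enc[x][z] for z in range(n_z)] for x in range(n_x)]
    p_zy = [[sum(enc[x][z] * p_xy[x][y] for x in range(n_x)) for y in range(n_y)]
            for z in range(n_z)]
    return mutual_information(p_xz) - beta * mutual_information(p_zy)

# Hypothetical toy problem: X and Y are correlated binary variables.
p_xy = [[0.4, 0.1], [0.1, 0.4]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # Z copies X: no compression
constant = [[1.0, 0.0], [1.0, 0.0]]  # Z ignores X: full compression

print(ib_objective(p_xy, identity, beta=2.0))
print(ib_objective(p_xy, constant, beta=2.0))
```

The two encoders mark the extremes of the trade-off: copying X keeps all information relevant to Y but compresses nothing, while a constant Z compresses fully and retains nothing.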
72	SfM-3DULC: Desarrollo y validación de un procedimiento fotogramétrico para el escaneo, medición, clasificación tisular y seguimiento clínico de úlceras cutáneas / SfM-3DULC: Development and validation of a photogrammetric procedure for the scanning, measurement, tissue classification and clinical follow-up of skin ulcers Sánchez Jiménez, David 21 March 2022 Photogrammetry is a science and technology of increasing medical utility. A notable medical application of photogrammetry is the measurement of skin ulcers. Skin ulcers are a major medical and social problem because of their high economic cost, impact on health and quality of life, frequent chronicity and complications. Ulcer measurement is necessary and useful for clinical follow-up. A decrease in ulcer size variables indicates progression towards healing. Traditional one- and two-dimensional measurement procedures, such as the graduated ruler and acetate planimetry, are still used because of their simplicity and ease of use. However, they are invasive and have technical drawbacks, such as inaccuracy and imprecision. Other three-dimensional (3D) measurement procedures, such as liquid injection and paste moulds, may also have adverse effects, such as pain, irritation or allergic reaction. Some non-contact procedures that use structured light or laser scanning techniques: 1/ require specific scanning devices; 2/ have not been demonstrated to be useful in clinical practice; 3/ are expensive. Moreover, there is no reference procedure (gold standard) for the measurement of skin ulcer volume. Optimisation of the techniques used for the objective assessment of the evolution of skin ulcers would help to compare the efficacy of different treatments and to select the most appropriate ones, as well as to predict healing time.
Therefore, the development of an ulcer measurement procedure based on a non-contact photogrammetric technique, such as stereophotogrammetry, is justified. The main objective of this thesis is to develop a photogrammetric procedure for the scanning, measurement, tissue classification and clinical follow-up of skin ulcers, and to validate this procedure in a clinical study with patients, evaluating its reliability and accuracy. The SfM-3DULC procedure is based on the stereophotogrammetric techniques SfM (Structure from Motion) and MVS (Multi View Stereo) and uses Agisoft PhotoScan as scanning software and the 3DULC program, created by the authors, as 3D model measurement software. This procedure scans and reconstructs a 3D digital model of the ulcer using a digital camera, with which a series of photographs is acquired from various locations and orientations. To validate the SfM-3DULC procedure, a pilot study was conducted to assess its reliability and accuracy. A new variant of the ImageJ procedure was also proposed, in which an orthophotograph (Ortho-ImageJ) is used to measure the projected area. Finally, measurements made by a group of dermatologists and a group of non-experts were compared. All the variables measured by dermatologists using SfM-3DULC showed excellent intra-rater reliability (ICC > 0.99) and inter-rater reliability (ICC > 0.98). In conclusion, the 3DULC software developed, in its version 1.0: 1/ Operates in the measurement phase, after the skin ulcer has been scanned. 2/ Is independent of the scanning procedure, and could be used with any other technique that obtains a point cloud of the skin ulcer. 3/ Outlines the edge of the ulcer semi-automatically, based on its spectral response. 4/ Classifies skin ulcer areas according to their tissue type, using a decision tree. 5/ Measures the following morphometric variables of the skin ulcer: circularity coefficient, evenness coefficient, maximum length, perimeter, maximum depth, projected area, excavated surface area, reference surface area and volume. 6/ Presents the results in an HTML report that facilitates their interpretation by healthcare personnel. / This doctoral thesis was funded by a predoctoral grant from the Generalitat Valenciana – Consellería de Educación, Investigación, Cultura y Deporte, and the European Social Fund (ACIF/2018/160). / Sánchez Jiménez, D. (2022). SfM-3DULC: Desarrollo y validación de un procedimiento fotogramétrico para el escaneo, medición, clasificación tisular y seguimiento clínico de úlceras cutáneas [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181691 / Thesis Skin ulcers Skin ulcer measurement SfM-3DULC Stereophotogrammetry Dermatology Photogrammetry 3DULC Structure from Motion (SfM) Multi View Stereo (MVS) Wound measurement
73	Multi-view Geometric Constraints For Human Action Recognition And Tracking Gritai, Alexei 01 January 2007 Human actions are the essence of human life and a natural product of the human mind. Analysis of human activities by a machine has attracted the attention of many researchers. This analysis is very important in a variety of domains including surveillance, video retrieval, human-computer interaction, athlete performance investigation, etc. This dissertation makes three major contributions to automatic analysis of human actions. First, we conjecture that the relationship between body joints of two actors in the same posture can be described by a 3D rigid transformation. This transformation simultaneously captures different poses and various sizes and proportions. As a consequence of this conjecture, we show that there exists a fundamental matrix between the imaged positions of the body joints of two actors, if they are in the same posture. Second, we propose a novel projection model for cameras moving at a constant velocity in 3D space, Galilean cameras, derive the Galilean fundamental matrix and apply it to human action recognition. Third, we propose a novel use of the invariant ratio of areas under an affine transformation, together with the epipolar geometry between two cameras, for 2D model-based tracking of human body joints. In the first part of the thesis, we propose an approach to match human actions using semantic correspondences between human bodies. These correspondences are used to provide geometric constraints between multiple anatomical landmarks (e.g. hands, shoulders, and feet) to match actions observed from different viewpoints and performed at different rates by actors of differing anthropometric proportions. 
The fact that human bodies share approximate anthropometric proportions allows for innovative use of the machinery of epipolar geometry to provide constraints for analyzing actions performed by people of different anthropometric sizes, while ensuring that changes in viewpoint do not affect matching. A novel measure in terms of the rank of a matrix constructed only from image measurements of the locations of anatomical landmarks is proposed to ensure that similar actions are accurately recognized. Finally, we describe how dynamic time warping can be used in conjunction with the proposed measure to match actions in the presence of nonlinear time warps. We demonstrate the versatility of our algorithm in a number of challenging sequences and applications including action synchronization, odd one out, following the leader, analyzing periodicity, etc. Next, we extend the conventional model of image projection to video captured by a camera moving at constant velocity. We term such a moving camera a Galilean camera. To that end, we derive the spacetime projection and develop the corresponding epipolar geometry between two Galilean cameras. Both perspective imaging and linear pushbroom imaging form specializations of the proposed model, and we show how six different "fundamental" matrices, including the classic fundamental matrix, the Linear Pushbroom (LP) fundamental matrix, and a fundamental matrix relating Epipolar Plane Images (EPIs), are related and can be directly recovered from a Galilean fundamental matrix. We provide linear algorithms for estimating the parameters of the mapping between videos in the case of planar scenes. To apply the fundamental matrix between Galilean cameras to human action recognition, we propose a measure that has two important properties. The first property makes it possible to recognize similar actions if their execution rates are linearly related. The second property allows recognizing actions in video captured by Galilean cameras. 
Thus, the proposed algorithm guarantees that actions can be correctly matched despite changes in view, execution rate, and anthropometric proportions of the actor, even if the camera moves with constant velocity. Finally, we also propose a novel 2D model-based approach for tracking human body parts during articulated motion. The human body is modeled as a 2D stick figure of thirteen body joints and an action is considered as a sequence of these stick figures. Given the locations of these joints in every frame of a model video and the first frame of a test video, the joint locations are automatically estimated throughout the test video using two geometric constraints. First, invariance of the ratio of areas under an affine transformation is used for initial estimation of the joint locations in the test video. Second, the epipolar geometry between the two cameras is used to refine these estimates. Using these estimated joint locations, the tracking algorithm determines the exact location of each landmark in the test video using the foreground silhouettes. The novelty of the proposed approach lies in the geometric formulation of human action models, the combination of the two geometric constraints for body joint prediction, and the handling of deviations in anthropometry of individuals, viewpoints, execution rate, and style of performing an action. The proposed approach does not require extensive training and can easily adapt to a wide variety of articulated actions. Human Action Recognition Human Joint Tracking 2D Human Model Based Tracking View Invariance in Action Recognition Multi-view Geometry in Action Recognition Computer Sciences Engineering
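The first tracking constraint in the abstract above rests on a standard geometric fact: an affine map scales every area by the same factor (the absolute determinant of its linear part), so ratios of areas are preserved. The sketch below checks this numerically on triangles formed from four points. The point coordinates, the matrix `A` and the offset `t` are made-up values, and this is not the dissertation's tracking algorithm.

```python
def triangle_area(p, q, r):
    """Unsigned area of triangle pqr from the 2D cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) / 2.0

def affine(pt, a, t):
    """Apply x -> A x + t, with A a 2x2 nested list and t a 2-vector."""
    x, y = pt
    return (a[0][0] * x + a[0][1] * y + t[0],
            a[1][0] * x + a[1][1] * y + t[1])

# Hypothetical "joint" positions and an arbitrary invertible affine map.
joints = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
A, t = [[2.0, 1.0], [0.5, 3.0]], (5.0, -2.0)
moved = [affine(p, A, t) for p in joints]

# Ratio of two triangle areas before and after the map.
ratio_before = triangle_area(*joints[:3]) / triangle_area(*joints[1:])
ratio_after = triangle_area(*moved[:3]) / triangle_area(*moved[1:])
print(ratio_before, ratio_after)
```

Each individual area changes under the map, but both triangles are scaled by the same factor, which is why the ratio can serve as a predictor of joint positions across views that are related approximately affinely.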
74	Low-Rank and Sparse Decomposition for Hyperspectral Image Enhancement and Clustering Tian, Long 03 May 2019 In this dissertation, new algorithms are developed to enhance hyperspectral image (HSI) analysis. A tensor data format is applied to the sparse and low-rank decomposition of hyperspectral datasets, which can enhance classification and detection performance. A multi-view learning technique is applied to hyperspectral image clustering. Furthermore, a kernel version of the multi-view learning technique is proposed, which can improve clustering performance. Most low-rank and sparse decomposition algorithms for HSI analysis are based on a matrix data format. As HSI contains high spectral dimensions, tensor based extended low-rank and sparse decomposition (TELRSD) is proposed in this dissertation for better HSI classification with the low-rank tensor part and better HSI detection with the sparse tensor part. With this tensor-based method, HSI is processed in a 3D data format, and the information between spectral bands and pixels remains intact during the decomposition process. The proposed algorithm is compared with other state-of-the-art methods, and the experimental results show that TELRSD has the best performance among the compared algorithms. HSI clustering is an unsupervised task, which aims to group pixels into different groups without labeled information. Low-rank sparse subspace clustering (LRSSC) is the most popular algorithm for this clustering task. The spatial-spectral based multi-view low-rank sparse subspace clustering (SSMLC) algorithm is proposed in this dissertation, which extends LRSSC with a multi-view learning technique. In this algorithm, spectral and spatial views are created to generate a multi-view dataset of HSI, where spectral partition, morphological component analysis (MCA) and principal component analysis (PCA) are applied to create other views. 
Furthermore, a kernel version of SSMLC (k-SSMLC) has also been investigated. The performance of SSMLC and k-SSMLC is compared with sparse subspace clustering (SSC), low-rank sparse subspace clustering (LRSSC), and spectral-spatial sparse subspace clustering (S4C). The results show that SSMLC improves on the performance of LRSSC, and that k-SSMLC has the best performance. Spectral clustering has been proven equivalent to a non-negative matrix factorization (NMF) problem, so NMF can be applied to the clustering problem. In order to include local and nonlinear features of the data source, orthogonal NMF (ONMF), graph-regularized NMF (GNMF) and kernel NMF (k-NMF) have been proposed for better clustering performance. The non-linear orthogonal graph NMF (k-OGNMF) combines kernel, orthogonal and graph constraints in NMF, which further improves clustering performance. In the HSI domain, kernel multi-view based orthogonal graph NMF (k-MOGNMF) is applied for subspace clustering, where k-OGNMF is extended with a multi-view algorithm, and it has better performance and computational efficiency. Multi-view algorithm Low-rank sparse subspace clustering Non-negative Matrix Factorization Clustering Anomaly Detection Classification Low-rank and sparse decomposition Hyperspectral Image
75	Effect of Enhancement on Convolutional Neural Network Based Multi-view Object Classification Xie, Zhiyuan 29 May 2018 No description available. Electrical Engineering
76	A platform for multi-video learning content in emergency-related educational scenarios Lozano-Prieto, David January 2021 Utilizing multiple videos is an emerging approach for developing learning material. It consists of recording scenes from different perspectives using diverse recording approaches, for example, a 360-degree camera, a drone camera, and body cameras. Until now, there has been a lack of efficient ways to present such recordings and extract the benefits of applying this type of media in learning contexts. To close this gap, this thesis explores suitable ways of presenting this specific type of media, aiming to be helpful for the further training of emergency-related learners. To achieve this goal, we performed a study structured in three major blocks: design of the solution, development of the designed system, and assessment of the suitability of the presented solution. The design was informed by a literature review, a qualitative expert interview, and a preferences questionnaire. After the design process, the system, named Theia, was developed using web-based technologies. Finally, to validate the system's suitability within the context of this project, an expert evaluation was carried out. It consisted of a mixed assessment combining qualitative methods, based on task performance and a qualitative interview, with a Technology Acceptance Model (TAM) questionnaire aimed at assessing the usability and ease of use of the developed tool. After the evaluation, the proposed system was concluded to incorporate a suitable layout, navigation, functionalities, and interactive mechanisms for an adequate video presentation of media footage from simultaneous recordings within an educational context for emergency-related students. 
Additionally, valuable insights for the future of the research area were extracted from the analysis of the results, including recommendations for optimal footage recording and a starting point for future work in the research community. Multi-view video player multiple videos web-based application emergency scenario education enhanced video experience interactive drone bodycam 360-degrees Electrical Engineering and Electronics
77	Canonical Correlation and Clustering for High Dimensional Data Ouyang, Qing January 2019 Multi-view datasets arise naturally in statistical genetics when the genetic and trait profile of an individual is portrayed by two feature vectors. A motivating problem concerning the Skin Intrinsic Fluorescence (SIF) study on the Diabetes Control and Complications Trial (DCCT) subjects is presented. A widely applied quantitative method to explore the correlation structure between two domains of a multi-view dataset is Canonical Correlation Analysis (CCA), which seeks the canonical loading vectors such that the transformed canonical covariates are maximally correlated. In the high dimensional case, regularization of the dataset is required before CCA can be applied. Furthermore, the nature of genetic research suggests that sparse output is more desirable. In this thesis, two regularized CCA (rCCA) methods and a sparse CCA (sCCA) method are presented. When correlation sub-structure exists, a stand-alone CCA method will not perform well. To tackle this limitation, a mixture of local CCA models can be employed. In this thesis, I review a correlation clustering algorithm proposed by Fern, Brodley and Friedl (2005), which seeks to group subjects into clusters such that features are identically correlated within each cluster. An evaluation study is performed to assess the effectiveness of CCA and correlation clustering algorithms using artificial multi-view datasets. Both sCCA and sCCA-based correlation clustering exhibited superior performance compared to rCCA and rCCA-based correlation clustering. The sCCA and the sCCA-based clustering are applied to the multi-view dataset consisting of PrediXcan imputed gene expression and SIF measurements of DCCT subjects. The stand-alone sparse CCA method identified 193 of 11538 genes as being correlated with SIF#7. 
Further investigation of these 193 genes with simple linear regression and t-tests revealed that only two genes, ENSG00000100281.9 and ENSG00000112787.8, were significantly associated with SIF#7. No plausible clustering scheme was detected by the sCCA-based correlation clustering method. / Thesis / Master of Science (MSc) Machine Learning Correlation Clustering Sparse Canonical Correlation Analysis Skin Intrinsic Fluorescence Multi-view dataset Lasso Dimensionality reduction PrediXcan High dimensional data
78	Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval Rohan Sarkar (19065215) 11 July 2024 Object Recognition and Retrieval is a fundamental problem in Computer Vision that involves recognizing objects and retrieving similar object images through visual queries. While deep metric learning is commonly employed to learn image embeddings for solving such problems, the representations learned using existing methods are not robust to changes in viewpoint, pose, and object state, especially for fine-grained recognition and retrieval tasks. To overcome these limitations, this dissertation aims to learn robust object representations that remain invariant to such transformations for fine-grained tasks. First, it focuses on learning dual pose-invariant embeddings to facilitate recognition and retrieval at both the category and finer object-identity levels by learning category and object-identity specific representations in separate embedding spaces simultaneously. For this, the PiRO framework is introduced that utilizes an attention-based dual encoder architecture and novel pose-invariant ranking losses for each embedding space to disentangle the category and object representations while learning pose-invariant features. Second, the dissertation introduces ranking losses that cluster multi-view images of an object together in both the embedding spaces while simultaneously pulling the embeddings of two objects from the same category closer in the category embedding space to learn fundamental category-specific attributes and pushing them apart in the object embedding space to learn discriminative features to distinguish between them. Third, the dissertation addresses state-invariance and introduces a novel ObjectsWithStateChange dataset to facilitate research in recognizing fine-grained objects with state changes involving structural transformations in addition to pose and viewpoint changes. 
Fourth, it proposes a curriculum learning strategy to progressively sample object images that are harder to distinguish for training the model, enhancing its ability to capture discriminative features for fine-grained tasks amidst state changes and other transformations. Experimental evaluations demonstrate significant improvements in object recognition and retrieval performance compared to previous methods, validating the effectiveness of the proposed approaches across several challenging datasets under various transformations. Computer vision Deep metric learning Pose-invariant State-invariant Object Recognition and Retrieval multi-view machine learning Representation Learning self-attention models
79	O algoritmo de aprendizado semi-supervisionado co-training e sua aplicação na rotulação de documentos / The semi-supervised learning algorithm co-training applied to label text documents Matsubara, Edson Takashi 26 May 2004 (has links) Em Aprendizado de Máquina, a abordagem supervisionada normalmente necessita de um número significativo de exemplos de treinamento para a indução de classificadores precisos. Entretanto, a rotulação de dados é freqüentemente realizada manualmente, o que torna esse processo demorado e caro. Por outro lado, exemplos não-rotulados são facilmente obtidos se comparados a exemplos rotulados. Isso é particularmente verdade para tarefas de classificação de textos que envolvem fontes de dados on-line tais como páginas de internet, email e artigos científicos. A classificação de textos tem grande importância dado o grande volume de textos disponível on-line. Aprendizado semi-supervisionado, uma área de pesquisa relativamente nova em Aprendizado de Máquina, representa a junção do aprendizado supervisionado e não-supervisionado, e tem o potencial de reduzir a necessidade de dados rotulados quando somente um pequeno conjunto de exemplos rotulados está disponível. Este trabalho descreve o algoritmo de aprendizado semi-supervisionado co-training, que necessita de duas descrições de cada exemplo. Deve ser observado que as duas descrições necessárias para co-training podem ser facilmente obtidas de documentos textuais por meio de pré-processamento. Neste trabalho, várias extensões do algoritmo co-training foram implementadas. Ainda mais, foi implementado um ambiente computacional para o pré-processamento de textos, denominado PreTexT, com o objetivo de utilizar co-training em problemas de classificação de textos. Os resultados experimentais foram obtidos utilizando três conjuntos de dados. Dois conjuntos de dados estão relacionados com classificação de textos e o outro com classificação de páginas de internet. 
Os resultados, que variam de excelentes a ruins, mostram que co-training, similarmente a outros algoritmos de aprendizado semi-supervisionado, é afetado de maneira bastante complexa pelos diferentes aspectos na indução dos modelos. / In Machine Learning, the supervised approach usually requires a large number of labeled training examples to learn accurately. However, labeling is often performed manually, making this process costly and time-consuming. By contrast, unlabeled examples are often inexpensive and easier to obtain than labeled examples. This is especially true for text classification tasks involving on-line data sources, such as web pages, email and scientific papers. Text classification is of great practical importance today given the massive volume of online text available. Semi-supervised learning, a relatively new area in Machine Learning, represents a blend of supervised and unsupervised learning, and has the potential to reduce the need for expensive labeled data whenever only a small set of labeled examples is available. This work describes the semi-supervised learning algorithm co-training, which requires the description of each example to be partitioned into two distinct views. It should be observed that the two different views required by co-training can be easily obtained from textual documents through pre-processing. In this work, several extensions of the co-training algorithm have been implemented. Furthermore, we have also implemented a computational environment for text pre-processing, called PreTexT, in order to apply the co-training algorithm to text classification problems. Experimental results using co-training on three data sets are described. Two data sets are related to text classification and the other one to web-page classification. Results, which range from excellent to poor, show that co-training, similarly to other semi-supervised learning algorithms, is affected by modelling assumptions in a rather complicated way. 
aprendizado de máquina aprendizado multi-visão aprendizado semi-supervisionado co-training co-training machine learning mineração de textos multi-view learning pré-processamento de textos semi-supervised learning text mining text pre-processing
80	Workflow and Activity Modeling for Monitoring Surgical Procedures / Modélisation des activités chirurgicales et de leur déroulement pour la reconnaissance des étapes opératoires Padoy, Nicolas 14 April 2010 (has links) Le bloc opératoire est au coeur des soins délivrés dans l'hôpital. Suite à de nombreux développements techniques et médicaux, il devient équipé de salles opératoires hautement technologiques. Bien que ces changements soient bénéfiques pour le traitement des patients, ils accroissent la complexité du déroulement des opérations. Ils impliquent également la présence de nombreux systèmes électroniques fournissant de l'information sur les processus chirurgicaux. Ce travail s'intéresse au développement de méthodes statistiques permettant de modéliser le déroulement des processus chirurgicaux et d'en reconnaitre les étapes, en utilisant des signaux présents dans le bloc opératoire. Nous introduisons et formalisons le problème consistant à reconnaitre les phases réalisées au sein d'un processus chirurgical, en utilisant une représentation des chirurgies par une suite temporelle et multi-dimensionnelle de signaux synchronisés. Nous proposons ensuite des méthodes pour la modélisation, la segmentation hors-ligne et la reconnaissance en-ligne des phases chirurgicales. La méthode principale, une variante de modèle de Markov caché étendue par des variables de probabilités de phases, est démontrée sur deux applications médicales. La première concerne les interventions endoscopiques, la cholécystectomie étant prise en exemple. Les phases endoscopiques sont reconnues en utilisant des signaux indiquant l'utilisation des instruments et enregistrés lors de chirurgies réelles. La deuxième application concerne la reconnaissance des activités génériques d'une salle opératoire. 
Dans ce cas, la reconnaissance utilise de l'information 4D provenant d'un système de reconstruction multi-vues / The department of surgery is the core unit of the patient care system within a hospital. Due to continuous technical and medical developments, such departments are equipped with increasingly high-tech surgery rooms. This benefits patient treatment, but also increases the complexity of the procedures' workflow. It also brings multiple electronic systems into the room, providing rich and varied information about the surgical processes. The focus of this work is the development of statistical methods that permit the modeling and monitoring of surgical processes, based on signals available in the surgery room. We introduce and formalize the problem of recognizing phases within a workflow, using a representation of interventions in terms of multidimensional time-series formed by synchronized signals acquired over time. We then propose methods for the modeling, offline segmentation and on-line recognition of surgical phases. The main method, a variant of hidden Markov models augmented by phase probability variables, is demonstrated on two medical applications. The first one is the monitoring of endoscopic interventions, using cholecystectomy as an illustrative surgery. Phases are recognized using signals that indicate tool usage and were recorded from real procedures. The second application is the monitoring of a generic surgery room workflow. In this case, phase recognition is performed by using 4D information from surgeries performed in a mock-up operating room in the presence of a multi-view reconstruction system. Déroulement des Processus Chirurgicaux Analyse des Activités Chirurgicales Modèles de Markov Cachés Cholécystectomie Surgical Workflow Surgical Activity Analysis Context Aware Operating Rooms Hidden Markov Models Cholecystectomy
