Global ETD Search

11	Parameter-free agglomerative hierarchical clustering to model learners' activity in online discussion forums Cobo Rodríguez, Germán 22 April 2014 The analysis of learners' activity in online discussion forums leads to a highly context-dependent modelling problem, which can be approached both theoretically and empirically. When this problem is tackled from the data mining field, a clustering-based perspective is usually adopted, giving rise to a clustering scenario in which the real number of clusters is unknown a priori. This approach therefore exposes one of the best-known issues of the clustering paradigm: estimating the number of clusters, which is usually selected by the user according to some subjective criterion that can easily introduce undesired biases into the resulting models. To avoid any user intervention in the cluster analysis stage, this thesis proposes two new cluster merging criteria, which in turn make it possible to implement a novel parameter-free agglomerative hierarchical algorithm. A complete set of experiments indicates that the new clustering algorithm provides optimal clustering solutions across a wide variety of clustering scenarios, handling different kinds of data and outperforming the clustering algorithms most widely used in practice. Finally, a two-stage analysis strategy based on the subspace clustering paradigm is proposed to properly tackle the problem of modelling learners' participation in asynchronous discussions. In combination with the new clustering algorithm, the proposed strategy proves able to limit the user's subjective intervention to the interpretation stages of the analysis process and to yield a complete model of the activity performed by learners in online discussion forums. 004 - Computer science; 378 - Higher education. Universities
12	Flexible techniques for heterogeneous XML data retrieval Sanz Blasco, Ismael 31 October 2007 The progressive adoption of XML by new communities of users has motivated the appearance of applications that require the management of large and complex collections, which present a large amount of heterogeneity. Some relevant examples are present in the fields of bioinformatics, cultural heritage, ontology management and geographic information systems, where heterogeneity is reflected not only in the textual content of documents, but also in the presence of rich structures which cannot be properly accounted for using fixed schema definitions. Current approaches for dealing with heterogeneous XML data are, however, focused mainly on the content level, whereas at the structural level only a limited amount of heterogeneity is tolerated; for instance, weakening the parent-child relationship between nodes into the ancestor-descendant relationship. The main objective of this thesis is to devise new approaches for querying heterogeneous XML collections. This general objective has several implications. First, a collection can present different levels of heterogeneity at different granularity levels; this fact has a significant impact on the selection of specific approaches for handling, indexing and querying the collection. Therefore, several metrics are proposed for evaluating the level of heterogeneity at different levels, based on information-theoretical considerations. These metrics can be employed for characterizing collections, and for clustering together those collections that present similar characteristics. Second, the high structural variability implies that query techniques based on exact tree matching, such as the standard XPath and XQuery languages, are not suitable for heterogeneous XML collections. As a consequence, approximate querying techniques based on similarity measures must be adopted. 
Within the thesis, we present a formal framework for the creation of similarity measures. It is based on a study of the literature which shows that most approaches for approximate XML retrieval (i) are highly tailored to very specific problems and (ii) use similarity measures for ranking that can be expressed as ad-hoc combinations of a set of 'basic' measures. Some examples of these widely used measures are tf-idf for textual information and several variations of edit distances. Our approach wraps these basic measures into generic, parametrizable components that can be combined into complex measures by exploiting the composite pattern, commonly used in Software Engineering. This approach also allows us to seamlessly integrate highly specific measures, such as protein-oriented matching functions. Finally, these measures are employed for the approximate retrieval of data in a context of high structural heterogeneity, using a new approach based on the concepts of pattern and fragment. In our context, a pattern is a concise representation of the information needs of a user, and a fragment is a match of a pattern found in the database. A pattern consists of a set of tree-structured elements, basically an XML subtree that is intended to be found in the database, but with flexible semantics that depend strongly on a particular similarity measure. For example, depending on the measure, the particular hierarchy of elements, or the ordering of siblings, may or may not be deemed relevant when searching for occurrences in the database. Fragment matching, as a query primitive, can deal with a much higher degree of flexibility than existing approaches. In this thesis we provide exhaustive and top-k query algorithms. In the latter case, we adopt an approach that does not require the similarity measure to be monotonic, as all previous XML top-k algorithms (usually based on Fagin's algorithm) do. 
We also present two extensions which are important in practical settings: a specification for the integration of the aforementioned techniques into XQuery, and a clustering algorithm that is useful for managing complex result sets. All of the algorithms have been implemented as part of ArHeX, a toolkit for the development of multi-similarity XML applications, which supports fragment-based queries through an extension of the XQuery language, and includes graphical tools for designing similarity measures and querying collections. We have used ArHeX to demonstrate the effectiveness of our approach using both synthetic and real data sets, in the context of a biomedical research project. similarity; approximate query processing; heterogeneous data management; XML; 004
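As an illustration of the composite idea mentioned in the abstract of entry 12, the following minimal Python sketch shows how basic similarity measures might be wrapped as components and combined into a more complex measure. It is not ArHeX code: the class names (Similarity, TagMatch, TokenJaccard, WeightedSum), the dictionary-based items and the weights are all assumptions made for this example.

```python
from abc import ABC, abstractmethod


class Similarity(ABC):
    """Common interface: every measure maps a pair of items to a score in [0, 1]."""

    @abstractmethod
    def score(self, a, b) -> float:
        ...


class TagMatch(Similarity):
    """Leaf measure (hypothetical): 1.0 when the element tags are equal, else 0.0."""

    def score(self, a, b) -> float:
        return 1.0 if a["tag"] == b["tag"] else 0.0


class TokenJaccard(Similarity):
    """Leaf measure (hypothetical): Jaccard overlap of whitespace-separated tokens."""

    def score(self, a, b) -> float:
        ta = set(a["text"].lower().split())
        tb = set(b["text"].lower().split())
        if not ta and not tb:
            return 1.0  # two empty texts are treated as identical
        return len(ta & tb) / len(ta | tb)


class WeightedSum(Similarity):
    """Composite: combines child measures and is itself a Similarity,
    so composites can be nested inside other composites."""

    def __init__(self, weighted_children):
        total = sum(weight for weight, _ in weighted_children)
        # Normalise the weights so that the combined score stays in [0, 1].
        self.children = [(weight / total, m) for weight, m in weighted_children]

    def score(self, a, b) -> float:
        return sum(weight * m.score(a, b) for weight, m in self.children)


# Text overlap counts three times as much as tag equality (arbitrary choice).
measure = WeightedSum([(1.0, TagMatch()), (3.0, TokenJaccard())])
x = {"tag": "protein", "text": "kinase binding domain"}
y = {"tag": "protein", "text": "kinase domain"}
print(measure.score(x, y))
```

Because WeightedSum implements the same interface as the leaf measures, a domain-specific measure could be added as one more Similarity subclass without changing any calling code, which is the property the abstract attributes to the composite pattern.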
13	Heterogeneous neural networks: theory and applications Belanche Muñoz, Lluis 18 July 2000 This work presents a class of functions serving as generalized neuron models to be used in artificial neural networks. They are cast into the common framework of computing a similarity function, a flexible definition of a neuron as a pattern recognizer. The similarity endows the model with a clear conceptual view and serves as a unifying cover for many of the existing neural models, including those classically used for the MultiLayer Perceptron (MLP) and most of those used in Radial Basis Function (RBF) networks. These families of models are conceptually unified and their relation is clarified. The possibilities of deriving new instances are explored and several neuron models, representative of their families, are proposed. The similarity view naturally leads to further extensions of the models to handle heterogeneous information, that is to say, information coming from sources radically different in character, including continuous and discrete (ordinal) numerical quantities, nominal (categorical) quantities, and fuzzy quantities. Missing data are also explicitly considered. A neuron of this class is called a heterogeneous neuron and any neural structure making use of such neurons is a Heterogeneous Neural Network (HNN), regardless of the specific architecture or learning algorithm. Among them, in this work we concentrate on feed-forward networks, as the initial focus of study. The learning procedures may include a great variety of techniques, basically divided into derivative-based methods (such as the conjugate gradient) and evolutionary ones (such as variants of genetic algorithms). In this thesis we also explore a number of directions towards the construction of better neuron models (within an integrating envelope) that are better adapted to the problems they are meant to solve. It is described how a certain generic class of heterogeneous models leads to satisfactory performance, comparable, and often superior, to that of classical neural models, especially in the presence of heterogeneous information or imprecise or incomplete data, in a wide range of domains, most of them corresponding to real-world problems. evolutionary algorithms; heterogeneous data; machine learning; neural networks; similarity measures; 004
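To make the idea of a similarity-based neuron over mixed data types (entry 13) more concrete, here is a small Python sketch. It is only an assumption-laden illustration in the spirit of a Gower-style similarity, not the neuron model defined in the thesis: the function names, the linear distance-to-similarity mapping, the per-feature `spreads` and the use of `None` for missing values are all choices made for this example, and fuzzy quantities are left out.

```python
MISSING = None  # marker for a missing input value (an assumption of this sketch)


def feature_similarity(kind, x, w, spread):
    """Similarity in [0, 1] between one input value x and one neuron weight w."""
    if kind == "nominal":
        # Categories either match or they do not.
        return 1.0 if x == w else 0.0
    # Continuous and ordinal values: similarity falls linearly with distance,
    # scaled by the feature's range (spread), and is clipped at 0.
    return max(0.0, 1.0 - abs(x - w) / spread)


def heterogeneous_neuron(kinds, spreads, weights, inputs):
    """Mean per-feature similarity over the features that are present."""
    scores = [
        feature_similarity(kind, x, w, spread)
        for kind, spread, w, x in zip(kinds, spreads, weights, inputs)
        if x is not MISSING
    ]
    # With no observed feature there is no evidence; this sketch returns 0.0.
    return sum(scores) / len(scores) if scores else 0.0


kinds = ["continuous", "nominal", "ordinal"]
spreads = [10.0, 1.0, 4.0]
weights = [5.0, "red", 2]
# The third input is missing, so only the first two features contribute.
print(heterogeneous_neuron(kinds, spreads, weights, [7.0, "red", MISSING]))
```

In this sketch a missing value is skipped rather than imputed, which is one simple way of treating missing data explicitly; the thesis may handle it differently.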
14	An object-oriented approach to the translation between MOF Metaschemas Raventós Pagès, Ruth 27 February 2009 Since the 1960s, many formal languages have been developed in order to allow software engineers to specify conceptual models and to design software artifacts. A few of these languages, such as the Unified Modeling Language (UML), have become widely used standards. They employ notations and concepts that are not readily understood by "domain experts," who understand the actual problem domain and are responsible for finding solutions to problems. The Object Management Group (OMG) developed the Semantics of Business Vocabulary and Rules (SBVR) specification as a first step towards providing a language to support the specification of "business vocabularies and rules." The function of SBVR is to capture business concepts and business rules in languages that are close enough to ordinary language for business experts to read and write them, and formal enough to capture the intended semantics and present them in a form that is suitable for engineering the automation of the rules. The ultimate goal of business rules approaches is to build software systems directly from vocabularies and rules. One way of reaching this goal, within the context of model-driven architecture (MDA), is to transform SBVR models into UML models. OMG also notes the need for a reverse engineering transformation between UML schemas and SBVR vocabularies and rules in order to validate UML schemas. This thesis proposes an automatic approach to translation between UML schemas and SBVR vocabularies and rules, and vice versa. 
It consists of the application of a new generic schema translation approach to the particular case of UML and SBVR. The main contribution of the generic approach is the extensive use of object-oriented concepts in the definition of translation mappings, particularly the use of operations (and their refinements) and invariants, both formalized in the Object Constraint Language (OCL). Translation mappings can be used to check that two schemas are translations of each other, and to translate one into the other, in either direction. Translation mappings are declaratively defined by means of preconditions, postconditions and invariants, and they can be implemented in any suitable language. The approach leverages the object-oriented constructs embedded in Meta Object Facility (MOF) metaschemas to achieve the goals of object-oriented software development in the schema translation problem. The generic schema translation approach and its application to UML schemas and SBVR vocabularies and rules are fully implemented in the UML-based Specification Environment (USE) tool and validated by a case study based on the conceptual schema of the Digital Bibliography & Library Project (DBLP) system. schema translation; conceptual data models; UML; MOF; 004
15	Transparent Protection of Data Sebé Feixas, Francesc 07 February 2003 This dissertation is about the protection of data that have to be made available to possibly dishonest users. Data must be protected while remaining usable. Such protection must be imperceptible, so as not to disrupt correct use of the data, and effective against unauthorized uses. The study is divided according to the two kinds of data whose transparent protection is studied: multimedia content and statistical microdata. Regarding multimedia content, protection is addressed in two ways: (1) copyright protection and (2) integrity protection and authentication. In electronic commerce of multimedia content, merchants sell data to untrusted buyers who may redistribute it. In this respect, the intellectual property rights of content providers must be ensured. Focusing on digital images, several contributions are presented on the two main electronic copyright protection techniques: watermarking and fingerprinting. Two new watermarking schemes for digital images are presented. The first is semi-public and robust against compression, filtering and scaling attacks. The second is oblivious and robust against compression, filtering, scaling and moderate geometric distortion attacks. Next, a new technique based on mixing watermarked digital objects is proposed, which allows robustness to be increased by combining the robustness properties of different current watermarking schemes. In the field of fingerprinting, a new construction is presented for obtaining binary collusion-secure fingerprinting codes that are robust against collusions of up to three buyers. This construction provides, for a moderate number of possible buyers, shorter codewords than those offered by current proposals. Quite often, multimedia content is published on untrusted sites where it may suffer malicious alterations. In this situation, watermarking can be applied to protect data by providing integrity and authentication. A spatial-domain spread-spectrum watermarking algorithm is described and proven suitable for lossless image authentication. The other kind of data addressed in this dissertation is statistical microdata. When statistical files containing information about individual entities are released for public use, privacy is a major concern. Such data files must be released in a way that combines statistical utility with protection of the privacy of the entities concerned. Methods to perturb data in this way are called statistical disclosure control methods. In this field, a modification to a current score for measuring information loss and disclosure risk is proposed, which allows masked data sets whose number of records differs from that of the original data set to be considered. Next, a post-masking optimization procedure is proposed which reduces information loss while keeping disclosure risk approximately unchanged. Through this procedure, the two best-performing masking methods are enhanced: 'multivariate microaggregation' and 'rankswapping'. Finally, a novel application for providing multilevel access to precision-critical data is presented. In this way, protected data are made available to different users who, depending on their clearance, can remove part of the noise introduced by protection, thus obtaining better data quality. watermarking; masking methods; data protection; 3325. Communications technology; 621.3
16	Nous desenvolupaments, aplicacions bioanalítiques i validació dels mètodes de resolució multivariant [New developments, bioanalytical applications and validation of multivariate resolution methods] Jaumot Soler, Joaquim 20 June 2006 This work is part of one of the research lines of the "Chemometrics" research group of the Department of Analytical Chemistry at the Universitat de Barcelona. This research line focuses on the development of chemometric methods for multivariate data analysis and on their application to the analytical study of conformational changes in, and/or interactions between, biomolecules. It is now possible to record the full spectrum of a sample in a short time. This increase in the amount and complexity of the acquired data has led to the emergence of methods that aim to extract information of physicochemical interest from these data sets. Two approaches exist for this purpose: (a) hard-modelling methods, which require a chemical or kinetic model to be postulated and fitted to the experimental data, and (b) soft-modelling methods, which do not require a model to be postulated. The work carried out in this doctoral thesis can be divided into three blocks. First, a graphical interface has been developed in the MATLAB programming environment for the multivariate curve resolution method based on alternating least squares (MCR-ALS). This interface notably improves the interaction between the user and the program, and encourages its widespread use by users who are not accustomed to working with chemometric tools. Second, several multivariate analysis methods have been validated; that is, the reliability of the solutions obtained by this type of chemometric method has been studied. For the MCR-ALS method, the influence and propagation of experimental error have been analysed, together with their possible repercussions on the mathematical ambiguities present in the solutions obtained. This study covered both the individual analysis of data matrices obtained in a single experiment and the simultaneous analysis of data matrices obtained in several experiments. For hard-modelling methods, the ambiguity that arises when fitting complex kinetic mechanisms has been studied. In this case, multiple local minima with the same fit value were observed on the associated response surface. Finally, soft-modelling and hard-modelling chemometric methods have been applied to the study of solution equilibria of nucleic acids. These are biomolecules with a hierarchical organization that ranges from the nucleotide sequence of the strands to complex higher-order structures such as triplexes or quadruplexes. Conformational changes or interactions with other biomolecules have traditionally been studied through experiments monitored with spectroscopic techniques. In this work, these processes are followed through readings at many wavelengths (multivariate approach), and suitable chemometric methods for treating multivariate data are applied. The processes studied in this thesis are essentially the conformational changes caused by varying conditions of the medium, such as pH, temperature or the concentration of other ions. Spectroscopic techniques such as UV-visible molecular absorption, fluorescence, circular dichroism and nuclear magnetic resonance have been used. Another application has been the analysis of DNA microarrays. The emergence of this technology has made it possible to obtain information on gene expression levels for a large number of genes in a single experiment. The generation of large amounts of data requires tools with which the biological information can be extracted. In this work, the MCR-ALS method has been applied to the analysis of several data sets in order to determine the relationship between samples presenting different types of cancer and the genes studied. / This PhD thesis has been developed in the framework of the Chemometrics group at the Universitat de Barcelona. The work deals with the development and validation of Multivariate Curve Resolution (MCR) methods (both hard- and soft-modelling), and with their application to bioanalytical problems. The work has been organized into three blocks. First, a graphical interface has been developed for the program running the MCR-ALS (Multivariate Curve Resolution-Alternating Least Squares) method in the MATLAB® environment. This interface improves the interaction between the user and the program and makes multivariate curve resolution easier to use for potential users with little experience. Second, multivariate resolution methods of data analysis have been validated. For the MCR-ALS method, the effects of rotational ambiguities and of the propagation of experimental noise have been studied. These studies have been performed both for the analysis of a single experiment and for the simultaneous analysis of multiple experiments. In the case of hard-modelling kinetic data fitting methods, ambiguities in the analysis of kinetic experiments have been studied and methods to overcome these ambiguities have been proposed. Third, multivariate resolution methods have been applied to the study of conformational equilibria of nucleic acids. These are biomolecules that have a hierarchical organization, from the nucleotide sequence to higher-order structures such as triplexes or quadruplexes. Traditionally, conformational changes or interactions of nucleic acids with other biomolecules have been spectroscopically monitored at just one wavelength. In this work, these processes have been followed at multiple wavelengths and suitable multivariate resolution methods have been applied to the data treatment. The processes studied in this thesis are DNA conformational changes induced by pH, temperature or salinity. Spectroscopic techniques such as molecular absorption in the UV-visible, circular dichroism or nuclear magnetic resonance have been used for this purpose. Finally, data obtained using DNA microarrays have been analyzed. This technique allows high-throughput analysis of the relative expression of thousands of genes of an organism, which generates large amounts of data. This has created a need for statistical methods that can extract useful information for further research. In this PhD thesis, the MCR-ALS method has been proposed for the analysis of this kind of data, with very promising results. MATLAB; modelling methods; multivariate data analysis; chemometrics; MCR-ALS; Experimental Sciences and Mathematics; 543
17	Time misalignments in fault detection and diagnosis Llanos Rodríguez, David Alejandro 17 October 2008 (has links) El desalineamiento temporal es la incorrespondencia de dos señales debido a una distorsión en el eje temporal. La Detección y Diagnóstico de Fallas (Fault Detection and Diagnosis-FDD) permite la detección, el diagnóstico y la corrección de fallos en un proceso. La metodología usada en FDD está dividida en dos categorías: técnicas basadas en modelos y no basadas en modelos. Esta tesis doctoral trata sobre el estudio del efecto del desalineamiento temporal en FDD. Nuestra atención se enfoca en el análisis y el diseño de sistemas FDD en caso de problemas de comunicación de datos, como retardos y pérdidas. Se proponen dos técnicas para reducir estos problemas: una basada en programación dinámica y la otra en optimización. Los métodos propuestos han sido validados sobre diferentes sistemas dinámicos: control de posición de un motor de corriente continua, una planta de laboratorio y un problema de sistemas eléctricos conocido como hueco de tensión. / Time misalignment is the unmatching of two signals due to a distortion in the time axis. Fault Detection and Diagnosis (FDD) deals with the timely detection, diagnosis and correction of abnormal conditions of faults in a process. The methodology used in FDD is clearly dependent on the process and the sort of available information and it is divided in two categories: model-based and non-model based techniques. This doctoral dissertation deals with the study of time misalignments effects when performing FDD. Our attention is focused on the analysis and design of FDD systems in case of data communication problems, such as delays and dropouts. Techniques based on dynamic programming and optimization are proposed to deal with these problems. 
The proposed methods are validated numerically on different dynamic systems: position control of a DC motor, a laboratory plant and an electrical system problem known as voltage sag. retards i pèrdua de dades modelització de processos xarxes de comunicació detecció i diagnòstic sistemes de control Sistemes de supervisió 621
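The abstract above says that one of the proposed techniques relies on dynamic programming but gives no detail. A common dynamic-programming approach to time misalignment is dynamic time warping (DTW); the sketch below shows that classic algorithm as an illustration and should not be read as the method developed in the thesis. The function name `dtw` and the two short signals are hypothetical.

```python
import numpy as np

def dtw(x, y):
    """Classic dynamic time warping between two 1-D sequences.

    Returns the accumulated alignment cost and the warping path as a
    list of (index_in_x, index_in_y) pairs.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(cost[n, m]), path

# A reference signal and the same signal received one sample late.
reference = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
delayed = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
distance, path = dtw(reference, delayed)
print(distance, path)
```

Because the warping path absorbs the delay, the residual between the aligned signals reflects only amplitude differences, which is the quantity a fault detector would normally inspect.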
18	Restauradores en Canton Ticino entre Ottocento y Novecento. Catalogación y gestión de datos Giner Cordero, Ester 19 February 2009 (has links) The aim of this doctoral thesis, titled "Restauradores en Canton Ticino entre Ottocento y Novecento. Catalogación y gestión de datos" (Restorers in Canton Ticino between the nineteenth and twentieth centuries: cataloguing and data management), is to identify and classify the most relevant figures who promoted the revaluation of the historical and artistic heritage of their country, specifically in the field of mural painting restoration, from the best-known painter-restorers such as Edoardo Berta, Emilio Ferrazzini and Tita Pozzi to less distinguished ones such as Carlo Cotti, Ottorino Olgiati, Mario Moglia, Nino Facchinetti, Carlo Mazzi and Pompeo Maino. The information gathered from the heterogeneous documents held in public archives, private archives and libraries has been analysed and organised in order to characterise the working practice of each of the figures identified, and to serve future research, professional restorers, historians and conservators, among others. For this last reason, the most recent database programs capable of holding, relating and displaying the resulting documentation have also been studied, with attention to essential requirements such as the use of standardised terminology, free access to the data, and the creation of a broad and "living" network that fosters the dissemination of memory. / Giner Cordero, E. (2009). Restauradores en Canton Ticino entre Ottocento y Novecento. Catalogación y gestión de datos [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/4141 Canton ticino Pintura mural Restauradors Bases de dades PINTURA 62 - Ciencias de las artes y las letras
19	Facing-up Challenges of Multiobjective Clustering Based on Evolutionary Algorithms: Representations, Scalability and Retrieval Solutions García Piquer, Álvaro 13 April 2012 (has links) Aquesta tesi es centra en algorismes de clustering multiobjectiu, que estan basats en optimitzar varis objectius simultàniament obtenint una col·lecció de solucions potencials amb diferents compromisos entre objectius. El propòsit d'aquesta tesi consisteix en dissenyar i implementar un nou algorisme de clustering multiobjectiu basat en algorismes evolutius per afrontar tres reptes actuals relacionats amb aquest tipus de tècniques. El primer repte es centra en definir adequadament l'àrea de possibles solucions que s'explora per obtenir la millor solució i que depèn de la representació del coneixement. El segon repte consisteix en escalar el sistema dividint el conjunt de dades original en varis subconjunts per treballar amb menys dades en el procés de clustering. El tercer repte es basa en recuperar la solució més adequada tenint en compte la qualitat i la forma dels clusters a partir de la regió més interessant de la col·lecció de solucions ofertes per l'algorisme. / Esta tesis se centra en los algoritmos de clustering multiobjetivo, que están basados en optimizar varios objetivos simultáneamente obteniendo una colección de soluciones potenciales con diferentes compromisos entre objetivos. El propósito de esta tesis consiste en diseñar e implementar un nuevo algoritmo de clustering multiobjetivo basado en algoritmos evolutivos para afrontar tres retos actuales relacionados con este tipo de técnicas. El primer reto se centra en definir adecuadamente el área de posibles soluciones explorada para obtener la mejor solución y que depende de la representación del conocimiento. 
El segundo reto consiste en escalar el sistema dividiendo el conjunto de datos original en varios subconjuntos para trabajar con menos datos en el proceso de clustering. El tercer reto se basa en recuperar la solución más adecuada según la calidad y la forma de los clusters a partir de la región más interesante de la colección de soluciones ofrecidas por el algoritmo. / This thesis focuses on multiobjective clustering algorithms, which optimize several objectives simultaneously and obtain a collection of potential solutions with different trade-offs among objectives. The goal of the thesis is to design and implement a new multiobjective clustering technique based on evolutionary algorithms in order to address three current challenges related to these techniques. The first challenge is to define properly the area of possible solutions that is explored in order to find the best solution, which depends on the knowledge representation. The second challenge is to scale up the system by splitting the original data set into several subsets, so that the clustering process works with less data. The third challenge is to retrieve the most suitable solution, according to the quality and shape of the clusters, from the most interesting region of the collection of solutions returned by the algorithm. Mineria de dades Clustering Algoritmes evolutius Clustering Multiobjectiu Grans Volums de Dades Minería de datos Algoritmos evolutivos Clustering Multiobjetivo Grandes Volúmenes de Datos Data Mining Evolutionary Algorithms Multiobjective Clustering Large Data Les TIC i la seva gestió 004
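The "collection of potential solutions with different trade-offs" described above is what multiobjective optimization usually calls a Pareto front: the solutions that no other solution beats on every objective at once. The sketch below illustrates only that filtering step, assuming every objective is to be minimized. The function name `pareto_front`, the two objective names and the scores are invented for illustration and do not come from the thesis.

```python
def pareto_front(scores):
    """Return the indices of the non-dominated entries in `scores`.

    Each entry is a tuple of objective values, all to be minimized.
    Entry b dominates entry a when b is no worse on every objective
    and strictly better on at least one.
    """
    front = []
    for i, a in enumerate(scores):
        dominated = any(
            all(bk <= ak for ak, bk in zip(a, b))
            and any(bk < ak for ak, bk in zip(a, b))
            for j, b in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Hypothetical (compactness, connectivity) scores for five candidate
# clusterings; lower is better for both.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
print(front)
```

Choosing one clustering from the front is a separate decision, which is the retrieval problem the abstract lists as its third challenge.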
20	Aplicacions de tècniques de fusió de dades per a l'anàlisi d'imatges de satèl·lit en Oceanografia Reig Bolaño, Ramon 25 June 2008 (has links) Durant dècades s'ha observat i monitoritzat sistemàticament la Terra i el seu entorn des de l'espai o a partir de plataformes aerotransportades. Paral·lelament, s'ha tractat d'extreure el màxim d'informació qualitativa i quantitativa de les observacions realitzades. Les tècniques de fusió de dades donen un "ventall de procediments que ens permeten aprofitar les dades heterogènies obtingudes per diferents mitjans i instruments i integrar-les de manera que el resultat final sigui qualitativament superior". En aquesta tesi s'han desenvolupat noves tècniques que es poden aplicar a l'anàlisi de dades multiespectrals que provenen de sensors remots, adreçades a aplicacions oceanogràfiques. Bàsicament s'han treballat dos aspectes: les tècniques d'enregistrament o alineament d'imatges; i la interpolació de dades esparses i multiescalars, focalitzant els resultats als camps vectorials bidimensionals. En moltes aplicacions que utilitzen imatges derivades de satèl·lits és necessari mesclar o comparar imatges adquirides per diferents sensors, o bé comparar les dades d'un sol sensor en diferents instants de temps, per exemple en: reconeixement, seguiment i classificació de patrons o en la monitorització mediambiental. Aquestes aplicacions necessiten una etapa prèvia d'enregistrament geomètric, que alinea els píxels d'una imatge, la imatge de treball, amb els píxels corresponents d'una altra imatge, la imatge de referència, de manera que estiguin referides a uns mateixos punts. En aquest treball es proposa una aproximació automàtica a l'enregistrament geomètric d'imatges amb els contorns de les imatges; a partir d'un mètode robust, vàlid per a imatges multimodals, que a més poden estar afectades de distorsions, rotacions i de, fins i tot, oclusions severes. 
En síntesi, s'obté una correspondència punt a punt de la imatge de treball amb el mapa de referència, fent servir tècniques de processament multiresolució. El mètode fa servir les mesures de correlació creuada de les transformades wavelet de les seqüències que codifiquen els contorns de la línia de costa. Un cop s'estableix la correspondència punt a punt, es calculen els coeficients de la transformació global i finalment es poden aplicar a la imatge de treball per a enregistrar-la respecte la referència. A la tesi també es prova de resoldre la interpolació d'un camp vectorial espars mostrejat irregularment. Es proposa un algorisme que permet aproximar els valors intermedis entre les mostres irregulars si es disposa de valors esparsos a escales de menys resolució. El procediment és òptim si tenim un model que caracteritzi l'esquema multiresolució de descomposició i reconstrucció del conjunt de dades. Es basa en la transformada wavelet discreta diàdica i en la seva inversa, realitzades a partir d'uns bancs de filtres d'anàlisi i síntesi. Encara que el problema està mal condicionat i té infinites solucions, la nostra aproximació, que primer treballarem amb senyals d'una dimensió, dóna una estratègia senzilla per a interpolar els valors d'un camp vectorial bidimensional, utilitzant tota la informació disponible a diferents resolucions. Aquest mètode de reconstrucció es pot utilitzar com a extensió de qualsevol interpolació inicial. També pot ser un mètode adequat si es disposa d'un conjunt de mesures esparses de diferents instruments que prenen dades d'una mateixa escena a diferents resolucions, sense cap restricció en les característiques de la distribució de mesures. Inicialment cal un model dels filtres d'anàlisi que generen les dades multiresolució i els filtres de síntesi corresponents, però aquest requeriment es pot relaxar parcialment, i és suficient tenir una aproximació raonable a la part passa baixes dels filtres. 
Els resultats de la tesi es podrien implementar fàcilment en el flux de processament d'una estació receptora de satèl·lits, i així es contribuiria a la millora d'aplicacions que utilitzessin tècniques de fusió de dades per a monitoritzar paràmetres mediambientals. / During the last decades, a systematic survey of the Earth environment has been set up from many spaceborne and airborne platforms. At present, there is a continuous effort to extract and combine the maximum of quantitative information from these different, often rather heterogeneous, data sets. Data fusion can be defined as "a set of means and tools for the alliance of data originating from different sources with the aims of a greater quality result". In this thesis we have developed new techniques and schemes that can be applied to multispectral data obtained from remote sensors, with particular interest in oceanographic applications. They are based on image and signal processing. We have worked mainly on two topics: image registration (or image alignment) techniques, and the interpolation of multiscale and sparse data sets, with a focus on two-dimensional vector fields. In many applications using satellite images, and specifically in those related to oceanographic studies, it is necessary to merge or compare multiple images of the same scene acquired by different sensors, or by one sensor at different times. Typical applications include pattern classification, recognition and tracking, multisensor data fusion and environmental monitoring. Image registration is the process of aligning the remotely sensed images to the same ground truth and transforming them into a known geographic projection (map coordinates). This step is crucial to correctly merge complementary information from multisensor data. The proposed approach to automatic image registration is a robust method, valid for multimodal images affected by distortions, rotations and, to a reasonable extent, severe data occlusion. 
We derive a point-to-point matching of one image to a georeferenced map by applying multiresolution signal processing techniques. The method is based on the contours of the images: it uses a maximum cross-correlation measure on the biorthogonal undecimated discrete wavelet transforms of the sequences that encode the coastline contours. Once this point-to-point correspondence is established, the coefficients of a global transform can be calculated and finally applied to the working image to register it to the georeferenced map. The second topic of this thesis focuses on the interpolation of sparse, irregularly sampled vector fields when these sparse data belong to different resolutions. A new algorithm is proposed to iteratively approximate the intermediate values between irregularly sampled data when a set of sparse values at coarser scales is known. The procedure is optimal if there is a characterized model for the multiresolution decomposition and reconstruction scheme of the data set. The scheme is based on a fast dyadic wavelet transform and on its inversion, using an analysis/synthesis filter bank implementation of the wavelet transform model. Although the problem is ill-posed and has infinitely many solutions, our approach, first developed for one-dimensional signals, gives a simple strategy to interpolate the values of a vector field using all the information available at different scales. This reconstruction method can be used as an extension of any initial interpolation. It can also be suitable in cases where there are sparse measurements from different instruments that sense the same scene simultaneously at several resolutions, without any restriction on the characteristics of the data distribution. Initially, the main requirement is a filter model for the generation of the multiresolution data and its synthesis counterpart, but this assumption can be partially relaxed so that only a reasonable approximation to the low-pass counterpart is required. 
The results of the thesis can be easily implemented in the processing stream of any satellite receiving station, and they therefore constitute a first contribution to potential data fusion applications for environmental monitoring. alineament d'imatges - image alignment fusió de dades - data fusion teledetecció - remote sensing 621.3
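The registration method summarized above matches coastline contour sequences by maximizing a cross-correlation measure computed on their wavelet transforms. The sketch below keeps only the cross-correlation idea and leaves out the wavelet step entirely, so it is a simplified illustration rather than the method of the thesis. It assumes two closed contours encoded as 1-D sequences of equal length, so that a circular shift is meaningful; the function name `best_circular_shift` and the random test sequence are hypothetical.

```python
import numpy as np

def best_circular_shift(ref, work):
    """Return the shift s for which np.roll(work, s) best matches ref.

    The circular cross-correlation is evaluated for every shift at once
    through the FFT, and the shift with the largest value is returned.
    """
    ref = np.asarray(ref, dtype=float)
    work = np.asarray(work, dtype=float)
    # Remove the mean so a constant offset does not dominate the score.
    ref = ref - ref.mean()
    work = work - work.mean()
    corr = np.fft.ifft(np.fft.fft(ref) * np.conj(np.fft.fft(work))).real
    return int(np.argmax(corr))

# Hypothetical contour code and a copy whose starting point is offset by 7 samples.
rng = np.random.default_rng(1)
reference_contour = rng.standard_normal(64)
working_contour = np.roll(reference_contour, -7)

shift = best_circular_shift(reference_contour, working_contour)
realigned = np.roll(working_contour, shift)
print(shift)
```

Once matching points are known, a global transform can be fitted to them, as the abstract describes; that fitting step is not part of this sketch.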
