141

Konzeption und Entwicklung eines automatisierten Workflows zur geovisuellen Analyse von georeferenzierten Textdaten(strömen) / Microblogging Content

Gröbe, Mathias 13 October 2015 (has links)
This Master's thesis covers the design and exemplary implementation of a workflow for preparing georeferenced microblogging content, with Twitter as the example data source. Starting from that source, the thesis examines which processing steps are necessary and by which means they can best be realized. It turns out that a whole range of building blocks from data mining and text mining already exists for such a pipeline; in many cases they only need to be chained together with the right settings. Although a logical order of the steps can be defined, further adjustments to the research question and to the data at hand may still be required. The process is supported by visualizations such as histograms, word clouds and map displays, so that new knowledge can be discovered and the parameterization of the steps refined step by step, following the principles of Geovisual Analytics. After a review of several software products, the programming language R, which is optimized for statistical applications, was chosen for the exemplary implementation. Finally, the software was evaluated with data from Twitter and Flickr.
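As a rough illustration of the kind of preparation pipeline described above, the following Python sketch (the thesis itself used R) tokenizes a few invented georeferenced posts and aggregates them into term frequencies and a coarse spatial grid, the inputs one would feed into word clouds, histograms and map displays. All field names and example posts are assumptions for illustration, not the thesis code.

```python
# Minimal sketch of the kind of pipeline the thesis describes (the original
# implementation used R); field names and the toy data are illustrative only.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "in", "at", "of", "is"}

posts = [  # stand-in for georeferenced microblogging content (e.g. tweets)
    {"text": "Sunset at the river in Dresden", "lat": 51.05, "lon": 13.74},
    {"text": "Traffic jam in Dresden again", "lat": 51.06, "lon": 13.73},
    {"text": "Beautiful sunset tonight", "lat": 51.03, "lon": 13.78},
]

def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords: a basic text-mining step."""
    return [t for t in re.split(r"[^a-zäöüß]+", text.lower()) if t and t not in STOPWORDS]

# Term frequencies feed histograms or a word cloud.
term_counts = Counter(t for p in posts for t in tokenize(p["text"]))

# A coarse spatial grid (0.05 degree cells) stands in for the map view.
grid_counts = Counter((round(p["lat"] / 0.05) * 0.05, round(p["lon"] / 0.05) * 0.05)
                      for p in posts)

print(term_counts.most_common(5))
print(grid_counts)
```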
142

Lineamientos para la integración de minería de procesos y visualización de datos / Guidelines for the integration of process mining and data visualization

Chise Teran, Bryhan, Hurtado Bravo, Jimmy Manuel 04 December 2020 (has links)
Process mining is a discipline that has gained relevance in recent years; proof of this is a study by the Italian consultancy HSPI in 2018, which reports 72% growth in documented process mining case studies compared with 2017. Likewise, a report published in the same year by BPTrends, a firm specialized in business processes, states that organizations treat the redesign and automation of their main business processes as a priority in their strategic projects. The evolution of this discipline has made it possible to overcome several of the challenges identified in a manifesto [1] written by the members of the IEEE Task Force on Process Mining in 2012. Building on challenge number 11 of that manifesto, the objective of this project is to integrate the disciplines of process mining and data visualization through an interaction model of guidelines that improves the understanding that non-expert users have of the graphical results of process mining projects, in order to optimize business processes in organizations. Our contribution aims to improve the understanding of users who are not experts in the field of process mining. For this reason, we draw on data visualization techniques and color psychology to propose an interaction model of guidelines that helps process mining specialists design graphics that communicate clearly and understandably. The goal is a better understanding of the results of process mining projects, enabling better decisions about the performance of business processes in organizations. The interaction model generated in our research was validated with a group of users involved in critical processes at various organizations in the country. This validation was carried out through a survey that presented cases to these users in order to assess the five variables defined to measure, qualitatively, the level of improvement in the comprehension of the graphics when the guidelines of the interaction model are applied. The results showed that four of the five variables had a positive impact on users' perception for the case proposed in each question. / Thesis
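The following sketch is not taken from the thesis; it is a minimal Python illustration of one of the basic graphical results of process mining that such guidelines target, a directly-follows relation counted from an invented event log.

```python
# A hedged illustration (not from the thesis): the directly-follows relation is
# one of the basic graphical results of process mining that the guidelines aim
# to make easier to read. Event log structure and activity names are invented.
from collections import Counter, defaultdict

event_log = [  # (case id, activity), already ordered by timestamp within a case
    ("c1", "Register"), ("c1", "Check"), ("c1", "Approve"),
    ("c2", "Register"), ("c2", "Check"), ("c2", "Reject"),
    ("c3", "Register"), ("c3", "Approve"),
]

traces = defaultdict(list)
for case, activity in event_log:
    traces[case].append(activity)

# Count how often activity a is directly followed by activity b across cases.
directly_follows = Counter()
for activities in traces.values():
    for a, b in zip(activities, activities[1:]):
        directly_follows[(a, b)] += 1

for (a, b), n in sorted(directly_follows.items(), key=lambda kv: -kv[1]):
    print(f"{a} -> {b}: {n}")
```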
143

Understanding High-Dimensional Data Using Reeb Graphs

Harvey, William John 14 August 2012 (has links)
No description available.
144

Dynamic Clustering and Visualization of Smart Data via D3-3D-LSA / with Applications for QuantNet 2.0 and GitHub

Borke, Lukas 08 September 2017 (has links)
With the growing popularity of GitHub, the largest host of source code and the largest collaboration platform in the world, it has evolved into a Big Data resource offering a wide variety of open-source repositories (OSR). At present there are more than one million organizations on GitHub, among them Google, Facebook, Twitter, Yahoo, CRAN, RStudio, D3, Plotly and many more. GitHub provides an extensive REST API, which enables researchers to retrieve valuable information about software and research development life cycles. Our research pursues two main objectives: (I) to provide an automatic OSR categorization system for data science teams and software developers that promotes discoverability, technology transfer and coexistence; (II) to establish visual data exploration and topic-driven navigation of GitHub organizations for collaborative reproducible research and web deployment. To transform Big Data into value, in other words into Smart Data, storing and processing the data semantics and metadata is essential. Furthermore, the choice of an adequate text mining (TM) model is important. The dynamic calibration of metadata configurations, TM models (VSM, GVSM, LSA), clustering methods and clustering quality indices is abbreviated as "smart clusterization". Data-Driven Documents (D3) and Three.js (3D) are JavaScript libraries for producing dynamic, interactive data visualizations, featuring hardware acceleration for rendering complex 2D or 3D computer animations of large data sets. Both techniques enable visual data mining (VDM) in web browsers and are abbreviated as D3-3D. Latent Semantic Analysis (LSA) measures semantic information through co-occurrence analysis of the text corpus. Its properties and applicability for Big Data analytics are demonstrated. "Smart clusterization" combined with the dynamic VDM capabilities of D3-3D is summarized under the term "Dynamic Clustering and Visualization of Smart Data via D3-3D-LSA".
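As a simplified, hypothetical sketch of the "smart clusterization" pipeline described above (vector space model, LSA, and clustering), the following Python snippet applies scikit-learn to a few invented repository descriptions; the real system additionally calibrates metadata configurations, TM models and quality indices and renders the result with D3-3D.

```python
# A simplified sketch of "smart clusterization": a TF-IDF vector space model,
# LSA via truncated SVD, and k-means clustering. Assumes scikit-learn is
# available; the documents below stand in for GitHub repository descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = [
    "interactive data visualization with d3 and javascript",
    "3d rendering of large data sets in the browser",
    "statistical analysis and regression models in r",
    "time series econometrics and volatility models",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)        # VSM
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)  # LSA
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsa)

for doc, label in zip(docs, labels):
    print(label, doc[:50])
```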
145

Visualizing the Ethiopian Commodity Market

Rogstadius, Jakob January 2009 (has links)
The Ethiopia Commodity Exchange (ECX), like many other data-intensive organizations, is having difficulties making full use of the vast amounts of data that it collects. This MSc thesis identifies areas within the organization where concepts from the academic fields of information visualization and visual analytics can be applied to address this issue. Software solutions are designed and implemented in two areas with the purpose of evaluating the approach and demonstrating to potential users, developers and managers what can be achieved using this method. A number of presentation methods are proposed for the ECX website, which previously contained no graphing functionality for market data, to make it easier for users to find trends, patterns and outliers in prices and trade volumes of commodities traded at the exchange. A software application is also developed to support the ECX market surveillance team by drastically improving its capabilities for investigating complex trader relationships. Finally, as ECX lacked previous experience with visualization, one software developer was trained in computer graphics and involved in the work, to enable continued maintenance and future development of new visualization solutions within the organization.
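Purely as a hypothetical illustration of the kind of trend-and-outlier view proposed for the ECX market data, the following Python sketch flags prices that deviate strongly from a rolling window of recent values; the price series and thresholds are invented, not taken from the thesis.

```python
# Illustrative only (not from the thesis): one simple way to surface trends and
# outliers in a commodity price series, the kind of view proposed for the ECX
# website. Prices are invented sample data.
from statistics import mean, stdev

prices = [102, 104, 103, 105, 107, 140, 108, 110, 109, 111]  # daily closing prices

window = 5
for i in range(window, len(prices)):
    history = prices[i - window:i]
    mu, sigma = mean(history), stdev(history)
    flag = "OUTLIER" if sigma and abs(prices[i] - mu) > 2 * sigma else ""
    print(f"day {i}: price={prices[i]} trend={mu:.1f} {flag}")
```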
146

Searching for novel gene functions in yeast : identification of thousands of novel molecular interactions by protein-fragment complementation assay followed by automated gene function prediction and high-throughput lipidomics

Tarasov, Kirill 09 1900 (has links)
No description available.
147

Scalable Multimedia Learning: From local eLectures to global Opencast

Ketterl, Markus 27 March 2014 (has links)
Universities want to go where the learners are, to share their rich scientific and intellectual knowledge beyond the walls of the academy and to expand the boundaries of the classroom. This desire has become a critical need as the worldwide economy adjusts to globalization and the need for advanced education and training becomes ever more pressing. Unfortunately, the work of creating, processing, distributing and using quality multimedia learning content is expensive and technically challenging. This work combines research results, lessons learned and usage findings in the presentation of a fully open-source, scalable lecture capture solution that is useful in the heterogeneous computing landscape of today's universities and learning institutions. It particularly addresses the implemented user-facing applications and components, which enable lecturers, faculty and students to record, analyze and subsequently re-use the recorded multimedia learning material in multiple, attractive ways across devices and distribution platforms.
148

VISUAL ANALYTICS OF BIG DATA FROM MOLECULAR DYNAMICS SIMULATION

Catherine Jenifer Rajam Rajendran (5931113) 03 February 2023 (has links)
Protein malfunction can cause human diseases, which makes the protein a target in the process of drug discovery. In-depth knowledge of how a protein functions can contribute widely to the understanding of the mechanism of these diseases. Protein functions are determined by protein structures and their dynamic properties. Protein dynamics refers to the constant physical movement of atoms in a protein, which may result in transitions between different conformational states of the protein. These conformational transitions are critically important for proteins to function. Understanding protein dynamics can help to understand and interfere with the conformational states and transitions, and thus with the function of the protein. If we can understand the mechanism of conformational transition of a protein, we can design molecules to regulate this process and regulate the protein's functions for new drug discovery. Protein dynamics can be simulated by Molecular Dynamics (MD) simulations.

The MD simulation data generated are spatio-temporal and therefore very high dimensional. To analyze the data, distinguishing the various atomic interactions within a protein by interpreting their 3D coordinate values plays a significant role. Since the data are enormous, the essential step is to find ways to interpret them by developing more efficient algorithms to reduce the dimensionality and user-friendly visualization tools to find patterns and trends that are not usually attainable by traditional methods of data processing. Given the typically allosteric, long-range nature of the interactions that lead to large conformational transitions, pinpointing the underlying forces and pathways responsible for the global conformational transition at the atomic level is very challenging. To address these problems, various analytical techniques are performed on the simulation data to better understand the mechanism of protein dynamics at the atomic level by developing a new program called Probing Long-distance Interactions by Tapping into Paired-Distances (PLITIP), which contains a set of new tools based on the analysis of paired distances to remove the interference of the translation and rotation of the protein itself and therefore capture the absolute changes within the protein.

Firstly, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a distance matrix of all paired residues from our simulation data. This paired-distance matrix is therefore not subject to the interference of the translation or rotation of the protein and can capture the absolute changes within the protein. This matrix is then decomposed by DPD using Principal Component Analysis (PCA) to reduce dimensionality and to capture the largest structural variation. To showcase how DPD works, two protein systems, HIV-1 protease and 14-3-3σ, which both show tremendous structural changes and conformational transitions in their MD simulation trajectories, were analyzed. The largest structural variation and conformational transition were captured by the first principal component in both cases. In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and pinpointed the key candidate regions that might be responsible for the large conformational transitions.

Secondly, to facilitate further analysis and identification of the long-distance path, a tool called the Pearson Coefficient Spiral (PCP), which generates and visualizes Pearson coefficients to measure the linear correlation between any two sets of residue pairs, was developed. PCP allows users to fix one residue pair and examine the correlation of its change with other residue pairs.

Thirdly, a set of visualization tools that generate paired atomic distances for the shortlisted candidate residues and capture significant interactions among them was developed. The first tool is the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), which not only generates paired atomic distances for the shortlisted candidate residues, but also displays significant interactions as a network graph for convenient visualization. Second, the Chord Diagram for Interaction Mapping (CD-IP) was developed to map the interactions to protein secondary structural elements and to further narrow down important interactions. Third, Distance Plotting for Direct Comparison (DP-DC) plots any two paired distances of the user's choice, either at the residue or the atomic level, to facilitate identification of similar or opposite patterns of distance change along the simulation time. All the above tools of PLITIP enabled us to identify critical residues contributing to the large conformational transitions in both the HIV-1 protease and 14-3-3σ proteins.

Besides the above major project, a side project of developing tools to study protein pseudo-symmetry is also reported. It has been proposed that symmetry provides protein stability, opportunities for allosteric regulation, and even functionality. This tool helps us to answer the questions of why there is a deviation from perfect symmetry in proteins and how to quantify it.
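The following Python sketch is not the PLITIP code; it only illustrates the core DPD idea described above, building rotation- and translation-invariant paired distances from a tiny synthetic trajectory and applying PCA to expose the largest structural variation. The trajectory, sizes and library choices are assumptions for illustration.

```python
# A minimal sketch of the paired-distance idea behind DPD (not the PLITIP code):
# per-frame distances between residue pairs are invariant to global rotation and
# translation, and PCA on them highlights the largest structural variation.
# The trajectory below is a tiny synthetic stand-in for MD simulation output.
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_frames, n_residues = 50, 6
# Synthetic trajectory: residue 0 drifts away from the others over time.
traj = rng.normal(scale=0.1, size=(n_frames, n_residues, 3))
traj[:, 0, 0] += np.linspace(0.0, 5.0, n_frames)

pairs = list(combinations(range(n_residues), 2))
paired = np.array([[np.linalg.norm(frame[i] - frame[j]) for i, j in pairs]
                   for frame in traj])            # frames x residue-pair distances

pca = PCA(n_components=1)
pc1 = pca.fit_transform(paired)[:, 0]             # largest structural variation
top_pair = pairs[int(np.argmax(np.abs(pca.components_[0])))]
print("PC1 range:", pc1.min(), pc1.max(), "most-contributing pair:", top_pair)
```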
149

Probabilistic methods for multi-source and temporal biomedical data quality assessment

Sáez Silvestre, Carlos 05 April 2016 (has links)
Nowadays, biomedical research and decision making depend to a great extent on the data stored in information systems. As a consequence, a lack of data quality (DQ) may lead to suboptimal decisions, or hinder the derived research processes and outcomes. This thesis aims at the research and development of methods for assessing two DQ problems of special importance in Big Data and large-scale repositories based on multi-institutional, cross-border infrastructures and acquired during long periods of time: the variability of data probability distributions (PDFs) among different data sources (multi-source variability) and the variability of data PDFs over time (temporal variability). Variability in PDFs may be caused by differences in data acquisition methods, protocols or health care policies; systematic or random errors during data input and management; demographic differences in populations; or even falsified data. To date, these issues have received little attention as DQ problems, and adequate assessment methods for them are lacking. The developed methods aim to measure, detect and characterize such variability, dealing with multi-type, multivariate, multi-modal data, and without being affected by large sample sizes. To this end, we defined an Information Theory and Geometry probabilistic framework based on the inference of non-parametric statistical manifolds from the normalized distances of PDFs among data sources and over time. Based on this, a number of contributions have been generated. For the multi-source variability assessment we have designed two metrics: the Global Probabilistic Deviation, which measures the degree of global variability among the PDFs of multiple sources (equivalent to a standard deviation among PDFs), and the Source Probabilistic Outlyingness, which measures the dissimilarity of the PDF of a single data source to a global latent average. They are based on the construction of a simplex geometrical figure (the maximum-dimensional statistical manifold) using the distances among sources, and are complemented by the Multi-Source Variability plot, an exploratory visualization of that simplex which permits detecting grouping patterns among sources. The temporal variability method provides two main tools: the Information Geometric Temporal plot, an exploratory visualization of the temporal evolution of PDFs based on the projection of the statistical manifold from temporal batches, and the PDF Statistical Process Control, a monitoring and automatic change-detection algorithm for PDFs. The methods have been applied to repositories in real case studies, including the Public Health Mortality and Cancer Registries of the Region of Valencia, Spain; the UCI Heart Disease dataset; the United States NHDS; and Spanish Breast Cancer and In-Vitro Fertilization datasets. The methods permitted discovering findings such as partitions of the repositories into probabilistically separated temporal subgroups, isolated temporal anomalies due to anomalous data, and outlying and clustered data sources due to differences in populations or in practices. A software toolbox including the methods and the automated generation of DQ reports was developed. Finally, we defined the theoretical basis of a biomedical DQ evaluation framework, which has been used in the construction of quality-assured infant feeding repositories, in the contextualization of data for their reuse in Clinical Decision Support Systems using an HL7-CDA wrapper, and in an on-line service for the DQ evaluation and rating of biomedical data repositories. The results of this thesis have been published in eight scientific contributions, including top-ranked journals and conferences. One of the journal publications was selected by the IMIA as one of the best publications in Health Information Systems in 2013. Additionally, the results have contributed to several research projects and have led the way to the industrialization of the developed methods and approaches for the audit and control of biomedical DQ. / Sáez Silvestre, C. (2016). Probabilistic methods for multi-source and temporal biomedical data quality assessment [Doctoral thesis]. Editorial Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62188 / Awarded
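As a much-simplified illustration of the multi-source variability idea (not the thesis's exact GPD and SPO metrics, which are built on non-parametric statistical manifolds and a simplex construction), the following Python sketch computes Jensen-Shannon distances between the distributions of one variable across three invented sources and derives a global variability score and a per-source outlyingness score; it assumes numpy and scipy are available.

```python
# Simplified illustration (not the thesis's exact GPD/SPO metrics): pairwise
# Jensen-Shannon distances between per-source distributions of one variable,
# a global variability score, and a per-source outlyingness score. The three
# "sources" and their age samples are invented.
import numpy as np
from scipy.spatial.distance import jensenshannon

bins = np.arange(0, 101, 10)
sources = {                       # raw ages recorded at three hypothetical sites
    "hospital_A": np.random.default_rng(1).normal(50, 15, 500),
    "hospital_B": np.random.default_rng(2).normal(52, 15, 500),
    "hospital_C": np.random.default_rng(3).normal(70, 10, 500),  # shifted population
}
pdfs = {name: np.histogram(x, bins=bins, density=True)[0] for name, x in sources.items()}

names = list(pdfs)
dist = np.array([[jensenshannon(pdfs[a], pdfs[b]) for b in names] for a in names])

global_variability = dist[np.triu_indices(len(names), k=1)].mean()
outlyingness = {a: dist[i].sum() / (len(names) - 1) for i, a in enumerate(names)}
print(f"global variability: {global_variability:.3f}")
print({k: round(v, 3) for k, v in outlyingness.items()})
```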
150

Creation, deconstruction, and evaluation of a biochemistry animation about the role of the actin cytoskeleton in cell motility

Kevin Wee (11198013) 28 July 2021 (has links)
External representations (ERs) used in science education are multimodal ensembles consisting of design elements that convey educational meanings to the audience. As an example of a dynamic ER, an animation presents its content features (i.e., scientific concepts) by varying the features' depiction over time. A production team invited the dissertation author to inspect their creation of a biochemistry animation about the role of the actin cytoskeleton in cell motility and the animation's implications for learning. To address this, the author developed a four-step methodology entitled the Multimodal Variation Analysis of Dynamic External Representations (MVADER) that deconstructs the animation's content and design to inspect how each content feature is conveyed via the animation's design elements.

This dissertation research investigated the actin animation's educational value and MVADER's utility in animation evaluation. The research design was guided by descriptive case study methodology and an integrated framework consisting of variation theory, multimodal analysis, and visual analytics. As stated above, the animation was analyzed using MVADER. The development of the actin animation and the content features the production team members intended to convey via the animation were studied by analyzing the communication records between the members, observing the team meetings, and interviewing the members individually. Furthermore, students' learning experiences from watching the animation were examined via semi-structured interviews coupled with post-storyboarding. Moreover, the instructions of MVADER and its applications in studying the actin animation were reviewed to determine MVADER's usefulness as an animation evaluation tool.

The findings of this research indicate that the three educators in the production team intended the actin animation to convey forty-three content features to undergraduate biology students. At least 50% of the students who participated in this thesis learned thirty-five of these forty-three (> 80%) features. Evidence suggests that the animation's effectiveness in conveying its features was associated with the features' depiction time, the number of identified design elements applied to depict the features, and the features' variation of depiction over time.

Additionally, one-third of the student participants made similar mistakes regarding two content features after watching the actin animation: F-actin elongation and the F-actin crosslink structure in lamellipodia. The analysis reveals potential design flaws in the animation that might have contributed to these common misconceptions. Furthermore, two disruptors to the creation process and the educational value of the actin animation were identified: the vagueness of the learning goals and the designer's placement of the animation's beauty over its reach toward the learning goals. The vagueness of the learning goals hampered the narration scripting process. On the other hand, the designer's prioritization of the animation's aesthetics led to the inclusion of a "beauty shot" in the animation that caused students' confusion.

MVADER was used to examine the content, the design, and their relationships in the actin animation at multiple aspects and granularities. The result of MVADER was compared with the students' learning outcomes from watching the animation to identify the characteristics of the content's depiction that were constructive and disruptive to learning. These findings led to several practical recommendations for teaching with the actin animation and creating educational ERs.

To conclude, this dissertation discloses the connections between the creation process, the content and design, and the educational implications of a biochemistry animation. It also introduces MVADER as a novel ER analysis tool to the education research and visualization communities. MVADER can be applied to various formats of static and dynamic ERs and beyond the disciplines of biology and chemistry.
