71

Scalable and explainable self-supervised motif discovery in temporal data

Bakhtiari Ramezani, Somayeh 08 December 2023 (has links) (PDF)
The availability of a scalable and explainable rule-extraction technique via motif discovery is crucial for identifying the health states of a system. Such a technique enables building a repository of normal and abnormal system states and identifying the system's state as new data arrive. In complex signals such as the ECG, each activity session can consist of a long sequence of motifs that form different global structures. As a result, applying machine learning algorithms without first identifying the local patterns is not feasible and yields poor performance. Extracting unique local motifs and establishing a database of prototypes or signatures is therefore a crucial first step in analyzing long temporal data: it reduces computational cost and mitigates the effects of imbalanced data. The present research aims to streamline the extraction of motifs and to make their analysis explainable by identifying their differences. We have developed a novel framework for unsupervised motif extraction. We also offer a robust algorithm to identify unique motifs and their signatures, coupled with a proper distance metric to compare the signatures of partially similar motifs. Such a distance metric lets us assign a degree of semblance between two motifs that may have different lengths or contain noise. We tested our framework on five different datasets and observed excellent results, including extraction of motifs from 100 million samples in 8.02 seconds, 99.90% accuracy in self-supervised ECG data classification, and an average error of 16.66% in RUL prediction of bearing failure.
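The abstract does not spell out the distance metric used to compare partially similar motifs; a standard baseline for comparing motifs of different lengths under noise is dynamic time warping (DTW). The sketch below is illustrative only, not the thesis's actual metric, and all names in it are invented:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D motifs.

    Tolerates different lengths and local time shifts, the kind of
    'partial similarity' between motifs the abstract alludes to.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return np.sqrt(cost[n, m])

# Two noisy, differently sampled versions of the same bump
t1, t2 = np.linspace(0, 1, 80), np.linspace(0, 1, 100)
m1 = np.exp(-((t1 - 0.5) / 0.1) ** 2) + 0.02 * np.random.randn(80)
m2 = np.exp(-((t2 - 0.5) / 0.1) ** 2)
print(dtw_distance(m1, m2))  # small despite different lengths and noise
```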
72

Visual Analytics of Big Data from Molecular Dynamics Simulation

Rajendran, Catherine Jenifer Rajam 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Protein malfunction can cause human diseases, which makes proteins targets in the process of drug discovery. In-depth knowledge of how a protein functions can contribute widely to the understanding of the mechanisms of these diseases. Protein functions are determined by protein structures and their dynamic properties. Protein dynamics refers to the constant physical movement of atoms in a protein, which may result in transitions between different conformational states of the protein. These conformational transitions are critically important for proteins to function. Understanding protein dynamics can help us understand and interfere with conformational states and transitions, and thus with the function of the protein. If we can understand the mechanism of a protein's conformational transitions, we can design molecules to regulate this process and thereby regulate protein functions for new drug discovery. Protein dynamics can be simulated by Molecular Dynamics (MD) simulations. The MD simulation data generated are spatial-temporal and therefore very high dimensional. To analyze the data, distinguishing the various atomic interactions within a protein by interpreting their 3D coordinate values plays a significant role. Since the data are enormous, the essential step is to find ways to interpret them by designing more efficient algorithms to reduce dimensionality and by developing user-friendly visualization tools to find patterns and trends that are not usually attainable by traditional methods of data processing. Given the typically allosteric, long-range nature of the interactions that lead to large conformational transitions, pinpointing the underlying forces and pathways responsible for a global conformational transition at the atomic level is very challenging. To address these problems, various analytical techniques are applied to the simulation data to better understand the mechanism of protein dynamics at the atomic level, through a new program called Probing Long-distance Interactions by Tapping into Paired-Distances (PLITIP). PLITIP contains a set of new tools based on the analysis of paired distances, which removes the interference of the translation and rotation of the protein itself and can therefore capture absolute changes within the protein. Firstly, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a distance matrix of all paired residues from our simulation data. This paired-distance matrix is not subject to the interference of the translation or rotation of the protein and can capture absolute changes within the protein. The matrix is then decomposed by DPD using Principal Component Analysis (PCA) to reduce dimensionality and to capture the largest structural variation. To showcase how DPD works, we analyzed two protein systems, HIV-1 protease and 14-3-3σ, both of which display tremendous structural changes and conformational transitions in their MD simulation trajectories. The largest structural variation and conformational transition were captured by the first principal component in both cases. In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and pinned down the key candidate regions that might be responsible for the large conformational transitions.
Secondly, to facilitate further identification of the long-distance path, we developed a tool called Pearson Coefficient Spiral (PCP), which generates and visualizes Pearson coefficients measuring the linear correlation between any two residue pairs. PCP allows users to fix one residue pair and examine the correlation of its change with other residue pairs. Thirdly, we developed a set of visualization tools that generate paired atomic distances for the shortlisted candidate residues and capture significant interactions among them. The first tool is the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), which not only generates paired atomic distances for the shortlisted candidate residues, but also displays significant interactions in a network graph for convenient visualization. Second, the Chord Diagram for Interaction Mapping (CD-IP) maps the interactions onto protein secondary structural elements to further narrow down important interactions. Third, Distance Plotting for Direct Comparison (DP-DC) plots any two paired distances at the user's choice, at either the residue or the atomic level, to facilitate identification of similar or opposite patterns of distance change along the simulation time. All the above PLITIP tools enabled us to identify critical residues contributing to the large conformational transitions in both the HIV-1 protease and 14-3-3σ proteins. Besides this major project, a side project developing tools to study protein pseudo-symmetry is also reported. It has been proposed that symmetry provides protein stability, opportunities for allosteric regulation, and even functionality. This tool helps us answer the questions of why proteins deviate from perfect symmetry and how to quantify that deviation.
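The core of the DPD idea, pairwise residue distances (which are invariant to global translation and rotation) decomposed with PCA, can be sketched as follows. This is a minimal reconstruction on toy data, not the PLITIP code:

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA

def paired_distance_features(traj):
    """traj: (n_frames, n_residues, 3) array of residue coordinates.

    Returns an (n_frames, n_pairs) matrix of inter-residue distances.
    Pairwise distances are unaffected by global translation/rotation,
    which is the property DPD exploits.
    """
    n_frames, n_res, _ = traj.shape
    pairs = list(combinations(range(n_res), 2))
    feats = np.empty((n_frames, len(pairs)))
    for k, (i, j) in enumerate(pairs):
        feats[:, k] = np.linalg.norm(traj[:, i] - traj[:, j], axis=1)
    return feats

# Toy trajectory: 200 frames, 20 residues drifting as a random walk
rng = np.random.default_rng(0)
traj = rng.normal(size=(200, 20, 3)).cumsum(axis=0) * 0.01 \
       + rng.normal(size=(1, 20, 3))
X = paired_distance_features(traj)
pc1 = PCA(n_components=1).fit_transform(X)  # dominant structural variation
print(pc1.shape)  # (200, 1): one PC1 value per frame, usable for ranking
```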
73

Ontology-Based Query Answering for Probabilistic Temporal Data: Extended Version

Koopmann, Patrick 20 June 2022 (has links)
We investigate ontology-based query answering for data that are both temporal and probabilistic, as might occur in contexts such as stream reasoning or situation recognition with uncertain data. We present a framework for representing temporal probabilistic data, and introduce a query language in which complex temporal and probabilistic patterns can be described. Specifically, this language combines conjunctive queries with operators from linear-time logic as well as probability operators. We analyse the complexity of evaluating queries in this language in various settings. While in some cases combining the temporal and the probabilistic dimensions in this way comes at the cost of increased complexity, we also determine cases in which this increase can be avoided. / This is an extended version of the article to appear in the proceedings of AAAI 2019.
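As a toy illustration of mixing a temporal operator with a probability threshold (the paper's actual language and semantics are richer and do not require the independence assumption made here; all names below are invented):

```python
from math import prod

# Probabilistic temporal data: time point -> probability the fact holds
fever = {0: 0.1, 1: 0.7, 2: 0.9, 3: 0.2}

def prob_eventually(assertions, t_start, t_end):
    """P(the fact holds at some time in [t_start, t_end]),
    assuming independence across time points (a simplification)."""
    ps = [p for t, p in assertions.items() if t_start <= t <= t_end]
    return 1.0 - prod(1.0 - p for p in ps)

# Query in the spirit of P>=0.9 (eventually Fever) over time points 0..2:
print(prob_eventually(fever, 0, 2) >= 0.9)  # True: 1 - 0.9*0.3*0.1 = 0.973
```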
74

Some Advanced Semiparametric Single-index Modeling for Spatially-Temporally Correlated Data

Mahmoud, Hamdy F. F. 09 October 2014 (has links)
Semiparametric modeling is a hybrid of parametric and nonparametric modeling, in which some functional forms are known and others are unknown. In this dissertation, we make several contributions to semiparametric modeling based on the single-index model, related to the following three topics: the first is a model for detecting change points simultaneously with estimating the unknown function; the second is two models for spatially correlated data; and the third is two further models for spatially-temporally correlated data. For the first topic, we propose a unified approach that simultaneously estimates the nonlinear relationship and the change points: a single-index change point model that adjusts for several other covariates. We estimate the unknown function nonparametrically using kernel smoothing and provide a permutation-based testing procedure to detect multiple change points, whose asymptotic properties we establish. The advantage of our approach is demonstrated using mortality data from Seoul, Korea, from January 2000 to December 2007. On the second topic, we propose two semiparametric single-index models for spatially correlated data. One additively separates the nonparametric function from the spatially correlated random effects, while the other does not. We estimate these two models using two algorithms based on the Markov Chain Expectation Maximization (MCEM) algorithm. Comparisons via simulation suggest that the semiparametric single-index nonadditive model provides more accurate estimates of spatial correlation. The advantage of our approach is demonstrated using mortality data from six cities in Korea over the same period. The third topic involves two semiparametric single-index models for spatially and temporally correlated data. In the first model, the nonparametric function is separable from the spatially and temporally correlated random effects; we refer to it as the semiparametric spatio-temporal separable single-index model (SSTS-SIM). The second model does not separate the nonparametric function from the spatially correlated random effects but does separate the time random effects; we refer to it as the semiparametric nonseparable single-index model (SSTN-SIM). Two algorithms based on the MCEM algorithm are introduced to simultaneously estimate parameters, spatial effects, and time effects. The proposed models are then applied to the mortality data of six major cities in Korea. Our results suggest that SSTN-SIM is more flexible than SSTS-SIM, because it can estimate various nonparametric functions whereas SSTS-SIM enforces similar nonparametric curves; SSTN-SIM also provides better estimation and prediction. / Ph. D.
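The kernel-smoothing step common to these single-index models can be illustrated with a Nadaraya-Watson estimate of the unknown link function, assuming the index direction is already known. A simplified sketch with invented names, not the dissertation's MCEM algorithms:

```python
import numpy as np

def nw_link_estimate(u_grid, u, y, h):
    """Nadaraya-Watson kernel estimate of the unknown link g in a
    single-index model y = g(x'beta) + eps, evaluated at u_grid.
    u holds the index values x'beta; h is the bandwidth."""
    w = np.exp(-0.5 * ((u_grid[:, None] - u[None, :]) / h) ** 2)  # Gaussian weights
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(1)
beta = np.array([0.6, 0.8])                 # index direction (assumed known here)
X = rng.uniform(-2, 2, size=(500, 2))
u = X @ beta
y = np.sin(u) + 0.1 * rng.normal(size=500)  # true link g = sin
grid = np.linspace(-2, 2, 41)
g_hat = nw_link_estimate(grid, u, y, h=0.2)
print(np.mean((g_hat - np.sin(grid)) ** 2))  # small if the link is recovered
```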
75

Digital Intelligence – Möglichkeiten und Umsetzung einer informatikgestützten Frühaufklärung / Digital Intelligence – opportunities and implementation of a data-driven foresight

Walde, Peter 18 January 2011 (has links) (PDF)
The goal of Digital Intelligence, i.e. data-driven strategic foresight, is to support the shaping of the future on the basis of valid and well-founded digital information, with comparatively little effort and enormous savings in time and cost. Innovative technologies for (semi-)automatic language and data processing help here, e.g. information retrieval, (temporal) data, text and web mining, information visualization, conceptual structures, and informetrics. They make it possible to recognize key topics and latent relationships in good time within an unmanageably large, distributed and inhomogeneous mass of data, such as patents, scientific publications, press documents or web content, and to deliver them quickly and in a targeted manner. Digital Intelligence thus renders intuitively sensed patterns and developments explicit and measurable. This research aims, first, to show what computer science can contribute to data-driven foresight and, second, to implement it in a pragmatic context. Its starting point is an introduction to the discipline of strategic foresight and its data-driven branch, Digital Intelligence. The theoretical and, in particular, computer-science foundations of foresight are discussed and classified, above all the possibilities of time-oriented data exploration. Various methods and software tools are designed and developed that support the time-oriented exploration of, in particular, unstructured text data (temporal text mining). Only approaches that can be used pragmatically in the context of a large institution and under the specific requirements of strategic foresight are considered. Worth highlighting are a platform for collective search and an innovative method for identifying weak signals. Finally, a Digital Intelligence service is presented and discussed that was successfully implemented on this basis in a global technology-oriented corporation and that enables systematic competitor, market and technology analysis on the basis of the digital traces people leave behind.
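One toy operationalization of a "weak signal", a term that is still rare but growing quickly in the document stream, might look as follows. This is a simplification for illustration, not the method developed in the thesis, and the data and names are invented:

```python
from collections import Counter

# Yearly document-frequency counts per term (toy data; the thesis works
# on patents, scientific publications, press documents and web content)
term_counts = {
    2006: Counter({"hybrid drive": 3, "fuel cell": 40}),
    2007: Counter({"hybrid drive": 9, "fuel cell": 42}),
    2008: Counter({"hybrid drive": 27, "fuel cell": 41}),
}

def weak_signals(series, min_growth=2.0, max_base=10):
    """Flag terms that are still rare (low base count) but grow fast
    over the observed years -- one crude notion of a 'weak signal'."""
    years = sorted(series)
    first, last = series[years[0]], series[years[-1]]
    return [t for t, c in last.items()
            if first.get(t, 1) <= max_base
            and c / max(first.get(t, 1), 1) >= min_growth]

print(weak_signals(term_counts))  # ['hybrid drive']
```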
76

Discovering Frequent Episodes: Fast Algorithms, Connections With HMMs And Generalizations

Laxman, Srivatsan 03 1900 (has links)
Temporal data mining is concerned with the exploration of large sequential (or temporally ordered) data sets to discover nontrivial information that was previously unknown to the data owner. Sequential data sets come up naturally in a wide range of application domains, from bioinformatics to manufacturing processes. Pattern discovery refers to a broad class of data mining techniques whose objective is to unearth hidden patterns or unexpected trends in the data. In general, pattern discovery is about finding all patterns of 'interest' in the data, and one popular measure of interestingness for a pattern is its frequency in the data. The problem of frequent pattern discovery is to find all patterns in the data whose frequency exceeds some user-defined threshold. Discovery of temporal patterns that occur frequently in sequential data has received a lot of attention in recent times. Different approaches consider different classes of temporal patterns and propose different algorithms for their efficient discovery from the data. This thesis is concerned with a specific class of temporal patterns called episodes and their discovery in large sequential data sets. In the framework of frequent episode discovery, data (referred to as an event sequence or an event stream) are available as a single long sequence of events. The ith event in the sequence is an ordered pair, (Ei, ti), where Ei takes values from a finite alphabet (of event types) and ti is the time of occurrence of the event. The events in the sequence are ordered according to these times of occurrence. An episode (the temporal pattern considered in this framework) is a (typically) short partially ordered sequence of event types. Formally, an episode is a triple, (V, <, g), where V is a collection of nodes, < is a partial order on V and g is a map that assigns an event type to each node of the episode. When < is total, the episode is referred to as a serial episode, and when < is trivial (or empty), the episode is referred to as a parallel episode. An episode is said to occur in an event sequence if there are events in the sequence with event types the same as those constituting the episode, and with times of occurrence respecting the partial order in the episode. The frequency of an episode is some measure of how often it occurs in the event sequence. Given a frequency definition for episodes, the task is to discover all episodes whose frequencies exceed some threshold. This is done using a level-wise procedure. In each level, a candidate generation step combines frequent episodes from the previous level to build candidates of the next larger size, and a frequency counting step then makes one pass over the event stream to determine the frequencies of all the candidates and thus identify the frequent episodes. Frequency counting is the main computationally intensive step in frequent episode discovery, and the choice of frequency definition for episodes has a direct bearing on the efficiency of the counting procedure. In the original framework of frequent episode discovery, episode frequency is defined as the number of fixed-width sliding windows over the data in which the episode occurs at least once. Under this frequency definition, frequency counting of a set of |C| candidate serial episodes of size N has space complexity O(N|C|) and time complexity O(ΔT N|C|) (where ΔT is the difference between the times of occurrence of the last and the first events in the data stream).
The other main frequency definition available in the literature defines episode frequency as the number of minimal occurrences of the episode (where a minimal occurrence is a window on the time axis containing an occurrence of the episode such that no proper sub-window of it contains another occurrence of the episode). The algorithm for obtaining frequencies of a set of |C| episodes needs O(n|C|) time (where n denotes the number of events in the data stream). While this is time-wise better than the windows-based algorithm, the space needed to locate minimal occurrences of an episode can be very high (and is in fact of the order of the length, n, of the event stream). This thesis proposes a new definition for episode frequency based on the notion of what are called non-overlapped occurrences of episodes in the event stream. Two occurrences are said to be non-overlapped if no event corresponding to one occurrence appears in between events corresponding to the other. The frequency of an episode is defined as the maximum possible number of non-overlapped occurrences of the episode in the data. The thesis also presents algorithms for efficient frequent episode discovery under this frequency definition. The space and time complexities for frequency counting of serial episodes are O(|C|) and O(n|C|) respectively (where n denotes the total number of events in the given event sequence and |C| denotes the number of candidate episodes). These are arguably the best possible space and time complexities achievable for the frequency counting step. Also, the fact that the time needed by the non-overlapped occurrences-based algorithm is linear in the number of events, n, in the event sequence (rather than in the difference, ΔT, between the occurrence times of the first and last events in the data stream, as is the case with the windows-based algorithm) can result in a considerable time advantage when the number of time ticks far exceeds the number of events in the event stream. The thesis also presents efficient algorithms for frequent episode discovery under expiry time constraints (according to which an occurrence of an episode can be counted towards its frequency only if the total time span of the occurrence is less than a user-defined threshold). It is shown through simulation experiments that, in terms of actual run-times, frequent episode discovery under the non-overlapped occurrences-based frequency (using the algorithms developed here) is much faster than existing methods. A second frequency measure is also proposed in this thesis, based on what are termed non-interleaved occurrences of episodes in the data. This definition counts certain kinds of overlapping occurrences of the episode. The time needed is linear in the number of events, n, in the data sequence, the size, N, of the episodes and the number of candidates, |C|. Simulation experiments show that run-time performance under this frequency definition is slightly inferior to that under the non-overlapped occurrences-based frequency, but still better than run-times under the windows-based frequency.
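For serial episodes, the non-overlapped frequency can be counted with a single greedy left-to-right scan, which is what gives the O(n) time and O(1) per-episode space quoted above. A minimal sketch of the counting idea (not the thesis's code):

```python
def count_nonoverlapped(events, episode):
    """Maximum number of non-overlapped occurrences of a serial episode
    (a totally ordered tuple of event types) in an event stream.
    One greedy left-to-right pass: O(n) time, O(1) space per episode.

    events: iterable of (event_type, time) pairs in time order.
    """
    need = 0    # index of the next episode event type we are waiting for
    count = 0
    for etype, _t in events:
        if etype == episode[need]:
            need += 1
            if need == len(episode):  # a full occurrence just completed
                count += 1
                need = 0              # restart; earlier events are not reused
    return count

stream = [("A", 1), ("B", 2), ("D", 3), ("C", 4), ("A", 5), ("B", 6), ("C", 7)]
print(count_nonoverlapped(stream, ("A", "B", "C")))  # 2
```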
This thesis also establishes the following interesting property connecting the non-overlapped, non-interleaved and minimal occurrences-based frequencies of an episode in the data: the number of minimal occurrences of an episode is bounded below by the maximum number of non-overlapped occurrences of the episode, and is bounded above by the maximum number of non-interleaved occurrences of the episode in the data. Hence, the non-interleaved occurrences-based frequency is an efficient alternative to that based on minimal occurrences. In addition to being superior in both time and space complexity to all other existing algorithms for frequent episode discovery, the non-overlapped occurrences-based frequency has another very important property: it facilitates a formal connection between discovering frequent serial episodes in data streams and learning or estimating a model for the data generation process in terms of certain kinds of Hidden Markov Models (HMMs). In order to establish this connection, a special class of HMMs, called Episode Generating HMMs (EGHs), is defined. The symbol set for the HMM is chosen to be the alphabet of event types, so that the output of an EGH can be regarded as an event stream in the frequent episode discovery framework. Given a serial episode, α, that occurs in the event stream, a method is proposed to uniquely associate it with an EGH, Λα. Consider two N-node serial episodes, α and β, whose (non-overlapped occurrences-based) frequencies in the given event stream, o, are fα and fβ respectively, and let Λα and Λβ be the EGHs associated with α and β. The main result connecting episodes and EGHs states that the joint probability of o and the most likely state sequence for Λα is greater than the corresponding probability for Λβ if and only if fα is greater than fβ. This theoretical connection has some interesting consequences. First of all, since the most frequent serial episode is associated with the EGH having the highest data likelihood, frequent episode discovery can now be interpreted as a generative model learning exercise. More importantly, it is now possible to derive a formal test of significance for serial episodes in the data, which prescribes, for a given size of the test, the minimum frequency needed for an episode to be declared statistically significant. Note that this significance test for serial episodes does not require any separate model estimation (or training); the only quantity required to assess the significance of an episode is its non-overlapped occurrences-based frequency, obtained through the usual counting procedure. The significance test also helps to automatically fix the frequency threshold for the frequent episode discovery process, leading to what may be termed parameterless data mining. In the framework considered so far, the input to the frequent episode discovery process is a sequence of instantaneous events. However, in many applications events tend to persist for different periods of time, and these durations may carry important information from a data mining perspective. This thesis extends the framework of frequent episodes to incorporate such duration information directly into the definition of episodes, so that the patterns discovered now carry this duration information as well.
Each event in this generalized framework is a triple, (Ei, ti, τi), where Ei, as earlier, is the event type (from some finite alphabet) corresponding to the ith event, and ti and τi denote the start and end times of this event. The new temporal pattern, called the generalized episode, is a quadruple, (V, <, g, d), where V, < and g, as earlier, respectively denote a collection of nodes, a partial order over this collection and a map assigning event types to nodes. The new feature in the generalized episode is d, a map from V to 2^I, where I denotes a user-defined collection of time-interval possibilities for event durations. An occurrence of a generalized episode in the event sequence consists of events with both 'correct' event types and 'correct' time durations, appearing in the event sequence in the 'correct' time order. All frequency definitions for episodes over instantaneous event streams apply to generalized episodes as well, and the algorithms for frequent episode discovery also extend easily to the case of generalized episodes. The extra design choice the user has in this generalized framework is the set, I, of time-interval possibilities. This can be used to orient and focus the frequent episode discovery process on temporal correlations involving only time durations that are of interest. Through extensive simulations, the utility and effectiveness of the generalized framework are demonstrated. The new algorithms for frequent episode discovery presented in this thesis are used to develop an application for temporal data mining of data from car engine manufacturing plants. Engine manufacturing is a heavily automated and complex distributed controlled process, with large amounts of fault data logged each day. The goal of temporal data mining here is to unearth strong time-ordered correlations in the data which can facilitate quick diagnosis of the root causes of persistent problems and predict major breakdowns well in advance. This thesis presents an application of the algorithms developed here to such analysis of the fault data. The data consist of time-stamped faults logged in the car engine manufacturing plants of General Motors. Each fault is logged using an extensive list of codes (which constitutes the alphabet of event types for frequent episode discovery). Frequent episodes in fault logs represent temporal correlations among faults, and these can be used for fault diagnosis in the plant. This thesis describes how the outputs from the frequent episode discovery framework can be used to help plant engineers interpret the large volumes of faults logged, in an efficient and convenient manner. Such a system, based on the algorithms developed in this thesis, is currently in use in one of the engine manufacturing plants of General Motors. Some examples of results that the plant engineers regarded as useful are also presented.
77

應用在空間認知發展的學習歷程分析之高效率空間探勘演算法 / Efficient Mining of Spatial Co-orientation Patterns for Analyzing Portfolios of Spatial Cognitive Development

魏綾音, WEI, LING-YIN Unknown Date (has links)
Spatial cognition concerns how humans interpret spatial complexity: through memory and sensory experience of interacting with the environment, people internalize and reconstruct the spatial relations between objects. Cognitive maps are most commonly used to assess spatial cognition, and analyzing the cognitive maps drawn by students helps teachers understand students' spatial cognitive ability and draft geography teaching plans. These cognitive maps constitute the portfolios of spatial cognitive development, and with the advance of e-learning technology we can analyze such portfolios by spatial data mining of the map images. Spatial data mining is an important task of discovering interesting and meaningful patterns from spatial or image databases. In this thesis, we investigate spatial co-orientation patterns for analyzing portfolios of spatial cognitive development. Spatial co-orientation patterns refer to objects that frequently occur with the same relative spatial orientation (e.g., left, right, below) across images; for example, an object P frequently appears to the left of an object Q. We use the 2D string data structure to represent the spatial orientation of objects in an image, and propose a pattern-growth approach for mining co-orientation patterns. An experimental evaluation with synthetic datasets shows the advantages and disadvantages of the pattern-growth approach compared with the Apriori-based approach proposed by Huang [14]. Moreover, we extend the concept of spatial co-orientation patterns to temporal patterns: temporal co-orientation patterns describe how spatial co-orientation patterns change over time. Two kinds of such patterns, coarse and fine temporal co-orientation patterns, are introduced for extraction from spatio-temporal databases. We propose three-stage algorithms, CTPMiner and FTPMiner, for mining coarse and fine temporal co-orientation patterns, respectively, and an experimental evaluation with synthetic datasets shows the performance of these algorithms.
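The simplest instance of the idea, counting how often one object appears with a fixed orientation relative to another across images, can be sketched as follows. This covers pairwise left-of relations only; the thesis's 2D-string and pattern-growth machinery generalizes it. The data and names are invented:

```python
from collections import Counter
from itertools import combinations

# Each image maps object labels to (x, y) positions -- a toy stand-in
# for the objects recognized in a student's cognitive map.
images = [
    {"house": (1, 2), "tree": (4, 2), "river": (7, 1)},
    {"house": (0, 5), "tree": (3, 4), "lake": (6, 3)},
    {"tree": (2, 1), "house": (5, 2)},
]

def left_of_pairs(objects):
    """All (p, q) pairs with p strictly to the left of q in one image."""
    pairs = set()
    for (p, (xp, _)), (q, (xq, _)) in combinations(objects.items(), 2):
        if xp < xq:
            pairs.add((p, q))
        elif xq < xp:
            pairs.add((q, p))
    return pairs

support = Counter()
for img in images:
    support.update(left_of_pairs(img))   # each image counted once per pair

min_support = 2
print([pair for pair, s in support.items() if s >= min_support])
# [('house', 'tree')] -- 'house left of tree' holds in 2 of the 3 images
```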
78

Learning with Sparsity: Structures, Optimization and Applications

Chen, Xi 01 July 2013 (has links)
The development of modern information technology has enabled the collection of data of unprecedented size and complexity. Examples include web text data, microarray & proteomics data, and data from scientific domains (e.g., meteorology). When learning from these high-dimensional and complex data, traditional machine learning techniques often suffer from the curse of dimensionality and unaffordable computational cost. Yet learning from large-scale high-dimensional data promises big payoffs in text mining, gene analysis, and numerous other consequential tasks. Recently developed sparse learning techniques provide us with a suite of tools for understanding and exploring high-dimensional data from many areas in science and engineering. By exploiting sparsity, we can always learn a parsimonious and compact model which is more interpretable and computationally tractable at application time. When the underlying model is known to be indeed sparse, sparse learning methods can provide a more consistent model and much-improved prediction performance. However, existing methods are still insufficient for modeling complex or dynamic structures in the data, such as those evidenced in pathways of genomic data, gene regulatory networks, and synonyms in text data. This thesis develops structured sparse learning methods, along with scalable optimization algorithms, to explore and predict high-dimensional data with complex structures. In particular, we address three aspects of structured sparse learning: 1. efficient and scalable optimization methods with fast convergence guarantees for a wide spectrum of high-dimensional learning tasks, including single- or multi-task structured regression, canonical correlation analysis, and online sparse learning; 2. learning dynamic structures of different types of undirected graphical models, e.g., conditional Gaussian or conditional forest graphical models; 3. demonstrating the usefulness of the proposed methods in various applications, e.g., computational genomics and spatial-temporal climatological data. In addition, we design specialized sparse learning methods for text mining applications, including ranking and latent semantic analysis. The last part of the thesis presents future directions for high-dimensional structured sparse learning from both computational and statistical perspectives.
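A small concrete instance of point 1, a first-order proximal-gradient (ISTA) solver for the lasso, illustrates the kind of scalable sparse-learning optimization the thesis is concerned with. This is a generic sketch, not code from the thesis:

```python
import numpy as np

def ista_lasso(X, y, lam, iters=500):
    """Proximal gradient (ISTA) for the lasso:
        min_w 0.5 * ||X w - y||^2 + lam * ||w||_1
    Each iteration is a gradient step on the smooth part followed by
    soft-thresholding, the proximal operator of the l1 norm.
    """
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L, L = largest eigenvalue of X'X
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        z = w - step * (X.T @ (X @ w - y))                        # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]                # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=200)
w_hat = ista_lasso(X, y, lam=5.0)
print(np.nonzero(np.abs(w_hat) > 1e-6)[0])   # mostly {0, 1, 2}
```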
79

ANÁLISE MULTITEMPORAL DA COBERTURA FLORESTAL DA MICROBACIA DO ARROIO GRANDE, SANTA MARIA, RS / MULTI-TEMPORAL ANALYSIS OF FOREST COVERING FROM ARROIO GRANDE WATERSHED, SANTA MARIA, RS.

Kleinpaul, Joel Juliano 20 December 2005 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / This study aimed to perform a multi-temporal analysis of the Arroio Grande watershed, located in Santa Maria, RS, in order to detect changes in forest cover, locate and quantify them, and monitor deforestation and regeneration processes and their main determinants. Four satellite images were used: LANDSAT 5 (1987), LANDSAT 5 (1995), LANDSAT 7 (2002) and CBERS 2 (2005). The SPRING software was used to build the cartographic database and to digitally process the images. The images were segmented with a similarity threshold of 10 and an area threshold of 20, and classified with the Bhattacharya algorithm into the following land uses: forest, field, exposed soil, agriculture, irrigated agriculture and water. After classification, the thematic maps were cross-tabulated using LEGAL programming, yielding maps of forest maintenance, regeneration and deforestation, i.e., what remained unchanged between dates, what regenerated, and what was deforested. Over the 18-year period, forest cover increased by 25.59% (10.24% of the watershed area), mainly on the hillside escarpment and the plateau, growing from 14,135.42 ha (40.01%) in 1987 to 17,752.20 ha (50.25%) in 2005. However, there is still a large deficit of riparian forest in the plain (depression), mainly due to rice cultivation. The results show the potential of remote sensing and geoprocessing techniques for land-use mapping, and can support research, territorial planning, economic development and environmental preservation in this region. With the database generated, it will be possible to build models capable of simulating forest-cover dynamics in the study area.
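The LEGAL map-cross described above is, at its core, a per-pixel cross-tabulation of two classified rasters. A toy numpy sketch of the forest-change step (invented 3x3 rasters, not the study's data):

```python
import numpy as np

# Toy classified rasters for two dates: 1 = forest, 0 = non-forest
# (stand-ins for the 1987 and 2005 classifications)
lc_1987 = np.array([[1, 1, 0],
                    [0, 1, 0],
                    [0, 0, 1]])
lc_2005 = np.array([[1, 0, 0],
                    [1, 1, 1],
                    [0, 0, 1]])

maintenance   = (lc_1987 == 1) & (lc_2005 == 1)  # forest at both dates
deforestation = (lc_1987 == 1) & (lc_2005 == 0)  # forest lost
regeneration  = (lc_1987 == 0) & (lc_2005 == 1)  # forest gained

pixel_ha = 0.09  # one 30 m Landsat pixel covers 0.09 ha
for name, mask in [("maintenance", maintenance),
                   ("deforestation", deforestation),
                   ("regeneration", regeneration)]:
    print(name, round(mask.sum() * pixel_ha, 2), "ha")
```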
80

Les figures de la discontinuité dans le développement résidentiel périurbain : application à la région Limousin / Discontinuous urban patterns of peri-urban residential development: Application to the Limousin region

Reux, Sara 16 January 2015 (has links)
While understanding urban areas through the continuity of developed land has reached its limits, the discontinuity of urban fabrics has become a key to understanding today's cities and the dynamics that shape them. It attracts researchers' interest, especially as the spread of geographic information systems offers new opportunities to measure urban patterns. While research in landscape ecology and geography makes it possible to measure discontinuous patterns, it is important to focus on their economic foundations, which recent empirical work in economics has begun to address. The construction of an analytical grid of discontinuous urban patterns allows us to understand simultaneously the formation of peri-urban areas and the patterns of residential development at the parcel level. This research is applied to the Limousin region over the period 1950-2009. The focus on discontinuous urban patterns sheds light on the residential development trajectories of the region's communes. A spatio-temporal database makes it possible to read these trajectories through combined measures of the geographical dispersion and the morphological dispersion of housing. Building on these measures, we examine the link between the functional and morphological dynamics of residential development using a multi-theme database. To understand household location patterns and residential dispersion, we analyze in particular the issues of housing production, the interaction between land-ownership structure and public regulation at the scale of communes, and the influence of the amenities and disamenities of urban and rural spaces on the dispersion of housing.
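One standard way to quantify the geographical dispersion of dwellings, in the spirit of the measures combined here, is the Clark-Evans nearest-neighbour index. A sketch on synthetic points; the thesis's own dispersion measures may differ:

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbour_index(points, area):
    """Clark-Evans nearest-neighbour index: the ratio of the observed
    mean nearest-neighbour distance to the one expected under complete
    spatial randomness. R < 1 -> clustered, R > 1 -> dispersed.
    """
    pts = np.asarray(points, dtype=float)
    tree = cKDTree(pts)
    d, _ = tree.query(pts, k=2)      # k=2: the first neighbour is the point itself
    observed = d[:, 1].mean()
    expected = 0.5 / np.sqrt(len(pts) / area)
    return observed / expected

rng = np.random.default_rng(0)
scattered = rng.uniform(0, 1000, size=(100, 2))  # dispersed dwellings
clustered = rng.normal(500, 30, size=(100, 2))   # one tight hamlet
print(nearest_neighbour_index(scattered, 1e6))   # close to 1
print(nearest_neighbour_index(clustered, 1e6))   # well below 1
```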
