311

Framställning av en GIS-metod samt analys av ingående parametrar för att lokalisera representativa delområden av ett avrinningsområde för snödjupsmätningar / Development of a GIS method and analysis of input parameters to locate representative sub-areas of a catchment area for snow depth measurements

Kaplin, Jennifer, Leierdahl, Lisa January 2022
Vattenkraft är en stor källa till energi i Sverige, främst i de norra delarna av landet. För att få ut maximal potential från vattenkraftverken behövs information om hur mycket vatten eller snö det finns uppströms från kraftverken. Genom att få fram tillförlitliga värden av snömängd är det möjligt att minska osäkerheten i uppskattningarna. Eftersom det är svårt att kartera större avrinningsområden via markbundna observationer, både praktiskt och ekonomiskt, har drönarobservationer utvecklats. För att använda sig av drönare krävs det vetskap om var de ska flygas i för område för att hela avrinningsområdet ska representeras. I projektet tas en modell fram i ArcGIS för att hitta mindre områden inom avrinningsområden som ska vara representativa inom utvalda parametrar. I projektet berörs parametrarna vegetation, höjd, lutningsgrad samt dess riktning. Arbetet för att ta fram en modell som ska underlätta framtida arbete inom och utanför forskningsprojektet DRONES är uppdelat i två delar. Den första delen är att ta fram och granska vilka parametrar som påverkar snödjupet i avrinningsområdet. Den andra delen innefattar arbetet med att skapa en modell i ArcGIS som ska analysera ett avrinningsområde med framtagna parametrar för att hitta mindre områden som representerar det hela. Resultatet från de framtagna modellerna kan tillämpas för att underlätta kartläggningen och snödjupsmätningar i avrinningsområden, vilket kan utnyttjas vid effektivisering av vattenreglering. / Hydropower is a major source of energy in Sweden, mainly in the northern parts of the country. To get the maximum potential from the hydropower plants, information is required on how much water or snow there is upstream from the power plants. By obtaining reliable values of the amount of snow, it is possible to reduce the uncertainty in forecasts on spring flood. Due to difficulties in mapping larger catchment areas via ground-level observations, drone observations have been developed. In order to use drone observations, knowledge of where they are to be flown to represent the entire catchment area is required. In this project, a model was developed in ArcGIS to find smaller areas within catchments that are to be representative within selected parameters. The project touches upon the parameters vegetation, height, slope and aspect. The work to develop a model that will facilitate future work within and outside the DRONES research project is divided into two parts. The first part is to analyze which parameters affect the snow depth in the catchment area. The second part consists of creating a model in ArcGIS that will find a smaller area inside a catchment that represents the snow depth for the whole catchment. The results from the developed model can be applied to facilitate the mapping and snow depth measurements in catchment areas, which can be used to streamline water regulation.
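The thesis builds its model in ArcGIS; purely as an illustration of the underlying idea (not the thesis's actual toolchain), the sketch below scores how representative a candidate sub-area is by comparing its distribution of one raster parameter (elevation, slope, aspect, or vegetation class) with the distribution over the whole catchment. The function name, the synthetic raster, and all parameter values are hypothetical.

```python
import numpy as np

def representativeness(sub, whole, bins=10):
    """Histogram overlap between a candidate sub-area and the full catchment
    for one raster parameter. A value of 1.0 means the sub-area's distribution
    matches the catchment-wide distribution exactly."""
    lo, hi = whole.min(), whole.max()
    h_sub, _ = np.histogram(sub, bins=bins, range=(lo, hi))
    h_all, _ = np.histogram(whole, bins=bins, range=(lo, hi))
    h_sub = h_sub / h_sub.sum()
    h_all = h_all / h_all.sum()
    return float(np.minimum(h_sub, h_all).sum())

rng = np.random.default_rng(1)
elevation = rng.normal(600, 150, size=(200, 200))   # synthetic catchment raster
window = elevation[40:80, 100:140]                   # one candidate sub-area
print(round(representativeness(window, elevation), 3))
```

A full analogue of the model would compute such a score for each parameter and each candidate window and rank the candidates, but that ranking logic is omitted here.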
312

Efficient Graph Summarization of Large Networks

Hajiabadi, Mahdi 24 June 2022
In this thesis, we study the notion of graph summarization, the fundamental task of finding a compact representation of the original graph, called the summary. Graph summarization can be used for reducing the footprint of the input graph, better visualization, anonymizing the identity of users, and query answering. We consider two frameworks of graph summarization in this thesis: the utility-based framework and the correction set-based framework. In the utility-based framework, the input graph is summarized as long as a utility threshold is not violated. In the correction set-based framework, a set of correction edges is produced along with the summary graph. In this thesis we propose two algorithms for the utility-based framework and one for the correction set-based framework. All three of these algorithms are for static graphs (i.e. graphs that do not change over time). Then, we propose two more utility-based algorithms for fully dynamic graphs (i.e. graphs with edge insertions and deletions). Algorithms for graph summarization can be lossless (summarizing the input graph without losing any information) or lossy (losing some information about the input graph in order to summarize it further). Some of our algorithms are lossless and some lossy, but with controlled utility loss. Our first utility-driven graph summarization algorithm, G-SCIS, is based on a clique and independent set decomposition that produces optimal compression with zero loss of utility. The compression provided is significantly better than the state of the art in lossless graph summarization, while the runtime is two orders of magnitude lower. Our second algorithm is T-BUDS, a highly scalable, utility-driven algorithm for fully controlled lossy summarization. It achieves high scalability by combining memory reduction using a Maximum Spanning Tree with a novel binary search procedure. T-BUDS drastically outperforms the state of the art in terms of the quality of summarization and is about two orders of magnitude faster. In contrast to the competition, we are able to handle web-scale graphs on a single machine without performance impediment as the utility threshold (and size of summary) decreases. Also, we show that our graph summaries can be used as-is to answer several important classes of queries, such as triangle enumeration, PageRank and shortest paths. We then propose algorithm LDME, a correction set-based graph summarization algorithm that produces compact output representations in a fast and scalable manner. To achieve this, we introduce (1) weighted locality sensitive hashing to drastically reduce the number of comparisons required to find good node merges, (2) an efficient way to compute the best quality merges that produces more compact outputs, and (3) a new sort-based encoding algorithm that is faster and more robust. Additionally, our algorithm provides performance tuning settings to allow the option of trading compression for running time. On high compression settings, LDME achieves compression equal to or better than the state of the art with up to 53x speedup in running time. On high speed settings, LDME achieves up to two orders of magnitude speedup with only slightly lower compression. We also present two lossless summarization algorithms, Optimal and Scalable, for summarizing fully dynamic graphs. More concretely, we follow the framework of G-SCIS, which produces summaries that can be used as-is in several graph analytics tasks.
Different from G-SCIS, which is a batch algorithm, Optimal and Scalable are fully dynamic and can respond rapidly to each change in the graph. Not only are Optimal and Scalable able to outperform G-SCIS and other batch algorithms by several orders of magnitude, but they also significantly outperform MoSSo, the state of the art in lossless dynamic graph summarization. While Optimal always produces the optimal summary, Scalable is able to trade the amount of node reduction for extra scalability. For reasonable values of the parameter $K$, Scalable outperforms Optimal by an order of magnitude in speed, while keeping the rate of node reduction close to that of Optimal. An interesting fact that we observed experimentally is that even if we were to run a batch algorithm, such as G-SCIS, once for every large batch of changes, it would still be much slower than Scalable. For instance, if 1 million changes occur in a graph, Scalable is two orders of magnitude faster than running G-SCIS just once at the end of the 1 million-edge sequence.
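As background on the correction set-based framework mentioned above, the sketch below shows one common way such summaries are represented: supernodes plus correction edges, from which the original edge set can be rebuilt. It is an illustrative data layout only, not the G-SCIS, T-BUDS, or LDME algorithms themselves, and the toy example is invented.

```python
from itertools import combinations, product

def reconstruct(supernodes, superedges, corrections_add, corrections_del):
    """Rebuild the original edge set from a correction set-based summary.

    supernodes: dict supernode_id -> set of original node ids
    superedges: set of frozenset({a, b}) over supernode ids; a == b means the
                supernode's members form a clique
    corrections_add / corrections_del: sets of frozenset({u, v}) over original nodes
    """
    edges = set()
    for se in superedges:
        ids = tuple(se)
        if len(ids) == 1:                      # self-superedge: internal clique
            edges.update(frozenset(p) for p in combinations(supernodes[ids[0]], 2))
        else:                                  # all cross pairs are implied
            a, b = ids
            edges.update(frozenset(p) for p in product(supernodes[a], supernodes[b]))
    edges |= corrections_add                   # C+: edges the summary misses
    edges -= corrections_del                   # C-: edges the summary over-implies
    return edges

# Toy example: supernode 0 = {1, 2, 3} is a clique except edge (2, 3),
# and node 4 connects only to node 1.
supernodes = {0: {1, 2, 3}, 1: {4}}
superedges = {frozenset({0}), frozenset({0, 1})}
print(reconstruct(supernodes, superedges,
                  corrections_add=set(),
                  corrections_del={frozenset({2, 3}), frozenset({2, 4}), frozenset({3, 4})}))
```

The quality of such a summary is usually judged by how few supernodes and correction edges are needed, which is the quantity the algorithms above optimize.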
313

Similarity measures for scientific workflows

Starlinger, Johannes 08 January 2016
Im Laufe der letzten zehn Jahre haben Scientific Workflows als Werkzeug zur Erstellung von reproduzierbaren, datenverarbeitenden in-silico Experimenten an Aufmerksamkeit gewonnen, in die sowohl lokale Skripte und Anwendungen, als auch Web-Services eingebunden werden können. Über spezialisierte Online-Bibliotheken, sogenannte Repositories, können solche Workflows veröffentlicht und wiederverwendet werden. Mit zunehmender Größe dieser Repositories werden Ähnlichkeitsmaße für Scientific Workflows notwendig, etwa für Duplikaterkennung, Ähnlichkeitssuche oder Clustering von funktional ähnlichen Workflows. Die vorliegende Arbeit untersucht solche Ähnlichkeitsmaße für Scientific Workflows. Als erstes untersuchen wir ähnlichkeitsrelevante Eigenschaften von Scientific Workflows und identifizieren Charakteristika der Wiederverwendung ihrer Komponenten. Als zweites analysieren und reimplementieren wir existierende Lösungen für den Vergleich von Scientific Workflows entlang definierter Teilschritte des Vergleichsprozesses. Wir erstellen einen großen Gold-Standard Corpus von Workflowähnlichkeiten, der über 2400 Bewertungen für 485 Workflowpaare enthält, die von 15 Experten aus 6 Institutionen beigetragen wurden. Zum ersten Mal erlauben diese Vorarbeiten eine umfassende, vergleichende Evaluation verschiedener Ähnlichkeitsmaße für Scientific Workflows, in der wir einige vorige Ergebnisse bestätigen, andere aber revidieren. Als drittes stellen wir eine neue Methode für das Vergleichen von Scientific Workflows vor. Unsere Evaluation zeigt, dass diese neue Methode bessere und konsistentere Ergebnisse liefert und leicht mit anderen Ansätzen kombiniert werden kann, um eine weitere Qualitätssteigerung zu erreichen. Als viertes zeigen wir, wie die Resultate aus den vorangegangenen Schritten genutzt werden können, um aus Standardkomponenten eine Suchmaschine für schnelle, qualitativ hochwertige Ähnlichkeitssuche im Repositorymaßstab zu implementieren. / Over the last decade, scientific workflows have gained attention as a valuable tool to create reproducible in-silico experiments. Specialized online repositories have emerged which allow such workflows to be shared and reused by the scientific community. With the increasing size of these repositories, methods to compare scientific workflows regarding their functional similarity become a necessity. To allow duplicate detection, similarity search, or clustering, similarity measures for scientific workflows are an essential prerequisite. This thesis investigates similarity measures for scientific workflows. We carry out four consecutive research tasks: First, we closely investigate the relevant properties of scientific workflows regarding their similarity and identify characteristics of re-use of their components. Second, we review and dissect existing approaches to scientific workflow comparison into a defined set of subtasks necessary in the process of workflow comparison, and re-implement previous approaches to each subtask. We create a large gold-standard corpus of expert ratings on workflow similarity, with more than 2400 ratings provided for 485 pairs of workflows by 15 workflow experts from 6 institutions. For the first time, this allows comprehensive, comparative evaluation of different scientific workflow similarity measures, confirming some previous findings, but rejecting others. Third, we propose and evaluate a novel method for scientific workflow comparison.
We show that this novel method provides results of both higher quality and higher consistency than previous approaches, and can easily be stacked and ensembled with other approaches for still better performance and higher speed. Fourth, we show how our findings can be leveraged to implement a search engine using off-the-shelf tools that performs fast, high quality similarity search for scientific workflows at repository-scale, a premier area of application for similarity measures for scientific workflows.
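As a loose illustration of combining workflow similarity measures (not the measure proposed in the thesis), the sketch below scores two workflows by the Jaccard overlap of the modules they use and blends that with a second, hypothetical structural score; all workflow names, module names, and weights are invented.

```python
def jaccard(a, b):
    """Set overlap of the modules (components) used by two workflows."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def ensemble(scores, weights):
    """Weighted combination of per-pair similarity scores in [0, 1]."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical workflows described by the services/modules they invoke.
wf1 = {"fetch_sequences", "blast_search", "render_report"}
wf2 = {"fetch_sequences", "blast_search", "align_sequences"}
structural_score = 0.4   # placeholder for some structure-based measure
print(ensemble([jaccard(wf1, wf2), structural_score], weights=[0.6, 0.4]))
```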
314

Clustering of Web Services Based on Semantic Similarity

Konduri, Aparna 12 May 2008
No description available.
315

Deriving pilots’ knowledge structures for weather information: an evaluation of elicitation techniques

Raddatz, Kimberly R. January 1900
Doctor of Philosophy / Department of Psychology / Richard J. Harris / Systems that support or require human interaction are generally easier to learn, use, and remember when their organization is consistent with the user’s knowledge and experiences (Norman, 1983; Roske-Hofstrand & Paap, 1986). Thus, in order for interface designers to truly design for the user, they must first have a way of deriving a representation of what the user knows about the domain of interest. The current study evaluated three techniques for eliciting knowledge structures for how General Aviation pilots think about weather information. Weather was chosen because of its varying implications for pilots of different levels of experience. Two elicitation techniques (Relationship Judgment and Card Sort) asked pilots to explicitly consider the relationship between 15 weather-related information concepts. The third technique, Prime Recognition Task, used response times and priming to implicitly reflect the strength of relationship between concepts in semantic memory. Techniques were evaluated in terms of pilot performance, conceptual structure validity, and required resources for employment. Validity was assessed in terms of the extent to which each technique identified differences in organization of weather information among pilots of different experience levels. Multidimensional scaling was used to transform proximity data collected by each technique into conceptual structures representing the relationship between concepts. Results indicated that Card Sort was the technique that most consistently tapped into knowledge structure affected by experience. Only conceptual structures based on Card Sort data were able to be used to both discriminate between pilots of different experience levels and accurately classify experienced pilots as “experienced”. Additionally, Card Sort was the most efficient and effective technique to employ in terms of preparation time, time on task, flexibility, and face validity. The Card Sort provided opportunities for deliberation, revision, and visual feedback that allowed the pilots to engage in a deeper level of processing at which experience may play a stronger role. Relationship Judgment and Prime Recognition Task characteristics (e.g., time pressure, independent judgments) may have motivated pilots to rely on a more shallow or text-based level of processing (i.e., general semantic meaning) that is less affected by experience. Implications for menu structure design and assessment are discussed.
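To illustrate the analysis step described above, the sketch below turns card-sort data into a dissimilarity matrix and embeds it with multidimensional scaling; the concepts, the two example sorts, and the parameter choices are invented for illustration and are not the study's actual materials.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical card-sort data: each sort groups weather concepts into piles.
concepts = ["METAR", "TAF", "icing", "turbulence", "ceiling"]
sorts = [
    [{"METAR", "TAF"}, {"icing", "turbulence"}, {"ceiling"}],
    [{"METAR", "TAF", "ceiling"}, {"icing", "turbulence"}],
]

n = len(concepts)
co = np.zeros((n, n))
for piles in sorts:
    for pile in piles:
        for a in pile:
            for b in pile:
                co[concepts.index(a), concepts.index(b)] += 1

proximity = co / len(sorts)          # fraction of sorts grouping i with j
dissimilarity = 1.0 - proximity      # MDS expects distances, not similarities
np.fill_diagonal(dissimilarity, 0.0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissimilarity)
print(dict(zip(concepts, coords.round(2))))
```

Concepts that are frequently sorted together land close to each other in the resulting 2-D configuration, which is the kind of conceptual structure the study compares across experience levels.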
316

Evaluation and development of conceptual document similarity metrics with content-based recommender applications

Gouws, Stephan 12 1900
Thesis (MScEng (Electrical and Electronic Engineering))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: The World Wide Web brought with it an unprecedented level of information overload. Computers are very effective at processing and clustering numerical and binary data; however, the conceptual clustering of natural-language data is considerably harder to automate. Most past approaches rely on simple keyword-matching techniques or probabilistic methods to measure semantic relatedness. However, these approaches do not always accurately capture conceptual relatedness as measured by humans. In this thesis we propose and evaluate the use of novel Spreading Activation (SA) techniques for computing semantic relatedness, by modelling the article hyperlink structure of Wikipedia as an associative network structure for knowledge representation. The SA technique is adapted and several problems are addressed for it to function over the Wikipedia hyperlink structure. Inter-concept and inter-document similarity metrics are developed which make use of SA to compute the conceptual similarity between two concepts and between two natural-language documents. We evaluate these approaches over two document similarity datasets and achieve results which compare favourably with the state of the art. Furthermore, document preprocessing techniques are evaluated in terms of the performance gain these techniques can have on the well-known cosine document similarity metric and the Normalised Compression Distance (NCD) metric. Results indicate that a near two-fold increase in accuracy can be achieved for NCD by applying simple preprocessing techniques. Nonetheless, the cosine similarity metric still significantly outperforms NCD. Finally, we show that using our Wikipedia-based method to augment the cosine vector space model provides superior results to either in isolation. Combining the two methods leads to an increased correlation of Pearson ρ = 0.72 over the Lee (2005) document similarity dataset, which matches the reported result for the state-of-the-art Explicit Semantic Analysis (ESA) technique, while requiring less than 10% of the Wikipedia database needed by ESA. As a use case for document similarity techniques, a purely content-based news-article recommender system is designed and implemented for a large online media company. This system is used to gather additional human-generated relevance ratings which we use to evaluate the performance of three state-of-the-art document similarity metrics for providing content-based document recommendations. / AFRIKAANSE OPSOMMING: Die Wêreldwye-Web het ’n vlak van inligting-oorbelading tot gevolg gehad soos nog nooit tevore. Rekenaars is baie effektief met die verwerking en groepering van numeriese en binêre data, maar die konsepsuele groepering van natuurlike-taal data is aansienlik moeiliker om te outomatiseer. Tradisioneel berus sulke algoritmes op eenvoudige sleutelwoordherkenningstegnieke of waarskynlikheidsmetodes om semantiese verwantskappe te bereken, maar hierdie benaderings modelleer nie konsepsuele verwantskappe, soos gemeet deur die mens, baie akkuraat nie. In hierdie tesis stel ons die gebruik van ’n nuwe aktiverings-verspreidingstrategie (AV) voor waarmee inter-konsep verwantskappe bereken kan word, deur die artikel skakelstruktuur van Wikipedia te modelleer as ’n assosiatiewe netwerk. Die AV tegniek word aangepas om te funksioneer oor die Wikipedia skakelstruktuur, en verskeie probleme wat hiermee gepaard gaan word aangespreek.
Inter-konsep en inter-dokument verwantskapsmaatstawwe word ontwikkel wat gebruik maak van AV om die konsepsuele verwantskap tussen twee konsepte en twee natuurlike-taal dokumente te bereken. Ons evalueer hierdie benadering oor twee dokument-verwantskap datastelle en die resultate vergelyk goed met die van ander toonaangewende metodes. Verder word teks-voorverwerkingstegnieke ondersoek in terme van die moontlike verbetering wat dit tot gevolg kan hê op die werksverrigting van die bekende kosinus vektorruimtemaatstaf en die genormaliseerde kompressie-afstandmaatstaf (GKA). Resultate dui daarop dat GKA se akkuraatheid byna verdubbel kan word deur gebruik te maak van eenvoudige voorverwerkingstegnieke, maar dat die kosinus vektorruimtemaatstaf steeds aansienlik beter resultate lewer. Laastens wys ons dat die Wikipedia-gebaseerde metode gebruik kan word om die vektorruimtemaatstaf aan te vul tot ’n gekombineerde maatstaf wat beter resultate lewer as enige van die twee metodes afsonderlik. Deur die twee metodes te kombineer lei tot ’n verhoogde korrelasie van Pearson ρ = 0.72 oor die Lee dokument-verwantskap datastel. Dit is gelyk aan die gerapporteerde resultaat vir Explicit Semantic Analysis (ESA), die huidige beste Wikipedia-gebaseerde tegniek. Ons benadering benodig egter minder as 10% van die Wikipedia databasis wat benodig word vir ESA. As ’n toetstoepassing vir dokument-verwantskaptegnieke ontwerp en implementeer ons ’n stelsel vir ’n aanlyn media-maatskappy wat nuusartikels aanbeveel vir gebruikers, slegs op grond van die artikels se inhoud. Joernaliste wat die stelsel gebruik ken ’n punt toe aan elke aanbeveling en ons gebruik hierdie data om die akkuraatheid van drie toonaangewende maatstawwe vir dokument-verwantskap te evalueer in die konteks van inhoud-gebaseerde nuus-artikel aanbevelings.
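For reference, the Normalised Compression Distance mentioned above is commonly computed as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) for a compressor C. The sketch below uses zlib as the compressor purely for illustration; the abstract does not state which compressor the thesis used, so that choice is an assumption.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance with zlib as the compressor C:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

doc_a = b"low pressure system brings snow to the northern mountains"
doc_b = b"snowfall expected in the northern mountains as pressure drops"
print(round(ncd(doc_a, doc_b), 3))   # smaller values indicate greater similarity
```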
317

Computational approaches for time series analysis and prediction : data-driven methods for pseudo-periodical sequences

Lan, Yang January 2009
Time series data mining is one branch of data mining. Time series analysis and prediction have always played an important role in human activities and natural sciences. A pseudo-periodical time series has a complex structure, with the fluctuations and frequencies of the time series changing over time. This pseudo-periodicity brings new properties and challenges to time series analysis and prediction. This thesis proposes two original computational approaches for time series analysis and prediction: Moving Average of nth-order Difference (MANoD) and Series Features Extraction (SFE). Based on data-driven methods, the two original approaches open new insights in time series analysis and prediction, contributing new feature detection techniques. The proposed algorithms can reveal hidden patterns based on the characteristics of time series, and they can be applied for predicting forthcoming events. This thesis also presents the evaluation results of the proposed algorithms on various pseudo-periodical time series, and compares the prediction results with classical time series prediction methods. The results of the original approaches applied to real-world and synthetic time series are very good and show that the contributions open promising research directions.
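The abstract names Moving Average of nth-order Difference (MANoD) without defining it; the sketch below is only a literal reading of that name (difference the series n times, then smooth with a sliding-window average) and may not match the thesis's exact formulation. The synthetic pseudo-periodic signal and parameter values are invented.

```python
import numpy as np

def manod(series, n=1, window=5):
    """Moving average of the n-th order difference of a series.
    Illustrative guess at the method from its name, not the thesis's definition."""
    diff = np.diff(np.asarray(series, dtype=float), n=n)
    kernel = np.ones(window) / window
    return np.convolve(diff, kernel, mode="valid")

t = np.linspace(0, 20, 400)
pseudo_periodic = np.sin(t) + 0.3 * np.sin(3.1 * t) + 0.1 * np.random.randn(t.size)
print(manod(pseudo_periodic, n=2, window=7)[:5])
```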
318

Large scale optimization methods for metric and kernel learning

Jain, Prateek 06 November 2014
A large number of machine learning algorithms are critically dependent on the underlying distance/metric/similarity function. Learning an appropriate distance function is therefore crucial to the success of many methods. The class of distance functions that can be learned accurately is characterized by the amount and type of supervision available to the particular application. In this thesis, we explore a variety of such distance learning problems using different amounts/types of supervision and provide efficient and scalable algorithms to learn appropriate distance functions for each of these problems. First, we propose a generic regularized framework for Mahalanobis metric learning and prove that for a wide variety of regularization functions, metric learning can be used for efficiently learning a kernel function incorporating the available side-information. Furthermore, we provide a method for fast nearest neighbor search using the learned distance/kernel function. We show that a variety of existing metric learning methods are special cases of our general framework. Hence, our framework also provides a kernelization scheme and fast similarity search scheme for such methods. Second, we consider a variation of our standard metric learning framework where the side-information is incremental, streaming, and cannot be stored. For this problem, we provide an efficient online metric learning algorithm that compares favorably to existing methods both theoretically and empirically. Next, we consider a contrasting scenario where the amount of supervision being provided is extremely small compared to the number of training points. For this problem, we consider two different modeling assumptions: 1) data lies on a low-dimensional linear subspace, 2) data lies on a low-dimensional non-linear manifold. The first assumption, in particular, leads to the problem of matrix rank minimization over polyhedral sets, which is a problem of immense interest in numerous fields including optimization, machine learning, computer vision, and control theory. We propose a novel online learning based optimization method for the rank minimization problem and provide provable approximation guarantees for it. The second assumption leads to our geometry-aware metric/kernel learning formulation, where we jointly model the metric/kernel over the data along with the underlying manifold. We provide an efficient alternating minimization algorithm for this problem and demonstrate its wide applicability and effectiveness by applying it to various machine learning tasks such as semi-supervised classification, colored dimensionality reduction, manifold alignment, etc. Finally, we consider the task of learning distance functions under no supervision, which we cast as a problem of learning disparate clusterings of the data. To this end, we propose a discriminative approach and a generative model based approach, and we provide efficient algorithms with convergence guarantees for both approaches.
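For context on the Mahalanobis metric learning mentioned above: the learned distance has the form d_A(x, y) = sqrt((x - y)^T A (x - y)) for a positive semidefinite matrix A. The sketch below evaluates such a distance and uses it for nearest-neighbour search with a randomly generated A; actually learning A from side-information, which is the subject of the thesis, is not shown, and all data here is synthetic.

```python
import numpy as np

def mahalanobis(x, y, A):
    """Mahalanobis distance d_A(x, y) = sqrt((x - y)^T A (x - y));
    A = I recovers the ordinary Euclidean distance."""
    d = np.asarray(x) - np.asarray(y)
    return float(np.sqrt(d @ A @ d))

def nearest_neighbor(query, points, A):
    """Index of the training point closest to `query` under d_A."""
    return int(np.argmin([mahalanobis(query, p, A) for p in points]))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
L = rng.normal(size=(3, 3))
A = L @ L.T                      # any A = L L^T is positive semidefinite
print(nearest_neighbor(rng.normal(size=3), X, A))
```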
319

詞義相似度的社會網路分析研究 / A study on word similarity with social network analysis

溫文喆 Unknown Date
Social network analysis represents social relations in the form of networks. Originally a tool purely for analyzing social interaction, it has in recent years been widely applied in sociology, organizational research, information science, biology, linguistics, and many other fields; by drawing on mathematical graph theory and ever-improving computing power, social network analysis can uncover regularities in the behavior of individuals from perspectives that differ from earlier approaches. Word similarity, in turn, is one of the fundamental topics underlying the development of information retrieval and related technologies, and many methods for measuring it have been proposed in recent years. This study applies social network analysis to English words. Using corpora as the data source, different ways of constructing networks are proposed, with network nodes and link relations defined on the basis of co-occurrence networks. By varying the conditions under which the networks are generated and filtered, we examine whether existing social network analysis properties or indices, suitably adjusted, can provide an alternative way of measuring word similarity. The resulting networks and the computed properties are also validated against established synonym benchmarks used in word similarity research, and the applicability of social network analysis to the study of word similarity is discussed further.
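As a minimal illustration of the co-occurrence network idea described above (not the specific networks or indices studied in the thesis), the sketch below builds a sliding-window co-occurrence graph from a toy corpus and scores a word pair by the Jaccard overlap of their network neighbourhoods; the corpus and window size are invented.

```python
import networkx as nx

def cooccurrence_graph(sentences, window=2):
    """Undirected co-occurrence network: words are nodes, and an edge links
    words that appear within `window` tokens of each other."""
    g = nx.Graph()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for u in tokens[i + 1 : i + 1 + window]:
                if u != w:
                    g.add_edge(w, u)
    return g

def neighborhood_similarity(g, a, b):
    """Jaccard overlap of the two words' neighbourhoods in the network."""
    na, nb = set(g[a]), set(g[b])
    return len(na & nb) / len(na | nb) if na | nb else 0.0

corpus = [
    "the quick fox jumps over the lazy dog".split(),
    "a quick brown fox leaps over a sleepy dog".split(),
]
g = cooccurrence_graph(corpus, window=2)
print(neighborhood_similarity(g, "fox", "dog"))
```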
320

Combining Similarity Transformed Equation of Motion Coupled Cluster (STEOM-CC), Vibronic Coupling models, and Spin-Orbit Coupling: Towards a First Principle Description of Intersystem Crossing

Sous, John January 2013
Electronic Structure Theory has led to a variety of developments and applications. In the Nooijen group the focus is on the development and use of Coupled Cluster based approaches. Coupled Cluster is a powerful and accurate approach to the quantum mechanical problem. The research results presented in this thesis show that the Similarity Transformed Equation of Motion Coupled Cluster (STEOM-CC) method is a very accurate and yet computationally inexpensive approach for excited states. This study reveals new features of STEOM and provides promise regarding future improvements in the methodology. STEOM can be used as the first step in the construction of a Vibronic model, which is a powerful tool for moving to paradigms beyond the Born-Oppenheimer approximation. Spin-Orbit Coupling (SOC) is a very important ingredient required to study relativistic phenomena, and its quantum mechanical implementation for many-body systems is not straightforward. The most widely used SOC operator in Chemical Physics is the Breit-Pauli operator, which requires employing non-trivial approximations to the Dirac equation to adapt the theory to many-body systems. The integration of electronic structure approaches, Vibronic Coupling, and SOC is essential to study the phenomenon of intersystem crossing (transitions between spin states) in fine detail. In this thesis a computational benchmark of STEOM is discussed, while the frameworks of Vibronic Coupling and Spin-Orbit Coupling (SOC) are considered on a theoretical level.
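For reference, the one-electron part of the Breit-Pauli spin-orbit operator mentioned above is commonly written (in atomic units) as

    H_SO^(1) = (α²/2) Σ_i Σ_A (Z_A / r_iA³) ℓ_iA · s_i,

where α is the fine-structure constant, Z_A the charge of nucleus A, r_iA the electron-nucleus distance, and ℓ_iA, s_i the orbital and spin angular momentum operators of electron i with respect to nucleus A; the two-electron spin-same-orbit and spin-other-orbit terms are omitted here for brevity, and this standard textbook form is given only as background, not as the specific operator implementation used in the thesis.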
