71 |
Ein empirischer Zugang zur Ermittlung von Kompetenzprofilen in der Digitalen Wirtschaft
Ziebarth, Sabrina, Malzahn, Nils, Zeini, Sam, Hoppe, Ulrich January 2008
No description available.
|
72 |
On-line analytical processing in distributed data warehouses
Lehner, Wolfgang, Albrecht, Jens 14 April 2022
The concepts of 'data warehousing' and 'on-line analytical processing' have seen growing interest in the research and commercial product communities. Today, the trend is moving away from complex centralized data warehouses towards distributed data marts integrated in a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and few solutions for large distributed decision support systems in globally operating corporations. After showing the benefits and problems of the distributed approach, this paper outlines possibilities for achieving performance in distributed online analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined.
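One widely used way to obtain performance in a distributed OLAP setting is to aggregate locally at each data mart and merge only the partial results centrally. The sketch below illustrates that general idea; it is an assumption made for illustration and not the CUBESTAR design, and the mart contents and function names are hypothetical.

```python
from collections import defaultdict

def local_aggregate(rows):
    """Roll up one data mart's (region, product, sales) rows to per-region (sum, count)."""
    partial = defaultdict(lambda: [0.0, 0])
    for region, _product, sales in rows:
        partial[region][0] += sales
        partial[region][1] += 1
    return {region: tuple(acc) for region, acc in partial.items()}

def merge_partials(partials):
    """Combine partial (sum, count) pairs; AVG is derived only after merging."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for region, (total, count) in partial.items():
            merged[region][0] += total
            merged[region][1] += count
    return {
        region: {"sum": total, "count": count, "avg": total / count}
        for region, (total, count) in merged.items()
    }

# Hypothetical data marts, each holding only its own fact rows.
mart_eu = [("EU", "A", 10.0), ("EU", "B", 30.0)]
mart_us = [("US", "A", 20.0), ("EU", "A", 20.0)]

result = merge_partials([local_aggregate(mart_eu), local_aggregate(mart_us)])
```

Shipping (sum, count) pairs instead of raw rows keeps network traffic proportional to the number of groups, and it keeps averages correct because they are computed from merged sums and counts rather than from per-mart averages.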
|
73 |
Towards Privacy and Communication Efficiency in Distributed Representation Learning
Sheikh S Azam (12836108) 10 June 2022
<p>Over the past decade, distributed representation learning has emerged as a popular alternative to conventional centralized machine learning training. The increasing interest in distributed representation learning, specifically federated learning, can be attributed to its fundamental property of promoting data privacy and communication savings. While conventional ML encourages aggregating data at a central location (e.g., data centers), distributed representation learning advocates keeping data at the source and instead transmitting model parameters across the network. However, since the advent of deep learning, model sizes have become increasingly large, often comprising millions to billions of parameters, which leads to the problem of communication latency in the learning process. In this thesis, we propose to tackle the problem of communication latency in two different ways: (i) learning private representations of data to enable its sharing, and (ii) reducing the communication latency by minimizing the corresponding long-range communication requirements.</p>
<p><br></p>
<p>To tackle the former goal, we first study the problem of learning representations that are private yet informative, i.e., providing information about intended "ally" targets while hiding sensitive "adversary" attributes. We propose Exclusion-Inclusion Generative Adversarial Network (EIGAN), a generalized private representation learning (PRL) architecture that accounts for multiple ally and adversary attributes, unlike existing PRL solutions. We then address the practical constraints of the distributed datasets by developing Distributed EIGAN (D-EIGAN), the first distributed PRL method that learns a private representation at each node without transmitting the source data. We theoretically analyze the behavior of adversaries under the optimal EIGAN and D-EIGAN encoders and the impact of dependencies among ally and adversary tasks on the optimization objective. Our experiments on various datasets demonstrate the advantages of EIGAN in terms of performance, robustness, and scalability. In particular, EIGAN outperforms the previous state-of-the-art by a significant accuracy margin (47% improvement), and D-EIGAN's performance is consistently on par with EIGAN under different network settings.</p>
<p><br></p>
<p>We next tackle the latter objective - reducing the communication latency - and propose two-timescale hybrid federated learning (TT-HF), a semi-decentralized learning architecture that combines the conventional device-to-server communication paradigm for federated learning with device-to-device (D2D) communications for model training. In TT-HF, during each global aggregation interval, devices (i) perform multiple stochastic gradient descent iterations on their individual datasets, and (ii) aperiodically engage in a consensus procedure on their model parameters through cooperative, distributed D2D communications within local clusters. With a new general definition of gradient diversity, we formally study the convergence behavior of TT-HF, resulting in new convergence bounds for distributed ML. We leverage our convergence bounds to develop an adaptive control algorithm that tunes the step size, D2D communication rounds, and global aggregation period of TT-HF over time to target a sublinear convergence rate of O(1/t) while minimizing network resource utilization. Our subsequent experiments demonstrate that TT-HF significantly outperforms the current art in federated learning in terms of model accuracy and/or network energy consumption in different scenarios where local device datasets exhibit statistical heterogeneity. Finally, our numerical evaluations demonstrate robustness against outages caused by fading channels, as well as favorable performance with non-convex loss functions.</p>
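The two timescales described above (local gradient steps with D2D consensus inside clusters, then a global aggregation) can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the thesis's algorithm: each device holds a scalar model and a quadratic loss, gradients are exact rather than stochastic, consensus runs after the local steps rather than aperiodically between them, the server averages every device, and the adaptive control of step size and periods is omitted. All function and parameter names are hypothetical.

```python
def local_sgd(w, target, lr, steps):
    """Gradient steps on the toy loss 0.5 * (w - target)**2 held by one device."""
    for _ in range(steps):
        w -= lr * (w - target)
    return w

def consensus_round(models, eps):
    """One D2D round in a fully connected cluster: w_i += eps * sum_j (w_j - w_i)."""
    total = sum(models)
    n = len(models)
    return [w + eps * (total - n * w) for w in models]

def tt_hf(clusters, rounds, lr=0.1, local_steps=5, d2d_rounds=2, eps=0.25):
    """clusters: list of clusters, each a list of per-device targets (local optima)."""
    w_global = 0.0
    for _ in range(rounds):
        all_models = []
        for targets in clusters:
            # Fast timescale: local training followed by in-cluster consensus.
            models = [local_sgd(w_global, t, lr, local_steps) for t in targets]
            for _ in range(d2d_rounds):
                models = consensus_round(models, eps)
            all_models.extend(models)
        # Slow timescale: global aggregation at the server (assumed plain average).
        w_global = sum(all_models) / len(all_models)
    return w_global

# Two clusters with heterogeneous local optima; the global optimum is their mean, 4.0.
w_final = tt_hf([[1.0, 3.0], [5.0, 7.0]], rounds=50)
```

The consensus update preserves each cluster's mean while pulling its devices together, which is why a few cheap D2D rounds can stand in for some of the more expensive device-to-server exchanges.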
|
74 |
[en] ENABLING DATA REGULATION EVALUATION THROUGH INTELLIGENT AND NORMATIVE MULTIAGENT SYSTEMS DESIGN / [pt] PERMITINDO A SIMULAÇÃO DE CENÁRIOS NA REGULAÇÃO DE DADOS ATRAVÉS DA APLICAÇÃO DE SISTEMAS MULTIAGENTES INTELIGENTES E NORMATIVOS
PAULO HENRIQUE CARDOSO ALVES 28 November 2023
Sharing and managing personal data are challenging due to the massive amount of data generated, uploaded, and digitized by data subjects in order to use services, online or not. This challenge affects not only the data subjects but also data controllers and processors, who are responsible for security, privacy, anonymity, and data usage in accordance with the applicable legal basis and the initial purpose for which the data were requested. In this scenario, data protection and regulation come into play to organize this environment, proposing rights and duties for the agents involved. However, each country is free to create and apply its own data regulation, e.g., the GDPR in the European Union and the LGPD in Brazil. Therefore, although the goal is to protect the data subjects, the regulations can present different rules depending on their jurisdiction. In this scenario, ontologies emerge to identify the entities and relationships and present them at a high level of abstraction, facilitating ontology alignment with different regulations. To this end, we developed a metamodel based on GDPR ontologies to enable the representation of the LGPD, focusing on the legal basis of consent. Moreover, we proposed GoDReP (Generation of Data Regulation Plots) to allow actors to represent their interpretation of the law in a specific application scenario. We then present three different scenarios to exercise the application of GoDReP. In this thesis, we also propose an intelligent normative multiagent system architecture (RegulAI) to represent the rights and obligations set out by personal data regulation, as well as the agents' decision-making process. Finally, we developed a case study applying RegulAI in the open banking scenario.
|
75 |
GARBLED COMPUTATION: HIDING SOFTWARE, DATA AND COMPUTED VALUES
Shoaib Amjad Khan (19199497) 27 July 2024
<p dir="ltr">This thesis presents an in-depth study and evaluation of a class of secure multiparty protocols that enable execution of a confidential software program $\mathcal{P}$ owned by Alice on confidential data $\mathcal{D}$ owned by Bob, without revealing anything about $\mathcal{P}$ or $\mathcal{D}$ in the process. Our initial adversarial model is an honest-but-curious adversary, which we later extend to a malicious adversarial setting. Depending on the requirements, our protocols can be set up such that the output $\mathcal{P}(\mathcal{D})$ may be learned only by Alice, only by Bob, by both, or by neither (in which case an agreed-upon third party would learn it). Most of our protocols are run by only two online parties, which can be Alice and Bob or, alternatively, two commodity cloud servers (in which case neither Alice nor Bob participates in the protocols' execution; they merely initialize the two cloud servers and then go offline). We implemented and evaluated some of these protocols as prototypes that we made available to the open source community via GitHub. We report experimental findings that compare and contrast the viability of our various approaches and those that already exist. All our protocols achieve the stated goals without revealing anything other than upper bounds on the sizes of the program and data.</p>
|
76 |
Graph Learning at Scale: Algorithms, Systems, and Applications
Haoteng Yin (13548904) 07 March 2025
<p dir="ltr">Graph-structured data capture complex relationships and interactions between entities, offering valuable insights for scientific discovery, business modeling, and AI-driven decision-making. Despite its transformative potential, learning on graphs faces two key challenges: (1) scaling expressive learning approaches, especially subgraph-based graph representation learning, and (2) ensuring privacy when handling sensitive relational data. Both challenges arise from intricate dependencies in graph structures, which limit the effectiveness of canonical algorithms and system optimizations. This dissertation addresses these challenges through a unified framework that integrates system-aware algorithm design across two main thrusts.</p><p dir="ltr">In Thrust I, we develop a family of efficient frameworks for expressive graph representation learning that eliminate redundancy in subgraph-based methods. By decoupling dependencies over task-specific input features (i.e., query-induced subgraphs), the proposed paradigm enables efficient higher-order pattern discovery, scalable network analysis on billion-edge graphs, and low-latency online inference using reusable, task-agnostic features derived from random walks, node-set sampling, and neighborhood hashing. In Thrust II, we extend the design principle to privacy-preserving relational learning, where structural dependencies in graphs often violate the gradient decoupling assumption in standard private learning mechanisms such as differentially private stochastic gradient descent (DP-SGD). We propose the first differentially private relational learning framework that disentangles sample dependencies through a tailored DP-SGD approach. This framework enables the private fine-tuning of large language models (LLMs) on sensitive graph data, effectively addressing associated computational complexities while achieving strong privacy-utility trade-offs.
</p><p dir="ltr">By co-designing learning algorithms and system implementations, this dissertation demonstrates how graph-based AI can be both scalable and trustworthy, opening new avenues for learning from complex structured data in real-world applications.</p>
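For context, the standard DP-SGD step that the abstract refers to clips each example's gradient independently and then adds Gaussian noise to the sum. The sketch below shows that textbook mechanism only, under the assumption that per-example gradients are already available as plain lists; it is not the tailored relational variant proposed in the dissertation, and all names are hypothetical.

```python
import math
import random

def clip(grad, max_norm):
    """Scale one example's gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average, step."""
    summed = [0.0] * len(weights)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_example_grads)
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
new_weights = dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_norm=1.0,
                          noise_multiplier=1.0, rng=rng)
```

The clipping bound limits how much any single example can move the sum, which is what the noise is calibrated against. When one relation (for instance, an edge) influences several per-example gradients, that per-example bound no longer caps its total influence, which is the decoupling assumption the abstract says graph data violates.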
|
77 |
Semantic Federation of Musical and Music-Related Information for Establishing a Personal Music Knowledge Base
Gängler, Thomas 20 May 2011
Music is perceived and described very subjectively by every individual. Nowadays, people often get lost in their steadily growing, multi-placed digital music collections. Existing music player and management applications struggle with the poor metadata that predominates in personal music collections. Several music information services assist users by providing tools for precisely organising their music collection or by offering new insights into their own music library and listening habits. However, music consumers still cannot interact seamlessly with all these auxiliary services directly from the place where they access their music. To profit from the manifold music and music-related knowledge that is or can be made available via various information services, this information has to be gathered, semantically federated, and integrated into a uniform knowledge base that can present the data to users in a personalised way through an appropriate visualisation. This personalised semantic aggregation of music metadata from several sources is the gist of this thesis. The outlined solution concentrates particularly on users' needs regarding music collection management, which can vary strongly between individuals. The author's proposal, the personal music knowledge base (PMKB), consists of a client-server architecture with uniform communication endpoints and an ontological knowledge representation model format that is able to represent the versatile information of its use cases. The PMKB concept covers the complete information flow life cycle, including the processes of user account initialisation, information service choice, individual information extraction, and proactive update notification. The PMKB implementation makes use of Semantic Web technologies. This work explains in particular the knowledge representation part of the PMKB vision.
Several new Semantic Web ontologies are defined, or existing ones substantially modified, to meet the requirements of a personalised semantic federation of music and music-related data for managing personal music collections. The outcome includes, amongst others:
• a new vocabulary for describing the play back domain,
• another one for representing information service categorisations and quality ratings, and
• one that unites the beneficial parts of the existing advanced user modelling ontologies.
The introduced vocabularies can be used in conjunction with the existing Music Ontology framework. Some RDFizers that also make use of the outlined ontologies in their mapping definitions illustrate the practical fitness of these specifications. A social evaluation method is applied to examine the reutilisation, application, and feedback of the vocabularies explained in this work. This analysis shows that it is good practice to publish Semantic Web ontologies properly, with the help of Linked Data principles and basic SEO techniques, in order to reach the searching audience easily, to avoid duplicates of such KR specifications, and, last but not least, to directly establish a "shared understanding". Due to their project independence, the proposed vocabularies can be deployed in any knowledge representation model that needs their knowledge representation capacities. This thesis contributes to making the vision of a personal music knowledge base come true.
1 Introduction and Background
1.1 Introduction
1.2 Personal Music Collection Use Cases
1.3 Summary
2 Music Information Management
2.1 Knowledge Management
2.1.1 Knowledge Representation
2.1.1.1 Knowledge Representation Models
2.1.1.2 Semantic Graphs
2.1.1.3 Ontologies
2.1.1.4 Summary
2.1.2 Knowledge Management Systems
2.1.2.1 Information Services
2.1.2.2 Ontology-based Distributed Knowledge Management Systems
2.1.2.3 Knowledge Management System Design Guideline
2.1.3 Summary
2.2 Semantic Web Technologies
2.2.1 The Evolution of the World Wide Web
2.2.1.1 The Hypertext Web
2.2.1.2 The Normative Principles of Web Architecture
2.2.1.3 The Semantic Web
2.2.2 Common Semantic Web Knowledge Representation Languages
2.2.3 Resource Description Levels and their Relations
2.2.4 Semantic Web Knowledge Representation Models
2.2.4.1 Construction
2.2.4.2 Mapping
2.2.4.3 Context Modelling
2.2.4.4 Storing
2.2.4.5 Providing
2.2.4.6 Consuming
2.2.5 Summary
2.3 Music Content and Context Data
2.3.1 Categories of Musical Characteristics
2.3.2 Music Metadata Formats
2.3.3 Music Metadata Services
2.3.3.1 Audio Signal Carrier Indexing Services
2.3.3.2 Music Recommendation and Discovery Services
2.3.3.3 Music Content and Context Analysis Services
2.3.4 Summary
2.4 Personalisation and Environmental Context
2.4.1 User Modelling
2.4.2 Context Modelling
2.4.3 Stereotype Modelling
2.5 Summary
3 The Personal Music Knowledge Base
3.1 Foundations
3.1.1 Knowledge Representation
3.1.2 Knowledge Management
3.2 Architecture
3.3 Workflow
3.3.1 User Account Initialisation
3.3.2 Individual Information Extraction
3.3.3 Information Service Choice
3.3.4 Proactive Update Notification
3.3.5 Information Exploration
3.3.6 Personal Associations and Context
3.4 Summary
4 A Personal Music Knowledge Base
4.1 Knowledge Representation
4.1.1 The Info Service Ontology
4.1.2 The Play Back Ontology and related Ontologies
4.1.2.1 The Ordered List Ontology
4.1.2.2 The Counter Ontology
4.1.2.3 The Association Ontology
4.1.2.4 The Play Back Ontology
4.1.3 The Recommendation Ontology
4.1.4 The Cognitive Characteristics Ontology and related Vocabularies
4.1.4.1 The Weighting Ontology
4.1.4.2 The Cognitive Characteristics Ontology
4.1.4.3 The Property Reification Vocabulary
4.1.5 The Media Types Taxonomy
4.1.6 Summary
4.2 Knowledge Management System
4.3 Summary
5 Personal Music Knowledge Base in Practice
5.1 Application
5.1.1 AudioScrobbler RDF Service
5.1.2 PMKB ID3 Tag Extractor
5.2 Evaluation
5.2.1 Reutilisation
5.2.2 Application
5.2.3 Reviews and Mentions
5.2.4 Indexing
5.3 Summary
6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
|
78 |
Vliv zavedení datových schránek na efektivní komunikaci ve státní správě / Impact of introduction of the data box system on effective communication in public administration
Svoboda, Jiří January 2012
This diploma thesis deals with current aspects of information systems in public administration. It assesses the impact of the introduction of the data box system on effective communication in public administration and evaluates to what extent data boxes benefit the citizens of the state. After defining the terms used in this field, the first part of the thesis focuses on the e-Government project and its four pillars. The thesis then evaluates the effect of the e-Government project on reducing the administrative burden on businesspeople and on the lives of citizens in the information society. The following part highlights gaps in the legislation that sets the rules for the data box system and has an immediate impact on the efficiency of its operation. The thesis also surveys the effects of introducing the data box system on effective communication between state and local governments and on the citizens of the country. Finally, it points out the problems of the data box system, analyses their causes, and offers possible solutions.
|
79 |
Automatische Erkennung von Gebäudetypen auf Grundlage von Geobasisdaten
Hecht, Robert January 2013
Building-based information plays a central role in the small-scale modelling and analysis of processes in settlement areas. In official geodata, maps, and services of the real estate cadastre and the state survey, buildings are modelled by their footprints. Semantic information on building function, dwelling type, or construction age is only rarely available in the official base geodata.
This contribution presents a method for the automatic classification of building footprints, with the aim of using them to derive small-scale information on settlement structure. Methods from pattern recognition and machine learning are applied. At its core, the study examines building typology, input data, feature extraction, and various classification methods with respect to their accuracy and generalization ability. Compared with 15 other learning methods, the ensemble-based random forest algorithm shows the highest generalization ability and efficiency and was identified as the best classifier for the task.
For building footprints in the vector model, specifically buildings from the ALK, ALKIS®, or ATKIS® Basis-DLM as well as the official building outlines (Hausumringe) and 3D building models, the classifier achieves a classification accuracy of between 90 % and 95 % for all urban areas. The accuracy when using building footprints extracted from digital topographic raster maps is considerably lower, at 76 % to 88 %.
The automatic classification of building footprints makes an important contribution to obtaining information for the small-scale description of settlement structure. Besides their relevance to the research and application fields of urban geography and urban planning, the results are also relevant to the cartographic fields of map generalization, automated map production, and various areas of geovisualization.
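Feature extraction from a footprint polygon, the step that precedes classification, can be illustrated with a few simple shape descriptors. The sketch below is an assumption made for illustration: the abstract does not list the features actually used, and the footprints and names here are hypothetical. The resulting feature vectors are the kind of input a classifier such as a random forest would be trained on.

```python
import math

def footprint_features(vertices):
    """Shape descriptors for a simple polygon given as (x, y) vertices in metres."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    # Isoperimetric compactness: 1.0 for a circle, smaller for elongated shapes.
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return {"area": area, "perimeter": perimeter,
            "compactness": compactness, "vertex_count": n}

# Hypothetical footprints: a compact square and an elongated row.
square_building = [(0, 0), (10, 0), (10, 10), (0, 10)]
row_building = [(0, 0), (40, 0), (40, 8), (0, 8)]

square_features = footprint_features(square_building)
row_features = footprint_features(row_building)
```

Descriptors like these depend only on geometry, which matches the situation described above where the base geodata contain footprints but rarely carry semantic attributes.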
|
80 |
High-Dimensional Data Representations and Metrics for Machine Learning and Data Mining / Reprezentacije i metrike za mašinsko učenje i analizu podataka velikih dimenzija
Radovanović Miloš 11 February 2011
<p>In the current information age, massive amounts of data are gathered at a rate that prohibits their effective structuring, analysis, and conversion into useful knowledge. This information overload is manifested both in large numbers of data objects recorded in data sets and in large numbers of attributes, also known as high dimensionality. This dissertation deals with problems originating from the high dimensionality of data representation, referred to as the “curse of dimensionality,” in the context of machine learning, data mining, and information retrieval. The described research follows two angles: studying the behavior of (dis)similarity metrics with increasing dimensionality, and exploring feature-selection methods, primarily with regard to document representation schemes for text classification. The main results of the dissertation relevant to the first research angle include theoretical insights into the concentration behavior of cosine similarity, and a detailed analysis of the phenomenon of hubness, which refers to the tendency of some points in a data set to become hubs by being included in unexpectedly many <em>k</em>-nearest neighbor lists of other points. The mechanisms behind the phenomenon are studied in detail, from both a theoretical and an empirical perspective, linking hubness with the (intrinsic) dimensionality of data, describing its interaction with the cluster structure of data and the information provided by class labels, and demonstrating the interplay of the phenomenon with well-known algorithms for classification, semi-supervised learning, clustering, and outlier detection, with special consideration given to time-series classification and information retrieval. Results pertaining to the second research angle include quantification of the interaction between various transformations of high-dimensional document representations and feature selection, in the context of text classification.</p>
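Hubness as defined above can be measured by counting, for each point, how many other points include it in their k-nearest-neighbor lists. The following is a minimal brute-force sketch of that count using Euclidean distance; the data set and the function name are hypothetical, and the dissertation's own analysis covers far more than this count.

```python
import math

def k_occurrences(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            counts[j] += 1
    return counts

# Hypothetical 2-D data set in which the first point sits near the centre.
points = [(0.0, 0.0), (1.0, 0.0), (-1.5, 0.0), (0.0, 2.0), (5.0, 0.0)]
occurrences = k_occurrences(points, k=1)
```

The counts always sum to k times the number of points, so a few large values force many points to have a count of zero. A strongly right-skewed distribution of these counts is the signature of hubs.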
|