Global ETD Search

1	Verbesserung und Überwachung von RFID-Infrastrukturen im Einzelhandel – ein aktionsforschungsbasierter Ansatz / Improvement and monitoring of RFID infrastructures in retail – a novel action research based approach Buckel, Thomas January 2014 (has links) (PDF) High stock accuracy rests on the cross-company identification and tracking of goods, which automatic identification (Auto-ID) technologies make possible. The introduction of the barcode as an Auto-ID technology fundamentally changed the industry more than 30 years ago. Building on this, newer Auto-ID technologies such as radio-frequency identification (RFID) promise to solve problems such as the unavailability of goods, an opaque theft rate or shrinkage through better tracking of all goods and higher stock accuracy. The advantages of RFID over the barcode include higher data density, greater robustness against environmental influences, and faster, multiple (bulk) capture of items. Many companies, however, face a multitude of problems, above all after implementing an RFID infrastructure. Aspects such as little management support, internal resistance from employees, problems integrating hardware and software and, above all, poor data quality prevent the predicted positive effects from being achieved. Such phenomena are aptly summarised under the term "credibility gap". It describes the problem that procedures, methods and targeted support are lacking overall for realising, actually and sustainably, the positive effects promised extensively in the literature. Fittingly, the expected savings and improvements from using RFID are often described as expert estimates and even as largely purely speculative. 
The aim of this dissertation is to enable practitioners to achieve the positive effects of RFID. To this end, a variety of investigations were carried out on the basis of a long-term cooperation with one of the world's largest apparel retailers, in which an RFID implementation project was accompanied and actively shaped. First, it is confirmed that the predicted benefits of RFID technology indeed cannot be achieved merely by implementing the required infrastructure. The reason identified is high inaccuracy in the inventory systems used, attributable to both technical and human error. As a consequence, RFID data quality is not reliable. The dissertation addresses the problems of the credibility gap and, for an already implemented RFID infrastructure, first diagnoses the errors and causes of the poor data quality. Building on this, measures and instructions are presented with which the errors can be corrected and the infrastructure can ultimately be improved and monitored. To link the requirements of practice and science successfully, a novel combination of two forms of action research is used as the research method. As a result, on the one hand, frameworks and tests for error correction, monitoring metrics and rules for effective RFID system management that are helpful to practitioners are described. All measures carried out and presented in the dissertation demonstrably lead to higher data quality in an implemented RFID system and provide means for metric-based visualisation of RFID process performance. On the other hand, a model for the use of action research is proposed and an extensive validation of the methodology is carried out. 
In this way, the rigor of the research results is ensured alongside their practical relevance. All results serve as a basis for a variety of research approaches. Higher reliability and data quality of RFID information enable more meaningful analyses. Furthermore, error-corrected process data make novel methods of RFID data mining conceivable. This research area remains largely untouched and offers enormous potential for realising further benefits held out by RFID. / The automatic identification, tracking and tracing of goods is a prerequisite for stock accuracy. In the 1980s, the barcode as an automatic identification technology substantially changed retail operations. On this basis, the rise of radio-frequency identification (RFID) in recent years was meant to solve problems such as unavailability of products, theft and shrinkage through higher product transparency along the supply chain. The benefits of RFID over the barcode include increased data density, greater resistance to environmental effects and bulk identification of items. However, companies still face a series of problems after implementing an RFID infrastructure. Issues such as low management support, internal resistance among staff, complex software/hardware integration and low data quality prevent companies from gaining the expected benefits. These phenomena are aptly described as a "credibility gap". The term refers to the lack of methods and procedures for achieving the effects discussed in the literature. Consequently, the expected benefits and improvements from RFID technology are described as expert estimates or even purely speculative assumptions. The aim of this dissertation is to help practitioners realise the positive effects of RFID. 
For this purpose, an investigation was carried out within an RFID implementation project conducted by one of the world's largest fashion retailers. It is first confirmed that an RFID implementation alone does not necessarily deliver the expected benefits. The reasons identified are various inconsistencies between the RFID system and the existing inventory system, caused by both technical and human issues. As a result, the RFID stock information is not reliable. To address the problems associated with the credibility gap, the reasons for poor data quality are identified in a first step. Subsequently, systematic procedures are introduced to solve these issues in order to improve and monitor the RFID infrastructure. To expand scientific knowledge and simultaneously assist in practical problem-solving, a novel combination of two action research types is used. On the one hand, several error-handling procedures, monitoring options and specific rules for effective RFID system management are described for practitioners. All measures introduced demonstrably increase the data quality of the implemented RFID system and provide indicator-based tools for reviewing RFID process performance. On the other hand, an application model for action research is proposed, including a validation of the specific research method. This approach contributes to both dimensions of the rigor and relevance framework. These findings may serve as a basis for further research in various directions. Increased reliability of RFID information enables more meaningful analyses. In addition, error-corrected processes and data lead to new methods of RFID data mining, which remains a largely untapped area of research. RFID Aktionsforschung Datenqualität Einzelhandel Data Mining ddc:000
2	Datenqualität als Schlüsselfrage der Qualitätssicherung an Hochschulen / Data Quality as a key issue of quality assurance in higher education Pohlenz, Philipp January 2008 (has links) Universities increasingly face a legitimacy problem regarding their handling of (publicly provided) resources. The criticism mainly concerns teaching. Teaching is said to be ineffectively organised and, through poor study conditions for which the universities themselves are held responsible, to contribute to long study durations and high drop-out rates. It is asserted that students' lifetime is handled irresponsibly and that the societal mandate to educate is not adequately fulfilled, either by the university as a whole or by individual teachers. To meet the simultaneously rising demand for academic education, universities are transforming into service enterprises whose performance is measured by the efficiency of their offerings. Such a guiding model is inspired by the governance principles of New Public Management. Under it, the state withdraws from its traditionally close ties to the universities and grants them local autonomy, for example by introducing global budgets for their financial self-governance. Universities become market actors that prevail against their competitors in the contest for customers by demonstrating quality and excellence. Various evaluation procedures are used to carry out such performance comparisons. These commonly involve both university statistics, for example graduation rates, and, increasingly, survey data, mostly from students, to capture their assessments of the quality of teaching and studies. 
The latter in particular are often criticised as unsuitable for adequately representing the quality of teaching. Rather, their informative value is said to be limited by subjective bias. An assessment based on student survey data would accordingly lead to misjudgments and, in consequence, unjust performance sanctions. For evaluation procedures to be accepted as an instrument of internal quality assurance and quality development, it must therefore be examined to what extent impairments of the validity of the data bases used for university governance reduce their informative value. Based on the corresponding results, the procedures can be developed further. This question is at the centre of the present work. / Universities encounter public debate on the effectiveness of their handling of public funds. Criticism mainly refers to higher education teaching, which is regarded as ineffectively organised and, due to poor learning conditions, as contributing to excessively long study times and student drop-out. Students' lifetime is said to be handled irresponsibly, and universities as institutions and individual teachers are said not to adequately meet society's demands regarding the quality of higher education. To respond to the rising demand for higher education services, universities are being turned into service-oriented "enterprises" that compete with other institutions for "customers" by providing the publicly requested evidence of the quality and excellence of their educational services. To implement the respective quality comparisons, different procedures of educational evaluation are being established. These procedures involve higher education statistics (student/graduate ratios) and, increasingly, student surveys asking for their appraisals of the quality of higher education teaching. 
The latter in particular are the subject of controversial debate as to whether they adequately reflect the quality of teaching and training. Their informational value is regarded as limited by subjective distortions of the collected data. Those who are evaluated thus deem quality assessments and the respective sanctions liable to result in misjudgments. To establish evaluation procedures as an accepted instrument of internal quality assurance and quality development, data quality and validity concerns need to be examined carefully. Based on the respective research results, the evaluation procedures can be further developed and improved. Lehrevaluation Qualitätssicherung Hochschulsteuerung Datenqualität Validität Evaluation Quality Assurance Higher Education Management Data Quality Validity Social sciences
3	Adaptive windows for duplicate detection Draisbach, Uwe, Naumann, Felix, Szott, Sascha, Wonneberg, Oliver January 2012 (has links) Duplicate detection is the task of identifying all groups of records within a data set that each represent the same real-world entity. This task is difficult because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records, and (ii) data sets might be so large that a pairwise comparison of all records is infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare record pairs only within each partition. One well-known approach of this kind is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data, comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition behind such adaptive windows is that there might be regions of high similarity, suggesting a larger window size, and regions of lower similarity, suggesting a smaller window size. We propose and thoroughly evaluate several adaptation strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons). / Duplicate detection describes the finding of multiple records that represent the same real-world object. This task is not trivial because (i) the records may differ slightly, so that similarity measures are needed for a pairwise comparison, and (ii) the volume of data makes a complete pairwise comparison impossible. To solve the second problem, various algorithms exist that partition the data and perform comparisons only within the partitions. 
One of these algorithms is the Sorted Neighborhood Method (SNM), which sorts the data by a key and then slides a window over the sorted data. Comparisons are performed only within this window. We describe several variations of the Sorted Neighborhood Method that are based on varying window sizes. These approaches rest on the intuition that regions of higher and lower similarity exist within the sorted records, for which correspondingly larger or smaller window sizes are appropriate. We describe and evaluate several adaptation strategies, some of which are provably better than the original Sorted Neighborhood Method in terms of efficiency (same result with fewer comparisons). Informationssysteme Datenqualität Datenintegration Duplikaterkennung Duplicate Detection Data Quality Data Integration Information Systems Data processing Computer science
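The Sorted Neighborhood Method summarised in record 3 can be sketched in a few lines of Python. The fixed-window function follows the description in the abstract (sort by a key, compare only records inside a sliding window). The adaptive function is only an illustration of the varying-window idea and is not one of the authors' strategies, which the abstract does not specify. The names `snm`, `adaptive_snm` and `similar`, the 0.8 threshold and the sample names are all assumptions made for this sketch.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Hypothetical similarity test based on the ratio of matching characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def snm(records, key, is_duplicate, window=3):
    """Fixed-window SNM: sort by key, then compare each record with the
    next (window - 1) records in sort order."""
    ordered = sorted(records, key=key)
    pairs, comparisons = [], 0
    for i in range(len(ordered)):
        for j in range(i + 1, min(i + window, len(ordered))):
            comparisons += 1
            if is_duplicate(ordered[i], ordered[j]):
                pairs.append((ordered[i], ordered[j]))
    return pairs, comparisons


def adaptive_snm(records, key, is_duplicate, window=2):
    """Illustrative adaptive variant (an assumption, not the authors' method):
    start with a small window and extend it by one whenever the record at
    the window's edge turns out to be a duplicate."""
    ordered = sorted(records, key=key)
    pairs, comparisons = [], 0
    for i in range(len(ordered)):
        limit = min(i + window, len(ordered))
        j = i + 1
        while j < limit:
            comparisons += 1
            if is_duplicate(ordered[i], ordered[j]):
                pairs.append((ordered[i], ordered[j]))
                # A hit at the edge hints at a dense region: grow the window.
                if j == limit - 1 and limit < len(ordered):
                    limit += 1
            j += 1
    return pairs, comparisons


names = ["Jon Smith", "John Smith", "John Smyth",
         "Mary Jones", "Marie Jones", "Zed Adams"]
print(snm(names, key=str.lower, is_duplicate=similar, window=3))
print(adaptive_snm(names, key=str.lower, is_duplicate=similar, window=2))
```

Both functions return the detected pairs together with the number of comparisons, so the efficiency criterion named in the abstract (same results with fewer comparisons) can be checked directly on a given data set.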
4	Die Wirkung von Incentives auf die Antwortqualität in Umfragen / The effect of incentives on response quality in surveys Dingelstedt, André 24 November 2015 (has links) The standardised survey is a recognised and frequently used data collection method in social science research for gaining insight into the attitudes of population groups. In recent decades, however, a marked decline in willingness to participate in surveys has been observed. To increase willingness to participate, the use of monetary incentives is usually recommended; these can be handed out at the beginning or at the end of the survey. It is unclear, however, whether and to what extent an incentive also influences response quality during the survey. The studies conducted so far mostly lack a clear definition of response quality and therefore select indicators for testing relationships without a derived theoretical basis. In addition, the research field lacks empirically validated theories explaining the effect of incentives on data quality in surveys. A theoretical foundation appears all the more important because recent studies report negative findings on response quality due to incentivisation (cf. Barge & Gehlbach (2012)). The aim of the present work is therefore to clarify, on the basis of theoretical concepts and using an incentive experiment, whether and to what extent incentives systematically affect response quality. To this end, a definition of response quality was first derived from the concept of the Total Survey Error (cf. Biemer & Lyberg (2003); Weisberg (2005)), the satisficing approach of Krosnick (1991) and the German Microcensus Act (Mikrozensusgesetz, 2005). Four facets of response quality were worked out, which served as the basis for the subsequent analyses. 
Next, the Cognitive Evaluation Theory (Deci & Ryan (1985)) was drawn on as a motivational psychology approach, and the reciprocity hypothesis (Gouldner (1960)) was presented. From these theoretical approaches, hypotheses were derived that consistently postulated a positive effect of incentives on response quality. In the next step, the survey design was described (three experimental groups with different incentives: 0 euros, 5 euros, 20 euros; the participants were students of the University of Göttingen) and the self-developed questionnaire needed to test the hypotheses was presented. The central conclusion drawn from the results is that incentives have heterogeneous effects on the four facets of response quality. The amount of the incentive influences not only the strength of the effects but also their direction. Moreover, an incentive of 5 euros tended to have positive effects on response quality, whereas an incentive of 20 euros tended to have negative effects. Negative effects on the facets of response quality were also found in the experimental group without an incentive. These negative relationships are explained via the definition of the situation. It is assumed that respondents want to support researchers in their studies but, owing to misinterpretations of the researchers' goals and expectations, tend towards undesired response behaviour. From this explanation, the conjecture is formulated that rising intrinsic motivation or reciprocity increases not response quality itself but at most the respondents' willingness to achieve better response quality. 300 Antwortqualität Incentive Datenqualität Umfrage Befragung response quality data quality survey incentive Soziologie (PPN62125505X)
5	Linked Data Quality Assessment and its Application to Societal Progress Measurement Zaveri, Amrapali 19 May 2015 (has links) (PDF) In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows publishing and exchanging information in an interoperable and reusable fashion. Many different communities on the Internet, such as geographic, media, life sciences and government, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented. With the emergence of the Web of Linked Data, several use cases have become possible thanks to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously. In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data gravely affects the end results, making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. Datasets that contain quality problems may still be useful for certain applications, depending on the use case at hand. 
Thus, LD consumption has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused by the LD publication process or can be intrinsic to the data source itself. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is a particular challenge in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing the quality crucial to measure the accuracy of representing the real-world data. On the document Web, data quality can only be indirectly or vaguely defined, but there is a requirement for more concrete and measurable data quality metrics for LD. Such data quality metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness and consistency with regard to implicit information. Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets. Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology employs LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems by the automatic creation of an extended schema for DBpedia. The second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowds, i.e. workers on online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. 
We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology. Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them. Finally, we take into account a domain-specific use case that consumes LD and leverages data quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis. Linked Data Datenqualität Semantic Web Linked Data Data Quality Semantic Web ddc:500
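Record 5 describes metrics that report both a score and the specific entities causing errors. A minimal Python sketch of that reporting pattern follows. It is not R2RLint's implementation and not one of the thesis's 69 metrics: the metric (`property_completeness`), the plain-tuple triple representation and the `ex:` identifiers are assumptions chosen purely for illustration.

```python
def property_completeness(triples, required_property):
    """Hypothetical metric: share of subjects that carry `required_property`.

    `triples` is an iterable of (subject, predicate, object) tuples.
    Returns (score, offending_subjects) so the user sees which entities
    lower the score, not only the score itself.
    """
    triples = list(triples)
    subjects = sorted({s for s, _, _ in triples})
    having = {s for s, p, _ in triples if p == required_property}
    missing = [s for s in subjects if s not in having]
    if not subjects:
        return 1.0, []
    return (len(subjects) - len(missing)) / len(subjects), missing


# Made-up sample data; identifiers are illustrative only.
sample = [
    ("ex:Berlin", "rdfs:label", "Berlin"),
    ("ex:Berlin", "ex:population", "3600000"),
    ("ex:Leipzig", "ex:population", "600000"),
]
score, offenders = property_completeness(sample, "rdfs:label")
print(score, offenders)
```

Returning the offending subjects alongside the score mirrors the point made in the abstract that users need the problematic entities in order to fix them.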
6	Ontologiemetriken zur Datenqualitätsverbesserung [Ontology metrics for improving data quality] Cherix, Didier 26 February 2018 (has links) Data quality is a far-reaching topic that plays a major role in many applications and procedures, and the Semantic Web is no exception. The completeness, correctness and accuracy of the data are decisive for the quality of the result. In the Semantic Web, ontologies are the most important data sources, so it is essential to be able to examine their data quality. In this thesis, we present a procedure for checking the data quality of an ontology and detecting potential errors. First, we show how further error sources can be found starting from an initial set of erroneous data (gold standard). Using clustering, we extend a gold standard in order to find new errors. With these procedures, erroneous data in DBpedia could be rediscovered. Since such a gold standard does not always exist, we show methods for finding error sources without one. The various procedures yield a set of potentially erroneous data. These data are to be evaluated by hand, and the necessary rules or tests derived from them. With these procedures, a high recall of erroneous data was achieved. We also show cases that are not detected by other approaches, including Databugger. info:eu-repo/classification/ddc/000 ddc:000
7	The intrinsic quality assessment of building footprints data on OpenStreetMap in Baden-Württemberg Fan, Hongchao, Yang, Anran, Zipf, Alexander 23 November 2017 (has links) In this work, we propose a framework to assess the quality of OpenStreetMap (OSM) building footprint data without using any reference data. More specifically, the OSM history data are examined with regard to the development of attributes, geometries and positions of building footprints. In total, seven quality indicators are defined for the intrinsic quality assessment. For our case study in the federal state of Baden-Württemberg (BW), Germany, a PostgreSQL database is established based on a spatiotemporal data model that can track both individual objects and editing events on OSM. The preliminary experiments show that the quality of building footprints in BW is relatively high, and that quality in terms of semantics, geometries and positions is increasing over time thanks to the considerable contribution of OSM volunteers. / In this work, we present a concept for assessing the quality of building footprints from OpenStreetMap (OSM) without using reference data. In particular, the editing history of the objects' vertices and attributes is examined. Seven indicators for assessing intrinsic data quality were defined. For the present study, a PostgreSQL database was set up, using Baden-Württemberg as an example, to implement a spatiotemporal data model that can track both individual objects and editing events. Preliminary results show a relatively high quality of the OSM building data, with an increase in quality in terms of semantics, geometry and positional accuracy observable as a contribution of the volunteer OSM editors. Datenqualität, Datenmodell info:eu-repo/classification/ddc/550 ddc:550 info:eu-repo/classification/ddc/710 ddc:710
8	Fuzzy-Set Veränderungsanalyse für hochauflösende Fernerkundungsdaten [Fuzzy-set change analysis for high-resolution remote sensing data] Tufte, Lars 07 April 2006 (has links) Remote sensing is an important source of up-to-date, high-quality geodata and for updating existing geodata. The development of new airborne and satellite-based digital sensors in recent years has further increased this importance. Owing to their improved spatial and radiometric resolution and the fully digital processing chain, the sensors open up new fields of application. Classical evaluation methods frequently reach their limits when analysing the data. The multiscale, object-class-specific analysis presented in this thesis is a very well suited method here and delivers good results. The data are classified by means of a fuzzy classification method, which offers advantages in the accuracy and interpretability of the results. The thematic accuracy (data quality) of the fuzzy classification is of decisive importance for the acceptance of the results and their further use. Methods for the spatially differentiated determination and visualisation of thematic accuracy were developed here. In addition, the method of segment-based fuzzy logic change analysis (segmentbasierte Fuzzy-Logic Veränderungsanalyse, SFLV) was developed. The method enables change analysis of very high to ultra-high resolution remote sensing data with a differentiated statement about the changes that have occurred. It is based on the standard operations for fuzzy sets and uses the results of the method developed for analysing high-resolution remote sensing data. SFLV provides clear added value over the classical comparison of two classification results, in that differentiated statements about possible changes can be made. 
The applicability of SFLV was successfully demonstrated on a small study area on the Elbe island of Pagensand, as an example of change analysis of biotope types based on HRSC-A data. hochauflösende Fernerkundungsdaten change detection Fuzzy-Klassifizierung Fuzzy-Set Datenqualität Computeranimation multiskalige Analyse 31 - Geowissenschaften ddc:520
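Record 8 states that the change analysis builds on the standard operations for fuzzy sets. The Python sketch below shows how such operations (minimum for intersection, maximum for union) could yield a graded change statement for one image segment. How SFLV actually combines memberships is not given in the abstract, so the combination rule, the function names and the class labels `reed` and `forest` are assumptions for illustration only.

```python
def fuzzy_and(a, b):
    """Standard fuzzy intersection: minimum of two membership degrees."""
    return min(a, b)


def fuzzy_or(a, b):
    """Standard fuzzy union: maximum of two membership degrees."""
    return max(a, b)


def transition_degrees(before, after):
    """Degree of each (class at t1, class at t2) transition for one segment.

    `before` and `after` map class names to membership degrees in [0, 1].
    Assumed rule: a transition holds to the degree that both memberships hold.
    """
    return {(c1, c2): fuzzy_and(m1, m2)
            for c1, m1 in before.items()
            for c2, m2 in after.items()}


def change_degree(before, after):
    """Degree to which the segment changed class at all (union over c1 != c2)."""
    degree = 0.0
    for (c1, c2), m in transition_degrees(before, after).items():
        if c1 != c2:
            degree = fuzzy_or(degree, m)
    return degree


# Made-up memberships of one segment at two acquisition dates.
segment_t1 = {"reed": 0.8, "forest": 0.2}
segment_t2 = {"reed": 0.3, "forest": 0.7}
print(transition_degrees(segment_t1, segment_t2))
print(change_degree(segment_t1, segment_t2))
```

Unlike a crisp comparison of two classification results, which can only report "changed" or "unchanged", the per-transition degrees keep the graded information, which is the kind of differentiated statement the abstract refers to.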
9	Linked Data Quality Assessment and its Application to Societal Progress Measurement Zaveri, Amrapali 17 April 2015 (has links) In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration where both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structure data on the Web, which allows publish- ing and exchanging information in an interoperable and reusable fashion. Many different communities on the Internet such as geographic, media, life sciences and government have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, where currently more than 50 billion facts are represented. With the emergence of Web of Linked Data, there are several use cases, which are possible due to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously. In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, thus making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. There are cases when datasets that contain quality problems, are useful for certain applications, thus depending on the use case at hand. 
Thus, LD consumption has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can either be caused by the LD publication process or be intrinsic to the data source itself. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is particularly challenging in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes quality assessment crucial for measuring how accurately the data represents the real world. On the document Web, data quality can only be indirectly or vaguely defined, but there is a requirement for more concrete and measurable data quality metrics for LD. Such data quality metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness, and consistency with regard to implicit information. Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets. Thus, in this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology employs LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems by the automatic creation of an extended schema for DBpedia. The second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowds, i.e., workers on online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia.
We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology. Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them. Finally, we take into account a domain-specific use case that consumes LD and depends on data quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis. info:eu-repo/classification/ddc/500 ddc:500 Linked Data, Data Quality, Semantic Web
10	Context Similarity for Retrieval-Based Imputation Ahmadov, Ahmad, Thiele, Maik, Lehner, Wolfgang, Wrembel, Robert 30 June 2022 (has links) Completeness, one of the four major dimensions of data quality, is a pervasive issue in modern databases. Although data imputation has been studied extensively in the literature, most of the research focuses on inference-based approaches. We propose to harness Web tables as an external data source to effectively and efficiently retrieve missing data while taking into account the inherent uncertainty and lack of veracity that they contain. Existing approaches mostly rely on standard retrieval techniques and out-of-the-box matching methods, which result in very low precision, especially when dealing with numerical data. We therefore propose a novel data imputation approach that applies numerical context similarity measures. This significantly increases the precision of the imputation procedure by ensuring that the imputed values are of the same domain and magnitude as the local values, and thus yields an accurate imputation. We use the Dresden Web Table Corpus, which comprises more than 125 million web tables extracted from the Common Crawl, as our knowledge source. The comprehensive experimental results demonstrate that the proposed method clearly outperforms the default out-of-the-box retrieval approach. info:eu-repo/classification/ddc/004 ddc:004
