21

Metrics and Test Procedures for Data Quality Estimation in the Aeronautical Telemetry Channel

Hill, Terry October 2015
ITC/USA 2015 Conference Proceedings / The Fifty-First Annual International Telemetering Conference and Technical Exhibition / October 26-29, 2015 / Bally's Hotel & Convention Center, Las Vegas, NV / There is great potential in using Best Source Selectors (BSS) to improve link availability in aeronautical telemetry applications. While the general notion that diverse data sources can be used to construct a consolidated stream of "better" data is well founded, there is no standardized means of determining the quality of the data streams being merged together. Absent this uniform quality data, the BSS has no analytically sound way of knowing which streams are better, or best. This problem is further exacerbated when one imagines that multiple vendors are developing data quality estimation schemes, with no standard definition of how to measure data quality. In this paper, we present measured performance for a specific Data Quality Metric (DQM) implementation, demonstrating that the signals present in the demodulator can be used to quickly and accurately measure the data quality, and we propose test methods for calibrating DQM over a wide variety of channel impairments. We also propose an efficient means of encapsulating this DQM information with the data, to simplify processing by the BSS. This work leads toward a potential standardization that would allow data quality estimators and best source selectors from multiple vendors to interoperate.
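The paper itself defines the DQM and its encapsulation; as a purely illustrative sketch, assume a hypothetical format in which each received block carries an estimated bit error probability from the demodulator, so a best source selector can simply pick the stream reporting the best quality for each block. The class, field, and function names below are assumptions, not the standardized encoding described in the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class QualifiedBlock:
    """A payload block paired with a data quality estimate (hypothetical format)."""
    payload: bytes
    bit_error_prob: float  # estimated by the demodulator, e.g. from soft-decision statistics

def select_best_source(candidates: List[QualifiedBlock]) -> QualifiedBlock:
    """Pick the block whose source reports the lowest estimated bit error probability."""
    return min(candidates, key=lambda blk: blk.bit_error_prob)

# Three receivers deliver the same transmitted block with different quality estimates.
blocks = [
    QualifiedBlock(b"\x01\x02\x03", bit_error_prob=1e-3),
    QualifiedBlock(b"\x01\x02\x07", bit_error_prob=5e-2),
    QualifiedBlock(b"\x01\x02\x03", bit_error_prob=1e-5),
]
print(select_best_source(blocks).bit_error_prob)  # 1e-05
```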
22

Data Quality Model for Machine Learning

Rudraraju, Nitesh Varma, Boyanapally, Varun January 2019
Context: Machine learning is a branch of artificial intelligence, and the field is growing continuously. Most internet-related services, such as social media, email spam filtering, e-commerce sites, and search engines, now use machine learning. The quality of machine learning output depends on the input data, so the input data is crucial: good-quality input data can give a better outcome for the machine learning system. To achieve quality data, a data scientist can apply a data quality model to the data used for machine learning. A data quality model can help data scientists monitor and control the input data of machine learning, but little research has been done on data quality attributes and data quality models for machine learning. Objectives: The primary objectives of this paper are to find and understand the state of the art and state of practice on data quality attributes for machine learning, and to develop a data quality model for machine learning in collaboration with data scientists. Methods: This paper consists of two studies: 1) a literature review across several databases to identify literature on data quality attributes and data quality models for machine learning; 2) an in-depth interview study to better understand and verify the data quality attributes identified in the literature review, carried out in collaboration with data scientists from multiple locations. In total, 15 interviews were performed, and based on the results we proposed a data quality model grounded in the interviewees' perspectives. Result: We identified 16 data quality attributes as important, based on the perspectives of the experienced data scientists interviewed in this study. With these selected data quality attributes, we proposed a data quality model with which the quality of data for machine learning can be monitored and improved by data scientists; the effects of these data quality attributes on machine learning are also stated. Conclusion: This study highlights the importance of data quality and proposes a data quality model for machine learning based on the industrial experience of data scientists. Closing this research gap benefits all machine learning practitioners and data scientists who intend to identify quality data for machine learning. To confirm that the data quality attributes in the model are important, a further experiment can be conducted, as proposed in future work.
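The thesis's 16 attributes are not listed in this abstract; as a hedged illustration of how a data scientist might monitor a few commonly cited quality attributes (completeness, uniqueness, validity) on tabular training data, a minimal sketch follows. The column names and value ranges are illustrative assumptions, not taken from the thesis.

```python
import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    """Fraction of cells that are not missing."""
    return 1.0 - df.isna().to_numpy().mean()

def uniqueness(df: pd.DataFrame, key_columns) -> float:
    """Fraction of rows that are not duplicated on the given key columns."""
    return 1.0 - df.duplicated(subset=key_columns).mean()

def validity(series: pd.Series, lower: float, upper: float) -> float:
    """Fraction of non-missing values that fall inside an expected range."""
    values = series.dropna()
    return ((values >= lower) & (values <= upper)).mean() if len(values) else 0.0

# Toy training-data profile (columns and thresholds are illustrative only).
df = pd.DataFrame({"id": [1, 2, 2, 4], "age": [25, None, 37, 210]})
print(f"completeness={completeness(df):.2f}",
      f"uniqueness={uniqueness(df, ['id']):.2f}",
      f"validity(age)={validity(df['age'], 0, 120):.2f}")
```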
23

Data Quality : An organizational rather than a technical concern

Hedström, Jakob, Dimitrova, Petya January 2019
Academia generally regards delegating responsibility for ensuring data quality to employees close to operations as beneficial, rather than giving the IT department full authority. Business employees are suitable for this task, as they have both a practical understanding of the business and a role that is often regarded as data-driven. However, delegation of responsibilities is argued to be one of the biggest barriers concerning data quality. This study examines this phenomenon by connecting the delegation of responsibility for ensuring data quality to research on the motivational aspects of controlling data, which lays the foundation for the empirical investigation. This single case study was conducted qualitatively using semi-structured interviews. The data sample consisted of a business department, comprising six business controllers and their manager, at a large Swedish manufacturing company. The data collection was complemented by an interview with a representative of the IT department. The study shows how business controllers depend on the quality of data and, because they are aware of the benefits of avoiding data errors, are autonomously motivated to ensure good data quality. This strengthens the argument for delegating control of data quality to the business department, which gives the study practical significance.
24

Efficient Extraction and Query Benchmarking of Wikipedia Data

Morsey, Mohamed 06 January 2014
Knowledge bases are playing an increasingly important role for integrating information between systems and over the Web. Today, most knowledge bases cover only specific domains, they are created by relatively small groups of knowledge engineers, and it is very cost intensive to keep them up-to-date as domains change. In parallel, Wikipedia has grown into one of the central knowledge sources of mankind and is maintained by thousands of contributors. The DBpedia (http://dbpedia.org) project makes use of this large collaboratively edited knowledge source by extracting structured content from it, interlinking it with other knowledge bases, and making the result publicly available. DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Furthermore, many companies and researchers use DBpedia and its public services to improve their applications and research approaches. However, the DBpedia release process is heavy-weight and the releases are sometimes based on several months old data. Hence, a strategy to keep DBpedia always in synchronization with Wikipedia is highly required. In this thesis we propose the DBpedia Live framework, which reads a continuous stream of updated Wikipedia articles, and processes it. DBpedia Live processes that stream on-the-fly to obtain RDF data and updates the DBpedia knowledge base with the newly extracted data. DBpedia Live also publishes the newly added/deleted facts in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Moreover, the new DBpedia Live framework incorporates several significant features, e.g. abstract extraction, ontology changes, and changesets publication. Basically, knowledge bases, including DBpedia, are stored in triplestores in order to facilitate accessing and querying their respective data. Furthermore, the triplestores constitute the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triplestore implementations. We introduce a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triplestores and, thus, settled on measuring performance against a relational database which had been converted to RDF by using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful to compare existing triplestores and provide results for the popular triplestore implementations Virtuoso, Sesame, Apache Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triplestores is by far less homogeneous than suggested by previous benchmarks. Further, one of the crucial tasks when creating and maintaining knowledge bases is validating their facts and maintaining the quality of their inherent data. 
This task includes several subtasks, and in this thesis we address two of the major ones: fact validation and provenance, and data quality maintenance. The fact validation and provenance subtask aims at providing sources for facts in order to ensure the correctness and traceability of the provided knowledge. It is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several searches and often read many documents. We present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. The data quality maintenance subtask, on the other hand, aims at evaluating and continuously improving the quality of the data in knowledge bases. We present a methodology for assessing the quality of knowledge bases' data, which comprises a manual and a semi-automatic process. The first phase covers the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources against the quality problem taxonomy via crowdsourcing. This process is accompanied by a tool in which a user assesses an individual resource and evaluates each of its facts for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia.
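As an illustration of the kind of workload such SPARQL benchmarks target, a minimal sketch of querying the public DBpedia SPARQL endpoint with the SPARQLWrapper library follows; the specific query, the prefixes, and the endpoint's availability are assumptions for demonstration, not queries taken from the thesis.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Query the public DBpedia endpoint (availability and schema assumed, not guaranteed).
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?city ?population WHERE {
        ?city a dbo:City ;
              dbo:country dbr:Germany ;
              dbo:populationTotal ?population .
    }
    ORDER BY DESC(?population)
    LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["city"]["value"], binding["population"]["value"])
```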
25

PERFORMANCE RESULTS USING DATA QUALITY ENCAPSULATION (DQE) AND BEST SOURCE SELECTION (BSS) IN AERONAUTICAL TELEMETRY ENVIRONMENTS

Geoghegan, Mark, Schumacher, Robert 10 1900
Flight test telemetry environments can be particularly challenging due to RF shadowing, interference, multipath propagation, antenna pattern variations, and large operating ranges. In cases where the link quality is unacceptable, applying multiple receiving assets to a single test article can significantly improve the overall link reliability. The process of combining multiple received streams into a single consolidated stream is called Best Source Selection (BSS). Recent developments in BSS technology include a description of the maximum likelihood detection approach for combining multiple bit sources, and an efficient protocol for providing the real-time data quality metrics necessary for optimal BSS performance. This approach is being standardized and will be included in Appendix 2G of IRIG-106-17. This paper describes the application of this technology and presents performance results obtained during flight testing.
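The abstract does not spell out the combining rule; below is a minimal sketch of one plausible quality-weighted voting scheme, in which each aligned bit stream votes with a log-likelihood weight derived from its estimated bit error probability. This is an assumption modelled on standard soft combining, not necessarily the IRIG-106 Appendix 2G method.

```python
import math
from typing import List, Sequence

def combine_bit_streams(streams: Sequence[Sequence[int]],
                        error_probs: Sequence[float]) -> List[int]:
    """Quality-weighted majority vote over aligned bit streams.

    Each stream votes with weight log((1 - p) / p), the log-likelihood ratio
    implied by its estimated bit error probability p.
    """
    weights = [math.log((1.0 - p) / p) for p in error_probs]
    combined = []
    for bits in zip(*streams):
        score = sum(w if b == 1 else -w for b, w in zip(bits, weights))
        combined.append(1 if score > 0 else 0)
    return combined

# Example: the low-error stream dominates wherever the noisy streams disagree.
streams = [
    [1, 0, 1, 1, 0],   # estimated BER 1e-4
    [1, 1, 1, 0, 0],   # estimated BER 1e-1
    [0, 0, 1, 1, 1],   # estimated BER 2e-1
]
print(combine_bit_streams(streams, [1e-4, 1e-1, 2e-1]))  # [1, 0, 1, 1, 0]
```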
26

Customer Data Management

Sehat, Mahdis, Pavez Flores, René January 2012
As business complexity and the number of customers continue to grow, and customers evolve into multinational organisations that operate across borders, many companies face great challenges in the way they manage their customer data. In today's business, a single customer may have a relationship with several entities of an organisation, which means that customer data is collected through different channels. One customer may be described in different ways by each entity, which makes it difficult to obtain a unified view of the customer. In companies with several sources of data distributed across several systems, data environments become heterogeneous. In this state, customer data is often incomplete, inaccurate and inconsistent throughout the company. This thesis studies how organisations with heterogeneous customer data sources implement the Master Data Management (MDM) concept to achieve and maintain high customer data quality. The purpose is to provide recommendations for achieving successful customer data management using MDM, based on existing literature on the topic and an interview-based empirical study. Successful customer data management is more of an organisational issue than a technological one and requires a top-down approach in order to develop a common strategy for an organisation's customer data management. Proper central assessment and maintenance processes that can be adjusted to the entities' needs must be in place, and responsibility for maintaining customer data should be delegated to several levels of the organisation in order to manage it better.
27

Assessing Data Quality of ERP and CRM Systems

Sarwar, Muhammad Azeem January 2014
Data Quality concerns the correct and meaningful representation of real-world information. Researchers have proposed frameworks to measure and analyze Data Quality, yet modern organizations still find it very challenging to state their level of enterprise Data Quality maturity. This study aims at defining the Data Quality of a system and at examining Data Quality Assessment practices. A definition of Data Quality is suggested with the help of a systematic literature review, which also provided a list of dimensions and initiatives for Data Quality Assessment. A survey was conducted to examine these aggregated aspects of Data Quality in an organization actively using ERP and CRM systems. The survey was aimed at gauging organizational awareness of Data Quality and at studying the practices followed to ensure Data Quality in ERP and CRM systems. The survey results identified data validity, accuracy and security as the main areas of interest for Data Quality. The results also indicate that, due to the audit requirements of ERP systems, ERP systems have a higher demand for Data Quality than CRM systems.
28

Data quality challenges in the UK social housing sector

Duvier, Caroline, Neagu, Daniel, Oltean-Dumbrava, Crina, Dickens, D. 12 October 2017
The social housing sector has yet to realise the potential of high data quality. While other businesses, mainly in the private sector, reap the benefits of data quality, the social housing sector seems paralysed, as it is still struggling with recent government regulations and steep revenue reductions. This paper offers a succinct review of relevant literature on data quality and how it relates to social housing. The Housing and Development Board in Singapore offers a good example of how to integrate data quality initiatives in the social housing sector. Taking this example, the research presented in this paper extrapolates cross-disciplinary recommendations on how to implement data quality initiatives among social housing providers in the UK.
29

Towards a Data Quality Framework for Heterogeneous Data

Micic, Natasha, Neagu, Daniel, Campean, Felician, Habib Zadeh, Esmaeil 22 April 2017
Every industry produces significant data output from its working processes, and with the recent advent of big data mining and integrated data warehousing there is a clear case for a robust methodology for assessing data quality to enable sustainable and consistent processing. In this paper a review of Data Quality (DQ) across multiple domains is conducted in order to propose connections between their methodologies. This critical review suggests that, in the process of DQ assessment of heterogeneous data sets, the constituent types of data are seldom treated separately as each needing an alternative data quality assessment framework. We discuss the need for such a directed DQ framework and the opportunities foreseen in this research area, and propose to address it through degrees of heterogeneity.
30

Exploring the potential for secondary uses of Dementia Care Mapping (DCM) data for improving the quality of dementia care

Khalid, Shehla, Surr, Claire A., Neagu, Daniel, Small, Neil A. 30 March 2017
The reuse of existing datasets to identify mechanisms for improving healthcare quality has been widely encouraged, but there has been limited application within dementia care. Dementia Care Mapping (DCM) is an observational tool in widespread use, predominantly to assess and improve quality of care in single organisations. DCM data has the potential to be used for secondary purposes to improve quality of care; however, its suitability for such use requires careful evaluation. This study conducted in-depth interviews with 29 DCM users to identify issues, concerns and challenges regarding the secondary use of DCM data. Data were analysed using modified Grounded Theory. The major themes identified included the need to collect complementary contextual data in addition to DCM data, the need to reassure users regarding ethical issues associated with the storage and reuse of care-related data, and the need to assess and specify data quality for any data that might be made available for secondary analysis. This study was funded by the Faculty of Health Studies, University of Bradford.
