101 |
Querying, Exploring and Mining the Extended Document. Sarkas, Nikolaos. 31 August 2011
The evolution of the Web into an interactive medium that encourages active user engagement has ignited a huge increase in the amount, complexity and diversity of available textual data. This evolution forces us to re-evaluate our view of documents as simple pieces of text and of document collections as immutable and isolated. Extended documents published on blogs, micro-blogs, online social networks and customer feedback portals can be associated with a wealth of metadata in addition to their textual content: tags, links, sentiment, entities mentioned in the text, and so on. Collections of user-generated documents grow, evolve, coexist and interact: they are dynamic and integrated.
These unique characteristics of modern documents and document collections present exciting opportunities for improving the way we interact with them. At the same time, this additional complexity, combined with the vast amounts of available textual data, presents formidable computational challenges. In this context, we introduce, study and extensively evaluate an array of effective and efficient solutions for querying, exploring and mining extended documents and dynamic, integrated document collections.
For collections of socially annotated extended documents, we present an improved probabilistic search and ranking approach based on our growing understanding of the dynamics of the social annotation process.
For extended documents such as blog posts, which are associated with entities extracted from their text and with categorical attributes, we enable interactive exploration through the efficient computation of strong entity associations. Associated entities are computed for every possible attribute-value restriction of the document collection.
For extended documents such as user reviews, which are annotated with a numerical rating, we introduce a keyword-query refinement approach that enables the interactive navigation and exploration of large result sets.
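The abstract does not name the association measure or the algorithm. As a minimal sketch, assuming each document is a pair of a categorical-attribute dictionary and a set of extracted entities, and assuming the Jaccard coefficient of two entities' document sets as the association score, a single pass can enumerate every restriction a document falls under (each attribute either fixed to the document's value or left as the wildcard `*`) and accumulate counts per restriction. The function names and the `min_score` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product


def restrictions(attrs):
    """Yield every attribute-value restriction a document falls under:
    each attribute is either fixed to the document's value or '*'."""
    names = sorted(attrs)
    for choice in product([False, True], repeat=len(names)):
        yield tuple((n, attrs[n] if keep else "*") for n, keep in zip(names, choice))


def entity_associations(docs, min_score=0.5):
    """docs: list of (attrs: dict, entities: set).
    Returns {restriction: [(entity_a, entity_b, score), ...]} sorted by score,
    where score is the Jaccard coefficient of the two entities' document sets
    (an assumed measure, not necessarily the one used in the thesis)."""
    ent_count = defaultdict(lambda: defaultdict(int))
    pair_count = defaultdict(lambda: defaultdict(int))
    for attrs, entities in docs:
        ents = sorted(set(entities))
        for r in restrictions(attrs):
            for e in ents:
                ent_count[r][e] += 1
            for a, b in combinations(ents, 2):
                pair_count[r][(a, b)] += 1
    result = {}
    for r, pairs in pair_count.items():
        strong = []
        for (a, b), both in pairs.items():
            # |docs(a) & docs(b)| / |docs(a) | docs(b)|
            score = both / (ent_count[r][a] + ent_count[r][b] - both)
            if score >= min_score:
                strong.append((a, b, score))
        result[r] = sorted(strong, key=lambda t: -t[2])
    return result
```

Enumerating restrictions naively produces 2^k keys per document for k attributes, so this brute-force version only illustrates what is computed; making it efficient across all restrictions is the part the thesis addresses.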
We extend the skyline query to document streams, such as news articles, associated with categorical attributes and partially ordered domains. The technique incrementally maintains a small set of recent, uniquely interesting extended documents from the stream. Finally, we introduce a solution for the scalable integration of structured data sources into Web search. Queries are analysed to determine what structured data, if any, should be used to augment Web search results.
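A skyline query returns the items not dominated by any other item, where one item dominates another if it is at least as good on every attribute and strictly better on at least one. The sketch below illustrates this on a stream under stated assumptions and is not the thesis's technique: each partially ordered domain is given as a dictionary mapping a value to the set of values it strictly beats (assumed already transitively closed), recency is modelled as a count-based sliding window, and the class, attribute and value names are hypothetical. It relies on the standard sliding-window observation that a document dominated by a newer one can never re-enter the skyline, so it can be discarded immediately.

```python
from collections import deque


def better_or_equal(order, a, b):
    # order maps a value to the set of values it strictly beats.
    return a == b or b in order.get(a, set())


def dominates(orders, x, y):
    """x dominates y if it is at least as good on every attribute
    and strictly better on at least one."""
    at_least = all(better_or_equal(orders[k], x[k], y[k]) for k in orders)
    strictly = any(x[k] != y[k] for k in orders)
    return at_least and strictly


class StreamSkyline:
    def __init__(self, orders, window):
        self.orders = orders
        self.window = window          # keep only the last `window` arrivals
        self.candidates = deque()     # (timestamp, doc), oldest first
        self.t = 0

    def insert(self, doc):
        self.t += 1
        # Older documents dominated by the newcomer can never re-enter the skyline.
        self.candidates = deque(
            (ts, d) for ts, d in self.candidates if not dominates(self.orders, doc, d)
        )
        self.candidates.append((self.t, doc))
        # Expire documents that have fallen out of the sliding window.
        while self.candidates and self.candidates[0][0] <= self.t - self.window:
            self.candidates.popleft()

    def skyline(self):
        docs = [d for _, d in self.candidates]
        return [d for d in docs if not any(dominates(self.orders, o, d) for o in docs)]
```

Keeping only candidates not dominated by a newer document bounds memory by the stream's "interesting" documents rather than by the full window.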
|
103 |
The Research of Supporting Customer Values' Resolutions with "Data Warehousing": A Case Study of Concerning Subscribers' Churn Rate in TransAsia Telecommunications. Yen, Yu-Lung. 28 June 2001
ABSTRACT
In recent years, Customer Relationship Management (CRM) and One to One Marketing have become two popular topics. Many enterprises have invested huge amounts of money and manpower in these fields, hoping to build an ideal model of customer management. Their main purpose is to raise customer loyalty and thereby increase corporate profits. To achieve this goal, they must first understand their customers.
Peppers and Rogers (1995), advocates of One to One Marketing, have declared that reducing the churn rate by 5% increases profit by 100%. The core value of marketing is shifting from "product" to "customer": whoever owns the most customer knowledge owns the most customer capital in the 21st century. Through data mining, a business can turn its large database into valuable information in the form of customer behavior models.
For learning to take place, data from many sources (billing records, scanner data, registration forms, applications, call records, coupon redemptions, surveys) must first be gathered together and organized in a consistent and useful way. This is called data warehousing. Data warehousing allows the enterprise to remember what it has noticed about its customers. Next, the data must be analyzed, understood, and turned into actionable information. That is where data mining comes in.
Using a case study and grounded theory, this article investigates the link between data warehousing and increased corporate value. Because many businesses do not share their findings and experience with customer knowledge, this research provides evidence of how data warehousing can efficiently support a business in reducing its churn rate and creating more business value.
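The gather-then-analyse flow described in this abstract can be illustrated in a few lines. This is a minimal sketch, not the system studied in the thesis: it assumes hypothetical record layouts (billing rows as `(subscriber_id, amount)`, call rows as `(subscriber_id, minutes)`, plus a list of subscriber IDs that left during the period) and defines churn rate as churned subscribers divided by all subscribers in the consolidated view.

```python
from collections import defaultdict


def build_customer_view(billing, calls, churned_ids):
    """Consolidate per-source records into one row per subscriber, keyed by a
    shared subscriber ID (hypothetical schema, for illustration only)."""
    view = defaultdict(lambda: {"billed": 0.0, "minutes": 0, "churned": False})
    for sub_id, amount in billing:
        view[sub_id]["billed"] += amount
    for sub_id, minutes in calls:
        view[sub_id]["minutes"] += minutes
    for sub_id in churned_ids:
        view[sub_id]["churned"] = True
    return dict(view)


def churn_rate(view):
    """Share of subscribers in the consolidated view who left during the period."""
    if not view:
        return 0.0
    return sum(row["churned"] for row in view.values()) / len(view)
```

The point of the consolidated view is that every later analysis (churn rate, segmentation, churn prediction) reads one consistent table instead of reconciling each source separately.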
|
104 |
The use of matrix decomposition for data mining and subscriber classification in mobile cellular networks. João, Zolana Rui. January 2011
M. Tech. Electrical Engineering. / Telecommunication databases contain billions of records and are among the largest in the world, reaching around 30 terabits (30 trillion bits). Data mining is a proven solution for analysing such large volumes of data where traditional methods of turning data into knowledge are impractical. However, the increasing size (scalability), complexity (complex data types) and high dimensionality of telecommunication databases pose a significant challenge for conventional data mining approaches. In this dissertation, a matrix decomposition method, Singular Value Decomposition (SVD), is used to improve data mining for subscriber classification in mobile cellular networks. Using a large real mobile network dataset, the performance of a standard data mining approach (for clustering analysis) is evaluated with and without matrix decomposition. The proposed approach decreases the computational cost for a given size of data (in terms of number of rows and columns). We also demonstrate improved cluster quality, with the following gains in clustering assessment indices: 2.45% in Jaccard score, 3.5% in purity and 1.35% in efficiency. Subscribers with different behaviours in the network are classified on the basis of various features, and SVD analysis of their voice, text message and data usage patterns is also performed. The proposed data mining model can be used for business intelligence activities such as customer segmentation, traffic modelling and social network analysis.
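The SVD-then-cluster pipeline described above can be sketched with NumPy. The abstract does not say which clustering algorithm was used, so k-means here is an assumption, as are the function names and the number of retained components; purity is computed as one of the assessment indices the abstract lists. Rows are subscribers and columns are usage features (for example voice, SMS and data counters).

```python
import numpy as np


def svd_reduce(X, k):
    """Project mean-centred rows of X onto the top-k right singular vectors."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]  # equal to Xc @ Vt[:k].T


def kmeans(Z, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed clustering step, not necessarily the thesis's)."""
    rng = np.random.default_rng(seed)
    centres = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels


def purity(labels, truth):
    """Fraction of points that belong to the majority true class of their cluster."""
    total = 0
    for c in np.unique(labels):
        _, counts = np.unique(truth[labels == c], return_counts=True)
        total += counts.max()
    return total / len(truth)
```

Clustering in the k-dimensional SVD space instead of the full feature space is where the computational saving comes from: each distance computation touches k values instead of one per original column.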
|
105 |
Data Quality Through Active Constraint Discovery and Maintenance. Chiang, Fei Yen. 10 December 2012
Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ensure that the data conforms to the intended application domain semantics, we develop two algorithms focusing on constraint discovery. The first algorithm discovers a class of conditional constraints, which hold over a subset of the relation, under specific conditional values. The second algorithm discovers attribute domain constraints, which bind specific values to the attributes of a relation for a given domain. These two types of constraints have been shown to be useful for data cleaning.
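A common form of conditional constraint is a conditional functional dependency: a dependency `lhs -> rhs` that need not hold over the whole relation but does hold on the subset of tuples where some attribute takes particular values. Assuming that form, the brute-force sketch below checks, for one fixed candidate dependency and one conditioning attribute, which condition values make the dependency hold on their subset. It is not the thesis's discovery algorithm; the function names, the `min_support` threshold and the toy rows are hypothetical.

```python
from collections import defaultdict


def fd_holds(rows, lhs, rhs):
    """True if every combination of lhs values maps to a single rhs value."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if seen.setdefault(key, r[rhs]) != r[rhs]:
            return False
    return True


def discover_conditional_fds(rows, lhs, rhs, cond_attr, min_support=2):
    """Return the values v of cond_attr for which lhs -> rhs holds on the
    subset of rows with cond_attr == v (and at least min_support rows)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[cond_attr]].append(r)
    return sorted(
        v for v, g in groups.items()
        if len(g) >= min_support and fd_holds(g, lhs, rhs)
    )
```

For example, on toy data where `zip -> city` fails globally but holds for rows with `country == "UK"`, the function returns `["UK"]`, which corresponds to the conditional constraint `[country = UK, zip] -> city`.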
In practice, weak enforcement of constraints often occurs for performance reasons. This leads to inconsistencies between the data and the set of defined constraints. To resolve this inconsistency, we must determine whether it is the constraints or the data that is incorrect, and then make the necessary corrections. We develop a repair model that considers repairs to the data and repairs to the constraints on an equal footing. We present repair algorithms that find the necessary repairs to bring the data and the constraints back to a consistent state. Finally, we study the efficiency and quality of our techniques. We show that our constraint discovery algorithms find meaningful constraints with good precision and recall. We also show that our repair algorithms resolve many inconsistencies with high quality repairs, and propose repairs that previous algorithms did not consider.
|
106 |
Data Mining in der Medizin und Medizintechnik / Data Mining in Medicine and Medical Engineering. Mikut, Ralf. January 2008
Partly also a habilitation thesis, University of Karlsruhe, 2007, under the title: Mikut, Ralf: Automatisierte Datenanalyse in der Medizin und Medizintechnik (Automated Data Analysis in Medicine and Medical Engineering).
|
107 |
Personalisierung der Informationsversorgung in Unternehmen / Personalization of Information Provision in Enterprises. Felden, Carsten. January 2006
Also a habilitation thesis, University of Duisburg-Essen, 2006.
|
108 |
High-dimensional glyph-based visualization and interactive techniques. Chung, David H. S. January 2014
The advancement of modern technology and scientific measurement has led to datasets growing in both size and complexity, exposing the need for more efficient and effective ways of visualizing and analysing data. Despite the progress in visualization methods, high-dimensional data still poses a number of significant challenges, both in the technical ability to realise such a mapping and in how accurately the result is interpreted. The different data sources and characteristics that arise from a wide range of scientific domains, as well as specific design requirements, constantly create new challenges for visualization research. This thesis presents several contributions to the field of glyph-based visualization. Glyphs are parametrised objects that encode one or more data values in their appearance (also referred to as visual channels), such as size, colour, shape and position. They have been widely used to convey information visually, and are especially well suited to displaying complex, multi-faceted datasets. Their major strength is the ability to depict patterns in data in the context of a spatial relationship, where multi-dimensional trends can often be perceived more easily. Our research is set in the broad scope of multi-dimensional visualization, addressing several aspects of glyph-based techniques, including visual design, perception, placement, interaction and applications. In particular, this thesis presents a comprehensive study of one interaction technique, namely sorting, for supporting various analytical tasks.
We have outlined the concepts of glyph-based sorting, identified a set of design criteria for sorting interactions, designed and prototyped a user interface for sorting multivariate glyphs, developed a visual analytics technique to support sorting, conducted an empirical study on the perceptual orderability of visual channels used in glyph design, and applied glyph-based sorting to event visualization in sports applications. The content of this thesis is organised into two parts. Part I provides an overview of the basic concepts of glyph-based visualization, before describing the state of the art in this field. Part II presents a collection of novel glyph-based approaches that address challenges arising from real-world applications. Our first approach involves designing glyphs to depict the composition of multiple error-sensitivity fields. This work addresses the problem of single-camera positioning, using both 2D and 3D methods to support camera configuration based on various constraints in the context of a real-world environment. Our second approach presents glyphs to visualize actions and events "at a glance". We discuss the relative merits of using metaphoric glyphs, in comparison with other types of glyph designs, for the particular problem of real-time sports analysis. As a result of this research, we delivered a piece of visualization software, MatchPad, on a tablet computer. It successfully helped coaching staff and team analysts to examine actions and events in detail whilst maintaining a clear overview of the match, and assisted them in their decision making during matches. Abstract shortened by ProQuest.
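Glyph-based sorting reorders glyphs by the data values that drive their visual channels. The abstract does not specify a sorting key, so the sketch below assumes one: a weighted sum of min-max normalised data dimensions, where a weight of 1.0 on a single dimension sorts by that channel alone. The `Glyph` class, the dimension names and the weighting scheme are hypothetical illustrations, not the interface built in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    label: str     # must be unique within the collection being sorted
    values: dict   # data dimension -> raw value mapped to a visual channel


def normalise(glyphs, dim):
    """Min-max normalise one dimension to [0, 1] across all glyphs."""
    vals = [g.values[dim] for g in glyphs]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0
    return {g.label: (g.values[dim] - lo) / span for g in glyphs}


def sort_glyphs(glyphs, weights):
    """Order glyphs by a weighted sum of normalised dimensions, highest first
    (an assumed sorting key for illustration)."""
    norms = {dim: normalise(glyphs, dim) for dim in weights}

    def key(g):
        return sum(w * norms[dim][g.label] for dim, w in weights.items())

    return sorted(glyphs, key=key, reverse=True)
```

Normalising before weighting keeps a dimension with a large numeric range from swamping the others when several channels are combined into one order.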
|
109 |
Data Science for Small Businesses. January 2016
This report investigates the day-to-day problems faced by small businesses, particularly small vendors, in marketing and general management. Because of a lack of manpower, internet availability and properly documented data, small businesses cannot optimize their operations. The aim of the research is to address these problems with a tool that applies data science. The tool will have features that help vendors mine the data they record themselves and find useful information that benefits their businesses. Because properly documented data is scarce, one-class classification with a Support Vector Machine (SVM) is used to build a classifier that returns positive values for the audience likely to respond to a marketing strategy. Market basket analysis is used to choose products from the inventory so that patterns are found among them, giving a marketing strategy a higher chance of attracting an audience. In addition, higher-selling products can be used to the vendor's advantage by pairing lower-selling products with them to increase the overall profit of the business. The tool, as envisioned, meets all the requirements it was set out to meet and can be used as a stand-alone application that brings the power of data mining into the hands of a small vendor. / Dissertation/Thesis / Masters Thesis Engineering 2016
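The abstract does not say which market basket algorithm the tool uses. As a minimal sketch, assuming plain pairwise association rules rather than a full Apriori implementation, the code below scores every item pair by support (share of baskets containing both), confidence (share of baskets with the first item that also contain the second) and lift (confidence divided by the second item's overall frequency). Rules whose antecedent is a slow seller and whose consequent is a best-seller are natural candidates for the pairing strategy described above. Function names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Return (x, y, support, confidence, lift) for rules x -> y over item pairs,
    sorted by confidence (highest first), then alphabetically."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)
                rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

A lift above 1 means the two items are bought together more often than their individual frequencies would predict, which is the signal worth acting on for a promotion.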
|
110 |
Přizpůsobitelný prohlížeč pro Linked Data / Customizable Linked Data Browser. Klíma, Karel. January 2015
The aim of this thesis is to identify key requirements for exploring Linked Data and to design and implement a web application that serves as a Linked Data browser, including search and customization features. Unlike existing approaches, it will enable users to provide templates that define a visual style for presenting particular types of Linked Data resources. Alternatively, the application can provide other means of altering the presentation of the data or the appearance of the application.
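The type-based template idea can be illustrated as a lookup from a resource's `rdf:type` values to a user-supplied rendering function, with a generic fallback for untyped or unknown resources. This is a minimal sketch over an in-memory list of `(subject, predicate, object)` triples; the function names and the callable-template representation are assumptions, not the browser's actual design. The `rdf:type` URI is the standard one from the RDF vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(triples, subject):
    """Collect predicate -> [objects] for one subject."""
    props = {}
    for s, p, o in triples:
        if s == subject:
            props.setdefault(p, []).append(o)
    return props


def render(triples, subject, templates, default):
    """Render a resource with the first user template registered for one of its
    rdf:type values; fall back to the default template otherwise.
    Templates are hypothetical callables taking (subject, props)."""
    props = describe(triples, subject)
    for rdf_type in props.get(RDF_TYPE, []):
        if rdf_type in templates:
            return templates[rdf_type](subject, props)
    return default(subject, props)
```

Keying templates on type URIs means a single user-provided template restyles every resource of that type across the whole browser.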
|