About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Nestruktūrizuotų duomenų modelio sudarymas ir tyrimas / Construction and analysis of unstructured data model

Surdokas, Mindaugas 24 May 2005 (has links)
The amount of information on the internet is increasing every day, and it is very difficult to find precise information in a long list of search results. Users typically formulate a query by specifying keywords, and the search engine displays the documents that contain those keywords. Search within unstructured data starts with data processing: documents are first broken into sentences, then words and word groups are analysed semantically and syntactically to obtain facts. Facts describe objects of the real world. Facts obtained from unstructured data are stored in a database, so that the unstructured data is transformed into structured data. Afterwards it is easy to analyse, join, filter, or otherwise manipulate the structured data.
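For a concrete picture of the pipeline this abstract describes, here is a minimal sketch using only the Python standard library. The (subject, relation, object) fact shape, the toy pattern matcher and the sample text are illustrative assumptions, not the thesis's actual model.

```python
import re
import sqlite3

def split_sentences(text):
    # Naive sentence splitter: break on ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_facts(sentence):
    # Toy "semantic analysis": treat patterns like "X is Y" as a fact.
    # A real system would use proper syntactic and semantic analysis here.
    m = re.match(r"(?P<subj>\w+(?:\s\w+)?)\s+is\s+(?P<obj>.+?)[.!?]?$", sentence)
    return [(m.group("subj"), "is", m.group("obj"))] if m else []

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT)")

document = "Vilnius is the capital of Lithuania. The model stores facts."
for sentence in split_sentences(document):
    for fact in extract_facts(sentence):
        conn.execute("INSERT INTO facts VALUES (?, ?, ?)", fact)

# The unstructured text is now queryable as structured rows.
for row in conn.execute("SELECT * FROM facts"):
    print(row)
```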
2

Making Sense of Online Reviews: A Machine Learning Approach: An Abstract

Harrison, Dana E., Ajjan, Haya 01 January 2020 (has links)
It is estimated that 80% of companies’ data is unstructured. Unstructured data, or data that is not predefined by numerical values, continues to grow at a rapid pace. Images, text, videos and voice are all examples of unstructured data. Companies can use this type of data to leverage novel insights unavailable through more easily manageable, structured data. Unstructured data, however, creates a challenge since it often requires substantial coding prior to analysis. The purpose of this study is to describe the steps and introduce computational methods that can be adopted to further explore unstructured online reviews. The unstructured nature of online reviews requires extensive text analytics processing. This study introduces methods for text analytics including tokenization at the sentence level, lemmatization or stemming to reduce inflectional forms of the words appearing in the text, and a ‘bag of n-grams’ approach. We also introduce lexicon-based feature engineering and methods to develop new lexicons for capturing theoretically established constructs and relationships that are specific to the domain of study. The numeric features generated in the analysis are then analyzed using machine learning algorithms. This process can be applied to the analysis of other unstructured data, such as dyadic information exchange between customer service, salespeople, customers and channel members. Although these examples are not comprehensive, companies can apply results from unstructured data analysis to examine a variety of outcomes related to customer decisions, managing channels and mitigating potential crisis situations. Understanding interdisciplinary methods of analyzing unstructured data is critical as the availability of this type of data continues to accelerate and enables researchers to develop theoretical contributions within the marketing discipline.
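As an illustration of the steps named above (tokenisation, stemming or lemmatisation, a bag of n-grams, lexicon-based features, and a machine learning pass), here is a hedged sketch in Python with scikit-learn. The toy review corpus, the crude suffix-stripping stand-in for lemmatisation, and the "praise" lexicon are invented for illustration; they are not the authors' configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = ["The staff was friendly and helpful",
           "Terrible service and a dirty room",
           "Helpful desk staff, lovely stay",
           "Dirty bathroom, awful experience"]
labels = [1, 0, 1, 0]  # 1 = positive review, 0 = negative

def stem(token):
    # Crude stand-in for stemming/lemmatisation: strip common suffixes.
    for suffix in ("ing", "ly", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Bag of unigrams and bigrams over the stemmed tokens.
vectorizer = CountVectorizer(
    ngram_range=(1, 2),
    tokenizer=lambda doc: [stem(t) for t in doc.lower().split()],
    token_pattern=None)
X_ngrams = vectorizer.fit_transform(reviews).toarray()

# Lexicon-based feature engineering: count hits against a domain lexicon.
praise_lexicon = {"friendly", "helpful", "lovely"}
lexicon_counts = [[sum(w in praise_lexicon for w in r.lower().split())]
                  for r in reviews]

X = np.hstack([X_ngrams, np.array(lexicon_counts)])
model = LogisticRegression().fit(X, labels)
print(model.predict(X))  # in-sample sanity check on the toy corpus
```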
3

An artefact to analyse unstructured document data stores / by André Romeo Botes

Botes, André Romeo January 2014 (has links)
Structured data stores have been the dominant technologies for the past few decades. Although dominant, structured data stores lack the functionality to handle the ‘Big Data’ phenomenon. A newer class of technology has emerged which stores unstructured data and can handle the ‘Big Data’ phenomenon. This study describes the development of an artefact to aid in the analysis of NoSQL document data stores in terms of relational database model constructs. Design science research (DSR) is the methodology implemented in the study; it is used to assist in understanding, designing and developing the problem, artefact and solution. The study explores the existing literature on DSR, in addition to structured and unstructured data stores. The literature review formulates the descriptive and prescriptive knowledge used in the development of the artefact. The artefact is developed using a series of six activities derived from two DSR approaches. The problem domain is derived from the existing literature and a real application environment (RAE). The reviewed literature provides a general problem statement; a representative from NFM (the RAE) is interviewed for a situation analysis providing a specific problem statement. An objective is formulated for the development of the artefact and suggestions are made to address the problem domain, assisting the artefact’s objective. The artefact is designed and developed using the descriptive knowledge of structured and unstructured data stores, combined with prescriptive knowledge of algorithms, pseudocode, continuous design and object-oriented design. The artefact evolves through multiple design cycles into a final product that analyses document data stores in terms of relational database model constructs. The artefact is evaluated for acceptability and utility, which lends credibility and rigour to the research in the DSR paradigm. Acceptability is demonstrated through simulation, and utility is evaluated in the RAE, with a representative from NFM interviewed for the evaluation of the artefact. Finally, the study is communicated by describing its findings, summarising the artefact and looking into future possibilities for research and application. / MSc (Computer Science), North-West University, Vaal Triangle Campus, 2014
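In the spirit of the artefact described above, and purely as an assumption about how such an analysis might look, the sketch below scans a small NoSQL-style document collection and reports each field in relational database model terms (candidate column, inferred types, nullability):

```python
import json

# A toy "document data store" collection; field names are invented.
collection = [
    '{"_id": 1, "name": "Ann", "age": 31}',
    '{"_id": 2, "name": "Ben", "email": "ben@example.com"}',
    '{"_id": 3, "name": "Cas", "age": 27}',
]

docs = [json.loads(d) for d in collection]
fields = {}
for doc in docs:
    for key, value in doc.items():
        info = fields.setdefault(key, {"types": set(), "count": 0})
        info["types"].add(type(value).__name__)
        info["count"] += 1

# Present each field as a relational column: a field missing from some
# documents maps to a nullable column; mixed types flag a modelling issue.
for key, info in fields.items():
    nullable = "NULL" if info["count"] < len(docs) else "NOT NULL"
    print(f"{key}: {'/'.join(sorted(info['types']))} {nullable}")
```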
4

STUDYING THE IMPACT OF DEVELOPER COMMUNICATION ON THE QUALITY AND EVOLUTION OF A SOFTWARE SYSTEM

Bettenburg, Nicolas 22 May 2014 (has links)
Software development is a largely collaborative effort, of which the actual encoding of program logic in source code is a relatively small part. Software developers have to collaborate effectively and communicate with their peers in order to avoid coordination problems. To date, little is known about how developer communication during software development activities impacts the quality and evolution of the software. In this thesis, we present and evaluate tools and techniques to recover communication data from traces of software development activities. With this data, we study the impact of developer communication on the quality and evolution of the software through an in-depth investigation of the role of developer communication during development activities. Through multiple case studies on a broad spectrum of open-source software projects, we find that communication between developers stands in a direct relationship to the quality of the software. Our findings demonstrate that our models based on developer communication explain software defects as well as state-of-the-art models based on technical information such as code and process metrics, and that social information metrics are orthogonal to these traditional metrics, leading to a more complete and integrated view of software defects. In addition, we find that communication between developers plays an important role in maintaining a healthy contribution management process, which is one of the key factors in the successful evolution of the software. Source code contributors who are part of the community surrounding open-source projects are available for limited times, and long communication times can lead to the loss of valuable contributions. Our thesis illustrates that software development is an intricate and complex process that is strongly influenced by the social interactions between the stakeholders involved in the development activities. A traditional view based solely on technical aspects of software development, such as source code size and complexity, while valuable, limits our understanding of development activities. The research presented in this thesis is a first step towards gaining a more holistic view of software development activities. / Thesis (Ph.D, Computing) -- Queen's University, 2014-05-22 12:07:13.823
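To make the modelling claim concrete, here is a hedged sketch of the kind of comparison described: a defect model built from traditional code/process metrics versus one that also uses social (communication) metrics. The metric names and the toy data are assumptions, not the thesis's data set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: lines_of_code, churn, num_message_threads, reply_time_hours
X = np.array([[1200, 45,  3,  2.0],
              [ 300, 10,  1,  0.5],
              [2500, 90, 12, 36.0],
              [ 800, 20,  2,  1.0],
              [1800, 60,  9, 24.0]])
y = np.array([1, 0, 1, 0, 1])  # 1 = component had post-release defects

# Fit one model on technical metrics only, one on technical + social metrics.
technical = LogisticRegression(max_iter=1000).fit(X[:, :2], y)
combined = LogisticRegression(max_iter=1000).fit(X, y)

# If the social metrics carry orthogonal signal, the combined model should
# explain the defect labels at least as well as the technical-only model.
print("technical-only accuracy:", technical.score(X[:, :2], y))
print("combined accuracy:      ", combined.score(X, y))
```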
5

Conditional random fields for noisy text normalisation

Coetsee, Dirko 12 1900 (has links)
Thesis (MScEng) -- Stellenbosch University, 2014. / ENGLISH ABSTRACT: The increasing popularity of microblogging services such as Twitter means that more and more unstructured data is available for analysis. The informal language used in these media presents a problem for traditional text mining and natural language processing tools. We develop a pre-processor to normalise this noisy text so that useful information can be extracted with standard tools. A system consisting of a tokeniser, an out-of-vocabulary token identifier, a correction-candidate generator, and an N-gram language model is proposed. We compare the performance of generative and discriminative probabilistic models for these modules, and investigate the effect of normalising the training and testing data on the performance of a tweet sentiment classifier. A linear-chain conditional random field, which is a discriminative model, is found to work better than its generative counterpart for the tokenisation module, achieving a 0.76% character error rate compared to 1.41% for the finite state automaton. For the candidate generation module, however, the generative weighted finite state transducer works better, producing the correct clean version of a word on the first guess 36% of the time, while the discriminatively trained hidden alignment conditional random field achieves only 6%. The use of the normaliser as a pre-processing step does not significantly affect the performance of the sentiment classifier.
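For context on the error figures quoted above: a character error rate is conventionally computed as the Levenshtein edit distance between system output and reference, divided by the reference length. Whether the thesis used exactly this normalisation is an assumption; a self-contained sketch:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance over two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def character_error_rate(hypothesis, reference):
    return edit_distance(hypothesis, reference) / len(reference)

# Toy normalisation example: noisy tweet text vs. its clean reference.
print(character_error_rate("c u 2moro", "see you tomorrow"))
```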
6

Query Languages for Semi-structured Data

Maksimovic, Gordana January 2003 (has links)
Semi-structured data is defined as irregular data whose structure may change rapidly or unpredictably. An example of such data can be found on the World-Wide Web. Since the data is irregular, the user may not know the complete structure of the database, so querying such data becomes difficult. In order to write meaningful queries on semi-structured data, a query language is needed that supports the features this data presents. Standard query languages, such as SQL for relational databases and OQL for object databases, are too constraining for querying semi-structured data, because they require data to conform to a fixed schema before any data is stored in the database. This paper introduces Lorel, a query language developed particularly for querying semi-structured data, and investigates whether the standardised query languages support any of the criteria presented for semi-structured data. The result is an evaluation of three query languages, SQL, OQL and Lorel, against these criteria.
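For a flavour of the contrast being evaluated, compare a schema-bound SQL query with a Lorel-style path query (adapted from published examples of the Lore project at Stanford; treat the exact syntax as indicative rather than authoritative):

```
-- SQL presupposes a fixed relational schema declared in advance
SELECT r.address
FROM   restaurant r
WHERE  r.name = 'Chef Chu';

-- A Lorel-style path expression navigates irregular, self-describing data;
-- entries that lack an address simply do not match, with no schema error
select guide.restaurant.address
where  guide.restaurant.name = "Chef Chu"
```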
7

Ensemble Learning Techniques for Structured and Unstructured Data

King, Michael Allen 01 April 2015 (has links)
This research provides an integrated approach to applying innovative ensemble learning techniques that have the potential to increase the overall accuracy of classification models. Actual structured and unstructured data sets from industry are utilized during the research process, analysis and subsequent model evaluations. The first research section addresses the consumer demand forecasting and daily capacity management requirements of a nationally recognized alpine ski resort in Utah, in the United States. A basic econometric model is developed, and three classic predictive models are evaluated for effectiveness. These predictive models are subsequently used as input for four ensemble modeling techniques, and the ensemble learning techniques are shown to be effective. The second research section discusses the opportunities and challenges faced by a leading firm providing sponsored search marketing services. The goal of sponsored search marketing campaigns is to create advertising that better attracts and motivates a target market to purchase. This research develops a method for classifying profitable campaigns and maximizing overall campaign portfolio profits. Four traditional classifiers are utilized, along with four ensemble learning techniques, to build classifier models that identify profitable pay-per-click campaigns. A MetaCost ensemble configuration, which can incorporate unequal classification costs, produced the highest campaign portfolio profit. The third research section addresses the management challenges of online consumer reviews encountered by service industries and shows how these textual reviews can be used for service improvements. A service improvement framework is introduced that integrates traditional text mining techniques and second-order feature derivation with ensemble learning techniques. The concept of GLOW and SMOKE words is introduced and shown to be an objective text-analytic source of service defects and service accolades. / Ph. D.
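As an illustration of the two ensemble ideas named in this abstract, the sketch below combines traditional classifiers by soft voting and then applies a MetaCost-style minimum-expected-cost decision rule. The cost matrix, the synthetic data, and the choice of base learners are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Ensemble of traditional classifiers combined by soft (probability) voting.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="soft").fit(X, y)

# MetaCost-style decision rule: pick the class with minimum expected cost
# rather than maximum probability. Here, missing a profitable campaign
# (cost[1, 0]) is assumed to cost five times a false positive (cost[0, 1]).
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])               # cost[actual, predicted]
proba = ensemble.predict_proba(X)           # shape (n_samples, 2)
expected_cost = proba @ cost                # expected cost of each prediction
cost_sensitive_pred = expected_cost.argmin(axis=1)

print("plain accuracy:          ", ensemble.score(X, y))
print("cost-sensitive positives:", int(cost_sensitive_pred.sum()))
```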
8

A Functional Framework for Content Management

Broadbent, Robert Emer 16 June 2009 (has links) (PDF)
This thesis proposes a functional framework for content management. The framework provides concepts and vocabulary for the analysis and description of content management systems, and is derived from an analysis of eight such systems. It describes forty-five conceptual functions organized into five functional groups. The functionality derived from the analysed systems is described using the vocabulary provided by the framework, and coverage of the framework's concepts in the existing systems is verified. The utility of the framework is validated through the creation of a prototype that implements sufficient functionality to support a set of specific use cases.
9

Modelo navegacional dinâmico, para implementação da integração inter-estrutural de dados. / Dynamic navigational model for implementing inter-structural data integration

Gomes Neto, José 04 November 2016 (has links)
Over the last decade, substantial changes have been observed in the types of data processed, compared with the conventional definition of structured data. In this context, computational systems that mostly access conventional, centralized databases of structured data increasingly need to access and process large amounts of distributed, unstructured data as well. Factors such as versatility in hosting unstructured data, and the coexistence, integration and diffusion of complex data at speeds higher than previously observed, can in certain situations make conventional data models too restrictive. This thesis therefore proposes and formalises a post-relational data model based on complex graphs, also known as Complex Networks. Using this graph model, it defines a way to implement inter-structural data integration, that is, the integration of traditional structured data with the more recently adopted unstructured data, such as multimedia. This integration covers all the transactions present in a database: select, insert, update and delete. The name given to this form of work and implementation is the Dynamic Navigational Model (MND, from the Portuguese Modelo Navegacional Dinâmico). The model represents different data structures and, above all, allows these structures to coexist in an integrated way, adding completeness and comprehensiveness to the resulting information. Hence, the MND brings the benefits of Complex Network structure to the context of unstructured data, particularly the integration of data with distinct structures, giving applications that require such integration better use of resources.
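As a rough illustration of the idea, under stated assumptions, the sketch below holds structured records and unstructured media references as nodes of a single graph and answers a query by navigating labelled edges rather than joining fixed tables. The node and edge vocabulary is invented for illustration, not taken from the thesis.

```python
from collections import defaultdict

# One graph: some nodes hold structured records, others unstructured media.
nodes = {
    "customer:1": {"kind": "structured", "name": "Ana", "city": "Sao Paulo"},
    "order:7":    {"kind": "structured", "total": 120.50},
    "photo:9":    {"kind": "unstructured", "media": "jpeg", "blob_ref": "..."},
}
edges = defaultdict(list)

def link(src, label, dst):
    edges[src].append((label, dst))

link("customer:1", "placed", "order:7")
link("order:7", "attached_media", "photo:9")

def navigate(start, *labels):
    # Dynamic navigation: follow labelled edges outward from a start node.
    frontier = [start]
    for label in labels:
        frontier = [dst for node in frontier
                    for (lbl, dst) in edges[node] if lbl == label]
    return frontier

# A single traversal crosses both structures: relational-style rows to media.
for node_id in navigate("customer:1", "placed", "attached_media"):
    print(node_id, nodes[node_id])
```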
