41 |
Recomendação semântica de conteúdo em ambientes de convergência digital / Semantic content recommendation in digital convergence environments. Vieira, Priscilla Kelly Machado, 18 March 2013
The emerging scenario of interactive Digital TV (iDTV) is increasing interactivity in the communication process and boosting audiovisual production, raising the number of channels and resources available to the user. This reality makes finding the desired content a costly and possibly ineffective task. Incorporating recommender systems into the iDTV environment has emerged as a possible solution to this problem. This work proposes a hybrid approach to content recommendation in iDTV, based on data mining techniques integrated with Semantic Web concepts, which allow data to be structured and standardized and, consequently, shared, providing semantics and automated reasoning. The proposed service targets the Brazilian Digital TV System and the Ginga middleware. A prototype was developed and evaluated in experiments on the Netflix database, using precision as the evaluation metric. The mining technique alone achieved an average precision of 30%; adding the semantic rules raised average precision to 35%.
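As an illustration of the hybrid scheme this abstract describes, the sketch below combines a mining-derived score with a semantic-rule adjustment. The rule contents, weights, and all function names are assumptions for illustration, not the thesis's actual implementation.

```python
# A minimal sketch of a hybrid iDTV recommender: a data-mining score
# (genre association rules) is adjusted by a semantic rule over structured
# program metadata. All rules, weights, and names are illustrative
# assumptions, not the thesis's implementation.

def mining_score(watched_genres: set, program_genres: set, rules: list) -> float:
    """Score by association rules of the form (antecedent genres, genre, confidence)."""
    score = 0.0
    for antecedent, consequent, confidence in rules:
        if antecedent <= watched_genres and consequent in program_genres:
            score = max(score, confidence)
    return score

def semantic_boost(program: dict, user_profile: dict) -> float:
    """Hypothetical semantic rule: boost programs whose ontology class
    relates to an interest inferred for the user."""
    related = {"Documentary": {"Science", "Nature"}}
    cls = program.get("ontology_class")
    interests = set(user_profile.get("inferred_interests", []))
    return 0.1 if cls and related.get(cls, set()) & interests else 0.0

def recommend(programs, watched_genres, rules, user_profile, k=5):
    scored = [(mining_score(watched_genres, set(p["genres"]), rules)
               + semantic_boost(p, user_profile), p["title"]) for p in programs]
    return sorted(scored, reverse=True)[:k]

rules = [(frozenset({"sports"}), "news", 0.6)]
programs = [{"title": "Evening News", "genres": ["news"], "ontology_class": "Documentary"}]
print(recommend(programs, {"sports"}, rules, {"inferred_interests": ["Science"]}))
```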
|
42 |
Explorando dados provindos da internet em dispositivos móveis: uma abordagem baseada em visualização de informação / Exploring web data on mobile devices: an approach based on information visualization. Felipe Simões Lage Gomes Duarte, 12 February 2015
With the development of computers and the increasing popularity of the Internet, our society has entered the information age, an era marked by the way we produce and deal with information. Every day, thousands of gigabytes are stored, but their value is reduced if the data cannot be transformed into knowledge. Concomitantly, computing is moving toward the miniaturization and affordability of mobile devices, which are changing user behavior: passive readers are becoming content generators. In this context, this master's thesis proposes and develops a data visualization technique, called NMap, and a web data visualization tool for mobile devices, called SPCloud. NMap uses all available visual space to transmit information while preserving the distance-similarity metaphor between elements. Comparative evaluations against state-of-the-art techniques show that NMap achieves better neighborhood preservation with significantly better processing time, placing NMap among the leading space-filling techniques. SPCloud, a tool that uses NMap to present news available on the web, was developed taking into account the inherent characteristics of mobile devices. Informal user tests revealed that SPCloud performs well in summarizing large amounts of news in a small visual space.
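Neighborhood preservation, the criterion on which NMap is reported to improve, can be computed by comparing each point's k nearest neighbors in the original space and in the layout. The sketch below shows this generic evaluation measure, not the NMap algorithm itself, whose details are not given in the abstract.

```python
# A generic sketch of a neighborhood-preservation measure for comparing
# projection/space-filling layouts: the average overlap between each point's
# k nearest neighbors in the original space and in the 2D layout.
import numpy as np

def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude the point itself
    return np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbors

def neighborhood_preservation(high_dim: np.ndarray, layout: np.ndarray, k: int = 10) -> float:
    orig, proj = knn_indices(high_dim, k), knn_indices(layout, k)
    overlaps = [len(set(orig[i]) & set(proj[i])) / k for i in range(len(high_dim))]
    return float(np.mean(overlaps))      # 1.0 = neighborhoods perfectly preserved
```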
|
43 |
Extração de informação não-supervisionada por segmentação de texto / Unsupervised information extraction by text segmentation. Vilarinho, Eli Cortez Custódio, 14 December 2012
In this work we propose, implement and evaluate a new unsupervised approach for
the problem of Information Extraction by Text Segmentation (IETS). Our approach
relies on information available in pre-existing data to learn how to associate segments
in the input string with attributes of a given domain, relying on a very effective
set of content-based features. The effectiveness of the content-based features is also
exploited to directly learn from test data structure-based features, with no previous
human-driven training, a feature unique to our approach. Based on our approach,
we have produced a number of results to address the IETS problem in an unsupervised
fashion. In particular, we have developed, implemented and evaluated distinct IETS
methods, namely ONDUX, JUDIE and iForm. ONDUX (On Demand Unsupervised
Information Extraction) is an unsupervised probabilistic approach for IETS that
relies on content-based features to bootstrap the learning of structure-based features.
Structure-based features are exploited to disambiguate the extraction of certain
attributes through a reinforcement step, which relies on sequencing and positioning
of attribute values directly learned on-demand from the input texts. JUDIE (Joint
Unsupervised Structure Discovery and Information Extraction) aims at automatically
extracting several semi-structured data records that appear as continuous text
with no explicit delimiters between them. In comparison with other IETS
methods, including ONDUX, JUDIE faces a considerably harder task, namely extracting
information while simultaneously uncovering the underlying structure of
the implicit records containing it. In spite of that, it achieves results comparable to
the state-of-the-art methods. iForm applies our approach to the task of Web form
filling. It aims at extracting segments from a data-rich text given as input and associating
these segments with fields from a target Web form. The extraction process
relies on content-based features learned from data that was previously submitted to
the Web form. All of these methods were evaluated considering different experimental
datasets, which we use to perform a large set of experiments in order to validate
our approach and methods. These experiments indicate that our proposed approach
yields high quality results when compared to state-of-the-art approaches and that
it is able to properly support IETS methods in a number of real applications.
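The content-based features at the core of ONDUX, JUDIE, and iForm can be pictured as matching an input segment against attribute values observed in pre-existing data. The sketch below uses plain vocabulary overlap as a stand-in for the richer feature set the thesis describes; all names and data are illustrative assumptions.

```python
# A minimal sketch of content-based segment labeling for IETS: each input
# segment is assigned the attribute whose known values (from pre-existing
# data) it most resembles. Term overlap stands in for the thesis's actual
# content-based features; names and data are illustrative.
from collections import defaultdict

def build_vocabulary(known_values: dict) -> dict:
    """Map each attribute to the set of terms seen in its known values."""
    vocab = defaultdict(set)
    for attribute, values in known_values.items():
        for value in values:
            vocab[attribute].update(value.lower().split())
    return vocab

def label_segments(segments: list, vocab: dict) -> list:
    labeled = []
    for segment in segments:
        terms = set(segment.lower().split())
        # pick the attribute with the highest term overlap with the segment
        best = max(vocab, key=lambda a: len(terms & vocab[a]))
        labeled.append((segment, best))
    return labeled

# Example: labeling segments of a classified ad against known attribute values
vocab = build_vocabulary({
    "neighborhood": ["downtown", "upper east side"],
    "price": ["usd 1200", "usd 850"],
})
print(label_segments(["upper east side", "usd 990"], vocab))
```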
|
44 |
Extrakce dat z webu / Web Data Extraction. Novella, Tomáš, January 2016
Creation of web wrappers (i.e., programs that extract data from the web) is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task, because it introduces trade-offs between the expressiveness of the wrapper's language and safety. In addition, little attention has been paid to executing a wrapper in a restricted environment. In this thesis, we present a new wrapping language -- Serrano -- designed with three goals in mind: (1) the ability to run in a restricted environment, such as a browser extension; (2) extensibility, to balance the trade-offs between the expressiveness of the command set and safety; and (3) processing capabilities, to eliminate the need for additional programs to clean the extracted data. Serrano has been successfully deployed in a number of projects and has provided encouraging results.
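Serrano's actual syntax is not reproduced in this abstract. As a rough illustration of the underlying idea, a restricted but extensible command set, the sketch below interprets a wrapper expressed as plain data and exposes only whitelisted operations; the command names and structure are assumptions, not Serrano's.

```python
# An illustration of the wrapper-language idea (not Serrano's actual syntax):
# the wrapper is plain data, and an interpreter exposes only a whitelisted,
# extensible set of commands -- balancing expressiveness against safety.
import re

COMMANDS = {}  # registry; extensibility = registering new safe commands

def command(name):
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register

@command("select")   # crude regex-based stand-in for a CSS/XPath selector
def _select(doc, pattern):
    return re.findall(pattern, doc)

@command("strip")
def _strip(items):
    return [i.strip() for i in items]

def run_wrapper(program: list, document: str):
    """Interpret a wrapper program: a list of [command, *args] steps."""
    value = document
    for step in program:
        name, *args = step
        value = COMMANDS[name](value, *args)  # only whitelisted ops can run
    return value

wrapper = [["select", r"<h2>(.*?)</h2>"], ["strip"]]
print(run_wrapper(wrapper, "<h2> Title A </h2><h2>Title B</h2>"))
```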
|
45 |
Query-Time Data Integration. Eberius, Julian, 16 December 2015
Today, data is collected in ever increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources precludes up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established.
This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need.
To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state-of-the-art by producing an individually consistent, yet mutually diverse, set of alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we studied the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort.
Collectively, these three contributions are the foundation of a Query-time Data Integration architecture, that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
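The top-k augmentation idea (each result individually consistent, results mutually diverse, few sources per result) can be approximated as a greedy set-cover variant. The following sketch is a simplification under assumed data structures, not the algorithm from the thesis.

```python
# A simplified sketch of top-k entity augmentation as a greedy set-cover
# variant: each result covers the query entities with few sources, and later
# results are steered toward sources not used by earlier results (diversity).
# Data structures and scoring are assumptions, not the thesis's algorithm.

def cover(entities: set, sources: dict, penalized: set) -> list:
    """Greedily pick sources covering all entities, preferring fresh ones."""
    uncovered, chosen = set(entities), []
    while uncovered:
        # prefer sources not used by earlier results; break ties by coverage
        best = max(sources, key=lambda s: (s not in penalized,
                                           len(sources[s] & uncovered)))
        if not sources[best] & uncovered:   # fall back to any covering source
            best = max(sources, key=lambda s: len(sources[s] & uncovered))
        if not sources[best] & uncovered:
            break                           # some entities cannot be covered
        chosen.append(best)
        uncovered -= sources[best]
    return chosen

def top_k_augmentations(entities, sources, k=3):
    results, used = [], set()
    for _ in range(k):
        solution = cover(entities, sources, penalized=used)
        if not solution or set(solution) in [set(r) for r in results]:
            break
        results.append(solution)
        used.update(solution)   # push later results toward unused sources
    return results

# sources: which query entities each web table/source provides values for
sources = {"tableA": {"e1", "e2"}, "tableB": {"e3"}, "tableC": {"e1", "e2", "e3"}}
print(top_k_augmentations({"e1", "e2", "e3"}, sources, k=2))
```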
|
46 |
Vizualizace dat pro Ansible Automation Analytics / Chart Builder for Ansible Automation Analytics. Berky, Levente, January 2021
This thesis focuses on creating a web component that renders charts from a structured data format (hereafter, a schema) and on building a user interface for editing that schema for Ansible Automation Analytics. The thesis examines the current implementation of Ansible Automation Analytics and the corresponding API, surveys suitable chart-rendering libraries, and describes the basics of the technologies used. The practical part states the requirements for the component and describes the development and implementation of the plugin, the testing process, and plans for the plugin's future development.
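As a rough illustration of schema-driven chart rendering, the sketch below dispatches on a declarative chart description. The actual plugin is a web component built against the Automation Analytics API; the schema fields shown here, and the use of Python/matplotlib as a stand-in, are assumptions.

```python
# An illustrative schema-to-chart dispatcher: a declarative schema record
# selects the chart type, axes, and data keys. Schema fields are assumptions.
import matplotlib.pyplot as plt

def render(schema: dict, data: list):
    x = [row[schema["x"]] for row in data]
    y = [row[schema["y"]] for row in data]
    fig, ax = plt.subplots()
    if schema["type"] == "line":
        ax.plot(x, y)
    elif schema["type"] == "bar":
        ax.bar(x, y)
    else:
        raise ValueError(f"unsupported chart type: {schema['type']}")
    ax.set(title=schema.get("title", ""), xlabel=schema["x"], ylabel=schema["y"])
    return fig

fig = render({"type": "bar", "x": "month", "y": "jobs", "title": "Jobs per month"},
             [{"month": "Jan", "jobs": 120}, {"month": "Feb", "jobs": 95}])
```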
|
47 |
Query-Time Data Integration. Eberius, Julian, 10 December 2015
|
48 |
The One Spider To Rule Them All: Web Scraping Simplified: Improving Analyst Productivity and Reducing Development Time with a Generalized Spider / Spindeln som härskar över dom alla: Webbskrapning förenklat: förbättra analytikerproduktiviteten och minska utvecklingstiden med generaliserade spindlar. Johansson, Rikard, January 2023
This thesis addresses the process of developing a generalized spider for web scraping that can be applied to multiple sources, thereby reducing the time and cost involved in creating and maintaining individual spiders for each website or URL. The project aims to improve analyst productivity, reduce development time for developers, and ensure high-quality, accurate data extraction. The research involves investigating web scraping techniques and developing a more efficient and scalable approach to report retrieval. The problem statement emphasizes the inefficiency of the current method, with one customized spider per source, and the need for a more streamlined approach to web scraping. The research question focuses on identifying patterns in the web scraping process and the functions required for specific publication websites in order to create a more generalized web scraper. The objective is to reduce manual effort, improve scalability, and maintain high-quality data extraction.

The problem is resolved using a quantitative approach that involves analyzing and implementing spiders for each data source. This yields a comprehensive understanding of all potential scenarios and provides the knowledge needed to develop a general spider. The resulting spiders are grouped by similarity and, through the application of simple logic, consolidated into a single general spider capable of handling all the sources. To construct the general spider, a utility library is created, equipped with the essential tools for extracting relevant information such as title, description, date, and PDF links. All source-specific information is then transferred to configuration files, enabling the execution of the general spider.

The findings demonstrate the successful integration of multiple sources and spiders into a unified general spider. Due to the limited time frame of the project, however, there is potential for further improvement, such as better structuring of the configuration files, expansion of the utility library, or even the integration of AI capabilities to enhance the general spider's performance. Nevertheless, the current solution is deemed suitable for automated article retrieval and is ready to be used.
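The architecture described above (one general spider driven by per-source configuration files plus a small utility library) might look roughly like the sketch below. The Scrapy framework, config layout, selector syntax, and field names are all assumptions; the thesis's concrete implementation is not shown here.

```python
# A rough sketch of the one-general-spider idea: per-source configuration
# (selectors for title, description, date, and PDF links) drives a single
# spider. Config fields and the choice of Scrapy are assumptions.
import json
import scrapy  # assumes the Scrapy framework; the thesis does not name one

class GeneralSpider(scrapy.Spider):
    name = "general"

    def __init__(self, config_file: str, **kwargs):
        super().__init__(**kwargs)
        with open(config_file) as f:
            self.config = json.load(f)   # one JSON config per source
        self.start_urls = self.config["start_urls"]

    def parse(self, response):
        sel = self.config["selectors"]
        for article in response.css(sel["article"]):
            yield {
                "title": article.css(sel["title"]).get(default="").strip(),
                "description": article.css(sel["description"]).get(default=""),
                "date": article.css(sel["date"]).get(default=""),
                "pdf_links": article.css(sel["pdf"]).getall(),
            }

# Example per-source config (hypothetical), run via:
#   scrapy crawl general -a config_file=source_a.json
# {"start_urls": ["https://example.org/reports"],
#  "selectors": {"article": "div.report", "title": "h2::text",
#                "description": "p.summary::text", "date": "time::text",
#                "pdf": "a[href$='.pdf']::attr(href)"}}
```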
|
49 |
A Flexible Graph-Based Data Model Supporting Incremental Schema Design and Evolution. Braunschweig, Katrin; Thiele, Maik; Lehner, Wolfgang, 26 January 2023
Web data is characterized by great structural diversity as well as frequent changes, which poses a great challenge for web applications based on that data. We address this problem by developing a schema-optional, flexible data model that supports the integration of heterogeneous and volatile web data. To this end, we rely on graph-based models that allow the schema to be extended incrementally with various information and constraints. Inspired by the ongoing Web 2.0 trend, we want users to participate in the design and management of the schema. By incrementally adding structural information, users can enhance the schema to meet their very specific requirements.
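A schema-optional graph model with incrementally added constraints could be sketched as follows; the concrete representation and API are assumptions, not the paper's actual model.

```python
# A minimal sketch of a schema-optional graph data model: nodes carry free
# properties, and schema constraints are added incrementally and validated
# lazily -- data never waits for an up-front global schema. The
# representation is an assumption, not the paper's model.

class FlexibleGraph:
    def __init__(self):
        self.nodes = {}        # node_id -> property dict
        self.edges = []        # (source_id, label, target_id)
        self.constraints = []  # incrementally added (description, predicate)

    def add_node(self, node_id, **properties):
        self.nodes.setdefault(node_id, {}).update(properties)  # schema-free

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))

    def add_constraint(self, description, predicate):
        """Users enrich the schema incrementally, e.g. 'products need a price'."""
        self.constraints.append((description, predicate))

    def violations(self):
        """Validate lazily: report nodes breaking any constraint added so far."""
        return [(node_id, desc)
                for node_id, props in self.nodes.items()
                for desc, pred in self.constraints
                if not pred(props)]

g = FlexibleGraph()
g.add_node("p1", type="product", name="lamp")   # no schema required yet
g.add_constraint("products need a price",
                 lambda p: p.get("type") != "product" or "price" in p)
print(g.violations())  # [('p1', 'products need a price')]
```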
|
50 |
Web Mining - die Fallstudie Swarovski: theoretische Grundlagen und praktische Anwendungen / Web mining - the Swarovski case study: theoretical foundations and practical applications. Linder, Alexander; Wehrli, Hans Peter, January 2005
Also a doctoral dissertation in economics, University of Zurich, 2004. Trade edition: Wiesbaden, Deutscher Universitäts-Verlag. Includes bibliography.
|