• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 34
  • 11
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 56
  • 56
  • 19
  • 13
  • 10
  • 10
  • 9
  • 9
  • 9
  • 8
  • 8
  • 8
  • 8
  • 8
  • 7
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Distributed data management with a declarative rule-based language webdamlog / Gestion des données distribuées avec le langage de règles Webdamlog

Antoine, Emilien 05 December 2013 (has links)
Notre but est de permettre à un utilisateur du Web d’organiser la gestionde ses données distribuées en place, c’est à dire sans l’obliger à centraliserses données chez un unique hôte. Par conséquent, notre système diffèrede Facebook et des autres systèmes centralisés, et propose une alternativepermettant aux utilisateurs de lancer leurs propres pairs sur leurs machinesgérant localement leurs données personnelles et collaborant éventuellementavec des services Web externes.Dans ma thèse, je présente Webdamlog, un langage dérivé de datalogpour la gestion de données et de connaissances distribuées. Le langage étenddatalog de plusieurs manières, principalement avec une nouvelle propriété ladélégation, autorisant les pairs à échanger non seulement des faits (les données)mais aussi des règles (la connaissance). J’ai ensuite mené une étude utilisateurpour démontrer l’utilisation du langage. Enfin je décris le moteur d’évaluationde Webdamlog qui étend un moteur d’évaluation de datalog distribué nomméBud, en ajoutant le support de la délégation et d’autres innovations tellesque la possibilité d’avoir des variables pour les noms de pairs et des relations.J’aborde de nouvelles techniques d’optimisation, notamment basées sur laprovenance des faits et des règles. Je présente des expérimentations quidémontrent que le coût du support des nouvelles propriétés de Webdamlogreste raisonnable même pour de gros volumes de données. Finalement, jeprésente l’implémentation d’un pair Webdamlog qui fournit l’environnementpour le moteur. En particulier, certains adaptateurs permettant aux pairsWebdamlog d’échanger des données avec d’autres pairs sur Internet. Pourillustrer l’utilisation de ces pairs, j’ai implémenté une application de partagede photos dans un réseau social en Webdamlog. / Our goal is to enable aWeb user to easily specify distributed data managementtasks in place, i.e. without centralizing the data to a single provider. Oursystem is therefore not a replacement for Facebook, or any centralized system,but an alternative that allows users to launch their own peers on their machinesprocessing their own local personal data, and possibly collaborating with Webservices.We introduce Webdamlog, a datalog-style language for managing distributeddata and knowledge. The language extends datalog in a numberof ways, notably with a novel feature, namely delegation, allowing peersto exchange not only facts but also rules. We present a user study thatdemonstrates the usability of the language. We describe a Webdamlog enginethat extends a distributed datalog engine, namely Bud, with the supportof delegation and of a number of other novelties of Webdamlog such as thepossibility to have variables denoting peers or relations. We mention noveloptimization techniques, notably one based on the provenance of facts andrules. We exhibit experiments that demonstrate that the rich features ofWebdamlog can be supported at reasonable cost and that the engine scales tolarge volumes of data. Finally, we discuss the implementation of a Webdamlogpeer system that provides an environment for the engine. In particular, a peersupports wrappers to exchange Webdamlog data with non-Webdamlog peers.We illustrate these peers by presenting a picture management applicationthat we used for demonstration purposes.
22

Distributed data management with a declarative rule-based language webdamlog

Antoine, Emilien 05 December 2013 (has links) (PDF)
Our goal is to enable aWeb user to easily specify distributed data managementtasks in place, i.e. without centralizing the data to a single provider. Oursystem is therefore not a replacement for Facebook, or any centralized system,but an alternative that allows users to launch their own peers on their machinesprocessing their own local personal data, and possibly collaborating with Webservices.We introduce Webdamlog, a datalog-style language for managing distributeddata and knowledge. The language extends datalog in a numberof ways, notably with a novel feature, namely delegation, allowing peersto exchange not only facts but also rules. We present a user study thatdemonstrates the usability of the language. We describe a Webdamlog enginethat extends a distributed datalog engine, namely Bud, with the supportof delegation and of a number of other novelties of Webdamlog such as thepossibility to have variables denoting peers or relations. We mention noveloptimization techniques, notably one based on the provenance of facts andrules. We exhibit experiments that demonstrate that the rich features ofWebdamlog can be supported at reasonable cost and that the engine scales tolarge volumes of data. Finally, we discuss the implementation of a Webdamlogpeer system that provides an environment for the engine. In particular, a peersupports wrappers to exchange Webdamlog data with non-Webdamlog peers.We illustrate these peers by presenting a picture management applicationthat we used for demonstration purposes.
23

Scalable view-based techniques for web data : algorithms and systems

Katsifodimos, Asterios 03 July 2013 (has links) (PDF)
XML was recommended by W3C in 1998 as a markup language to be used by device- and system-independent methods of representing information. XML is nowadays used as a data model for storing and querying large volumes of data in database systems. In spite of significant research and systems development, many performance problems are raised by processing very large amounts of XML data. Materialized views have long been used in databases to speed up queries. Materialized views can be seen as precomputed query results that can be re-used to evaluate (part of) another query, and have been a topic of intensive research, in particular in the context of relational data warehousing. This thesis investigates the applicability of materialized views techniques to optimize the performance of Web data management tools, in particular in distributed settings, considering XML data and queries. We make three contributions.We first consider the problem of choosing the best views to materialize within a given space budget in order to improve the performance of a query workload. Our work is the first to address the view selection problem for a rich subset of XQuery. The challenges we face stem from the expressive power and features of both the query and view languages and from the size of the search space of candidate views to materialize. While the general problem has prohibitive complexity, we propose and study a heuristic algorithm and demonstrate its superior performance compared to the state of the art.Second, we consider the management of large XML corpora in peer-to-peer networks, based on distributed hash tables (or DHTs, in short). We consider a platform leveraging distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. This thesis has contributed important scalability oriented optimizations, as well as a comprehensive set of experiments deployed in a country-wide WAN. These experiments outgrow by orders of magnitude similar competitor systems in terms of data volumes and data dissemination throughput. Thus, they are the most advanced in understanding the performance behavior of DHT-based XML content management in real settings.Finally, we present a novel approach for scalable content-based publish/subscribe (pub/sub, in short) in the presence of constraints on the available computational resources of data publishers. We achieve scalability by off-loading subscriptions from the publisher, and leveraging view-based query rewriting to feed these subscriptions from the data accumulated in others. Our main contribution is a novel algorithm for organizing subscriptions in a multi-level dissemination network in order to serve large numbers of subscriptions, respect capacity constraints, and minimize latency. The efficiency and effectiveness of our algorithm are confirmed through extensive experiments and a large deployment in a WAN.
24

Easing information extraction on the web through automated rules discovery

Ortona, Stefano January 2016 (has links)
The advent of the era of big data on the Web has made automatic web information extraction an essential tool in data acquisition processes. Unfortunately, automated solutions are in most cases more error prone than those created by humans, resulting in dirty and erroneous data. Automatic repair and cleaning of the extracted data is thus a necessary complement to information extraction on the Web. This thesis investigates the problem of inducing cleaning rules on web extracted data in order to (i) repair and align the data w.r.t. an original target schema, (ii) produce repairs that are as generic as possible such that different instances can benefit from them. The problem is addressed from three different angles: replace cross-site redundancy with an ensemble of entity recognisers; produce general repairs that can be encoded in the extraction process; and exploit entity-wide relations to infer common knowledge on extracted data. First, we present ROSeAnn, an unsupervised approach to integrate semantic annotators and produce a unied and consistent annotation layer on top of them. Both the diversity in vocabulary and widely varying accuracy justify the need for middleware that reconciles different annotator opinions. Considering annotators as "black-boxes" that do not require per-domain supervision allows us to recognise semantically related content in web extracted data in a scalable way. Second, we show in WADaR how annotators can be used to discover rules to repair web extracted data. We study the problem of computing joint repairs for web data extraction programs and their extracted data, providing an approximate solution that requires no per-source supervision and proves effective across a wide variety of domains and sources. The proposed solution is effective not only in repairing the extracted data, but also in encoding such repairs in the original extraction process. Third, we investigate how relationships among entities can be exploited to discover inconsistencies and additional information. We present RuDiK, a disk-based scalable solution to discover first-order logic rules over RDF knowledge bases built from web sources. We present an approach that does not limit its search space to rules that rely on "positive" relationships between entities, as in the case with traditional mining of constraints. On the contrary, it extends the search space to also discover negative rules, i.e., patterns that lead to contradictions in the data.
25

Um modelo de pontuação na busca de competências acadêmicas de pesquisadores / A score-based model for assessing academic researchers competences

Rech, Rodrigo Octavio January 2007 (has links)
Esta pesquisa descreve um modelo para descobrir e pontuar competências acadêmicas de pesquisadores, baseado na combinação de indicadores quantitativos que permitem mensurar a produção acadêmica dos cientistas. Um diferencial do modelo é a inclusão de indicadores quantitativos relacionados com a importância da produção bibliográfica dos pesquisadores. Estes indicadores possibilitam uma avaliação da produção considerando aspectos como repercussão na comunidade acadêmica e nível dos veículos de publicação. A pesquisa também contribui com a especificação de uma arquitetura flexível e extensível fundamentada em técnicas de extração de dados na Web e casamento aproximado de dados (através de funções de similaridade). A arquitetura foi implementada em um sistema Web cuja principal característica é a integração de diversas tecnologias open source. O sistema desenvolvido permite que qualquer pesquisador avalie quantitativamente sua produção científica, automatizando diversos aspectos relacionados à tarefa de avaliação, como a obtenção dos indicadores e a integração das diferentes bases de informações. / The present research describes a model that aims finding out and scoring academic researchers skills or competences based on the combination of quantitative indicators that make it possible to measure the production of academic scientists. A special feature concerning our model is the inclusion of quantitative indicators related to the importance of the researchers’ bibliographic production. These indicators allow the evaluation of the production considering both the outcome it has had in the academic community, and the quality level of the place it was published. The study also presents a flexible and extensible architecture specification based on techniques of web data extraction, and on approximate data matching (which is carried out through similarity functions). The architecture has been implemented in a web system whose main feature relies on the integration of several open-source technologies. The developed system allows any researcher to evaluate his/her own scientific production in quantitative terms, automating as well the so many aspects regarding the evaluation task, by making it easier to obtain the indicators and to integrate the different information databases, for instance.
26

Um modelo de pontuação na busca de competências acadêmicas de pesquisadores / A score-based model for assessing academic researchers competences

Rech, Rodrigo Octavio January 2007 (has links)
Esta pesquisa descreve um modelo para descobrir e pontuar competências acadêmicas de pesquisadores, baseado na combinação de indicadores quantitativos que permitem mensurar a produção acadêmica dos cientistas. Um diferencial do modelo é a inclusão de indicadores quantitativos relacionados com a importância da produção bibliográfica dos pesquisadores. Estes indicadores possibilitam uma avaliação da produção considerando aspectos como repercussão na comunidade acadêmica e nível dos veículos de publicação. A pesquisa também contribui com a especificação de uma arquitetura flexível e extensível fundamentada em técnicas de extração de dados na Web e casamento aproximado de dados (através de funções de similaridade). A arquitetura foi implementada em um sistema Web cuja principal característica é a integração de diversas tecnologias open source. O sistema desenvolvido permite que qualquer pesquisador avalie quantitativamente sua produção científica, automatizando diversos aspectos relacionados à tarefa de avaliação, como a obtenção dos indicadores e a integração das diferentes bases de informações. / The present research describes a model that aims finding out and scoring academic researchers skills or competences based on the combination of quantitative indicators that make it possible to measure the production of academic scientists. A special feature concerning our model is the inclusion of quantitative indicators related to the importance of the researchers’ bibliographic production. These indicators allow the evaluation of the production considering both the outcome it has had in the academic community, and the quality level of the place it was published. The study also presents a flexible and extensible architecture specification based on techniques of web data extraction, and on approximate data matching (which is carried out through similarity functions). The architecture has been implemented in a web system whose main feature relies on the integration of several open-source technologies. The developed system allows any researcher to evaluate his/her own scientific production in quantitative terms, automating as well the so many aspects regarding the evaluation task, by making it easier to obtain the indicators and to integrate the different information databases, for instance.
27

Um modelo de pontuação na busca de competências acadêmicas de pesquisadores / A score-based model for assessing academic researchers competences

Rech, Rodrigo Octavio January 2007 (has links)
Esta pesquisa descreve um modelo para descobrir e pontuar competências acadêmicas de pesquisadores, baseado na combinação de indicadores quantitativos que permitem mensurar a produção acadêmica dos cientistas. Um diferencial do modelo é a inclusão de indicadores quantitativos relacionados com a importância da produção bibliográfica dos pesquisadores. Estes indicadores possibilitam uma avaliação da produção considerando aspectos como repercussão na comunidade acadêmica e nível dos veículos de publicação. A pesquisa também contribui com a especificação de uma arquitetura flexível e extensível fundamentada em técnicas de extração de dados na Web e casamento aproximado de dados (através de funções de similaridade). A arquitetura foi implementada em um sistema Web cuja principal característica é a integração de diversas tecnologias open source. O sistema desenvolvido permite que qualquer pesquisador avalie quantitativamente sua produção científica, automatizando diversos aspectos relacionados à tarefa de avaliação, como a obtenção dos indicadores e a integração das diferentes bases de informações. / The present research describes a model that aims finding out and scoring academic researchers skills or competences based on the combination of quantitative indicators that make it possible to measure the production of academic scientists. A special feature concerning our model is the inclusion of quantitative indicators related to the importance of the researchers’ bibliographic production. These indicators allow the evaluation of the production considering both the outcome it has had in the academic community, and the quality level of the place it was published. The study also presents a flexible and extensible architecture specification based on techniques of web data extraction, and on approximate data matching (which is carried out through similarity functions). The architecture has been implemented in a web system whose main feature relies on the integration of several open-source technologies. The developed system allows any researcher to evaluate his/her own scientific production in quantitative terms, automating as well the so many aspects regarding the evaluation task, by making it easier to obtain the indicators and to integrate the different information databases, for instance.
28

Plataforma para la Extracción y Almacenamiento del Conocimiento Extraído de los Web Data

Rebolledo Lorca, Víctor January 2008 (has links)
No description available.
29

Distributed data management with access control : social Networks and Data of the Web / Gestion de Données Distribuées avec Contrôle d’Accès : réseaux sociaux et données du Web

Galland, Alban 28 September 2011 (has links)
La masse d’information disponible sur leWeb s’accroit rapidement, sous l’afflux de données en provenance des utilisateurs et des compagnies. Ces données qu’ils souhaitent partager de façon controllée sur le réseau et quisont réparties sur de nombreuses machines et systèmes différents, ne sont rapidement plus gérables directement par des moyens humains. Nous introduisons WebdamExchange, un nouveau modèle de bases de connaissancesdistribuées, qui comprend des assertions au sujet des données, du contrôle d’accés et de la distribution. Ces assertions peuvent être échangées avec d’autres pairs, répliquées, interrogées et mises à jour, en gardant la trace de leur origine. La base de connaissance permet aussi de guider de façon automatique sa propre gestion. WebdamExchange est basé surWebdamLog, un nouveau langage de règles pour la gestion de données distribuées, qui associe formellement les règles déductives de Datalog avec négation et les règles actives de Datalog::. WebdamLog met l’accent sur la dynamicité et les interactions, caractéristiques du Web 2.0. Ce modèle procure à la fois un langage expressif pour la spécification de systèmes distribués complexes et un cadre formel pour l’étude de propriétés fondamentales de la distribution. Nous présentons aussi une implémentation de notre base de connaissance. Nous pensons que ces contributions formentune fondation solide pour surmonter les problèmes de gestion de données du Web, en particulier dans le cadre du contrôle d’accès. / The amount of information on the Web is spreading very rapidly. Users as well as companies bring data to the network and are willing to share with others. They quickly reach a situation where their information is hosted on many machines they own and on a large number of autonomous systems where they have accounts. Management of all this information is rapidly becoming beyond human expertise. We introduce WebdamExchange, a novel distributed knowledge-base model that includes logical statements for specifying information, access control, secrets, distribution, and knowledge about other peers. These statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. The resulting knowledge guides distributed data management. WebdamExchange model is based on WebdamLog, a new rule-based language for distributed data management that combines in a formal setting deductiverules as in Datalog with negation, (to specify intensional data) and active rules as in Datalog:: (for updates and communications). The model provides a novel setting with a strong emphasis on dynamicity and interactions(in a Web 2.0 style). Because the model is powerful, it provides a clean basis for the specification of complex distributed applications. Because it is simple, it provides a formal framework for studying many facets of the problem such as distribution, concurrency, and expressivity in the context of distributed autonomous peers. We also discuss an implementation of a proof-of-concept system that handles all the components of the knowledge base and experiments with a lighter system designed for smartphones. We believe that these contributions are a good foundation to overcome theproblems of Web data management, in particular with respect to access control.
30

Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach

Blázquez Soriano, María Desamparados 15 January 2020 (has links)
[ES] En la Era Digital, el creciente uso de Internet y de dispositivos digitales está transformando completamente la forma de interactuar en el contexto económico y social. Miles de personas, empresas y organismos públicos utilizan Internet en sus actividades diarias, generando de este modo una enorme cantidad de datos actualizados ("Big Data") accesibles principalmente a través de la World Wide Web (WWW), que se ha convertido en el mayor repositorio de información del mundo. Estas huellas digitales se pueden rastrear y, si se procesan y analizan de manera apropiada, podrían ayudar a monitorizar en tiempo real una infinidad de variables económicas. En este contexto, el objetivo principal de esta tesis doctoral es generar indicadores económicos, basados en datos web, que sean capaces de proveer regularmente de predicciones a corto plazo ("nowcasting") sobre varias actividades empresariales que son fundamentales para el crecimiento y desarrollo de las economías. Concretamente, tres indicadores económicos basados en la web han sido diseñados y evaluados: en primer lugar, un indicador de orientación exportadora, basado en un modelo que predice si una empresa es exportadora; en segundo lugar, un indicador de adopción de comercio electrónico, basado en un modelo que predice si una empresa ofrece la posibilidad de venta online; y en tercer lugar, un indicador de supervivencia empresarial, basado en dos modelos que indican la probabilidad de supervivencia de una empresa y su tasa de riesgo. Para crear estos indicadores, se han descargado una diversidad de datos de sitios web corporativos de forma manual y automática, que posteriormente se han procesado y analizado con técnicas de análisis Big Data. Los resultados muestran que los datos web seleccionados están altamente relacionados con las variables económicas objeto de estudio, y que los indicadores basados en la web que se han diseñado en esta tesis capturan en un alto grado los valores reales de dichas variables económicas, siendo por tanto válidos para su uso por parte del mundo académico, de las empresas y de los decisores políticos. Además, la naturaleza online y digital de los indicadores basados en la web hace posible proveer regularmente y de forma barata de predicciones a corto plazo. Así, estos indicadores son ventajosos con respecto a los indicadores tradicionales. Esta tesis doctoral ha contribuido a generar conocimiento sobre la viabilidad de producir indicadores económicos con datos online procedentes de sitios web corporativos. Los indicadores que se han diseñado pretenden contribuir a la modernización en la producción de estadísticas oficiales, así como ayudar a los decisores políticos y los gerentes de empresas a tomar decisiones informadas más rápidamente. / [CAT] A l'Era Digital, el creixent ús d'Internet i dels dispositius digitals està transformant completament la forma d'interactuar al context econòmic i social. Milers de persones, empreses i organismes públics utilitzen Internet a les seues activitats diàries, generant d'aquesta forma una enorme quantitat de dades actualitzades ("Big Data") accessibles principalment mitjançant la World Wide Web (WWW), que s'ha convertit en el major repositori d'informació del món. Aquestes empremtes digitals poden rastrejar-se i, si se processen i analitzen de forma apropiada, podrien ajudar a monitoritzar en temps real una infinitat de variables econòmiques. En aquest context, l'objectiu principal d'aquesta tesi doctoral és generar indicadors econòmics, basats en dades web, que siguen capaços de proveïr regularment de prediccions a curt termini ("nowcasting") sobre diverses activitats empresarials que són fonamentals per al creixement i desenvolupament de les economies. Concretament, tres indicadors econòmics basats en la web han sigut dissenyats i avaluats: en primer lloc, un indicador d'orientació exportadora, basat en un model que prediu si una empresa és exportadora; en segon lloc, un indicador d'adopció de comerç electrònic, basat en un model que prediu si una empresa ofereix la possibilitat de venda online; i en tercer lloc, un indicador de supervivència empresarial, basat en dos models que indiquen la probabilitat de supervivència d'una empresa i la seua tasa de risc. Per a crear aquestos indicadors, s'han descarregat una diversitat de dades de llocs web corporatius de forma manual i automàtica, que posteriorment s'han analitzat i processat amb tècniques d'anàlisi Big Data. Els resultats mostren que les dades web seleccionades estan altament relacionades amb les variables econòmiques objecte d'estudi, i que els indicadors basats en la web que s'han dissenyat en aquesta tesi capturen en un alt grau els valors reals d'aquestes variables econòmiques, sent per tant vàlids per al seu ús per part del món acadèmic, de les empreses i dels decisors polítics. A més, la naturalesa online i digital dels indicadors basats en la web fa possible proveïr regularment i de forma barata de prediccions a curt termini. D'aquesta forma, són avantatjosos en comparació als indicadors tradicionals. Aquesta tesi doctoral ha contribuït a generar coneixement sobre la viabilitat de produïr indicadors econòmics amb dades online procedents de llocs web corporatius. Els indicadors que s'han dissenyat pretenen contribuïr a la modernització en la producció d'estadístiques oficials, així com ajudar als decisors polítics i als gerents d'empreses a prendre decisions informades més ràpidament. / [EN] In the Digital Era, the increasing use of the Internet and digital devices is completely transforming the way of interacting in the economic and social framework. Myriad individuals, companies and public organizations use the Internet for their daily activities, generating a stream of fresh data ("Big Data") principally accessible through the World Wide Web (WWW), which has become the largest repository of information in the world. These digital footprints can be tracked and, if properly processed and analyzed, could help to monitor in real time a wide range of economic variables. In this context, the main goal of this PhD thesis is to generate economic indicators, based on web data, which are able to provide regular, short-term predictions ("nowcasting") about some business activities that are basic for the growth and development of an economy. Concretely, three web-based economic indicators have been designed and evaluated: first, an indicator of firms' export orientation, which is based on a model that predicts if a firm is an exporter; second, an indicator of firms' engagement in e-commerce, which is based on a model that predicts if a firm offers e-commerce facilities in its website; and third, an indicator of firms' survival, which is based on two models that indicate the probability of survival of a firm and its hazard rate. To build these indicators, a variety of data from corporate websites have been retrieved manually and automatically, and subsequently have been processed and analyzed with Big Data analysis techniques. Results show that the selected web data are highly related to the economic variables under study, and the web-based indicators designed in this thesis are capturing to a great extent their real values, thus being valid for their use by the academia, firms and policy-makers. Additionally, the digital and online nature of web-based indicators makes it possible to provide timely, inexpensive predictions about the economy. This way, they are advantageous with respect to traditional indicators. This PhD thesis has contributed to generating knowledge about the viability of producing economic indicators with data coming from corporate websites. The indicators that have been designed are expected to contribute to the modernization of official statistics and to help in making earlier, more informed decisions to policy-makers and business managers. / Blázquez Soriano, MD. (2019). Design and Evaluation of Web-Based Economic Indicators: A Big Data Analysis Approach [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/116836 / TESIS

Page generated in 0.0601 seconds