31

Comquest: an Adaptive Crawler for User Comments on the Web

Chen, Zhijia, 0009-0005-7866-4549 05 1900 (has links)
This thesis introduces Comquest, an adaptive framework designed for the large-scale collection and integration of user comments from the Web. User comments are featured on many websites, and there is growing interest in mining and studying them in applications such as opinion mining and information diffusion. However, crawling user comments generally requires hard-coded solutions that are tethered to specific websites, which are hard to scale and maintain. To achieve a generalizable and scalable comment-crawling solution, Comquest employs two website-agnostic approaches: Web API querying and HTML data extraction. When the target Web page is integrated with a third-party commenting system whose Web API is in Comquest's knowledge base, Comquest retrieves comments by sending HTTP requests to the API's URL with parameters extracted from the target Web page. This approach presents several challenges. First, extracting accurate parameter values to construct HTTP requests is difficult, since the values (if present) are buried deep within the HTML source of Web documents. Second, the solution needs to generalize both vertically (within a website) and horizontally (across unseen websites). To tackle these challenges, the parameter extraction problem is treated as a variant of the multiclass Named Entity Recognition (NER) problem, where the entities represent the values of the parameters. Comquest leverages a sequential-labeling deep learning model to identify parameter values within HTML source code. When the commenting system is native to the website or unknown, Comquest detects and extracts user comments from fully rendered Web pages. However, comments are often hidden until triggered by specific user interaction, such as clicking on a designated page element among many other clickable elements. Furthermore, comments are typically presented as structured, record-like Web data with high structural variation, making them difficult to distinguish from other record-like data on the target page. Comquest utilizes deep learning models and Web record extraction algorithms to automate the process of triggering, extracting, and classifying comments. Comquest has been implemented as a comprehensive system consisting of an administration web portal, a task controller, and a crawler backend. It provides a useful tool for collecting comments that represent a wide range of opinions, stances, and sentiments from websites on a global scale. / Computer and Information Science
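The API-querying path can be pictured with a minimal sketch. Everything named here is a stand-in: the commenting system, its endpoint, and the parameter `thread_id` are hypothetical, and a regular expression takes the place of the sequential-labeling NER model that Comquest actually uses to locate parameter values.

```python
import json
import re
import urllib.parse
import urllib.request

# Hypothetical knowledge-base entry for a third-party commenting system.
KNOWN_SYSTEMS = {
    "example-comments": {
        "endpoint": "https://api.example-comments.com/v1/threads",
        # A regex stands in for the NER model that finds parameter values.
        "param_patterns": {"thread_id": re.compile(r'data-thread-id="(\d+)"')},
    }
}

def fetch_comments(page_html: str, system: str):
    """Extract API parameters from the page HTML and query the comment API."""
    spec = KNOWN_SYSTEMS[system]
    params = {}
    for name, pattern in spec["param_patterns"].items():
        match = pattern.search(page_html)
        if match is None:
            return None  # a required parameter is not present on this page
        params[name] = match.group(1)
    url = spec["endpoint"] + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```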
32

AIM - A Social Media Monitoring System for Quality Engineering

Bank, Mathias 14 June 2013 (has links)
In the last few years the World Wide Web has dramatically changed the way people communicate with each other. The growing availability of Social Media systems like Internet forums, weblogs, and social networks ensures that the Internet is today what it was originally designed for: a technical platform in which all users are able to interact with each other. Nowadays there are billions of user comments available discussing all aspects of life, and the data source is still growing. This thesis investigates whether it is possible to use this growing amount of freely provided user comments to extract quality-related information. The concept is based on the observation that customers do not only post marketing-relevant information; they also publish product-oriented content, including positive and negative experiences. It is assumed that this information represents a valuable data source for quality analyses: the original voices of the customers promise a more exact and more concrete definition of "quality" than the one available to manufacturers or market researchers today. However, the huge amount of unstructured user comments makes their evaluation very complex, and it is impossible for an analyst to investigate the provided customer feedback manually. Therefore, Social Media-specific algorithms have to be developed to collect, pre-process, and finally analyze the data. This has been done in the Social Media monitoring system AIM (Automotive Internet Mining) that is the subject of this thesis. It investigates how manufacturers, products, product features, and related opinions are discussed in order to estimate the overall product quality from the customers' point of view. AIM is able to track different types of data sources using a flexible multi-agent-based crawler architecture. In contrast to classical web crawlers, the multi-agent-based crawler supports individual crawling policies to minimize the download of irrelevant web pages. In addition, an unsupervised wrapper induction algorithm is introduced to automatically generate content extraction parameters specific to the crawled Social Media systems. The extracted user comments are analyzed by different content analysis algorithms to gain a deeper insight into the discussed topics and opinions. Three different topic types are supported, depending on the analysis needs:
* Highly reliable analysis results are produced by a special context-aware, taxonomy-based classification system.
* Fast ad-hoc analyses are applied on top of classical fulltext search capabilities.
* Finally, AIM supports the detection of blind spots by using a new fuzzified hierarchical clustering algorithm, which generates topical clusters while supporting multiple topics within each user comment.
All three topic types are treated in a unified way, so that an analyst can apply all methods simultaneously and interchangeably. The systematically processed user comments are visualized in a simple, flexible, interactive analysis frontend. Special abstraction techniques support the investigation of thousands of user comments with minimal time effort, and specially constructed indices show the relevance of, and customer satisfaction with, a given topic.
Contents: 1 Introduction (1.1 Chapter Overview) 2 Problem Definition and Data Environment (2.1 Commonly Applied Quality Sensors, 2.2 The Growing Importance of Social Media, 2.3 Social Media based Quality Experience, 2.4 Change to the Holistic Concept of Quality, 2.5 Definition of User Generated Content and Social Media, 2.6 Social Media Software Architectures) 3 Data Collection (3.1 Related Work, 3.2 Requirement Analysis, 3.3 A Blackboard Crawler Architecture, 3.4 Semi-supervised Wrapper Generation, 3.5 Structure Modification Detection, 3.6 Conclusion) 4 Hierarchical Fuzzy Clustering (4.1 Related Work, 4.2 Generalization of Agglomerative Crisp Clustering Algorithms, 4.3 Topic Groups Generation, 4.4 Evaluation, 4.5 Conclusion) 5 A Social Media Monitoring System for Quality Analyses (5.1 Related Work, 5.2 Pre-Processing Workflow, 5.3 Quality Indices, 5.4 AIM Architecture, 5.5 Evaluation, 5.6 Conclusion) 6 Conclusion and Perspectives (6.1 Contributions and Conclusions, 6.2 Perspectives) Bibliography / In recent years the World Wide Web has changed dramatically. A few years ago it was still primarily an information source in which a small share of users could publish content; it has since developed into a communication platform in which every user can take part actively. The resulting volume of data covers every aspect of daily life, including quality topics. Analyzing this data promises to improve quality assurance measures considerably, since it makes it possible to address topics that are difficult to measure with classical sensors. The systematic and reproducible analysis of user-generated data, however, requires adapting existing tools and developing new Social Media-specific algorithms. To this end, this work creates an entirely new Social Media monitoring system with which an analyst can examine thousands of user contributions with minimal time effort. Applying the system has revealed several advantages that make it possible to identify the customer-driven definition of "quality".
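The fuzzified hierarchical clustering can be made concrete with a small sketch. This is not the algorithm from the thesis, only a common construction under the same idea: crisp agglomerative merging first, then fuzzy membership degrees computed against the resulting centroids, so that a single comment can belong to several topical clusters.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fuzzy_agglomerate(vectors, n_clusters):
    """Crisp average-linkage style merging, then fuzzy memberships per comment."""
    clusters = [[v] for v in vectors]
    while len(clusters) > n_clusters:
        # merge the pair of clusters whose centroids are most similar
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cosine(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        clusters[i] += clusters.pop(j)
    centroids = [centroid(c) for c in clusters]
    # fuzzy step: each comment receives a membership degree in every cluster
    memberships = []
    for v in vectors:
        sims = [max(cosine(v, c), 0.0) for c in centroids]
        total = sum(sims) or 1.0
        memberships.append([s / total for s in sims])
    return centroids, memberships
```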
33

Deal Organizer : personalized alerts for shoppers

Reyes, Ulises Uriel 27 November 2012 (has links)
Deal Organizer is a web-based application that scans multiple websites for online bargains. It allows users to specify their preferences so that they receive notifications based on personalized content. The application obtains deals from other websites through data extraction techniques that include reading RSS feeds and web scraping. To better facilitate content personalization, the application tracks each user's activity by recording clicks on both links to deals and rating buttons, e.g., the Facebook Like button. Due to the dynamic nature of the source websites offering these deals and the ever-evolving web technologies available to software developers, the Deal Organizer application was designed around an interface-based design using the Spring Framework. This yielded an extensible, pluggable, and flexible system that accommodates maintenance and future work gracefully. The application's performance was evaluated by executing resource-intensive tests in a constrained environment; results show that the application performed well under these conditions.
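The RSS side of the data extraction can be sketched briefly. The feed URL and the keyword filter below are illustrative assumptions; the application's real personalization is driven by recorded clicks and ratings, as described above.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical feed URL; the sites the application actually scans are not named here.
FEED_URL = "https://deals.example.com/rss"

def fetch_deals(feed_url: str):
    """Read an RSS 2.0 feed and return (title, link) pairs for each deal item."""
    with urllib.request.urlopen(feed_url) as response:
        root = ET.parse(response).getroot()
    deals = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        deals.append((title, link))
    return deals

def matches_preferences(title: str, keywords: list[str]) -> bool:
    """Naive preference filter: notify when any keyword appears in the title."""
    lowered = title.lower()
    return any(k.lower() in lowered for k in keywords)
```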
34

Veiksmų ontologijos formavimas panaudojant internetinį tekstyną / Building action ontology using internet corpus

Matulionis, Paulius 20 June 2012 (has links)
The goal of this master thesis is to investigate the problem of automated action-ontology design using a corpus harvested from the internet. A software package including tools for internet corpus harvesting, network service access, markup, ontology design, and representation was developed and tested in the experiments carried out. A process management system was realized, covering both the front-end and back-end design levels; detailed system and component models are presented, reflecting all operations of the system. The thesis presents the results of experiments on building ontologies for several selected action verbs. The ontology building process is described, problems in recognizing separate elements of the action environment are analysed, and suggestions for additional rules leading to more accurate results are presented. The rules used to obtain the required data from the harvested corpus have been summarized and integrated into the designed software package.
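A rule of the kind used to pull action structures out of a tagged corpus might look like the following sketch. The tag set and the verb-object pattern are invented for illustration and are not the thesis's actual rules.

```python
# Each token is (word, part-of-speech tag); the tag set here is illustrative.
TaggedSentence = list[tuple[str, str]]

def extract_verb_objects(sentence: TaggedSentence):
    """Collect (verb, noun) pairs where a noun directly follows a verb,
    a toy stand-in for corpus rules that populate an action ontology."""
    pairs = []
    for (word, tag), (next_word, next_tag) in zip(sentence, sentence[1:]):
        if tag == "VERB" and next_tag == "NOUN":
            pairs.append((word, next_word))
    return pairs

sentence = [("open", "VERB"), ("door", "NOUN"), ("slowly", "ADV")]
print(extract_verb_objects(sentence))  # [('open', 'door')]
```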
35

基於臉書互動行為的關係圖領域專屬語言與工具 / A Domain Specific Language for Describing Facebook Interaction Graphs

潘宗佐, Pan, Tsung Tso Unknown Date (has links)
The huge volume of digital footprints left by Facebook users has become a valuable research resource, and many good results have been produced by collecting Facebook data and visualizing it as node-link diagrams. However, the Facebook Graph API has imposed increasingly strict rate limits since version 2.0, so data collection has become a problem in itself. Although one could work around the rate limits by creating more Facebook App resources, this weakens user identification in the retrieved data. The motivation of this research is therefore to provide a set of tools that help researchers collect recognizable Facebook user data with a crawler, together with a domain-specific language (DSL) for building relation graphs that describe interactions among fan pages, users, hashtags, comments, replies, and posts. With the crawler and the DSL tool, we were able to gather uniquely identifiable user data. The experimental results show that node-link diagrams can be constructed conveniently, and 75% of the surveyed subjects agreed that the tool is helpful for building graphs. Future work includes combining the crawler and the DSL to build relation graphs online in real time, and adding more social-network analysis features so that researchers can carry out deeper analyses on the system.
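A flavor of such a DSL can be given with a tiny sketch. The statement syntax below is invented for illustration; the thesis does not publish its grammar in this abstract.

```python
import re

# One hypothetical statement per line: "<source> -[<interaction>]-> <target>"
EDGE_PATTERN = re.compile(r"^\s*(\S+)\s*-\[(\w+)\]->\s*(\S+)\s*$")

def parse_graph(script: str):
    """Parse DSL statements into (source, interaction, target) edges."""
    edges = []
    for line in script.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = EDGE_PATTERN.match(line)
        if match is None:
            raise SyntaxError(f"bad statement: {line!r}")
        edges.append(match.groups())
    return edges

script = """
# user interactions on a fan page
user:alice -[likes]-> post:42
user:bob   -[comments]-> post:42
post:42    -[tagged]-> hashtag:sale
"""
print(parse_graph(script))
```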
36

Webový vyhledávací systém / Web Search Engine

Tamáš, Miroslav January 2014 (has links)
The academic fulltext search engine Egothor has recently become the starting point of several theses focused on search. Until now, no solution was available that provided a robust set of web-content processing tools. This master thesis aims at the design and implementation of a distributed search system working primarily with internet sources. We analyze the first-generation components for processing web content and summarize their primary features, then use those features to propose an architecture for a distributed web search engine. We focus mainly on the phases of data fetching, processing, and indexing. We also describe the final implementation of the system and propose a few ideas for future extensions.
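The fetch, process, and index phases can be reduced to a toy pipeline. The sketch below is generic and deliberately simplified; it is not Egothor's architecture.

```python
import re
from collections import defaultdict

def fetch(url: str) -> str:
    """Stand-in fetcher; a real system would download and render the page."""
    fake_pages = {"https://example.com": "distributed search engines fetch and index pages"}
    return fake_pages.get(url, "")

def process(html: str) -> list[str]:
    """Tokenize the page text into lowercase terms."""
    return re.findall(r"[a-z0-9]+", html.lower())

def build_index(urls: list[str]) -> dict[str, set[str]]:
    """Map each term to the set of URLs containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        for term in process(fetch(url)):
            index[term].add(url)
    return index

index = build_index(["https://example.com"])
print(index["search"])  # {'https://example.com'}
```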
37

An Indexation and Discovery Architecture for Semantic Web Services and its Application in Bioinformatics

Yu, Liyang 09 June 2006 (has links)
Recently much research effort has been devoted to the discovery of relevant Web services, and it is widely recognized that adding semantics to service descriptions is the solution to this challenge. Web services with explicit semantic annotation are called Semantic Web Services (SWS). This research proposes an indexation and discovery architecture for SWS, together with a prototype application in the area of bioinformatics. In this approach, a SWS repository is created and maintained by crawling both ontology-oriented UDDI registries and Web sites hosting SWS. For a given service request, the proposed system invokes the matching algorithm and returns a candidate set, with different degrees of matching considered. This approach can add more flexibility to the current industry standards by offering more choices to both service requesters and publishers. Also, the prototype developed in this research shows the value that SWS can add in application areas such as bioinformatics.
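Matching algorithms in the SWS literature commonly rank candidates by how the requested concept relates to the advertised one in a shared ontology, with degrees such as exact, plug-in, subsumes, and fail; the algorithm in this thesis may differ in detail. A minimal sketch over a toy concept hierarchy:

```python
# Toy concept hierarchy: child -> parent. Purely illustrative.
PARENT = {"BlastService": "SequenceAnalysis", "SequenceAnalysis": "BioinformaticsService"}

def ancestors(concept: str) -> set[str]:
    result = set()
    while concept in PARENT:
        concept = PARENT[concept]
        result.add(concept)
    return result

def degree_of_match(requested: str, advertised: str) -> str:
    """Four-level matching in the style common to SWS matchmaking work."""
    if requested == advertised:
        return "exact"
    if advertised in ancestors(requested):
        return "plug-in"   # advertised concept is more general and can stand in
    if requested in ancestors(advertised):
        return "subsumes"  # requested concept is more general than the advertised one
    return "fail"

print(degree_of_match("BlastService", "SequenceAnalysis"))  # plug-in
```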
38

Εξόρυξη θεματικών αλυσίδων από ιστοσελίδες για την δημιουργία ενός θεματολογικά προσανατολισμένου προσκομιστή / Lexical chain extraction for the creation of a topical focused crawler

Κοκόσης, Παύλος 16 May 2007 (has links)
Topical focused crawlers are applications that aim to collect web pages on a specific topic from the Web; building them is an open research field. In this master thesis we develop a topical focused crawler using lexical chains. Lexical chains are an important lexical and computational tool for representing the meaning of text, and they have been used successfully in automatic text summarization and in classifying texts into thematic categories. We present processes for scoring hyperlinks and web pages, as well as for computing the semantic similarity between documents using lexical chains. We combine these methods and embed them in a topical focused crawler, whose experimental results are very promising.
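A toy version of chain-based page scoring is sketched below. The relatedness table is invented; real lexical chains are typically built over WordNet-style synonym and hypernym relations, and the thesis's scoring functions are more elaborate.

```python
# Toy relatedness table standing in for WordNet-style relations.
RELATED = {
    "car": {"vehicle", "engine", "wheel"},
    "vehicle": {"car", "engine", "wheel"},
    "engine": {"car", "vehicle"},
    "wheel": {"car", "vehicle"},
}

def build_chains(words: list[str]) -> list[set[str]]:
    """Greedily attach each word to the first chain containing a related word."""
    chains: list[set[str]] = []
    for word in words:
        for chain in chains:
            if any(word == w or word in RELATED.get(w, set()) for w in chain):
                chain.add(word)
                break
        else:
            chains.append({word})
    return chains

def page_score(page_words: list[str], topic_words: set[str]) -> float:
    """Score a page by how much its longest chain overlaps the topic vocabulary."""
    chains = sorted(build_chains(page_words), key=len, reverse=True)
    top = chains[0] if chains else set()
    return len(top & topic_words) / len(top) if top else 0.0

print(page_score(["car", "engine", "wheel", "banana"], {"car", "vehicle", "engine"}))
```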
39

Inblick i fenomenet webbskrapning / Insight into the Phenomenon of Web Scraping

Andersson, Lars January 2013 (has links)
This bachelor thesis aims to investigate the phenomenon of web scraping. Web scraping programs (also known as web wanderers, crawlers, spiders, or scrapers) are programs that automatically search the web to extract information from web pages. One example of web scraping is when a company collects pricing data for a product or service and then uses that information to produce cheaper offers. This gives the company an advantage, letting it focus more on marketing its site and services. In addition, the targeted companies' servers are heavily loaded with scraping traffic from non-customers. After searching both academic and general sources, the conclusion drawn is that scraping of websites cannot be fully prevented, just as no IT attack can be fully prevented: there are no 100% watertight systems. Of the roughly one hundred academic works surveyed, only one focused on preventing scraping bots.
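One common partial countermeasure, offered here as illustration rather than as a finding of the thesis, is server-side rate limiting per client over a sliding window:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10.0
MAX_REQUESTS = 20  # illustrative threshold; real limits depend on the site

_request_log: dict[str, deque] = defaultdict(deque)

def looks_like_scraper(client_ip: str, now: float | None = None) -> bool:
    """Flag a client that exceeds MAX_REQUESTS within the sliding window.
    A determined scraper can evade this (rotating IPs, slowing down),
    which is consistent with the conclusion above that scraping cannot
    be fully prevented."""
    now = time.monotonic() if now is None else now
    log = _request_log[client_ip]
    log.append(now)
    while log and now - log[0] > WINDOW_SECONDS:
        log.popleft()
    return len(log) > MAX_REQUESTS
```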
40

Raupenfahrzeug-Dynamik / Tracked Vehicle Dynamics

Graneß, Henry 18 April 2018 (has links) (PDF)
Tracked running gear follows the general principle that hinging chain links together in a row creates a roadway that the vehicle carries with it. This makes it possible to mobilize even heavy machines on rough, brittle terrain with large propulsion forces. However, owing to the discretization of the track into links of finite length, the running gear exhibits considerable ride harshness. This produces time-variant loads in the undercarriage that limit the service life of the chain, the drive, and the vehicle's supporting structure, and thus regularly force cost-intensive maintenance. Taking up this problem, the thesis analyzes and optimizes the driving dynamics of tracked vehicles. It also presents methods that permit computationally efficient simulation of tracked vehicles and their drive systems.
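The ride disturbance caused by links of finite length is, in standard chain-drive kinematics, the polygon (chordal) effect. The textbook relation below is offered as context and is not taken from the thesis: for a sprocket with $z$ teeth, pitch radius $R$, and angular speed $\omega$, the track speed oscillates between a maximum and a minimum once per tooth engagement.

```latex
% Polygon (chordal) effect: the effective radius varies as each chord rotates.
\[
  v_{\max} = \omega R, \qquad v_{\min} = \omega R \cos\frac{\pi}{z},
\]
\[
  \delta = \frac{v_{\max} - v_{\min}}{v_{\max}} = 1 - \cos\frac{\pi}{z}.
\]
% The speed ripple delta shrinks rapidly as the tooth count z grows:
% z = 6 gives about 13.4% ripple, z = 12 about 3.4%.
```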
