41

Breaking Hash-Tag Detection Algorithm for Social Media (Twitter)

January 2015 (has links)
abstract: In trading, volume is a measure of how much of a stock has been exchanged in a given period of time. Since every stock is distinct and has a different number of shares outstanding, volume can be compared against a stock's own historical volume to spot changes. It is likewise used to confirm price trends and breakouts, and to spot potential reversals. In my thesis, I hypothesize that the concept of trading volume can be extrapolated to social media (Twitter). The influence of social media, especially Twitter, on financial markets has grown markedly in the past couple of years. With the growth of its usage by news channels, financial experts, and pundits, the global economy does seem to hinge on 140 characters. By analyzing the number of tweets hash-tagged with a stock, a strong relation can be established between the number of people talking about the stock and its trading volume. In my work, I make this relation explicit and detect a breakout state when the volume moves beyond a characterized support or resistance level. / Dissertation/Thesis / Masters Thesis Computer Science 2015
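As a rough illustration of the breakout idea, the sketch below flags intervals where per-hashtag tweet volume escapes a rolling support/resistance band. The window length and the 2-sigma band are assumptions made for the example, not parameters from the thesis.

```python
# A minimal sketch of the volume-breakout idea applied to tweet counts.
# Window size and the 2-sigma band are illustrative assumptions.
from statistics import mean, stdev

def detect_breakouts(tweet_counts, window=24, k=2.0):
    """Flag intervals where hashtag volume breaks its recent band.

    tweet_counts: list of per-interval counts of tweets hash-tagged
    to a stock (e.g. hourly counts of #AAPL).
    Returns a list of (index, count, 'up'|'down') breakout events.
    """
    events = []
    for i in range(window, len(tweet_counts)):
        history = tweet_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        resistance = mu + k * sigma        # upper band: unusual chatter
        support = max(mu - k * sigma, 0)   # lower band, floored at zero
        if tweet_counts[i] > resistance:
            events.append((i, tweet_counts[i], "up"))
        elif tweet_counts[i] < support:
            events.append((i, tweet_counts[i], "down"))
    return events
```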
42

Preenchimento automático de formulários na web oculta / Automatically filling in hidden web forms

Kantorski, Gustavo Zanini January 2014 (has links)
A large portion of the information on the Web is stored inside online databases and is accessible only after a user submits a query through a search interface. The portion of the Web where this information resides is called the Hidden Web or Deep Web, and it is generally inaccessible to traditional search engine crawlers. Since the only way to access Hidden Web pages is through query submission, many works have focused on how to fill in form fields automatically, aiming at increasing the amount of distinct information retrieved from behind Web forms. This thesis presents an automatic solution for selecting values for fields in Web forms, combining heuristics and machine learning techniques to improve the selection. It also proposes a categorization of existing form filling techniques and a comparative analysis of the state of the art. Experiments were conducted on real Web forms from several domains, and the results indicate that our approach significantly outperforms a baseline method in terms of coverage, without additional computational cost.
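The value-selection step might look something like the sketch below, which blends a heuristic prior over candidate values with a score from a learned classifier. The linear blend, the alpha weight, and the stub classifier are illustrative assumptions; the thesis's actual method is richer.

```python
# A minimal sketch of value selection for a form field, combining a
# heuristic prior with a learned score. Weights are illustrative
# assumptions, not the thesis's model.

def select_value(field_label, candidates, model_score, alpha=0.5):
    """Pick the candidate value with the best combined score.

    candidates: dict mapping candidate value -> heuristic prior in [0, 1]
    (e.g. how often the value co-occurred with this label in past crawls).
    model_score: callable (label, value) -> probability from a classifier.
    """
    def combined(value):
        heuristic = candidates[value]
        learned = model_score(field_label, value)
        return alpha * heuristic + (1 - alpha) * learned

    return max(candidates, key=combined)

# Hypothetical usage with a stub in place of a trained classifier:
stub = lambda label, value: 0.9 if value != "" else 0.1
best = select_value("state", {"RS": 0.6, "SP": 0.3, "": 0.1}, stub)
```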
43

On the application of focused crawling for statistical machine translation domain adaptation

Laranjeira, Bruno Rezende January 2015 (has links)
Statistical Machine Translation (SMT) is highly dependent on the availability of parallel corpora for training. However, such resources can be hard to find, especially for under-resourced languages or very specific domains such as dermatology. One way to work around this is to use comparable corpora, a much more abundant kind of resource, which can be acquired by applying Focused Crawling (FC) algorithms. In this work we propose novel approaches to FC, some based on n-grams and others on the expressive power of multiword expressions. We also assess the viability of using FC to perform domain adaptation for generic SMT systems, and whether there is a correlation between the quality of the FC algorithms and that of the SMT systems built from the collected data. Results indicate that FC is indeed a good way to acquire comparable corpora for SMT domain adaptation, and that there is a correlation between the quality of the two processes.
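One way the n-gram-based relevance scoring could work is sketched below: a page is scored against a domain description by the Jaccard overlap of their n-gram sets, and high-scoring pages have their out-links prioritized on the frontier. The trigram choice and the Jaccard measure are assumptions for illustration, not the thesis's exact formulation.

```python
# A minimal sketch of an n-gram based page scorer for focused crawling.
# The thesis also explores multiword-expression based variants.

def ngrams(text, n=3):
    """Set of word n-grams in a text, as tuples of lowercase tokens."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def domain_relevance(page_text, domain_text, n=3):
    """Jaccard overlap between the page's and the domain's n-gram sets."""
    page, domain = ngrams(page_text, n), ngrams(domain_text, n)
    if not page or not domain:
        return 0.0
    return len(page & domain) / len(page | domain)

# Pages scoring above a threshold are kept, and their out-links are
# pushed onto the crawl frontier ordered by this score.
```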
44

Skrapa försäljningssidor på nätet : Ett ramverk för webskrapningsrobotar / Scraping online sales sites: A framework for web scraping robots

Karlsson, Emil, Edberg, Mikael January 2016 (has links)
Today the internet offers a large number of sales websites where new listings are posted all the time. We see a need for a tool that monitors these websites around the clock to track how much is being sold and what is being sold. Creating a program that monitors websites is time-consuming, so we have built a framework that simplifies the creation of web scrapers focused on list-based sales websites. Several frameworks for web scraping exist, but very few focus solely on this type of website.
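A framework of this kind could expose an interface like the following sketch, where the user supplies only the site-specific parsing and the framework drives the polling loop and new-listing detection. All class and method names here are hypothetical, not the thesis's API.

```python
# A minimal sketch of a list-based sales-site scraping framework:
# the user plugs in parsing, the framework handles monitoring.
import time
from abc import ABC, abstractmethod
from urllib.request import urlopen

class ListingScraper(ABC):
    """Subclass per sales site; the framework drives the polling."""

    def __init__(self, url, interval_seconds=3600):
        self.url = url
        self.interval = interval_seconds
        self.seen = set()   # listing ids already reported

    @abstractmethod
    def parse_listings(self, html):
        """Return an iterable of (listing_id, title, price) tuples."""

    def poll_forever(self):
        while True:
            html = urlopen(self.url).read().decode("utf-8", "replace")
            for listing_id, title, price in self.parse_listings(html):
                if listing_id not in self.seen:   # only report new ads
                    self.seen.add(listing_id)
                    print(f"new listing: {title} at {price}")
            time.sleep(self.interval)
```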
46

Link Extraction for Crawling Flash on the Web

Antelius, Daniel January 2015 (has links)
The set of web pages not reachable using conventional web search engines is usually called the hidden or deep web. One client-side hurdle for crawling the hidden web is Flash files. This thesis presents a tool for extracting links from Flash files up to version 8 to enable web crawling. The files are both parsed and selectively interpreted to extract links. The purpose of the interpretation is to simulate the normal execution of Flash in the Flash runtime of a web browser. The interpretation is a low level approach that allows the extraction to occur offline and without involving automation of web browsers. A virtual machine is implemented and a set of limitations is chosen to reduce development time and maximize the coverage of interpreted byte code. Out of a test set of about 3500 randomly sampled Flash files the link extractor found links in 34% of the files. The resulting estimated web search engine coverage improvement is almost 10%.
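For contrast with the interpreter-based approach described above, the sketch below shows the naive baseline it improves upon: grepping the decompressed SWF body for URL-shaped strings. This misses links that Flash assembles at runtime from string fragments, which is exactly what byte-code interpretation recovers.

```python
# A deliberately crude baseline, NOT the thesis's method: scan the
# (decompressed) SWF body for URL-shaped byte strings.
import re
import zlib

URL_PATTERN = re.compile(rb"https?://[\x21-\x7e]+")

def naive_swf_links(path):
    with open(path, "rb") as f:
        data = f.read()
    if data[:3] == b"CWS":   # compressed SWF: zlib body follows the
        data = data[:8] + zlib.decompress(data[8:])   # 8-byte header
    return [m.group().decode("ascii", "replace")
            for m in URL_PATTERN.finditer(data)]
```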
47

Human Interactions on Online Social Media : Collecting and Analyzing Social Interaction Networks

Erlandsson, Fredrik January 2018 (has links)
Online social media, such as Facebook, Twitter, and LinkedIn, provide users with services that enable them to interact both globally and instantly. Social media interactions grow constantly in volume, which calls for selection mechanisms to find and analyze interesting data. These interactions can be modeled as interaction networks, enabling network-based and graph-based methods for modeling and understanding users' behavior on social media. Such methods can also benefit the field of complex networks, for instance in finding initial seeds for the information cascade model. This thesis investigates how to efficiently collect user-generated content and interactions from online social media sites. A novel data collection method, developed through exploratory research including prototyping, is presented as part of the research results.

Analysis of social data requires data that covers all the interactions in a given domain, which has proven difficult to obtain in previous work. An additional contribution is a novel crawling method that extracts all social interactions from Facebook. Over the last few years, we have collected 280 million posts from public pages on Facebook using this method. The collected posts include 35 billion likes and 5 billion comments from 700 million users. This is the largest research dataset of social interactions on Facebook, enabling further and more accurate research in social network analysis.

With the extracted data, it is possible to illustrate interactions between different users who are not necessarily connected. Methods using the same data to identify and cluster different opinions in online communities have also been developed and evaluated. Furthermore, a proposed method is used and validated for finding appropriate seeds for information cascade analyses and for identifying influential users. The conducted research indicates that the data mining approach of association rule learning can identify influential users with high accuracy. The same method can also identify seeds in an information cascade setting, with no significant difference from other network-based methods. Finally, the privacy-related consequences of posting online are an important area for users to consider; methods to protect user privacy and mitigate privacy risks are therefore presented.
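The association rule idea for spotting influential users might be sketched as follows: each post becomes a "transaction" of the users who interacted with it, and rules between users are kept when support and confidence clear a threshold. The thresholds and the interpretation of rule direction are assumptions for illustration, not the thesis's tuned setup.

```python
# A minimal sketch of association rule learning over interaction data:
# a rule u -> v with high confidence suggests v's engagement tends to
# accompany u's. Thresholds are illustrative assumptions.
from collections import Counter
from itertools import permutations

def mine_rules(posts, min_support=3, min_conf=0.6):
    """posts: list of sets of user ids that interacted with each post."""
    single = Counter()
    pair = Counter()
    for users in posts:
        for u in users:
            single[u] += 1
        for u, v in permutations(users, 2):
            pair[(u, v)] += 1

    rules = []
    for (u, v), n in pair.items():
        if n >= min_support and n / single[u] >= min_conf:
            rules.append((u, v, n / single[u]))   # rule u -> v
    return rules

# Users that take part in many strong rules are candidates for
# influential users and information-cascade seeds.
```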
48

Diseño e implementación de sistema distribuido y colaborativo de peticiones HTTP/S / Design and implementation of a distributed and collaborative HTTP/S request system

Pulgar Romero, Francisco Leonardo January 2018 (has links)
Thesis submitted for the degree of Ingeniero Civil en Computación / Today there are many computers and devices with idle computational capacity that could potentially be put to use. A large number of projects exist in which people voluntarily donate their computing power to help with problems such as rendering 3D animations, running experiment simulations, studying mathematical conjectures, optimizing variables and parameters in machine learning, studying the structure of proteins and molecules, classifying galaxies, and predicting the weather, among countless other possible applications in both research and industry. This need for processing power and computational resources has led to technologies such as grid computing: a distributed computing system that coordinates computers with different hardware and software to solve common tasks in parallel. The goal of this thesis is the creation of a distributed grid system in which devices communicate with a central server to collect data from the internet, thereby using the idle capacity of those devices and providing volunteer help to anyone who needs to collect data from the internet. The work implements a user and device administration system built with Django, an HTTP/S query distribution system built with Tornado, and a client program, written in Python, that runs on the devices to solve tasks and send back results. These three systems communicate with one another to distribute the HTTP/S queries, but remain independent of each other, which improves the scalability and fault tolerance of the overall system. Finally, tests and experiments were run on the different components to gather data on the system's behavior and to identify its advantages and disadvantages. The results show that as the number of devices collaborating on a task increases, the time needed to complete the task decreases; they also show a direct correlation between the response time of an HTTP/S query and the physical distance between the device making the query and the web server.
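The volunteer-side client could be as simple as the following polling loop, which asks the coordinator for a pending HTTP/S task, fetches the target URL, and posts the body back. The endpoint paths and JSON shape are hypothetical; the thesis's actual protocol between the Django/Tornado servers and the Python client may differ.

```python
# A minimal sketch of the volunteer-side worker. Endpoint paths and
# payload format are assumptions for illustration.
import json
import time
from urllib.request import Request, urlopen

SERVER = "http://localhost:8000"   # hypothetical coordinator address

def work_loop():
    while True:
        task = json.load(urlopen(f"{SERVER}/task/next"))
        if not task:                  # no pending work: back off
            time.sleep(10)
            continue
        body = urlopen(task["url"]).read()   # run the HTTP/S query
        report = Request(                    # send the result back
            f"{SERVER}/task/{task['id']}/result",
            data=body,
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
        urlopen(report)
```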
49

Intelligent Event Focused Crawling

Farag, Mohamed Magdy Gharib 23 September 2016 (has links)
There is a need for an integrated event focused crawling system to collect Web data about key events. When an event occurs, many users try to locate the most up-to-date information about it, yet there is little systematic collecting and archiving anywhere of information about events. We propose intelligent event focused crawling for automatic event tracking and archiving, as well as effective access. We extend traditional focused (topical) crawling techniques in two directions: modeling and representing events, and webpage source importance. We developed an event model that captures key event information (topical, spatial, and temporal) and incorporated it into the focused crawler algorithm. For the focused crawler to leverage the event model in predicting a webpage's relevance, we developed a function that measures the similarity between two event representations based on textual content. Although the textual content provides a rich set of features, we proposed an additional source of evidence that allows the focused crawler to better estimate the importance of a webpage by considering its website. We estimated webpage source importance as the ratio of relevant to non-relevant webpages found while crawling a website, and combined the textual content information and source importance into a single relevance score. For the focused crawler to work well, it needs a diverse set of high quality seed URLs (URLs of relevant webpages that link to other relevant webpages). Although manual curation of seed URLs guarantees quality, it requires exhaustive manual labor. We therefore proposed an automated approach that curates seed URLs from social media content, leveraging the richness of social media posts about events to extract URLs that can serve as seeds for further focused crawling. We evaluated our system through four series of experiments, using recent events: the Orlando shooting, Ecuador earthquake, Panama papers, California shooting, Brussels attack, Paris attack, and Oregon shooting. In the first series, our proposed event model representation, used to predict webpage relevance, outperformed the topic-only approach in precision, recall, and F1-score. In the second series, using harvest ratio to measure the ability to collect relevant webpages, our event model-based focused crawler outperformed the state-of-the-art focused crawler (best-first search). The third series evaluated the effectiveness of our proposed webpage source importance: the focused crawler with source importance collected roughly the same number of relevant webpages as the one without it, but from a smaller set of sources. The fourth series provides guidance to archivists on the effectiveness of curating seed URLs from social media content (tweets) using different selection methods. / Ph. D.
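The combined relevance score might be sketched as below: a cosine similarity between the event representation and the candidate page, blended with the source's running share of relevant pages. The blending weight, and the normalization of the source ratio to [0, 1], are assumptions for the example rather than the dissertation's tuned values.

```python
# A minimal sketch of blending textual event similarity with webpage
# source importance into one relevance score.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def relevance(page_tokens, event_tokens, site_stats, site, beta=0.7):
    """site_stats[site] = (relevant_count, non_relevant_count),
    updated as the crawl labels pages from that website."""
    text_score = cosine(Counter(page_tokens), Counter(event_tokens))
    rel, non_rel = site_stats.get(site, (0, 0))
    # Normalized share of relevant pages; 0.5 when nothing is known yet.
    source_score = rel / (rel + non_rel) if rel + non_rel else 0.5
    return beta * text_score + (1 - beta) * source_score
```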
50

Étude de la mobilité quadrupède en position ventrale chez le nouveau-né et le nourrisson humain / Very early crawling: a study of quadrupedal mobility in the prone position in newborns and human infants

Forma, Vincent 28 November 2016 (has links)
Self-produced locomotion is a key stage in infant development, which usually begins with hands-and-knees crawling in the second semester of life. From birth, however, newborns are already capable of autonomous propulsion from a prone position. This precocious form of quadrupedalism remains largely unstudied, in part because most researchers consider these creeping movements a mere reflex, destined to dissipate as cortical development progresses. Under such an interpretation, this creeping "reflex" would have no link with mature bipedal walking, would not recruit the upper limbs, and would serve mainly as a mechanism by which newborns reach the maternal breast. Contrary to this point of view, a handful of authors have observed that these patterns of locomotion seem complex and might persist in some form until the age of 2-3 months in an adapted context. These observations invite us to consider the possibility that such primitive locomotion might be directly involved in the emergence of quadrupedal and bipedal gait. The present thesis examines the various characteristics (particularly kinematic) of this prone mobility, from birth to about six months of age. To this end, we describe the creation of an experimental tool, the CrawliSkate, that frees the newborn's arms and facilitates propulsion. We present three studies showing that neonatal prone mobility goes beyond a simple stereotyped reflex, involves coordination between the upper and lower limbs, and can be partially modified at birth at a supra-spinal level through visual stimulation. Lastly, we demonstrate that this pattern of locomotion persists, albeit with heavy modification, throughout the first semester of life.
