  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Developing a Python based web scraper : A study on the development of a web scraper for TimeEdit

Andersson, Pontus January 2021
In a world where more and more information is stored on the internet, it is hard for an ordinary user to keep up. Even when the information is available on a single website, that website may lack features or be hard to read. The idea of scraping websites, newspapers, or games for information is not new, and this thesis focuses on building a web scraper with an accompanying website where users can upload their schedule scraped from TimeEdit. The website then presents the scraped data in a visually appealing way. Once the systems are fully developed, they are evaluated to see whether the goals of the thesis have been met and whether the systems have improved the existing way of handling scheduling in TimeEdit for teachers and students. The summary then presents future research and work. / The concept of scraping the web is not new; however, with modern programming languages it is possible to build web scrapers that collect unstructured data and save it in a structured way. TimeEdit, a scheduling platform used by Mid Sweden University, has no feasible way to count how many hours have been scheduled in any given week for a specific course, student, or professor. The goal of this thesis is to build a Python-based web scraper that collects data from TimeEdit and saves it in a structured manner. Users can then upload the resulting text file to a dynamic website, where the data is extracted from the file and saved into a predetermined database, unique to that user. The user can then have this data presented in a fast, efficient, and user-friendly way. The platform is developed and evaluated, and the result is a good and fast way to scan a TimeEdit schedule and evaluate the extracted data. With the platform built, future work is recommended to make it a finished product ready for live use by all types of users.
12

Computational Methods for Analyzing Chemical Graphs and Biological Networks / 化学グラフと生体ネットワークに対する情報解析手法

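Zhao, Yang 24 March 2014
Kyoto University / 0048 / New system, course doctorate / Doctor of Informatics / Degree No. Kō 18405 / Informatics Doctorate No. 520 / 新制||情||92 (University Library) / 31263 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 阿久津 達也, Professor 山本 章博, Professor 永持 仁 / Falls under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM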
13

Methods for Analyzing Tree-Structured Data and their Applications to Computational Biology / 木構造データの解析手法とその計算生物学への応用

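Mori, Tomoya 24 September 2015
Kyoto University / 0048 / New system, course doctorate / Doctor of Informatics / Degree No. Kō 19336 / Informatics Doctorate No. 588 / 新制||情||103 (University Library) / 32338 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 阿久津 達也, Professor 山本 章博, Professor 岡部 寿男 / Falls under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM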
14

Algorithmic Approaches to Pattern Mining from Structured Data / 構造データからのパターン発見におけるアルゴリズム論的アプローチ

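Otaki, Keisuke 23 March 2016
The contents of Chapter 6 are based on work published in IPSJ Transactions on Mathematical Modeling and Its Applications, vol. 9(1), pp. 32-42, 2016. / Kyoto University / 0048 / New system, course doctorate / Doctor of Informatics / Degree No. Kō 19846 / Informatics Doctorate No. 597 / 新制||情||104 (University Library) / 32882 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 山本 章博, Professor 鹿島 久嗣, Professor 阿久津 達也 / Falls under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DGAM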
15

Adaptive Semi-structured Information Extraction

Arpteg, Anders January 2003
The number of domains and tasks where information extraction tools can be used needs to be increased. One way to reach this goal is to construct user-driven information extraction systems that novice users are able to adapt to new domains and tasks. To accomplish this, the systems need to become more intelligent and able to learn to extract information without the need for expert skills or time-consuming work from the user. The type of information extraction in focus for this thesis is semi-structural information extraction. The term semi-structural refers to documents that contain not only natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of not only the link structure but also the structural information within each such document, user-driven extraction systems with high performance can be built. The extraction process contains several steps where different types of techniques are used. Examples of such techniques are those that take advantage of structural, purely syntactic, linguistic, and semantic information. The first step, which is the focus of this thesis, is the navigation step, which takes advantage of the structural information. It is only one part of a complete extraction system, but it is an important part. The use of reinforcement learning algorithms for the navigation step can make the adaptation of the system to new tasks and domains more user-driven. The advantage of using reinforcement learning techniques is that the extraction agent can efficiently learn from its own experience without the need for intensive user interaction. An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments showed that the training of the navigation step and the approach of the system were promising.
However, additional components need to be included in the system before it becomes a fully-fledged user-driven system. / Report code: LiU-Tek-Lic-2002:73.
16

On the Structural Link Between Ontologies and Organised Data Sets

Marinache, Alicia January 2016
The proposed work focuses on articulating a mathematical framework to capture the structure of an ontology and relate it to organised data sets. In the discussed framework, the ontology structure captures the mereological relationships between concepts. It also uses other relationships relevant to the considered domain of application. The organised dataset component of the framework is represented using diagonal-free cylindric algebra. The proposed framework, called the domain-information structure, enables us to link concepts to data sets through a number of typed data operators. The new framework enhances concurrent reasoning on data for knowledge generation, which is essential for handling big data. We illustrate the advantage of the obtained framework by using it in generating new knowledge from an ontology and a given data set. / Thesis / Master of Applied Science (MASc)
17

A Framework for Automatic Ontology Generation from Autonomous Web Applications

Modica, Giovanni 13 December 2002
Ontologies capture the structure, relationships, semantics and other essential meta information of an application. This thesis describes a framework to automate application interoperability by using dynamically generated ontologies. We propose a set of techniques to extract ontologies from data accessible on the Web in the form of semi-structured HTML pages. Ontologies retrieved from similar applications are matched together to create a general ontology describing the application domain. Information retrieval and graph matching techniques are used to match and measure the usefulness of the ontologies created. Matching algorithms are combined to produce global ontologies based on local ontologies inherently present in Web applications. We present a system called OntoBuilder that allows users to drive the ontology creation process using a user-friendly and intuitive interface. We also present experiments for a well-known case study: car-rental applications. We achieve 90% accuracy on ontology extraction and 70% accuracy for ontology matching.
18

Pattern Recognition in Large Dimensional and Structured Datasets

Kurra, Goutham 11 March 2002
No description available.
19

Ensemble Learning Techniques for Structured and Unstructured Data

King, Michael Allen 01 April 2015
This research provides an integrated approach of applying innovative ensemble learning techniques that has the potential to increase the overall accuracy of classification models. Actual structured and unstructured data sets from industry are utilized during the research process, analysis and subsequent model evaluations. The first research section addresses the consumer demand forecasting and daily capacity management requirements of a nationally recognized alpine ski resort in the state of Utah, in the United States of America. A basic econometric model is developed and three classic predictive models evaluated the effectiveness. These predictive models were subsequently used as input for four ensemble modeling techniques. Ensemble learning techniques are shown to be effective. The second research section discusses the opportunities and challenges faced by a leading firm providing sponsored search marketing services. The goal for sponsored search marketing campaigns is to create advertising campaigns that better attract and motivate a target market to purchase. This research develops a method for classifying profitable campaigns and maximizing overall campaign portfolio profits. Four traditional classifiers are utilized, along with four ensemble learning techniques, to build classifier models to identify profitable pay-per-click campaigns. A MetaCost ensemble configuration, having the ability to integrate unequal classification cost, produced the highest campaign portfolio profit. The third research section addresses the management challenges of online consumer reviews encountered by service industries and addresses how these textual reviews can be used for service improvements. A service improvement framework is introduced that integrates traditional text mining techniques and second order feature derivation with ensemble learning techniques. 
The concept of GLOW and SMOKE words is introduced and is shown to be an objective text analytic source of service defects or service accolades. / Ph. D.
20

Structured Data Extraction from Unstructured Text

Kóša, Peter January 2013
Title: Structured Data Extraction from Unstructured Text
Author: Bc. Peter Kóša
Department: Department of Software Engineering
Supervisor: Mgr. Martin Nečaský, Ph.D., Department of Software Engineering
Abstract: In the last 20 years, there has been an ever-growing amount of information present on the Internet and in published texts. However, this information is often in a non-structured format, which causes various problems such as the inability to search efficiently in diverse collections of texts (medical reports, ads, etc.). To overcome these problems, we need efficient tools capable of automatically processing texts, extracting the important information, and storing the results in some form for later reuse. The purpose of this thesis is to compare existing solutions with one another and with our solution, which was created within the software project SemJob. The SemJob project is also introduced, so that the reader can learn about its inner structure and workings.
Keywords: structured data extraction, extraction rules, (semi)automatic wrapper induction
