Global ETD Search

61	Metody dolování relevantních dat z prostředí webu s využitím sociálních sítí / Datamining of Relevenat Information from WWW with Using Social Networks. Smolík, Jakub. January 2013. This thesis addresses problems related to searching for relevant data on the internet. It presents a possible solution in the form of an application that automatically extracts and aggregates data from the web based on input keywords, and then presents the results. For this purpose, the possibilities of automated extraction were studied and described for three chosen data types that are widely used to store data on the internet. The thesis also examines ways of mining data from social networks. Finally, it presents the planning, implementation, realization and testing of the resulting application, which can easily find and display the requested information and give the user easy access to it.
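The abstract does not name the three data types or the extraction technique, so the following is only a minimal Python sketch of the keyword-driven idea, assuming the pages have already been downloaded as HTML strings. The helper names (`_TextBlockParser`, `extract_relevant`) and the chosen set of block-level tags are hypothetical; only the standard library `html.parser` is used.

```python
from html.parser import HTMLParser


class _TextBlockParser(HTMLParser):
    """Collects the whitespace-normalized text of block-level elements."""

    BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "td"}  # hypothetical choice

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._depth = 0  # > 0 while inside a block element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:  # outermost block closed: flush its text
                text = " ".join("".join(self._buf).split())
                if text:
                    self.blocks.append(text)
                self._buf = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buf.append(data)


def extract_relevant(pages, keywords):
    """Return (source, block) pairs whose text mentions any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    results = []
    for source, html in pages.items():
        parser = _TextBlockParser()
        parser.feed(html)
        parser.close()
        for block in parser.blocks:
            lowered = block.lower()
            if any(k in lowered for k in wanted):
                results.append((source, block))
    return results
```

Nested block elements (for example a `<p>` inside an `<li>`) are merged into their outermost block; unclosed tags, which are common in real HTML, would need a more tolerant parser.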
62	Evaluation of web scraping methods : Different automation approaches regarding web scraping using desktop tools / Utvärdering av webbskrapningsmetoder : Olika automatiserings metoder kring webbskrapning med hjälp av skrivbordsverktyg. Oucif, Kadday. January 2016. A lot of information in different forms can be found on the semantic web and extracted through web scraping, and many techniques have emerged over time. The objective of this thesis is to evaluate different web scraping methods in order to develop an automated, reliably performing, easily implemented and robust extraction process. A set of parameters is defined to evaluate and compare existing techniques. A matrix of desktop tools is examined, and two are chosen for evaluation. The evaluation also covers learning how to set up the scraping process with so-called agents. A number of links are scraped using the presented techniques, both with and without executing the JavaScript of the web sources. Prototypes based on the chosen techniques are presented, with Content Grabber as the final solution. The result is a better understanding of the subject and a cost-effective extraction process combining different techniques and methods, in which a good understanding of the web sources' structure facilitates data collection. Finally, the results are presented and discussed with regard to the chosen parameters. Keywords: web scraping; data extraction; automation; semantic web; business intelligence; DOM parsing; HTML parsing; XPath. Subject: Engineering and Technology.
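Content Grabber is a desktop tool, and the following is not its API. It is a minimal Python sketch of the XPath-based DOM extraction listed among the keywords, using the limited XPath subset of the standard library's `xml.etree.ElementTree`, which requires well-formed XHTML (real-world HTML would need a tolerant parser such as `lxml.html`). The function name and field paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def scrape_items(xhtml, item_xpath, field_xpaths):
    """Extract one dict per repeated item, with one entry per named field.

    Uses ElementTree's limited XPath subset, so the input must be well-formed XHTML.
    """
    root = ET.fromstring(xhtml)
    rows = []
    for item in root.findall(item_xpath):
        row = {}
        for name, path in field_xpaths.items():
            node = item.find(path)  # relative XPath, evaluated per item
            # Missing fields become None instead of raising.
            row[name] = "".join(node.itertext()).strip() if node is not None else None
        rows.append(row)
    return rows
```

Because only markup present in the fetched document is parsed, values that a page inserts with JavaScript would not be found this way, which is why scraping with and without JavaScript execution gives different results.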
63	Video Game Network Analysis : A Study on Tooling Design / Nätverksanalys för Videospel : En Studie om Verktygsdesign. Eksi, Murat; Pihl, Markus. January 2020. Crackshell is an indie game studio based in Stockholm. It has released several iterations of a game called Hammerwatch, built with the studio's in-house game engine, and is still extending both Hammerwatch and the engine. Hammerwatch is a rogue-like multiplayer game played by up to four players in a single session over a peer-to-peer network topology. Hammerwatch has become significantly popular, and planned features have led the team to question whether its network utilization performs well and how it could be improved. Although the team implemented the networking part of Hammerwatch itself, it does not fully understand the underlying behavior of the network utilization and currently has no way to analyze it. This project aims to design and implement tooling for the studio's data analysis needs by identifying the network topology, data structures, extraction and storage, and by providing an environment in which network utilization is easy to analyze. To achieve this, an iterative design thinking approach was carried out together with Crackshell. Design decisions were made according to the constraints and purpose of the tooling, which were defined with Crackshell in workshops held as part of the design thinking approach. This strategy allowed a swift understanding of the problem, and Crackshell approved the resulting tooling as both helpful and easy to use. The data analysis tool was implemented using a local data extraction solution, MongoDB, and Jupyter Notebook in Python, together with extensions that further aided the analysis of the collected data. The data analysis proved a significant success: it revealed problems such as game events being sent unnecessarily often, stale data, caching opportunities, and potential data clustering issues in network packets. Crackshell was pleased with the new ability to inspect its network utilization in detail and has gone on to use the tooling for further analysis as Hammerwatch continues to be developed. Keywords: Video Games; Network Analysis; Data Analysis; Tooling Design; Design Thinking; Network Packets; Data Extraction; Database Design. Subject: Computer and Information Sciences.
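The thesis performed its analysis with MongoDB and Jupyter Notebook; the following is only a minimal plain-Python sketch of the kind of per-event summary that can expose events sent unnecessarily often or resent unchanged. It assumes each captured record is a `(timestamp_s, event_type, payload)` tuple; the record layout, event names and the 50 ms threshold are hypothetical.

```python
from collections import defaultdict
from statistics import median


def summarize_events(records, min_interval_s=0.05):
    """Summarize captured network events per event type.

    records: iterable of (timestamp_s, event_type, payload_bytes) tuples.
    min_interval_s: hypothetical threshold below which sends count as too frequent.
    """
    by_type = defaultdict(list)
    for ts, etype, payload in records:
        by_type[etype].append((ts, payload))

    report = {}
    for etype, sends in by_type.items():
        sends.sort(key=lambda s: s[0])  # order by timestamp
        gaps = [b[0] - a[0] for a, b in zip(sends, sends[1:])]
        # Consecutive sends with identical payloads hint at caching opportunities.
        repeats = sum(1 for a, b in zip(sends, sends[1:]) if a[1] == b[1])
        med_gap = median(gaps) if gaps else None
        report[etype] = {
            "count": len(sends),
            "bytes": sum(len(p) for _, p in sends),
            "median_gap_s": med_gap,
            "unchanged_resends": repeats,
            "too_frequent": med_gap is not None and med_gap < min_interval_s,
        }
    return report
```

The same aggregation maps naturally onto a MongoDB aggregation pipeline or a pandas `groupby` once the packet data is stored, which is closer to the setup the thesis describes.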
64	Space Weather Simulation Model Integration. Molin, Alice; Johnstone, Julia. January 2023. Space weather is the field of space science that studies how Earth's magnetosphere is influenced by the Sun. The Sun constantly emits hazardous radiation and plasma, which in some cases can affect or damage systems on Earth. Scientists are interested in studying this interaction, which makes visualizations of space weather data useful. OpenSpace is interactive software that visualizes the entire known universe using real-time data. OpenSpace supports a range of visualization methods and techniques; those relevant to this work are field lines and cut planes. GAMERA is a simulation model covering a wide range of situations in which plasma is subjected to magnetic fields; its simulations are based on curvilinear grids. This project focuses on integrating data from GAMERA into OpenSpace. OpenSpace already supports a variety of simulation models, but none that stores its data on curvilinear grids. A curvilinear grid can adapt to the specific shape and geometry of the data, allowing a more accurate data representation. The project aims to create a pipeline for reading data files from simulation runs and visualizing the data as field lines and cut planes. The files used in this project contain data suitable for volume and field line visualization. First, a reader was developed to extract and manage the desired data from the HDF5 files in which the simulation data is stored. The field line data is rendered with an existing OpenSpace component. Second, a slice operation was developed to extract cut planes from the volume data files; the planes are rendered by a cut-plane component also developed in this work. The work resulted in a pipeline that reads and manages GAMERA simulation data and visualizes it successfully. However, there is room for improvement in color rendering, robustness and the level of user interaction at runtime. The degree project was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University. Keywords: simulation model; OpenSpace; GAMERA; HDF5; NASA; space weather; magnetosphere; field lines; cut plane; curvilinear grids; data extraction; integration; visualization; hierarchical data format; textures; transfer function. Subject: Medical Engineering; Media and Communication Technology.
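The abstract does not describe how the slice operation works. As one possible approach, here is a minimal Python sketch assuming a structured curvilinear grid whose node coordinates `x`, `y`, `z` and scalar `values` are nested lists indexed `[i][j][k]` (reading these arrays from HDF5, for example with h5py, is omitted). It cuts the physical plane z = z0 by linear interpolation along each k-line where z - z0 changes sign; the function names are hypothetical and this is not the OpenSpace implementation.

```python
def _lerp_edge(field, i, j, k, t):
    """Linearly interpolate field between nodes (i, j, k) and (i, j, k + 1)."""
    a, b = field[i][j][k], field[i][j][k + 1]
    return a + t * (b - a)


def cut_plane_z(x, y, z, values, z0):
    """Return (x, y, value) points where the grid's k-lines cross the plane z = z0.

    x, y, z and values are nested lists indexed [i][j][k] on the same
    structured (curvilinear) grid.
    """
    points = []
    for i in range(len(z)):
        for j in range(len(z[i])):
            for k in range(len(z[i][j]) - 1):
                d0 = z[i][j][k] - z0
                d1 = z[i][j][k + 1] - z0
                # Half-open sign test (a node exactly on the plane counts as
                # "above"), so a crossing through a node is not reported twice.
                if (d0 < 0) == (d1 < 0):
                    continue
                t = d0 / (d0 - d1)  # fraction along the edge where z == z0
                points.append(tuple(_lerp_edge(f, i, j, k, t) for f in (x, y, values)))
    return points
```

The resulting scattered points would still need to be triangulated or resampled into a texture before rendering, and only crossings along the k direction are detected, which suits grids whose k-lines run across the chosen plane.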
65	Generic Data Harvester. Asp, William; Valck, Johannes. January 2022. This report describes the development of a generic article scraper intended to extract relevant information from an arbitrary web article. The extraction searches and examines the article's HTML using Python and XPath. The data to be extracted are the article's title, summary, publication date and body text. As there is no standard way in which websites, and news articles in particular, are built, the extraction must be adapted to each article structure and language. The resulting program is meant as a proof of concept showing that further development is possible. The host company, Acuminor, works with financial crime intelligence and collects information from articles and reports. To scale up data collection and minimize the maintenance of its scraping programs, the company needs a general article scraper. An open-source alternative called Newspaper exists, but since it is no longer maintained and is arguably not well designed, an in-house implementation could benefit the company. The program consists of a main class that imports extractor classes, each exposing an API for extracting its data. Each extractor is decoupled from the rest to keep the program as modular as possible. The title, summary and date extractors work similarly, looking for specific HTML tags carrying common attributes that most websites implement. Body text extraction builds a tree from the text on the page and then searches it for the node most likely to contain only the body text, using attributes such as the amount of text, depth and number of text nodes. The resulting program does not match the performance of Newspaper, but shows promising results for every part of the extraction. The body text extraction is very slow and often captures too much text, but it provides a good blueprint for further improvement at the company. Acuminor will be able to build on its own in-house article extractor tailored to its needs. Keywords: News Articles; Newspapers; Web crawler; Web site parsing; Optimization; Web robot; Web spider; Web data extraction; HTML; Scrapy. Subject: Computer and Information Sciences.
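The abstract names the node attributes used for body text extraction (amount of text, depth, number of text nodes) but not how they are weighted. The following minimal Python sketch builds a tree with the standard library `html.parser` and, as a hypothetical scoring rule, ranks each element lexicographically by the characters and text nodes found in itself and its direct children, preferring deeper nodes on ties. The class and function names are illustrative, not the thesis's API, and it uses an HTML parser rather than XPath.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # never closed
SKIP_TAGS = {"script", "style"}


class Node:
    def __init__(self, tag, depth):
        self.tag = tag
        self.depth = depth
        self.children = []
        self.texts = []  # direct (non-nested) text nodes


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root", 0)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag, len(self.stack))
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag, tolerating unclosed children.
        for pos in range(len(self.stack) - 1, 0, -1):
            if self.stack[pos].tag == tag:
                del self.stack[pos:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.stack[-1].tag not in SKIP_TAGS:
            self.stack[-1].texts.append(text)


def _iter_nodes(node):
    yield node
    for child in node.children:
        yield from _iter_nodes(child)


def _local_texts(node):
    # The node's own text nodes followed by those of its direct children.
    return node.texts + [t for c in node.children for t in c.texts]


def _score(node):
    texts = _local_texts(node)
    return (sum(len(t) for t in texts), len(texts), node.depth)


def extract_body_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = max(_iter_nodes(builder.root), key=_score)
    return " ".join(_local_texts(best))
```

Text inside inline elements of a paragraph (for example links) is not collected by this scoring, which is one of the refinements a production extractor would need.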
66	Metody sumarizace dokumentů na webu / Methods of Document Summarization on the Web. Belica, Michal. January 2013. This work deals with the automatic summarization of HTML documents, with Czech chosen as the language of the web documents. The project focuses on text summarization algorithms. It also covers document preprocessing for summarization and the conversion of text into a representation suitable for summarization algorithms. General text mining is briefly discussed, but the main focus is automatic document summarization. Two simple summarization algorithms are introduced first; the main attention is then paid to a more advanced algorithm based on latent semantic analysis (LSA). The result of the work is the design and implementation of a summarization module for Python. The final part evaluates the summaries generated by the implemented methods and compares them subjectively from the author's point of view.
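The abstract names latent semantic analysis but not the exact sentence-selection rule. The following is a minimal Python (NumPy) sketch of one simple LSA variant: build a term-by-sentence count matrix, take its singular value decomposition, and for each of the strongest latent topics pick the sentence with the largest absolute weight in the corresponding row of Vᵀ. The function name and the regex-based sentence splitting and tokenization are assumptions; a Czech pipeline would also need stop-word removal and stemming or lemmatization, which are omitted here.

```python
import re

import numpy as np


def lsa_summary(text, n_sentences=2):
    """Pick up to n_sentences sentences, one per strongest latent topic."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({w for words in tokenized for w in words})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix of raw term counts.
    A = np.zeros((len(vocab), len(sentences)))
    for j, words in enumerate(tokenized):
        for w in words:
            A[index[w], j] += 1

    # Rows of vt express latent topics over sentences, strongest first.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for topic in vt:
        if len(chosen) == n_sentences:
            break
        # Singular vector signs are arbitrary, so rank by magnitude.
        for j in np.argsort(-np.abs(topic)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return [sentences[j] for j in sorted(chosen)]
```

For example, `lsa_summary("Cats purr. Cats purr loudly. Stocks fell today.", 2)` selects one sentence from the cat topic and one from the stock topic and returns them in document order.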
67	Characteristics of the Specific Fuel Consumption for Jet Engines. Bensel, Artur. January 2018. The purpose of this project is (a) to evaluate the thrust specific fuel consumption (TSFC) of jet engines in cruise as a function of flight altitude, speed and thrust, and (b) to determine the optimum cruise speed for maximum range of jet airplanes based on the TSFC characteristics from (a). Regarding (a), a literature review shows different models for the influence of altitude and speed on TSFC. A simple model describing the influence of thrust on TSFC does not seem to exist in the literature. Here, openly available data was collected and evaluated. TSFC versus thrust follows the so-called bucket curve, with the lowest TSFC at the bucket point at a certain thrust setting. A new simple equation was devised to approximate the influence of thrust on TSFC. It was found that the influence of both thrust and altitude on TSFC is small and can in many cases be neglected in cruise conditions. However, TSFC is roughly a linear function of speed, which already follows from first principles. Regarding (b), it was found that the optimum flight speed for maximum range of jet airplanes taught in academia (1.316 times the minimum drag speed) is inaccurate, because its derivation rests on the unrealistic assumption that TSFC is constant with speed. When the influence of speed on both TSFC and drag is taken into account, the optimum flight speed is only about 1.05 to 1.11 times the minimum drag speed, depending on aircraft weight. The amount of actual engine data available to the project was extremely limited, so the results can only be as accurate as the input data. The results may also have only limited general validity, because only four jet engine types were analyzed. One original contribution of the project is the new simple polynomial function for estimating variations in TSFC from variations in thrust at constant speed and altitude. Classification: ddc:620; info:eu-repo/classification/ddc/629.13. Keywords: aviation; aircraft; flight mechanics; aircraft engines; Aeronautics; Airplanes; Airplanes--Performance; Airplanes--Turbojet engines; engine; turbofan; fuel consumption; SFC; TSFC; PSFC; Turbomatch; bucket curve; off-takes; cruise; altitude; speed; Mach; thrust; BPR; data extraction; optimization; range; Breguet.
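The 1.316 factor follows from the Breguet range equation: at a given weight, lift equals weight, so the specific range is proportional to V/(TSFC·D). With a parabolic drag polar, D = (D_min/2)(u² + u⁻²) where u = V/V_md, and constant TSFC puts the maximum at u = 3^(1/4) ≈ 1.316. The following minimal Python sketch maximizes this ratio numerically for a given TSFC(u) model, so the constant case can be compared with a TSFC that rises with speed. The function names are hypothetical, any linear coefficients passed in are illustrations rather than the thesis's engine data, and the sketch ignores compressibility effects on drag, so it does not reproduce the reported 1.05 to 1.11 range.

```python
import math


def range_factor(u, tsfc):
    """Proportional to V / (TSFC * D) at fixed weight, with u = V / V_md."""
    drag = 0.5 * (u**2 + u**-2)  # D / D_min for a parabolic drag polar
    return u / (tsfc(u) * drag)


def optimum_speed_ratio(tsfc, lo=0.5, hi=2.0, tol=1e-9):
    """Golden-section search for the u that maximizes range_factor on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if range_factor(c, tsfc) > range_factor(d, tsfc):
            b, d = d, c  # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2
```

For example, `optimum_speed_ratio(lambda u: 1.0)` returns approximately 1.316, `optimum_speed_ratio(lambda u: u)` (TSFC proportional to speed) returns approximately 1.0, the minimum drag speed, and a TSFC of the form a + b·u with a, b > 0 lands in between.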