  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Aggregating product reviews for the Chinese market

Wu, Yongliang January 2009 (has links)
As of December 2007, the number of Internet users in China had increased to 210 million people. The annual growth rate reached 53.3 percent in 2008, with the average number of Internet users increasing every day by 200,000 people. Currently, China's Internet population is slightly lower than the 215 million Internet users in the United States. [1] Despite the rapid growth of the Chinese economy in the global Internet market, China's e-commerce does not follow the traditional pattern of commerce, but instead has developed based on user demand. This growth has extended into every area of the Internet. In the West, expert product reviews have been shown to be an important element in a user's purchase decision: the higher the quality of the product reviews that customers receive, the more products they buy from on-line shops. As the number of products and options increases, Chinese customers need impersonal, impartial, and detailed product reviews. This thesis focuses on on-line product reviews and how they affect Chinese customers' purchase decisions. E-commerce is a complex system. As a typical model of e-commerce, we examine a Business to Consumer (B2C) on-line retail site and consider a number of factors, including some seemingly subtle factors that may influence a customer's eventual decision to shop on a website. Specifically, this thesis project examines aggregated product reviews from different on-line sources by analyzing some existing western companies. Following this, the thesis demonstrates how to aggregate product reviews for an e-business website. During this thesis project we found that existing data mining techniques made it straightforward to collect reviews. These reviews were stored in a database, and web applications can query this database to provide a user with a set of relevant product reviews.
One of the important issues, just as with search engines, is providing the relevant product reviews and determining the order in which they should be presented. In our work we selected the reviews by matching the product (although in some cases there are ambiguities concerning whether two products are actually identical) and ordered the matching reviews by date, with the most recent reviews presented first. Some of the open questions that remain for the future are: (1) improving the matching, to avoid the ambiguity concerning whether two reviews are about the same product, and (2) determining whether the availability of product reviews actually affects a Chinese user's decision to purchase a product.
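The matching-and-ordering policy described in this record can be sketched in a few lines. The review records and product names below are invented for illustration, and exact string matching stands in for the harder product-matching step the thesis leaves open.

```python
from datetime import date

# Hypothetical review records; in the thesis these were collected by
# data-mining existing review sources and stored in a database.
reviews = [
    {"product": "Nokia N95", "date": date(2008, 5, 1), "text": "Good camera."},
    {"product": "Nokia N95", "date": date(2009, 1, 15), "text": "Battery fades."},
    {"product": "iPhone 3G", "date": date(2008, 7, 11), "text": "Smooth UI."},
]

def relevant_reviews(product_name, reviews):
    """Select reviews whose product matches, ordered newest first.

    Exact string matching is a simplification: as the thesis notes,
    deciding whether two product names refer to the same product is
    itself an open problem.
    """
    matched = [r for r in reviews if r["product"] == product_name]
    return sorted(matched, key=lambda r: r["date"], reverse=True)
```

A web application would run such a query per product page and render the returned list directly.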
22

Critical Issues in the Processing of cDNA Microarray Images

Jouenne, Vincent Y. 13 July 2001 (has links)
Microarray technology enables simultaneous monitoring of gene expression levels for thousands of genes. While this technology is now recognized as a powerful and cost-effective tool for large-scale analysis, the many systematic sources of experimental variation introduce inherent errors into the extracted data. Data is gathered by processing scanned images of microarray slides; robust image processing is therefore particularly important and has a large impact on downstream analysis. The processing of the scanned images can be subdivided into three phases: gridding, segmentation and data extraction. To measure gene expression levels, the processing of cDNA microarray images must overcome a large set of issues in these three phases, which motivates this study. This study presents automatic gridding methods and compares their performance. Two segmentation techniques already in use, the Seeded Region Growing algorithm and the Mann-Whitney test, are examined and their limitations presented. Finally, we study the data extraction method used in MicroArray Suite (MS), a microarray analysis package, via synthetic images, and explain its intricacies. / Master of Science
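The Seeded Region Growing idea mentioned in this record can be illustrated with a toy version: grow a spot region outward from a seed pixel, accepting 4-connected neighbours whose intensity stays close to the seed's. This sketch, with an invented acceptance threshold, conveys only the general idea, not the exact algorithm evaluated in the thesis.

```python
from collections import deque

def seeded_region_growing(image, seed, threshold):
    """Grow a region from `seed` over 4-connected pixels whose
    intensity differs from the seed intensity by at most `threshold`.

    `image` is a list of rows of intensities; `seed` is (row, col).
    """
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_val) <= threshold:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# A bright 2x2 spot on a dark background, as on a microarray slide.
img = [
    [10, 10, 10, 10],
    [10, 90, 95, 10],
    [10, 92, 88, 10],
    [10, 10, 10, 10],
]
spot = seeded_region_growing(img, (1, 1), threshold=20)
```

Real implementations grow against a running region mean rather than the raw seed value, which is one source of the limitations the study examines.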
23

Exploring Data Extraction and Relation Identification Using Machine Learning : Utilizing Machine-Learning Techniques to Extract Relevant Information from Climate Reports

Berger, William, Fooladi, Alex, Lindgren, Markus, Messo, Michel, Rosengren, Jonas, Rådmann, Lukas January 2023 (has links)
Ensuring the accessibility of data from Swedish municipal climate reports is necessary for examining climate work in Sweden. Manual data extraction is time-consuming and prone to errors, necessitating automation of the process. This project presents machine-learning techniques that can be used to extract data and information from Swedish municipal climate plans, to improve the accessibility of climate data. The proposed solution involves recognizing entities in plain text and extracting predefined relations between them, using Named Entity Recognition and Relation Extraction respectively. Due to the lack of annotated climate datasets in Swedish, the result of the project is a functioning prototype in the medical domain. Nevertheless, the problem remained the same: how to effectively perform data extraction from reports using machine-learning techniques. The presented prototype demonstrates the potential of automating data extraction from reports. These findings imply that the system could be adapted to handle climate reports when a sufficient dataset becomes available.
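The entity-then-relation pipeline shape this record describes can be sketched with a deliberately crude rule-based stand-in. The lexicon, regular expression and relation label below are all invented; the thesis trained machine-learning models for both steps rather than using rules.

```python
import re

# Toy lexicons standing in for trained NER models.
MUNICIPALITIES = {"Uppsala", "Lund"}
TARGET_PATTERN = re.compile(r"(\d+)\s*%")

def extract_entities(text):
    """Return (label, value) entity tuples found in `text`."""
    entities = []
    for name in MUNICIPALITIES:
        if name in text:
            entities.append(("MUNICIPALITY", name))
    for m in TARGET_PATTERN.finditer(text):
        entities.append(("EMISSION_TARGET", m.group(1) + "%"))
    return entities

def extract_relations(entities):
    """Pair each municipality with each target found in the same text,
    a crude stand-in for a learned Relation Extraction model."""
    munis = [v for label, v in entities if label == "MUNICIPALITY"]
    targets = [v for label, v in entities if label == "EMISSION_TARGET"]
    return [(m, "has_target", t) for m in munis for t in targets]

text = "Uppsala aims to cut emissions by 50 % before 2030."
relations = extract_relations(extract_entities(text))
```

A learned model replaces both functions but produces output of the same shape, which is why swapping the medical-domain models for climate-domain ones is plausible once a dataset exists.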
24

Ontology-Based Extraction of RDF Data from the World Wide Web

Chartrand, Timothy Adam 05 March 2003 (has links) (PDF)
The simplicity and proliferation of the World Wide Web (WWW) has taken the availability of information to an unprecedented level. The next generation of the Web, the Semantic Web, seeks to make information more usable by machines by introducing a more rigorous structure based on ontologies. One hindrance to the Semantic Web is the lack of existing semantically marked-up data. Until there is a critical mass of Semantic Web data, few people will develop and use Semantic Web applications. This project helps promote the Semantic Web by providing content. We apply existing information-extraction techniques, in particular the BYU ontology-based data-extraction system, to extract information from the WWW based on a Semantic Web ontology and produce Semantic Web data with respect to that ontology. As an example of how the generated Semantic Web data can be used, we provide an application to browse the extracted data and the source documents together; in this sense, the extracted data is superimposed over, or serves as an index into, the source documents. Our experiments with ontologies in four application domains show that our approach can indeed extract Semantic Web data from the WWW with precision and recall similar to that achieved by the underlying information-extraction system, and can make that data accessible to Semantic Web applications.
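Producing "Semantic Web data with respect to an ontology" ultimately means emitting triples. A minimal sketch of that serialization step, assuming an invented base URI, record fields and property names (the BYU system maps extracted values to ontology-defined properties, not to these):

```python
def to_ntriples(record, base="http://example.org/"):
    """Serialize one extracted record as N-Triples lines.

    Every non-id field becomes one (subject, predicate, literal)
    triple. `base` and the field names are purely illustrative.
    """
    subject = f"<{base}{record['id']}>"
    lines = []
    for prop, value in record.items():
        if prop == "id":
            continue
        lines.append(f'{subject} <{base}{prop}> "{value}" .')
    return lines

triples = to_ntriples({"id": "car1", "make": "Toyota", "year": "1999"})
```

Each emitted line is a valid N-Triples statement, so the output can be loaded directly into any RDF store for use by Semantic Web applications.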
25

Schema Matching and Data Extraction over HTML Tables

Tao, Cui 16 September 2003 (has links) (PDF)
Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem for the case of mostly structured data in the form of HTML tables, based on document-independent extraction ontologies. The solution entails elements of table location and table understanding, data integration, and wrapper creation. Table location and understanding allows us to locate the table of interest, recognize attributes and values, pair attributes with values, and form records. Data-integration techniques allow us to match source records with a target schema. Ontologically specified wrappers allow us to extract data from source records into a target schema. Experimental results show that we can successfully map data of interest from source HTML tables with unknown structure to a given target database schema. We can thus "directly" query source data with unknown structure through a known target schema.
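The pairing of attributes with values and the mapping of source records into a target schema, as described in this record, can be sketched as follows, assuming the table of interest has already been located and parsed into rows. The attribute names and the schema mapping are invented for illustration.

```python
# A located HTML table, already parsed into rows: the first row holds
# the source attributes, the remaining rows hold values.
table = [
    ["Model", "Cost", "Yr"],
    ["Civic", "8500", "1999"],
    ["Accord", "12000", "2001"],
]

# Schema matching result: source attribute -> target schema attribute.
mapping = {"Model": "make_model", "Cost": "price", "Yr": "year"}

def table_to_records(table, mapping):
    """Pair attributes with values row by row, renaming source
    attributes into the target schema to form target records."""
    header = table[0]
    records = []
    for row in table[1:]:
        records.append({mapping[attr]: val for attr, val in zip(header, row)})
    return records

records = table_to_records(table, mapping)
```

The hard parts the thesis addresses — locating the table, recognizing which row holds attributes, and discovering `mapping` automatically via extraction ontologies — are taken as given here.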
26

Multi agent system for web database processing, on data extraction from online social networks

Abdulrahman, Ruqayya January 2012 (has links)
In recent years, there has been a flood of continuously changing information from a variety of web resources such as web databases, web sites, web services and programs. Online Social Networks (OSNs) represent such a field, where huge amounts of information are posted online over time. Because OSNs offer a productive source of qualitative and quantitative personal information, researchers from various disciplines have contributed methods for extracting data from OSNs. However, there is limited research which addresses extracting data automatically. To the best of the author's knowledge, there is no research which focuses on tracking the real-time changes of information retrieved from OSN profiles over time, and this motivated the present work. This thesis presents different approaches for automated Data Extraction (DE) from OSNs: crawler, parser, Multi Agent System (MAS) and Application Programming Interface (API). Initially, a parser was implemented as a centralized system to traverse the OSN graph and extract each profile's attributes and list of friends from Myspace, the top OSN at that time, by parsing the Myspace profiles and extracting the relevant tokens from the parsed HTML source files. A Breadth First Search (BFS) algorithm was used to travel across the generated OSN friendship graph in order to select the next profile for parsing. The approach was implemented and tested on two types of friends: top friends and all friends. In the top-friends case, 500 seed profiles were visited and 298 public profiles were parsed, yielding 2,197 top friends' profiles and 2,747 friendship edges; in the all-friends case, 250 public profiles were parsed to extract 10,196 friends' profiles and 17,223 friendship edges. This approach has two main limitations. First, the system is designed as a centralized system that controls and retrieves the information of each user's profile just once.
This means that the extraction process will stop if the system fails to process one of the profiles, whether the seed profile (the first profile to be crawled) or one of its friends. To overcome this problem, an Online Social Network Retrieval System (OSNRS) is proposed to decentralize the DE process through the use of a MAS. The novelty of OSNRS is its ability to monitor profiles continuously over time. The second challenge is that the parser had to be modified to cope with changes in the profiles' structure. To overcome this problem, the proposed OSNRS is improved through the use of an API tool that enables OSNRS agents to obtain the required fields of an OSN profile despite modifications in the representation of the profile's source web pages. The experimental work shows that using an API and a MAS simplifies and speeds up the process of tracking a profile's history. It also helps security personnel, parents, guardians, social workers and marketers to understand the dynamic behaviour of OSN users. This thesis proposes solutions for web database processing and data extraction from OSNs through the use of a parser and a MAS, and discusses their limitations and improvements.
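The BFS traversal this record describes for selecting the next profile to parse can be sketched as follows, with a toy in-memory friendship graph standing in for live Myspace profiles.

```python
from collections import deque

# A toy friendship graph standing in for live Myspace profiles:
# profile id -> list of friend ids (names invented).
friends = {
    "seed": ["a", "b"],
    "a": ["seed", "c"],
    "b": ["c"],
    "c": [],
}

def bfs_crawl(seed, friends, limit):
    """Visit profiles breadth-first from `seed`, returning them in the
    order a crawler would parse them, up to `limit` profiles."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < limit:
        profile = queue.popleft()
        visited.append(profile)  # the real system parsed the HTML here
        for friend in friends.get(profile, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return visited

order = bfs_crawl("seed", friends, limit=10)
```

The centralized version's weakness is visible in the sketch: one loop owns the whole queue, so a failure inside it halts the crawl — which is what the MAS-based OSNRS decentralizes.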
27

A distributed data extraction and visualisation service for wireless sensor networks

Hammoudeh, Mohammad January 2009 (has links)
With the increase in applications of wireless sensor networks, data extraction and visualisation have become key issues in developing and operating these networks. Wireless sensor networks typically gather data at a discrete number of locations. By bestowing upon the network the ability to predict inter-node values, it is proposed that it will become possible to build applications that are unaware of the concrete reality of sparse data. The aim of this thesis is to develop a service for maximising information return from large-scale wireless sensor networks. This aim is achieved through the development of a distributed information extraction and visualisation service called the mapping service. In the distributed mapping service, groups of network nodes cooperate to produce local maps which are cached and merged at a sink node, producing a map of the global network. Such a service greatly simplifies the production of higher-level, information-rich representations suitable for informing other network services and for delivering visualisations of field information. The proposed distributed mapping service utilises a blend of both inductive and deductive models to successfully map sense data and the universal physical principles. It exploits the special characteristics of the application domain to render visualisations in a map format that are a precise reflection of the concrete reality. The service is suitable for visualising an arbitrary number of sense modalities, and it can visualise multiple independent types of sense data, overcoming the limitations of generating visualisations from a single sense modality. Furthermore, the proposed mapping service responds to changes in environmental conditions that may impact visualisation performance by continuously updating the application-domain model in a distributed manner.
Finally, a new distributed self-adaptation algorithm, the Virtual Congress Algorithm, based on the concept of a virtual congress, is proposed with the goal of saving more power and generating more accurate data visualisations.
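The "predict inter-node values" idea can be illustrated with inverse-distance weighting, one simple interpolation scheme; the thesis's blend of inductive and deductive models is more elaborate, so this is only a sketch with invented node positions and readings.

```python
def idw_estimate(x, y, readings, power=2):
    """Estimate the sense value at (x, y) from discrete node readings
    by inverse-distance weighting: nearer nodes contribute more.

    `readings` is a list of (nx, ny, value) tuples from sensor nodes.
    """
    num = den = 0.0
    for nx, ny, value in readings:
        d2 = (x - nx) ** 2 + (y - ny) ** 2
        if d2 == 0:
            return value  # query point sits exactly on a node
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den

# Two nodes on a line; estimate the field value midway between them.
nodes = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
mid = idw_estimate(1.0, 0.0, nodes)
```

In the distributed mapping service, each node group would run such an estimator over its local readings to build its local map before merging at the sink.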
28

Duomenų gavimas iš daugialypių šaltinių ir jų struktūrizavimas / Data Mining from Multiple Sources and Structurization

Barauskas, Antanas 19 June 2014 (has links)
The aim of this work is to create an ETL (Extract-Transform-Load) system for extracting data from different types of data sources, properly transforming the extracted data, and loading the transformed data into the selected place of storage. The main techniques of data extraction and the most popular ETL tools available today have been analyzed. An architectural solution based on cloud computing, as well as a prototype of the system for data extraction from multiple sources and data structurization in a unified format, have been created. Unlike traditional data-storing systems, the proposed system extracts data only when it is needed for analysis. The graph database employed for data storage makes it possible to store not only the data but also information about the relations between entities. Structure: 48 pages, 19 figures, 10 tables and 30 references.
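The Extract-Transform-Load flow this record describes, including the "extract only when needed" behaviour (sketched here with lazy generators) and a stand-in for the storage back end, might look like this; all record fields and source shapes are invented, and a plain list stands in for the graph database.

```python
def extract(sources):
    """Extract raw records lazily from heterogeneous sources (here:
    already in memory; the real system pulled from live sources only
    when the data was needed for analysis)."""
    for source in sources:
        yield from source

def transform(record):
    """Normalize a raw record into one unified format."""
    return {
        "name": record.get("name") or record.get("title", ""),
        "kind": record.get("kind", "unknown"),
    }

def load(records, store):
    """Load transformed records into the chosen store (a list here;
    the thesis used a graph database to also keep relation data)."""
    store.extend(records)

store = []
sources = [[{"name": "alpha"}], [{"title": "beta", "kind": "doc"}]]
load((transform(r) for r in extract(sources)), store)
```

Because `extract` and the `transform` step are generators, nothing is read or converted until `load` consumes the stream, mirroring the on-demand extraction the work proposes.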
29

Structured Data Extraction from Unstructured Text

Kóša, Peter January 2013 (has links)
Title: Structured Data Extraction from Unstructured Text Author: Bc. Peter Kóša Department: Department of Software Engineering Supervisor: Mgr. Martin Nečaský, Ph.D., Department of Software Engineering Abstract: Over the last 20 years, there has been an ever-growing amount of information available on the Internet and in published texts. However, this information is often in an unstructured format, which causes various problems, such as the inability to search efficiently in diverse collections of texts (medical reports, ads, etc.). To overcome these problems, we need efficient tools capable of automatically processing texts, extracting the important information, and storing the results in some form for later reuse. The purpose of this thesis is to evaluate existing solutions and to compare them with our own, created within the scope of the software project SemJob. The SemJob project is introduced so that the reader can learn about its inner structure and workings. Keywords: structured data extraction, extraction rules, (semi)automatic wrapper induction
30

Extrakcia štruktúrovaných dát z neštruktúrovaného textu / Structured Data Extraction from Unstructured Text

Kóša, Peter January 2013 (has links)
Title: Structured Data Extraction from Unstructured Text Author: Bc. Peter Kóša Department: Department of Software Engineering Supervisor: Mgr. Martin Nečaský, Ph.D., Department of Software Engineering Abstract: Over the last 20 years, there has been an ever-growing amount of information available on the Internet and in published texts. However, this information is often in an unstructured format, which causes various problems, such as the inability to search efficiently in diverse collections of texts (medical reports, ads, etc.). To overcome these problems, we need efficient tools capable of automatically processing texts, extracting the important information, and storing the results in some form for later reuse. The purpose of this thesis is to evaluate existing solutions and to compare them with our own, created within the scope of the software project SemJob. The SemJob project is introduced so that the reader can learn about its inner structure and workings. Keywords: structured data extraction, extraction rules, ontologies, (semi)automatic wrapper induction
