241 |
Junções por similaridade com expressões complexas em ambientes distribuídos / Set similarity joins with complex expressions on distributed platforms. Oliveira, Diego Junior do Carmo. 31 August 2018 (has links)
Submitted by Liliane Ferreira (ljuvencia30@gmail.com) on 2018-10-01T13:06:03Z
No. of bitstreams: 2
Dissertação - Diego Junior do Carmo Oliveira - 2018.pdf: 2678764 bytes, checksum: c32f645ce8abd8a764bec1993d41337b (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)
Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2018-10-01T14:48:43Z (GMT). Made available in DSpace on 2018-10-01T14:48:43Z (GMT).
Previous issue date: 2018-08-31 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / A recurrent problem that degrades the quality of the information in databases is the presence
of duplicates, i.e., multiple representations of the same real-world entity. Despite being
computationally expensive, the use of similarity operations is fundamental to identify
duplicates. Furthermore, real-world data is typically composed of different attributes and each
attribute represents a distinct type of information. The application of complex similarity
expressions is important in this context because they allow considering the importance of
each attribute in the similarity evaluation. However, due to the large amounts of data present in
Big Data applications, it has become crucial to perform these operations in parallel and
distributed processing environments. In order to solve such problems of great relevance to
organizations, this work proposes a novel strategy to identify duplicates in textual data by
using similarity joins with complex expressions in a distributed environment.
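The kind of complex similarity expression the work describes, a weighted combination of per-attribute set similarities, can be sketched as follows. The attribute names, weights, and naive nested-loop join are invented for illustration; the thesis's actual distributed algorithm is not reproduced here:

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def complex_similarity(rec1, rec2, weights):
    """Weighted combination of per-attribute Jaccard similarities."""
    return sum(w * jaccard(set(rec1[attr].split()), set(rec2[attr].split()))
               for attr, w in weights.items())

def similarity_join(left, right, weights, threshold):
    """Naive nested-loop join: all pairs whose combined score meets the threshold."""
    return [(i, j) for i, r1 in enumerate(left)
                   for j, r2 in enumerate(right)
                   if complex_similarity(r1, r2, weights) >= threshold]
```

Weighting lets a near-identical title outweigh a slightly different author spelling, which is exactly why complex expressions matter for deduplication across heterogeneous attributes.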
|
242 |
Cloud Based System Integration : System Integration between Salesforce.com and Web-based ERP System using Apache Camel / Molnbaserad systemintegration : Systemintegration mellan Salesforce.com och ett webb-baserat ERP-system med Apache Camel. Söder, Mikael; Johansson, Henrik. January 2017 (has links)
In an era of technological growth, cloud computing is one of the hottest topics on the market. This, along with the overall increased use of digital systems, requires solid integration options. Redpill Linpro recognizes this and has developed a cloud-based Integration Platform as a Service (IPaaS) solution called Connectivity Engine. New techniques like this can, however, seem very abstract to a customer, something a demo application can help make concrete. To address this, we have developed a web-based Enterprise Resource Planning (ERP) system as well as an integration application that connects the ERP system with Salesforce.com in a bidirectional integration. With Connectivity Engine, this can be hosted in the cloud and be easily accessible. The project has been a success for Redpill Linpro as well as for the authors: a solid way to demonstrate the abilities of Connectivity Engine has been developed, along with descriptive documentation for any sales representative assigned to pitch the platform.
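The core problem such an integration application solves, keeping records linked across two systems so that repeated syncs update rather than duplicate, can be sketched in one direction like this. The record shapes, id scheme, and function names are invented for illustration; they are not Salesforce's or Connectivity Engine's API:

```python
def sync_records(source, target, id_map):
    """One direction of a bidirectional sync: create or update target records.

    id_map remembers which source id maps to which target id, so running
    the sync again updates the linked record instead of duplicating it.
    """
    for src_id, record in source.items():
        if src_id in id_map:
            target[id_map[src_id]].update(record)   # already linked: update in place
        else:
            tgt_id = f"t-{src_id}"                  # hypothetical target id scheme
            target[tgt_id] = dict(record)
            id_map[src_id] = tgt_id
    return target
```

The reverse direction would mirror this with the roles swapped; the id map is what makes the integration bidirectional without creating duplicates on either side.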
|
243 |
Developing and evaluating recommender systems. Fadaeian, Vahid. January 2015 (has links)
In recent years, the web has experienced tremendous growth in both users and content. As a result, information overload has become one of the main discussion topics. The aim has always been to find the most effective way to help users who find it increasingly difficult to locate accurate information at the right time. Recommender systems were developed to address this need by helping users find relevant information among huge amounts of data, and they have now become a ubiquitous feature of many websites. A recommender system guides users in their decisions by predicting their preferences while they search, shop, or browse, based on their past preferences as well as the preferences of other users. Recommender systems are now widely used in almost all professional e-commerce websites, selling or offering a wide variety of items from movies and music to clothes and food. This thesis presents and explores different recommender system algorithms, such as user-user and item-item collaborative filtering, using the open-source library Apache Mahout. Algorithms are developed to evaluate the performance of these collaborative filtering approaches; they are compared and their performance measured in detail using evaluation metrics such as RMSE and MAE and similarity measures such as Pearson correlation and log-likelihood.
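A minimal sketch of item-item collaborative filtering and the error metrics mentioned above, in plain Python rather than Mahout's API; the rating layout (item mapped to a dict of user ratings) is an assumption made for illustration:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts user -> rating)."""
    common = set(u) & set(v)
    num = sum(u[k] * v[k] for k in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Item-item CF: similarity-weighted average of the user's other ratings."""
    num = den = 0.0
    for other, users in ratings.items():
        if other == item or user not in users:
            continue
        sim = cosine(ratings[item], users)
        num += sim * users[user]
        den += abs(sim)
    return num / den if den else 0.0

def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)
```

RMSE penalises large errors more heavily than MAE, which is why evaluations such as the one in this thesis usually report both.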
|
244 |
Semantic Web mechanisms in Cloud Environment. Haddadi Makhsous, Saeed. January 2014 (has links)
Virtual Private Ontology Server (VPOS) is middleware focused on ontologies (semantic models). VPOS offers its users a smart way to access the relevant part of an ontology depending on their context. The user context can be an expertise level, a level of experience, or a job position in a hierarchical structure. Instead of keeping numerous ontologies associated with different user contexts, VPOS keeps only one ontology but offers sub-ontologies to users based on their context. VPOS also supports reasoning to infer new consequences from the assertions stated in the ontology. These consequences are visible only to those contexts that have access to enough assertions inside the ontology to deduce them. There are some issues with the current implementation of VPOS. The application loads the ontology into the random-access memory of the local machine, which causes a scalability problem when the ontology size exceeds the memory space. Also, since each user of VPOS holds her own instance of the application, maintainability issues arise, such as inconsistency between the ontologies of different users and a waste of computational resources. This thesis project aims to find practical solutions to these issues, first by upgrading the architecture of the application with a new framework to address scalability, and then by moving to the cloud to address maintainability. The final product of this thesis project is Cloud-VPOS, an application that implements semantic web mechanisms and runs on a cloud platform. Cloud-VPOS is where the semantic web meets cloud computing, employing semantic web mechanisms as cloud services. / ebbits project (Enabling business-based Internet of Things and Services)
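The context mechanism can be illustrated with a toy model: each assertion carries a minimum context level, and reasoning only sees what that level exposes, so a consequence becomes visible exactly when all assertions needed to derive it are. The level scheme and the subclass-chaining rule below are simplifications invented here, not VPOS's actual model:

```python
def sub_ontology(assertions, user_level):
    """Return the assertions visible at a given context level.

    assertions maps a (subject, relation, object) triple to the minimum
    context level required to see it.
    """
    return {fact for fact, min_level in assertions.items() if user_level >= min_level}

def infer(visible):
    """Toy reasoning step: close the visible triples under subclass chaining."""
    derived = set(visible)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(derived):
            for (c, r2, d) in list(derived):
                if r1 == r2 == "subClassOf" and b == c:
                    new = (a, "subClassOf", d)
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived
```

A junior context that cannot see the middle assertion of a chain also cannot see the chain's consequence, which mirrors the visibility behaviour described above.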
|
245 |
Univerzita a její areály – informace a navigace (mobilní aplikace) / The university and its campuses – information and navigation (mobile application). Cupák, Michal. January 2017 (has links)
This thesis deals with the development of multiplatform mobile applications. Part of the thesis is the design and implementation of a mobile application that facilitates the orientation of people in a large university environment. The application is designed for the Android and iOS mobile operating systems and is implemented with the Ionic framework. The application provides basic information about the individual buildings of the university, displays the buildings on a map, and allows the user to navigate to these buildings.
|
246 |
Paralelní a distribuované zpracování rozsáhlých textových dat / Parallel and Distributed Processing of Large Textual Data. Matoušek, Martin. January 2017 (has links)
This master's thesis deals with task scheduling and resource allocation in parallel and distributed environments. The thesis describes the design and implementation of an application that executes data processing with optimal resource usage.
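Scheduling tasks onto workers with good resource usage can be approximated with the classic longest-processing-time heuristic, sketched below. The task and worker model is a simplification invented for illustration, not the thesis's implementation:

```python
def schedule(tasks, workers):
    """Greedy longest-processing-time scheduling onto identical workers.

    Sort tasks by duration descending and always assign the next task to
    the least-loaded worker; a standard heuristic for minimising the
    overall finish time (makespan).
    """
    loads = {w: 0 for w in workers}
    plan = {w: [] for w in workers}
    for name, duration in sorted(tasks, key=lambda t: -t[1]):
        w = min(loads, key=loads.get)   # least-loaded worker so far
        plan[w].append(name)
        loads[w] += duration
    return plan, max(loads.values())
```

Placing the longest tasks first leaves the short ones to smooth out imbalances at the end, which is why this greedy ordering performs well in practice.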
|
247 |
Elektronická informační tabule LCD / Electronic notice board LCD. Bureš, Michal. January 2019 (has links)
This diploma thesis surveys existing electronic information systems. The purpose of this survey is to draw inspiration from what professional systems offer their users and from what users expect of such systems. Based on the acquired knowledge, the thesis then designs its own system, which can serve as another alternative to the professional solutions on the market. After the system design, the thesis covers, in chronological order, the tools selected for the actual implementation of the assigned task, an electronic information board. Both the hardware and software parts of the task are described, and these form the majority of this thesis. Practical results are presented at the end of the work.
|
248 |
Optimalizace čtení dat z distribuované databáze / Optimization of data reading from a distributed database. Kozlovský, Jiří. January 2019 (has links)
This thesis focuses on optimizing data reads from the distributed NoSQL database Apache HBase with regard to the desired data granularity. The assignment originated as a product request from the Reklama division of Seznam.cz, a.s. (the Sklik.cz cost center) to improve user experience by letting advertiser web application users filter aggregated statistical data when viewing entity performance history.
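Reading aggregated statistics at a chosen granularity is commonly optimised in HBase by materialising one row per granularity, so that a query touches a single narrow scan instead of many daily cells. The roll-up below sketches that idea; the row-key layout is invented for illustration and is not Sklik's actual schema:

```python
from collections import defaultdict

def preaggregate(events, granularities):
    """Roll raw daily stats up into rows keyed entity|granularity|bucket.

    events are (entity, day, clicks) with day as "YYYY-MM-DD"; each
    granularity maps to the prefix width of the day string that defines
    its bucket ("YYYY", "YYYY-MM", or the full day).
    """
    rows = defaultdict(int)
    for entity, day, clicks in events:
        for gran, width in granularities.items():
            bucket = day[:width]
            rows[f"{entity}|{gran}|{bucket}"] += clicks
    return dict(rows)
```

The extra write-time work buys cheap reads: a monthly or yearly view is a single row lookup rather than an aggregation over hundreds of daily entries.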
|
249 |
Sharing the love : a generic socket API for Hadoop MapReduce. Yee, Adam J. 01 January 2011 (has links)
Hadoop is a popular software framework written in Java that performs data-intensive distributed computations on a cluster. It includes Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS has known scalability limitations due to its single NameNode, which holds the entire file system namespace in RAM on one computer. Therefore, the NameNode can only store a limited number of file names, depending on RAM capacity. The solution for further scalability is distributing the namespace, similar to how file data is divided into chunks and stored across cluster nodes. Hadoop has an abstract file system API which is extended to integrate HDFS, but which has also been extended to integrate the file systems S3, CloudStore, Ceph and PVFS. Ceph and PVFS already distribute the namespace, while others such as Lustre are making the conversion. Google announced in 2009 that they had been implementing a distributed namespace for the Google File System to achieve greater scalability. The Generic Hadoop API is created from Hadoop's abstract file system API. It speaks a simple communication protocol that can integrate any file system which supports TCP sockets. By providing a file-system-agnostic API, future work with other file systems might provide ways of surpassing Hadoop's current scalability limitations. Furthermore, the new API eliminates the need to customize Hadoop's Java implementation, and instead moves the implementation to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding the details of Hadoop's internal operation. The API is tested on a homogeneous, four-node cluster with OrangeFS. Initial OrangeFS I/O throughputs compared to HDFS are 67% of HDFS' write throughput and 74% of HDFS' read throughput. But, compared with an alternate method of integrating with OrangeFS (a POSIX kernel interface), write and read throughput are increased by 23% and 7%, respectively.
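A "simple communication protocol" over TCP sockets typically needs message framing so the receiver knows where each request ends. The length-prefixed framing below illustrates the idea; the opcodes and layout are invented for illustration and do not reproduce the thesis's actual wire format:

```python
def encode_request(op, path, *args):
    """Frame a file-system request as one length-prefixed ASCII message.

    A 4-byte big-endian length header is followed by the space-separated
    operation, path, and arguments.
    """
    body = " ".join([op, path, *map(str, args)]).encode()
    return len(body).to_bytes(4, "big") + body

def decode_request(data):
    """Inverse of encode_request; returns (op, path, args, trailing_bytes)."""
    size = int.from_bytes(data[:4], "big")
    fields = data[4:4 + size].decode().split(" ")
    return fields[0], fields[1], fields[2:], data[4 + size:]
```

Because the header states the body length exactly, a file system server can read requests off a TCP stream without any knowledge of Hadoop's internals, which is the point of moving the integration to the file-system side.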
|
250 |
Obrazová dokumentace v nemocničním informačním systému / Image documentation in hospital information system. Rášo, Tomáš. January 2008 (has links)
This thesis describes the hospital information system (HIS) CLINICOM, focusing on the department of radiology. It analyses the problems linked with transporting patient image documentation over the Internet and provides instructions on how to create a remote access interface for the department of radiology. It also gives a detailed guide describing the installation and component configuration of HIS CLINICOM.
|