  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
971

Full-fledged semantic indexing and querying model designed for seamless integration in legacy RDBMS

Tekli, Joe, Chbeir, Richard, Traina, Agma J.M., Traina, Caetano, Yetongnon, Kokou, Ibanez, Carlos Raymundo, Al Assad, Marc, Kallas, Christian 09 1900 (has links)
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher. / In the past decade, there has been an increasing need for semantic-aware data search and indexing in textual (structured and NoSQL) databases, as full-text search systems have become available to non-expert users who have no knowledge of the data being searched and often formulate query keywords different from those used by the authors to index the relevant documents, thus producing noisy and sometimes irrelevant results. In this paper, we address the problem of semantic-aware querying and provide a general framework for modeling and processing semantic-based keyword queries in textual databases, i.e., considering the lexical and semantic similarities/disparities when matching user query terms against data index terms. To do so, we design and construct a semantic-aware inverted index structure called SemIndex, which extends the standard inverted index into a tightly coupled inverted index graph combining two main resources: a semantic network and a standard inverted index over a collection of textual data. We then provide a general keyword query model with specially tailored query processing algorithms built on top of SemIndex, in order to produce semantic-aware results, allowing the user to choose the results' semantic coverage and expressiveness based on her needs. To investigate the practicality and effectiveness of SemIndex, we discuss its physical design within a standard commercial RDBMS, which allows its graph structure to be created, stored, and queried, enabling the system to scale up easily and handle large volumes of data. We have conducted a battery of experiments to test the performance of SemIndex, evaluating its construction time, storage size, query processing time, and result quality in comparison with a legacy inverted index. Results highlight both the effectiveness and scalability of our approach.
/ This study is partly funded by the National Council for Scientific Research - Lebanon (CNRS-L), the Lebanese American University (LAU), and the Research Support Foundation of the State of São Paulo (FAPESP). Appendix: SemIndex Weighting Scheme. We propose a set of weighting functions to assign weight scores to SemIndex entries, including index nodes, index edges, data nodes, and data edges. The weighting functions are used to select and rank semantically relevant results w.r.t. the user's query (cf. SemIndex query processing in Section 5). Other weighting functions could later be added to cater to the index designer's needs. / Peer reviewed
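The core idea behind SemIndex — expanding inverted-index lookups through a semantic network so that a query matches semantically related terms, not only lexically identical ones — can be illustrated with a toy sketch. This is not the authors' implementation (SemIndex is a tightly coupled index graph stored inside an RDBMS); the `SEMANTIC_NET` map and the class and method names below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical mini "semantic network": term -> semantically related terms.
SEMANTIC_NET = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "film": {"movie"},
    "movie": {"film"},
}

class SemIndexSketch:
    """Inverted index whose lookups are expanded through a semantic network."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def query(self, keyword, expand=True):
        # Lexical matches first, then documents indexed under related terms.
        hits = set(self.index.get(keyword, set()))
        if expand:
            for related in SEMANTIC_NET.get(keyword, set()):
                hits |= self.index.get(related, set())
        return hits

idx = SemIndexSketch()
idx.add(1, "a red car parked outside")
idx.add(2, "an old automobile show")
print(sorted(idx.query("car")))                 # → [1, 2]
print(sorted(idx.query("car", expand=False)))   # → [1]
```

Choosing `expand=False` corresponds to a plain lexical inverted-index lookup, which is the baseline the paper compares against.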
972

File integrity checking

Motara, Yusuf Moosa January 2006 (has links)
This thesis looks at file execution as an attack vector that leads to the execution of unauthorized code. File integrity checking is examined as a means of removing this attack vector, and the design, implementation, and evaluation of a best-of-breed file integrity checker for the Linux operating system is undertaken. We conclude that the resultant file integrity checker does succeed in removing file execution as an attack vector, does so at a computational cost that is negligible, and displays innovative and useful features that are not currently found in any other Linux file integrity checker.
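The detection half of a file integrity checker of this kind can be sketched in a few lines: hash every file of interest against a trusted baseline and report mismatches. This is a generic sketch, not the thesis author's tool; an enforcing checker would additionally hook file execution (e.g., in the kernel) and refuse to run files whose digests do not match.

```python
import hashlib
import os

def file_digest(path, algo="sha256", chunk=65536):
    """Stream the file in chunks so large binaries need not fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_baseline(paths):
    """Record a trusted digest for every file we may later execute."""
    return {p: file_digest(p) for p in paths}

def verify(baseline):
    """Return the files whose current digest no longer matches the baseline."""
    tampered = []
    for path, digest in baseline.items():
        if not os.path.exists(path) or file_digest(path) != digest:
            tampered.append(path)
    return tampered
```

In practice the baseline itself must be protected (signed or stored offline), since an attacker who can modify executables can usually modify an unprotected digest database too.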
973

[en] DATA MINING WITH ROUGH SETS TECHNIQUES / [pt] MINERAÇÃO DE DADOS COM TÉCNICAS DE ROUGH SETS

DANTE JOSE ALEXANDRE CID 13 October 2005 (has links)
[pt] Esta dissertação investiga a utilização de Rough Sets no processo de descoberta de conhecimento em Bancos de Dados (KDD - Knowledge Discovery in Databases). O objetivo do trabalho foi avaliar o desempenho da técnica de Rough Sets na tarefa de Classificação de Dados. A Classificação é a tarefa da fase de Mineração de Dados que consiste na descoberta de regras de decisão, ou regras de inferência, que melhor representem um grupo de registros do banco de dados. O trabalho consistiu de cinco etapas principais: estudo sobre o processo de KDD; estudo sobre as técnicas de Rough Sets aplicadas à mineração de dados; análise de ferramentas de mineração de dados do mercado; evolução do projeto Bramining; e a realização de alguns estudos de caso para avaliar o Bramining. O estudo sobre o processo de KDD abrangeu todas as suas fases: transformação, limpeza, seleção, mineração de dados e pós-processamento. O resultado obtido serviu de base para o aprimoramento do projeto Bramining. O estudo sobre as técnicas de Rough Sets envolveu a pesquisa de seus conceitos e sua aplicabilidade no contexto de KDD. A teoria de Rough Sets foi apresentada por Zdzislaw Pawlak no início dos anos 80 como uma abordagem matemática para a análise de dados vagos e imprecisos. Este estudo permitiu sua aplicação na ferramenta de mineração de dados desenvolvida. A análise de ferramentas de mineração de dados do mercado abrangeu o estudo e testes de aplicativos baseados em diferentes técnicas, enriquecendo a base de comparação utilizada na avaliação da pesquisa. A evolução do projeto Bramining consistiu no aprimoramento do ambiente KDD desenvolvido em estudos anteriores, passando a incluir a técnica de Rough Sets em seu escopo. Os estudos de caso foram conduzidos paralelamente com o uso do Bramining e de outras ferramentas existentes, para efeito de comparação.
Os índices apresentados pelo Bramining nos estudos de caso foram considerados, de forma geral, equivalentes aos do software comercial, tendo ambos obtido regras de boa qualidade na maioria dos casos. O Bramining, entretanto, mostrou-se mais completo para o processo de KDD, graças às diversas opções nele disponíveis para a preparação dos dados antes da fase de mineração. Os resultados obtidos comprovaram, através da aplicação desenvolvida, a adequação dos conceitos de Rough Sets à tarefa de classificação de dados. Alguns pontos frágeis da técnica foram identificados, como a necessidade de um mecanismo de apoio para a redução de atributos e a dificuldade em trabalhar com atributos de domínio contínuo. Porém, ao se inserir a técnica em um ambiente mais completo de KDD, como o Bramining, estas deficiências foram sanadas. As opções de preparação da base que o Bramining disponibiliza ao usuário, em particular a redução e a codificação de atributos, permitem deixar os dados em estado adequado à aplicação de Rough Sets. A mineração de dados é uma questão bastante relevante nos dias atuais, e muitos métodos têm sido propostos para as diversas tarefas que dizem respeito a esta questão. A teoria de Rough Sets não mostrou significativas vantagens ou desvantagens em relação a outras técnicas já consagradas, mas foi de grande valia comprovar que há caminhos alternativos para o processo de descoberta de conhecimento. / [en] This dissertation investigates the application of Rough Sets to the process of KDD - Knowledge Discovery in Databases. The main goal of the work was to evaluate the performance of Rough Sets techniques in solving the classification problem. Classification is a task of the Data Mining step in the KDD process that performs the discovery of decision rules, or inference rules, that best represent a group of records in a database.
The work had five major steps: study of the KDD process; study of Rough Sets techniques applied to data mining; evaluation of existing data mining tools; development of the Bramining project; and execution of some case studies to evaluate Bramining. The study of the KDD process included all its steps: transformation, cleaning, selection, data mining and post-processing. The results obtained served as a basis for the enhancement of Bramining. The study of Rough Sets techniques included research on the theory's concepts and its applicability in the KDD context. Rough Sets theory was introduced by Zdzislaw Pawlak in the early 1980s as a mathematical approach to the analysis of vague and uncertain data. This research made possible the implementation of the technique within the environment of the developed tool. The analysis of existing data mining tools included studying and testing software based on different techniques, enriching the background used in the evaluation of the research. The evolution of the Bramining project consisted of the enhancement of the KDD environment developed in previous works, including the addition of Rough Sets techniques. The case studies were performed simultaneously with Bramining and a commercial mining tool, for comparison purposes. The quality of the knowledge generated by Bramining was considered equivalent to the results of the commercial tool, both providing good decision rules in most cases. Nevertheless, Bramining proved to be better adapted to the complete KDD process, thanks to its many features for preparing data for the data mining step. The results achieved through the developed application proved the suitability of Rough Sets concepts for the data classification task. Some weaknesses of the technique were identified, like the need for a prior attribute reduction and the inability to deal with continuous-domain data. However, once the technique was inserted into a more complete KDD environment like the Bramining project, those weaknesses ceased to exist. The data preparation features available in the Bramining environment, particularly the attribute reduction and codification options, enable the user to have the database well adapted to the use of Rough Sets algorithms. Data mining is a very relevant issue nowadays and many methods have been proposed for the different tasks involved in it. Compared to other well-established techniques, Rough Sets theory did not bring significant advantages or disadvantages to the process, but it has been of great value to show that there are alternative ways to knowledge discovery.
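The rough-set machinery the dissertation applies to classification rests on lower and upper approximations of a target class under an indiscernibility relation over condition attributes. A minimal sketch follows, with an invented toy decision table; Bramining's actual implementation is not reproduced here.

```python
from collections import defaultdict

def partition(objects, attrs):
    """Group objects into indiscernibility classes over the given attributes."""
    blocks = defaultdict(set)
    for name, row in objects.items():
        key = tuple(row[a] for a in attrs)
        blocks[key].add(name)
    return list(blocks.values())

def approximations(objects, attrs, target):
    """Lower approximation: blocks entirely inside the target class.
    Upper approximation: blocks that overlap the target class at all."""
    lower, upper = set(), set()
    for block in partition(objects, attrs):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Toy decision table (hypothetical data): o1, o2, o4 are indiscernible.
table = {
    "o1": {"a": 1, "b": 0}, "o2": {"a": 1, "b": 0},
    "o3": {"a": 0, "b": 1}, "o4": {"a": 1, "b": 0},
}
low, up = approximations(table, ["a", "b"], target={"o1", "o2", "o3"})
print(sorted(low), sorted(up))  # → ['o3'] ['o1', 'o2', 'o3', 'o4']
```

The boundary region `up - low` (here `{o1, o2, o4}`) is exactly the set of objects the available attributes cannot classify crisply — the situation rough-set classifiers turn into possible (rather than certain) decision rules.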
974

Estudo dos constituintes químicos dos óleos voláteis de espécies de Eupatorium nativas do Rio Grande do Sul e construção de banco de dados de lactonas sesquiterpênicas / Study of chemical constituents of essential oils from Eupatorium species native to Rio Grande do Sul State and construction of a sesquiterpene lactone database

Souza, Tiago Juliano Tasso de January 2014 (has links)
O gênero Eupatorium L. (Asteraceae) apresenta muitas espécies comumente encontradas no Rio Grande do Sul e se caracteriza pela presença de óleos voláteis e outros derivados terpenoídicos entre seus metabólitos secundários. Neste trabalho, o óleo volátil de oito espécies de Eupatorium nativas do estado foi extraído por hidrodestilação em aparelho tipo Clevenger e sua composição química foi avaliada por cromatografia gasosa acoplada a espectrometria de massas (CG/EM). As espécies analisadas foram: E. inulifolium Kunth, E. polystachyum DC, E. picturatum Malme, E. tremulum Hook. & Arn, E. ivifolium L., E. laevigatum Lam., E. casarettoi (B.L.Rob.) Steyerm., E. itatiayense Hieron. e E. gaudichaudianum DC. Os óleos voláteis de partes aéreas de E. tremulum foram analisados antes, durante e após a floração, aplicando-se análises multivariadas para identificar compostos com variação significativa em cada estágio de desenvolvimento da espécie. Os extratos diclorometano das mesmas espécies foram avaliados na busca de lactonas sesquiterpênicas, utilizando como metodologia a mensuração da absorção característica de lactonas no espectro infravermelho e a aplicação de métodos de extração específicos para essa classe de compostos. Foi realizado o fracionamento dos óleos voláteis em cromatografia em coluna aberta (CC), cromatografia flash (CC flash) e cromatografia líquida de média pressão (MPLC), com monitoramento das frações por CG/EM, para o isolamento de compostos não identificados apenas por seu índice de retenção linear e espectro de massas. Utilizando essa abordagem, foi isolado e identificado por seus dados espectrais de RMN um furanossesquiterpeno, 3-oxoverboccidentafurano, descrito pela primeira vez em Eupatorium. Também foi realizado o fracionamento de extratos diclorometano (E. casarettoi e E. inulifolium) em CC flash e MPLC, com monitoramento das frações por CG/EM e cromatografia líquida de alta eficiência acoplada a detector de arranjo de diodos (HPLC-DAD), na tentativa de isolamento de lactonas sesquiterpênicas. Um composto foi isolado e sua completa elucidação estrutural está em andamento. Considerando as dificuldades encontradas no processo de elucidação estrutural da molécula isolada do extrato de E. inulifolium, mesmo com a abundância de dados de ressonância magnética nuclear (RMN) de 13C disponíveis para consulta, e que não foi encontrada uma base digital dedicada a lactonas sesquiterpênicas reunindo esses dados em uma plataforma facilmente acessível e utilizável, foi realizada a transposição de uma base de dados da literatura para um formato digital amigável (baseada em Excel®), com a visualização e comparação de dados facilitadas graças à utilização de formas gráficas em lugar das formas tabulares de apresentação dos espectros comuns na literatura. / The genus Eupatorium L. (Asteraceae) contains several species that are common in Rio Grande do Sul State and is characterized by essential oils and other terpenic derivatives among its secondary metabolites. In this work, the essential oils of eight Eupatorium species native to the State were extracted by hydrodistillation in a Clevenger-type apparatus and their chemical composition was analysed by gas chromatography coupled to mass spectrometry (GC/MS). The following species were studied: E. inulifolium Kunth, E. polystachyum DC, E. picturatum Malme, E. tremulum Hook. & Arn, E. ivifolium L., E. laevigatum Lam., E. casarettoi (B.L.Rob.) Steyerm., E. itatiayense Hieron. and E. gaudichaudianum DC. Volatile oils from aerial parts of E. tremulum were analysed before, during and after blooming, and multivariate analyses were employed to identify compounds showing significant variation between developmental stages.
Dichloromethane extracts of the same species were evaluated for sesquiterpene lactones by measuring the typical infrared absorption of lactones and applying extraction methods directed at this class of compounds. The essential oils were fractionated by open-column chromatography (CC), flash chromatography and MPLC, with fractions monitored by GC/MS, for the isolation of compounds whose linear retention index and mass spectrum were not sufficient for identification. Using this approach, a furanosesquiterpene, 3-oxo-verboccidentafuran, described for the first time in Eupatorium, was isolated and identified on the basis of NMR spectral data. Dichloromethane extracts (E. casarettoi and E. inulifolium) were also fractionated by flash chromatography and MPLC, with fractions monitored by GC/MS and HPLC-DAD, aiming at the isolation of sesquiterpene lactones. One compound was isolated and its structural elucidation is ongoing. Considering the difficulties faced in the structural elucidation of the compound isolated from the E. inulifolium extract, even with plenty of 13C NMR data available for consultation, and considering that no digital database dedicated to sesquiterpene lactones was found that gathers these data in an easily accessible and user-friendly platform, we transposed a bibliographic database into a friendlier digital (Excel®-based) format, with data visualization and comparison enhanced by the use of graphical rather than the tabular spectral presentation usually found in the literature.
975

Data Warehouse na prática : fundamentos e implantação / Data warehouse in practice: foundations and implementation

Ferreira, Rafael Gastão Coimbra January 2002 (has links)
Embora o conceito de Data Warehouse (doravante abreviado DW), em suas várias formas, continue atraindo interesse, muitos projetos de DW não estão gerando os benefícios esperados e muitos estão provando ser excessivamente caros de desenvolver e manter. O presente trabalho visa organizar os conceitos de DW através de uma revisão bibliográfica, discutindo seu real benefício e também como perceber este benefício a um custo que seja aceitável ao empreendimento. Em particular, são analisadas metodologias que servirão de embasamento para a proposta de uma metodologia de projeto de DW, que será aplicada a um estudo de caso real para a Cia Zaffari, levando em conta critérios que são encontrados atualmente no desenvolvimento de um Data Warehouse, um subconjunto dos quais será tratado no trabalho de dissertação. / Although the concept of the Data Warehouse (DW), in its various forms, continues to attract interest, many DW projects are not generating the expected benefits and many are proving too expensive to develop and maintain. This work organizes the concepts of DW through a literature review, discussing their real benefit and how to realize this benefit at a cost acceptable to the company. In particular, methodologies are analysed to serve as a foundation for proposing a DW design methodology, which is then applied to a real case study for Cia Zaffari, taking into account criteria currently found in data warehouse development, a subset of which is treated in the dissertation.
977

Predicting Minimum Control Speed on the Ground (VMCG) and Minimum Control Airspeed (VMCA) of Engine Inoperative Flight Using Aerodynamic Database and Propulsion Database Generators

January 2016 (has links)
abstract: There are many computer-aided engineering tools and software packages used by aerospace engineers to design and predict specific parameters of an airplane. These tools help a design engineer predict and calculate parameters such as lift, drag, pitching moment, takeoff range, maximum takeoff weight, maximum flight range and much more. However, there are very limited ways to predict and calculate the minimum control speeds of an airplane in engine-inoperative flight. There are simple solutions as well as complicated ones, yet there is neither a standard technique nor consistency throughout the aerospace industry. To further complicate this subject, airplane designers have the option of using an Automatic Thrust Control System (ATCS), which directly alters the minimum control speeds of an airplane. This work addresses the issue with a tool used to predict and calculate the Minimum Control Speed on the Ground (VMCG) as well as the Minimum Control Airspeed (VMCA) of any existing or design-stage airplane. From simple line art of an airplane, a program called VORLAX is used to generate an aerodynamic database from which the stability derivatives of the airplane are calculated. Using another program called Numerical Propulsion System Simulation (NPSS), a propulsion database is generated and combined with the aerodynamic database to calculate both VMCG and VMCA. The tool was tested using two airplanes, the Airbus A320 and the Lockheed Martin C130J-30 Super Hercules; the A320 does not use an ATCS, whereas the C130J-30 does. The tool properly calculated and matched known values of VMCG and VMCA for both airplanes, which means it would be able to predict the VMCG and VMCA of an airplane in the preliminary stages of design.
This would allow design engineers to use an Automatic Thrust Control System (ATCS) as part of an airplane's design and still be able to predict its VMCG and VMCA. / Dissertation/Thesis / Masters Thesis Aerospace Engineering 2016
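As a rough illustration of what such a tool computes, VMCA can be approximated to first order by the speed at which full rudder just balances the yawing moment of the asymmetric (operating) engine. The sketch below is a one-degree-of-freedom simplification with invented, notional numbers; the certification definition (FAR/CS 25.149) and the thesis tool also account for bank angle, sideslip, and control-force limits.

```python
import math

def vmca_sketch(thrust_n, y_engine_m, rho, wing_area_m2, span_m,
                cn_delta_r, delta_r_max_rad):
    """One-degree-of-freedom yawing-moment balance:
        T * y_e = 0.5 * rho * V^2 * S * b * Cn_dr * dr_max
    solved for V. Signs and secondary effects (bank, sideslip) omitted."""
    return math.sqrt(2.0 * thrust_n * y_engine_m /
                     (rho * wing_area_m2 * span_m * cn_delta_r * delta_r_max_rad))

# Notional (invented) numbers loosely sized for a twin-engine transport:
v = vmca_sketch(thrust_n=120e3, y_engine_m=5.8, rho=1.225,
                wing_area_m2=122.6, span_m=34.1,
                cn_delta_r=0.08, delta_r_max_rad=math.radians(30))
print(round(v, 1), "m/s")
```

Note the scaling this balance implies: doubling the operating engine's thrust raises the balancing speed by a factor of √2, which is why minimum control speeds are so sensitive to engine growth during design.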
978

Semantic Keyword Search on Large-Scale Semi-Structured Data

January 2016 (has links)
abstract: Keyword search provides a simple and user-friendly mechanism for information search, and has become increasingly popular for accessing structured or semi-structured data. However, there are two open issues in keyword search over semi-structured data that are not yet well addressed by existing work. First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency rather than search quality and may fail to deliver high-quality results from a semantic perspective. The majority of existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data and in the user query. There are also studies that define results to be subtrees or subgraphs that contain all query keywords but are not necessarily "minimal"; however, such result construction suffers from the same problem of semantic misalignment between data and user query. This work studies the semantics of how to define results that capture users' search intention, and then the generation of search-intention-aware results. Second, most existing research is incapable of handling large-scale structured data. As data volume has seen rapid growth in recent years, the problem of how to efficiently process keyword queries on large-scale structured data becomes important. MapReduce is widely acknowledged as an effective programming model for processing big data. For keyword query processing on a data graph, graph algorithms that efficiently return query results consistent with users' search intention are first proposed; these algorithms are then migrated to MapReduce to support big data. For keyword query processing on a schema graph, a keyword query is first transformed into multiple SQL queries, which are then run on the structured data.
Therefore it is crucial to find the optimal way to execute a SQL query using MapReduce, minimizing the processing time. In this work, a system called SOSQL is developed which generates the optimal MapReduce execution plan for a SQL query Q in O(n^2) time, where n is the number of input tables of Q. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
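The first step of the schema-graph approach — fanning a keyword query out into candidate SQL queries over the columns that might contain each keyword — can be sketched as follows. The schema map and query shape are invented for illustration; real systems would use parameterized predicates rather than inlined strings, and SOSQL's plan generation is far more involved than this.

```python
def keyword_to_sql(keywords, schema):
    """For each keyword, emit one candidate SQL query per (table, column)
    that might contain it -- a simplified schema-graph translation.
    NOTE: string interpolation is for illustration only; production code
    must use bound parameters to avoid SQL injection."""
    queries = []
    for kw in keywords:
        for table, columns in schema.items():
            for col in columns:
                queries.append(
                    f"SELECT * FROM {table} WHERE {col} LIKE '%{kw}%'"
                )
    return queries

# Hypothetical two-table schema:
schema = {"paper": ["title", "abstract"], "author": ["name"]}
for q in keyword_to_sql(["database"], schema):
    print(q)
```

Each generated query is an independent unit of work, which is what makes the subsequent MapReduce execution-planning problem (choosing how to run and combine them) the performance-critical step.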
979

Tool for querying the National Household Travel Survey data

Rathore, Akash January 1900 (has links)
Master of Science / Department of Computer Science / Doina Caragea / The goal of the project is to create a database for storing the National Household Travel Survey (NHTS) data, and a user interface to query the database. Currently, the survey data is stored in Excel-compatible CSV files, which makes it hard to perform complex analyses over the data. Analyses of interest to the transportation community include comparing trips made by urban households to those made by rural households, finding average trip time by ethnicity, the total travel time of a particular household, the vehicle preferred by a specific household, average time spent per shopping trip, etc. The tool designed for querying the NHTS database is a Python-based web application: Django is used as the web framework and PostgreSQL as the back-end database. The user interface consists of various drop-down lists, text boxes, buttons and other components that facilitate querying the database and presenting the results in formats that allow easy interpretation. The FusionCharts Django-Wrapper and FusionCharts jQuery plugin are used to visualize the data in chart form. A codebook for the NHTS dataset is also linked for reference. The tool allows the user to get a deeper understanding of the data, not only by plotting it as line charts, bar charts and two-column graphs, but also by providing query results in CSV format for further analysis.
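The kind of aggregate query the tool's interface builds behind its drop-downs can be sketched against a simplified stand-in for the NHTS schema (using SQLite here rather than the project's PostgreSQL back end; the table and column names are invented):

```python
import sqlite3

# Hypothetical, heavily simplified trips table standing in for NHTS data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (household_id INTEGER, urban INTEGER, minutes REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)",
                 [(1, 1, 30.0), (1, 1, 20.0), (2, 0, 50.0), (3, 0, 40.0)])

# Average trip time for urban vs rural households -- one of the
# comparisons the abstract lists as being of interest.
for urban, avg in conn.execute(
        "SELECT urban, AVG(minutes) FROM trips GROUP BY urban ORDER BY urban"):
    print("urban" if urban else "rural", avg)  # → rural 45.0, urban 25.0
```

In the actual tool the same GROUP BY/aggregate pattern would be issued through the Django ORM against PostgreSQL, with the grouping column chosen from the UI.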
980

Roteamento de consultas em banco de dados peer-to-peer utilizando colônias de formigas e ontologias / Query routing in peer-to-peer databases using ant colonies and ontologies

Costa, Leandro Rincon [UNESP] 02 August 2009 (has links) (PDF)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Sistemas baseados em redes peer-to-peer começaram a se popularizar nos anos 90 e, desde então, grandes avanços e novas aplicações têm sido desenvolvidos aproveitando as características deste tipo de rede de computadores. Inicialmente, tais redes eram utilizadas apenas em aplicações simples como o compartilhamento de arquivos; hoje, porém, encontram-se em aplicações com grau de complexidade cada vez maior. Dentre estes sistemas mais recentes, destaca-se o compartilhamento de informações armazenadas em bancos de dados, um segmento em franco desenvolvimento. Em bancos de dados peer-to-peer, cria-se uma base de conhecimento rica e amplamente distribuída, baseada no compartilhamento de informações semanticamente relacionadas, porém sintaticamente heterogêneas. Um dos desafios desta categoria de aplicações é garantir uma forma eficiente para a busca de informações sem comprometer a autonomia de cada nó e a flexibilidade da rede. Neste trabalho explora-se este desafio e apresenta-se uma proposta de suporte às buscas por meio da otimização dos caminhos, buscando reduzir o número de mensagens enviadas na rede sem afetar significativamente o número de respostas obtidas por consulta. Para tal tarefa propõe-se uma estratégia baseada em conceitos do algoritmo de colônia de formigas e na classificação das informações utilizando ontologias. Com isso foi possível adicionar o suporte semântico como facilidade na execução do processo de busca em bancos de dados peer-to-peer, além de reduzir o tráfego de mensagens e permitir inclusive que mais resultados sejam alcançados sem comprometer o desempenho da rede.
/ In the 90s, peer-to-peer systems became more popular and, since then, major advances and new applications have been developed based on the features of this kind of computer network. Initially they were used only in simple applications such as file sharing, but they have since been employed in increasingly complex applications. Among these novel systems, the sharing of information stored in databases stands out as a rapidly developing segment. In a peer-to-peer database, a rich and widely distributed knowledge base is created, based on the sharing of semantically related but syntactically heterogeneous information. One of the challenges of such applications is to ensure an efficient way to search for information without jeopardizing either the autonomy of individual nodes or the flexibility of the network. The work herein explores this challenge and proposes support for searches through path optimization, seeking to reduce the number of messages sent over the network without significantly affecting the number of answers obtained per query. To this end, it proposes a strategy based on ant colony algorithm concepts and on information classification using ontologies. This way, it has been possible to add semantic support to ease the search process in peer-to-peer databases, while reducing message traffic and even allowing more results to be reached without compromising network performance.
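The ant-colony element of such a proposal can be sketched as a pheromone table per peer: the probability of forwarding a query to a neighbor follows its pheromone level, which evaporates over time and is reinforced along paths that return relevant answers. The class below is an invented illustration of that feedback loop, not the dissertation's system (which additionally classifies content with ontologies).

```python
import random

class AntRouterSketch:
    """Pheromone table guiding which neighbor peer to forward a query to."""

    def __init__(self, neighbors, evaporation=0.1):
        self.pheromone = {n: 1.0 for n in neighbors}
        self.evaporation = evaporation

    def choose(self, rng=random):
        """Roulette-wheel selection biased toward high-pheromone neighbors."""
        total = sum(self.pheromone.values())
        r = rng.uniform(0, total)
        for peer, tau in self.pheromone.items():
            r -= tau
            if r <= 0:
                return peer
        return peer  # numerical fallback

    def reinforce(self, peer, answers):
        """Evaporate everywhere, then deposit pheromone on the neighbor
        through which relevant answers actually came back."""
        for p in self.pheromone:
            self.pheromone[p] *= (1 - self.evaporation)
        self.pheromone[peer] += answers

router = AntRouterSketch(["A", "B", "C"])
for _ in range(5):
    router.reinforce("B", 2)  # peer B keeps returning good answers
print(max(router.pheromone, key=router.pheromone.get))  # → B
```

Over time queries concentrate on productive paths while evaporation lets the network forget stale routes — which is how the message count drops without a central index and without sacrificing node autonomy.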
