Global ETD Search

591	GoPubMed: Ontology-based literature search for the life sciences / GoPubMed: ontologie-basierte Literatursuche für die Lebenswissenschaften Doms, Andreas 20 January 2009 Background: Most of our biomedical knowledge is accessible only through texts. The biomedical literature grows exponentially, and PubMed comprises over 18,000,000 literature abstracts. Recently, much effort has been put into creating biomedical ontologies that capture biomedical facts. Exploiting ontologies to explore the scientific literature is a new area of research. Motivation: When people search, they have questions in mind. Answering questions in a domain requires knowledge of the terminology of that domain. Classical search engines do not provide background knowledge for the presentation of search results. Ontology-annotated structured databases allow for data mining. The hypothesis is that ontology-annotated literature databases allow for text mining. The central problem is to associate scientific publications with ontological concepts. This is a prerequisite for ontology-based literature search. The question then is how to answer biomedical questions using ontologies and a literature corpus. Finally, the task is to automate bibliometric analyses on a corpus of scientific publications. Approach: Recent joint efforts on automatically extracting information from free text showed that the applied methods are complementary. The idea is to employ the rich terminological and relational information stored in biomedical ontologies to mark up biomedical text documents. Based on established semantic links between documents and ontology concepts, the goal is to answer biomedical questions on a corpus of documents. The fully annotated literature corpus makes it possible, for the first time, to automatically generate bibliometric analyses for ontological concepts, authors and institutions.
Results: This work includes a novel framework for annotating free texts with ontological concepts. The framework generates recognition pattern rules from the terminological and relational information in an ontology. Maximum entropy models can be trained to distinguish the meanings of ambiguous concept labels. The framework was used to develop an annotation pipeline for PubMed abstracts with 27,863 Gene Ontology concepts. The evaluation of the recognition performance yielded a precision of 79.9% and a recall of 72.7%, improving on the previously used algorithm by 25.7% in f-measure. The evaluation was done on a curation corpus of 689 PubMed abstracts with 18,356 concept curations, created manually by the original authors. Methods to reason over large amounts of documents with ontologies were developed. The ability to answer questions with the online system was shown on a set of biomedical questions from the TREC Genomics Track 2006 benchmark. This work includes the first ontology-based, large-scale, online, up-to-date bibliometric analysis for topics in molecular biology represented by GO concepts. The automatic bibliometric analysis is in line with existing, but often outdated, manual analyses. Outlook: A number of promising continuations of this work have been spun off. A freely available online search engine has a growing user community. A spin-off company, funded by the High-Tech Gründerfonds, commercializes the new ontology-based search paradigm. Several offshoots of GoPubMed are being developed, including GoWeb (general web search), Go3R (search in replacement, reduction and refinement methods for animal experiments) and GoGene (search in gene/protein databases). ontology textmining literature search engine biology medicine named entity recognition concept recognition PubMed Gene Ontology MeSH semantic search ddc:004 rvk:ST 304 rvk:ST 270
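The precision and recall reported in this abstract can be combined into a single balanced score as a quick sanity check. The following is a minimal sketch, assuming that the abstract's "f-measure" is the usual F-beta score with beta = 1 (the abstract does not say which variant was used); the function name `f_measure` is illustrative and not taken from the thesis.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0 to avoid dividing by zero.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Values reported in the abstract: precision 79.9%, recall 72.7%.
f1 = f_measure(0.799, 0.727)
print(f"F1 = {f1:.3f}")
```

Under this assumption the two reported figures correspond to a balanced score of roughly 76%. The abstract's 25.7% improvement refers to a comparison with the earlier algorithm, whose own score is not given here.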
592	A knowledgebase of stress reponsive gene regulatory elements in arabidopsis Thaliana Adam, Muhammed Saleem January 2011 Stress-responsive genes play a key role in shaping the manner in which plants process and respond to environmental stress. Their gene products are linked to DNA transcription and its consequent translation into a response product. However, whilst these genes play a significant role in producing responses to stressful stimuli, transcription factors coordinate access to these genes, specifically by accessing a gene's promoter region, which houses transcription factor binding sites. Here transcriptional elements play a key role in mediating responses to environmental stress, where each transcription factor binding site may constitute a potential response to a stress signal. Arabidopsis thaliana, a model organism, can be used to identify how transcription factors shape a plant's survival in a stressful environment. Whilst there are numerous plant stress research groups, globally there is a shortage of publicly available stress-responsive gene databases. In addition, a number of previous databases, such as the Generation Challenge Programme's comparative plant stress-responsive gene catalogue, Stresslink and DRASTIC, have become defunct, whilst others have stagnated. There is currently a single Arabidopsis thaliana stress response database, STIFDB, which was launched in 2008 and covers only abiotic stresses as handled by major abiotic stress-responsive transcription factor families. Its data, sourced from microarray expression databases, contain numerous omissions and erroneous entries and have not been updated since its inception. The Dragon Arabidopsis Stress Transcription Factor database (DASTF) was developed in response to the current lack of stress response gene resources. A total of 2333 entries were downloaded from SWISSPROT, manually curated and imported into DASTF.
The entries represent 424 transcription factor families. Each entry has a corresponding SWISSPROT, ENTREZ GENBANK and TAIR accession number. The 5' untranslated regions (UTRs) of 417 families were scanned against TRANSFAC's binding site catalogue to identify binding sites. The relational database consists of two tables, a transcription factor table called DASTF_TF and a transcription factor family table called TF_Family. Using a two-tier client-server architecture, a web server was built with PHP, APACHE and MYSQL, and the data were loaded into these tables with a PYTHON script. The DASTF database contains 60 entries that correspond to biotic stress, 167 that correspond to abiotic stress, and 2106 that respond to biotic and/or abiotic stress. Users can search the database using text, family, chromosome and stress type search options. Online tools have been integrated into the DASTF database, such as HMMER, CLUSTALW, BLAST and HYDROCALCULATOR. Users can upload sequences and use HMMER to identify which transcription factor family their sequences belong to. The website can be accessed at http://apps.sanbi.ac.za/dastf/ and two updates per year are envisaged.
593	Uncovering and Managing the Impact of Methodological Choices for the Computational Construction of Socio-Technical Networks from Texts Diesner, Jana 01 September 2012 This thesis is motivated by the need for scalable and reliable methods and technologies that support the construction of network data based on information from text data. Ultimately, the resulting data can be used for answering substantive and graph-theoretical questions about socio-technical networks. One main limitation of constructing network data from text data is that validating the resulting network data can be difficult or even infeasible, e.g. in the cases of covert, historical and large-scale networks. This thesis addresses this problem by identifying how the coding choices that must be made when extracting network data from text data affect the structure of networks and network analysis results. My findings suggest that conducting reference resolution on text data can alter the identity and weight of 76% of the nodes and 23% of the links, and can cause major changes in the value of commonly used network metrics. Also, performing reference resolution prior to relation extraction leads to the retrieval of completely different sets of key entities in comparison to not applying this pre-processing technique. Based on the outcome of the presented experiments, I recommend strategies for avoiding or mitigating the identified issues in practical applications. When extracting socio-technical networks from texts, the set of relevant node classes might go beyond the classes that are typically supported by tools for named entity extraction. I address this lack of technology by developing an entity extractor that combines an ontology for socio-technical networks, which originates from the social sciences, is theoretically grounded and has been empirically validated in prior work, with a supervised machine learning technique based on probabilistic graphical models.
This thesis does not stop at showing that the resulting prediction models achieve state-of-the-art accuracy rates; I also describe the process of integrating these models into an existing and publicly available end-user product. As a result, users can apply these models to new text data in a convenient fashion. While a plethora of methods exists for building network data from information explicitly or implicitly contained in text data, there is a lack of research on how the resulting networks compare with respect to their structure and properties. This also applies to networks that can be extracted by using the aforementioned entity extractor as part of the relation extraction process. I address this knowledge gap by comparing the networks extracted with this process to network data built with three alternative methods: text coding based on thesauri that associate text terms with node classes, the construction of network data from meta-data on texts, such as key words and index terms, and building network data in collaboration with subject matter experts. The outcomes of these comparative analyses suggest that thesauri generated with the entity extractor developed for this thesis need adjustments with respect to particular categories and types of errors. I provide tools and strategies to assist with these refinements. My results also show that once these changes have been made, and in contrast to manually constructed thesauri, the prediction models generalize with acceptable accuracy to other domains (news wire data, scientific writing, emails) and writing styles (formal, casual). The comparisons of networks constructed with different methods show that networks produced by automated methods that analyze text bodies hardly resemble ground truth data built by subject matter experts, and networks built from existing meta-data on text corpora resemble them even less. Thus, aiming to reconstruct social networks from text data leads to largely incomplete networks.
Synthesizing the findings from this work, I outline which types of information on socio-technical networks are best captured by which network data construction method, and how best to combine these methods in order to gain a more comprehensive view of a network. When both text data and relational data are available as sources of information on a network, people have previously integrated these data by enhancing social networks with content nodes that represent salient terms from the text data. I present a methodological advancement of this technique and test its performance on the datasets used for the previously mentioned evaluation studies. With this approach, multiple types of behavioral data, namely interactions between people as well as their language use, can be taken into account. I conclude that extracting content nodes from groups of structurally equivalent agents can be an appropriate strategy for enabling the comparison of the content that people produce, perceive or disseminate. These equivalence classes can represent a variety of social roles and social positions that network members occupy. At the same time, extracting content nodes from groups of structurally coherent agents can be suitable for enabling the enhancement of social networks with content nodes. The results from applying the latter approach to text data include a comparison of the outcome of topic modeling, an efficient and unsupervised information extraction technique, with the outcomes of alternative methods, including entity extraction based on supervised machine learning. My findings suggest that key entities from meta-data knowledge networks might serve as proper labels for unlabeled topics. Also, unsupervised and supervised learning lead to the retrieval of similar entities as highly likely members of highly likely topics and as key nodes from text-based knowledge networks, respectively.
In summary, the contributions made with this thesis help people to collect, manage and analyze rich network data at any scale. This is a precondition for asking substantive and graph-theoretical questions, testing hypotheses, and advancing theories about networks. This thesis uses an interdisciplinary and computationally rigorous approach to work towards this goal, thereby advancing the intersection of network analysis, natural language processing and computing. socio-technical networks semantic networks information networks entity extraction relation extraction reference resolution co-occurrence based network construction accuracy assessment network clustering grouping in networks Software Engineering
594	Valdybos narių atsakomybė: ypatumai ir teismų praktikos analizė / Liability of Board members: peculiarities and jurisprudence Kakurinas, Olegas 27 January 2014 Legal doctrine holds that a legal entity is a derivative subject of civil law that acquires civil rights, assumes civil obligations and exercises them through its bodies, which are formed and act in accordance with the law and the entity's documents of incorporation.
Given its limited scope, this thesis analyzes the peculiarities of the civil liability of members of the Board, as a collegial management body, in the most widespread type of private legal entity in the Republic of Lithuania, the company. Taking into account that civil liability arises from a failure to perform, or the improper performance of, civil duties, the thesis defines the duties of Board members to the company and the scope of those duties, and discusses the legal nature of the Board. The main focus of the study is to determine the nature of the legal liability of Board members, its preconditions and its grounds. To achieve this goal, the author examines and defines the legal relationship between the company and Board members. The thesis analyzes which kind of civil liability, contractual or tort, applies to the members of the Board. In order to determine the preconditions of civil liability of the members of the Board, it analyzes whether an individual or a collective model of civil liability of Board members is applied in the Republic of Lithuania. Moreover, the thesis covers the problematic aspects of the conditions of civil liability applied to Board members. The thesis relies extensively on the jurisprudence of the courts of the Republic of... Law Company's Board Management body of company Fiduciary duties Conditions of civil liability
595	Hypergraphs and information fusion for term representation enrichment : applications to named entity recognition and word sense disambiguation / Hypergraphes et fusion d’information pour l’enrichissement de la représentation de termes : applications à la reconnaissance d’entités nommées et à la désambiguïsation du sens des mots Soriano-Morales, Edmundo-Pavel 07 February 2018 Making sense of textual data is an essential requirement for making computers understand our language. To extract actionable information from text, we need to represent it by means of descriptors before using knowledge discovery techniques. The goal of this thesis is to shed light on heterogeneous representations of words and on how to leverage them while addressing their implicitly sparse nature. First, we propose a hypergraph network model that holds heterogeneous linguistic data in a single unified model. In other words, we introduce a model that represents words by means of different linguistic properties and links them together according to said properties. Our proposal differs from other types of linguistic networks in that we aim to provide a general structure that can hold several types of descriptive text features, instead of a single one as in most representations. This representation may be used to analyze the inherent properties of language from different points of view, or to serve as the starting point of an applied NLP task pipeline. Secondly, we employ feature fusion techniques to provide a final single enriched representation that exploits the heterogeneous nature of the model and alleviates the sparseness of each representation. These types of techniques are usually applied only to combine multimedia data. In our approach, we consider different text representations as distinct sources of information that can themselves be enriched.
This approach has not been explored before, to the best of our knowledge. Thirdly, we propose an algorithm that exploits the real-world properties of the network to identify and group semantically related words. In contrast with similar methods that are also based on the structure of the network, our algorithm reduces the number of required parameters and, more importantly, allows for the use of either lexical or syntactic networks to discover said groups of words, instead of the single type of features usually employed. We focus on two different natural language processing tasks: Word Sense Induction and Disambiguation (WSI/WSD), and Named Entity Recognition (NER). In total, we test our proposals on four different open-access datasets. The results obtained allow us to show the pertinence of our contributions and also give us some insights into the properties of heterogeneous features and their combinations with fusion methods. Specifically, our experiments are twofold. First, we show that by using fusion-enriched heterogeneous features coming from our proposed linguistic network, we outperform single-feature systems and other basic baselines. We note that using single fusion operators is not as effective as using a combination of them to obtain a final space representation. We show that the features added by each combined fusion operation are important for the models to predict the appropriate classes. We test the enriched representations on both the WSI/WSD and NER tasks. Secondly, we address the WSI/WSD task with our proposed network-based method. While it builds on previous work, we improve on that work by obtaining better overall performance and reducing the number of parameters needed. We also discuss the use of either lexical or syntactic networks to solve the task. Finally, we parse a corpus based on the English Wikipedia and then store it following the proposed network model.
The parsed Wikipedia version serves as a linguistic resource to be used by other researchers. Unlike other similar resources, instead of storing only the part-of-speech tag and dependency relations of each word analyzed, we also take into account its constituency-tree information. The hope is that this resource will be used in future developments without the need to compile such a resource from scratch. Natural Language Processing Linguistic Network Word Representation Fusion Techniques Word Sense Induction and Disambiguation Named Entity Recognition 004.03
596	Zásada flexijistoty v právní úpravě přechodu práv a povinností z pracovněprávních vztahů / Principle of flexicurity in legal regulation of transfer of rights and obligations from the employment-law relations Jouzová, Lada January 2018 In her PhD thesis, the author deals with the legal regulation of the transfer of rights and obligations arising from employment-law relations in the Czech Republic, in the context of European Union law, from the point of view of the concept of flexicurity in employment-law relations. This concept includes, on the one hand, elements of flexibility in the realization of employment-law relations and, on the other hand, security (protection) of employees in these relations, which is manifested in particular by the transfer of rights and obligations from their employment-law relations to the new employer. At present, changes in employers' organizational structure, transfers of activities or tasks, mergers, the purchase or lease of a business, as well as so-called outsourcing, insourcing and changes of suppliers, are becoming more and more common in companies. As a result, the issue of a change of employer and, consequently, of safeguarding and protecting the rights of employees, in particular their employment-law relations, is becoming increasingly topical. Protection of employees' rights during the transfers of undertakings and businesses is one of the...
597	A importância da limitação da responsabilidade de sócios e da delimitação da responsabilidade de administradores para as relações econômicas no ordenamento brasileiro. Martins, Irena Carneiro January 2008 This work investigated the origins of the institution of limited liability of partners and sought to establish the importance of that limitation, starting from the harmonization between the constitutional principles that protect social rights and the equally constitutional principle of free enterprise, from which the principle of preservation of the company also derives. Similarly, it sought to establish the importance of delimiting, in both the legislative and the judicial spheres, the liability of directors who hold no equity stake in the companies they manage. In this context, it sought to demonstrate that, beyond the harm it causes, applying the theory of disregard of legal personality is superfluous given the legal remedies already available in the Brazilian legal system for cases of fraud, simulation and ultra vires acts.
This work argues that it is possible to preserve the company while answering both the call of due process of law and the call to satisfy credit or to repair losses caused by abuse of the legal entity, whether by a director or by a partner, thereby strengthening the valued institutions of legal certainty and predictability of judicial decisions. Everything set out here is also served by a reduction in judicial activism, which occurs at the expense of procedural rights that enjoy constitutional status, such as the right to a full defense and the adversarial principle. In addition, the work sought to show the need for dialogue between Law, through judges, and Economics, based on judges' understanding of how their actions affect economic development and, consequently, social development. In this respect, a contribution from psychoanalysis is considered useful, drawing on one of the three agencies of the psychic apparatus, the superego, to understand the Judiciary as the superego of society. / Salvador Entrepreneurial law Commercial law Directors liability Disregard of legal entity doctrine Principles balance techniques Judicial power performance
598	The taxation of electronic commerce and the implications for current taxation practices in South Africa Doussy, Elizabeth 01 January 2002 This study analyses the nature and implementation of electronic commerce in order to identify possible problems for taxation and pinpoint those problems which may be relevant to South Africa. Solutions suggested by certain countries and institutions are evaluated for possible implementation in South Africa. The study suggests that although current taxation legislation in South Africa is applicable to electronic commerce transactions, it is not sufficient to cater effectively for this type of business. The conclusion reached is that international co-operation is essential in finding solutions. A number of recommendations are made regarding aspects of South African taxation legislation which need to be clarified through policy decisions. Taxation / M.Comm. Electronic commerce Internet Web site Internet service provider Server Jurisdiction Residence Anti-avoidance Controlled foreign entity Tax characterisation Permanent establishment Double taxation agreements Income allocation 336.278658840968
599	[en] NAMED ENTITY RECOGNITION FOR PORTUGUESE / [pt] RECONHECIMENTO DE ENTIDADES MENCIONADAS PARA O PORTUGUÊS DANIEL SPECHT SILVA MENEZES 13 December 2018 (has links) [pt] A produção e acesso a quantidades imensas dados é um elemento pervasivo da era da informação. O volume de informação disponível é sem precedentes na história da humanidade e está sobre constante processo de expansão. Uma oportunidade que emerge neste ambiente é o desenvolvimento de aplicações que sejam capazes de estruturar conhecimento contido nesses dados. Neste contexto se encaixa a área de Processamento de Linguagem Natural (PLN) - Natural Language Processing (NLP) - , ser capaz de extrair informações estruturadas de maneira eficiente de fontes textuais. Um passo fundamental para esse fim é a tarefa de Reconhecimento de Entidades Mencionadas (ou nomeadas) - Named Entity Recognition (NER) - que consistem em delimitar e categorizar menções a entidades num texto. A construção de sistemas para NLP deve ser acompanhada de datasets que expressem o entendimento humano sobre as estruturas gramaticais de interesse, para que seja possível realizar a comparação dos resultados com o real discernimento humano. Esses datasets são recursos escassos, que requerem esforço humano para sua produção. Atualmente, a tarefa de NER vem sendo abordada com sucesso por meio de redes neurais artificiais, que requerem conjuntos de dados anotados tanto para avaliação quanto para treino. A proposta deste trabalho é desenvolver um dataset de grandes dimensões para a tarefa de NER em português de maneira automatizada, minimizando a necessidade de intervenção humana. Utilizamos recursos públicos como fonte de dados, nominalmente o DBpedia e Wikipédia. Desenvolvemos uma metodologia para a construção do corpus e realizamos experimentos sobre o mesmo utilizando arquiteturas de redes neurais de melhores performances reportadas atualmente. 
Exploramos diversos modelos de redes neurais, explorando diversos valores de hiperparâmetros e propondo arquiteturas com o foco específico de incorporar fontes de dados diferentes para treino. / [en] The production of and access to huge amounts of data is a pervasive element of the Information Age. The volume of available data is without precedent in human history and is in constant expansion. An opportunity that emerges in this context is the development and usage of applications that are capable of structuring the knowledge contained in these data. Natural Language Processing (NLP) fits into this context, as it is able to extract structured information efficiently from textual data. A fundamental step toward this goal is the task of Named Entity Recognition (NER), which delimits and categorizes mentions of entities in a text. The development of systems for NLP tasks must be accompanied by datasets produced by humans, so that the system can be compared with human judgment on the NLP task at hand. These datasets are a scarce resource whose construction is costly in terms of human supervision. Recently, the NER task has been approached using artificial neural network models, which need datasets for both training and evaluation. In this work we propose the construction of a dataset for Portuguese NER with an automatic approach using public data sources structured according to the principles of the Semantic Web, namely DBpedia and Wikipedia. A methodology for the construction of this dataset was developed, and experiments were performed using both the built dataset and the neural network architectures with the best reported results. Many experimental setups were evaluated: we obtained preliminary results for diverse hyperparameter values and also proposed architectures with the specific focus of incorporating diverse data sources for training. 
[pt] REDES NEURAIS [en] NEURAL NETWORKS [pt] APRENDIZADO DE MAQUINA [en] MACHINE LEARNING [pt] WIKIPEDIA [en] WIKIPEDIA [pt] PROCESSAMENTO DE LINGUAGEM NATURAL [en] NATURAL LANGUAGE PROCESSING [en] NAMED ENTITY RECOGNITION [pt] DATASETS [en] DATASETS
600	The design of a database of resources for rational therapy Steyn, Genevieve Lee 06 1900 (has links) The purpose of this study is to design a database of resources for rational therapy. An investigation of the current health situation and reorientation towards primary health care (PHC) in South Africa evidenced the need for a database of resources which would meet the demand for rational therapy information made on the Helderberg College Library by various user groups as well as make a contribution to the national health information infrastructure. Rational therapy is viewed as an approach within PHC that is rational, common-sense, wholistic and credible, focusing on the prevention and maintenance of health. A model of the steps in database design was developed. A user study identified users' requirements for design and the conceptual schema was developed. The entities, attributes, relationships and policies were presented and graphically summarised in an Entity-Relationship (E-R) diagram. The conceptual schema is the blueprint for further design and implementation of the database. / Information Science / M.Inf. Information systems Database Database design Primary health care Information science Entity-relationship model Conceptual schema rational therapy Data dictionary 005.756 Database management Relational databases Data dictionaries Medical care -- Databases

Search results