41 |
Semantic Analysis of Natural Language and Definite Clause Grammar using Statistical Parsing and Thesauri
Dagerman, Björn (January 2013)
Services that rely on semantic computation over users' natural-language input are becoming more common. Computing semantic relatedness between texts is difficult because of the inherent ambiguity of natural language. The purpose of this thesis was to show how a sentence could be compared to a predefined semantic Definite Clause Grammar (DCG), and how a DCG-based system could benefit from such a capability. Our approach combines openly available, specialized NLP frameworks for statistical parsing, part-of-speech tagging and word-sense disambiguation. We compute semantic relatedness using a large lexical and conceptual-semantic thesaurus. We also extend COactive Language Definition (COLD), an existing programming language for multimodal interfaces that uses static, predefined DCGs; that is, every word that COLD should accept needs to be explicitly defined. By applying our solution, we show how our approach can remove dependencies on word definitions and improve grammar definitions in DCG-based systems.
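The idea of removing the dependency on explicit word definitions can be sketched as follows. This is only a minimal illustration, not the thesis's implementation: the toy thesaurus, the path-based relatedness score, the threshold, and the function names are all assumptions introduced here. An input word that the grammar does not define is mapped to the most closely related word that the grammar does define.

```python
from collections import deque

# Hypothetical toy thesaurus: each word maps to its directly related words.
TOY_THESAURUS = {
    "open": {"unlock", "start"},
    "unlock": {"open"},
    "start": {"open", "launch"},
    "launch": {"start"},
    "door": {"gate"},
    "gate": {"door"},
}


def path_length(a, b, thesaurus):
    """Breadth-first search for the shortest chain of links between two words."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        word, dist = queue.popleft()
        for nxt in thesaurus.get(word, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connection found


def relatedness(a, b, thesaurus):
    """Assumed score: 1 for identical words, decreasing with path length, 0 if unconnected."""
    dist = path_length(a, b, thesaurus)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def map_to_grammar(word, grammar_words, thesaurus, threshold=0.5):
    """Return the grammar terminal most related to `word`, or None if none is close enough."""
    best = max(grammar_words, key=lambda g: relatedness(word, g, thesaurus))
    return best if relatedness(word, best, thesaurus) >= threshold else None


# The grammar only defines "open" and "door"; "unlock" and "gate" are mapped onto them.
print(map_to_grammar("unlock", ["open", "door"], TOY_THESAURUS))
print(map_to_grammar("gate", ["open", "door"], TOY_THESAURUS))
```

A real system would replace the toy dictionary with a full thesaurus and would disambiguate word senses before scoring, as the abstract describes.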
|
42 |
Využití vybraných medicínských tezaurů a klasifikací v portálech pro laickou veřejnost / Utilization of selected medical thesauri and classifications in patient information portals
Labský, Jan (January 2019)
The thesis examines the use of medical thesauri and classifications in portals that provide medical information to lay users. It first describes the selected thesauri, classifications and portals. A survey was carried out to explore users' reasons for researching medical information. Selected subjects were then observed while searching for information on the previously selected portals, and the observation was complemented with semi-structured interviews with the subjects. The results were used to identify the key manifestations of the connection between portals and medical thesauri. The individual portals were further evaluated, and the features most important to users were identified.
|
43 |
Analysis of the long term dynamics in thesaurus developments and its consequences
Tavakolizadeh-Ravari, Mohammad (20 August 2007)
This dissertation analyzes the dynamic development and use of thesaurus terms. It also examines six factors that affect the number of index terms per document or journal. MeSH and its corresponding database, MEDLINE, served as the objects of study. The main findings of the statistical analyses are:
1. MeSH has developed logarithmically through three different phases. Such a thesaurus should follow the equation “T = 3,076.6 Ln(d) - 22,695 + 0.0039d” (T = thesaurus terms, Ln = natural logarithm, and d = documents). To construct such a thesaurus, one needs at least about 1,600 documents covering different topics within the scope of the thesaurus. The dynamic development of thesauri such as MeSH requires the introduction of one new term for every 256 newly indexed documents.
2. The distribution of thesaurus terms yielded three classes: highly, normally, and rarely used terms. The last class is in a test phase; only the growth rates of the most frequently used terms in the first class and of the newer terms in the second class became persistent over time.
3. There is a logarithmic relationship between the number of index terms per article and its number of pages for articles between one and twenty-one pages long.
4. Journal articles that appear in MEDLINE with abstracts receive almost two more terms than those included without abstracts.
5. The findability of non-English documents in MEDLINE, such as articles written in German and indexed in a US-based database, is lower than that of English documents. The difference is greatest for articles of ten pages and smallest for those of twenty or more pages.
6. Journals with Impact Factors in the range from 0 to 15 receive roughly the same number of index terms per page as the other journals covered by MEDLINE.
7. In an indexing system, different journals carry more or less weight in terms of findability. The distribution of index terms per page shows that there are three regions of journals in MEDLINE. In addition, a few journals are strongly favored and receive more index terms per page.
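The numbers quoted in finding 1 can be checked against the fitted equation. The sketch below simply evaluates the equation as printed in the abstract; the function names are assumptions introduced here, and the reading of the two figures (the zero crossing near 1,600 documents, and the linear coefficient as one term per roughly 256 documents) is an interpretation rather than something the abstract spells out.

```python
import math


def thesaurus_terms(d):
    """Terms T predicted for d indexed documents by the abstract's fitted equation."""
    return 3076.6 * math.log(d) - 22695 + 0.0039 * d


def growth_rate(d):
    """Derivative dT/dd: predicted new terms per additional indexed document."""
    return 3076.6 / d + 0.0039


# T changes sign between 1,500 and 1,600 documents, which matches the
# statement that about 1,600 documents are needed before terms appear.
print(thesaurus_terms(1500), thesaurus_terms(1600))

# For very large d the logarithmic part of the growth rate vanishes and the
# linear coefficient dominates: 1 / 0.0039 is roughly 256 documents per term.
print(1 / 0.0039, growth_rate(1e9))
```

Under this reading, the logarithmic term governs the early phases of thesaurus growth, while the small linear term accounts for the steady long-run addition of terms.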
|
44 |
知識倉儲的知識結構之研究-以某行政部門為例 / A Study of the Knowledge Structure of a Knowledge Repository: The Case of an Administrative Department
盧美惠 (Unknown Date)
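With the rapid development of information technology and the spread of information media, enterprises and organizations have accumulated large amounts of internal information. A knowledge repository can be described as a store for documents of all types, used mainly to manage and organize various kinds of information; databases, reports, documents and forms can all be stored in it in digital form. Its function is to manage the knowledge content of the organization's internal documents and thereby help the organization provide web services, including retrieval services that offer directories and indexes to help users find information, and addressing services that identify and confirm where information is located. As knowledge is continually produced by the organization's operations and the volume of knowledge and information keeps growing, how to manage this knowledge becomes increasingly important, including how it is represented, structured, stored and accessed.
In this thesis, the study organizes the related and background knowledge on the "knowledge structure" of knowledge repositories and builds a knowledge map from the cross-reference relationships among documents and from a thesaurus. It further stores domain knowledge, such as terminology and term relationships, as structured knowledge, and uses this domain knowledge to annotate document content with semantic relationships. During information retrieval, the domain knowledge structure then helps users accurately retrieve and query useful or relevant content.
The study applies the archival theory of fonds and the concept of control levels to propose a file-system directory structure that can accommodate changes in organizational structure, divided into the levels of fonds, series, file and document. The knowledge repository system handles dynamic document management under organizational change by mapping each document's virtual address (DL) to its physical address (URL). The study further classifies groups of related documents within a file and uses the analyzed file type structures to describe both single documents and compound documents, including meeting minutes and laws and regulations. It uses the Dublin Core to describe document data and build a metadata structure, and then processes the semantic relationships among thesaurus terms to provide concept-based semantic information retrieval.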
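The mapping between a document's virtual address (DL) and its physical address (URL) can be pictured as a simple indirection table. The sketch below is an assumption-laden illustration only: the class name, the fonds/series/file/document address format and the example URLs are hypothetical and do not come from the thesis.

```python
class DocumentLocator:
    """Maps stable virtual addresses (DL) to current physical addresses (URL)."""

    def __init__(self):
        self._table = {}

    def register(self, dl, url):
        """Record where the document identified by `dl` is currently stored."""
        self._table[dl] = url

    def relocate(self, dl, new_url):
        """Move a document after a reorganization without changing its DL."""
        if dl not in self._table:
            raise KeyError(f"unknown virtual address: {dl}")
        self._table[dl] = new_url

    def resolve(self, dl):
        """Return the current physical address for a virtual address."""
        return self._table[dl]


locator = DocumentLocator()
# Hypothetical DL built from the fonds / series / file / document levels.
dl = "fonds-A/series-1/file-3/doc-7"
locator.register(dl, "https://old-dept.example.org/minutes/doc-7.pdf")
# After a department is restructured, only the physical address changes.
locator.relocate(dl, "https://new-dept.example.org/archive/doc-7.pdf")
print(locator.resolve(dl))
```

Because references between documents use the DL, a change in organizational structure only requires updating the table, not every document that cites the moved one.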
|
45 |
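Analyse et représentation de la variation terminologique et de la multidimentionalité dans un thésaurus : le cas du métalangage de la terminologie / Analysis and representation of terminological variation and multidimensionality in a thesaurus: the case of the metalanguage of terminology
Vico Ramírez, Alicia (07 1900)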
This thesis proposes a model to represent theoretical and practical concepts of terminology
as well as their terminological relationships in the form of a thesaurus. According to the ISO
25964-1:2011 standard, “a thesaurus is a controlled and structured vocabulary in which
concepts are represented by terms that have been previously arranged in order to show
explicitly the relations among concepts (…)”. Our objective is to create a pedagogical tool that
is grounded in a theoretical reflection about different theoretical perspectives within this
discipline.
The issues associated with the classification of concepts in certain fields of knowledge
(especially those with different perspectives) require further study in the field of terminology
and information science, and in the literature relating to thesauri. Indeed, how does one
describe concepts that are subject to theoretical dissent and different schools of thought? How
can the different relationships between theoretical concepts and practical applications of a
discipline be classified? To these questions is added the additional challenge of reflecting
these difficulties in a thesaurus. Our first step consists in delimiting and organizing the main
concepts of the field. Then, by means of a corpus containing different publications associated
with different approaches in terminology, we study the linguistic realizations of those concepts
and their relationships in context, with the objective of describing, classifying and defining
them. We then encode this data using thesaurus management software that respects the
relevant ISO standards. Finally, we produce visualizations of this data to make it more
user-friendly and understandable.
To conclude, we present the fundamental characteristics of the Thésaurus de la
terminologie. We have analyzed and presented a sample of 45 concepts and their related
terms. Different phenomena related to these descriptors, such as multidimensionality,
conceptual variation and denominative variation, are also represented in our thesaurus.
|
46 |
中文分類主題一體化之研究:以教育學類為例 / The Study of Integrated Chinese Classification and Subject Headings: a Case Study of Education
何世文, Ho, Shih-Wen (Unknown Date)
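Subject cataloging has always been an important topic in the organization of information and knowledge. In libraries, subject cataloging is carried out mainly through classification and subject headings. Each method has its own functional strengths and weaknesses; combining the two so that they complement each other should make both more effective.
This thesis takes the field of education as its subject. First, two questionnaires were designed on the basis of a literature review: one was a census of the libraries of the thirteen universities in Taiwan that have education departments, and the other was a sample survey of users of the libraries of the four universities in Taipei City that have education departments. The questionnaires aimed to understand the current practices, needs and views of catalogers and readers regarding subject cataloging and subject retrieval in Chinese-language education, as a reference for compiling an integrated classification and subject vocabulary for Chinese education. Second, the education class (class 520) of 賴永詳's 《中國圖書分類法》 (7th enlarged edition, 1989), which is widely used in libraries, was mapped to the education subject headings (class 520) of the 《中文圖書標題表》 (revised edition, 1995); the proportion of entries with no correspondence at all was only 0.49%. The study therefore used the 1997 version of Microsoft Access as a supporting tool, applied thesaurus vocabulary control methods, and collected a large number of education-related subject terms to build a database and complete a prototype of the 《中文圖書教育分類主題詞表》.
The study concludes that: controlling the quality of subject cataloging requires attention to both quality and quantity; the education subject headings of the 《中文圖書標題表》 and the education class terms of 賴's 《中圖法》 already provide a basis for integration; Chinese subject headings lack a complete subject vocabulary to follow; subject cataloging tools should apply vocabulary control and provide keyword search; there is a need to compile an integrated classification and subject vocabulary for Chinese education; and catalogers and readers differ in their views of subject cataloging and subject retrieval. The study therefore recommends that an integrated Chinese classification and subject vocabulary with broad term coverage be revised as soon as possible, that every library provide subject heading search, and that libraries attach importance to user surveys and strengthen user education in subject retrieval.
It is hoped that this thesis will contribute to the subject cataloging of education materials in libraries and, in turn, to the organization of knowledge and information in Taiwan. / The organization of information and knowledge has always been an important field for librarians and researchers in library and information science. For libraries, this work belongs to cataloging, which has two basic functions: descriptive cataloging and subject cataloging. The latter includes the assignment of class numbers and subject access points. The purpose of this study is to explore the possibility of integrating class numbers and subject access points in Taiwan.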
|
47 |
Fuzzy Cluster-Based Query Expansion
Tai, Chia-Hung (29 July 2004)
Advances in information and network technologies have fostered the creation and availability of a vast amount of online information, typically in the form of text documents. Information retrieval (IR) pertains to determining the relevance between a user query and documents in the target collection, then returning those documents that are likely to satisfy the user's information needs. One challenging issue in IR is word mismatch, which occurs when concepts can be described by different words in the user queries and/or documents. Query expansion is a promising approach for dealing with word mismatch in IR.
In this thesis, we develop a fuzzy cluster-based query expansion technique to solve the word mismatch problem. Using existing expansion techniques (i.e., global analysis and non-fuzzy cluster-based query expansion) as performance benchmarks, our empirical results suggest that the fuzzy cluster-based query expansion technique can provide a more accurate query result than the benchmark techniques can.
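The abstract does not describe the algorithm itself, so the following is only a generic sketch of how fuzzy cluster memberships could drive query expansion, not the thesis's technique. The membership table, the threshold and the function name are assumptions made for illustration. The fuzzy aspect is that a term may belong to several clusters to different degrees, instead of being assigned to exactly one.

```python
# Hypothetical fuzzy memberships: term -> {cluster name: degree in [0, 1]}.
MEMBERSHIP = {
    "car": {"vehicles": 0.9, "finance": 0.1},
    "automobile": {"vehicles": 0.8},
    "loan": {"finance": 0.9, "vehicles": 0.2},
    "bank": {"finance": 0.8},
}


def expand_query(query_terms, membership, threshold=0.5):
    """Append terms that strongly belong to a cluster a query term strongly belongs to."""
    # Clusters in which at least one query term has membership >= threshold.
    active = {
        cluster
        for term in query_terms
        for cluster, degree in membership.get(term, {}).items()
        if degree >= threshold
    }
    # Non-query terms whose membership in any active cluster reaches the threshold.
    expansion = {
        term
        for term, clusters in membership.items()
        if term not in query_terms
        and any(clusters.get(cluster, 0.0) >= threshold for cluster in active)
    }
    return list(query_terms) + sorted(expansion)


print(expand_query(["car"], MEMBERSHIP))
print(expand_query(["car", "loan"], MEMBERSHIP))
```

Lowering the threshold lets weak memberships (such as "car" in the finance cluster) contribute, which widens the expansion at the cost of precision.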
|
48 |
Sacherschliessung in Museen - Chancen und Probleme / Subject indexing in museums: opportunities and problems
Sieglerschmidt, Jörn (28 August 2007)
Jörn Sieglerschmidt, Bibliotheksservice-Zentrum Baden Württemberg, Konstanz, guided his audience through the difficult tasks involved in describing museum objects in text, making clear that, unlike in librarianship, the boundaries between descriptive cataloging and subject indexing are fluid:
http://titan.bsz-bw.de/cms/museen/musis/publ/sieglerschmidt_freiburg2007.pdf
|
49 |
Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus
Lyall-Wilson, Jennifer Rae (January 2013)
The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query and is founded on the idea that a collection-specific association thesaurus can be used to create a reasonable representation of all the concepts within the document collection as well as the relationships these concepts have to one another. Because the representation is generated using data from the association thesaurus, a mapping will exist between the representation of the concepts and the terms used to describe these concepts. The research applies to search engines designed for use in an individual website with content focused on a specific conceptual domain. Therefore, both the document collection and the subject content must be well-bounded, which affords the ability to make use of techniques not currently feasible for general-purpose search engines used on the entire web.
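To make the notion of a collection-specific association thesaurus concrete, the sketch below builds one from term co-occurrence within documents and uses it to expand a query. It is a simplified stand-in under stated assumptions: the Jaccard-style association score, whitespace tokenization and function names are choices made here, and the sketch does not reproduce the dissertation's term relational pathways.

```python
from collections import Counter
from itertools import combinations


def build_association_thesaurus(documents):
    """Map each term to the terms it co-occurs with, scored by a Jaccard-style measure."""
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms of a pair
    for doc in documents:
        terms = set(doc.lower().split())
        doc_freq.update(terms)
        pair_freq.update(combinations(sorted(terms), 2))
    thesaurus = {}
    for (a, b), both in pair_freq.items():
        score = both / (doc_freq[a] + doc_freq[b] - both)
        thesaurus.setdefault(a, {})[b] = score
        thesaurus.setdefault(b, {})[a] = score
    return thesaurus


def expand(query, thesaurus, top_k=2):
    """Append the top_k terms most strongly associated with the query terms."""
    terms = query.lower().split()
    scores = Counter()
    for term in terms:
        for other, score in thesaurus.get(term, {}).items():
            if other not in terms:
                scores[other] += score
    return terms + [term for term, _ in scores.most_common(top_k)]


docs = [
    "thesaurus term expansion",
    "thesaurus query expansion",
    "cooking pasta recipe",
]
association = build_association_thesaurus(docs)
print(expand("query", association))
```

Because the associations come only from the collection itself, the expansion terms stay within the site's conceptual domain, which is the property the abstract relies on for well-bounded collections.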
|
50 |
Automatische Sacherschließung an der ZBW / Automatic subject indexing at the ZBW
Groß, Thomas (06 January 2012)
By implementing an automatic subject indexing process, the ZBW aims both to respond to the steady growth in online documents and to break new ground in content indexing. Besides relieving intellectual indexing through a semi-automatic or fully automatic process, it should also become possible to index digital information resources of any kind from outside the ZBW by machine and to make them findable in a shared search space. In the current project, the vocabularies used at the ZBW (verbal subject indexing with the Standard-Thesaurus Wirtschaft, and classificatory indexing with the Standardklassifikation Wirtschaft) are being adapted, trained and evaluated for the automatic process. The talk focuses on the ZBW's experience with the organizational implementation of automatic subject indexing and on the options for evaluating these processes.
|