661 |
Power Lines - Wasteland or Biodiversity Hotspots? / Kraftledningsgator - Biologisk öken eller mångfald? Norström Paananen, Marcus; Boström, Magnus; Ahlgren, Christian. January 2008
<p>As a consequence of the intensification of various forms of human resource use, the Swedish rural landscape has changed radically over the past 200 years, from a varied and heterogeneous landscape to a more monotonous, homogenised one. This has led to fragmentation of habitats for many of the species of the traditional agricultural landscape. Power line corridors might harbour habitats that resemble some of the now lost or fragmented habitats (e.g. grazed forest land and certain types of meadow), and could serve as refuges and/or dispersal corridors for these species.</p>
<p>In a pitfall trap survey in Köping and Strängnäs municipalities in Mälardalen, the occurrence and abundance of ground-living invertebrates were investigated in power line corridors, adjoining forest and pastures. Comparisons were made between these habitat types, and between positions within the power line corridors (central and distal parts) and the nearby forest. The comparisons included analyses of species richness (or rather the number of taxa) and several biodiversity indices, as well as analyses of similarity in taxon occurrence and abundance using a similarity index. Separate analyses were made for different taxonomic groups (e.g. all taxa, insects only, arachnids only). The number of replicates allowed statistical testing of patterns in the number of taxa and in the biodiversity indices.</p>
<p>No significant differences were documented, either between the habitat types or between positions in the power line corridors and the nearby forest. Nor was there any consistent (non-significant) pattern pointing in that direction. We interpret these results as indicating that, with regard to number of taxa and biodiversity, power line corridors are <em>not (significantly) poorer</em> habitats than forest or pasture land. Number of taxa and biodiversity indices take no account of <em>which</em> species or taxa are included in the analysis. For example, a habitat with an abundant, species-rich fauna of unwanted species (introduced species, pests, etc.) registers a higher biodiversity index than a habitat with sparse populations of red-listed species worthy of protection. Similarity indices better reflect which species are involved. Although they do not consider the exact identity of the species (or their desirability or conservation value), a high index value indicates that the <em>same</em> species occur in the compared habitats. Because the index used in this study takes abundance into account, a high value also indicates that the numbers of individuals of those species are similar.</p>
<p>Overall, the results showed high similarity between habitat types, suggesting that power line corridors, forest and pasture land largely shared the same taxa at similar abundances. Power line corridors and forest seemed to harbour particularly similar invertebrate faunas, whereas the similarity between power line corridors and pasture land was less pronounced. Thus, in contrast to several previous suggestions, this study indicates that power line corridors do not seem to be low-quality habitats for the ground-living invertebrate taxa studied. We suggest that power line corridors with well-designed clearing and management routines could play an important role in creating species-rich edge zones and habitats resembling meadows or low-intensity grazed pasture land.</p>
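The abstract contrasts diversity indices, which ignore taxon identity, with an abundance-based similarity index, but does not name the indices used. The sketch below assumes Shannon's H' for diversity and Bray-Curtis similarity for the abundance-based comparison (both common choices, neither confirmed by the study); the trap counts are invented for illustration.

```python
import math
from typing import Dict

def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no individuals")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h

def bray_curtis_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Abundance-based similarity 1 - sum|a_i - b_i| / sum(a_i + b_i).

    1.0 means identical taxa at identical abundances; 0.0 means no shared taxa.
    """
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - diff / total

# Hypothetical trap catches (taxon -> individuals), not data from the study.
corridor = {"Carabidae": 12, "Lycosidae": 8, "Formicidae": 20}
forest = {"Carabidae": 10, "Lycosidae": 9, "Formicidae": 18}
pasture = {"Carabidae": 3, "Lycosidae": 15, "Staphylinidae": 7}

print(round(bray_curtis_similarity(corridor, forest), 3))   # 0.935
print(round(bray_curtis_similarity(corridor, pasture), 3))  # 0.338
```

Two samples with entirely different taxa but identical count structure, such as `{"A": 5, "B": 5}` and `{"C": 5, "D": 5}`, get the same Shannon value (ln 2) but a Bray-Curtis similarity of 0.0, which is the limitation of diversity indices that the abstract points out.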
|
663 |
An Approach for Identifying Application Domains in a Digital Convergence Environment / Uma abordagem para identificação de domínios de aplicação em ambiente de convergência digital. Venceslau, Amanda Drielly Pires. 23 July 2013
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The emergence of Interactive Digital Television has brought, besides gains in quality and optimization of transmission, new features and services for the user. With the advent of digital convergence between the TV and Web platforms, new proposals for the semantic organization of content are being developed. Moreover, it has become possible to introduce Semantic Web and knowledge-representation concepts that allow content metadata to be described semantically through ontologies. In this context, this work proposes an approach for identifying application domains in a digital convergence environment, based on Semantic Web concepts and on lexical and semantic similarity analysis. A component integrated with the Knowledge TV platform was implemented to validate the approach.
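The abstract combines lexical and semantic similarity analysis without specifying the measures. As a minimal sketch of the lexical side only, assuming normalized Levenshtein edit distance as the lexical measure, a term can be matched against candidate domain vocabularies. The vocabularies and the `best_domain` helper are hypothetical and are not part of the Knowledge TV platform; ontology-based semantic similarity is not shown.

```python
from typing import Dict, List

def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]

def lexical_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]: 1 - distance / length of the longer string."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def best_domain(term: str, domains: Dict[str, List[str]]) -> str:
    """Hypothetical helper: the domain whose vocabulary holds the lexically closest word."""
    return max(domains, key=lambda d: max(lexical_similarity(term, w) for w in domains[d]))

# Hypothetical domain vocabularies, not taken from the Knowledge TV platform.
domains = {
    "sports": ["football", "match", "league"],
    "news": ["politics", "election", "report"],
}
print(best_domain("Footbal", domains))  # sports
```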
|
664 |
Visipedia - Multi-Dimensional Object Embedding Based on Perceptual Similarity. Matera, Tomáš. January 2014
Problems such as fine-grained categorization and human computation have become increasingly popular in the community in recent years, as evidenced by the considerable number of publications on these topics. While most of this work uses "classical" image features extracted by computer, this thesis focuses primarily on perceptual properties that cannot easily be captured by computers and that require people to be involved in the data-collection process. The thesis explores ways of obtaining perceptual similarities from users cheaply and efficiently, also with respect to scalability. It further evaluates several relevant experiments and presents methods that improve the efficiency of data collection. Methods for learning a multidimensional index and for searching this space are also summarized and compared. The results are then used in a comprehensive experiment evaluated on a dataset of food images. The procedure begins by collecting similarities from users, continues by building a multidimensional space of dishes, and ends by searching this space.
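Perceptual similarities are often collected from people as relative judgments of the form "item A is more similar to B than to C". The abstract does not state the exact protocol or embedding method, so the sketch below assumes triplet judgments and an already-learned 2-D embedding with invented coordinates. It shows how agreement between the embedding and the judgments can be measured and how the space can be searched by nearest neighbours; learning the embedding itself (for example with stochastic triplet embedding) is not shown, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embedding = Dict[str, Sequence[float]]

def triplet_agreement(emb: Embedding, triplets: List[Tuple[str, str, str]]) -> float:
    """Share of human judgments (anchor, closer, farther) that the embedding reproduces."""
    kept = sum(1 for a, b, c in triplets
               if math.dist(emb[a], emb[b]) < math.dist(emb[a], emb[c]))
    return kept / len(triplets)

def nearest(emb: Embedding, query: str, k: int) -> List[str]:
    """Search the embedding: the k items closest to `query` (Euclidean), excluding itself."""
    others = [name for name in emb if name != query]
    others.sort(key=lambda name: math.dist(emb[query], emb[name]))
    return others[:k]

# Hypothetical 2-D embedding of dishes and crowd judgments.
emb = {"pizza": (0.0, 0.0), "lasagne": (0.5, 0.2),
       "sushi": (4.0, 1.0), "sashimi": (4.2, 1.3)}
triplets = [("pizza", "lasagne", "sushi"),   # "pizza is closer to lasagne than to sushi"
            ("sushi", "sashimi", "pizza"),
            ("pizza", "sushi", "lasagne")]   # a judgment this embedding violates
print(triplet_agreement(emb, triplets))  # 2 of 3 judgments reproduced
print(nearest(emb, "sushi", 1))          # ['sashimi']
```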
|
665 |
Improving customer support efficiency through decision support powered by machine learning. Boman, Simon. January 2023
More and more aspects of today’s healthcare are becoming integrated with medical technology and dependent on medical IT systems, which puts stricter requirements on the companies delivering these solutions. As a result, companies delivering medical technology solutions need to spend considerable resources on maintaining high-quality, responsive customer support. In this report, possible ways of increasing customer support efficiency using machine learning and NLP are examined at Sectra, a medical technology company. This is done through a qualitative case study in which empirical data collection methods are used to elicit requirements and to find ways of adding decision support. Next, a prototype is built featuring a ticket recommendation system powered by GPT-3 and based on 65 000 available support tickets, which is integrated into the customer support workflow. Lastly, the prototype is evaluated by having six end users test it for five weeks, followed by a qualitative evaluation consisting of interviews and a quantitative measurement of its user-perceived usability. The results provide some support for using machine learning to create decision support in a customer support context: all six test users believed that the prototype could improve their long-term efficiency by reducing the average ticket resolution time. However, one of the six test users expressed some skepticism about the relevance of the recommendations generated by the system, indicating that the model needs improvement. The study also indicates that state-of-the-art NLP models for semantic textual similarity may outperform keyword search.
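The abstract reports that semantic textual similarity may outperform keyword search for recommending similar tickets. The sketch below contrasts a keyword baseline (Jaccard word overlap) with cosine ranking over text embeddings. It does not call GPT-3 or any other model; the three-dimensional vectors and ticket IDs are made-up stand-ins for real embeddings of ticket text.

```python
import math
from typing import Dict, List, Sequence

def keyword_score(query: str, ticket: str) -> float:
    """Keyword-search baseline: Jaccard overlap of the lower-cased word sets."""
    q, t = set(query.lower().split()), set(ticket.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def recommend(query_vec: Sequence[float],
              ticket_vecs: Dict[str, Sequence[float]], k: int = 3) -> List[str]:
    """Rank past tickets by cosine similarity of their precomputed text embeddings."""
    ranked = sorted(ticket_vecs, key=lambda tid: cosine(query_vec, ticket_vecs[tid]),
                    reverse=True)
    return ranked[:k]

# Made-up 3-D vectors standing in for real sentence embeddings of each text.
query_text = "viewer freezes when opening study"
query_vec = (0.9, 0.1, 0.0)
tickets = {
    "T-1": ("application hangs on loading images", (0.8, 0.2, 0.1)),
    "T-2": ("license renewal question", (0.0, 0.1, 0.9)),
    "T-3": ("viewer colour settings", (0.3, 0.9, 0.0)),
}
vecs = {tid: vec for tid, (_, vec) in tickets.items()}
print({tid: round(keyword_score(query_text, text), 2) for tid, (text, _) in tickets.items()})
# {'T-1': 0.0, 'T-2': 0.0, 'T-3': 0.14}
print(recommend(query_vec, vecs))  # ['T-1', 'T-3', 'T-2']
```

Here the keyword baseline scores T-1 at 0.0 and prefers T-3 only because it shares the word "viewer", whereas the embedding ranking puts T-1 first, illustrating why semantic similarity can help when users describe the same problem in different words.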
|
666 |
Help Document Recommendation System. Vijay Kumar, Keerthi; Mary Stanly, Pinky. January 2023
Help documents are important for organizations that use technology applications licensed from a vendor. Customers and internal employees frequently use the help documents section to learn how to use the applications and to find out about new features and developments in them. Help documents consist of various knowledge-base materials, question-and-answer documents and help content. In day-to-day use, customers go through these documents to set up, install or use the product. Recommending similar documents to customers can increase their engagement with the product and help them proceed without obstacles. The main aim of this study is to build a recommendation system, exploring different machine-learning techniques, that recommends the most relevant and similar help documents to the user. To achieve this, a hybrid recommendation system for help documents is proposed, in which documents are recommended based on content similarity using content-based filtering and on similarity between users using collaborative filtering. Finally, the recommendations from content-based filtering and collaborative filtering are combined and ranked to form a comprehensive list of recommendations. The proposed approach is evaluated by the company's internal employees and by external users. Our experimental results demonstrate that the proposed approach is feasible and provides an effective way to recommend help documents.
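The hybrid scheme described above can be sketched as follows, assuming bag-of-words cosine similarity for content-based filtering, user-based collaborative filtering over view histories with Jaccard similarity between users, and a linear blend with weight `alpha`. The abstract does not say how the two lists are combined and ranked, so `alpha`, the rescaling of collaborative scores and all document and user names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Set

def cosine_bow(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def content_scores(seed: str, docs: Dict[str, str]) -> Dict[str, float]:
    """Content-based filtering: similarity of every other document to the seed document."""
    seed_bow = Counter(docs[seed].lower().split())
    return {d: cosine_bow(seed_bow, Counter(text.lower().split()))
            for d, text in docs.items() if d != seed}

def collaborative_scores(user: str, views: Dict[str, Set[str]]) -> Dict[str, float]:
    """User-based collaborative filtering: unseen documents scored by the
    Jaccard similarity of the other users who viewed them."""
    mine = views[user]
    scores: Dict[str, float] = {}
    for other, theirs in views.items():
        if other == user:
            continue
        union = mine | theirs
        sim = len(mine & theirs) / len(union) if union else 0.0
        for doc in theirs - mine:
            scores[doc] = scores.get(doc, 0.0) + sim
    return scores

def hybrid(user: str, seed: str, docs: Dict[str, str], views: Dict[str, Set[str]],
           alpha: float = 0.5, k: int = 3) -> List[str]:
    """Blend both score lists (alpha = weight of content-based scores) and rank."""
    cb = content_scores(seed, docs)
    cf = collaborative_scores(user, views)
    top = max(cf.values(), default=0.0) or 1.0   # rescale CF scores to [0, 1]
    combined = {d: alpha * cb.get(d, 0.0) + (1 - alpha) * cf.get(d, 0.0) / top
                for d in set(cb) | set(cf) if d not in views[user]}
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Hypothetical help documents and view histories.
docs = {
    "install": "how to install the client on windows",
    "install-mac": "how to install the client on mac",
    "license": "renew your license key",
    "export": "export reports to pdf",
}
views = {"u1": {"install"}, "u2": {"install", "license"}, "u3": {"export"}}
print(hybrid("u1", "install", docs, views))  # ['license', 'install-mac', 'export']
```

With `alpha=0.5`, user `u1` (who viewed `install`) is first recommended `license`, because a similar user viewed it, and then `install-mac`, which is textually close to `install`; setting `alpha=1.0` falls back to pure content-based filtering.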
|
667 |
A comparison between Korean gas market and oil market in the consideration of South Korean gas market reform. Ko, Yeonseok. 23 September 2014
South Korea established a non-competitive natural gas market in order to secure a stable and economical supply of natural gas. Concerns have been raised about the inefficiency of this non-competitive market structure, but reform attempts have failed because of protests. Proponents of the incumbent system argue that gas needs to be supplied by the public sector in a monopolized structure in order to ensure a stable supply of natural gas as an essential good, and to prevent market failures such as exorbitant gas prices and supply shortfalls arising from a natural monopoly. They also argue that unified gas purchasing confers purchasing power. However, the gas industry does not exactly meet the defining characteristics of an essential good or a natural monopoly, and the purchasing-power argument is not widely accepted. Moreover, according to agency theory and property rights theory, the current market and firms are likely to be inefficient, and several events appear to confirm this inefficiency. Nevertheless, people remain unsure about the necessity of gas market reform. Ironically, South Korea takes a different policy and market approach to the oil market despite the similarity of the two fuels. The oil market in South Korea has become an effectively competitive market through liberalization and, contrary to people’s expectations, supplies fuel stably and economically. This thesis contrasts South Korea's different approaches toward two similar hydrocarbon fuels, oil and gas. The competitiveness of the oil market is examined through statistics, the Lerner index, analysis of profit trends in the market, and cross-country price comparisons. The results support the case for South Korean gas market reform, insofar as the oil market has become effectively competitive through liberalization.
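The Lerner index used in the thesis measures market power as the relative mark-up of price over marginal cost, L = (P - MC) / P. A minimal sketch with hypothetical prices follows; for a profit-maximising monopolist the index also equals minus the inverse of the price elasticity of demand. In practice marginal cost is rarely observed directly and must be approximated, and the abstract does not say how the thesis did so.

```python
def lerner_index(price: float, marginal_cost: float) -> float:
    """Lerner index L = (P - MC) / P.

    0 means price equals marginal cost (perfect competition); values closer
    to 1 indicate a larger mark-up and hence more market power.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - marginal_cost) / price

def lerner_from_elasticity(elasticity: float) -> float:
    """For a profit-maximising monopolist, L = -1 / e, where e < 0 is the price elasticity of demand."""
    if elasticity >= 0:
        raise ValueError("demand elasticity must be negative")
    return -1.0 / elasticity

# Hypothetical figures: a retail price of 100 and an estimated marginal cost of 80.
print(lerner_index(100.0, 80.0))  # 0.2
```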
|
668 |
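The Effects of Physical Distance on Attentional Processes Depend on Distractor-Target Similarity: An Event-Related Potential Study / Les effets de la distance physique sur les processus attentionnels sont dépendants de la similarité distracteur-cible : étude à partir des potentiels reliés aux évènements. Aubin, Sébrina. 08 1900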
Visual attention is a cognitive process that improves the limited capacity of the visual system by prioritising the processing of information within the attended region of the visual field. Using the event-related potential (ERP) method, components associated with various cognitive processes can be extracted from electroencephalographic activity. The N2pc, a lateralized component characterised by a negative deflection between 180 and 300 ms post-stimulus at posterior electrodes over the hemisphere contralateral to the attended visual hemifield, reflects processes associated with the deployment of visuospatial attention. Previous studies have identified numerous factors, arising from both bottom-up (low-level) and top-down (high-level) processes, that can modulate this component. The present series of experiments expands our understanding of the role of attention in the processing and representation of items within and between receptive fields in the extrastriate visual cortex. In particular, it shows that attention can eliminate the influence of a distractor that is dissimilar to the target when that distractor lies within the same receptive field as the attended item; when the distractor is similar to the target, however, its influence cannot be eliminated. Additionally, this study identifies the role of early and late high-level target filters in attentional selection.
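The N2pc is conventionally quantified from a difference wave: the ERP at posterior electrodes contralateral to the attended hemifield minus the ERP at ipsilateral electrodes, averaged over a time window. The sketch below assumes the 180 to 300 ms window quoted in the abstract and uses synthetic amplitudes in microvolts; the electrode selection and windows actually used in the experiments are not stated here.

```python
from typing import Sequence, Tuple

def n2pc_mean_amplitude(times_ms: Sequence[float],
                        contra_uv: Sequence[float],
                        ipsi_uv: Sequence[float],
                        window: Tuple[float, float] = (180.0, 300.0)) -> float:
    """Mean contralateral-minus-ipsilateral amplitude within the window (inclusive)."""
    lo, hi = window
    diffs = [c - i for t, c, i in zip(times_ms, contra_uv, ipsi_uv) if lo <= t <= hi]
    if not diffs:
        raise ValueError("no samples fall inside the window")
    return sum(diffs) / len(diffs)

# Synthetic grand-average waveforms (microvolts), one sample every 50 ms.
times = [0, 50, 100, 150, 200, 250, 300, 350]
contra = [0.0, 0.5, 1.0, 0.2, -1.0, -2.0, -1.0, 0.3]
ipsi = [0.0, 0.5, 1.0, 0.2, 0.0, 0.0, 0.0, 0.3]
print(n2pc_mean_amplitude(times, contra, ipsi))  # about -1.33 (negative: an N2pc)
```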
|
669 |
An artificial intelligence approach to concatenative sound synthesis. Mohd Norowi, Noris. January 2013
Technological advances such as increases in processing power, hard disk capacity and network bandwidth have opened up many exciting new techniques for synthesising sounds, one of which is Concatenative Sound Synthesis (CSS). CSS uses a data-driven method to synthesise new sounds from a large corpus of small sound snippets. The technique closely resembles the art of mosaicing, where small tiles are arranged together to create a larger image. Users often specify a ‘target’ sound so that segments in the database that match those of the target can be identified and then concatenated to generate the output sound. Whilst the practicality of CSS currently looks promising, there are still areas to be explored and improved, in particular the algorithm used to find matching segments in the database. One of the main issues in CSS is the basis of similarity, as sound similarity can be based on many perceptual attributes, such as timbre, loudness, rhythm and tempo. An ideal CSS system needs to be able to determine which of these attributes the user has in mind and then synthesise sounds that are similar with respect to that attribute. Failure to communicate the basis of sound similarity between the user and the CSS system generally results in output that does not match the sound the user envisioned. To understand how humans perceive sound similarity, several elements that affect sound similarity judgments were first investigated. Of the four elements tested (timbre, melody, loudness and tempo), the basis of similarity was found to depend on musical training: musicians based similarity judgments on timbral information, whilst non-musicians relied on melodic information.
Thus, for the rest of the study, only features that represent timbral information were included, as musicians are the target users for the findings of this study. Another issue with current CSS systems is limited user control flexibility, in particular during segment matching, where features can be assigned different weights depending on their importance to the search. Typically, in those existing CSS systems that support weighting at all, the weights can only be assigned manually, which is both labour-intensive and time-consuming. This study also identified a further problem: the lack of a mechanism to handle homosonic and equidistant segments. These conditions arise when too few features are compared, causing aurally different sounds to be represented by the same sonic values, or when extracted feature values are rounded off. This study addresses both problems through an extended use of Artificial Intelligence (AI). The Analytic Hierarchy Process (AHP) is employed to enable order-dependent feature selection, allowing a weight to be assigned to each audio feature according to its relative importance. Concatenation distance is used to overcome the issues with homosonic and equidistant sound segments. The inclusion of AI results in a more intelligent system that can better handle tedious tasks and minimize human error, allowing users (composers) to worry less about mundane tasks and to focus more on the creative aspects of music making. In addition, this study aims to enhance user control flexibility in a CSS system and to improve similarity results. The key factors that affect CSS synthesis results were first identified and then included as parametric options that users can control in order to communicate their intended creations to the system.
Comprehensive evaluations were carried out to validate the feasibility and effectiveness of the proposed solutions (timbral-based feature set, AHP and concatenation distance). The final part of the study investigates the relationship between perceived sound similarity and perceived sound interestingness. A new framework that integrates all these solutions, the query-based CSS framework, was then proposed, and ConQuer, the proof of concept of this study, was developed based on it. This study has critically analysed the problems in existing CSS systems; novel solutions have been proposed to overcome them, and their effectiveness has been tested and discussed. These are the main contributions of this study.
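The AHP weighting step described in this abstract can be sketched as follows, assuming a standard reciprocal pairwise comparison matrix (on Saaty's 1 to 9 scale) whose principal eigenvector, approximated here by power iteration, gives the feature weights, and assuming a weighted Euclidean distance for segment matching. The feature names and values are hypothetical, the AHP consistency check is omitted, and the thesis's concatenation distance is not reproduced because the abstract does not define it.

```python
import math
from typing import List, Sequence

def ahp_weights(matrix: Sequence[Sequence[float]], iterations: int = 50) -> List[float]:
    """Approximate the principal eigenvector of a reciprocal pairwise comparison
    matrix by power iteration, normalised so the weights sum to 1.

    matrix[i][j] states how much more important feature i is than feature j,
    with matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w

def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    return math.sqrt(sum(wt * (x - y) ** 2 for x, y, wt in zip(a, b, weights)))

def best_segment(target: Sequence[float], corpus: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> int:
    """Index of the corpus segment whose feature vector is closest to the target unit."""
    return min(range(len(corpus)), key=lambda i: weighted_distance(target, corpus[i], weights))

# Hypothetical timbral features: brightness, roughness, attack.
# Brightness is judged twice as important as roughness and four times as important as attack.
comparisons = [[1.0, 2.0, 4.0],
               [0.5, 1.0, 2.0],
               [0.25, 0.5, 1.0]]
weights = ahp_weights(comparisons)              # approximately [4/7, 2/7, 1/7]
target = (0.5, 0.5, 0.5)
corpus = [(0.5, 0.9, 0.9), (0.9, 0.5, 0.5)]
print(best_segment(target, corpus, weights))    # 0
print(best_segment(target, corpus, [1/3] * 3))  # 1
```

With equal weights the second segment is closer to the target, but with the AHP weights, which rank brightness highest, the first segment wins because it differs only on the less important features.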
|
670 |
Semi-automated co-reference identification in digital humanities collections. Croft, David. January 2014
Locating specific information within museum collections represents a significant challenge for collection users. Even when the collections and catalogues exist in a searchable digital format, formatting differences and the imprecise nature of the information to be searched mean that the same information can be recorded in many different ways. This variation exists not just between different collections, but also within individual ones. Traditional information retrieval techniques are therefore poorly suited to locating particular information in digital humanities collections, and searching takes an excessive amount of time and resources. This thesis focuses on a particular search problem, that of co-reference identification: the process of identifying when the same real-world item is recorded in multiple digital locations. A real-world example of a co-reference identification problem for digital humanities collections is identified and explored, in particular the time-consuming nature of identifying co-referent records. To address this problem, the thesis presents a novel method for co-reference identification between digitised records in humanities collections. Whilst the specific focus is co-reference identification, elements of the method also have applications in general information retrieval. The new method draws on a broad range of areas, including query expansion, co-reference identification, short text semantic similarity and fuzzy logic. It was tested against real-world collections information; the results suggest that, in terms of the quality of the co-referent matches found, the new method is at least as effective as a manual search, while the number of co-referent matches found is higher.
The approach presented here is capable of searching collections stored using differing metadata schemas. More significantly, it can identify potential co-reference matches despite the highly heterogeneous and syntax-independent nature of the Gallery, Library, Archive and Museum (GLAM) search space, and of the photo-history domain in particular. The most significant benefit of the new method, however, is that it requires comparatively little manual intervention; a co-reference search using it therefore has significantly lower person-hour requirements than a manually conducted search. In addition to the overall co-reference identification method, this thesis also presents:
• A novel and computationally lightweight short text semantic similarity metric, which has a significantly higher throughput than the current prominent techniques with only a negligible drop in accuracy.
• A novel method for comparing photographic processes in the presence of variable terminology and inaccurate field information. This is the first computational approach to do so.
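The thesis's own short text similarity metric is not described in the abstract, so the sketch below is not that metric. It is a generic illustration, under stated assumptions, of how query expansion with a synonym table lets differently worded catalogue entries match: tokens are expanded with hypothetical variant spellings and synonyms, and the expanded sets are compared with the Dice coefficient. The fuzzy-logic component is not shown.

```python
from typing import Dict, Set

def expand(tokens: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Query expansion: add every listed synonym or variant of each token."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= synonyms.get(token, set())
    return expanded

def short_text_similarity(a: str, b: str, synonyms: Dict[str, Set[str]]) -> float:
    """Dice coefficient between the synonym-expanded token sets of two short strings."""
    ta = expand(set(a.lower().split()), synonyms)
    tb = expand(set(b.lower().split()), synonyms)
    if not ta and not tb:
        return 1.0  # two empty fields are treated as identical
    return 2 * len(ta & tb) / (len(ta) + len(tb))

# Hypothetical variant-terminology table for photographic processes.
synonyms = {
    "albumen": {"albumin"}, "albumin": {"albumen"},
    "print": {"photograph"}, "photograph": {"print"},
}
print(short_text_similarity("Albumen print", "albumin photograph", synonyms))  # 1.0
print(short_text_similarity("Albumen print", "albumin photograph", {}))        # 0.0
```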
|