1 |
Semantically-enhanced image tagging system
Rahuma, Awatef, January 2013
Multimedia databases store images, audio, video, text and other media. Research interest in these databases has grown over the last decade or so, especially with the advent of the Internet and the Semantic Web. Fundamental research issues range from unified data modelling and retrieval of data items to the dynamic nature of updates. The thesis builds on findings in Semantic Web and retrieval techniques and explores novel tagging methods for identifying data items. Tagging systems, which let users add tags to Internet resources such as images, video and audio to make them more manageable, have become popular. Collaborative tagging is concerned with the relationship between people and resources. Most of these resources have metadata in a machine-processable format and let users search with free-text keywords (so-called tags). This research references several tagging systems, e.g. Flickr, Delicious and MyWeb 2.0. The limitations of such techniques include polysemy (one word with different meanings), synonymy (different words with one meaning), different lexical forms (singular, plural and conjugated words) and misspellings or alternate spellings. The work presented in this thesis introduces a semantic characterization of web resources that describes the structure and organization of tagging, aiming to extend the existing Multimedia Query using similarity measures to cater for collaborative tagging. In addition, we discuss the semantic difficulties of tagging systems and suggest improvements to their accuracy. The scope of our work is as follows: (i) increase the accuracy and confidence of multimedia tagging systems; (ii) improve the similarity measures for images by integrating a variety of measures. To address the first shortcoming, we use WordNet as a semantic lingual ontology resource in a tagging system for social sharing and retrieval of images.
For the second shortcoming, we apply the similarity measures in different ways within the multimedia tagging system. Fundamental to our work is the novel information model that we have constructed for our computation. It is based on the fact that an image is a rich object that can be characterised in n dimensions, each of which contains valuable information that helps increase the accuracy of the search. For example, an image of a tree in a forest contains more information than an image of the same tree in a different environment. In this thesis we characterise a data item (an image) by a primary description followed by n secondary descriptions. As n increases, the accuracy of the search improves. We give various techniques to analyse data and its associated query. To increase the accuracy of the tagging system, we performed experiments on many images using similarity measures and various techniques from VoI (Value of Information). The findings show the linkage and integration between similarity measures, and that VoI improves searches and guides a tagger in choosing the most adequate tags.
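The idea of scoring an image by a primary description plus n secondary descriptions can be illustrated with a minimal Python sketch. This is not the thesis's information model: the Jaccard measure, the crude plural-stripping rule, the `primary_weight` value and all function names are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


def normalize(tag: str) -> str:
    """Lower-case a tag and strip a trailing plural 's' (a deliberately crude rule)."""
    tag = tag.strip().lower()
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class ImageDescription:
    primary: set                                    # tags for the main subject
    secondary: list = field(default_factory=list)   # one tag set per extra dimension


def describe(primary, *secondary) -> ImageDescription:
    """Build a description from raw tag lists, normalizing every tag."""
    return ImageDescription(
        {normalize(t) for t in primary},
        [{normalize(t) for t in dim} for dim in secondary],
    )


def similarity(query: ImageDescription, image: ImageDescription,
               primary_weight: float = 0.6) -> float:
    """Weighted mix of primary similarity and mean secondary similarity.

    The weight is an illustrative assumption, not a value from the thesis.
    """
    primary_score = jaccard(query.primary, image.primary)
    pairs = list(zip(query.secondary, image.secondary))
    if not pairs:
        return primary_score
    secondary_score = sum(jaccard(q, i) for q, i in pairs) / len(pairs)
    return primary_weight * primary_score + (1 - primary_weight) * secondary_score


query = describe(["Tree"], ["forest"])
forest_tree = describe(["trees"], ["Forest", "green"])
park_tree = describe(["tree"], ["park", "bench"])
print(similarity(query, forest_tree), similarity(query, park_tree))
```

In this toy setup both images match the primary description, and only the secondary dimension separates the tree in the forest from the tree in the park, which is the intuition behind adding more descriptions.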
|
2 |
Design and Implementation of an Optical Tag Reader
Gummalla, Srikanth, 17 April 2009
No description available.
|
3 |
A generic architecture for semantic enhanced tagging systems
Magableh, Murad, January 2011
The Social Web, or Web 2.0, has recently gained popularity because of its low cost and ease of use. Social tagging sites (e.g. Flickr and YouTube) offer new ways for end-users to publish and classify their content (data). Tagging systems contain free keywords (tags) generated by end-users to annotate and categorise data. Lack of semantics is the main drawback of social tagging, owing to the use of unstructured vocabulary. As a result, tagging systems suffer from shortcomings such as low precision, lack of collocation, synonymy, multilinguality and the use of shorthand. Consequently, relevant content is not visible, and thus not retrievable, when searching in tag-based systems. The Semantic Web, so-called Web 3.0, on the other hand provides a rich semantic infrastructure. Ontologies are the key enabling technology for the Semantic Web, and they can be integrated with the Social Web to overcome the lack of semantics in tagging systems. In the work presented in this thesis, we build an architecture to address a number of drawbacks of tagging systems. In particular, we make use of the controlled vocabularies provided by ontologies to improve information retrieval in tag-based systems. Based on the tags provided by the end-users, we introduce the idea of adding "system tags" from semantic as well as social resources. The system tags are comprehensive and wide-ranging in comparison with the limited "user tags", and they are used to fill the gap between the user tags and the search terms used for searching in tag-based systems. We restricted the scope of our work to the following shortcomings of tagging systems:
- the lack of semantic relations between user tags and search terms (e.g. synonymy, hypernymy);
- the lack of translation mediums between user tags and search terms (multilinguality);
- the lack of context to define the emergent shorthand-writing user tags.
To address the first shortcoming, we use the WordNet ontology as a semantic lingual resource from which system tags are extracted. For the second shortcoming, we use the MultiWordNet ontology to recognise the cross-language linkages between different languages. Finally, to address the third shortcoming, we use tag clusters obtained from the Social Web to create a context for defining the meaning of shorthand-writing tags. A prototype of our architecture was implemented. In the prototype system, we built our own database to host videos imported from a real tag-based system (YouTube). The user tags associated with these videos were also imported and stored in the database. For each user tag, our algorithm adds a number of system tags that come either from semantic ontologies (WordNet or MultiWordNet) or from tag clusters imported from the Flickr website. Each system tag added to annotate the imported videos therefore has a relationship with one of the user tags on that video. The relationship is one of the following: synonymy, hypernymy, similar term, related term, translation or clustering relation. To evaluate the suitability of our proposed system tags, we developed an online environment where participants submit search terms and retrieve two groups of videos to be evaluated. Each group is produced from one distinct type of tag: user tags or system tags. The videos in the two groups are drawn from the same database and are evaluated by the same participants in order to obtain a consistent and reliable evaluation. Since user tags are what real tag-based systems currently use for searching, we take their efficiency as the reference against which we compare the efficiency of the new system tags. To compare the relevancy between the search terms and each group of retrieved videos, we applied a statistical test.
According to the Wilcoxon signed-rank test, there was no significant difference between using system tags and using user tags. The findings revealed that using the system tags in the search is as efficient as using the user tags; the two types of tags produce different results, but at the same level of relevance to the submitted search terms.
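The way system tags extend user tags can be sketched in a few lines of Python. The `TOY_LEXICON` dictionary below is a hypothetical stand-in for the WordNet and MultiWordNet lookups described in the abstract, and the function names are illustrative assumptions rather than the prototype's actual code.

```python
# Hypothetical stand-in for ontology lookups: user tag -> {relation: [related terms]}.
TOY_LEXICON = {
    "car": {
        "synonymy": ["automobile"],
        "hypernymy": ["vehicle"],
        "translation": ["macchina"],
    },
    "dog": {
        "synonymy": ["hound"],
        "hypernymy": ["animal"],
    },
}


def system_tags(user_tags, lexicon=TOY_LEXICON):
    """Map each derived system tag to (source user tag, relation).

    Terms that are already user tags are skipped, so every system tag
    records exactly one relationship to one user tag.
    """
    known = {t.lower() for t in user_tags}
    result = {}
    for tag in user_tags:
        source = tag.lower()
        for relation, terms in lexicon.get(source, {}).items():
            for term in terms:
                if term not in known and term not in result:
                    result[term] = (source, relation)
    return result


def match(search_term, user_tags, lexicon=TOY_LEXICON):
    """Report whether a search term hits a user tag, a system tag, or nothing."""
    term = search_term.lower()
    if term in {t.lower() for t in user_tags}:
        return "user tag"
    if term in system_tags(user_tags, lexicon):
        return "system tag"
    return None


video_tags = ["Car"]
print(system_tags(video_tags))
print(match("vehicle", video_tags))
```

A search for "vehicle" would miss a video tagged only "Car" under plain user-tag matching, while the system tags bridge the gap through the hypernymy relation.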
|
4 |
在社會網路上透過Tag-Thesaurus模型達到有效的資源彙整 / Resource Aggregation via Tag-Thesaurus model on Social Web
宋昆銘, Unknown Date
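We borrow the thesaurus model from the natural-language field as the basis for relating vocabulary terms, and successively add the folksonomy concept, the collection of Social Network Service indices and a domain-specific ontology to build the Tag-Thesaurus model, which addresses the insufficient resource-aggregation ability of ordinary tagging systems. First, we select an initial vocabulary for the experimental domain and use it to build the Tag-Thesaurus model. Next, we release this prepared vocabulary into the tagging system of a social network service platform, collect users' flat classification information about resources through that tagging system, and use it to extend the Tag-Thesaurus model continuously. With such a model we can obtain better resource aggregation. Adding the domain-specific ontology strengthens top-down resource aggregation, while other information in the Social Network Service, such as FOAF [16] or personal preferences, improves personalized resource aggregation. This combination is not only a demonstration of applying ontologies; we also hope that, through this hybrid model, the Web 2.0 idea of broadly gathering collective intelligence can become a bridge into the Semantic Web. / We aggregate various resources through the Tag-Thesaurus model. The model has three parts: the folksonomy formal model, index collection on Social Network Services, and a lightweight domain-specific ontology. The folksonomy model reconstructs relationships between tags, so that resources can be aggregated by tag. The indices collected from Social Network Services help decide which resources are more important. Finally, the lightweight domain-specific ontology provides a standard interface for describing the relationships between tags.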
|
5 |
Coévolution d'organisations sociales et spatiales dans les systèmes multi-agents : application aux systèmes de tagging collaboratifs / Coevolution of social and spatial organizations in multi-agent systems : application to collaborative tagging systems
Rupert, Maya, 02 September 2009
The evolution of the Web and its applications has, over the last few years, shifted towards technologies that include the social dimension as a first-class entity. In the evolution from Web 1.0 to Web 2.0, Web 3.0 and eventually Web 4.0, users, their interactions and the emerging social networks are at the center. The Web also exhibits all the characteristics of a complex system. These complex-system properties and this social dimension must be taken into consideration in the design and development of new web applications. Consider collaborative tagging systems, which are an example of complex, self-organized and socially aware systems. The multi-agent systems paradigm, coordinated by self-organization mechanisms, has been used effectively for the design and modeling of complex systems. Current collaborative tagging systems do not take full advantage of the characteristics of complex systems, especially in adapting to their environment and in the emergence of new features. In this thesis, we propose a model for the design and development of a new collaborative tagging system, MySURF (My Similar Users, Resources, Folksonomies), using a multi-agent approach governed by the coevolution of the social and spatial organizations of the agents. We show how the proposed system offers several new features that can improve current collaborative tagging systems.
|
6 |
Slowing down to speed up : protecting users against massive attacks in content distribution systems / Atrasar para aprimorar : protegendo usuários contra ataques massivos em sistemas de distribuição de conteúdo
Santos, Flávio Roberto, January 2013
The Internet has become a large platform where users can interact and share personal files or third-party productions. Given the increasing demand for efficient content sharing, modern and robust content distribution systems (CDS) need to be deployed and maintained. In the context of this thesis, CDS are defined as systems used for sharing any kind of content on the Internet. Two categories of CDS stand out as the most popular: file sharing and streaming systems. Peer-to-peer (P2P) architectures have emerged as a potential solution to improve content dissemination in CDS. The popularization of P2P architectures in the context of CDS motivated the scientific community to investigate some challenging problems, such as network topology optimization, bootstrap mechanisms and service discovery.
One challenge of particular interest to this thesis concerns mechanisms for matching users' preferences to the published content. This is important to guarantee a good quality of experience (QoE) to users when searching for content, since imprecise descriptions are likely to arise from differing user opinions or from malicious behavior. Substantial research has been carried out to fight content pollution in CDS. Proposed approaches build a knowledge base about polluters and polluted content in order to identify and isolate suspicious content after publication. However, the reaction time until content is considered polluted is considerably long, which allows pollution to spread widely. Furthermore, some previous approaches classify content as either polluted or not, without taking into account the inherent subjectivity in evaluating shared content.
The main objective of this thesis is to devise a mechanism that provides users with a good QoE, by acting proactively in the early stages of the content distribution life cycle, and reduces the effect of malicious interference. To achieve that, three main steps guided the research work. First, we proposed a novel strategy that operates conservatively to contain the dissemination of pollution. Second, we extended our solution to cope with the subjectivity of content descriptions. Third, we addressed the pollution attack as a massive attack. To evaluate our solution, a set of experiments was carried out using both real tests and simulations. Results showed the importance of adopting security measures to mitigate malicious behavior in CDS. In the absence of countermeasure mechanisms, even a small proportion (10%) of attackers was able to subvert the system. The instantiation of the conservative strategy proposed in this thesis demonstrated that delaying users is effective in circumventing massive attacks.
|
9 |
世界城市的概念輪廓與連結:以Flickr Tags為例 / The World Cities Concept Profiling and Concatenation: A Case Study on Flickr Tags
曹期鈞 (Tsao, Chi Chun), Unknown Date
Flickr web albums emerged in an information age of flourishing social networks and steadily rising Internet bandwidth and speed. Flickr provides many APIs through which users and interested researchers can observe changes in social networks via the data Flickr collects and the topics it covers.
A social network consists of nodes and the links between them, and common network models fall into one-mode and two-mode structures. This study adopts a two-mode network, which contains two kinds of nodes (here, two cities and their tags), as its basic framework, and uses it to present a tag analysis method that links the tags collected and assigned by Flickr users to concept profiles of world cities. By extracting city semantics from the tags assigned to photos on Flickr and by handling part-of-speech (POS) tagging, stemming and noise removal, the method derives a concept profile for each city from the resulting tag rankings.
In addition, the study compiles the top 100 tags for each of 41 cities from Flickr tag data, then selects the top 10 tags for each city and groups them with related cities for comparison. A tag co-occurrence index is used to compute each tag's association with a city and to estimate how often tags co-occur for two cities, which reveals the tag combinations that connect one city to another. Finally, the experimental results are compared with tag clouds analysed and exported from the Popular Tags of the Flickr website itself; the 85% agreement supports the reliability of the approach.
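The ranking and co-occurrence steps can be sketched as follows. This is a minimal Python illustration under stated assumptions: the photo tag lists are invented, the normalization is only lower-casing, and "co-occurrence" is simplified to the set of tags two city profiles share, which may differ from the index the study actually uses.

```python
from collections import Counter


def top_tags(photos, k):
    """Return the k most frequent tags over a city's photos.

    photos is a list of tag lists, one per photo. Each tag counts once per
    photo, and ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for tags in photos:
        counts.update({t.strip().lower() for t in tags})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def shared_tags(profile_a, profile_b):
    """Tags that appear in both city profiles (a simplified co-occurrence view)."""
    return sorted(set(profile_a) & set(profile_b))


# Invented sample data, not taken from the study.
paris = [
    ["Eiffel", "tower", "night"],
    ["museum", "Louvre"],
    ["tower", "river"],
    ["river", "museum", "night"],
]
london = [
    ["tower", "bridge", "river"],
    ["museum", "river"],
    ["Bridge", "night"],
]

paris_profile = top_tags(paris, 4)
london_profile = top_tags(london, 4)
print(paris_profile, london_profile, shared_tags(paris_profile, london_profile))
```

In the study the same kind of ranking would run with k = 100 and then k = 10 over 41 cities; the small k here only keeps the example readable.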
|