811

話題製作人: 從調適性新聞看使用者創新模式 / Topic News Producer : From Adaptive News of User Innovation Model

方效慈, Fang, Hsiao Tzu Unknown Date (has links)
創新於本世紀中為各大企業爭相研發或追求的珍貴資產,但此珍貴資產之所以致能領先市場、使企業居於永續的市場領導地位則是各個產業所致力研究的範疇。如何能達到永續的市場領導而居於市場不敗之優勢?是須要透過具有符合市場劇變的創新?而其創新又須要具有如何的特性或是與時代性相關的演化性?另,市場劇變的因素是否又與使用者以及代表新科技的載體有著密不可分的關係?本研究將以一個現今仍居領導地位的網站實例來分析其致勝的原因,並且以質性研究的方法進行層層的剖析,揭開網路服務的神秘面紗。 網路服務的世界,豐富且多樣化,因此本研究將創新的研究範圍聚焦於新聞自製性內容的「話題新聞」為主軸,透過「話題新聞」的守門人:話題製作人,並且對應於創新三大構面:內容、載體與商業模式,進而探討其不斷創新是須要具有調適性的特質。而若以創新理論的精神為內涵,又同時須結合網路之「使用者為大」的雙重考量下,研究則以Henry教授的開放式創新與von Hippel教授的使用者創新作為核心理論的依據。至此,本篇研究的架構清晰易見。 經由親身參與的田野調查與近期的資料蒐集,本研究將透過網路服務的特性與使用者溝通的互動方式,整理後發現其特殊的共創過程與調適性的特質。研究中更將揭露自製性內容的共創對於載體的影響,並且對應於資訊傳播模式的演化,爾後延伸至商業模式的多所變化。最終,希望透過本研究能提供台灣傳統新聞媒體及企業界創新的具體參考,不僅在理論面向得以印證及延伸,更於實務面提供操作的執行方向。 關鍵字:使用者創新,開放式創新:內容、載體與商業模式,調適性,新聞製作,資訊傳播,經濟活動,話題新聞,製作人,行動研究,質性研究。 / Innovation is a precious asset that companies in this century seek, study and develop, as it is the way to win or sustain the market leadership that enterprises are eager to reach. However, how can innovation be continued so as to retain that leadership? Which attributes or time-related evolutionary qualities must such innovation possess in order to react to market change? And how do the factors of market transformation correlate with users and with the high-tech (electronic) devices that carry new services? This study uses a website that still holds a leading position today as its case, and explores the elements of its success through a qualitative research method. The internet offers diversified services; the research therefore selects self-produced news content, the “Topic News”, as its main subject. By observing the daily operation of the “Topic News” gatekeepers, the Topic News producers, and analyzing three dimensions of innovation: content, devices and business models, the study finds that constant innovation requires the characteristic of Adaptability. Furthermore, considering both the essence of innovation theories and the “user-first” nature of the internet, the study takes Open Innovation by Prof. Henry Chesbrough and User Innovation by Prof. von Hippel as the core theories supporting the whole research. Through participant field investigation and the collection of recent data, the study identifies a particular co-creation process and the characteristic of Adaptability in the way the internet service communicates and interacts with its users. The study further reveals the impact that the co-produced content has on the devices themselves, on the evolution of information dissemination, and eventually on the many changes in business models. The ultimate objective of the study is not only to verify and extend the theories but also to offer traditional news media in Taiwan and the business community concrete references for innovation in terms of execution and operation. Keywords: User Innovation; Open Innovation: Content, Devices and Business Model; Adaptability; News Re-production; Dissemination of Information; Economic Activity; Topic News; Topic News Producer; Action Research; Qualitative Research.
812

語言與手勢之溝通動態性 / Communicative Dynamism in Language and Co-speech Gesture

楊婉君, Yang, Wan Chun Unknown Date (has links)
本研究探討溝通動態與手勢之關係。本研究相較於過去的研究提供了中文口語語料的量化結果,除了檢視語言上的溝通動態值與手勢出現、手勢類型的關係以外,並加入手勢特徵進而檢視。語言上的溝通動態值分別以「資訊狀態」與「主題連續性」來決定溝通動態值之高低,並且將手勢所伴隨的語詞依詞性分為名詞與動詞,檢視手勢是否會依循語碼原則(the coding principle):當溝通動態值越高,所使用的代碼材料便越多;當溝通動態值越低,所使用的代碼材料越少。結果發現,以決定溝通動態值的標準來看,資訊狀態較主題連續性更能依語碼原則反映溝通動態值。原因是資訊狀態反映訊息的新舊差異性,而主題連續性反映的是舊訊息當中不同的「舊」的程度差異,因此前者較能反映溝通動態與手勢之關係;而以手勢伴隨的語詞而言,動詞較名詞更能依語碼原則反映溝通動態值。因為動詞相較於名詞而言,在語言上無完整的語碼系統以反映溝通動態值之高低,因此倚賴手勢出現與手勢特徵來反映語言上的溝通動態值之高低。 / The study investigates the correlation between communicative dynamism (CD) and gesture. Different from previous studies, the present study provides a quantitative analysis based on Chinese conversational data. The study examines the correlation between CD in language and the occurrence of gestures, gestural types and gestural features. The various degrees of CD are determined by two separate criteria, namely “information status” and “topic continuity”. Moreover, the study also distinguishes between the nominal affiliates of gesture and the verbal counterparts. The study found that gestures occur at the two extremities of CD. Gestures tend to co-occur with linguistic elements bearing the highest or the lowest CD. In addition, based on the criterion of “information status”, stroke duration and handedness were found to reflect the various degrees of CD. On the other hand, based on the criterion of “topic continuity”, all gestural features including stroke duration, gestural space, handedness and stroke frequency have no correlation with CD.
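The quantitative comparison described above lends itself to a small illustration. The following sketch is purely hypothetical: the token table, column names, and CD levels are invented rather than taken from the thesis data; it only shows how gesture rates could be compared across CD levels and across nominal versus verbal affiliates.

```python
# Hypothetical illustration only: the tokens, column names, and CD levels below
# are invented, not taken from the thesis data. The idea is to compare gesture
# rates across levels of communicative dynamism (CD), overall and by word class.
import pandas as pd

tokens = pd.DataFrame({
    "cd_level": ["high", "high", "low", "low", "mid", "high", "low", "mid"],
    "gesture":  [True,   True,   False, True,  False, True,   False, False],
    "pos":      ["V",    "N",    "N",   "V",   "N",   "V",    "N",   "V"],
})

# Proportion of tokens accompanied by a gesture, per CD level.
print(tokens.groupby("cd_level")["gesture"].mean())

# The same split by part of speech (nominal vs. verbal affiliates).
print(tokens.groupby(["pos", "cd_level"])["gesture"].mean())
```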
813

Multi-Agent User-Centric Specialization and Collaboration for Information Retrieval

Mooman, Abdelniser January 2012 (has links)
The amount of information on the World Wide Web (WWW) is growing rapidly in both pace and topic diversity. This has made it increasingly difficult, and often frustrating, for information seekers to retrieve the content they are looking for, as information retrieval systems (e.g., search engines) are unable to decipher the relevance of the retrieved information as it pertains to the information they are searching for. This issue can be decomposed into two aspects: 1) variability of information relevance as it pertains to an information seeker. In other words, different information seekers may enter the same search text, or keywords, but expect completely different results. It is therefore imperative that information retrieval systems possess an ability to incorporate a model of the information seeker in order to estimate the relevance and context of use of information before presenting results. Of course, in this context, by a model we mean the capture of trends in the information seeker's search behaviour. This is what many researchers refer to as personalized search. 2) Information diversity. Information available on the World Wide Web today spans multitudes of inherently overlapping topics, and it is difficult for any information retrieval system to decide effectively on the relevance of the information retrieved in response to an information seeker's query. For example, an information seeker who wishes to use the WWW to learn about a cure for a certain illness would receive a more relevant answer if the search engine were optimized for such topic domains. This is what is referred to in the WWW nomenclature as a 'specialized search'. This thesis maintains that the information seeker's search is not intended to be completely random and therefore tends to portray itself as consistent patterns of behaviour. Nonetheless, this behaviour, despite being consistent, can be quite complex to capture. To accomplish this goal, the thesis proposes a Multi-Agent Personalized Information Retrieval with Specialization Ontology (MAPIRSO). MAPIRSO offers a complete learning framework that is able to model the end user's search behaviour and interests and to organize information into categorized domains so as to ensure maximum relevance of its responses as they pertain to the end user's queries. Specialization and personalization are accomplished using a group of collaborative agents. Each agent employs a Reinforcement Learning (RL) strategy to capture the end user's behaviour and interests. Reinforcement learning allows the agents to evolve their knowledge of the end user's behaviour and interests as they function to serve him or her. Furthermore, RL allows each agent to adapt to changes in an end user's behaviour and interests. Specialization is the process by which new information domains are created based on existing information topics, allowing new kinds of content to be built exclusively for information seekers. One of the key characteristics of specialization domains is that they are seeker-centric, which allows intelligent agents to create new information based on the information seekers' feedback and their behaviours. Specialized domains are created by intelligent agents that collect information from a specific domain topic. The task of these specialized agents is to map the user's query to a repository of specific domains in order to present users with relevant information.
As a result, mapping users' queries to only relevant information is one of the fundamental challenges in Artificial Intelligence (AI) and machine learning research. Our approach employs intelligent cooperative agents that specialize in building personalized ontology information domains that pertain to each information seeker's specific needs. Specializing and categorizing information into unique domains is one of the challenge areas that has been addressed, and various proposed solutions were evaluated and adopted to deal with the growing volume of information. However, categorizing information into unique domains does not satisfy each individual information seeker. Information seekers might search for similar topics, but each would have different interests. For example, medical information in a specific medical domain has different importance to doctors and to patients. The thesis presents a novel solution to this growing and diverse information by building seeker-centric specialized information domains that are personalized through the information seekers' feedback and behaviours. To address this challenge, the research examines the fundamental components that constitute the specialized agent: an intelligent machine learning system, user input queries, an intelligent agent, and information resources constructed through specialized domains. Experimental work is reported to demonstrate the efficiency of the proposed solution in addressing the growth of overlapping information. The experimental work utilizes extensive user-centric specialized domain topics. This work employs personalized and collaborative multi-agent learning and ontology techniques, thereby enriching the user's queries and domains. Experiments and results have shown that building specialized ontology domains pertinent to the information seekers' needs is more precise and efficient than other information retrieval applications and existing search engines.
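The abstract describes reinforcement-learning agents that track a seeker's interests and serve specialized domains, but gives no implementation detail; the sketch below is only a rough, invented illustration of that idea (the SpecializedAgent class, its update rule, and the toy data are assumptions, not the MAPIRSO design).

```python
# Rough sketch of the idea of a reinforcement-learning, seeker-centric agent.
# This is NOT the MAPIRSO implementation: the class, the update rule, and the
# toy data are invented for illustration.
from collections import defaultdict


class SpecializedAgent:
    """An agent for one specialized domain that learns a user's topic interests."""

    def __init__(self, domain, learning_rate=0.1):
        self.domain = domain
        self.lr = learning_rate
        self.weights = defaultdict(float)  # topic -> learned interest weight

    def update(self, topic, reward):
        # Simple reinforcement-style update: nudge the weight toward the reward
        # (e.g., 1.0 for a clicked result, 0.0 for a skipped one).
        self.weights[topic] += self.lr * (reward - self.weights[topic])

    def rerank(self, results):
        # Re-rank (topic, base_score) pairs by blending in the learned interest.
        return sorted(results, key=lambda r: r[1] + self.weights[r[0]], reverse=True)


agent = SpecializedAgent("medicine")
agent.update("oncology", reward=1.0)    # the seeker clicked an oncology result
agent.update("cardiology", reward=0.0)  # the seeker skipped a cardiology result
print(agent.rerank([("cardiology", 0.80), ("oncology", 0.70)]))
```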
814

Depression in primary care: detection, treatment, and patients' own perspectives

Hansson, Maja, January 2010 (has links)
Diss. (summary) Umeå: Umeå universitet, 2010.
815

The role of informational support in relation to health care service use among individuals newly diagnosed with cancer

Dubois, Sylvie. January 2008 (has links)
Background: The relationship between informational support and use of health care services among individuals newly diagnosed with cancer remains little documented despite its importance for optimal care delivery. Aim: To document the role of informational support in light of patterns of health services used by women and men newly diagnosed with cancer. Method: A sequential mixed methods approach (i.e., quantitative-qualitative) was adopted among women and men newly diagnosed with either breast or prostate cancer. First, an existing quantitative database was used to determine whether an intervention relying on a multimedia tool as a complement to the provision of usual cancer informational support to patients (N = 250) would modify subsequent health care service use. A follow-up qualitative inquiry with distinct individuals also newly diagnosed (N = 20) was conducted to explore this relationship further. Next, the resulting quantitative and qualitative findings were merged and reanalyzed using a quantitative-hierarchical approach to enhance our understanding of the phenomenon. Findings: Several personal and contextual factors were found to qualify the relationship between cancer informational support and health service use. Although quantitative analyses showed no significant differences in terms of overall reliance on health care services among participants who received more intense cancer informational support as opposed to those who received care as usual, several sex differences were noted in terms of number of visits to health care professionals, time spent with nurses and satisfaction with cancer information received. Qualitative findings revealed that participants reported a variety of experiences pertaining to cancer information received (e.g., positive, unsupportive or mixed) as well as several processes at play (e.g., cancer information seen as enabling, confirming, or conflicting). These differences in informational support, in turn, influenced their subsequent service utilization (e.g., more phone calls made to health professionals, reduction in face-to-face visits, reluctance to use cancer-related services). The mixed data analysis further clarified the findings, allowing a broader perspective to emerge. Conclusion: Findings underscore that the relationship between cancer information and use of services is not as straightforward as initially anticipated. These findings provide initial insights that may inform future research on the topic and assist health care providers in optimizing their cancer informational interventions to guide patients in their reliance on health care services.
816

中文主題串英譯之研究 / A Study on the English Translation of Chinese Topic Chains

康恆銘, Heng-ming Kang Unknown Date (has links)
中文可說是個篇章導向的語言(discourse-oriented language),以篇章為其基本單位,數個中文子句不藉連接詞即可連結成主題串(topic chain)。當如此龐大的篇章單位翻譯成主語顯著的英文時,譯者會面臨的困難是,如何將主題串切割成數個英文句子。然而,此議題卻鮮少前人研究。因此,本論文試圖探討中文主題串英譯時的翻譯策略,並著重於斷句的影響因素。分析的語料來自漢英對照版的台灣光華雜誌以及翻譯教科書。為了瞭解翻譯策略,本研究分析中文的語意分段標記與資訊順序,並與譯文比較。分析結果顯示譯者在翻譯時會採用三種策略:反映段落標記(Reflecting the Markers)、建立語句關係(Establishing Textuality)、重整資訊(Rearranging Information)。第一個策略是將中文段落標記作為英譯斷句的依據。段落標記包括三類:主題的形式、連接詞、標點符號。第二個策略,建立語句關係,表示英譯斷句依據中文主題串各句子之間的篇章關係。第三個策略,重整資訊,表示透過增加、刪除、調換順序等方式調整原文的資訊。 / Chinese is considered a discourse-oriented language. The basic unit of the Chinese language is discourse-based. Several Chinese clauses can be linked together without any connectives to form a topic chain. When such a large discourse is translated into English, a subject-prominent language, translators may have difficulty deciding how to segment a Chinese topic chain into English sentences. However, little research has been done on this topic. The present study aims to explore translation strategies used in translating Chinese topic chains into English. In particular, the demarcation mechanism will be the focus. Chinese-to-English translation data from Taiwan Panorama, a Chinese-English bilingual magazine, and from translation textbooks are collected for analysis. The demarcation markers and information flow in Chinese are analyzed and compared to understand how they are treated in the English translation. Three strategies have been found: Reflecting the Markers, Establishing Textuality, and Rearranging Information. Reflecting the Markers is to reflect the Chinese boundary markers as English demarcation points. Boundary markers contain nominal references of topic, connectives, and punctuation marks. Establishing Textuality is to organize the Chinese topic chain based on the internal textual relationships. Rearranging Information is to add, delete, or reorder the information.
817

巨量資料環境下之新聞主題暨輿情與股價關係之研究 / A Study of the Relevance between News Topics & Public Opinion and Stock Prices in Big Data

張良杰, Chang, Liang Chieh Unknown Date (has links)
近年來科技、網路以及儲存媒介的發達,產生的資料量呈現爆炸性的成長,也宣告了巨量資料時代的來臨。擁有巨量資料代表了不必再依靠傳統抽樣的方式來蒐集資料,分析數據也不再有資料收集不足以致於無法代表母體的限制。突破傳統的限制後,巨量資料的精髓在於如何從中找出有價值的資訊。 以擁有大量輿論和人際互動資訊的社群網站為例,就有相關學者研究其情緒與股價具有正相關性,本研究也試著利用同樣具有巨量資料特性的網路新聞,抓取中央新聞社2013年7月至2014年5月之經濟類新聞共計30,879篇,結合新聞主題偵測與追蹤技術及情感分析,利用新聞事件相似的概念,透過連結匯聚成網絡並且分析新聞的情緒和股價指數的關係。 研究結果顯示,新聞事件間可以連結成一特定新聞主題,且能在龐大的網絡中找出不同的新聞主題,並透過新聞主題之連結產生新聞主題脈絡。對此提供一種新的方式來迅速了解巨量新聞內容,也能有效地回溯新聞主題及新聞事件。 在新聞情緒和股價指數方面,研究發現新聞情緒影響了股價指數之波動,其相關係數達到0.733562;且藉由情緒與心理線及買賣意願指標之比較,顯示新聞的情緒具有一定的程度能夠成為股價判斷之參考依據。 / In recent years, as technology, networks, and storage media have developed, the amount of data generated has grown explosively, declaring the arrival of the big data era. Having big data means we no longer have to rely on traditional sampling to collect data, and analysis no longer suffers from the limitation that inadequate data collection fails to represent the population. Once these traditional limitations are overcome, the essence of big data lies in how to find the valuable information within it. For example, social network sites (SNS), which carry a great deal of public opinion and interpersonal information, have been studied by scholars who found that emotions on SNS correlate positively with stock prices. This thesis therefore focuses on online news, which shares the characteristics of big data, using a web crawler to collect a total of 30,879 economics news articles published by the Central News Agency between July 2013 and May 2014, and applying Topic Detection & Tracking and Sentiment Analysis techniques to these articles. Finally, based on the similarity between news events, the articles are linked into networks and the relationship between news sentiment and the stock price index is analyzed. The results show that news events can be linked into specific news topics, that different news topics can be identified within a large network, and that a news topic context can be formed by linking news topics together. This provides a new way to quickly understand a huge amount of news and to trace news topics and news events effectively. With regard to news sentiment and the stock price index, the study finds that news sentiment affects the fluctuation of the stock price index, with a correlation coefficient of 0.733562. A comparison of sentiment with the psychological line and trading willingness indicators suggests that sentiment performs better than these two indicators in stock price determination.
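The final step described above amounts to aligning a daily aggregated sentiment series with the stock index and computing a Pearson correlation; the sketch below illustrates only that step, with invented numbers rather than the study's data (which yields the reported r = 0.733562).

```python
# Minimal sketch of the final correlation step: align a daily aggregated news
# sentiment series with a stock index series and compute the Pearson coefficient.
# The numbers below are invented; the study reports r = 0.733562 on its own data.
import pandas as pd

daily = pd.DataFrame(
    {
        "sentiment":   [0.12, 0.30, -0.05, 0.22, 0.41, 0.18],  # aggregated news sentiment
        "index_close": [9100, 9180, 9050, 9120, 9260, 9210],   # stock index close
    },
    index=pd.date_range("2013-07-01", periods=6, freq="D"),
)

r = daily["sentiment"].corr(daily["index_close"])  # Pearson correlation by default
print(f"Pearson correlation: {r:.4f}")
```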
819

Weaving the semantic web: Contributions and insights

Cregan, Anne, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 (has links)
The semantic web aims to make the meaning of data on the web explicit and machine processable. Harking back to Leibniz in its vision, it imagines a world of interlinked information that computers 'understand' and 'know' how to process based on its meaning. Spearheaded by the World Wide Web Consortium, the ontology languages OWL and RDF form the core of the current technical offerings. RDF has successfully enabled the construction of virtually unlimited webs of data, whilst OWL gives the ability to express complex relationships between RDF data triples. However, the formal semantics of these languages limit themselves to that aspect of meaning that can be captured by mechanical inference rules, leaving many open questions as to other aspects of meaning and how they might be made machine processable. The Semantic Web has faced a number of problems that are addressed by the included publications. Its germination within academia and logical semantics has seen it struggle to become familiar, accessible and implementable for the general IT population, so an overview of semantic technologies is provided. Faced with competing 'semantic' languages, such as the ISO's Topic Map standards, a method for building ISO-compliant Topic Maps in the OWL DL language has been provided, enabling them to take advantage of the more mature OWL language and tools. Supplementation with rules is needed to deal with many real-world scenarios, and this is explored as a practical exercise. The available syntaxes for OWL have hindered domain experts in ontology building, so a natural language syntax for OWL designed for use by non-logicians is offered and compared with similar offerings. In recent years, the proliferation of ontologies has resulted in far more than are needed in any given domain space, so a mechanism is proposed to facilitate the reuse of existing ontologies by giving contextual information and leveraging social factors to encourage wider adoption of common ontologies and achieve interoperability. Lastly, the question of meaning is addressed in relation to the need to define one's terms and to ground one's symbols by anchoring them effectively, ultimately providing the foundation for evolving a 'Pragmatic Web' of action.
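The Topic Maps-to-OWL contribution is only named in the abstract, not specified; as a loose illustration of what expressing topic-map-style content in RDF/OWL can look like (not the thesis's actual mapping), the rdflib sketch below declares a Topic class, two topic individuals, and one association-like object property.

```python
# Loose, hypothetical illustration of expressing Topic Map-like content as
# OWL/RDF triples with rdflib. This is not the OWL DL mapping proposed in the
# thesis; the vocabulary and example names are invented.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/topicmap#")
g = Graph()
g.bind("ex", EX)

# A 'Topic' class and two topic individuals.
g.add((EX.Topic, RDF.type, OWL.Class))
g.add((EX.Puccini, RDF.type, EX.Topic))
g.add((EX.Tosca, RDF.type, EX.Topic))
g.add((EX.Puccini, RDFS.label, Literal("Giacomo Puccini")))

# A Topic Map association rendered as an object property between the topics.
g.add((EX.composedBy, RDF.type, OWL.ObjectProperty))
g.add((EX.Tosca, EX.composedBy, EX.Puccini))

print(g.serialize(format="turtle"))
```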
820

Appariement de contenus textuels dans le domaine de la presse en ligne : développement et adaptation d'un système de recherche d'information / Pairing textual content in the field of on-line news : development and adaptation of an information retrieval system

Désoyer, Adèle 27 November 2017 (has links)
L'objectif de cette thèse, menée dans un cadre industriel, est d'apparier des contenus textuels médiatiques. Plus précisément, il s'agit d'apparier à des articles de presse en ligne des vidéos pertinentes, pour lesquelles nous disposons d'une description textuelle. Notre problématique relève donc exclusivement de l'analyse de matériaux textuels, et ne fait intervenir aucune analyse d'image ni de langue orale. Surviennent alors des questions relatives à la façon de comparer des objets textuels, ainsi qu'aux critères mobilisés pour estimer leur degré de similarité. L'un de ces éléments est selon nous la similarité thématique de leurs contenus, autrement dit le fait que deux documents doivent relater le même sujet pour former une paire pertinente. Ces problématiques relèvent du domaine de la recherche d'information (ri), dans lequel nous nous ancrons principalement. Par ailleurs, lorsque l'on traite des contenus d'actualité, la dimension temporelle est aussi primordiale et les problématiques qui l'entourent relèvent de travaux ayant trait au domaine du topic detection and tracking (tdt) dans lequel nous nous inscrivons également. Le système d'appariement développé dans cette thèse distingue donc différentes étapes qui se complètent. Dans un premier temps, l'indexation des contenus fait appel à des méthodes de traitement automatique des langues (tal) pour dépasser la représentation classique des textes en sac de mots. Ensuite, deux scores sont calculés pour rendre compte du degré de similarité entre deux contenus : l'un relatif à leur similarité thématique, basé sur un modèle vectoriel de ri; l'autre à leur proximité temporelle, basé sur une fonction empirique. Finalement, un modèle de classification appris à partir de paires de documents, décrites par ces deux scores et annotées manuellement, permet d'ordonnancer les résultats. L'évaluation des performances du système a elle aussi fait l'objet de questionnements dans ces travaux de thèse. Les contraintes imposées par les données traitées et le besoin particulier de l'entreprise partenaire nous ont en effet contraints à adopter une alternative au protocole classique d'évaluation en ri, le paradigme de Cranfield. / The goal of this thesis, conducted within an industrial framework, is to pair textual media content. Specifically, the aim is to pair on-line news articles with relevant videos for which we have a textual description. The main issue is then a matter of textual analysis; no image or spoken language analysis was undertaken in the present study. The question that arises is how to compare these particular objects, the texts, and also what criteria to use in order to estimate their degree of similarity. We consider that one of these criteria is the topic similarity of their content, in other words, the fact that two documents have to deal with the same topic to form a relevant pair. This problem falls within the field of information retrieval (IR), which is the main strategy called upon in this research. Furthermore, when dealing with news content, the time dimension is of prime importance. To address this aspect, the field of topic detection and tracking (TDT) will also be explored. The pairing system developed in this thesis distinguishes different steps which complement one another. In the first step, the system uses natural language processing (NLP) methods to index both articles and videos, in order to overcome the traditional bag-of-words representation of texts.
In the second step, two scores are calculated for an article-video pair: the first one reflects their topical similarity and is based on a vector space model; the second one expresses their proximity in time, based on an empirical function. At the end of the algorithm, a classification model learned from manually annotated document pairs is used to rank the results. Evaluation of the system's performance raised some further questions in this doctoral research. The constraints imposed both by the data and the specific need of the partner company led us to adapt the evaluation protocol traditionally used in IR, namely the Cranfield paradigm. We therefore propose an alternative solution for evaluating the system that takes all our constraints into account.
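The two-score design just described invites a compact, hypothetical illustration. In the sketch below, TF-IDF cosine similarity stands in for the NLP-enriched topical score, an exponential decay for the empirical temporal function, and a fixed weighted sum for the learned ranking model; every text, date, weight, and decay rate is invented.

```python
# Hypothetical reduction of the two-score pairing idea: TF-IDF cosine similarity
# stands in for the NLP-enriched topical score, an exponential decay stands in
# for the empirical temporal function, and a fixed weighted sum replaces the
# learned ranking model. Texts, dates, weights, and decay rate are all invented.
import math
from datetime import date

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

article = ("The candidates debated economic policy on Tuesday evening.", date(2017, 5, 2))
videos = [
    ("Highlights of the candidates' debate on economic policy", date(2017, 5, 3)),
    ("Recipe: a traditional apple tart in ten minutes", date(2017, 5, 2)),
]

# Topical score: cosine similarity between TF-IDF vectors of the article and each video.
matrix = TfidfVectorizer().fit_transform([article[0]] + [v[0] for v in videos])
topical = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

def temporal(d1, d2, decay=0.5):
    """Assumed form of the temporal score: exponential decay with the gap in days."""
    return math.exp(-decay * abs((d1 - d2).days))

# Combine the two scores and rank candidate videos for the article.
scores = [0.7 * t + 0.3 * temporal(article[1], v[1]) for t, v in zip(topical, videos)]
print(sorted(zip([v[0] for v in videos], scores), key=lambda x: x[1], reverse=True))
```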
