501 |
A Clustering-based Approach to Document-Category Integration
Cheng, Tsang-Hsiang, 04 September 2003
E-commerce applications generate and consume a tremendous amount of online information that is typically available as textual documents. Observations of textual document management practices by organizations and individuals suggest that categories (or category hierarchies) are widely used to organize, archive and access documents. At the same time, an organization (or individual) constantly acquires new documents from various Internet sources. Consequently, integrating relevant categorized documents into the existing categories of the organization (or individual) has become an important issue in the e-commerce era. The existing categorization-based approach to document-category integration (specifically, the Enhanced Naïve Bayes classifier) has several limitations, including the assumption that the master and source catalogs use homogeneous categorization schemes and the requirement for large master categories as training data. In this study, we developed a Clustering-based Category Integration (CCI) technique for integrating two document catalogs, each of which is organized non-hierarchically (i.e., as a flat set). With the Enhanced Naïve Bayes classifier as the benchmark, the empirical evaluation results showed that the proposed CCI technique appeared to improve the accuracy of document-category integration in different integration scenarios and seemed to be less sensitive to the size of the master categories than the categorization-based approach.
Furthermore, to integrate document categories that are organized hierarchically, we proposed a Clustering-based category-Hierarchy Integration (CHI) technique, which extends the CCI technique to category-hierarchy integration. The empirical evaluation results showed that the CHI technique appeared to improve the effectiveness of hierarchical document-category integration over that attained by CCI under homogeneous and comparable scenarios.
|
502 |
Muitinės informacinės sistemos funkcionavimo ir teisinio reguliavimo analizė e-muitinės kontekste / Analysis of the functioning of the customs information system and its legal regulation in the context of e-customs
Jakavonis, Petras, 22 January 2008
This master's thesis analyzes the problem of transforming the Customs Information System and adapting it to operate under e-customs conditions. The aim of the work arises from the contradiction between the legal regulations currently governing customs activities and the rapidly changing conditions of global business. The need to perform all trade operations electronically requires supplementing the Customs Information System with the missing elements and changing the legal environment in which it operates.
To formulate proposals for solving this problem, the application of e-customs elements (electronic data interchange, e-export, etc.) in different countries, as reported in the scientific literature, was examined. Using data analysis, the functioning of the Customs Information System was analyzed, and the statistical data obtained were compared with those of other EU countries.
After analyzing the theoretical and practical aspects of the operation of the Customs Information System in the context of e-customs, the final part of the work analyzes the applicable regulatory legal acts of the EU, Lithuania and the Customs Department, and identifies the legal gaps that impede the efficient operation and development of e-customs.
After summarizing the results of every part, conclusions and...
|
503 |
Extraction de lexiques bilingues à partir de corpus comparables / Bilingual lexicon extraction from comparable corpora
Hazem, Amir, 11 October 2013
Most work on bilingual lexicon acquisition from comparable corpora relies on the distributional hypothesis, extended to the bilingual setting: two words are likely to be translations of each other if they appear in the same lexical contexts. This postulate presupposes a clear and rigorous definition of context and perfect knowledge of contextual cues. However, the complexity and specificities of each language make it difficult to state a definition that guarantees effective extraction of translation pairs in every case. The whole difficulty lies in how to define, extract and compare these contexts in order to build reliable bilingual lexicons. Throughout the chapters of this thesis, we try to better understand this notion of context, and then to extend and adapt it in order to improve the quality of bilingual lexicons. The first part of the contributions aims to improve the direct approach, which serves as the reference in the community. We propose several ways of treating the context of words in order to characterize them better. In the second part of the contributions, we first present an approach that aims to improve the inter-language similarity approach. Next, a method named Q-Align, directly inspired by question-answering systems, is presented. Finally, we present several mathematical transformations, and hence several vector representations, and focus mainly on those we chose in order to develop a new alignment method.
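The distributional idea behind the direct approach can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the fixed co-occurrence window, the raw-count weighting and the cosine measure are all assumptions chosen for brevity. Each word gets a vector of its neighbouring words, the source vector is mapped into the target language through a small seed dictionary, and target words are ranked by similarity.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=3):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def translate_vector(vector, seed_dictionary):
    """Map a source-language context vector into the target language.

    `seed_dictionary` (hypothetical) maps a source word to a list of
    target words; context words missing from it are simply dropped.
    """
    translated = Counter()
    for word, count in vector.items():
        for target in seed_dictionary.get(word, []):
            translated[target] += count
    return translated


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(word, src_vectors, tgt_vectors, seed_dictionary):
    """Return target words ordered from most to least similar context."""
    query = translate_vector(src_vectors[word], seed_dictionary)
    scored = [(cosine(query, vec), cand) for cand, vec in tgt_vectors.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cand for _, cand in scored]
```

In practice, raw counts are usually replaced by an association measure and the corpora are far larger; the sketch only shows where the definition of "context" enters the pipeline.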
|
504 |
Localisation interne et en contexte des logiciels commerciaux et libres / Internal and in-context localization of commercial and free software
Fraisse, Amel, 10 June 2010
We propose an innovative method for in-context localization of the majority of commercial and free software, namely software programmed in Java and in C++/C#. Currently, the translation of technical documents and of the interface elements of commercial software is entrusted solely to professionals, which lengthens the translation process, makes it costly, and sometimes results in poor quality because professional translators do not have access to the context in which the textual elements are used. As soon as one leaves the small set of the best-resourced languages and wants to localize software for "under-resourced languages", this process is no longer viable, for reasons of cost and above all because professional translators are scarce, expensive or unavailable. Our method has beta testers and end users participate efficiently and dynamically in the localization process: while using the application, users who know the software's original language (often, but not always, English) can act on the textual interface elements that the application presents to them in their current context of use. They can thus translate buttons, menus, labels, tabs, etc. in context, or improve the translation proposed by machine translation (MT) systems or translation memories (TM). To put this new paradigm in place, we need to intervene very locally on the software's source code: it is therefore also a paradigm of internal localization. Implementing this localization approach required integrating a translation workflow manager, "SECTra_w". We thus have a new three-party localization process whose three parties are the user, the software publisher and the collaborative site SECTra_w.
We carried out a complete experiment with the new localization process on two free, open-source programs: Notepad-plus-plus and Vuze.
|
505 |
Distributed Document Clustering and Cluster Summarization in Peer-to-Peer Environments
Hammouda, Khaled M., January 2007
This thesis addresses difficult challenges in distributed document clustering and cluster summarization. Mining large document collections poses many challenges, one of which is the extraction of topics or summaries from documents for the purpose of interpretation of clustering results. Another important challenge, which is caused by new trends in distributed repositories and peer-to-peer computing, is that document data is becoming more distributed.
We introduce a solution for interpreting document clusters using keyphrase extraction from multiple documents simultaneously. We also introduce two solutions for the problem of distributed document clustering in peer-to-peer environments, each satisfying a different goal: maximizing local clustering quality through collaboration, and maximizing global clustering quality through cooperation.
The keyphrase extraction algorithm efficiently extracts and scores candidate keyphrases from a document cluster. The algorithm is called CorePhrase and is based on modeling document collections as a graph upon which we can leverage graph mining to extract frequent and significant phrases, which are used to label the clusters. Results show that CorePhrase can extract keyphrases relevant to documents in a cluster with very high accuracy. Although this algorithm can be used to summarize centralized clusters, it is specifically employed within distributed clustering both to boost distributed clustering accuracy and to provide summaries for distributed clusters.
The first method for distributed document clustering is called collaborative peer-to-peer document clustering, which models nodes in a peer-to-peer network as collaborative nodes with the goal of improving the quality of individual local clustering solutions. This is achieved through the exchange of local cluster summaries between peers, followed by recommendation of documents to be merged into remote clusters. Results on large sets of distributed document collections show that: (i) such a collaboration technique achieves significant improvement in the final clustering of individual nodes; (ii) networks with a larger number of nodes generally achieve greater improvements in clustering after collaboration relative to the initial clustering before collaboration, while on the other hand they tend to achieve lower absolute clustering quality than networks with fewer nodes; and (iii) as more overlap of the data is introduced across the nodes, collaboration tends to have little effect on improving clustering quality.
The second method for distributed document clustering is called hierarchically-distributed document clustering. Unlike the collaborative model, this model aims at producing one clustering solution across the whole network. It specifically addresses scalability of network size, and consequently the distributed clustering complexity, by modeling the distributed clustering problem as a hierarchy of node neighborhoods. Summarization of the global distributed clusters is achieved through a distributed version of the CorePhrase algorithm. Results on large document sets show that: (i) distributed clustering accuracy is not affected by increasing the number of nodes for single-level networks; (ii) we can achieve decent speedup by making the hierarchy taller, but at the expense of clustering quality, which degrades as we go up the hierarchy; (iii) in networks that grow arbitrarily, data gets more fragmented across neighborhoods, causing poor centroid generation, which suggests that we should not increase the number of nodes in the network beyond a certain level without increasing the data set size; and (iv) distributed cluster summarization can produce accurate summaries similar to those produced by centralized summarization.
The proposed algorithms offer a high degree of flexibility, scalability, and interpretability of large distributed document collections. Achieving the same results using current methodologies requires centralization of the data first, which is sometimes not feasible.
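To give a feel for labelling a cluster with phrases drawn from several documents at once, the following Python sketch collects word n-grams that occur in at least two documents and ranks them by document frequency and length. It is a deliberately simplified stand-in, not the CorePhrase algorithm: the graph model and the scoring features of CorePhrase are not described in this abstract, and the function name and parameters here are hypothetical.

```python
from collections import defaultdict


def shared_phrases(documents, max_len=3, min_docs=2):
    """Rank word n-grams that are shared by at least `min_docs` documents.

    Simplified illustration only: phrases are plain whitespace-token
    n-grams, ranked by document frequency, then by length, then
    alphabetically.
    """
    doc_ids = defaultdict(set)
    for doc_id, text in enumerate(documents):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                doc_ids[" ".join(words[i:i + n])].add(doc_id)

    candidates = [
        (len(ids), len(phrase.split()), phrase)
        for phrase, ids in doc_ids.items()
        if len(ids) >= min_docs
    ]
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    return [phrase for _, _, phrase in candidates]
```

Calling `shared_phrases(cluster_texts)[:5]` would give five candidate labels for a cluster; a realistic system would also filter stop words and weight phrases by where they occur.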
|
506 |
A COTS-Software Requirements Elicitation Method from Business Process Models
Aslan, Ercan, 01 January 2003
In this thesis, COTS-software requirements elicitation, which is an input for the RFP in the acquisition of a software-intensive automation system, is examined. Business Process Models are used for COTS-software requirements elicitation. A new method, namely CREB, is developed to meet the requirements of COTS-software. A software-intensive system acquisition of a military organization is used to validate the method.
|
507 |
How do people manage their documents?: an empirical investigation into personal document management practices among knowledge workers
Henderson, Sarah, January 2009
Personal document management is the activity of managing a collection of digital documents performed by the owner of the documents, and consists of creation/acquisition, organisation, finding and maintenance. Document management is a pervasive aspect of digital work, but has received relatively little attention from researchers. The hierarchical file system used by most people to manage their documents has not conceptually changed in decades. Although revolutionary prototypes have been developed, these have not been grounded in a thorough understanding of document management behaviour and therefore have not resulted in significant changes to document management interfaces. Improvements in understanding document management can result in productivity gains for knowledge workers, and since document management is such a common activity, small improvements can deliver large gains. The aim of this research was to understand how people manage their personal document collections and to develop guidelines for the development of tools to support personal document management. A field study was conducted that included interviews, a survey and a file system snapshot. The interviews were conducted with ten participants to investigate their document management strategies, structures and struggles. In addition to qualitative analysis of semi-structured interviews, a novel investigation technique was developed in the form of a file system snapshot, which collects information about document structures and derives a number of metrics that describe the document structure. A survey was also conducted, consisting of a questionnaire and a file system snapshot, which enabled the findings of the field study to be validated and information to be collected from a greater number of participants.
The results of this research culminated in (1) the development of a conceptual framework highlighting the key personal document management attitudes, behaviours and concerns; (2) a model of the basic operations that any document management system needs to provide; (3) the identification of piling, filing and structuring as three key document management strategies; and (4) guidelines for the development of user interfaces to support document management, including specific guidelines for each document management strategy. These contributions both improve knowledge of personal document management on which future research can build, and provide practical advice to document management system designers, which should result in the development of more usable systems.
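As a rough illustration of what a file system snapshot might compute, the Python sketch below walks a directory tree and derives a few structural metrics. The abstract does not say which metrics the study used, so the ones here (folder count, file count, maximum folder depth, mean files per folder) and the function name `snapshot_metrics` are assumptions made for the example.

```python
import os


def snapshot_metrics(root):
    """Walk `root` and summarise the shape of the folder hierarchy.

    The metrics are illustrative assumptions, not those of the study.
    Only names and counts are read; file contents are never opened.
    """
    root = os.path.abspath(root)
    folder_count = 0
    file_count = 0
    max_depth = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # The root itself has depth 0; each nested level adds one.
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        folder_count += 1
        file_count += len(filenames)
        max_depth = max(max_depth, depth)

    return {
        "folders": folder_count,
        "files": file_count,
        "max_depth": max_depth,
        "mean_files_per_folder": (
            file_count / folder_count if folder_count else 0.0
        ),
    }
```

A shallow tree with many files per folder and a deep tree with few files per folder would produce visibly different metric profiles, which is the kind of contrast such a snapshot can make measurable.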
|
510 |
A framework for responsive content adaptation in electronic display networks
West, Philip, January 2006
Recent trends show an increase in the availability and functionality of handheld devices, wireless network technology, and electronic display networks. We propose the novel integration of these technologies to provide wireless access to content delivered to large-screen display systems. Content adaptation is used as a method of reformatting web pages to display more appropriately on handheld devices, and to remove unwanted content. A framework is presented that facilitates content adaptation, implemented as an adaptation layer, which is extended to provide personalization of adaptation settings and response to network conditions. The framework is implemented as a proxy server for a wireless network, and handles HTML and XML documents. Once a document has been requested by a user, the HTML/XML is retrieved and parsed, creating a Document Object Model tree representation. It is then altered according to the user's personal settings or predefined settings, based on current network usage and the network resources available. Three adaptation techniques were implemented: spatial representation, which generates an image map of the document; text summarization, which creates a tree view representation of a document; and tag extraction, which replaces specific tags with links. Three proof-of-concept systems were developed in order to test the robustness of the framework: a system for use with digital slide shows, a digital signage system, and a generalized system for use with the internet. Testing was performed by accessing sample web pages through the content adaptation proxy server. Tag extraction works correctly for all HTML and XML document structures, whereas spatial representation and text summarization are limited to a controlled subset. Results indicate that the adaptive system has the ability to reduce average bandwidth usage, by decreasing the amount of data on the network, thereby allowing a greater number of users access to content.
This suggests that responsive content adaptation has a positive influence on network performance metrics.
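Tag extraction, described above as replacing specific tags with links, can be sketched in a few lines of Python using the standard library's `html.parser`. This is an illustration under assumptions, not the framework's code: the class name `TagExtractor`, the choice of `img` as the tag to replace, the `[img]` link text, and the decision to drop comments and doctype declarations are all hypothetical. The thesis's framework works on a DOM tree, whereas this sketch rewrites the markup in a single streaming pass.

```python
from html import escape
from html.parser import HTMLParser


class TagExtractor(HTMLParser):
    """Rebuild an HTML string, replacing selected tags with plain links.

    Illustrative only: comments and doctype declarations are dropped,
    and a selected tag without a `src` attribute is removed entirely.
    """

    def __init__(self, tags=("img",)):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tags = set(tags)
        self.out = []

    def _link(self, tag, attrs):
        src = dict(attrs).get("src")
        if src:
            self.out.append(f'<a href="{escape(src)}">[{tag}]</a>')

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self._link(tag, attrs)
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... /> or <br/>.
        if tag in self.tags:
            self._link(tag, attrs)
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.tags:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def extract_tags(html, tags=("img",)):
    """Return `html` with each tag in `tags` replaced by a link to its src."""
    parser = TagExtractor(tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The bandwidth saving comes from the client no longer fetching the embedded resource unless the user follows the link.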
|