Global ETD Search

51	Shlukování textových dokumentů a jejich částí / Clustering of text documents and their parts Zápotocký, Radoslav January 2011 / This thesis analyses the use of the vector-space model and data clustering approaches on parts of a single document: chapters, paragraphs and sentences. A simulation application (SimDIS), written in the C# programming language, is also part of this thesis. The application implements the adjusted model and provides tools for visualizing vectors and clusters.
52	Shlukování textových dokumentů a jejich částí / Clustering of text documents and their parts Zápotocký, Radoslav January 2011 / This thesis analyses the use of the vector-space model and data clustering approaches on parts of a single document (chapters, paragraphs and sentences) to allow simple navigation between similar parts. A simulation application (SimDIS), written in the C# programming language, is also part of this thesis. The application implements the described model and provides tools for visualizing vectors and clusters.
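The idea in records 51 and 52, treating the parts of one document as vectors and navigating between similar parts, can be illustrated with a minimal sketch. The abstract does not specify the thesis's weighting scheme or its adjusted model, so the TF-IDF weights, whitespace tokenization, and the function names `tfidf_vectors`, `cosine` and `most_similar` below are all assumptions made for illustration; SimDIS itself is written in C#, not Python.

```python
import math
from collections import Counter


def tfidf_vectors(parts):
    """Build one sparse TF-IDF vector (a dict) per document part.

    Assumption: parts are tokenized on whitespace and lower-cased.
    """
    tokenized = [p.lower().split() for p in parts]
    n = len(tokenized)
    # Document frequency: number of parts in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(parts, index):
    """Return the index of the part most similar to parts[index]."""
    vectors = tfidf_vectors(parts)
    scores = [(cosine(vectors[index], v), j)
              for j, v in enumerate(vectors) if j != index]
    return max(scores)[1]


# Hypothetical paragraphs of a single document.
paragraphs = [
    "clustering groups similar vectors together",
    "the weather was cold and rainy",
    "similar vectors end up in one clustering group",
    "rainy weather makes the streets empty",
]
print(most_similar(paragraphs, 0))  # 2
```

A clustering step would group parts by the same cosine scores; the sketch stops at nearest-part lookup, which is enough to show the navigation use case the abstract mentions.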
53	Classificação de textos com redes complexas / Using complex networks to classify texts Amancio, Diego Raphael 29 October 2013 / The automatic classification of texts into pre-established categories is drawing increasing interest owing to the need to organize the ever-growing number of electronic documents. The prevailing approach to classification is based on analysis of textual content. In this thesis, we investigate the applicability of attributes based on textual style using the complex network (CN) representation, where nodes represent words and edges represent adjacency relations. We studied the suitability of CN measurements for natural language processing tasks, with classification assisted by supervised and unsupervised machine learning methods. A detailed study of topological measurements in texts revealed that several of them are informative, in the sense that they distinguish meaningful texts from shuffled ones. Moreover, most network measurements depend on syntactic factors, while intermittency measurements are more sensitive to semantic factors. As for the use of the CN model in practical scenarios, there is a significant dependence between an author's style and network topology. In an authorship recognition task on 40 novels written by eight authors, network and intermittency measurements achieved an accuracy rate of 65%. During the stylistic analysis, we also found that books belonging to the same literary movement tend to have similar topological features. The network model also proved useful for disambiguating word senses. Employing only topological information to characterize nodes representing polysemous words, we found a nontrivial relationship between syntax and semantics. For several words, the CN approach performed even better than the method based on recurrence patterns of neighboring words. The studies carried out in this thesis confirm that stylistic and semantic aspects influence the structural organization of word adjacency networks. The word adjacency model investigated here might be useful not only to provide insight into the underlying mechanisms of language, but also to enhance the performance of real applications when combined with traditional text processing approaches. / Subjects: Complex networks; Pattern recognition; Text classification; Text processing
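The word adjacency representation described in record 53 (nodes are words, edges link words that appear next to each other) can be sketched as follows. The abstract does not say which topological measurements the thesis uses, so node degree and the local clustering coefficient are chosen here purely as familiar examples; the function names and the sample sentence are hypothetical.

```python
from collections import defaultdict


def adjacency_network(text):
    """Build an undirected word adjacency network as a dict of neighbour sets.

    Assumption: words are split on whitespace and lower-cased.
    """
    words = text.lower().split()
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops from immediately repeated words
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


net = adjacency_network("the cat saw the dog and the dog saw the cat")
print(len(net["the"]))                     # degree of "the": 4
print(clustering_coefficient(net, "the"))  # 0.5
```

Measurements like these, computed per word or averaged over the network, would then serve as features for a classifier; comparing them against the same text with its words shuffled is one way to check whether a measurement carries information.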
54	Enhancing performance in publish/subscribe systems Unknown Date / Publish/subscribe is a powerful paradigm for distributed applications based on decoupled clients of information. Pub/sub applications involve large numbers of publishers and subscribers, ranging from hundreds to millions. Publish/subscribe systems need to disseminate numerous events through a network of brokers. Because brokers have limited resources, many events may not be handled in time, which in turn causes an overload problem. Hence the need for an admission control mechanism to provide guaranteed services in publish/subscribe systems. Our approach addresses this overload problem in the network of brokers by limiting incoming subscriptions according to certain criteria. The criteria include the resources available in the broker network (bandwidth, CPU, memory) and the resource requirements of the subscription. / by Akshay Kamdar. / Thesis (M.S.C.S.)--Florida Atlantic University, 2009. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2009. Mode of access: World Wide Web. / Subjects: Embedded computer systems; Text processing (Computer science)
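The admission control idea in record 54, accepting a subscription only when the broker can still afford its resource requirements, might look like the sketch below. The abstract names bandwidth, CPU and memory as criteria but gives no algorithm, so the simple "fits within remaining capacity" rule, the `Resources` and `Broker` classes, and the numbers used are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    bandwidth: float  # e.g. Mbit/s (unit is an assumption)
    cpu: float        # e.g. cores
    memory: float     # e.g. MiB


class Broker:
    """Hypothetical broker that admits subscriptions while capacity remains."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = Resources(0.0, 0.0, 0.0)

    def admit(self, demand):
        """Admit the subscription only if every resource still fits."""
        fits = (
            self.used.bandwidth + demand.bandwidth <= self.capacity.bandwidth
            and self.used.cpu + demand.cpu <= self.capacity.cpu
            and self.used.memory + demand.memory <= self.capacity.memory
        )
        if fits:
            # Reserve the resources for the admitted subscription.
            self.used = Resources(
                self.used.bandwidth + demand.bandwidth,
                self.used.cpu + demand.cpu,
                self.used.memory + demand.memory,
            )
        return fits


broker = Broker(Resources(bandwidth=100.0, cpu=4.0, memory=1024.0))
print(broker.admit(Resources(60.0, 1.0, 256.0)))  # True
print(broker.admit(Resources(50.0, 1.0, 256.0)))  # False: bandwidth would exceed 100
print(broker.admit(Resources(40.0, 1.0, 256.0)))  # True
```

Rejecting a subscription at the edge keeps the events it would have generated out of the broker network altogether, which is the overload protection the abstract describes.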
55	An automated Chinese text processing system (ACCESS): user-friendly interface and feature enhancement. January 1994 / Suen Tow Sunny. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 65-67). / Introduction / Chapter 1. --- ACCESS with an Extendible User-friendly X/Chinese Interface / Chapter 1.1. --- System requirement / Chapter 1.1.1. --- User interface issue / Chapter 1.1.2. --- Development issue / Chapter 1.2. --- Development decision / Chapter 1.2.1. --- X window system / Chapter 1.2.2. --- X/Chinese toolkit / Chapter 1.2.3. --- C language / Chapter 1.2.4. --- Source code control system / Chapter 1.3. --- System architecture / Chapter 1.4. --- User interface / Chapter 1.5. --- Sample screen / Chapter 1.6. --- System extension / Chapter 1.7. --- System portability / Chapter 2. --- Study on Algorithms for Automatically Correcting Characters in Chinese Cangjie-typed Text / Chapter 2.1. --- Chinese character input / Chapter 2.1.1. --- Chinese keyboards / Chapter 2.1.2. --- Keyboard redefinition scheme / Chapter 2.2. --- Cangjie input method / Chapter 2.3. --- Review on existing techniques for automatically correcting words in English text / Chapter 2.3.1. --- Nonword error detection / Chapter 2.3.2. --- Isolated-word error correction / Chapter 2.3.2.1. --- Spelling error patterns / Chapter 2.3.2.2. --- Correction techniques / Chapter 2.3.3. --- Context-dependent word correction research / Chapter 2.3.3.1. --- Natural language processing approach / Chapter 2.3.3.2. --- Statistical language model / Chapter 2.4. --- Research on error rates and patterns in Cangjie input method / Chapter 2.5. --- Similarities and differences between Chinese and English typed text / Chapter 2.5.1. --- Similarities / Chapter 2.5.2. --- Differences / Chapter 2.6. --- Proposed algorithm for automatic Chinese text correction / Chapter 2.6.1. --- Sentence level / Chapter 2.6.2. --- Part-of-speech level / Chapter 2.6.3. --- Character level / Conclusion / Appendix A Cangjie Radix Table / Appendix B Sample Text / Article 1 / Article 2 / Article 3 / Article 4 / Appendix C Error Statistics / References / Subjects: Chinese language--Data processing; Text processing (Computer science); User interfaces (Computer systems); Input design, Computer
56	M&A2: a complete associative word network based Chinese document search engine. January 2001 / Hu Ke. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 56-58). / Abstracts in English and Chinese. / Subjects: Web search engines; Chinese language--Data processing; Information retrieval; Text processing (Computer science); World Wide Web
57	Automatic construction of wrappers for semi-structured documents. January 2001 / Lin Wai-yip. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 114-123). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Information Extraction / Chapter 1.2 --- IE from Semi-structured Documents / Chapter 1.3 --- Thesis Contributions / Chapter 1.4 --- Thesis Organization / Chapter 2 --- Related Work / Chapter 2.1 --- Existing Approaches / Chapter 2.2 --- Limitations of Existing Approaches / Chapter 2.3 --- Our HISER Approach / Chapter 3 --- System Overview / Chapter 3.1 --- Hierarchical record Structure and Extraction Rule learning (HISER) / Chapter 3.2 --- Hierarchical Record Structure / Chapter 3.3 --- Extraction Rule / Chapter 3.4 --- Wrapper Adaptation / Chapter 4 --- Automatic Hierarchical Record Structure Construction / Chapter 4.1 --- Motivation / Chapter 4.2 --- Hierarchical Record Structure Representation / Chapter 4.3 --- Constructing Hierarchical Record Structure / Chapter 5 --- Extraction Rule Induction / Chapter 5.1 --- Rule Representation / Chapter 5.2 --- Extraction Rule Induction Algorithm / Chapter 6 --- Experimental Results of Wrapper Learning / Chapter 6.1 --- Experimental Methodology / Chapter 6.2 --- Results on Electronic Appliance Catalogs / Chapter 6.3 --- Results on Book Catalogs / Chapter 6.4 --- Results on Seminar Announcements / Chapter 7 --- Adapting Wrappers to Unseen Information Sources / Chapter 7.1 --- Motivation / Chapter 7.2 --- Support Vector Machines / Chapter 7.3 --- Feature Selection / Chapter 7.4 --- Automatic Annotation of Training Examples / Chapter 7.4.1 --- Building SVM Models / Chapter 7.4.2 --- Seeking Potential Training Example Candidates / Chapter 7.4.3 --- Classifying Potential Training Examples / Chapter 8 --- Experimental Results of Wrapper Adaptation / Chapter 8.1 --- Experimental Methodology / Chapter 8.2 --- Results on Electronic Appliance Catalogs / Chapter 8.3 --- Results on Book Catalogs / Chapter 9 --- Conclusions and Future Work / Chapter 9.1 --- Conclusions / Chapter 9.2 --- Future Work / Chapter A --- Sample Experimental Pages / Chapter B --- Detailed Experimental Results of Wrapper Adaptation of HISER / Bibliography / Subjects: Text processing (Computer science)
58	Extracting causation knowledge from natural language texts. January 2002 / Chan Ki, Cecia. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 95-99). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Our Contributions / Chapter 1.2 --- Thesis Organization / Chapter 2 --- Related Work / Chapter 2.1 --- Using Knowledge-based Inferences / Chapter 2.2 --- Using Linguistic Techniques / Chapter 2.2.1 --- Using Linguistic Clues / Chapter 2.2.2 --- Using Graphical Patterns / Chapter 2.2.3 --- Using Lexicon-syntactic Patterns of Causative Verbs / Chapter 2.2.4 --- Comparisons with Our Approach / Chapter 2.3 --- Discovery of Extraction Patterns for Extracting Relations / Chapter 2.3.1 --- Snowball system / Chapter 2.3.2 --- DIRT system / Chapter 2.3.3 --- Comparisons with Our Approach / Chapter 3 --- Semantic Expectation-based Knowledge Extraction / Chapter 3.1 --- Semantic Expectations / Chapter 3.2 --- Semantic Template / Chapter 3.2.1 --- Causation Semantic Template / Chapter 3.3 --- Sentence Templates / Chapter 3.4 --- Consequence and Reason Templates / Chapter 3.5 --- Causation Knowledge Extraction Framework / Chapter 3.5.1 --- Template Design / Chapter 3.5.2 --- Sentence Screening / Chapter 3.5.3 --- Semantic Processing / Chapter 4 --- Using Thesaurus and Pattern Discovery for SEKE / Chapter 4.1 --- Using a Thesaurus / Chapter 4.2 --- Pattern Discovery / Chapter 4.2.1 --- Use of Semantic Expectation-based Knowledge Extraction / Chapter 4.2.2 --- Use of Part of Speech Information / Chapter 4.2.3 --- Pattern Representation / Chapter 4.2.4 --- Constructing the Patterns / Chapter 4.2.5 --- Merging the Patterns / Chapter 4.3 --- Pattern Matching / Chapter 4.3.1 --- Matching Score / Chapter 4.3.2 --- Support of Patterns / Chapter 4.3.3 --- Relevancy of Sentence Templates / Chapter 4.4 --- Applying the Newly Discovered Patterns / Chapter 5 --- Applying SEKE on Hong Kong Stock Market Domain / Chapter 5.1 --- Template Design / Chapter 5.1.1 --- Semantic Templates / Chapter 5.1.2 --- Sentence Templates / Chapter 5.1.3 --- Consequence and Reason Templates / Chapter 5.2 --- Pattern Discovery / Chapter 5.2.1 --- Support of Patterns / Chapter 5.2.2 --- Relevancy of Sentence Templates / Chapter 5.3 --- Causation Knowledge Extraction Result / Chapter 5.3.1 --- Evaluation Approach / Chapter 5.3.2 --- Parameter Investigations / Chapter 5.3.3 --- Experimental Results / Chapter 5.3.4 --- Knowledge Discovered / Chapter 5.3.5 --- Parameter Effect / Chapter 6 --- Applying SEKE on Global Warming Domain / Chapter 6.1 --- Template Design / Chapter 6.1.1 --- Semantic Templates / Chapter 6.1.2 --- Sentence Templates / Chapter 6.1.3 --- Consequence and Reason Templates / Chapter 6.2 --- Pattern Discovery / Chapter 6.2.1 --- Support of Patterns / Chapter 6.2.2 --- Relevancy of Sentence Templates / Chapter 6.3 --- Global Warming Domain Result / Chapter 6.3.1 --- Evaluation Approach / Chapter 6.3.2 --- Experimental Results / Chapter 6.3.3 --- Knowledge Discovered / Chapter 7 --- Conclusions and Future Directions / Chapter 7.1 --- Conclusions / Chapter 7.2 --- Future Directions / Bibliography / Chapter A --- Penn Treebank Part of Speech Tags / Subjects: Text processing (Computer science); Semantics--Data processing; Computational linguistics
59	Automatic text categorization for information filtering. January 1998 / Ho Chao Yang. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 157-163). / Abstract also in Chinese. / Abstract / Acknowledgment / List of Figures / List of Tables / Chapter 1 --- Introduction / Chapter 1.1 --- Automatic Document Categorization / Chapter 1.2 --- Information Filtering / Chapter 1.3 --- Contributions / Chapter 1.4 --- Organization of the Thesis / Chapter 2 --- Related Work / Chapter 2.1 --- Existing Automatic Document Categorization Approaches / Chapter 2.1.1 --- Rule-Based Approach / Chapter 2.1.2 --- Similarity-Based Approach / Chapter 2.2 --- Existing Information Filtering Approaches / Chapter 2.2.1 --- Information Filtering Systems / Chapter 2.2.2 --- Filtering in TREC / Chapter 3 --- Document Pre-Processing / Chapter 3.1 --- Document Representation / Chapter 3.2 --- Classification Scheme Learning Strategy / Chapter 4 --- A New Approach - IBRI / Chapter 4.1 --- Overview of Our New IBRI Approach / Chapter 4.2 --- The IBRI Representation and Definitions / Chapter 4.3 --- The IBRI Learning Algorithm / Chapter 5 --- IBRI Experiments / Chapter 5.1 --- Experimental Setup / Chapter 5.2 --- Evaluation Metric / Chapter 5.3 --- Results / Chapter 6 --- A New Approach - GIS / Chapter 6.1 --- Motivation of GIS / Chapter 6.2 --- Similarity-Based Learning / Chapter 6.3 --- The Generalized Instance Set Algorithm (GIS) / Chapter 6.4 --- Using GIS Classifiers for Classification / Chapter 6.5 --- Time Complexity / Chapter 7 --- GIS Experiments / Chapter 7.1 --- Experimental Setup / Chapter 7.2 --- Results / Chapter 8 --- A New Information Filtering Approach Based on GIS / Chapter 8.1 --- Information Filtering Systems / Chapter 8.2 --- GIS-Based Information Filtering / Chapter 9 --- Experiments on GIS-based Information Filtering / Chapter 9.1 --- Experimental Setup / Chapter 9.2 --- Results / Chapter 10 --- Conclusions and Future Work / Chapter 10.1 --- Conclusions / Chapter 10.2 --- Future Work / Chapter A --- Sample Documents in the corpora / Chapter B --- Details of Experimental Results of GIS / Chapter C --- Computational Time of Reuters-21578 Experiments / Subjects: Text processing (Computer science); Nearest neighbor analysis (Statistics); Information retrieval
60	Associative information network and applications to an intelligent search engine. / CUHK electronic theses & dissertations collection January 1998 / Qin An. / Thesis (Ph.D.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (p. 135-142). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese. / Subjects: Web search engines; Text processing (Computer science); World Wide Web; Chinese language--Data processing
