Global ETD Search

621	Europäische Öffentlichkeit durch "policy-frames" / Zwischen direkter Parlamentsbeteiligung in EU-Angelegenheiten, medialem politischen Diskurs und text mining / European public sphere through policies / Between direct parliamentary involvement in European matters, medial political discourse and text mining Szczerbak, Paweł 28 April 2017 No description available. 300 European Union European Integration public sphere european public sphere policies policy text mining frame frames framing cooccurrence cooccurrence analysis Sociology (PPN62125505X)
622	A new framework for a technological perspective of knowledge management Botha, Antonie Christoffel 26 June 2008 Rapid change is a defining characteristic of our modern society. This has a huge impact on society, governments, and businesses. Businesses are forced to fundamentally transform themselves to survive in a challenging economy. Transformation implies change in the way business is conducted, in the way people contribute to the organisation, and in the way the organisation perceives and manages its vital assets, which are increasingly built around the key assets of intellectual capital and knowledge. The latest management tool, and the latest understanding of how to respond to the challenges of the economy in the new millennium, is the idea of "knowledge management" (KM). In this study we have focused on synthesising the many confusing points of view about the subject area, such as: (a) different focus points or perspectives; (b) different definitions and positioning of the subject; and (c) a bewildering number of definitions of what knowledge is and what KM entails. Popular, magazine-style sources draw too blurred a distinction between subjects and concepts such as: knowledge versus information versus data; information management versus knowledge management; the tools available to tackle the issues in this field of study and practice; and the role technology plays versus the huge hype from some journalists and within the vendor community. Today there appears to be a lack of a coherent set of frameworks to abstract, comprehend, and explain this subject area, let alone to build successful systems and technologies with which to apply KM. The study comprises two major parts: (1) The first part investigates the concepts, elements, drivers, and challenges related to KM.
A set of models for comprehending these issues and notions is contributed, based on a consideration of intellectual capital, organizational learning, communities of practice, and best practices. (2) The second part focuses on the technology perspective of KM. Although KM is primarily concerned with non-technical issues, this study concentrates on the technical issues and challenges. A new technology framework for KM is proposed to position and relate the different KM technologies as well as the two key applications of KM, namely knowledge portals and knowledge discovery (including text mining). It is concluded that KM and related concepts and notions need to be firmly understood, as well as effectively positioned and employed, to support the modern business organisation in its quest to survive and grow. The main thesis is that KM technology is a necessary but insufficient prerequisite, and a key enabler, for successful KM in a rapidly changing business environment. / Thesis (PhD (Computer Science))--University of Pretoria, 2010. / Computer Science / unrestricted Knowledge Tacit knowledge Explicit knowledge Knowledge management Knowledge conversion life cycle Rapid change Organizational change Organizational learning Intellectual capital Collaboration Advanced search Externalization Internalization Socialization Knowledge sharing Knowledge creation Knowledge representation Taxonomy Ontology E-learning Knowledge portals Knowledge discovery Text mining Semantic km aspects The semantic web Communities of practice UCTD
623	Využití metod dolování dat pro analýzu sociálních sítí / Using of Data Mining Method for Analysis of Social Networks Novosad, Andrej January 2013 The thesis discusses data mining of social media. It introduces the topic of data mining and possible mining methods. The thesis also explores social media and social networks: what they are able to offer and what problems they bring. The APIs of three social networking sites are examined for the opportunities they provide for data mining. Techniques of text mining and document classification are explored. The implementation of a web application that mines data from the social networking site Twitter using the SVM algorithm is described. The application classifies tweets based on their text, where the classes represent the tweets' continents of origin. Several experiments, executed both in the RapidMiner software and in the implemented web application, are then presented and their results examined.
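The SVM classification step can be illustrated with a minimal sketch, assuming a linear SVM trained by Pegasos-style stochastic subgradient descent and combined one-vs-rest across continents. This is a swapped-in, from-scratch stand-in, not the thesis's implementation or RapidMiner's SVM operator; the toy tweets, the whitespace tokenization in `featurize`, and the `lam`/`epochs` values are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (tweet text, continent of origin).
TWEETS = [
    ("snow in toronto today eh", "North America"),
    ("new york subway delayed again", "North America"),
    ("tea time in london rain", "Europe"),
    ("berlin techno club tonight", "Europe"),
    ("tokyo ramen lunch", "Asia"),
    ("seoul kpop concert", "Asia"),
]


def featurize(text):
    """Bag-of-words term counts for a lower-cased, whitespace-tokenized tweet."""
    return Counter(text.lower().split())


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    ys are +1/-1 labels. Examples are visited in a fixed order so runs are
    reproducible (the original Pegasos samples examples at random).
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)       # step size 1/(lambda * t)
            margin = y * dot(w, x)      # margin under the current weights
            for f in w:                 # regularization shrink: w <- (1 - eta*lambda) w
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:            # hinge loss active: step toward y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(data):
    """Train one binary SVM per continent (one-vs-rest multiclass)."""
    xs = [featurize(text) for text, _ in data]
    classes = sorted({label for _, label in data})
    return {
        c: train_binary_svm(xs, [1.0 if label == c else -1.0 for _, label in data])
        for c in classes
    }


def predict(models, text):
    """Return the continent whose classifier gives the highest score."""
    x = featurize(text)
    return max(models, key=lambda c: dot(models[c], x))


models = train_one_vs_rest(TWEETS)
print(predict(models, "snowy day in toronto"))
```

One-vs-rest keeps each classifier a plain binary SVM; a real system would also need a much larger labelled corpus and a proper held-out evaluation.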
624	Von Mises-Fisher based (co-)clustering for high-dimensional sparse data : application to text and collaborative filtering data / Modèles de mélange de von Mises-Fisher pour la classification simple et croisée de données éparses de grande dimension Salah, Aghiles 21 November 2016 Cluster analysis or clustering, which aims to group together similar objects, is undoubtedly a very powerful unsupervised learning technique. With the growing amount of available data, clustering is increasingly gaining in importance in various areas of data science for tasks such as automatic summarization, dimensionality reduction, visualization, outlier detection, speeding up search engines, and organizing huge data sets.
Existing clustering approaches are, however, severely challenged by the high dimensionality and extreme sparsity of the data sets arising in some current areas of interest, such as Collaborative Filtering (CF) and text mining. Such data often consist of thousands of features and more than 95% zero entries. In addition to being high-dimensional and sparse, the data sets encountered in these domains are also directional in nature. Several previous studies have empirically demonstrated that directional measures (which measure the distance between objects in terms of the angle between them), such as the cosine similarity, are substantially superior to other measures, such as the Euclidean distance, for clustering text documents or assessing the similarities between users/items in CF. This suggests that in such a context only the direction of a data vector (e.g., a text document) is relevant, not its magnitude. It is worth noting that the cosine similarity is exactly the scalar product between unit-length data vectors, i.e., L2-normalized vectors. Thus, from a probabilistic perspective, using the cosine similarity is equivalent to assuming that the data are directional data distributed on the surface of a unit hypersphere. Despite the substantial empirical evidence that certain high-dimensional sparse data sets, such as those encountered in the above domains, are better modeled as directional data, most existing models in text mining and CF are based on popular assumptions such as Gaussian, Multinomial or Bernoulli distributions, which are inadequate for L2-normalized data. In this thesis, we focus on the two challenging tasks of text document clustering and item recommendation, which still attract a lot of attention in the domains of text mining and CF, respectively.
In order to address the above limitations, we propose a suite of new models and algorithms which rely on the von Mises-Fisher (vMF) assumption that arises naturally for directional data lying on a unit hypersphere. Machine learning Clustering Co-clustering Mixture models Directional statistics Von Mises-Fisher distribution Text mining Recommender systems Collaborative filtering Sparse data High dimensional data 003.3
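Because the cosine similarity of L2-normalized vectors is just their dot product, the simplest clustering method built on this view is spherical k-means, which is closely related to a hard-assignment vMF mixture with a shared concentration and equal mixing proportions. The sketch below shows that simpler method, not the thesis's vMF mixture or co-clustering models; the farthest-first initialization and the sparse dict representation are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a sparse term-weight dict to unit L2 norm (onto the unit hypersphere)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else dict(vec)


def dot(a, b):
    """Sparse dot product; for unit-length vectors this equals the cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def spherical_kmeans(docs, k, max_iter=50):
    """Spherical k-means: assign by cosine, re-estimate centroids as mean directions."""
    xs = [l2_normalize(d) for d in docs]
    # Deterministic farthest-first initialization: start from the first document,
    # then repeatedly add the document least similar to the chosen centroids.
    centroids = [xs[0]]
    while len(centroids) < k:
        far = min(range(len(xs)), key=lambda i: max(dot(xs[i], c) for c in centroids))
        centroids.append(xs[far])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in xs]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            acc = {}
            for x, lab in zip(xs, labels):
                if lab == j:
                    for t, v in x.items():
                        acc[t] = acc.get(t, 0.0) + v
            if acc:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = l2_normalize(acc)  # mean direction, projected back to the sphere
    return labels, centroids
```

Only vector directions matter here: scaling a document's term counts by any positive factor leaves its cluster assignment unchanged.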
625	Interactive pattern mining of neuroscience data Waranashiwar, Shruti Dilip 29 January 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Text mining is the process of extracting knowledge from unstructured text documents. Huge volumes of text documents exist in digital form, and it is impossible to extract knowledge from these vast texts manually. Hence, text mining is used to find useful information in text through the identification and exploration of interesting patterns. The objective of this thesis is to find compact but high-quality frequent patterns in text documents related to the field of neuroscience. We aim to show that an interactive sampling algorithm is more time-efficient than exhaustive methods such as FP-Growth (run using the RapidMiner tool). Rather than mining all frequent patterns, not all of which may interest the user, an interactive method that mines only the desired, interesting patterns makes far better use of resources. This is especially evident with a large number of keywords. In interactive pattern mining, the user gives feedback on whether a pattern is interesting or not. Using a Markov Chain Monte Carlo (MCMC) sampling method, frequent patterns are generated interactively. The thesis discusses the interactive extraction of patterns among keywords related to some common disorders studied in neuroscience. The PubMed database and keywords related to schizophrenia and alcoholism are used as inputs. The thesis reveals many associations between the different terms that would otherwise be difficult to discover by reading articles or journals manually. The Graphviz tool is used to visualize these associations.
Data Mining Text Mining PubMed Graphic methods -- Data processing Software visualization User interfaces (Computer systems) Neuroinformatics -- Data processing Markov processes Monte Carlo method Statistics -- Data processing Life sciences literature -- Research Schizophrenia -- Data processing Alcoholism -- Data processing
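One way to picture MCMC-based interactive pattern sampling is a Metropolis-Hastings random walk over frequent keyword sets. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the target distribution (support multiplied by a user-feedback weight), the add-or-remove-one-keyword neighbourhood, and the names `sample_patterns` and `feedback_weight` are all hypothetical.

```python
import random


def support(itemset, transactions):
    """Number of documents (keyword sets) containing every keyword in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def neighbours(itemset, items, transactions, min_sup):
    """Frequent, non-empty itemsets one keyword away (add or remove one keyword)."""
    result = []
    for item in items:
        if item in itemset:
            if len(itemset) > 1:
                result.append(itemset - {item})  # subsets of frequent sets stay frequent
        else:
            candidate = itemset | {item}
            if support(candidate, transactions) >= min_sup:
                result.append(candidate)
    return result


def sample_patterns(transactions, min_sup, feedback_weight, n_samples, seed=0):
    """Metropolis-Hastings random walk over frequent itemsets.

    Target: pi(P) proportional to support(P) * feedback_weight(P), where the
    (hypothetical) feedback_weight must be > 0 and encodes user feedback on
    which patterns are interesting.
    """
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    singles = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_sup]
    if not singles:
        return []

    def target(p):
        return support(p, transactions) * feedback_weight(p)

    current = max(singles, key=lambda p: support(p, transactions))
    samples = []
    for _ in range(n_samples):
        nbrs = neighbours(current, items, transactions, min_sup)
        if nbrs:
            proposal = rng.choice(nbrs)  # uniform proposal over neighbours
            back = neighbours(proposal, items, transactions, min_sup)
            # Hastings correction for unequal neighbourhood sizes.
            ratio = (target(proposal) * len(nbrs)) / (target(current) * len(back))
            if rng.random() < ratio:     # accept with probability min(1, ratio)
                current = proposal
        samples.append(current)
    return samples
```

Raising `feedback_weight` for patterns the user marks as interesting makes the walk visit them more often, without ever enumerating all frequent patterns.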
626	Stylometry: Quantifying Classic Literature For Authorship Attribution: A Machine Learning Approach Yousif, Jacob, Scarano, Donato January 2024 Classic literature is rich linguistically, historically, and culturally, which makes it valuable for further study. This project therefore selected a set of 48 classic books for a stylometric analysis, adopting an approach from related work: the books are divided into text segments, the resulting segments are quantified, and the quantified values are used to analyze the linguistic attributes of the books. In addition, the project conducted several classification tasks with other objectives. First, the quantified values of the books' text segments were used for classification with advanced models such as LightGBM and TabNet, to assess how well this approach applies to authorship attribution. Second, the study applied a state-of-the-art model, RoBERTa, to classification tasks using the segmented texts themselves, to evaluate that model's performance in authorship attribution. The results uncovered the characteristics of the books to a reasonable degree. Regarding authorship attribution, the results suggest that segmenting and quantifying text through stylometric analysis, combined with supervised machine learning algorithms, is practical for such tasks. This approach, while showing promise, may still require further improvements to achieve optimal performance. Lastly, RoBERTa demonstrated high performance in authorship attribution tasks.
Authorship Attribution Classic Literature Analysis Clustering Data Science Deep Learning Feature Engineering Feature Extraction Gradient Descent K-Means LightGBM Machine Learning Multiclass Classification NLP Neural Network RoBERTa Stylometric Analysis Stylometry TabNet t-SNE Text Mining Transformer Models Computer Sciences Computer and Information Sciences
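The segment-and-quantify pipeline can be sketched as follows, assuming a small hypothetical feature set (mean word length, type-token ratio, and function-word rates) and a nearest-centroid classifier swapped in for LightGBM/TabNet to keep the example dependency-free. Neither the features nor the classifier are claimed to be the ones used in the thesis.

```python
import re
from statistics import mean

# Hypothetical feature set; the thesis's actual stylometric features may differ.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it", "was", "he", "she", "but")


def segment(text, words_per_segment):
    """Split a book into consecutive segments of at most words_per_segment words."""
    words = re.findall(r"[A-Za-z']+", text)
    return [words[i:i + words_per_segment] for i in range(0, len(words), words_per_segment)]


def features(words):
    """Quantify one segment: mean word length, type-token ratio, function-word rates."""
    lower = [w.lower() for w in words]
    n = len(lower)
    vec = [mean(len(w) for w in lower), len(set(lower)) / n]
    vec += [lower.count(fw) / n for fw in FUNCTION_WORDS]
    return vec


def profile(text, words_per_segment):
    """Average feature vector over all segments of a text."""
    vecs = [features(s) for s in segment(text, words_per_segment)]
    return [mean(column) for column in zip(*vecs)]


def attribute(text, author_texts, words_per_segment=100):
    """Nearest-centroid authorship attribution using squared Euclidean distance."""
    unknown = profile(text, words_per_segment)
    centroids = {a: profile(t, words_per_segment) for a, t in author_texts.items()}
    return min(centroids,
               key=lambda a: sum((u - c) ** 2 for u, c in zip(unknown, centroids[a])))
```

In practice the per-segment vectors (rather than their averages) would be the rows fed to a supervised model, and features on very different scales would usually be standardized first.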
627	Zoetrope – Interactive Feature Exploration in News Videos Liebl, Bernhard, Burghardt, Manuel 11 July 2024 No description available. ddc:006 ddc:770
628	文字背後的意含-資訊的量化測量公司基本面與股價（以中鋼為例） / Behind the words - quantifying information to measure firms' fundamentals and stock return (taking the China steel corporation as example) 傅奇珅, Fu, Chi Shen Unknown Date This research collects all of the news stories about China Steel Corporation from the Economic Daily News, United Daily News, and United Evening News. The articles are segmented with the Chinese Word Segmentation System of Academia Sinica and analyzed with a methodology that follows and extends Tetlock, Saar-Tsechansky, and Macskassy (2008). I examine whether a simple quantitative measure of language can be used to explain and predict individual firms' accounting sales and stock returns. My two main findings are: 1. the fraction of positive (commendatory) words in firm-specific news stories forecasts high firm sales; 2. the firm's stock price briefly overreacts to the information embedded in negative (derogatory) words, whereas it efficiently incorporates the information embedded in positive (commendatory) words. Taken together, these findings suggest that linguistic media content captures otherwise hard-to-quantify aspects of firms' fundamentals, which investors quickly incorporate into stock prices. Content Analysis Textual Information Informative Content Text Mining Information Effect Critical Information Extraction Commendatory Term Derogatory Term Positive words Negative words Fundamental Analysis Stock Return Analysis
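The word-count measure can be sketched as the fraction of positive and negative words in each pre-segmented story, plus a least-squares slope relating that fraction to an outcome such as returns or sales. The mini-lexicons below are hypothetical placeholders, and the single-regressor OLS is a deliberate simplification, not the thesis's or Tetlock et al.'s full regression specification.

```python
from statistics import mean

# Hypothetical mini-lexicons; real studies use curated sentiment dictionaries.
POSITIVE = {"成長", "獲利", "創新高"}  # growth, profit, record high
NEGATIVE = {"虧損", "衰退", "下滑"}    # loss, recession, decline


def tone_fractions(tokens):
    """Fractions of positive and negative words among a story's segmented tokens."""
    n = len(tokens)
    if n == 0:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens) / n
    neg = sum(t in NEGATIVE for t in tokens) / n
    return pos, neg


def ols_slope(x, y):
    """Slope b of an ordinary least-squares fit y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx
```

For example, regressing the next day's return on each day's negative-word fraction gives a slope whose sign indicates how prices respond to negative wording; separating overreaction from efficient incorporation would additionally require examining returns over subsequent days.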
629	Semi-automated Ontology Generation for Biocuration and Semantic Search Wächter, Thomas 01 February 2011 Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing. Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods. Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun-phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks.
Term generation yields high-quality terms that are also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, and child-ancestor relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands the query using the structure and terminology of the ontology. The machine classification employed in Go3R distinguishes documents related to alternative methods from unrelated ones with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because they contain a specific term or through the automatic classification. The Go3R search engine is available online at www.Go3R.org. Animal Testing Alternatives Animal Welfare Biomedical Research Internet Software Ontology Learning Automatic Term Recognition Definition Extraction Search Engine 3Rs Principle Literature Search REACH ddc:006 ddc:004 ddc:576 rvk:WC 7700 Ontology Ontology <knowledge processing> Animal testing Search engine European Union / REACH Regulation Indexing <subject indexing> Subject cataloguing Information Retrieval Information retrieval system Biomedicine Text Mining
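Identifying statistically significant noun phrases can be illustrated with Dunning's log-likelihood ratio (G²), which compares a phrase's frequency in a domain corpus with its frequency in a background corpus. This is an assumed, commonly used statistic, not necessarily the one DOG4DAG implements; noun-phrase extraction itself (which needs a part-of-speech tagger) is assumed to have happened already, and the function names are hypothetical.

```python
import math


def log_likelihood(a, n_domain, b, n_background):
    """Dunning's G^2 for a phrase seen a times in n_domain tokens vs b in n_background."""
    total = n_domain + n_background
    k = a + b  # total occurrences of the phrase across both corpora
    cells = [  # (observed, expected) for the 2x2 contingency table
        (a, n_domain * k / total),
        (n_domain - a, n_domain * (total - k) / total),
        (b, n_background * k / total),
        (n_background - b, n_background * (total - k) / total),
    ]
    # Cells with zero observations contribute 0 (limit of o * ln(o / e) as o -> 0).
    return 2.0 * sum(o * math.log(o / e) for o, e in cells if o > 0)


def significant_terms(domain_counts, background_counts, n_domain, n_background,
                      threshold=3.84):
    """Rank candidate phrases that are significantly over-represented in the domain corpus.

    3.84 is the chi-square critical value for p = 0.05 with one degree of freedom.
    """
    scored = []
    for phrase, a in domain_counts.items():
        b = background_counts.get(phrase, 0)
        if a / n_domain <= b / n_background:
            continue  # not over-represented in the domain text
        g2 = log_likelihood(a, n_domain, b, n_background)
        if g2 >= threshold:
            scored.append((g2, phrase))
    return [phrase for _, phrase in sorted(scored, reverse=True)]
```

G² is often preferred over the plain chi-square test for rare phrases, which dominate the long tail of domain terminology.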
630	Tuning of machine learning algorithms for automatic bug assignment Artchounin, Daniel January 2017 In software development projects, bug triage consists mainly of assigning bug reports to software developers or teams (depending on the project). Partially or fully automating this task would have a positive economic impact on many software projects. This thesis introduces a systematic four-step method for finding some of the best configurations of several machine learning algorithms for the automatic bug assignment problem. The four steps are used, respectively, to select a combination of pre-processing techniques, to select a bug report representation, to select a potential feature selection technique, and to tune several classifiers. The method has been applied to three software projects: 66 066 bug reports of a proprietary project, 24 450 bug reports of Eclipse JDT and 30 358 bug reports of Mozilla Firefox. 619 configurations were applied and compared on each of these three projects. In production, using the approach introduced in this work on the bug reports of the proprietary project would have increased the accuracy by up to 16.64 percentage points. bug triage bug assignment bug mining bug report activity-based approach issue tracking bug repository bug tracker pre-processing feature extraction feature selection tuning model selection hyper-parameter optimization text mining text classification classifier supervised learning machine learning information retrieval bugzilla eclipse jdt mozilla firefox open source software proprietary project accuracy mean reciprocal rank software development software maintenance software engineering Computer and Information Sciences
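The four-step method can be sketched as a greedy, stage-by-stage search in which each step fixes the choices made in earlier steps. The `evaluate` callback, the option names, and the toy scores are hypothetical stand-ins for training and validating a real bug-assignment classifier; the thesis's exact search procedure may differ.

```python
def stagewise_search(stages, evaluate):
    """Greedy stage-by-stage configuration search.

    stages: ordered list of (step_name, candidate_options); the first option of
    each step is its default. Each step tries all of its options with the
    choices of earlier steps fixed, keeping any option that improves the score.
    """
    config = {name: options[0] for name, options in stages}
    best_score = evaluate(config)
    for name, options in stages:
        for option in options:
            trial = {**config, name: option}
            score = evaluate(trial)
            if score > best_score:
                config, best_score = trial, score
    return config, best_score


# Hypothetical stand-in for "train on one split, return validation accuracy".
TOY_GAINS = {
    "preprocessing": {"none": 0.0, "stemming": 0.05, "stemming+stopwords": 0.08},
    "representation": {"tf": 0.0, "tf-idf": 0.10},
    "feature_selection": {"none": 0.0, "chi2": 0.02},
    "classifier": {"naive_bayes": 0.40, "linear_svm": 0.50},
}


def toy_evaluate(config):
    """Additive toy score; a real evaluate() would train and validate a model."""
    return sum(TOY_GAINS[step][choice] for step, choice in config.items())


STAGES = [(step, list(options)) for step, options in TOY_GAINS.items()]
best_config, best_score = stagewise_search(STAGES, toy_evaluate)
```

The greedy design evaluates far fewer configurations than a full grid (here 9 trials plus the default instead of 24), at the cost of possibly missing interactions between steps.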
