211

Understanding the Form and Function of Neuronal Physiological Diversity

Tripathy, Shreejoy J. 31 October 2013 (has links)
For decades electrophysiologists have recorded and characterized the biophysical properties of a rich diversity of neuron types. This diversity of neuron types is critical for generating functionally important patterns of brain activity and implementing neural computations. In this thesis, I developed computational methods for quantifying neuron diversity and applied these methods to understand the functional implications of within-type neuron variability and across-type neuron diversity. First, I developed a means of defining the functional role of differences among neurons of the same type. Namely, I adapted statistical neuron models, termed generalized linear models, to precisely capture how the membranes of individual olfactory bulb mitral cells transform afferent stimuli into spiking responses. I then used computational simulations to construct virtual populations of biophysically variable mitral cells to study the functional implications of within-type neuron variability. I demonstrate that an intermediate amount of intrinsic variability enhances coding of noisy afferent stimuli by groups of biophysically variable mitral cells. These results suggest that within-type neuron variability, long considered a disadvantageous consequence of biological imprecision, may serve a functional role in the brain. Second, I developed a methodology for quantifying the rich electrophysiological diversity across the majority of neuron types throughout the mammalian brain. Using semi-automated text mining, I built a database, NeuroElectro, of neuron-type-specific biophysical properties extracted from the primary research literature. This data is available at http://neuroelectro.org, a publicly accessible interface where the information can be viewed. Though the extracted physiological data is highly variable across studies, I demonstrate that knowledge of article-specific experimental conditions can explain a significant portion of the observed variance. By applying simple analyses to the dataset, I find that there exist 5-7 major neuron super-classes which segregate on the basis of known functional roles. Moreover, by integrating the NeuroElectro dataset with brain-wide gene expression data from the Allen Brain Atlas, I show that biophysically based neuron classes correlate highly with patterns of gene expression among voltage-gated ion channels and neurotransmitters. Furthermore, this work lays the conceptual and methodological foundations for substantially enhanced data sharing in future neurophysiological investigations.
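The generalized linear model referred to here treats a neuron's spike count per time bin as Poisson, with a rate given by an exponentiated linear filter of the stimulus. The sketch below is a minimal illustration of that idea, not the thesis code: it simulates a cell with an assumed exponentially decaying stimulus filter, then recovers the filter by gradient ascent on the Poisson log-likelihood. Filter length, learning rate, and iteration count are arbitrary choices.

```python
# Minimal Poisson GLM sketch (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

# Simulated stimulus and an assumed ground-truth stimulus filter.
T, D = 5000, 10                       # time bins, filter length
X = rng.normal(size=(T, D))           # design matrix of lagged stimulus values
k_true = np.exp(-np.arange(D) / 3.0)  # decaying filter (assumption)
rate = np.exp(X @ k_true - 2.0)       # conditional intensity per bin
y = rng.poisson(rate)                 # observed spike counts

# Fit by gradient ascent on the Poisson log-likelihood
#   L(k, b) = sum_t [ y_t (k.x_t + b) - exp(k.x_t + b) ]
k, b = np.zeros(D), 0.0
lr = 1e-4
for _ in range(2000):
    lam = np.exp(X @ k + b)           # current predicted rates
    k += lr * (X.T @ (y - lam))       # dL/dk
    b += lr * np.sum(y - lam)         # dL/db

print("filter correlation:", np.corrcoef(k, k_true)[0, 1])
```

A fitted model of this kind can then be resampled with perturbed parameters to build the sort of biophysically variable virtual population the abstract describes.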
212

Language Engineering for Information Extraction

Schierle, Martin 10 January 2012 (has links) (PDF)
Accompanied by the cultural development into an information society and knowledge economy, and driven by the rapid growth of the World Wide Web and falling prices for technology and disk space, the world's knowledge is evolving fast, and humans are challenged with keeping up. Despite all efforts at data structuring, a large part of this human knowledge is still hidden behind the ambiguities and fuzziness of natural language. Domain language in particular poses new challenges with its specific syntax, terminology and morphology. Companies wanting to exploit the information contained in such corpora are often required to build specialized systems instead of relying on off-the-shelf software libraries and data resources. The engineering of language processing systems is, however, cumbersome, and the creation of language resources, annotation of training data and composition of modules is often more art than science. The scientific field of Language Engineering aims at providing reliable information, approaches and guidelines for how to design, implement, test and evaluate language processing systems. Language engineering architectures have been a subject of scientific work for the last two decades and aim at building universal systems of easily reusable components. Although current systems offer comprehensive features and rest on an architecturally sound basis, there is still little documentation about how to actually build an information extraction application. Selecting modules, methods and resources for a distinct use case requires a detailed understanding of state-of-the-art technology, application demands and characteristics of the input text. The main assumption underlying this work is the thesis that a new application can only rarely be created by simply reusing standard components from different repositories. This work recapitulates existing literature on language resources, processing resources and language engineering architectures to derive a theory of how to engineer a new system for information extraction from a (domain) corpus. The thesis was initiated by Daimler AG to prepare and analyze unstructured information as a basis for corporate quality analysis. It is therefore concerned with language engineering in the area of information extraction, which targets the detection and extraction of specific facts from textual data. While other work in the field of information extraction is mainly concerned with extracting location or person names, this work deals with automotive components, failure symptoms, corrective measures and their relations of arbitrary arity. The ideas presented here are applied, evaluated and demonstrated on a real-world application dealing with quality analysis of automotive domain language. To achieve this goal, the underlying corpus is examined and scientifically characterized, algorithms are chosen with respect to the derived requirements and evaluated where necessary. The system comprises language identification, tokenization, spelling correction, part-of-speech tagging, syntax parsing and a final relation extraction step. The extracted information serves as input to data mining methods such as an early warning system and a graph-based visualization for interactive root cause analysis. It is finally investigated how the unstructured data facilitates these quality analysis methods in comparison to structured data.
The acceptance of these text-based methods in the company's processes further proves the usefulness of the created information extraction system.
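The stage-wise architecture this abstract describes — independent modules that each enrich a shared document representation — can be illustrated compactly. The sketch below is an assumed toy version, not the thesis system: it composes just two stages (sentence splitting and a lexicon-based relation step) and uses invented component and symptom lexicons in place of the full chain of language identification, tokenization, spelling correction, tagging and parsing.

```python
# Toy pipeline sketch: each stage takes and returns a shared "doc" dict.
import re

# Invented domain lexicons, standing in for the thesis's language resources.
COMPONENTS = {"alternator", "fuel pump", "brake pad"}
SYMPTOMS = {"rattling", "leak", "overheating"}

def split_sentences(doc):
    doc["sentences"] = re.split(r"(?<=[.!?])\s+", doc["text"])
    return doc

def extract_relations(doc):
    # Relate every component to every symptom sharing a sentence with it.
    doc["relations"] = [
        (c, s)
        for sent in doc["sentences"]
        for c in COMPONENTS if c in sent.lower()
        for s in SYMPTOMS if s in sent.lower()
    ]
    return doc

def run_pipeline(text, stages):
    doc = {"text": text}
    for stage in stages:              # modules are swappable per use case
        doc = stage(doc)
    return doc

report = "Customer reports rattling near the alternator. Fuel pump shows a leak."
result = run_pipeline(report, [split_sentences, extract_relations])
print(result["relations"])  # [('alternator', 'rattling'), ('fuel pump', 'leak')]
```

A real system would insert spelling correction, part-of-speech tagging and parsing as further stages in the same list, which is the composability argument the abstract makes.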
213

The Study of Analyzing Smart Handheld Device App Clusters by Using kNN Text Mining

曾國傑, Tseng, Kuo Chieh Unknown Date (has links)
As smart handheld devices have become ubiquitous, user demand for Apps has grown steadily, and businesses have adopted them as a new channel for interactive marketing. The commercial opportunity of App downloads has drawn many developers into App development, causing explosive growth in the number of Apps and leaving users unable to choose efficiently among the many varieties. This study applies text mining and kNN cluster analysis to App recommendation articles posted by users, grouping the recommended Apps; by adjusting the algorithm's parameters and evaluating the results with quality measures, it seeks the best-quality clustering to serve as a reference for users selecting Apps. To address this information overload, 439 recommendation articles for Apps in the App Store's Games category were collected and merged into 357 articles according to the App each recommends. The articles were converted into comparable vectors using the vector space model and clustered with kNN cluster analysis; the parameter k and the document-similarity threshold were tuned to obtain the best clustering quality, which was assessed with measures such as mean intra-cluster similarity. To further improve quality, a multi-phase clustering procedure was used, deciding after each phase whether a cluster should be re-clustered or merged based on the number of articles it contains. The results show that the first phase achieves its best quality at k = 10 with a document-similarity threshold of 0.025. In later phases, as clusters shrink, k is lowered and the threshold gradually raised. After the second phase, key terms extracted from clusters meeting the stopping condition yielded six App types, such as "baseball/shooting" and "throwing/flight"; subsequent phases, following the same rules, yielded fourteen more, such as "tower defense". Clustering concluded with 36 clusters covering 20 App types. Throughout the process, mean intra-cluster similarity rose steadily while mean inter-cluster similarity fell, and the clustering-quality measure improved from 12.65% after the first phase to 75.81% at the end of the fifth. The study shows that similar Apps gradually aggregate into clusters, whose names can guide users in selecting Apps; App developers can likewise learn from each cluster's key terms which game elements users value and tailor their Apps accordingly. Building a specialized term lexicon to improve clustering quality, applying document summarization to help users understand each cluster, and constructing an App recommendation system are promising directions for future work.
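One standard reading of kNN-based document clustering, sketched below on invented data, links each document to its k most similar neighbours above a similarity threshold and takes the connected components of the resulting graph as clusters; k and the document-similarity threshold mirror the two parameters the study tunes.

```python
# kNN clustering sketch over TF-IDF vectors (toy data; an assumed
# reading of the method, not the thesis implementation).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "baseball batting and pitching practice",
    "baseball pitching game",
    "tower defense with enemy waves",
    "defend the tower against enemy waves",
]
X = TfidfVectorizer().fit_transform(docs)
S = (X @ X.T).toarray()             # cosine similarity (rows are unit norm)
np.fill_diagonal(S, 0.0)

k, threshold = 1, 0.025             # the study's first phase used k=10, 0.025
adj = np.zeros_like(S, dtype=bool)  # kNN graph
for i, row in enumerate(S):
    for j in np.argsort(row)[::-1][:k]:
        if row[j] >= threshold:
            adj[i, j] = adj[j, i] = True

# Connected components of the graph are the clusters.
labels, current = {}, 0
for start in range(len(docs)):
    if start in labels:
        continue
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in labels:
            labels[node] = current
            stack.extend(np.flatnonzero(adj[node]).tolist())
    current += 1
print(labels)  # {0: 0, 1: 0, 2: 1, 3: 1}
```

Lowering k and raising the threshold, as the later phases do, makes links harder to form and therefore yields smaller, tighter clusters.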
214

From the WWW to Collocation: A Practice-Oriented Method for Collocation and Terminology Acquisition for Translators and Interpreters

Dörr, Simone January 2005 (has links)
Also: Heidelberg, Univ., diploma thesis, 2005 / Title taken from the supplement.
215

Text Mining in Customer Relationship Management

Rentzmann, René. January 2008 (has links)
Catholic University of Eichstätt-Ingolstadt, dissertation, 2007.
216

Extracting information for biology

Šarić, Jasmin, January 2006 (has links)
Universität Stuttgart, dissertation, 2006.
217

Measuring the Success of Information-Oriented Websites

Stolz, Carsten Dirk January 2007 (has links)
University of Eichstätt-Ingolstadt, dissertation, 2007
218

Information extraction from biomedical literature: an application of text mining to sources on the World Wide Web

Ιωάννου, Ζαφειρία - Μαρίνα 23 January 2012 (has links)
In recent years there has been growing interest in automatic text mining of biomedical literature, owing to the rapid increase of publications stored electronically in Web databases such as PubMed and SpringerLink. The main problem that makes this goal challenging is the difficulty of processing the available information and extracting useful connections and conclusions, so there is a pressing need for new tools that facilitate knowledge discovery from biological texts. The goal of this diploma thesis is to present known text mining methods and to develop a tool for efficient and reliable knowledge discovery from biomedical literature based on advanced text mining techniques. In particular, our effort centers on developing an efficient clustering algorithm, together with techniques for evaluating the clustering results, in order to assist users searching for biological information.
The proposed algorithm draws on different clustering techniques, such as hierarchical clustering and spherical k-means, and applies a final ranking based on the impact factor of the retrieved documents. Its basic steps are: preprocessing of the texts, representation of the documents in the vector space model, application of Latent Semantic Indexing (LSI), fuzzy clustering, hierarchical clustering, spherical k-means clustering, selection of the best cluster, and finally ranking of the retrieved documents by impact factor. The application we implement is based on this algorithm and offers two search modes: 1) search on the user's current queries, which are stored in the database so that the system also acts as a compact store of the user's past queries, and 2) search through a list of predefined biological Topics, giving the user extra assistance across a wide range of queries. In addition, the application extracts useful associations between terms from the final clusters.
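As a hedged illustration of two steps from this pipeline, the sketch below applies Latent Semantic Indexing via truncated SVD to TF-IDF vectors and then runs spherical k-means, i.e. k-means constrained to the unit sphere with cosine similarity as the assignment rule. The toy abstracts, component count, and k are assumptions, not the thesis configuration.

```python
# LSI + spherical k-means sketch (illustrative data and parameters).
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "gene expression in cancer cells", "tumor gene regulation study",
    "protein folding simulation", "molecular dynamics of protein structure",
]
X = TfidfVectorizer().fit_transform(abstracts)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # LSI space
Z /= np.linalg.norm(Z, axis=1, keepdims=True)   # project onto the unit sphere

rng = np.random.default_rng(0)
k = 2
centroids = Z[rng.choice(len(Z), size=k, replace=False)]
for _ in range(20):
    assign = np.argmax(Z @ centroids.T, axis=1)  # nearest centroid by cosine
    for c in range(k):
        members = Z[assign == c]
        if len(members):                         # renormalized mean direction
            m = members.sum(axis=0)
            centroids[c] = m / np.linalg.norm(m)
print(assign)  # two thematic groups, e.g. [0 0 1 1]
```

The thesis additionally interposes fuzzy and hierarchical clustering and ranks the winning cluster's documents by impact factor; those steps are omitted here.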
219

An information system for assessing the likelihood of child labor in supplier locations leveraging Bayesian networks and text mining

Thöni, Andreas, Taudes, Alfred, Tjoa, A Min January 2018 (has links) (PDF)
This paper presents an expert system for monitoring social sustainability compliance in supply chains. The system continuously ranks suppliers based on their risk of breaching sustainability standards on child labor. It uses a Bayesian network to determine the breach likelihood for each supplier location by integrating statistical data, audit results, and public reports of child labor incidents. Publicly available statistics on the frequency of child labor in different regions and industries serve as a contextual prior. The impact of audit results on the breach likelihood is calibrated based on expert input. Child labor incident observations are included automatically from publicly available news sources using text mining algorithms, and the impact of an observation on the breach likelihood is determined by its relevance, credibility, and frequency. Extensive tests show that the expert system correctly replicates the decisions of domain experts in the fields of supply chain management, sustainability management, and risk management.
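The paper's full model is a Bayesian network, but the core updating idea can be shown with a much simpler calculation: start from a contextual prior, fold in an audit result and news observations via Bayes' rule, and discount each observation by its relevance and credibility. Every numeric value below is invented for illustration.

```python
# Sketch of evidence-weighted Bayesian updating (all numbers assumed).

def bayes_update(prior, p_e_given_breach, p_e_given_ok):
    """Posterior P(breach | evidence) for one piece of evidence."""
    num = p_e_given_breach * prior
    return num / (num + p_e_given_ok * (1.0 - prior))

# Contextual prior: regional/industry child-labor frequency (assumed value).
p = 0.05

# A failed audit, assumed far likelier under an actual breach.
p = bayes_update(p, p_e_given_breach=0.7, p_e_given_ok=0.1)

# News incidents: likelihoods shrink toward 0.5 (uninformative) as
# relevance * credibility falls, so weak reports barely move the estimate.
for relevance, credibility in [(0.9, 0.8), (0.4, 0.5)]:
    w = relevance * credibility
    p = bayes_update(p, 0.5 + 0.4 * w, 0.5 - 0.4 * w)

print(f"breach likelihood: {p:.2f}")  # about 0.65 with these toy numbers
```

In the actual system these updates are carried by a network over supplier locations, with the audit impact calibrated from expert input rather than fixed constants.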
220

ENEM on social networks: text mining and clustering

Silva, Leila Maria 18 December 2017 (has links)
The Internet is today the largest source of electronic information in existence. The number of Internet users grows daily, and with it the use of online social networks, so a great deal of new information sits embedded in textual databases. Because of the web's dynamic nature, with millions of pages appearing and disappearing every day, finding relevant information in these databases is very difficult; text mining techniques for discovering information on the web arose from the need to address this problem. The present work applies text mining methods with clustering to the large volume of messages about the Exame Nacional do Ensino Médio (ENEM, Brazil's national secondary education exam) posted on the social network Twitter in 2016. The focus of the study is on obtaining groups of texts that allow a summarized, synthesized view of the topics most discussed by users. To handle these textual bases, the Cassiopeia Model was used; its text clustering algorithm has as its main purpose the generation of clusters of textual documents that share some kind of similarity. The Cassiopeia Model has a processing limit of at most 700 tweets. The tweets first pass through a text-cleaning phase in pre-processing, then through the clustering algorithm itself, and finally through the analysis of results in post-processing. The results show cohesive values for document similarity within each cluster and between clusters, as evaluated by the text clustering measures proposed by the Cassiopeia Model. This demonstrates the applicability of the proposal for a synthesized view of the most significant information on a given topic, often allowing actions to be anticipated and impacts on the affected population to be reduced. / Dissertation (Professional Master's), Programa de Pós-Graduação em Educação, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017.
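The quality measures cited here, mean intra-cluster similarity and mean inter-cluster similarity, are straightforward to compute once documents are vectors. The sketch below assumes unit-norm toy vectors and a fixed two-cluster assignment; it illustrates the measures, not the Cassiopeia Model itself.

```python
# Intra- vs inter-cluster similarity sketch (toy vectors, assumed setup).
import itertools
import numpy as np

def mean_similarities(vectors, labels):
    intra, inter = [], []
    for (i, vi), (j, vj) in itertools.combinations(enumerate(vectors), 2):
        sim = float(vi @ vj)  # cosine similarity; rows assumed unit norm
        (intra if labels[i] == labels[j] else inter).append(sim)
    return np.mean(intra), np.mean(inter)

V = np.array([[1.00, 0.00], [0.96, 0.28],   # cluster 0
              [0.00, 1.00], [0.28, 0.96]])  # cluster 1
intra, inter = mean_similarities(V, [0, 0, 1, 1])
print(f"intra={intra:.2f} inter={inter:.2f}")  # intra=0.96 inter=0.27
```

A good multi-phase clustering drives the first number up and the second down, which is the trend the abstract reports as its quality measure rises from 12.65% to 75.81%.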
