1 |
Cleanup Memory in Biologically Plausible Neural Networks. Singh, Raymon. January 2005.
During the past decade, a new class of knowledge representation has emerged, known as structured distributed representation (SDR). A number of schemes for encoding and manipulating such representations have been developed, e.g., Pollack's Recursive Auto-Associative Memory (RAAM), Kanerva's Binary Spatter Code (BSC), Gayler's MAP encoding, and Plate's Holographically Reduced Representations (HRR). All such schemes encode structural information across the elements of high-dimensional vectors, which are manipulated with rudimentary algebraic operations.

Most SDRs are very compact; components and compositions of components are all represented as fixed-width vectors. However, such compact compositions are unavoidably noisy. As a result, resolving constituent components requires a cleanup memory. In its simplest form, cleanup is performed with a list of vectors that are sequentially compared to a noisy query using a similarity metric; the closest match is deemed the cleaned codevector.

While SDR schemes were originally designed to perform cognitive tasks, none of them has been demonstrated in a neurobiologically plausible substrate, and mathematically proven properties of these systems may not be neurally realistic. Using Eliasmith and Anderson's (2003) Neural Engineering Framework, I construct various spiking neural networks to simulate a general cleanup memory suitable for many schemes.

Importantly, previous work has not taken advantage of parallelization or the high-dimensional properties of neural networks, nor has it considered the effect of noise within these systems. Additional improvements to the cleanup operation may also be possible by structuring the memory itself more efficiently. In this thesis I address these lacunae, provide an analysis of the system's accuracy, capacity, scalability, and robustness to noise, and explore ways to improve search efficiency.
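The simplest form of cleanup described above can be sketched directly: store the codevectors in a list, compare a noisy query against each one with a similarity metric (cosine similarity here), and return the closest match. This is an illustrative sketch only, not the spiking-network implementation built in the thesis; the dimensionality, noise level, and function name are assumptions.

```python
import numpy as np

def cleanup(noisy_vector, codevectors):
    """Return the stored codevector most similar to the noisy query."""
    query = noisy_vector / np.linalg.norm(noisy_vector)
    best_index, best_score = -1, -np.inf
    for i, v in enumerate(codevectors):
        score = float(np.dot(query, v / np.linalg.norm(v)))  # cosine similarity
        if score > best_score:
            best_index, best_score = i, score
    return codevectors[best_index]

# Example: 512-dimensional random codevectors; one is corrupted by noise
# and then recovered by the sequential comparison.
rng = np.random.default_rng(0)
memory = [rng.standard_normal(512) for _ in range(100)]
noisy = memory[7] + 0.5 * rng.standard_normal(512)
assert cleanup(noisy, memory) is memory[7]
```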
|
2 |
Deep learning techniques for Chinese sentence representation learning (深度學習於中文句子之表示法學習). Kuan, Yun Chen (管芸辰). Date unknown.
This thesis explores how recently developed deep learning techniques can be applied to learning distributed representations of Chinese sentences. Deep learning has received great attention in recent years, and related techniques have developed rapidly. However, most distributed representation methods take English and other Indo-European languages as their main evaluation targets and were developed around the properties of those languages. Besides the Indo-European family, the Sino-Tibetan and Altaic families also have large numbers of speakers, and there are independent languages such as Japanese and Korean, each with its own characteristics. Chinese belongs to the Sino-Tibetan family and has quite distinct properties, such as being an isolating language and having tones and measure words. Many recent papers use multilingual datasets as benchmarks, but few discuss the performance differences among languages.

Using sentence-level sentiment classification experiments, this thesis compares recently developed deep learning techniques with traditional word-vector representations. Taking TF-IDF as the baseline, we compare the performance of PVDM, Siamese-CBOW, and FastText, and examine in depth how these models perform on Chinese sentence sentiment classification. / This thesis demonstrates how deep learning methods published in recent years can be applied to Chinese sentence representation learning.
Recently, deep learning techniques have attracted great attention, and related areas have grown enormously.
However, most techniques are evaluated mainly on Indo-European languages and were developed around their properties. Besides the Indo-European languages, the Sino-Tibetan and Altaic language families are also widely spoken, and there are independent languages such as Japanese and Korean, each with its own properties. Chinese itself belongs to the Sino-Tibetan family and has distinctive characteristics such as being an isolating language, having tones, and using measure words. Recently, many publications also use multilingual datasets to evaluate their performance, but few of them discuss the differences among languages.
In this thesis, we perform sentiment analysis on a Chinese Weibo dataset to quantify the effectiveness of different deep learning techniques. We compare a traditional TF-IDF baseline with PVDM, Siamese-CBOW, and FastText, and evaluate the representations they produce.
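As an illustration of the TF-IDF baseline in this comparison, the sketch below pairs scikit-learn's TfidfVectorizer with logistic regression for sentence-level sentiment classification. The toy sentences, labels, and character n-gram settings are placeholder assumptions, not the Weibo data or preprocessing used in the thesis.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder sentences and labels (1 = positive, 0 = negative); the real
# experiments use a Chinese Weibo sentiment dataset.
sentences = ["这部电影很好看", "服务太差了", "今天非常开心", "质量实在糟糕"]
labels = [1, 0, 1, 0]

# Character n-grams stand in for proper Chinese word segmentation here.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 2))
features = vectorizer.fit_transform(sentences)

classifier = LogisticRegression().fit(features, labels)
print(classifier.predict(vectorizer.transform(["今天很开心"])))
```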
|
3 |
Mining Heterogeneous Electronic Health Records Data. Bai, Tian. January 2019.
Electronic health record (EHR) systems are used by medical providers to streamline their workflow and to enable sharing of patient data with other providers. Beyond that primary purpose, EHR data have been used in healthcare research for exploratory and predictive analytics. EHR data are heterogeneous collections of both structured and unstructured information: several ontologies have been developed to describe diagnoses and treatments in a structured way, while unstructured clinical notes contain more nuanced information about patients. The multidimensionality and complexity of EHR data pose many unique challenges for both the data mining and medical communities. In this thesis, we address several important issues and develop novel deep learning approaches to extract insightful knowledge from these data.

Representing words as low-dimensional vectors is very useful in many natural language processing tasks. This idea has been extended to the medical domain, where medical codes listed in medical claims are represented as vectors to facilitate exploratory analysis and predictive modeling. However, depending on the type of medical provider, medical claims can use medical codes from different ontologies or from a combination of ontologies, which complicates learning of the representations. To properly utilize such multi-source medical claim data, we propose an approach that represents medical codes from different ontologies in the same vector space. The new approach was evaluated on the code cross-reference problem, which aims at identifying similar codes across different ontologies. In our experiments, we show that the proposed approach provides superior cross-referencing compared to several existing approaches. Furthermore, since EHR data also contain unstructured clinical notes, we propose a method that jointly learns medical concept and word representations; the jointly learned representations of medical codes and words can be used to extract phenotypes of different diseases.

Various deep learning models have recently been applied to predictive modeling of EHR data. In EHR data, each patient is represented as a sequence of temporally ordered, irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the patient's diagnosis and the treatment provided during the visit. We propose a novel interpretable deep learning model, called Timeline, whose main novelty is a mechanism that learns time decay factors for every medical code. We evaluated Timeline on two large-scale real-world data sets, where the task was to predict the primary diagnosis category of the next hospital visit given previous visits. Our results show that Timeline has higher accuracy than state-of-the-art RNN-based deep learning models.

Clinical notes contain detailed information about the health status of patients for each of their encounters with a health system, and developing effective models to automatically assign medical codes to clinical notes has been a long-standing active research area. Considering the large number of online disease knowledge sources, which contain detailed information about the signs and symptoms of different diseases, their risk factors, and their epidemiology, we treat Wikipedia as an external knowledge source and propose Knowledge Source Integration (KSI), a novel end-to-end code assignment framework that can integrate external knowledge during training of any baseline deep learning model. To evaluate KSI, we experimented with automatic assignment of ICD-9 diagnosis codes to clinical notes, aided by Wikipedia documents corresponding to the ICD-9 codes. The results show that KSI consistently improves the baseline models and is particularly successful in rare code prediction. / Computer and Information Science
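The time-decay mechanism described for Timeline can be illustrated with a small sketch: each medical code gets an embedding and a learnable decay rate, and a visit's codes contribute less to the visit representation the longer ago the visit occurred. This is a simplified illustration under assumed shapes, names, and an assumed exponential decay form, not the authors' Timeline implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeTimeDecay(nn.Module):
    """Simplified illustration of per-code time decay: each medical code has
    an embedding and a learnable decay rate, and codes from older visits
    contribute less to the visit representation."""

    def __init__(self, num_codes, dim):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)
        self.decay = nn.Parameter(torch.zeros(num_codes))  # one rate per code

    def forward(self, codes, elapsed_days):
        # codes: (batch, n) integer code ids; elapsed_days: (batch, 1) days since the visit
        rate = F.softplus(self.decay[codes])           # keep decay rates positive
        weight = torch.exp(-rate * elapsed_days)       # older visits -> smaller weight
        return (weight.unsqueeze(-1) * self.embed(codes)).sum(dim=1)

# Hypothetical usage: three codes from a visit that occurred 30 days ago.
model = CodeTimeDecay(num_codes=1000, dim=64)
visit_vec = model(torch.tensor([[3, 17, 42]]), torch.tensor([[30.0]]))
print(visit_vec.shape)  # torch.Size([1, 64])
```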
|