361

The development of a single nucleotide polymorphism database for forensic identification of specified physical traits

Alecia Geraldine Naidu, January 2009
Many Single Nucleotide Polymorphisms (SNPs) found in coding or regulatory regions of the human genome lead to phenotypic differences that make prediction of physical appearance from genetic analysis potentially useful in forensic investigations. Complex traits such as pigmentation can be predicted from the genome sequence, provided that genes with strong effects on the trait exist and are known. Phenotypic traits may also be associated with variations in gene expression caused by SNPs in promoter regions. In this project, genes associated with physical traits of potential forensic relevance were collated from the literature using a text mining platform and hand curation. The SNPs associated with these genes were acquired from public SNP repositories such as the International HapMap Project, dbSNP and Ensembl. Different population groups were characterized on the basis of these SNPs, and the results and data were stored in a MySQL database. This database contains SNP genotyping data with respect to physical phenotypic differences of forensic interest. The potential forensic relevance of the SNP information contained in the database has been verified through in silico SNP analysis aimed at establishing possible relationships between SNP occurrence and phenotype. The software used for this analysis is MATCH™.
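The abstract describes storing SNP genotyping data and trait associations in a MySQL database. As a rough illustration of what such a schema might look like, here is a minimal sketch; it uses Python's built-in sqlite3 rather than MySQL so it runs self-contained, and every table and column name is a hypothetical placeholder rather than the thesis's actual schema.

```python
# Hypothetical SNP-phenotype schema of the kind the abstract describes.
# The thesis used MySQL; sqlite3 is substituted here for a self-contained
# example. All table and column names are illustrative placeholders.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL,   -- e.g. a pigmentation gene symbol
    trait   TEXT NOT NULL    -- physical trait the gene affects
);
CREATE TABLE snp (
    rs_id   TEXT PRIMARY KEY,                     -- dbSNP identifier
    gene_id INTEGER REFERENCES gene(gene_id),
    region  TEXT                                  -- coding / promoter / regulatory
);
CREATE TABLE genotype_frequency (
    rs_id      TEXT REFERENCES snp(rs_id),
    population TEXT,                              -- e.g. a HapMap population code
    allele     TEXT,
    frequency  REAL CHECK (frequency BETWEEN 0 AND 1)
);
""")

# Example query: SNPs in promoter regions of genes linked to a given trait.
rows = conn.execute("""
    SELECT s.rs_id, g.symbol, f.population, f.allele, f.frequency
    FROM snp s
    JOIN gene g ON g.gene_id = s.gene_id
    JOIN genotype_frequency f ON f.rs_id = s.rs_id
    WHERE g.trait = ? AND s.region = 'promoter'
    ORDER BY f.frequency DESC
""", ("pigmentation",)).fetchall()
print(rows)
```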
362

Development of a Hepatitis C Virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance

Kojo, Kwofie Samuel, January 2011
Ameliorating the therapeutic and diagnostic challenges of Hepatitis C Virus (HCV) requires robust intervention strategies, including approaches that leverage the wealth of data published in the biomedical literature to gain a greater understanding of HCV pathobiological mechanisms. The multitude of metadata originating from HCV clinical trials, as well as from low- and high-throughput experiments embedded in text corpora, can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthwhile, testable hypotheses and reveal research clues that augment the pursuit of efficient diagnostic biomarkers and therapeutic targets. This thesis reports the development of two freely available HCV-specific web-based resources: (i) the Dragon Exploratory System on Hepatitis C Virus (DESHCV), accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/, and (ii) the Hepatitis C Virus Protein Interaction Database (HCVpro), accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence-based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed. As part of DESHCV development, the pre-constructed dictionaries of the Dragon Exploratory System (DES) were enriched with HCV biomedical concepts, including HCV proteins, name variants and symbols, to enable HCV-specific knowledge exploration. DESHCV query inputs consist of user-defined keywords, phrases and concepts. DESHCV is therefore an information extraction tool that enables users to computationally generate associations between concepts and supports the prediction of potential hypotheses with diagnostic and therapeutic relevance. Additionally, users can retrieve a list of abstracts containing tagged concepts, which can ease the herculean task of manual biocuration. DESHCV has been used to reproduce the previously reported thalidomide-chronic hepatitis C hypothesis and to model a potentially novel thalidomide-amantadine hypothesis. HCVpro is a relational knowledgebase dedicated to housing experimentally detected HCV-HCV and HCV-human protein interaction information obtained from other databases and curated from biomedical journal articles. Additionally, the database contains consolidated biological information consisting of hepatocellular carcinoma (HCC) related genes, comprehensive reviews on HCV biology and drug development, functional genomics and molecular biology data, and cross-referenced links to canonical pathways and other essential biomedical databases. Users can retrieve enriched information, including interaction metadata, from HCVpro by using protein identifiers, gene chromosomal locations, experiment types used to detect the interactions, PubMed IDs of the journal articles reporting the interactions, annotated protein interaction IDs from external databases, and "string searches". The utility of HCVpro has been demonstrated by harnessing integrated data to suggest putative baseline clues that support current diagnostic exploratory efforts directed towards vimentin. Furthermore, eight genes (ACLY, AZGP1, DDX3X, FGG, H19, SIAH1, SERPING1 and THBS1) have been recommended for investigation of their diagnostic potential. The data archived in HCVpro can be utilized to support protein-protein interaction network-based candidate HCC gene prioritization for validation by experimental biologists.
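DESHCV is described as combining dictionary-based named concept recognition with co-occurrence analysis over PubMed abstracts. Below is a minimal sketch of that general approach, not DESHCV's actual implementation; the concept dictionary, tokenization, and sample abstracts are invented placeholders.

```python
# Minimal sketch of dictionary-based concept tagging plus co-occurrence
# counting, the general approach the abstract attributes to DESHCV.
# The dictionary entries and abstracts are placeholders, not DESHCV data.
from collections import Counter
from itertools import combinations

concept_dictionary = {"ns5b", "thalidomide", "amantadine", "vimentin"}

def tag_concepts(abstract: str) -> set[str]:
    """Return the dictionary concepts mentioned in an abstract."""
    tokens = set(abstract.lower().replace(",", " ").replace(".", " ").split())
    return concept_dictionary & tokens

def cooccurrence_counts(abstracts: list[str]) -> Counter:
    """Count how often each concept pair appears in the same abstract."""
    counts = Counter()
    for text in abstracts:
        for pair in combinations(sorted(tag_concepts(text)), 2):
            counts[pair] += 1
    return counts

abstracts = [
    "Thalidomide was evaluated in chronic hepatitis C alongside amantadine.",
    "Vimentin levels were measured; thalidomide co-administration was noted.",
]
print(cooccurrence_counts(abstracts).most_common())
```

Pairs that co-occur unusually often across the corpus become candidate associations for a human expert to review, which is the sense in which such a tool "supports the prediction of potential hypotheses".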
363

Efficient Temporal Synopsis of Social Media Streams

Abouelnagah, Younes, January 2013
Search and summarization of streaming social media, such as Twitter, requires the ongoing analysis of large volumes of data with dynamically changing characteristics. Tweets are short and repetitious -- lacking context and structure -- making it difficult to generate a coherent synopsis of events within a given time period. Although some established algorithms for frequent itemset analysis might provide an efficient foundation for synopsis generation, the unmodified application of standard methods produces a complex mass of rules, dominated by common language constructs and many trivial variations on topically related results. Moreover, these results are not necessarily specific to events within the time period of interest. To address these problems, we build upon the Linear time Closed itemset Mining (LCM) algorithm, which is particularly suited to the large and sparse vocabulary of tweets. LCM generates only closed itemsets, providing an immediate reduction in the number of trivial results. To reduce the impact of function words and common language constructs, we apply a filtering step that preserves these terms only when they may form part of a relevant collocation. To further reduce trivial results, we propose a novel strengthening of the closure condition of LCM to retain only those results that exceed a threshold of distinctiveness. Finally, we perform temporal ranking, based on information gain, to identify results that are particularly relevant to the time period of interest. We evaluate our work over a collection of tweets gathered in late 2012, exploring the efficiency and filtering characteristics of each processing step, both individually and collectively. Based on our experience, the resulting synopses from various time periods provide understandable and meaningful pictures of events within those periods, with potential application to tasks such as temporal summarization and query expansion for search.
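The abstract's key building block is the closed itemset: an itemset is closed when no strict superset occurs in exactly the same set of transactions. The brute-force sketch below makes that definition concrete on toy tweet data; LCM itself enumerates closed itemsets far more efficiently, which this sketch does not attempt.

```python
# Brute-force illustration of the "closed itemset" idea on toy tweet data.
# An itemset is closed when no strict superset has the identical cover
# (set of transactions containing it). LCM computes this efficiently;
# this sketch only demonstrates the definition.
from itertools import combinations

tweets = [
    {"storm", "power", "outage"},
    {"storm", "power"},
    {"storm", "coffee"},
]

def cover(itemset, transactions):
    """Indices of transactions containing every item in the itemset."""
    return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

items = sorted(set().union(*tweets))
candidates = [frozenset(c) for r in range(1, len(items) + 1)
              for c in combinations(items, r)
              if cover(frozenset(c), tweets)]

closed = [c for c in candidates
          if not any(c < d and cover(c, tweets) == cover(d, tweets)
                     for d in candidates)]
print(sorted(closed, key=len))
```

Here {"power"} is not closed because {"power", "storm"} occurs in exactly the same tweets, illustrating why reporting only closed itemsets prunes trivial sub-patterns.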
364

Using text-mining technology in developing a classified reader for MD&A

Wu, Shih Ting (吳詩婷), Unknown Date
Annual reports are rich in information, both financial and textual. Methods for analyzing financial information are mature, but textual information is constrained by its format and file type, reducing the efficiency with which investors can use and analyze it. Management's Discussion & Analysis of Financial Condition and Results of Operations (MD&A) is the vehicle through which management conveys its operating perspective and decisions to investors, and reading the MD&A gives investors access to additional information; prior research confirms the importance of the textual information it contains. Because textual information lacks a common classification framework, investors must spend considerable time and cost analyzing it. This research randomly selected the 2012 annual reports of 40 publicly listed U.S. technology companies as sample data and, using text-mining technology with TF-IDF, classified the textual content of the MD&A into the classification framework published by the EBRC for the MD&A, thereby building a classified reader. Investors can use the sentences classified and aggregated by the system to quickly obtain the textual information they need, helping users read this unstructured text efficiently, reducing data-collection time, and enhancing the usability of textual information.
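The abstract names TF-IDF as the mechanism for classifying MD&A sentences into the EBRC framework. A minimal sketch of one plausible realization follows: each sentence is assigned to the category whose description it most resembles under TF-IDF cosine similarity. The category names and descriptions are hypothetical placeholders, not the actual EBRC framework, and scikit-learn is assumed.

```python
# Sketch of TF-IDF-based sentence classification in the spirit of the
# abstract: assign each MD&A sentence to the most similar framework
# category. Category texts below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

categories = {
    "Liquidity": "cash flow liquidity capital resources financing",
    "Results of operations": "revenue sales gross margin operating expenses",
    "Risk": "uncertainty litigation market risk interest rate exposure",
}

sentences = [
    "Net revenue increased 12% driven by higher product sales.",
    "We believe existing cash and credit facilities are sufficient.",
]

docs = list(categories.values()) + sentences
tfidf = TfidfVectorizer().fit_transform(docs)

# Similarity of each sentence (rows) to each category description (columns).
sims = cosine_similarity(tfidf[len(categories):], tfidf[:len(categories)])

names = list(categories)
for sentence, row in zip(sentences, sims):
    print(f"{names[row.argmax()]}: {sentence}")
```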
365

Cluster-based Query Expansion Technique

Huang, Chun-Neng, 14 August 2003
With advances in information and networking technologies, huge amounts of information, typically in the form of text documents, are available online. To facilitate efficient and effective access to documents relevant to users' information needs, information retrieval systems play a more significant role than ever. One challenging issue in information retrieval is word mismatch, which refers to the phenomenon that the same concepts may be described by different words in user queries and/or documents. The word mismatch problem, if not appropriately addressed, critically degrades the retrieval effectiveness of an information retrieval system. In this thesis, we develop a cluster-based query expansion technique to address the word mismatch problem. Using the traditional query expansion techniques (i.e., global analysis and local feedback) as performance benchmarks, the empirical results suggest that when a user query consists of only one query term, the global analysis technique is more effective. However, if a user query consists of two or more query terms, the cluster-based query expansion technique provides a more accurate query result, especially within the first few top-ranked documents retrieved.
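As a generic illustration of the cluster-based expansion idea (not the thesis's exact procedure), the sketch below clusters the top-ranked documents, finds the cluster nearest the query, and takes the centroid's most heavily weighted terms as expansion candidates; scikit-learn and the toy documents are assumptions.

```python
# Generic sketch of cluster-based query expansion: cluster the top-ranked
# documents, pick the cluster closest to the query, and propose its
# centroid's strongest terms as expansion candidates. Illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

top_docs = [
    "java virtual machine garbage collection tuning",
    "jvm heap garbage collector pause times",
    "coffee brewing on the island of java",
]
query = "java garbage collection"

vec = TfidfVectorizer()
X = vec.fit_transform(top_docs)
q = vec.transform([query])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Cluster whose centroid is nearest the query vector.
nearest = np.argmin(np.linalg.norm(km.cluster_centers_ - q.toarray(), axis=1))

terms = np.array(vec.get_feature_names_out())
centroid = km.cluster_centers_[nearest]
expansion = terms[np.argsort(centroid)[::-1][:5]]
print("expansion candidates:", expansion)
```

Restricting expansion terms to the query's own cluster is what keeps the "coffee" sense of java from polluting a query about garbage collection, which is the intuition behind addressing word mismatch this way.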
366

Similarity measures in clustering: an application in text mining

Παπαστεργίου, Θωμάς, 17 May 2007
Development of a dissimilarity measure for categorical data and its application to text clustering and the authorship attribution problem.
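The abstract does not spell out the proposed measure, so as a point of reference here is simple matching dissimilarity, the classic baseline for categorical data that such work typically refines: the fraction of attributes on which two objects disagree.

```python
# Simple matching dissimilarity, a standard baseline for categorical data:
# the fraction of attribute positions on which two objects differ.
def simple_matching_dissimilarity(x: list, y: list) -> float:
    """Fraction of positions where the categorical vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)

# Two documents described by categorical attributes (genre, language, era).
doc_a = ["essay", "greek", "modern"]
doc_b = ["essay", "english", "modern"]
print(simple_matching_dissimilarity(doc_a, doc_b))  # 0.333...
```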
368

Concept Based Knowledge Discovery from Biomedical Literature.

Radovanovic, Aleksandar. January 2009
This thesis introduces and describes novel methods for knowledge discovery and presents a software system that is able to extract information from biomedical literature, review interesting connections between various biomedical concepts and, in so doing, generate new hypotheses. The experimental results obtained using the methods described in this thesis are compared to currently published results obtained by other methods, and a number of case studies are described. The thesis shows how the technology presented can be integrated with researchers' own knowledge, experimentation and observations for optimal progression of scientific research.
