331

Application of Data mining in Medical Applications

Eapen, Arun George January 2004 (has links)
Abstract: Data mining is a relatively new field of research whose major objective is to acquire knowledge from large amounts of data. In medicine and health care, regulatory requirements and the widespread use of computers have made large volumes of data available. Practitioners are expected to use all of this data in their work, yet such volumes cannot be processed by humans quickly enough to produce diagnoses, prognoses, and treatment schedules. A major objective of this thesis is to evaluate data mining tools on medical and health care applications and to develop a tool that can help make timely and accurate decisions. Two medical databases are considered: one for describing the various tools and the other as the case study. The first database is related to breast cancer and the second to the minimum data set for mental health (MDS-MH). The breast cancer database consists of 10 attributes and the MDS-MH dataset of 455 attributes. Since many data mining algorithms and tools are available, we evaluate only a few of them on these applications and develop classification rules that can be used in prediction. Our results indicate that for the major case study, the mental health problem, accuracies of roughly 70 to 80% are achievable. A further extension of this work makes the classification rules available on mobile devices such as PDAs: patient information is entered directly on the PDA, and the stored rules classify the entered values to provide real-time assistance to practitioners.
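As a rough illustration of the kind of on-device, rule-based classification the abstract describes, the sketch below applies stored classification rules to entered patient values. The attribute names, thresholds, and rule format are hypothetical placeholders, not rules from the thesis.

```python
# Minimal sketch (not the thesis implementation): apply stored classification
# rules to a record of patient attribute values, in the spirit of the
# on-device prediction described above. Attributes and thresholds are invented.

# Each rule: (list of (attribute, operator, value) conditions, predicted class)
rules = [
    ([("clump_thickness", ">=", 7), ("cell_uniformity", ">=", 5)], "malignant"),
    ([("clump_thickness", "<", 3)], "benign"),
]

OPS = {">=": lambda a, b: a >= b, "<": lambda a, b: a < b}

def classify(record, rules, default="unknown"):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(OPS[op](record[attr], val) for attr, op, val in conditions):
            return label
    return default

patient = {"clump_thickness": 8, "cell_uniformity": 6}
print(classify(patient, rules))  # -> "malignant"
```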
332

Microarray analysis using pattern discovery

Bainbridge, Matthew Neil 10 December 2004 (has links)
Analysis of gene expression microarray data has traditionally been conducted using hierarchical clustering. However, such analysis has many known disadvantages, and pattern discovery (PD) has been proposed as an alternative technique. In this work, three related PD algorithms, Teiresias, Splash, and Genes@Work, were benchmarked for time and memory efficiency on a small yeast cell-cycle data set. Teiresias was found to be the fastest and the best program overall, while Splash was more memory efficient. This work also investigated the performance of four methods of discretizing microarray data: sign-of-the-derivative, K-means, preset value, and Genes@Work stratification. The first three methods were evaluated on their tendency to group biologically related genes together. On a yeast cell-cycle data set, the sign-of-the-derivative method yielded the most biologically significant patterns, followed by the preset-value and K-means methods. K-means, preset value, and Genes@Work were also compared on their ability to classify tissue samples from diffuse large B-cell lymphoma (DLBCL) into two subtypes determined by standard techniques. The Genes@Work stratification method produced the best patterns for discriminating between the two subtypes of lymphoma; however, the results from the second-best method, K-means, call into question the accuracy of the classification by the standard technique. Finally, a number of recommendations are made for improving pattern discovery algorithms and discretization techniques.
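The sign-of-the-derivative discretization mentioned above can be illustrated with a short sketch: each gene's expression profile is converted into a string of up/down/steady symbols based on the change between consecutive time points. The symbol set and the flatness tolerance below are illustrative assumptions, not the thesis's exact parameters.

```python
# Sketch of sign-of-the-derivative discretization for one expression profile.

def sign_of_derivative(series, flat_tol=0.1):
    """Discretize a numeric time series into 'U' (up), 'D' (down), 'S' (steady)."""
    symbols = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > flat_tol:
            symbols.append("U")
        elif delta < -flat_tol:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

# Example: a toy expression profile over six time points.
print(sign_of_derivative([1.0, 1.5, 1.6, 1.2, 0.8, 0.85]))  # -> 'USDDS'
```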
333

Incident Data Analysis Using Data Mining Techniques

Veltman, Lisa M. 16 January 2010 (has links)
Several databases collect information on various types of incidents, and most analyses performed on them rarely go beyond basic trend analysis or counting occurrences. This research applies the more robust methods of data mining and text mining to the Hazardous Substances Emergency Events Surveillance (HSEES) system data by identifying relationships among variables, predicting the occurrence of injuries, and assessing the value added by the text data. The benefits of thoroughly analyzing past incidents include a better understanding of safety performance, of where to focus efforts to reduce incidents, and of how people are affected by these incidents. The results of this research showed that visually exploring the data via bar graphs did not reveal any noticeable patterns. Clustering the data identified groupings of categories across the variable inputs, such as manufacturing events resulting from intentional acts like system startup and shutdown, performing maintenance, and improper dumping. Text mining allowed the events to be clustered and the data to be described further; however, these clusters were not noticeably distinct, and the conclusions that could be drawn from them were limited. Including the text comments in the overall analysis of the HSEES data greatly improved the predictive power of the models. Interpretation of the textual data's contribution was limited, but the qualitative conclusions drawn were similar to those of the model without textual input. Although HSEES data is collected to describe the effects that hazardous substance releases and threatened releases have on people, a fairly good predictive model was still obtained from the few variables identified as cause related.
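A generic sketch of the idea of adding free-text comments to a structured injury-prediction model is shown below, using scikit-learn on invented toy records. The abstract does not specify these tools, features, or model, so everything here is an assumption for illustration only.

```python
# Illustrative sketch (toy data, not HSEES data): combine categorical incident
# variables with TF-IDF features from the free-text comment to predict injury.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

data = pd.DataFrame({
    "industry": ["manufacturing", "transport", "manufacturing", "agriculture"],
    "cause":    ["equipment",     "human",     "human",         "equipment"],
    "comment":  ["valve failed during startup", "driver error on highway",
                 "improper dumping of solvent", "hose rupture while loading"],
    "injury":   [1, 0, 1, 0],
})

features = ColumnTransformer([
    ("cats", OneHotEncoder(handle_unknown="ignore"), ["industry", "cause"]),
    ("text", TfidfVectorizer(), "comment"),
])

model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(data[["industry", "cause", "comment"]], data["injury"])
print(model.predict(data[["industry", "cause", "comment"]]))
```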
334

A Keyword-Based Association Rule Mining Method for Personal Document Query

Tseng, Chien-Ming 29 August 2003 (has links)
With the rapid growth of the Internet and information technology, we are surrounded by far more information than our limited attention can absorb, so almost everyone now faces the problem of information overload. The problem also arises in literature digital libraries, which today may hold over a million articles and documents, so a good search or recommendation mechanism is essential. Traditional mechanisms are not good enough: their search results often require users to spend considerable effort filtering for what they actually need. This study therefore proposes a new personal document recommendation mechanism. The mechanism uses a keyword-based association rule mining method to find association rules between documents, and then recommends the documents users really want according to these rules and each user's history of preferences. Our evaluations show that the proposed mechanism alleviates part of the information overload problem and performs well on both precision and recall.
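A minimal sketch of the recommendation idea, assuming a simple rule form (single-document antecedent and consequent mined from users' access histories) and invented support/confidence thresholds, might look like this:

```python
# Sketch only: mine "doc A -> doc B" co-access rules from user histories and
# recommend consequents of rules triggered by the current user's history.
from collections import Counter
from itertools import permutations

histories = [
    {"doc1", "doc2", "doc3"},
    {"doc1", "doc2"},
    {"doc2", "doc3", "doc4"},
    {"doc1", "doc2", "doc4"},
]

pair_counts = Counter()
item_counts = Counter()
for h in histories:
    item_counts.update(h)
    pair_counts.update(permutations(sorted(h), 2))

min_support, min_confidence = 2, 0.6
rules = {}
for (a, b), count in pair_counts.items():
    if count >= min_support and count / item_counts[a] >= min_confidence:
        rules.setdefault(a, set()).add(b)

def recommend(user_history, rules):
    """Suggest documents implied by rules whose antecedent the user has read."""
    suggested = set()
    for doc in user_history:
        suggested |= rules.get(doc, set())
    return suggested - set(user_history)

print(recommend({"doc1"}, rules))  # e.g. {'doc2'}
```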
335

Detecting Backdoor

Kao, Cheng-yuan 12 August 2004 (has links)
Cyberspace is like a society in which attacks happen all the time, and much must be done to defend computers and network devices from attackers, for example applying patches and installing anti-virus software, firewalls, and intrusion detection systems. Among all kinds of network attacks, it is particularly hard to detect that an attacker has installed a backdoor after compromising a system; through the backdoor, the attacker can, for instance, steal sensitive or secret information. Intrusion detection systems are responsible for early warnings, but they usually need to capture entire network packets, both headers and contents, for analysis, which imposes substantial overhead on the system. The goal of our research is to detect backdoors accurately while analyzing only the network packet headers.
336

Backdoor Detection based on SVM

Tzeng, Zhong-Chiang 29 July 2005 (has links)
With the advance of computer technology and the wide use of the Internet, network security has become increasingly important. According to the relevant statistics, malicious code such as viruses, worms, backdoors, and Trojans launches many attacks. Backdoors are especially critical: they can bypass firewalls and anti-virus software, steal confidential information, misuse network resources, and launch attacks such as DDoS (Distributed Denial of Service). In this research, we analyze the properties and categories of backdoors and the application of data mining and support vector machines to intrusion detection. The research focuses on detecting the behavior of backdoor connections, and we propose a detection architecture based on SVM, a machine learning method grounded in statistical learning theory and proposed by Vapnik to address problems with neural network techniques. In the system modules, this research uses IPAudit as the network monitor and libsvm as the SVM classifier. The packets captured by IPAudit are classified into interactive or non-interactive flows by libsvm, and the result is compared with a list of legal services to determine whether a connection is a backdoor connection. We compare the accuracy of SVM, C4.5, and Naïve Bayes.
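A conceptual sketch of the flow-classification step might look like the following, using scikit-learn's SVC (which wraps libsvm) in place of the IPAudit-plus-libsvm pipeline. The flow features, toy training data, and port whitelist are assumptions for illustration, not the thesis's actual feature set.

```python
# Sketch: train an SVM to separate interactive from non-interactive flows,
# then flag interactive-looking flows on non-whitelisted service ports.
import numpy as np
from sklearn.svm import SVC

# Per-flow features: [packets, mean inter-arrival time (s), mean packet size]
X_train = np.array([
    [120, 1.8,  80],    # interactive (keystroke-like traffic)
    [ 90, 2.5,  60],    # interactive
    [500, 0.01, 1400],  # non-interactive (bulk transfer)
    [800, 0.02, 1300],  # non-interactive
])
y_train = np.array([1, 1, 0, 0])  # 1 = interactive, 0 = non-interactive

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

legal_interactive_ports = {22, 23}  # whitelist of expected interactive services

def flag_backdoor(flow_features, dst_port):
    """Flag a flow as a suspected backdoor if it looks interactive
    but is not running on a whitelisted interactive service port."""
    interactive = clf.predict([flow_features])[0] == 1
    return interactive and dst_port not in legal_interactive_ports

print(flag_backdoor([100, 2.0, 70], dst_port=31337))  # True on this toy data
```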
337

Using Bayesian Networks for Discovering Temporal-State Transitions in Hemodialysis

Chiu, Chih-Hung 02 August 2000 (has links)
In this thesis, we discover knowledge from workflow logs of temporal-state transitions in the form of Bayesian networks. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest and easily incorporates new instances to keep its rules up to date. Bayesian networks can be used for prediction, communication, and training, and they offer more alternatives for making better decisions. We demonstrate the proposed method by representing the causal relationships between medical treatments and transitions of a patient's physiological states in the hemodialysis process. The discovered clinical pathway patterns of hemodialysis can be used to predict possible paths for an admitted patient and to help medical professionals control the hemodialysis machines during the hemodialysis process. Future research can extend these results toward reciprocal knowledge management.
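The core idea, learning one local conditional distribution of such a network from workflow-log counts, can be sketched as follows. The states, treatments, and log entries are invented examples, not data from the thesis.

```python
# Toy sketch: estimate P(next_state | current_state, treatment) by relative
# frequency from workflow-log records (one local distribution of a Bayesian
# network over treatments and physiological states).
from collections import Counter, defaultdict

# Each log entry: (current_state, treatment, next_state)
log = [
    ("hypotensive", "reduce_ultrafiltration",   "stable"),
    ("hypotensive", "reduce_ultrafiltration",   "stable"),
    ("hypotensive", "no_action",                "hypotensive"),
    ("stable",      "no_action",                "stable"),
    ("stable",      "increase_ultrafiltration", "hypotensive"),
]

counts = defaultdict(Counter)
for state, treatment, nxt in log:
    counts[(state, treatment)][nxt] += 1

def p_next(state, treatment):
    """Conditional distribution of the next state, estimated from the log."""
    c = counts[(state, treatment)]
    total = sum(c.values())
    return {nxt: n / total for nxt, n in c.items()}

print(p_next("hypotensive", "reduce_ultrafiltration"))  # {'stable': 1.0}
```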
338

Facilitating On-line Automated Bargaining Using Data Mining Technology -- A Solution from Time Series Analysis

Kuang-Yi, Chang 02 August 2000 (has links)
Bargaining is a frequent activity in the shopping process, and it is becoming a trend in electronic trading. To facilitate on-line automatic bargaining, we develop three algorithms on a multi-agent system in this thesis. The first is a pattern generalization algorithm used to generalize common patterns from transaction records. The second is a pattern matching algorithm used on-line to identify possible bargaining patterns from the pattern bases. To handle the situation in which no pattern matches, we design a dynamic price issuing algorithm that uses utility theory to determine the seller's price and the timing at which a deal should be closed. We conducted a series of field experiments to evaluate the proposed algorithms under different seller risk perspectives and compared their performance with conventional bargaining methods. The results show that the proposed methods achieve encouraging performance. The major contribution of this research is its initial effort in developing data mining algorithms to facilitate the price bargaining process in e-commerce.
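A highly simplified sketch of a utility-style concession rule, in the spirit of the dynamic price issuing algorithm, is given below. The concession function, its parameters, and the deal-closing condition are illustrative assumptions rather than the thesis's actual formulation.

```python
# Sketch: the seller concedes from the list price toward a reserve price as
# the rounds progress; a deal closes when the buyer's offer meets the ask.

def seller_price(list_price, reserve_price, round_no, max_rounds, risk=1.0):
    """Seller's asking price at a given round.

    risk < 1 concedes quickly (risk-averse seller); risk > 1 concedes slowly.
    """
    t = min(round_no / max_rounds, 1.0)
    return list_price - (list_price - reserve_price) * (t ** risk)

def bargain(buyer_offers, list_price, reserve_price, risk=1.0):
    """Return (deal_price, round) if an offer meets the asking price, else None."""
    for rnd, offer in enumerate(buyer_offers, start=1):
        ask = seller_price(list_price, reserve_price, rnd, len(buyer_offers), risk)
        if offer >= ask:
            return ask, rnd
    return None

print(bargain([70, 80, 88, 93], list_price=100, reserve_price=85, risk=0.5))
# -> (87.0..., 3): the deal closes in round 3 at the seller's asking price.
```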
339

Article Recommendation in Literature Digital Libraries

Hsiung, Wen-Chiang 02 September 2002 (has links)
Literature digital libraries are perhaps among the most important resources for research, as the preserved literature is vital to researchers and practitioners who need to know what has been done previously in a particular area. The emergence of the World Wide Web (www) further boosts the circulation power of literature digital libraries, and people interested in a particular topic can easily find related articles by searching a library that provides a web interface. However, a given search condition quite often yields a large number of articles, of which only a small subset actually interests the user. To make information search more effective and efficient, many literature digital libraries are equipped with a recommendation subsystem that recommends articles to a user based on his or her past or current interests. In this thesis, we adapt existing approaches for web page recommendation to recommendation in literature digital libraries. We investigate the issues involved in article recommendation and develop a recommendation framework that makes use of the web log of a literature digital library. The framework consists of three sequential steps: data preparation of the web log, association discovery, and article recommendation. We propose three alternatives for identifying transactions from a web log, adapt the MSApriori algorithm for discovering large itemsets, and discuss two approaches, namely hypergraph-based and association-based recommendation, for making recommendations. These alternatives and approaches were evaluated using the web log of an operational electronic thesis system at NSYSU. We found that query-chosen and session-chosen are the better methods for transaction identification, and that the hypergraph-based approach yields better article recommendations with stable running time.
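The transaction-identification step of such a framework can be sketched as below, using a time-gap heuristic as a simple stand-in for the session-chosen method. The log records and the 30-minute timeout are invented for illustration.

```python
# Sketch: split each user's article requests into sessions at a time gap,
# producing the transactions that association discovery would consume.
from datetime import datetime, timedelta

log = [  # (user, timestamp, article)
    ("u1", datetime(2002, 5, 1, 10, 0), "etd-001"),
    ("u1", datetime(2002, 5, 1, 10, 5), "etd-017"),
    ("u1", datetime(2002, 5, 1, 14, 0), "etd-042"),  # new session after a long gap
    ("u2", datetime(2002, 5, 1, 11, 0), "etd-017"),
]

def sessions(log, timeout=timedelta(minutes=30)):
    """Group article requests into per-user sessions separated by `timeout`."""
    result, last_seen, current = [], {}, {}
    for user, ts, article in sorted(log, key=lambda r: (r[0], r[1])):
        if user in last_seen and ts - last_seen[user] > timeout:
            result.append(current.pop(user))
        current.setdefault(user, []).append(article)
        last_seen[user] = ts
    result.extend(current.values())
    return result

print(sessions(log))
# [['etd-001', 'etd-017'], ['etd-042'], ['etd-017']]
```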
340

Constructing Decision Tree Using Learners' Portfolio for Supporting e-Learning

Liao, Shen-Jai 01 July 2003 (has links)
In recent years, with the development of electronic media, e-learning has begun to replace traditional teaching and learning through Internet services. With newly developed technology, teachers in e-learning now have the opportunity to use students' learning logs, recorded via the Web site, to understand students' learning states. This research presents an analytical mechanism that integrates multidimensional logs so that teachers can immediately observe all of a student's learning behaviors and status, and uses decision tree analysis to detect when and where students may encounter a learning bottleneck. Teachers can then give the right students the right remedial instruction at the right time. In summary, we reach four conclusions. (1) The decision rules differ from course to course, for example with instruction and assessment methods; assignments are a basis for assessing students' learning effectiveness, and the attributes associated with effectiveness are related to students' learning behaviors. (2) Accumulating the learning-behavior attributes over time can indeed detect learners' probable effectiveness early; the variation across different time intervals is not pronounced, but every interval allows early detection. (3) When detecting effectiveness under different grade-level classifications, each classification describes the decision rules well, but not all students' effectiveness can be detected. (4) Although detecting high-grade students' effectiveness is difficult, lower-grade students' effectiveness can be detected. Overall, this research allows teachers to observe students' learning states immediately and to detect learning effectiveness early, so that they can make decisions about managing learning activities to improve learning.
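A toy sketch of the decision-tree step, with invented learner-log attributes and scikit-learn's DecisionTreeClassifier standing in for whatever tool the thesis actually used, might look like this:

```python
# Sketch: induce a small decision tree over accumulated learning-log features
# and use it to flag learners heading toward low effectiveness.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [  # [logins, posts_read, assignments_submitted] accumulated so far
    [25, 40, 5],
    [30, 55, 6],
    [ 5,  3, 1],
    [ 8, 10, 2],
    [20, 30, 4],
    [ 3,  2, 0],
]
y = ["high", "high", "low", "low", "high", "low"]  # learning effectiveness

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The induced rules can be inspected and turned into early warnings.
print(export_text(tree, feature_names=["logins", "posts_read", "assignments"]))
print(tree.predict([[6, 5, 1]]))  # expected: ['low'] for a weakly engaged learner
```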
