141

Statistical Analysis, Modeling, and Algorithms for Pharmaceutical and Cancer Systems

Choi, Bong-Jin 27 May 2014 (has links)
The aim of the present study is to develop statistical algorithms and models associated with breast and lung cancer patients. In this study, we developed several statistical software tools, R packages, and models using our new statistical approaches. First, we used the five-parameter logistic model to determine the optimal dose of a pharmaceutical drug, including dynamic initial points, an automatic process for outlier detection, and an algorithm implemented in a graphical user interface (GUI) program. The developed statistical procedure assists medical scientists by reducing the time needed to determine the optimal dose of new drugs, and can also easily identify which drugs need more experimentation. Secondly, we developed a new classification method that is very useful in the health sciences. We used a new decision tree algorithm and a random forest method to rank our variables and to build a final decision tree model. The decision tree can identify and communicate complex data systems to scientists with minimal knowledge of statistics. Thirdly, we developed statistical packages using the Johnson SB probability distribution, which is important in parametrically studying a variety of health, environmental, and engineering problems. Scientists experience difficulties in obtaining estimates for the four parameters of this probability distribution. The developed algorithm combines several statistical procedures, such as the Newton-Raphson method, the bisection method, least-squares estimation, and regression, in an R package. This R package has functions that generate random numbers, calculate probabilities and inverse probabilities, and estimate the four parameters of the Johnson SB probability distribution. Researchers can use the developed R package to build their own statistical models or perform desirable statistical simulations. The final aspect of the study involves building a statistical model for lung cancer survival time. In developing this statistical model, we have taken into consideration the number of cigarettes the patient smoked per day, the duration of smoking, and the age at diagnosis of lung cancer. The response variable is the survival time, and the significant factors include interactions. The probability density function of the survival times has been obtained and the survival function determined. The analysis is based on groups that involve gender and smoking factors, and a comparison with the ordinary survival function is given.
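To make the Johnson SB functionality concrete, here is a minimal sketch of the four operations the abstract describes (random generation, probabilities, inverse probabilities, and four-parameter estimation), using SciPy's johnsonsb distribution rather than the author's R package; the parameter values are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# True parameters: shape a (gamma), shape b (delta), loc (xi), scale (lambda).
a, b, loc, scale = 1.0, 2.0, 0.0, 10.0

# 1) Generate random numbers from the Johnson SB distribution.
x = stats.johnsonsb.rvs(a, b, loc=loc, scale=scale, size=5000, random_state=rng)

# 2) Probabilities: density and cumulative probability at a point.
print(stats.johnsonsb.pdf(5.0, a, b, loc=loc, scale=scale))
print(stats.johnsonsb.cdf(5.0, a, b, loc=loc, scale=scale))

# 3) Inverse probabilities (quantile function).
print(stats.johnsonsb.ppf(0.95, a, b, loc=loc, scale=scale))

# 4) Estimate the four parameters from data (maximum likelihood here,
#    where the dissertation combines Newton-Raphson, bisection, least
#    squares, and regression).
a_hat, b_hat, loc_hat, scale_hat = stats.johnsonsb.fit(x)
print(a_hat, b_hat, loc_hat, scale_hat)
```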
142

Model error space and data assimilation in the Mediterranean Sea and nested grids

Vandenbulcke, Luc 11 June 2007 (has links)
In this work, we implemented the GHER hydrodynamic model in the Gulf of Lions (resolution 1/100°). This model is nested interactively in another model covering the north-western basin of the Mediterranean Sea (resolution 1/20°), itself nested in a model covering the whole basin (1/4°). A data assimilation filter, called the SEEK filter, is used to test in which of those grids observations taken in the Gulf of Lions are best assimilated. For this purpose, twin experiments are used: a reference run is considered as the truth, and another run, starting from different initial conditions, assimilates pseudo-observations coming from the reference run. It appeared that, in order to best constrain the coastal model, available data should be assimilated in that model. The most efficient setup, however, is to group the state vectors from the 3 grids into a single vector, and hence coherently modify the 3 domains at once during assimilation cycles. Operational forecasting with nested models often uses only so-called passive nesting: no data feedback happens from the regional models to the global model. We propose a new idea: to use data assimilation as a substitute for the feedback. Using twin experiments again, we show that assimilating outputs from the regional model into the global model has beneficial impacts on the subsequent forecasts in the regional model. The data assimilation method used in those experiments corrects errors in the models using only some privileged directions in the state space, and these directions are selected from a previous model run. This is a weakness of the method when real observations are available. We tried to build new directions of the state space using an ensemble run, this time covering only the Mediterranean basin (without grid nesting). This led to a quantitative characterization of the forecast errors we might expect when various parameters and external forcings are affected by uncertainties. Finally, using these new directions, we tried to build a statistical model intended to simulate the hydrodynamical model using only a fraction of the computer resources needed by the latter. To achieve this goal, we tried out artificial neural networks, nearest-neighbor methods, and regression trees. This study constitutes only the first step toward an innovative statistical model, as in its present form only a few degrees of freedom are considered and the primitive equation model is still required to build the AL method. We tried forecasting at 2 different time horizons: one day and one week.
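The twin-experiment protocol described above can be illustrated with a toy reduced-rank analysis in the spirit of the SEEK filter: a reference run serves as the truth, a perturbed run assimilates pseudo-observations, and the correction is confined to a few privileged error directions. The dynamics, dimensions, and error basis below are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 50, 5, 3                       # state size, observations, rank

M = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # toy linear "model"
obs_idx = np.arange(0, n, n // m)[:m]                # observe every 10th state
H = np.zeros((m, n))
H[np.arange(m), obs_idx] = 1.0

S = 0.5 * rng.standard_normal((n, r))    # privileged error directions
R = 0.01 * np.eye(m)                     # observation error covariance
P = S @ S.T                              # reduced-rank error covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # gain; correction lies in span(S)

x_true = rng.standard_normal(n)          # reference run: the "truth"
x = x_true + rng.standard_normal(n)      # assimilation run, wrong initial state

for _ in range(20):
    x_true = M @ x_true                  # truth evolves
    x = M @ x                            # forecast step
    y = H @ x_true                       # pseudo-observations from the truth
    x = x + K @ (y - H @ x)              # analysis step

print("final error norm:", np.linalg.norm(x - x_true))
```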
143

Predicting the Disease of Alzheimer (AD) with SNP Biomarkers and Clinical Data Based Decision Support System Using Data Mining Classification Approaches

Erdogan, Onur 01 September 2012 (has links) (PDF)
Single Nucleotide Polymorphisms (SNPs) are the most common DNA sequence variations, in which only a single nucleotide (A, T, C, G) in the human genome differs between individuals. Besides being the main genetic reason behind individual phenotypic differences, SNP variations have the potential to exploit the molecular basis of many complex diseases. Associating SNP subsets with diseases and analyzing genotyping data together with clinical findings will provide practical and affordable methodologies for the prediction of diseases in clinical settings. There is therefore a need to determine the SNP subsets and patients' clinical data that are informative for the prediction or diagnosis of particular diseases. So far, there is no established approach for selecting the representative SNP subset and patients' clinical data. Data mining, a methodology based on finding hidden and key patterns in huge databases, has the highest potential for extracting knowledge from genomic datasets and for selecting the SNPs and the most effective clinical features that are informative and relevant for clinical diagnosis. In this study we have applied one of the most widely used data mining classification methodologies, the decision tree, to associate SNP biomarkers and clinical data with Alzheimer's disease (AD), the most common form of dementia. Different tree construction parameters have been compared for optimization, and the most efficient and accurate tree for predicting AD is presented.
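As a hedged sketch of the general setup the abstract describes, the snippet below codes genotypes additively (0/1/2), appends clinical features, and compares tree-construction parameters by cross-validation; the data are synthetic stand-ins, not the study's dataset:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 300
snps = rng.integers(0, 3, size=(n, 20))   # 20 SNPs, additive 0/1/2 coding
clinical = rng.normal(size=(n, 3))        # stand-ins for clinical features
X = np.hstack([snps, clinical])
y = rng.integers(0, 2, size=n)            # AD / control labels (toy)

# Compare tree-construction parameters, as the abstract describes.
for depth in (3, 5, None):
    for criterion in ("gini", "entropy"):
        clf = DecisionTreeClassifier(max_depth=depth, criterion=criterion,
                                     random_state=0)
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"depth={depth}, criterion={criterion}: {acc:.3f}")
```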
144

Integrating Information Theory Measures and a Novel Rule-Set-Reduction Technique to Improve Fuzzy Decision Tree Induction Algorithms

Abu-halaweh, Nael Mohammed 02 December 2009 (has links)
Machine learning approaches have been successfully applied to many classification and prediction problems. One of the most popular machine learning approaches is decision trees. A main advantage of decision trees is the clarity of the decision model they produce. The ID3 algorithm proposed by Quinlan forms the basis for many decision tree applications. Trees produced by ID3 are sensitive to small perturbations in training data. To overcome this problem and to handle data uncertainties and spurious precision in data, fuzzy ID3 integrated fuzzy set theory and ideas from fuzzy logic with ID3. Several fuzzy decision tree algorithms and tools exist. However, existing tools are slow, produce a large number of rules, and/or lack support for automatic fuzzification of input data. These limitations make those tools unsuitable for a variety of applications, including those with many features and real-time ones such as intrusion detection. In addition, the large number of rules produced by these tools renders the generated decision model uninterpretable. In this research work, we proposed an improved version of the fuzzy ID3 algorithm. We also introduced a new method for reducing the number of fuzzy rules generated by fuzzy ID3. In addition, we applied fuzzy decision trees to the classification of real and pseudo microRNA precursors. Our experimental results showed that our improved fuzzy ID3 can achieve better classification accuracy and is more efficient than the original fuzzy ID3 algorithm, and that fuzzy decision trees can outperform several existing machine learning algorithms on a wide variety of datasets. Our experiments also showed that our fuzzy rule reduction method resulted in a significant reduction in the number of produced rules, consequently improving the comprehensibility of the produced decision model and reducing the fuzzy decision tree's execution time. This reduction in the number of rules was accompanied by a slight improvement in the classification accuracy of the resulting fuzzy decision tree. In addition, when applied to the microRNA prediction problem, the fuzzy decision tree achieved better results than other machine learning approaches applied to the same problem, including random forest, C4.5, SVM, and k-NN.
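The fuzzy ID3 ingredients named above can be sketched in a few lines: automatic triangular fuzzification of a numeric attribute, and a fuzzy information gain computed from membership sums rather than crisp counts. The exact formulas in the dissertation may differ; this only illustrates the idea:

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership of each value in triangular fuzzy sets placed at `centers`."""
    mu = np.zeros((len(x), len(centers)))
    for j, c in enumerate(centers):
        left = centers[j - 1] if j > 0 else c - 1.0
        right = centers[j + 1] if j < len(centers) - 1 else c + 1.0
        mu[:, j] = np.clip(np.minimum((x - left) / (c - left),
                                      (right - x) / (right - c)), 0, 1)
    return mu

def fuzzy_entropy(weights, y):
    """Entropy of a fuzzy node: class proportions come from membership sums."""
    total = weights.sum()
    ent = 0.0
    for cls in np.unique(y):
        p = weights[y == cls].sum() / total
        if p > 0:
            ent -= p * np.log2(p)
    return ent

x = np.random.default_rng(3).normal(size=200)
y = (x + np.random.default_rng(4).normal(scale=0.5, size=200) > 0).astype(int)

# Automatic fuzzification: centers from the attribute's quantiles.
mu = triangular_memberships(x, centers=np.quantile(x, [0.25, 0.5, 0.75]))
root = fuzzy_entropy(np.ones(len(x)), y)
gain = root - sum(mu[:, j].sum() / len(x) * fuzzy_entropy(mu[:, j], y)
                  for j in range(mu.shape[1]))
print("fuzzy information gain:", gain)
```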
145

Protein Tertiary Model Assessment Using Granular Machine Learning Techniques

Chida, Anjum A 21 March 2012 (has links)
The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and researched fields in bioinformatics. As models are predictions rather than experimental structures determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and structure of the protein. The goal is to generate a machine that understands structures from the PDB (Protein Data Bank) and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVM (support vector machines) and another using fuzzy decision trees (FDT). First, using a preliminary encoding style, SVM could reach around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) could reach above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm. Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure, and relative distances between alpha carbon atoms is collected through a spatial traversal of the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the use of the fuzzy decision tree, we obtained a training accuracy of around 90%. There is a significant improvement in prediction accuracy and execution time compared to the previous encoding technique. This outcome motivates continued exploration of effective machine learning algorithms for accurate protein model quality assessment. Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other protein information that could be considered for the same purpose.
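The "spatial traversal" encoding can be sketched roughly as follows: for each alpha carbon, the properties of its nearest neighbours in 3D space (rather than in sequence) are concatenated into the training vector. The coordinates and property values below are random stand-ins for real PDB-derived data:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(5)
n_residues, k = 120, 5
ca_xyz = rng.normal(size=(n_residues, 3)) * 10   # CA coordinates (toy)
props = rng.normal(size=(n_residues, 4))         # e.g. polarity, SS code, ...

tree = cKDTree(ca_xyz)
vectors = []
for i in range(n_residues):
    # k+1 because the nearest neighbour of a point is the point itself.
    _, idx = tree.query(ca_xyz[i], k=k + 1)
    neighbours = idx[1:]
    vectors.append(np.concatenate([props[i], props[neighbours].ravel()]))

X = np.vstack(vectors)       # one row per residue, fed to the SVM or FDT
print(X.shape)               # (120, 4 + 5*4) = (120, 24)
```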
146

Detecting Switching Points and Mode of Transport from GPS Tracks

Araya, Yeheyies January 2012 (has links)
In recent years, various research efforts have been in progress to enhance the quality of travel surveys, mainly with the aid of GPS technology. Initially, research focused on the vehicle travel mode, due to the availability of GPS technology in vehicles. Nowadays, with GPS devices accessible for personal use, researchers have shifted their focus to personal mobility across all travel modes. This master's thesis aimed at developing a mechanism to extract one type of travel survey information, particularly travel mode, from a collected GPS dataset. The available GPS dataset covers the travel modes walk, bike, and car, as well as public transport modes such as bus, train, and subway. The developed procedure consists of two stages: the first divides the tracks into trips, and further divides the trips into segments by means of a segmentation process. The segmentation process is based on the assumption that a traveler walks when switching from one transportation mode to another; thus, the trips are divided into walking and non-walking segments. The second phase comprises a procedure to develop a classification model that labels the separated segments with the travel modes walk, bike, bus, car, train, and subway. In order to develop the classification model, a supervised classification method has been used, adopting a decision tree algorithm. The highest prediction accuracy obtained by the classification system is for the walk travel mode, at 75.86%, while the bike and bus travel modes showed the lowest prediction accuracy. The developed system has shown results that could be used as a baseline for further similar research.
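A simplified sketch of the two-stage procedure: split a trip into walking and non-walking segments with a speed threshold, then compute per-segment features for the mode classifier. The threshold and features are illustrative choices, not the thesis's exact values:

```python
import numpy as np

WALK_SPEED = 2.5  # m/s; points slower than this are treated as walking

def segment_trip(speeds):
    """Return (is_walking, segment_id) for each GPS point in a trip."""
    walking = speeds < WALK_SPEED
    # A new segment starts wherever the walking/non-walking state flips.
    seg_id = np.concatenate([[0], np.cumsum(walking[1:] != walking[:-1])])
    return walking, seg_id

def segment_features(speeds, seg_id):
    """Per-segment features (mean speed, max speed, length) for the classifier."""
    feats = []
    for s in np.unique(seg_id):
        v = speeds[seg_id == s]
        feats.append((v.mean(), v.max(), len(v)))
    return np.array(feats)

speeds = np.abs(np.random.default_rng(6).normal(3, 2, size=100))  # toy trip
walking, seg_id = segment_trip(speeds)
print(segment_features(speeds, seg_id))
```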
147

Automatic Construction Algorithms for Supervised Neural Networks and Applications

Tsai, Hsien-Leing 28 July 2004 (has links)
Research on neural networks has been carried out for six decades. In this period, many neural models and learning rules have been proposed, and they have been popularly and successfully applied to many applications, solving many problems that traditional algorithms could not solve efficiently. However, when applying multilayer neural networks, users are confronted with the problem of determining the number of hidden layers and the number of hidden neurons in each hidden layer. It is too difficult for users to determine proper neural network architectures, yet doing so is very significant, because the architecture always critically influences performance. We can solve problems efficiently only when we have a proper neural network architecture. To overcome this difficulty, several approaches have recently been proposed to generate the architecture of neural networks, but they still have some drawbacks. The goal of our research is to discover better approaches to automatically determine proper neural network architectures. We propose a series of approaches in this thesis. First, we propose an approach based on decision trees. It successfully determines neural network architectures and greatly decreases learning time; however, it can deal only with two-class problems, and it generates bigger neural network architectures. Next, we propose an information-entropy-based approach to overcome these drawbacks. It can easily generate multi-class neural networks for standard domain problems. Finally, we extend the above method to sequential-domain and structured-domain problems. Therefore, our approaches can be applied to many applications. Currently, we are working on quantum neural networks. We are also interested in ART neural networks, which are likewise incremental neural models, and we apply them to digital signal processing. We present a character recognition application, a spoken word recognition application, and an image compression application, all of which perform well.
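The abstract does not spell out its decision-tree-to-architecture mapping, so the following is only a hedged illustration of the general idea: derive the hidden-layer size from a trained decision tree (here, one hidden unit per internal node) instead of guessing it by hand:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Train a decision tree and count its internal (non-leaf) nodes.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
n_internal = int((tree.tree_.children_left != -1).sum())

# Size the hidden layer from the tree instead of trial and error.
mlp = MLPClassifier(hidden_layer_sizes=(n_internal,), max_iter=2000,
                    random_state=0).fit(X, y)
print("hidden units:", n_internal, "train accuracy:", mlp.score(X, y))
```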
148

An Improved C-Fuzzy Decision Tree and its Application to Vector Quantization

Chiu, Hsin-Wei 27 July 2006 (has links)
Over the last one hundred years, mankind has invented many convenient tools in pursuit of a beautiful and comfortable living environment. The computer is one of the most important inventions, and its computational ability far exceeds that of humans. Because computers can deal with large amounts of data quickly and accurately, people use this advantage to imitate human thinking, and artificial intelligence has developed extensively. Methods such as neural networks, data mining, and fuzzy logic are applied to many fields (e.g., fingerprint recognition, image compression, and antenna design). Here we investigate prediction techniques based on decision trees and fuzzy clustering. The C-fuzzy decision tree classifies data using a fuzzy clustering method and then constructs a decision tree for prediction. However, in its distance function, the impact of the target space is inversely proportional, which can cause problems on some datasets. Besides, representing the output model of each leaf node by a constant restricts the capability to represent the data distribution in the node. We propose a more reasonable definition of the distance function that considers both input and target differences with a weighting factor. We also extend the output model of each leaf node to a local linear model and estimate the model parameters with a recursive SVD-based least-squares estimator. Experimental results have shown that our improved version produces higher recognition rates and smaller mean square errors for classification and regression problems, respectively.
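The two improvements can be sketched directly: a distance that weights target-space differences against input-space differences, and a local linear output model per leaf fitted with an SVD-based least-squares solve (numpy's lstsq here; the thesis uses a recursive variant). The weighting factor and data are illustrative:

```python
import numpy as np

def weighted_distance(x, y, v_in, v_out, w=0.5):
    """Distance combining input-space and target-space differences."""
    return np.sum((x - v_in) ** 2) + w * (y - v_out) ** 2

print(weighted_distance(np.array([0.1, -0.2]), 0.3, np.zeros(2), 0.0))

# Local linear output model for the samples that fall into one leaf node:
rng = np.random.default_rng(7)
X_leaf = rng.normal(size=(30, 2))
y_leaf = (1.5 * X_leaf[:, 0] - 0.7 * X_leaf[:, 1] + 0.2
          + 0.05 * rng.normal(size=30))

A = np.hstack([X_leaf, np.ones((30, 1))])          # design matrix + intercept
coef, *_ = np.linalg.lstsq(A, y_leaf, rcond=None)  # SVD-based solver
print("leaf model coefficients:", coef)
```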
149

Enhancing Accuracy Of Hybrid Recommender Systems Through Adapting The Domain Trends

Aksel, Fatih 01 September 2010 (has links) (PDF)
Traditional hybrid recommender systems typically follow a manually created, fixed prediction strategy in their decision-making process. Experts usually design these static strategies as fixed combinations of different techniques. However, people's tastes and desires are temporary and gradually evolve. Moreover, each domain has unique characteristics, trends, and user interests. Recent research has mostly focused on static hybridization schemes that do not change at runtime. In this thesis work, we describe an adaptive hybrid recommender system, called AdaRec, that modifies its attached prediction strategy at runtime according to the performance of its prediction techniques (user feedback). Our approach to this problem is to use adaptive prediction strategies. Experimental results with datasets show that our system outperforms the naive hybrid recommender.
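A hedged sketch of the runtime adaptation idea: each prediction technique keeps a running error estimate updated from user feedback, and the hybrid delegates to the currently best-performing one. The actual AdaRec adaptation rules are more elaborate than this:

```python
from typing import Callable, Dict

class AdaptiveHybrid:
    def __init__(self, predictors: Dict[str, Callable], alpha: float = 0.1):
        self.predictors = predictors
        self.error = {name: 1.0 for name in predictors}  # running error estimate
        self.alpha = alpha                               # adaptation rate

    def predict(self, user, item):
        best = min(self.error, key=self.error.get)       # current strategy
        return best, self.predictors[best](user, item)

    def feedback(self, user, item, true_rating):
        # Update every technique's error estimate from the observed rating.
        for name, f in self.predictors.items():
            err = abs(f(user, item) - true_rating)
            self.error[name] += self.alpha * (err - self.error[name])

hybrid = AdaptiveHybrid({
    "collaborative": lambda u, i: 3.5,   # stand-ins for real predictors
    "content_based": lambda u, i: 4.0,
})
hybrid.feedback("u1", "i1", 4.0)
print(hybrid.predict("u1", "i1"))
```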
150

Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages

Jarman, Jay 01 January 2011 (has links)
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms, such as association rule mining and decision tree induction, are used to discover classification rules for specific targets. This multi-stage pipeline approach is contrasted with traditional statistical text mining (STM) methods based on term counts and term-by-document frequencies. The aim is to create effective text analytic processes by adapting and combining individual methods. The methods are evaluated on an extensive set of real clinical notes annotated by experts to provide benchmark results. There are two main research questions in this dissertation. First, can information (specialized language) be extracted from clinical progress notes that will represent the notes without loss of predictive information? Second, can classifiers be built for clinical progress notes that are represented by specialized language? Three experiments were conducted to answer these questions by investigating specific challenges in extracting information from unstructured clinical notes and classifying the documents that are so important in the medical domain. The first experiment addresses the first research question by examining whether relevant patterns within clinical notes reside more in the highly technical, medically relevant terminology or in the passages expressed in common language. The results from this experiment informed the subsequent experiments. They also show that predictive patterns are preserved by preprocessing text documents with a grammatical NLP system that separates specialized language from common language, and that this is an acceptable method of data reduction for the purpose of STM. Experiments two and three address the second research question. Experiment two focuses on applying rule-mining techniques to the output of the information extraction effort from experiment one, with the ultimate goal of creating rule-based classifiers. There are several contributions of this experiment. First, it uses a novel approach to create classification rules from specialized language and to build a classifier: the data is split by classification and then rules are generated. Second, several toolkits were assembled to create the automated process by which the rules were created. Third, this automated process created interpretable rules, and finally, the resulting model provided good accuracy. The resulting performance was slightly lower than that of the classifier from experiment one, but had the benefit of interpretable rules. Experiment three focuses on using decision tree induction (DTI) as a rule discovery approach to classification, which also addresses the second research question. DTI is another rule-centric method for creating a classifier. The contributions of this experiment are that DTI can be used to create an accurate and interpretable classifier using specialized language. Additionally, the resulting rule sets are simple and easily interpretable, and are created using a highly automated process.
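A toy sketch of the pipeline's central contrast: keep only terms found in a controlled vocabulary (the "specialized language"), build term features from them, and train a decision tree whose rules remain interpretable. The vocabulary and notes below are invented; the study used expert-annotated clinical notes and a medical controlled vocabulary:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical controlled vocabulary standing in for medical terminology.
vocabulary = ["dyspnea", "edema", "hypertension", "metformin", "insulin"]

notes = [
    "patient reports dyspnea and ankle edema, continue insulin",
    "hypertension stable on current dose, started metformin",
    "no edema today, dyspnea resolved",
    "metformin tolerated well, hypertension controlled",
]
labels = [1, 0, 1, 0]   # toy target, e.g. cardiac vs. metabolic note

# Restricting the vectorizer to the controlled vocabulary discards the
# common-language portion of each note (the "specialized language" idea).
vec = CountVectorizer(vocabulary=vocabulary)
X = vec.fit_transform(notes)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
print(export_text(tree, feature_names=vocabulary))   # interpretable rules
```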
