131 |
Protein Tertiary Model Assessment Using Granular Machine Learning Techniques. Chida, Anjum A, 21 March 2012 (has links)
The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and most researched fields in bioinformatics. As models are predictions rather than experimental structures determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to train a machine that learns structures from the PDB (Protein Data Bank) and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVMs (support vector machines) and another using fuzzy decision trees (FDTs). Using a preliminary encoding style, the SVM reached around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM learning algorithm.
Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure, and the relative distance between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the use of a fuzzy decision tree, we obtained a training accuracy of around 90%. This is a significant improvement over the previous encoding technique in both prediction accuracy and execution time. This outcome motivates the continued exploration of effective machine learning algorithms for accurate protein model quality assessment.
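As a rough illustration of the spatial-traversal encoding described above, the sketch below builds a feature vector for each alpha-carbon atom from the properties of its spatial neighbours. The 8 Å cutoff, the binary polarity codes, and the four features are illustrative assumptions; the thesis's actual feature set (substitution matrix, secondary structure, and so on) is richer.

```python
import numpy as np

def encode_structure(ca_coords, polarity, cutoff=8.0):
    """ca_coords: (N, 3) alpha-carbon coordinates; polarity: (N,) residue codes.

    Hypothetical sketch: for each alpha carbon, collect properties of the
    neighbours within `cutoff`, so atoms close in 3D space (and thus
    interacting) contribute to the same training vector.
    """
    vectors = []
    for i, xyz in enumerate(ca_coords):
        dists = np.linalg.norm(ca_coords - xyz, axis=1)
        neighbours = np.where((dists > 0) & (dists < cutoff))[0]
        # Stand-in features: mean neighbour distance, neighbour count,
        # own polarity, mean neighbour polarity.
        vectors.append([
            dists[neighbours].mean() if len(neighbours) else cutoff,
            len(neighbours),
            polarity[i],
            polarity[neighbours].mean() if len(neighbours) else 0.0,
        ])
    return np.asarray(vectors)

# Toy usage with random coordinates for a 50-residue chain.
rng = np.random.default_rng(0)
X = encode_structure(rng.normal(scale=10.0, size=(50, 3)), rng.integers(0, 2, 50))
print(X.shape)  # (50, 4)
```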
Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other protein information that could be considered for the same purpose.
|
132 |
Detecting Switching Points and Mode of Transport from GPS Tracks. Araya, Yeheyies, January 2012 (has links)
In recent years, various research efforts have been undertaken to enhance the quality of travel surveys, mainly with the aid of GPS technology. Initially this research focused on the vehicle travel mode, owing to the availability of GPS technology in vehicles; nowadays, with GPS devices accessible for personal use, researchers have shifted their focus to personal mobility across all travel modes. This master's thesis aimed at developing a mechanism to extract one type of travel survey information, namely travel mode, from a collected GPS dataset. The available GPS dataset covers the travel modes walk, bike, and car, as well as public transport modes such as bus, train, and subway. The developed procedure consists of two stages: the first divides the GPS tracks into trips, and the trips further into segments, by means of a segmentation process. The segmentation process is based on the assumption that a traveler walks when switching from one transportation mode to another; thus, the trips are divided into walking and non-walking segments. The second phase comprises a procedure to develop a classification model that infers, for the separated segments, the travel modes walk, bike, bus, car, train, and subway. To develop the classification model, a supervised classification method was used, adopting a decision tree algorithm. The highest prediction accuracy obtained was for the walk travel mode, at 75.86%, while the bike and bus modes showed the lowest prediction accuracies. Moreover, the developed system showed promising results that could serve as a baseline for further similar research.
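A minimal sketch of the two-stage procedure under assumed values: points below a walking-speed threshold are labeled as walking, consecutive labels are grouped into segments, and per-segment speed statistics feed a decision tree. The 2.5 m/s threshold, the three features, and the toy training rows are illustrative, not the thesis's tuned parameters.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def split_segments(speeds, walk_threshold=2.5):
    """Label each GPS point walk/non-walk, then group consecutive labels."""
    is_walk = speeds < walk_threshold
    boundaries = np.flatnonzero(np.diff(is_walk.astype(int))) + 1
    return np.split(speeds, boundaries)

def segment_features(segment):
    # Mean speed, max speed, and speed variance per segment.
    return [segment.mean(), segment.max(), segment.var()]

# Toy training data: [mean, max, var] rows with known travel modes.
X_train = [[1.2, 2.0, 0.1], [4.5, 8.0, 2.0], [12.0, 25.0, 9.0]]
y_train = ["walk", "bike", "car"]
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

trip = np.array([1.0, 1.3, 1.1, 5.0, 6.2, 5.8, 1.2, 0.9])  # speeds in m/s
for seg in split_segments(trip):
    print(clf.predict([segment_features(seg)])[0])
```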
|
133 |
Automatic Construction Algorithms for Supervised Neural Networks and Applications. Tsai, Hsien-Leing, 28 July 2004 (has links)
Research on neural networks has been conducted for six decades. In this period, many neural models and learning rules have been proposed. Furthermore, they have been widely and successfully applied to many applications, solving many problems that traditional algorithms could not solve efficiently.
However, when applying multilayer neural networks to applications, users are confronted with the problem of determining the number of hidden layers and the number of hidden neurons in each hidden layer. It is very difficult for users to determine proper neural network architectures, yet it is very important, because the architecture always critically influences a network's performance. We can solve problems efficiently only when we have a proper neural network architecture.
To overcome this difficulty, several approaches have recently been proposed to generate neural network architectures automatically. However, they still have some drawbacks. The goal of our research is to discover better approaches to automatically determining proper neural network architectures. We propose a series of approaches in this thesis. First, we propose an approach based on decision trees. It successfully determines neural network architectures and greatly decreases learning time; however, it can deal only with two-class problems and it generates larger neural network architectures. Next, we propose an information-entropy-based approach to overcome these drawbacks. It can easily generate multi-class neural networks for standard domain problems. Finally, we extend this method to sequential-domain and structured-domain problems, so our approaches can be applied to many applications. Currently, we are exploring quantum neural networks.
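As a minimal illustration of the information-entropy criterion underlying both the decision-tree-based and entropy-based approaches (the thesis's actual mapping from entropy to network architecture is not reproduced here):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, partitions):
    """Entropy reduction achieved by splitting `labels` into `partitions`."""
    weighted = sum(len(p) / len(labels) * entropy(p) for p in partitions)
    return entropy(labels) - weighted

labels = ["a", "a", "b", "b", "b", "c"]
print(entropy(labels))                                            # ~1.459 bits
print(information_gain(labels, [["a", "a"], ["b", "b", "b", "c"]]))
```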
We are also interested in ART neural networks, which are likewise incremental neural models. We apply them to digital signal processing, proposing a character recognition application, a spoken word recognition application, and an image compression application, all of which perform well.
|
134 |
An Improved C-Fuzzy Decision Tree and its Application to Vector Quantization. Chiu, Hsin-Wei, 27 July 2006 (has links)
Over the last hundred years, mankind has invented many convenient tools in pursuit of a beautiful and comfortable living environment. The computer is one of the most important of these inventions, and its computational ability far surpasses that of humans. Because computers can process large amounts of data quickly and accurately, this advantage is used to imitate human thinking, and artificial intelligence has developed extensively. Methods such as neural networks, data mining, and fuzzy logic have been applied to many fields (e.g., fingerprint recognition, image compression, and antenna design). Here we investigate prediction techniques based on decision trees and fuzzy clustering. The c-fuzzy decision tree classifies data using a fuzzy clustering method and then constructs a decision tree for prediction. However, in its distance function the influence of the target space is inversely proportional, which can cause problems on some datasets. Moreover, representing the output model of each leaf node by a constant restricts the ability to represent the data distribution in that node. We propose a more reasonable definition of the distance function that considers both input and target differences with a weighting factor. We also extend the output model of each leaf node to a local linear model and estimate the model parameters with a recursive SVD-based least squares estimator. Experimental results have shown that our improved version produces higher recognition rates and smaller mean square errors for classification and regression problems, respectively.
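A hedged sketch of the two proposed modifications: a distance that weights input-space and target-space differences, and a local linear output model per leaf. For brevity, the recursive SVD-based estimator is replaced by a batch least-squares fit (numpy's lstsq, which is itself SVD-based); the weighting factor alpha and the toy data are assumptions.

```python
import numpy as np

def weighted_distance(x1, y1, x2, y2, alpha=0.7):
    """Combine input and target differences with a weighting factor."""
    return alpha * np.linalg.norm(x1 - x2) + (1 - alpha) * abs(y1 - y2)

def fit_leaf_model(X, y):
    """Local linear model y ~ X @ w + b for the samples in a leaf node."""
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # weights followed by the intercept

rng = np.random.default_rng(1)
X = rng.uniform(size=(20, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5
print(fit_leaf_model(X, y))  # approximately [3.0, -2.0, 0.5]
```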
|
135 |
Enhancing Accuracy Of Hybrid Recommender Systems Through Adapting The Domain Trends. Aksel, Fatih, 01 September 2010 (has links) (PDF)
Traditional hybrid recommender systems typically follow a manually created, fixed prediction strategy in their decision-making process. Experts usually design these static strategies as fixed combinations of different techniques. However, people's tastes and desires are transient and gradually evolve, and each domain has unique characteristics, trends, and user interests. Recent research has mostly focused on static hybridization schemes that do not change at runtime. In this thesis work, we describe an adaptive hybrid recommender system, called AdaRec, that modifies its attached prediction strategy at runtime according to the performance of its prediction techniques (user feedback). Our approach to this problem is to use adaptive prediction strategies. Experimental results show that our system outperforms naive hybrid recommenders.
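A minimal sketch of the adaptive idea: a weighted hybrid whose component weights are nudged at runtime according to each component's prediction error (the user-feedback signal). The update rule and learning rate are illustrative assumptions, not AdaRec's actual strategy-adaptation mechanism.

```python
def hybrid_predict(predictions, weights):
    """Weighted combination of component recommender outputs."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

def adapt_weights(weights, predictions, actual, lr=0.1):
    """Shift weight toward components with smaller absolute error."""
    errors = [abs(p - actual) for p in predictions]
    return [max(w + lr * (1.0 - e), 0.01) for w, e in zip(weights, errors)]

weights = [1.0, 1.0]                   # e.g. collaborative and content-based
preds = [4.2, 2.8]                     # each component's rating prediction
print(hybrid_predict(preds, weights))  # 3.5
weights = adapt_weights(weights, preds, actual=4.0)
print(weights)                         # the better component gains weight
```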
|
136 |
Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages. Jarman, Jay, 01 January 2011 (has links)
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms, such as association rule mining and decision tree induction, are used to discover classification rules for specific targets. This multi-stage pipeline approach is contrasted with traditional statistical text mining (STM) methods based on term counts and term-by-document frequencies. The aim is to create effective text analytic processes by adapting and combining individual methods. The methods are evaluated on an extensive set of real clinical notes annotated by experts to provide benchmark results.
There are two main research questions for this dissertation. First, can information (specialized language) be extracted from clinical progress notes that represents the notes without loss of predictive information? Second, can classifiers be built for clinical progress notes that are represented by specialized language? Three experiments were conducted to answer these questions by investigating specific challenges in extracting information from unstructured clinical notes and in classifying the documents that are so important in the medical domain.
The first experiment addresses the first research question by examining whether relevant patterns within clinical notes reside more in the highly technical, medically relevant terminology or in the passages expressed in common language. The results from this experiment informed the subsequent experiments. It also shows that predictive patterns are preserved by preprocessing text documents with a grammatical NLP system that separates specialized language from common language, and that this is an acceptable method of data reduction for the purpose of STM.
Experiments two and three address the second research question. Experiment two focuses on applying rule-mining techniques to the output of the information extraction effort from experiment one, with the ultimate goal of creating rule-based classifiers. This experiment makes several contributions. First, it uses a novel approach to create classification rules from specialized language and to build a classifier: the data is split by class and rules are then generated. Second, several toolkits were assembled to automate the process by which the rules were created. Third, this automated process produced interpretable rules, and finally, the resulting model provided good accuracy. Its performance was slightly lower than that of the classifier from experiment one, but with the benefit of interpretable rules.
Experiment three focuses on using decision tree induction (DTI) as a rule-discovery approach to classification, which also addresses the second research question. DTI is another rule-centric method for creating a classifier. The contribution of this experiment is to show that DTI can be used to create an accurate and interpretable classifier using specialized language. Additionally, the resulting rule sets are simple and easily interpretable, and are created by a highly automated process.
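For illustration, the sketch below shows decision tree induction yielding human-readable if-then rules, with the public iris dataset standing in for the clinical-note term features, which are not available here.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a shallow tree so the extracted rules stay simple and interpretable.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each root-to-leaf path reads as an if-then classification rule.
print(export_text(clf, feature_names=load_iris().feature_names))
```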
|
137 |
Multi-Temporal Crop Classification Using a Decision Tree in a Southern Ontario Agricultural Region. Melnychuk, Amie, 03 October 2012 (has links)
Identifying land-use management practices is important for detecting land-use change and impacts on the surrounding landscape. The Ontario Ministry of Agriculture and Rural Affairs has established a database product called the Agricultural Resource Inventory (AgRI), which is used for the storage and analysis of agricultural land management practices. This thesis explores the opportunity to populate the AgRI. A comparison of two supervised classifications using optical satellite imagery, with multiple single-date classifications and a subsequent multi-date, multi-sensor classification, was used to gauge the best image timing for crop classification. In this study, optical satellite images (Landsat-5 and SPOT-4/5) were input into a decision tree classifier and a Maximum Likelihood Classifier (MLC); the decision tree performed better than the MLC in overall and class accuracies. Classification experienced complications from visual differences in vegetation. The multi-date classification had an accuracy of 66.52%. The lack of imagery available at crop ripening stages greatly reduced the accuracies.
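A sketch of the multi-date idea under stated assumptions: band values from two image dates are stacked into one feature vector per pixel before training the decision tree. The dates, band counts, synthetic reflectances, and the rule generating the ground-truth labels are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_pixels = 200
june_bands = rng.uniform(size=(n_pixels, 3))    # e.g. a Landsat-5 June scene
august_bands = rng.uniform(size=(n_pixels, 3))  # e.g. a SPOT-4 August scene
X = np.hstack([june_bands, august_bands])       # multi-date feature stack

# Synthetic ground truth: crop type driven by one June band, for the demo.
y = np.where(june_bands[:, 0] > 0.5, "corn", "soy")

clf = DecisionTreeClassifier(random_state=0).fit(X[:150], y[:150])
print((clf.predict(X[150:]) == y[150:]).mean())  # held-out accuracy
```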
|
138 |
Predictive Health Monitoring for Aircraft Systems using Decision Trees. Gerdes, Mike, January 2014 (has links)
Unscheduled aircraft maintenance causes many problems and high costs for aircraft operators, because delayed or canceled flights are expensive and because spares are not always available everywhere and sometimes have to be shipped across the world. Reducing the amount of unscheduled maintenance thus offers great cost savings for aircraft operators. This thesis describes three methods for aircraft health monitoring and prediction: one method for system monitoring, one method for forecasting time series, and one method that combines the other two into a complete monitoring and prediction process. Together, the three methods allow possible failures to be forecast. The two base methods use decision trees for decision making and genetic optimization to improve the performance of the decision trees and to reduce the need for human interaction. Decision trees have the advantage that the generated code is fast and easily processed, can be altered by human experts without much work, and is readable by humans. Human readability and the ability to modify the results are especially important for incorporating expert knowledge and removing errors introduced by the automated code generation.
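A hedged sketch of the genetic-optimization idea: evolve decision-tree hyperparameters against cross-validated accuracy. The thesis optimizes the trees themselves for condition monitoring; this simplified stand-in only tunes max_depth and min_samples_leaf on synthetic sensor-like data.

```python
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def fitness(genome):
    """Cross-validated accuracy of a tree with the genome's parameters."""
    depth, leaf = genome
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                 random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

random.seed(0)
population = [(random.randint(2, 12), random.randint(1, 10)) for _ in range(8)]
for _ in range(5):  # a few generations of selection and mutation
    population.sort(key=fitness, reverse=True)
    parents = population[:4]
    children = [(max(2, d + random.choice([-1, 0, 1])),
                 max(1, l + random.choice([-1, 0, 1])))
                for d, l in parents]
    population = parents + children

best = max(population, key=fitness)
print(best, round(fitness(best), 3))
```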
|
139 |
應用資料採礦技術於電影市場研究 / Application of Data Mining Techniques to Film Market Research. 蔡依庭 (Tsai, Yi-Ting), Unknown Date (has links)
Considering the current state of the film market, the cost of film distribution is steadily rising, customer demand is complex and changeable, and the trend toward concentrated film consumption is increasingly clear. From the perspective of both film distributors and film exhibitors, interpreting customers' needs and behaviors in order to segment the market clearly, and designing different products and marketing mixes for each segment, have become urgent tasks for the film industry.
In view of this, the study applies data mining techniques, building models with four decision trees (C&RT, QUEST, CHAID, C5.0), logistic regression, and an artificial neural network. Since the CHAID decision tree outperformed the other models in overall prediction accuracy, precision, and recall for both response variables (whether a customer goes to the cinema at all, and whether a customer goes to see foreign or Taiwanese films), CHAID was adopted for both. The CHAID model for the "goes to the cinema" response variable performed better, so its results are used as the main findings.
Using the CHAID model with "goes to the cinema" as the response variable, this study identified thirteen variables that influence cinema attendance. Based on the analysis results, film market customers were divided into three groups, highest-contribution, regular-contribution, and low-contribution customers, and the characteristics of each group were summarized. The three groups showed significant differences in age, education, entertainment and culture expenditure, living area, browsing information web pages, collecting information online, watching foreign films on television, watching European and American TV series, speaking English, watching films online, economic affluence, and a "seize the day" attitude. This study therefore bases its marketing strategy recommendations mainly on the characteristics of the three contribution groups, supplemented by the characteristics of those who watch foreign or Taiwanese films.
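A minimal illustration of the chi-squared criterion at the heart of CHAID: the predictor whose cross-tabulation with the target is most significant is chosen for the split. The toy survey-like data is an assumption; a full CHAID implementation would also merge categories and recurse.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
goes_to_movies = rng.choice([0, 1], 500)
# One predictor correlated with the target, one unrelated.
age_band = np.where(goes_to_movies == 1,
                    rng.choice([0, 1], 500, p=[0.7, 0.3]),   # younger skew
                    rng.choice([0, 1], 500, p=[0.4, 0.6]))
region = rng.choice([0, 1], 500)

for name, predictor in [("age_band", age_band), ("region", region)]:
    table = np.array([[np.sum((predictor == a) & (goes_to_movies == m))
                       for m in (0, 1)] for a in (0, 1)])
    chi2, p, _, _ = chi2_contingency(table)
    print(name, "chi2=%.1f p=%.3g" % (chi2, p))
# age_band yields the smaller p-value, so CHAID would split on it first.
```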
|
140 |
Mapeamento digital de solos: Metodologias para atender a demanda por informação espacial em solos / Digital soil mapping: Methods to meet the demand for soil spatial information. Caten, Alexandre Ten, 07 November 2011 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Soil is increasingly recognized as playing an important role in ecosystems, as well as in food production and global climate regulation. For this reason, the demand for relevant and up-to-date soil information is growing. Digital Soil Mapping (DSM) provides this information at different spatial resolutions with associated quality indicators. The aim of this study was to analyze the main methodological approaches used in the DSM of soil classes, through a literature review of national (Brazilian) research, and to propose procedures for data analysis in DSM projects for soil classes. The use of DSM techniques for mapping soil classes in Brazil is recent; the first publication on the subject appeared only in 2006. Among the predictive functions, logistic regression is the predominantly used technique. Quality evaluation of the predictive models employed the error matrix and the kappa index in most cases. The wavelet transform proved to be a methodology of great potential for identifying the spatial resolution at which terrain attributes show maximum variability. The proposed methodology of excluding environmental covariate data located too near the borders of soil-class polygons enabled the generation of less complex and more accurate Decision Tree (DT) models. It was also shown that the amount of data required for DT model training is between five and 15% of the total data set. Field observations indicated a prediction accuracy close to 70% for DT models produced with those sampling densities.
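A sketch of the two proposed procedures: drop samples whose distance to a soil-polygon border falls below a buffer, then train the decision tree on a small fraction of the remaining data, within the 5-15% range reported. The synthetic covariates, the border-distance column, and the 30 m buffer are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
covariates = rng.uniform(size=(n, 4))       # e.g. terrain attributes
border_dist = rng.uniform(0, 200, n)        # metres to soil-polygon border
soil_class = (covariates[:, 0] > 0.5).astype(int)  # synthetic target

keep = border_dist > 30.0                   # exclude near-border samples
X, y = covariates[keep], soil_class[keep]

# Train on 10% of the retained samples; evaluate on the rest.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.10,
                                          random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 3))
```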
|