131 |
Protein Tertiary Model Assessment Using Granular Machine Learning Techniques. Chida, Anjum A, 21 March 2012 (has links)
The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and most researched fields in bioinformatics. As models are predictions rather than experimental structures determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to train a machine that learns structures from the PDB (Protein Data Bank) and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVMs (support vector machines) and another using fuzzy decision trees (FDTs). Using a preliminary encoding style, the SVM reached around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM learning algorithm.
Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure, and the relative distance between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the use of a fuzzy decision tree, we obtained a training accuracy of around 90%. This is a significant improvement over the previous encoding technique in both prediction accuracy and execution time. This outcome motivates the continued exploration of effective machine learning algorithms for accurate protein model quality assessment.
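As a rough illustration of the spatial-traversal encoding described above, the sketch below builds a feature vector for each alpha-carbon atom from the properties of its spatial neighbours. The 8 Å cutoff, the binary polarity codes, and the four features are illustrative assumptions; the thesis's actual feature set (substitution matrix, secondary structure, and so on) is richer.

```python
import numpy as np

def encode_structure(ca_coords, polarity, cutoff=8.0):
    """ca_coords: (N, 3) alpha-carbon coordinates; polarity: (N,) residue codes.

    Hypothetical sketch: for each alpha carbon, collect properties of the
    neighbours within `cutoff`, so atoms close in 3D space (and thus
    interacting) contribute to the same training vector.
    """
    vectors = []
    for i, xyz in enumerate(ca_coords):
        dists = np.linalg.norm(ca_coords - xyz, axis=1)
        neighbours = np.where((dists > 0) & (dists < cutoff))[0]
        # Stand-in features: mean neighbour distance, neighbour count,
        # own polarity, mean neighbour polarity.
        vectors.append([
            dists[neighbours].mean() if len(neighbours) else cutoff,
            len(neighbours),
            polarity[i],
            polarity[neighbours].mean() if len(neighbours) else 0.0,
        ])
    return np.asarray(vectors)

# Toy usage with random coordinates for a 50-residue chain.
rng = np.random.default_rng(0)
X = encode_structure(rng.normal(scale=10.0, size=(50, 3)), rng.integers(0, 2, 50))
print(X.shape)  # (50, 4)
```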
Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other protein information that could be considered for the same purpose.
|
132 |
Detecting Switching Points and Mode of Transport from GPS Tracks. Araya, Yeheyies, January 2012 (has links)
In recent years, various research efforts have been undertaken to enhance the quality of travel surveys, mainly with the aid of GPS technology. Initially this research focused on the vehicle travel mode, owing to the availability of GPS technology in vehicles; nowadays, with GPS devices accessible for personal use, researchers have shifted their focus to personal mobility across all travel modes. This master's thesis aimed at developing a mechanism to extract one type of travel survey information, namely travel mode, from a collected GPS dataset. The available GPS dataset covers the travel modes walk, bike, and car, as well as public transport modes such as bus, train, and subway. The developed procedure consists of two stages: the first divides the GPS tracks into trips, and the trips further into segments, by means of a segmentation process. The segmentation process is based on the assumption that a traveler walks when switching from one transportation mode to another; thus, the trips are divided into walking and non-walking segments. The second phase comprises a procedure to develop a classification model that infers, for the separated segments, the travel modes walk, bike, bus, car, train, and subway. To develop the classification model, a supervised classification method was used, adopting a decision tree algorithm. The highest prediction accuracy obtained was for the walk travel mode, at 75.86%, while the bike and bus modes showed the lowest prediction accuracies. Moreover, the developed system showed promising results that could serve as a baseline for further similar research.
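A minimal sketch of the two-stage procedure under assumed values: points below a walking-speed threshold are labeled as walking, consecutive labels are grouped into segments, and per-segment speed statistics feed a decision tree. The 2.5 m/s threshold, the three features, and the toy training rows are illustrative, not the thesis's tuned parameters.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def split_segments(speeds, walk_threshold=2.5):
    """Label each GPS point walk/non-walk, then group consecutive labels."""
    is_walk = speeds < walk_threshold
    boundaries = np.flatnonzero(np.diff(is_walk.astype(int))) + 1
    return np.split(speeds, boundaries)

def segment_features(segment):
    # Mean speed, max speed, and speed variance per segment.
    return [segment.mean(), segment.max(), segment.var()]

# Toy training data: [mean, max, var] rows with known travel modes.
X_train = [[1.2, 2.0, 0.1], [4.5, 8.0, 2.0], [12.0, 25.0, 9.0]]
y_train = ["walk", "bike", "car"]
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

trip = np.array([1.0, 1.3, 1.1, 5.0, 6.2, 5.8, 1.2, 0.9])  # speeds in m/s
for seg in split_segments(trip):
    print(clf.predict([segment_features(seg)])[0])
```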
|
133 |
Automatic Construction Algorithms for Supervised Neural Networks and Applications. Tsai, Hsien-Leing, 28 July 2004 (has links)
Research on neural networks has been conducted for six decades. In this period, many neural models and learning rules have been proposed. Furthermore, they have been widely and successfully applied to many applications, solving many problems that traditional algorithms could not solve efficiently.
However, when applying multilayer neural networks to applications, users are confronted with the problem of determining the number of hidden layers and the number of hidden neurons in each hidden layer. It is very difficult for users to determine proper neural network architectures, yet it is very important, because the architecture always critically influences a network's performance. We can solve problems efficiently only when we have a proper neural network architecture.
To overcome this difficulty, several approaches have recently been proposed to generate neural network architectures automatically. However, they still have some drawbacks. The goal of our research is to discover better approaches to automatically determining proper neural network architectures. We propose a series of approaches in this thesis. First, we propose an approach based on decision trees. It successfully determines neural network architectures and greatly decreases learning time; however, it can deal only with two-class problems and it generates larger neural network architectures. Next, we propose an information-entropy-based approach to overcome these drawbacks. It can easily generate multi-class neural networks for standard domain problems. Finally, we extend this method to sequential-domain and structured-domain problems, so our approaches can be applied to many applications. Currently, we are exploring quantum neural networks.
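As a minimal illustration of the information-entropy criterion underlying both the decision-tree-based and entropy-based approaches (the thesis's actual mapping from entropy to network architecture is not reproduced here):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, partitions):
    """Entropy reduction achieved by splitting `labels` into `partitions`."""
    weighted = sum(len(p) / len(labels) * entropy(p) for p in partitions)
    return entropy(labels) - weighted

labels = ["a", "a", "b", "b", "b", "c"]
print(entropy(labels))                                            # ~1.459 bits
print(information_gain(labels, [["a", "a"], ["b", "b", "b", "c"]]))
```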
We are also interested in ART neural networks, which are likewise incremental neural models. We apply them to digital signal processing, proposing a character recognition application, a spoken word recognition application, and an image compression application, all of which perform well.
|
134 |
An Improved C-Fuzzy Decision Tree and its Application to Vector Quantization. Chiu, Hsin-Wei, 27 July 2006 (has links)
Over the last hundred years, mankind has invented many convenient tools in pursuit of a beautiful and comfortable living environment. The computer is one of the most important of these inventions, and its computational ability far surpasses that of humans. Because computers can process large amounts of data quickly and accurately, this advantage is used to imitate human thinking, and artificial intelligence has developed extensively. Methods such as neural networks, data mining, and fuzzy logic have been applied to many fields (e.g., fingerprint recognition, image compression, and antenna design). Here we investigate prediction techniques based on decision trees and fuzzy clustering. The c-fuzzy decision tree classifies data using a fuzzy clustering method and then constructs a decision tree for prediction. However, in its distance function the influence of the target space is inversely proportional, which can cause problems on some datasets. Moreover, representing the output model of each leaf node by a constant restricts the ability to represent the data distribution in that node. We propose a more reasonable definition of the distance function that considers both input and target differences with a weighting factor. We also extend the output model of each leaf node to a local linear model and estimate the model parameters with a recursive SVD-based least squares estimator. Experimental results have shown that our improved version produces higher recognition rates and smaller mean square errors for classification and regression problems, respectively.
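A hedged sketch of the two proposed modifications: a distance that weights input-space and target-space differences, and a local linear output model per leaf. For brevity, the recursive SVD-based estimator is replaced by a batch least-squares fit (numpy's lstsq, which is itself SVD-based); the weighting factor alpha and the toy data are assumptions.

```python
import numpy as np

def weighted_distance(x1, y1, x2, y2, alpha=0.7):
    """Combine input and target differences with a weighting factor."""
    return alpha * np.linalg.norm(x1 - x2) + (1 - alpha) * abs(y1 - y2)

def fit_leaf_model(X, y):
    """Local linear model y ~ X @ w + b for the samples in a leaf node."""
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # weights followed by the intercept

rng = np.random.default_rng(1)
X = rng.uniform(size=(20, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5
print(fit_leaf_model(X, y))  # approximately [3.0, -2.0, 0.5]
```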
|
135 |
Enhancing Accuracy Of Hybrid Recommender Systems Through Adapting The Domain Trends. Aksel, Fatih, 01 September 2010 (has links) (PDF)
Traditional hybrid recommender systems typically follow a manually created, fixed prediction strategy in their decision-making process. Experts usually design these static strategies as fixed combinations of different techniques. However, people's tastes and desires are transient and gradually evolve, and each domain has unique characteristics, trends, and user interests. Recent research has mostly focused on static hybridization schemes that do not change at runtime. In this thesis work, we describe an adaptive hybrid recommender system, called AdaRec, that modifies its attached prediction strategy at runtime according to the performance of its prediction techniques (user feedback). Our approach to this problem is to use adaptive prediction strategies. Experimental results show that our system outperforms naive hybrid recommenders.
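A minimal sketch of the adaptive idea: a weighted hybrid whose component weights are nudged at runtime according to each component's prediction error (the user-feedback signal). The update rule and learning rate are illustrative assumptions, not AdaRec's actual strategy-adaptation mechanism.

```python
def hybrid_predict(predictions, weights):
    """Weighted combination of component recommender outputs."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

def adapt_weights(weights, predictions, actual, lr=0.1):
    """Shift weight toward components with smaller absolute error."""
    errors = [abs(p - actual) for p in predictions]
    return [max(w + lr * (1.0 - e), 0.01) for w, e in zip(weights, errors)]

weights = [1.0, 1.0]                   # e.g. collaborative and content-based
preds = [4.2, 2.8]                     # each component's rating prediction
print(hybrid_predict(preds, weights))  # 3.5
weights = adapt_weights(weights, preds, actual=4.0)
print(weights)                         # the better component gains weight
```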
|
136 |
Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages. Jarman, Jay, 01 January 2011 (has links)
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms, such as association rule mining and decision tree induction, are used to discover classification rules for specific targets. This multi-stage pipeline approach is contrasted with traditional statistical text mining (STM) methods based on term counts and term-by-document frequencies. The aim is to create effective text analytic processes by adapting and combining individual methods. The methods are evaluated on an extensive set of real clinical notes annotated by experts to provide benchmark results.
There are two main research questions for this dissertation. First, can information (specialized language) be extracted from clinical progress notes that represents the notes without loss of predictive information? Second, can classifiers be built for clinical progress notes that are represented by specialized language? Three experiments were conducted to answer these questions by investigating specific challenges in extracting information from unstructured clinical notes and in classifying the documents that are so important in the medical domain.
The first experiment addresses the first research question by examining whether relevant patterns within clinical notes reside more in the highly technical, medically relevant terminology or in the passages expressed in common language. The results from this experiment informed the subsequent experiments. It also shows that predictive patterns are preserved by preprocessing text documents with a grammatical NLP system that separates specialized language from common language, and that this is an acceptable method of data reduction for the purpose of STM.
Experiments two and three address the second research question. Experiment two focuses on applying rule-mining techniques to the output of the information extraction effort from experiment one, with the ultimate goal of creating rule-based classifiers. This experiment makes several contributions. First, it uses a novel approach to create classification rules from specialized language and to build a classifier: the data is split by class and rules are then generated. Second, several toolkits were assembled to automate the process by which the rules were created. Third, this automated process produced interpretable rules, and finally, the resulting model provided good accuracy. Its performance was slightly lower than that of the classifier from experiment one, but with the benefit of interpretable rules.
Experiment three focuses on using decision tree induction (DTI) as a rule-discovery approach to classification, which also addresses the second research question. DTI is another rule-centric method for creating a classifier. The contribution of this experiment is to show that DTI can be used to create an accurate and interpretable classifier using specialized language. Additionally, the resulting rule sets are simple and easily interpretable, and are created by a highly automated process.
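For illustration, the sketch below shows decision tree induction yielding human-readable if-then rules, with the public iris dataset standing in for the clinical-note term features, which are not available here.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a shallow tree so the extracted rules stay simple and interpretable.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each root-to-leaf path reads as an if-then classification rule.
print(export_text(clf, feature_names=load_iris().feature_names))
```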
|
137 |
Multi-Temporal Crop Classification Using a Decision Tree in a Southern Ontario Agricultural Region. Melnychuk, Amie, 03 October 2012 (has links)
Identifying land-use management practices is important for detecting land-use change and impacts on the surrounding landscape. The Ontario Ministry of Agriculture and Rural Affairs has established a database product called the Agricultural Resource Inventory (AgRI), which is used for the storage and analysis of agricultural land management practices. This thesis explores the opportunity to populate the AgRI. A comparison of two supervised classifications using optical satellite imagery, with multiple single-date classifications and a subsequent multi-date, multi-sensor classification, was used to gauge the best image timing for crop classification. In this study, optical satellite images (Landsat-5 and SPOT-4/5) were input into a decision tree classifier and a Maximum Likelihood Classifier (MLC); the decision tree performed better than the MLC in overall and class accuracies. Classification experienced complications from visual differences in vegetation. The multi-date classification had an accuracy of 66.52%. The lack of imagery available at crop ripening stages greatly reduced the accuracies.
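A sketch of the multi-date idea under stated assumptions: band values from two image dates are stacked into one feature vector per pixel before training the decision tree. The dates, band counts, synthetic reflectances, and the rule generating the ground-truth labels are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_pixels = 200
june_bands = rng.uniform(size=(n_pixels, 3))    # e.g. a Landsat-5 June scene
august_bands = rng.uniform(size=(n_pixels, 3))  # e.g. a SPOT-4 August scene
X = np.hstack([june_bands, august_bands])       # multi-date feature stack

# Synthetic ground truth: crop type driven by one June band, for the demo.
y = np.where(june_bands[:, 0] > 0.5, "corn", "soy")

clf = DecisionTreeClassifier(random_state=0).fit(X[:150], y[:150])
print((clf.predict(X[150:]) == y[150:]).mean())  # held-out accuracy
```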
|
138 |
Predictive Health Monitoring for Aircraft Systems using Decision Trees. Gerdes, Mike, January 2014 (has links)
Unscheduled aircraft maintenance causes many problems and high costs for aircraft operators, because delayed or canceled flights are expensive and because spares are not always available everywhere and sometimes have to be shipped across the world. Reducing the amount of unscheduled maintenance thus offers great cost savings for aircraft operators. This thesis describes three methods for aircraft health monitoring and prediction: one method for system monitoring, one method for forecasting time series, and one method that combines the other two into a complete monitoring and prediction process. Together, the three methods allow possible failures to be forecast. The two base methods use decision trees for decision making and genetic optimization to improve the performance of the decision trees and to reduce the need for human interaction. Decision trees have the advantage that the generated code is fast and easily processed, can be altered by human experts without much work, and is readable by humans. Human readability and the ability to modify the results are especially important for incorporating expert knowledge and removing errors introduced by the automated code generation.
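A hedged sketch of the genetic-optimization idea: evolve decision-tree hyperparameters against cross-validated accuracy. The thesis optimizes the trees themselves for condition monitoring; this simplified stand-in only tunes max_depth and min_samples_leaf on synthetic sensor-like data.

```python
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def fitness(genome):
    """Cross-validated accuracy of a tree with the genome's parameters."""
    depth, leaf = genome
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                 random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

random.seed(0)
population = [(random.randint(2, 12), random.randint(1, 10)) for _ in range(8)]
for _ in range(5):  # a few generations of selection and mutation
    population.sort(key=fitness, reverse=True)
    parents = population[:4]
    children = [(max(2, d + random.choice([-1, 0, 1])),
                 max(1, l + random.choice([-1, 0, 1])))
                for d, l in parents]
    population = parents + children

best = max(population, key=fitness)
print(best, round(fitness(best), 3))
```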
|
139 |
應用資料採礦技術於電影市場研究 / Application of Data Mining Techniques to Film Market Research. 蔡依庭 (Tsai, Yi-Ting), Unknown Date (has links)
Considering the current state of the film market, the cost of film distribution is steadily rising, customer demand is complex and changeable, and the trend toward concentrated film consumption is increasingly clear. From the perspective of both film distributors and film exhibitors, interpreting customers' needs and behaviors in order to segment the market clearly, and designing different products and marketing mixes for each segment, have become urgent tasks for the film industry.
In view of this, the study applies data mining techniques, building models with four decision trees (C&RT, QUEST, CHAID, C5.0), logistic regression, and an artificial neural network. Since the CHAID decision tree outperformed the other models in overall prediction accuracy, precision, and recall for both response variables (whether a customer goes to the cinema at all, and whether a customer goes to see foreign or Taiwanese films), CHAID was adopted for both. The CHAID model for the "goes to the cinema" response variable performed better, so its results are used as the main findings.
Using the CHAID model with "goes to the cinema" as the response variable, this study identified thirteen variables that influence cinema attendance. Based on the analysis results, film market customers were divided into three groups, highest-contribution, regular-contribution, and low-contribution customers, and the characteristics of each group were summarized. The three groups showed significant differences in age, education, entertainment and culture expenditure, living area, browsing information web pages, collecting information online, watching foreign films on television, watching European and American TV series, speaking English, watching films online, economic affluence, and a "seize the day" attitude. This study therefore bases its marketing strategy recommendations mainly on the characteristics of the three contribution groups, supplemented by the characteristics of those who watch foreign or Taiwanese films.
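A minimal illustration of the chi-squared criterion at the heart of CHAID: the predictor whose cross-tabulation with the target is most significant is chosen for the split. The toy survey-like data is an assumption; a full CHAID implementation would also merge categories and recurse.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
goes_to_movies = rng.choice([0, 1], 500)
# One predictor correlated with the target, one unrelated.
age_band = np.where(goes_to_movies == 1,
                    rng.choice([0, 1], 500, p=[0.7, 0.3]),   # younger skew
                    rng.choice([0, 1], 500, p=[0.4, 0.6]))
region = rng.choice([0, 1], 500)

for name, predictor in [("age_band", age_band), ("region", region)]:
    table = np.array([[np.sum((predictor == a) & (goes_to_movies == m))
                       for m in (0, 1)] for a in (0, 1)])
    chi2, p, _, _ = chi2_contingency(table)
    print(name, "chi2=%.1f p=%.3g" % (chi2, p))
# age_band yields the smaller p-value, so CHAID would split on it first.
```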
|
140 |
Mapeamento digital de solos: Metodologias para atender a demanda por informação espacial em solos / Digital soil mapping: Methods to meet the demand for soil spatial information. Caten, Alexandre Ten, 07 November 2011 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Soil is increasingly recognized as playing an important role in ecosystems, as well as in food production and global climate regulation. For this reason, the demand for relevant and up-to-date soil information is growing. Digital Soil Mapping (DSM) provides this information at different spatial resolutions with associated quality indicators. The aim of this study was to analyze the main methodological approaches used in the DSM of soil classes, through a literature review of national (Brazilian) research, and to propose procedures for data analysis in DSM projects for soil classes. The use of DSM techniques for mapping soil classes in Brazil is recent; the first publication on the subject appeared only in 2006. Among the predictive functions, logistic regression is the predominantly used technique. Quality evaluation of the predictive models employed the error matrix and the kappa index in most cases. The wavelet transform proved to be a methodology of great potential for identifying the spatial resolution at which terrain attributes show maximum variability. The proposed methodology of excluding environmental covariate data located too near the borders of soil-class polygons enabled the generation of less complex and more accurate Decision Tree (DT) models. It was also shown that the amount of data required for DT model training is between five and 15% of the total data set. Field observations indicated a prediction accuracy close to 70% for DT models produced with those sampling densities.
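A sketch of the two proposed procedures: drop samples whose distance to a soil-polygon border falls below a buffer, then train the decision tree on a small fraction of the remaining data, within the 5-15% range reported. The synthetic covariates, the border-distance column, and the 30 m buffer are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
covariates = rng.uniform(size=(n, 4))       # e.g. terrain attributes
border_dist = rng.uniform(0, 200, n)        # metres to soil-polygon border
soil_class = (covariates[:, 0] > 0.5).astype(int)  # synthetic target

keep = border_dist > 30.0                   # exclude near-border samples
X, y = covariates[keep], soil_class[keep]

# Train on 10% of the retained samples; evaluate on the rest.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.10,
                                          random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 3))
```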
|