Global ETD Search

61	Comparing Compound and Ordinary Diversity measures Using Decision Trees. Gangadhara, Kanthi, Reddy Dubbaka, Sai Anusha January 2011 (has links) An ensemble of classifiers improves on the accuracy of its members when the component classifiers are both diverse and accurate. Diversity is required to ensure that the classifiers make uncorrelated errors. Theoretical and experimental work from previous research shows very low correlation between ensemble accuracy and diversity measures. The compound diversity functions proposed by Albert Hung-Ren Ko and Robert Sabourin (2009), which combine the diversities and performances of individual classifiers, exhibit strong correlations between diversity and accuracy. To be consistent with existing arguments, compound diversity measures are evaluated and compared with traditional diversity measures on different problems. Evaluating the diversity of errors and comparing it with these measures are significant parts of this study. The results show that compound diversity measures are better than ordinary diversity measures. However, the results further explain the evaluation of diversity of errors on available data. / Program: Magisterutbildning i informatik machine learning ensemble diversity decision trees compound diversity classifiers data mining Engineering and Technology Teknik och teknologier
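For readers unfamiliar with "ordinary" diversity measures, the sketch below computes two widely used pairwise ones, the disagreement measure and Yule's Q statistic, from per-instance correctness vectors. It is only an illustration: the abstract does not say which ordinary measures were compared, the compound measures of Ko and Sabourin are not reproduced here, and all function names are hypothetical.

```python
from itertools import combinations

def pairwise_counts(a, b):
    """Count joint outcomes for two classifiers.

    a and b are equal-length sequences of 0/1 flags, where 1 means the
    classifier labelled that instance correctly.
    """
    n11 = sum(1 for x, y in zip(a, b) if x and y)          # both correct
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)  # both wrong
    n10 = sum(1 for x, y in zip(a, b) if x and not y)      # only a correct
    n01 = sum(1 for x, y in zip(a, b) if not x and y)      # only b correct
    return n11, n00, n10, n01

def disagreement(a, b):
    """Fraction of instances on which exactly one classifier is correct."""
    _, _, n10, n01 = pairwise_counts(a, b)
    return (n10 + n01) / len(a)

def q_statistic(a, b):
    """Yule's Q: +1 for identical error patterns, 0 for independent ones."""
    n11, n00, n10, n01 = pairwise_counts(a, b)
    denom = n11 * n00 + n01 * n10
    # Returning 0.0 for a zero denominator is a convention chosen for this sketch.
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

def mean_pairwise(measure, correctness_vectors):
    """Average a pairwise measure over all classifier pairs in an ensemble."""
    pairs = list(combinations(correctness_vectors, 2))
    return sum(measure(a, b) for a, b in pairs) / len(pairs)
```

Averaging over all pairs with `mean_pairwise` gives one diversity score per ensemble, which can then be correlated with ensemble accuracy.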
62	Performance financeira da carteira na avaliação de modelos de análise e concessão de crédito: uma abordagem baseada em aprendizagem estatística / Financial performance portfolio to evaluate and select analyses and credit models: An approach based on Statistical Learning Silva, Rodrigo Alves 05 September 2014 (has links) Credit analysis and granting models seek to relate a borrower's profile to the probability of non-payment of the obligations incurred, thereby identifying the risk associated with the borrower and helping the firm decide whether to approve or deny a credit request. This research field has gained importance both in Brazil, where credit activity has intensified with strong participation by public banks, and internationally, where concern has grown about the potential economic damage of default events. As a result, many models and methods have been built and adapted for credit risk analysis of both consumers and companies. These models are tested and compared on the basis of predictive accuracy or other statistical optimization metrics. This procedure may not be efficient from a financial standpoint, and it makes it harder for the firm to interpret the results and decide which model is best, leaving a gap between the choice of model and the firm's financial objective. Given that financial performance is one of the main indicators of any management procedure, this study aimed to fill that gap by analyzing the financial performance of credit portfolios formed by statistical learning techniques currently used for credit risk classification and analysis in Brazilian and international research. The selected techniques (discriminant analysis, logistic regression, the Bayesian networks Naïve Bayes, KDB-1 and KDB-2, SVC and SVM) were applied to the German Credit Data Set. The results were first analyzed and compared in terms of accuracy and misclassification costs. In addition, the research proposed four financial metrics (RFC, PLR, RAROC and IS) and found that the results produced by each technique varied. These results suggest that the efficiency ranking of the techniques, and hence the choice among them, changes with the metric used, which demonstrates the importance of considering these metrics when analyzing and selecting optimal classification models. Classifiers Credit risk Financial performance Statistical learning
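To illustrate why accuracy and misclassification cost can rank models differently, here is a minimal sketch of an average-cost calculation. The default 5:1 cost ratio is an assumption taken from the cost matrix commonly distributed with the German Credit data, not from the thesis; the function names are hypothetical, and the four financial metrics (RFC, PLR, RAROC, IS) are not reproduced.

```python
def accuracy(y_true, y_pred):
    """Share of borrowers whose class was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def average_misclassification_cost(y_true, y_pred,
                                   cost_bad_as_good=5.0,
                                   cost_good_as_bad=1.0):
    """Mean cost per applicant under an asymmetric cost matrix.

    The 5:1 default is an assumed ratio: approving a borrower who
    defaults is treated as five times worse than rejecting a good one.
    """
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == "bad" and pred == "good":
            total += cost_bad_as_good
        elif truth == "good" and pred == "bad":
            total += cost_good_as_bad
    return total / len(y_true)

# Two hypothetical models with the same accuracy but different costs.
y_true = ["good", "good", "bad", "bad"]
model_a = ["good", "bad", "good", "bad"]   # approves one defaulter
model_b = ["bad", "bad", "bad", "bad"]     # rejects everyone
```

Both hypothetical models are 50% accurate, yet `model_a` costs three times as much per applicant because it approves a defaulter.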
63	Automated retrieval and extraction of training course information from unstructured web pages Xhemali, Daniela January 2010 (has links) Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. Some commercial, automated web extraction software packages exist; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Several technologies were considered, and Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. 
The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations. Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
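As a rough picture of what regular-expression extraction of course attributes looks like, the sketch below applies two hand-written patterns to a line of page text. The patterns, the sample text and the function name are illustrative assumptions; in the thesis the expressions were evolved by Genetic Programming rather than written by hand.

```python
import re

# Hand-written patterns for illustration only (not GP-evolved).
PRICE_RE = re.compile(r"[£$€]\s?\d+(?:,\d{3})*(?:\.\d{2})?")
DATE_RE = re.compile(
    r"\b\d{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{4}\b"
)

def extract_course_info(text):
    """Return every price and date string found in a block of page text."""
    return {
        "prices": PRICE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

sample = "Project Management Basics, 12 March 2010, London, £1,250.00"
info = extract_course_info(sample)
```

An evolutionary search would score candidate patterns like these against labelled pages and keep the ones that extract the right fields.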
64	Instrument classifier predicates in Tianjin sign language. January 2011 (has links) He, Jia. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (p. [150-154]). / Abstracts in English and Chinese. / Acknowledgements / Abstract / 摘要 / CHAPTER ONE --- INTRODUCTION / 1.1 --- Introduction / 1.2 --- Classifiers in natural languages / 1.2.1 --- Classifier systems in spoken languages / 1.2.2 --- Classifiers in signed languages / 1.3 --- Instruments in spoken languages / 1.4 --- Objectives of the study / 1.5 --- Research questions / 1.6 --- Organization of the thesis / CHAPTER TWO --- CLASSIFIERS IN SIGNED LANGUAGES / 2.1 --- Introduction / 2.2 --- Classifier predicates in signed languages / 2.2.1 --- Classification of classifier handshape unit in classifier predicates / 2.2.2 --- Classification of movement unit in classifier predicates / 2.3 --- Previous formal analyses on classifier predicates in signed languages / 2.3.1 --- Meir's (2001) noun incorporation analysis in Israel Sign Language / 2.3.2 --- Zwitserlood's (2003) analysis of verbs of motion and location in NGT / 2.3.3 --- Benedicto and Brentari's (2004) syntactic analysis of classifier predicates in ASL / 2.3.4 --- Some previous attempts to analyze classifier predicates in HKSL / 2.4 --- Interim discussion and conclusion / CHAPTER THREE --- RESEARCH METHODOLOGY / 3.1 --- Introduction / 3.2 --- Background of Tianjin Sign Language / 3.3 --- Data collection / 3.3.1 --- Consultants / 3.3.2 --- Elicitation materials / 3.3.2.1 --- Movies: "Tweety and Sylvester" / 3.3.2.2 --- Picture stories / 3.3.2.3 --- Simple picture descriptions / 3.3.3 --- Elicitation tasks and procedures / 3.3.4 --- Transcription method / 3.4 --- Interim discussion and conclusion / CHAPTER FOUR --- RESULTS AND DATA DESCRIPTION / 4.1 --- Introduction / 4.2 --- Inventory of handshapes for instrument classifier predicates / 4.3 --- Classifier handshape and predicate types / 4.4 --- Interim discussion and conclusion / CHAPTER FIVE --- THEORETICAL BACKGROUNDS / 5.1 --- Distributed Morphology / 5.1.1 --- An overview of Distributed Morphology / 5.1.2 --- The concept of morpheme in DM / 5.1.3 --- Cyclic domain in DM / 5.1.4 --- Why reject Lexicalism? / 5.1.5 --- Interim discussion and conclusion / 5.2 --- Capturing 3-place predicates in syntax / 5.2.1 --- Larson's VP-shell analysis (1988) / 5.2.2 --- Pylkkänen's analysis (2002, 2008) / 5.2.2.1 --- Introduction of non-core arguments / 5.2.2.2 --- Applicatives in natural languages / 5.2.3 --- Interim discussion and conclusion / CHAPTER SIX --- FORMAL ANALYSIS OF INSTRUMENT CLASSIFIER PREDICATES IN TJSL / 6.1 --- Introduction / 6.2 --- Morphosyntactic properties of instrument classifier predicates in TJSL / 6.2.1 --- Two types of three-place classifier predicates in TJSL / 6.2.2 --- Handling classifier handshape and agentivity / 6.2.3 --- Signer's body encodes agentivity / 6.2.3.1 --- Grammatical function of the signer's body / 6.2.3.2 --- Test for argument status of signer's body / 6.2.4 --- Classifier handshape and noun class system / 6.2.4.1 --- Classifier handshape and gender system / 6.2.4.2 --- Instrument classifier handshapes: unifying gender system and noun classes / 6.2.4.2.1 --- Variation in the choice of classifier handshape in instrument classifier predicates in TJSL / 6.2.4.2.2 --- Classifier handshape and φ-feature specification / 6.2.4.2.3 --- Locationalization of classifier handshapes in space / 6.3 --- Structural representation of instrument classifier predicates / 6.3.1 --- Voice⁰ and volitional external argument in instrument classifier predicates / 6.3.2 --- Instrument as high applicative / 6.3.3 --- How instrument classifier predicates are derived in TJSL? / 6.4 --- Interim discussion and conclusion / CHAPTER SEVEN --- CONCLUSIONS / 7.1 --- Summary / 7.2 --- Theoretical implications / List of tables / Appendix I / Appendix II / References Chinese sign language Chinese sign language--China--Tianjin Sign language--Classifiers Sign language--Syntax
65	The role of confidence and diversity in dynamic ensemble class prediction systems Sağlam, Şenay Yaşar 01 July 2015 (has links) Classification is a data mining problem that arises in many real-world applications. A popular approach to tackle these classification problems is using an ensemble of classifiers that combines the collective knowledge of several classifiers. Most popular methods create a static ensemble, in which a single ensemble is constructed or chosen from a pool of classifiers and used for all new data instances. Two factors that have been frequently used to construct a static ensemble are the accuracy of and diversity among the individual classifiers. There have been many studies investigating how these factors should be combined and how much diversity is required to increase the ensemble's performance. These results have concluded that it is not trivial to build a static ensemble that generalizes well. Recently, a different approach has been undertaken: dynamic ensemble construction. Using a different set of classifiers for each new data instance rather than a single static ensemble of classifiers may increase performance since the dynamic ensemble is not required to generalize across the feature space. Most studies on dynamic ensembles focus on classifiers' competency in the local region in which a new data instance resides or agreement among the classifiers. In this thesis, we propose several other approaches for dynamic class prediction. Existing methods focus on assigned labels or their correctness. We hypothesize that using the class probability estimates returned by the classifiers can enhance our estimate of the competency of classifiers on the prediction. We focus on how to use class prediction probabilities (confidence) along with accuracy and diversity to create dynamic ensembles and analyze the contribution of confidence to the system. Our results show that confidence is a significant factor in the dynamic setting. 
However, it is still unclear how an accurate, diverse, and confident ensemble can best be formed to increase the prediction capability of the system. Second, we propose a system for dynamic ensemble classification based on a new distance measure to evaluate the distance between data instances. We first map data instances into a space defined by the class probability estimates from a pool of two-class classifiers. We dynamically select classifiers (features) and the k-nearest neighbors of a new instance by minimizing the distance between the neighbors and the new instance in a two-step framework. Results of our experiments show that our measure is effective for finding similar instances and that our framework helps make more accurate predictions. Classifiers' agreement in the region where a new data instance resides has been considered a major factor in dynamic ensembles. We postulate that the classifiers chosen for a dynamic ensemble should behave similarly in the region in which the new instance resides, but differently outside of this area. In other words, we hypothesize that high local accuracy, combined with high diversity in other regions, is desirable. To verify the validity of this hypothesis we propose two approaches. The first approach finds the k-nearest data instances to the new instance, which define a neighborhood, and simultaneously maximizes local accuracy and distant diversity, the latter based on data instances outside of the neighborhood. The second method considers all data instances to be in the neighborhood, and assigns them weights depending on their distance to the new instance. We demonstrate through several experiments that weighted distant diversity and weighted local accuracy outperform all benchmark methods. Classifiers Confidence Data Mining Diversity Dynamic Class Prediction Dynamic Ensembles
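The idea of choosing classifiers per instance can be sketched with the simplest dynamic-selection rule: pick the classifier with the highest accuracy on the k validation instances nearest to the new one. This is a generic local-accuracy baseline written under stated assumptions (Euclidean distance, hypothetical function names, toy data); it does not implement the confidence-based, distance-learning or distant-diversity methods proposed in the thesis.

```python
import math

def k_nearest(x, points, k):
    """Indices of the k validation points closest to x (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(x, points[i]))
    return order[:k]

def local_accuracy(correct_row, neighbor_idx):
    """Share of the neighbors that one classifier labelled correctly."""
    return sum(correct_row[i] for i in neighbor_idx) / len(neighbor_idx)

def select_classifier(x, val_points, correct, k=3):
    """Pick the classifier with the best accuracy in x's neighborhood.

    correct[c][i] is 1 if classifier c was right on validation point i.
    Returns the chosen classifier index and all local accuracies.
    """
    neighbors = k_nearest(x, val_points, k)
    scores = [local_accuracy(row, neighbors) for row in correct]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy validation set: classifier 0 is reliable near the origin,
# classifier 1 is reliable near (5, 5).
val_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
correct = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
```

A distance-weighted variant would replace the hard k-neighbor cut-off with weights over all validation instances, in the spirit of the second approach the abstract describes.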
66	Use of Random Subspace Ensembles on Gene Expression Profiles in Survival Prediction for Colon Cancer Patients Kamath, Vidya 04 November 2005 (has links) Cancer is a disease process that emerges out of a series of genetic mutations that cause seemingly uncontrolled multiplication of cells. The molecular genetics of cells indicates that different combinations of genetic events or alternative pathways in cells may lead to cancer. A study of the gene expressions of cancer cells, in combination with the external influential factors, can greatly aid in cancer management such as understanding the initiation and etiology of cancer, as well as detection, assessment and prediction of the progression of cancer. Gene expression analysis of cells yields a very large number of features that can be used to describe the condition of the cell. Feature selection methods are explored to choose the best of these features that are most relevant to the problem at hand. Random subspace ensembles created using these selected features perform poorly in predicting the 36-month survival for colon cancer patients. A modification to the random subspace scheme is proposed to enhance the accuracy of prediction. The method first applies random subspace ensembles with decision trees to select predictive features. Then, support vector machines are used to analyze the selected gene expression profiles in cancer tissue to predict the survival outcome for a patient. The proposed method is shown to achieve a weighted accuracy of 58.96%, with 40.54% sensitivity and 77.38% specificity in predicting 36-month survival for new and unknown colon cancer patients. The prediction accuracy of the method is comparable to the baseline classifiers and significantly better than random subspace ensembles on gene expression profiles of colon cancer. Microarray Bioinformatics Data mining Feature selection Classifiers American Studies Arts and Humanities
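The abstract does not define "weighted accuracy", but the reported figures are consistent with the unweighted mean of sensitivity and specificity (often called balanced accuracy). The short check below makes that arithmetic explicit; the interpretation is an inference from the numbers, not a definition taken from the thesis.

```python
def weighted_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (assumed definition)."""
    return (sensitivity + specificity) / 2

# Figures reported in the abstract, in percent.
reported_sensitivity = 40.54
reported_specificity = 77.38
result = round(weighted_accuracy(reported_sensitivity, reported_specificity), 2)
```

Under this reading, (40.54 + 77.38) / 2 reproduces the reported 58.96%.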
67	Análise e propostas de procedimentos técnicos para a elaboração de mapas de paisagem aplicados no planejamento ambiental da RMSP / Analyzes and proposals of techniques procedures for development of landscape maps applied in the environmental planning of the RMSP Souza, Waldir Wagner Campos de 07 February 2019 (has links) In Geography, a theoretical and methodological convergence has taken place between the landscape concept and Remote Sensing. Landscape is a concept that guides systemic analyses, with real and visible results, defined by different scales of interaction among its components. These explicit dimensions are categories that are also present in Remote Sensing, which uses techniques and equipment to detect objects and interpret natural and anthropic components in models and thematic maps. Landscape Ecology analyzes maps and data produced by associating land cover classes or landscape structure metrics that indicate current conditions, fragility and future scenarios, highlighting suitability for and conflicts with anthropic use. It provides an approach that integrates spatial patterns and ecological processes and is applied in conservation strategies, environmental planning and land-use planning. Maps are the basis for obtaining results, so it is essential to ensure an accuracy that avoids errors and incorrect propositions. Technical advances, the effort required to identify land cover classes manually and the reduced need for analyst intervention have made automatic classifiers of satellite imagery common. The lack of protocols and tests to guide the choice of classifiers, taking landscape attributes into account, leads to incorrect or fragile analyses. Our objective was to analyze the adequacy of the methodological procedures of Remote Sensing and Landscape Ecology on high-resolution RapidEye satellite images. We selected a landscape sample of the Metropolitan Region of São Paulo (APRMSP), performed a visual classification and applied five pixel-based classifiers, supervised and unsupervised: Mahalanobis Distance, Maximum Likelihood, Spectral Angle Mapper, Support Vector Machines and Iterative Self-Organizing Data Analysis (ISODATA). We calculated the overall accuracy of the classifications using the confusion matrix and the Kappa statistic, and the performance of the classifiers using the voting technique. The maps of the Visual Classification and of the ISODATA Classification, which obtained the highest overall accuracy (78.1%), were simplified into forest and non-forest for the calculation of metrics in the patch, class, shape and graph theory categories. In the visual classification, the units most difficult to separate were the pairs Exposed Soil and Urbanized Area, and Forest and Reforestation. In the classifiers' reference samples, the lowest separability indices were between Reforestation and Forest, Exposed Soil and Urbanized Area, and Wetland and Field; the pairs with the highest indices were Wetland and Water, Water and Field, and Water and Forest. We found that the voting technique can be used to select a classifier per focal unit, with ISODATA reaching the highest index (83.9%) for Forest. The results of the ISODATA Classification Map of the APRMSP impaired the analysis of the Patch Area, Shape Index, Number of Patches and Integral Index of Connectivity metrics. The Percentage of Habitat and the general trends indicated by grouping the map classes were not impaired. The use of these classifiers will depend on improving the overall accuracy or the hit rate, through hybrid classifications or the combination of classifiers. Image classifiers Landscape Landscape ecology Planning Remote sensing
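Overall accuracy and the Kappa statistic are both derived from the confusion matrix. The sketch below shows the standard calculation on a small made-up two-class matrix (rows are reference classes, columns are mapped classes); the numbers are illustrative and are not data from the thesis.

```python
def overall_accuracy(cm):
    """Correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohen_kappa(cm):
    """Agreement corrected for the agreement expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(cm[i]) * sum(cm[j][i] for j in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical forest / non-forest confusion matrix.
example_cm = [
    [40, 10],
    [5, 45],
]
```

For this example the overall accuracy is 0.85, while Kappa is lower (0.70) because half of the agreement would be expected by chance.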
68	Machine learning for automatic classification of remotely sensed data Milne, Linda, Computer Science & Engineering, Faculty of Engineering, UNSW January 2008 (has links) As more and more remotely sensed data becomes available, it is becoming increasingly hard to analyse it with the more traditional, labour-intensive, manual methods. The commonly used techniques that involve expert evaluation are widely acknowledged as providing inconsistent results, at best. We need more general techniques that can adapt to a given situation and that incorporate the strengths of the traditional methods, human operators and new technologies. The difficulty in interpreting remotely sensed data is that often only a small amount of data is available for classification. It can be noisy, incomplete or contain irrelevant information. Given that the training data may be limited, we demonstrate a variety of techniques for highlighting information in the available data and show how to select the most relevant information for a given classification task. We show that more consistent results between the training data and an entire image can be obtained, and how misclassification errors can be reduced. Specifically, a new technique for attribute selection in neural networks is demonstrated. Machine learning techniques, in particular, provide us with a means of automating classification using training data from a variety of data sources, including remotely sensed data and expert knowledge. A classification framework is presented in this thesis that can be used with any classifier and any available data. While this was developed in the context of vegetation mapping from remotely sensed data using machine learning classifiers, it is a general technique that can be applied to any domain. The framework is aimed primarily at domains in which the available training data is inadequate. 
contribution analysis ensemble classifiers multi-strategy classification attribute selection feature selection
69	Using Mining Techniques to Identify External Web Environment of Companies Chen, Hsaio 17 January 2006 (has links) With the rapid growth of the World Wide Web, many companies disseminate relevant information, such as introductions to their products and services, through their commercial Web sites. A company's Web site is regarded as one of its business assets. Customers, suppliers, partners, associations and other outsiders who wish to access these assets through the Web make up the company's external Web environment. From a strategic planning point of view, identifying a company's external environment helps to create business value. Therefore, this research focuses on helping a company identify its external Web environment using mining techniques. Several research works have pointed out that the hyperlink structure among Web pages can contribute to classifying the relationships within a company's external environment. We therefore propose a classifier, CNB-HI, that combines Web content mining and hyperlink structure for this purpose. We apply the proposed approach to a real case to help identify the roles of customers, partners, media, and associations. Two experiments are conducted to examine the performance. In the first experiment, we compare CNB with other forms of Naïve Bayesian classifiers, and conclude that CNB achieves better performance. However, even the performance of CNB is not satisfactory when it is based exclusively on content classification. The second experiment examines the benefit of incorporating hyperlink information (CNB-HI). The result shows that the performance of CNB-HI improves markedly, which supports the feasibility of the proposed approach for real applications. Naïve Bayesian Classifiers Web Content Classification Hyperlink Analysis External Web Environment
70	Sen Koktas, Nigar 01 January 2008 (has links) (PDF) Gait analysis is the process of collecting and analyzing quantitative information about people's walking patterns. Gait analysis enables clinicians to differentiate gait deviations objectively. Diagnostic decision making from gait data alone requires a high level of medical expertise in the neuromusculoskeletal system, trained for the purpose. An automated system is expected to reduce this requirement by means of the "transformed knowledge" of these experts. This study presents a clinical decision support system for detecting and scoring a knee disorder, namely osteoarthritis (OA). The data used for training and recognition is mainly obtained through Computerized Gait Analysis software. Sociodemographic and disease characteristics such as age, body mass index and pain level are also included in decision making. Subjects are allocated into four OA-severity categories, formed in accordance with the Kellgren-Lawrence scale: "Normal", "Mild", "Moderate", and "Severe". Different types of classifiers are combined to incorporate the different types of data and to make the best use of each classifier's strengths for better accuracy. A decision tree is developed with Multilayer Perceptrons (MLP) at the leaves. This makes it possible to use neural networks to extract hidden (i.e., implicit) knowledge from gait measurements and to feed it back into the explicit form of the decision trees for reasoning. Individual feature selection is applied using the Mahalanobis Distance measure, and the most discriminatory features are used for each expert MLP. Significant knowledge about the clinical recognition of OA is derived from the feature selection process. The final system is tested with a test set, and an average success rate of about 80% is achieved. QA Computer Software 76.75-76.765
