  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Combining Natural Language Processing and Statistical Text Mining: A Study of Specialized Versus Common Languages

Jarman, Jay 01 January 2011
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. The research draws on natural language processing (NLP) techniques that parse text and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms, such as association rule mining and decision tree induction, are used to discover classification rules for specific targets. This multi-stage pipeline approach is contrasted with traditional statistical text mining (STM) methods based on term counts and term-by-document frequencies. The aim is to create effective text analytic processes by adapting and combining individual methods. The methods are evaluated on an extensive set of real clinical notes annotated by experts to provide benchmark results. The dissertation has two main research questions. First, can information (specialized language) be extracted from clinical progress notes that represents the notes without loss of predictive information? Second, can classifiers be built for clinical progress notes that are represented by specialized language? Three experiments were conducted to answer these questions, investigating specific challenges in extracting information from unstructured clinical notes and in classifying these documents, which are so important in the medical domain. The first experiment addresses the first research question by examining whether relevant patterns within clinical notes reside more in highly technical, medically relevant terminology or in passages expressed in common language. Its results informed the subsequent experiments. The experiment also shows that predictive patterns are preserved when text documents are preprocessed with a grammatical NLP system that separates specialized language from common language, and that this preprocessing is an acceptable method of data reduction for STM.
Experiments two and three address the second research question. Experiment two applies rule-mining techniques to the output of the information extraction effort from experiment one, with the ultimate goal of creating rule-based classifiers. This experiment makes several contributions. First, it uses a novel approach to create classification rules from specialized language and to build a classifier: the data are split by class, and rules are then generated. Second, several toolkits were assembled to automate rule creation. Third, this automated process produced interpretable rules. Finally, the resulting model achieved good accuracy: its performance was slightly lower than that of the classifier from experiment one, but it had the benefit of interpretable rules. Experiment three uses decision tree induction (DTI), another rule-centric method for building a classifier, as a rule discovery approach to classification, and likewise addresses the second research question. Its contribution is to show that DTI can produce an accurate and interpretable classifier from specialized language. Additionally, the resulting rule sets are simple and easily interpretable, and they are created by a highly automated process.
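The rule-generation idea in experiment two (split the data by class, then generate rules) can be illustrated with a small class-association-rule sketch. It assumes each note has already been reduced to a set of extracted concepts; the concept names, the thresholds, and the helper `mine_class_rules` are hypothetical, and the dissertation's actual toolkits are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(docs, min_support=0.3, min_confidence=0.8, max_len=2):
    """Mine rules 'concept itemset -> class label' from (concepts, label) pairs.

    support    = fraction of all documents that contain the itemset AND have the label
    confidence = fraction of documents containing the itemset that have the label
    """
    n = len(docs)
    itemset_counts = Counter()        # documents containing each itemset
    itemset_label_counts = Counter()  # documents containing each itemset, per label
    for concepts, label in docs:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(concepts), k):
                itemset_counts[itemset] += 1
                itemset_label_counts[(itemset, label)] += 1

    rules = []
    for (itemset, label), count in itemset_label_counts.items():
        support = count / n
        confidence = count / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    # Most confident first, then most frequent; ties broken by itemset for stable output
    rules.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rules


# Hypothetical concepts extracted from five clinical notes
notes = [
    ({"dyspnea", "edema"}, "chf"),
    ({"edema", "orthopnea"}, "chf"),
    ({"edema"}, "chf"),
    ({"cough", "fever"}, "pneumonia"),
    ({"fever", "infiltrate"}, "pneumonia"),
]
for itemset, label, support, confidence in mine_class_rules(notes):
    print(f"{set(itemset)} -> {label}  support={support:.2f}  confidence={confidence:.2f}")
```

The exhaustive enumeration up to `max_len` is fine for a toy example; a real corpus would need Apriori-style pruning of infrequent itemsets.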
152

Statistical Analysis and Modeling of Breast Cancer and Lung Cancer

Cong, Chunling 05 November 2010
The objective of this study is to investigate various problems associated with breast cancer and lung cancer patients. We compare the effectiveness of breast cancer treatments using decision tree analysis and conclude that, although a certain treatment shows overall effectiveness over the others, physicians should use discretion and give different treatments to breast cancer patients based on their characteristics. Recurrence times of breast cancer patients who receive different treatments are compared overall, and histology type is also taken into consideration. To further understand the relationship between relapse time and other variables, statistical models are applied to identify the relevant attribute variables and to predict relapse time. Of equal importance, transitions between breast cancer stages are analyzed with a Markov chain, which not only gives the transition probabilities between stages for a specific treatment but also provides guidance on breast cancer treatment based on staging information. A sensitivity analysis of breast cancer doubling time examines two commonly used assumptions, a spherical tumor and exponential tumor growth, and reveals that departures from these assumptions can lead to very different statistical behavior of the doubling time. In the lung cancer study, we investigate the mortality time of lung cancer patients from several perspectives: gender, cigarettes per day, and duration of smoking. A statistical model is also used to predict the mortality time of lung cancer patients.
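Under the two assumptions named above, a sphere's volume is V = (pi/6) d^3 and exponential growth gives V2 = V1 * 2^(t/DT), so the doubling time is DT = t ln 2 / ln(V2/V1) = t ln 2 / (3 ln(d2/d1)). The sketch below computes this, plus an ellipsoid variant that relaxes the sphericity assumption, to show how sensitive the estimate is to the shape assumption. The function names and the example measurements are hypothetical, not taken from the study's data.

```python
import math


def doubling_time(d1_mm, d2_mm, interval_days):
    """Volume doubling time assuming a spherical tumor and exponential growth.

    V = (pi/6) * d**3, so the pi/6 factor cancels in V2/V1 = (d2/d1)**3,
    and V2 = V1 * 2**(t/DT) gives DT = t * ln 2 / ln(V2/V1).
    """
    if d2_mm <= d1_mm:
        raise ValueError("the tumor must grow between the two measurements")
    volume_ratio = (d2_mm / d1_mm) ** 3
    return interval_days * math.log(2) / math.log(volume_ratio)


def doubling_time_ellipsoid(axes1_mm, axes2_mm, interval_days):
    """Same estimate for an ellipsoid with three measured axes (relaxes sphericity)."""
    volume_ratio = math.prod(b / a for a, b in zip(axes1_mm, axes2_mm))
    if volume_ratio <= 1:
        raise ValueError("the tumor volume must grow between the two measurements")
    return interval_days * math.log(2) / math.log(volume_ratio)


# Hypothetical measurements 90 days apart
print(doubling_time(10, 20, 90))
print(doubling_time_ellipsoid((10, 10, 10), (20, 10, 10), 90))
```

With these made-up numbers, a tumor whose longest diameter doubles from 10 mm to 20 mm in 90 days has a doubling time of about 30 days under the spherical assumption, but if only that one axis actually doubled, the ellipsoid estimate is 90 days.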
153

Multi-Temporal Crop Classification Using a Decision Tree in a Southern Ontario Agricultural Region

Melnychuk, Amie 03 October 2012
Identifying land-use management practices is important for detecting land-use change and its impacts on the surrounding landscape. The Ontario Ministry of Agriculture and Rural Affairs has established a database product called the Agricultural Resource Inventory (AgRI), which is used to store and analyze agricultural land management practices. This thesis explores the opportunity to populate the AgRI. Two supervised classifiers were compared on optical satellite imagery, first through multiple single-date classifications and then through a multi-date, multi-sensor classification, to gauge the best image timing for crop classification. Optical satellite images (Landsat-5 and SPOT-4/5) were input to a decision tree classifier and a Maximum Likelihood Classifier (MLC); the decision tree outperformed the MLC in both overall and class accuracies. Classification was complicated by visual differences in vegetation. The multi-date classification achieved an accuracy of 66.52%. The lack of imagery available at crop ripening stages greatly reduced the accuracies.
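The "overall and class accuracies" used to compare the decision tree with the MLC are usually read off an error (confusion) matrix: overall accuracy is the diagonal over the total, producer's accuracy divides each diagonal cell by its reference total, and user's accuracy divides it by its mapped total. A minimal sketch, assuming rows hold reference classes and columns hold mapped classes (conventions vary between texts), with hypothetical crop labels and counts:

```python
def accuracy_report(matrix, labels):
    """Overall, producer's and user's accuracy from a square error matrix.

    Assumed convention: matrix[i][j] = number of reference samples of class i
    that were mapped to class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    overall = correct / total
    per_class = {}
    for i, label in enumerate(labels):
        reference_total = sum(matrix[i])            # row sum: reference samples of this class
        mapped_total = sum(row[i] for row in matrix)  # column sum: samples mapped to this class
        producers = matrix[i][i] / reference_total if reference_total else float("nan")
        users = matrix[i][i] / mapped_total if mapped_total else float("nan")
        per_class[label] = (producers, users)
    return overall, per_class


# Hypothetical validation counts for three crop classes
labels = ["corn", "soybean", "wheat"]
matrix = [
    [40, 8, 2],
    [10, 35, 5],
    [3, 4, 43],
]
overall, per_class = accuracy_report(matrix, labels)
print(f"overall accuracy: {overall:.2%}")
for label, (producers, users) in per_class.items():
    print(f"{label}: producer's {producers:.2%}, user's {users:.2%}")
```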
154

Predictive Health Monitoring for Aircraft Systems using Decision Trees

Gerdes, Mike January 2014
Unscheduled aircraft maintenance causes many problems and high costs for aircraft operators. Flights that have to be delayed or canceled are expensive, and spare parts are not always available at every location, so they sometimes have to be shipped across the world. Reducing unscheduled maintenance is therefore a major cost factor for aircraft operators. This thesis describes three methods for aircraft health monitoring and prediction: one for system monitoring, one for time-series forecasting, and one that combines the other two into a complete monitoring and prediction process. Together, the three methods allow possible failures to be forecast. The two base methods use decision trees for decision making within the processes, and genetic optimization to improve the performance of the decision trees and to reduce the need for human interaction. Decision trees have the advantages that the generated code is fast and easy to process, that human experts can modify them without much work, and that they are human-readable. Readability and modifiability of the results are especially important for incorporating expert knowledge and for removing errors produced by the automated code generation.
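To illustrate what "genetic optimization to improve the performance of the decision trees" can mean at the smallest scale, the sketch below evolves the split threshold of a one-node tree (a decision stump) with a basic genetic algorithm: elitism, blend crossover, and Gaussian mutation. The sensor feature, the data, the GA settings, and the function names are all hypothetical; the thesis's tree structure, parameter encoding, and fitness function are not described here and are not reproduced.

```python
import random


def stump_accuracy(threshold, samples):
    """Fitness: accuracy of the one-node tree 'value > threshold -> failure (1), else healthy (0)'."""
    correct = sum((value > threshold) == (label == 1) for value, label in samples)
    return correct / len(samples)


def evolve_threshold(samples, low, high, pop_size=30, generations=40, mutation_sd=0.5, seed=1):
    """Tune the stump threshold with a small genetic algorithm (elitism + blend crossover)."""
    rng = random.Random(seed)

    def fitness(threshold):
        return stump_accuracy(threshold, samples)

    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 5]      # best 20% survive unchanged
        parents = ranked[: pop_size // 2]    # the better half may reproduce
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2 + rng.gauss(0, mutation_sd)  # blend crossover, then Gaussian mutation
            children.append(min(high, max(low, child)))      # clip to the search range
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Hypothetical readings: (vibration level, 1 if a failure followed)
samples = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
print(evolve_threshold(samples, 0.0, 10.0))
```

Because the result is still a plain threshold rule, an expert can read and hand-edit it, which is the readability argument made above.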
155

應用資料採礦技術於電影市場研究 / Application of Data Mining Techniques to Film Market Research

蔡依庭, Tsai, Yi-Ting Unknown Date
就當前電影市場的現況來看,電影發行成本的節節升高,顧客需求的複雜多變,再加上電影消費集中化趨勢越趨明顯的事實,不論是從電影發行公司或是電影映演事業的角度來看,如何透過對於市場顧客需求、行為的解讀,清楚分隔市場,並為不同市場區隔設計不同的產品及行銷組合已經成了電影工業刻不容緩的課題。 有鑑於此,本研究透過應用資料採礦之技術,選用四個決策樹(C&RT、QUEST、CHAID、C5.0)、邏輯斯迴歸以及類神經網路等方式進行模型建置,由於決策樹CHAID對於「是否去電影院看外片或國片」及「是否去電影院看電影」兩種不同的目標變數,其不論是在整體預測正確率、準確度、反查率,皆是高於其他模型,故最後兩個目標變數皆選擇CHAID此一模型,而目標變數為「是否去電影院看電影」之CHAID模型表現也較好,故主要以其結果為主。 透過目標變數為「是否去電影院看電影」之CHAID模型,共獲得十三項影響「是否去電影院看電影」之相關變數,並根據分析結果,將電影市場顧客區分為最高貢獻顧客、一般貢獻顧客及低度貢獻顧客三類,將其歸納出並找出三種不同貢獻程度的顧客族群特性,而三種不同貢獻族群在「年齡」、「教育程度」、「娛樂文化支出」、「居住地區」、「是否上網瀏覽資訊網頁」、「是否上網蒐集資訊」、「是否會收看電視外片」、「是否看電視歐美影集」、「是否會說英文」、「是否上網線上觀賞影片」、「經濟富裕」、「即時行樂」均呈現顯著的差異,故本研究以不同貢獻程度族群特性為主,以看外片或國片之族群特性為輔,作為行銷策略建議之依據。 / In the current film market, film distribution costs keep rising, customer demands are complex and changeable, and film consumption is becoming increasingly concentrated. For both film distributors and film exhibitors, clearly segmenting the market by interpreting customers' needs and behavior, and designing different products and marketing mixes for each segment, has become an urgent issue for the film industry. In view of this, this study applies data mining techniques, building models with four decision trees (C&RT, QUEST, CHAID, C5.0), logistic regression, and an artificial neural network. Because the CHAID decision tree outperformed the other models in overall prediction accuracy, precision, and recall for both target variables (whether to go to the cinema, and whether to see foreign or Taiwanese films at the cinema), CHAID was adopted for both. The CHAID model for going to the cinema performed better, so its results form the main basis of the study. Using this CHAID model, the study identified thirteen variables that influence whether people go to the cinema.
Based on the analysis results, customers were divided into three groups (highest-contribution, regular-contribution, and low-contribution customers), and the characteristics of each group were identified. The three groups differ significantly in age, education, entertainment and cultural expenditure, area of residence, browsing information websites, collecting information online, watching foreign films on TV, watching Western TV series, speaking English, watching films online, affluence, and a live-for-the-moment attitude. The marketing strategy recommendations are based mainly on the characteristics of the three contribution groups, supplemented by the characteristics of the audiences for foreign and Taiwanese films.
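CHAID chooses, at each node, the predictor whose chi-square test of independence against the target is most significant. A minimal sketch of just that scoring step, assuming binary predictors and a binary target so every test has 1 degree of freedom and the p-value can be computed exactly with `math.erfc`. CHAID's category merging, Bonferroni adjustment, and multiway splits are omitted, and the predictor names and counts are hypothetical, not taken from the thesis's survey.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def pick_split(candidates):
    """Return (name, p-value) of the binary predictor most associated with a binary target.

    Each table is [[x=0 & target=0, x=0 & target=1],
                   [x=1 & target=0, x=1 & target=1]].
    """
    best_name, best_p = None, None
    for name, table in candidates.items():
        stat, dof = chi_square(table)
        if dof != 1:
            raise ValueError("this sketch only handles 2x2 tables")
        p = p_value_df1(stat)
        if best_p is None or p < best_p:
            best_name, best_p = name, p
    return best_name, best_p


# Hypothetical counts; columns = goes to the cinema (no, yes)
candidates = {
    "speaks_english": [[40, 10], [10, 40]],
    "browses_web": [[10, 20], [30, 40]],
}
print(pick_split(candidates))
```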
156

Mapeamento digital de solos: Metodologias para atender a demanda por informação espacial em solos / Digital soil mapping: Methods to meet the demand for soil spatial information

Caten, Alexandre Ten 07 November 2011
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Soil is increasingly recognized as playing an important role in ecosystems, as well as in food production and global climate regulation. For this reason, the demand for relevant and up-to-date soil information is increasing. Digital Soil Mapping (DSM) provides this information at different spatial resolutions with associated quality indicators. The aim of this study was to analyze the main methodological approaches used for DSM of soil classes through a review of the Brazilian literature, and to propose procedures for data analysis in DSM projects of soil classes. The use of DSM techniques for mapping soil classes in Brazil is recent: the first publication on the subject appeared only in 2006. Among the predictive functions, logistic regression is the predominant technique. In most cases, the quality of the predictive models was evaluated with the error matrix and the kappa index. The wavelet transform proved to be a method with great potential for analyzing the spatial resolution at which terrain attributes show maximum variability. The proposed methodology of excluding environmental covariate data located too close to the borders of soil-class polygons enabled the generation of less complex and more accurate Decision Tree (DT) models. It was also shown that the amount of data required for DT model training is between 5 and 15% of the total data set. Field observations indicated an accuracy of the predicted maps close to 70% for DT models produced at those sampling densities. / O solo é cada vez mais reconhecido como tendo um importante papel nos ecossistemas, assim como para a produção de alimentos e regulação do clima global. Por esse motivo, a demanda por informações relevantes e atualizadas em solos está em uma crescente. O Mapeamento Digital de Solos (MDS) possibilita gerar essas informações demandadas em diferentes resoluções espaciais e com indicadores de qualidade associados.
O objetivo deste estudo foi analisar as principais abordagens metodológicas utilizadas nos mapeamentos digitais de classes de solos através de uma revisão de literatura dos trabalhos nacionais, assim como propor procedimentos para a análise dos dados a serem utilizados em projetos de mapeamento digital de classes de solos. O emprego de técnicas de MDS para o mapeamento de classes de solos é recente no país, a primeira publicação nesse sentido ocorreu apenas em 2006. Entre as funções preditivas utilizadas predomina o emprego da técnica de regressões logísticas. Quanto à avaliação da qualidade dos modelos preditivos o emprego da matriz de erros e do índice kappa têm sido os procedimentos mais usuais. O emprego da transformada wavelet mostrou-se como uma metodologia de grande potencial para a análise da resolução espacial de máxima variabilidade de atributos de terreno a serem usados em projetos de MDS. A metodologia proposta de exclusão dos dados oriundos de covariáveis ambientais localizadas nas bordas dos polígonos de solos possibilitou a geração de modelos por Árvore de Decisão (AD) menos complexos e mais precisos. Assim como o volume de dados necessários para o treinamento de modelos preditivos por AD está entre cinco e 15% do conjunto total de dados como mostrou este estudo. Observações coletadas a campo indicaram uma acurácia dos mapas preditos próxima a 70% para os modelos oriundos dessas densidades de amostragem.
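The kappa index mentioned above corrects the overall agreement of an error matrix for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion correct and p_e is the sum over classes of (row total x column total) / N^2. A minimal sketch with a hypothetical two-class matrix; the function name and counts are illustrative only.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square error matrix (rows = reference, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected if predictions were independent of the reference classes
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical validation points for two soil classes
print(cohen_kappa([[20, 5], [10, 15]]))
```

Here 70% of the points agree, but half of that agreement would be expected by chance, so kappa is about 0.4.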
157

[en] A FRAMEWORK FOR GENERATING BINARY SPLITS IN DECISION TREES / [pt] UM FRAMEWORK PARA GERAÇÃO DE SPLITS BINÁRIOS EM ÁRVORES DE DECISÃO

FELIPE DE ALBUQUERQUE MELLO PEREIRA 05 December 2018
[pt] Nesta dissertação é apresentado um framework para desenvolver critérios de split para lidar com atributos nominais multi-valorados em árvores de decisão. Critérios gerados por este framework podem ser implementados para rodar em tempo polinomial no número de classes e valores, com garantia teórica de produzir um split próximo do ótimo. Apresenta-se também um estudo experimental, utilizando datasets reais, onde o tempo de execução e acurácia de métodos oriundos do framework são avaliados. / [en] In this dissertation we propose a framework for designing splitting criteria for handling multi-valued nominal attributes in decision trees. Criteria derived from our framework can be implemented to run in polynomial time in the number of classes and values, with a theoretical guarantee of producing a split that is close to the optimal one. We also present an experimental study, using real datasets, in which the running time and accuracy of the methods obtained from the framework are evaluated.
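For context on why multi-valued nominal attributes are hard: with L values there are 2^(L-1) - 1 possible binary splits. For two classes, a classic result (Breiman et al., Classification and Regression Trees, 1984) shows that an optimal split under Gini impurity can be found by sorting the values by their proportion of class 1 and testing only the L - 1 "prefix" splits. The multi-class case, which this dissertation's framework targets, has no such shortcut, so the sketch below covers only the two-class special case and is not the framework itself. A brute-force search is included to check the shortcut on a small hypothetical attribute; all names and counts are illustrative.

```python
from itertools import combinations


def gini(n0, n1):
    """Gini impurity of a node holding n0 examples of class 0 and n1 of class 1."""
    n = n0 + n1
    if n == 0:
        return 0.0
    p = n1 / n
    return 2 * p * (1 - p)


def split_impurity(counts, left_values):
    """Weighted Gini impurity of the binary split (left_values | remaining values)."""
    left = set(left_values)
    l0 = sum(c0 for v, (c0, _) in counts.items() if v in left)
    l1 = sum(c1 for v, (_, c1) in counts.items() if v in left)
    t0 = sum(c0 for c0, _ in counts.values())
    t1 = sum(c1 for _, c1 in counts.values())
    r0, r1 = t0 - l0, t1 - l1
    return ((l0 + l1) * gini(l0, l1) + (r0 + r1) * gini(r0, r1)) / (t0 + t1)


def best_binary_split(counts):
    """Two-class shortcut: order values by P(class 1) and scan the L-1 prefix splits."""
    ordered = sorted(counts, key=lambda v: counts[v][1] / sum(counts[v]))
    best = None
    for k in range(1, len(ordered)):
        left = frozenset(ordered[:k])
        impurity = split_impurity(counts, left)
        if best is None or impurity < best[1]:
            best = (left, impurity)
    return best


def brute_force_split(counts):
    """Reference answer: try every non-trivial subset (exponential in the number of values)."""
    values = sorted(counts)
    best = None
    for k in range(1, len(values)):
        for left in combinations(values, k):
            impurity = split_impurity(counts, left)
            if best is None or impurity < best[1]:
                best = (frozenset(left), impurity)
    return best


# Hypothetical attribute values: {value: (class-0 count, class-1 count)}, all counts positive
counts = {"a": (10, 0), "b": (3, 7), "c": (5, 5), "d": (0, 8), "e": (6, 2)}
print(best_binary_split(counts))
print(brute_force_split(counts))
```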
158

Riziko v investičním rozhodování / RISK IN INVESTMENT DECISIONS

GARDOŠ, Radek January 2008
The topic of this thesis is the evaluation of risk in an enterprise. The first section summarizes general knowledge of the investment process and describes methods used to analyze risk and investment efficiency. The second part evaluates the economic efficiency and risk of future investments in a particular enterprise. Projects are critical to realizing the performing organization's strategies. Every project carries some degree of risk, so it is necessary to be aware of these risks and to develop the responses needed to achieve the desired level of project success. Because project risks are multidimensional, they must be evaluated with risk evaluation methods. The aim of this part is to provide an analytical tool for evaluating project risks. The thesis first analyzes the net present value and other investment criteria of the construction project without risk factors. The project's risks are then evaluated using a risk premium. Sensitivity analysis is used to study how projected performance varies with changes in the key assumptions on which the projections are based. The main data source was the enterprise environment.
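The NPV and sensitivity steps described above can be sketched in a few lines. NPV discounts each year's cash flow, NPV = sum over t of CF_t / (1 + r)^t; here risk is priced by adding a risk premium to a risk-free rate, and the sensitivity analysis varies one monetary input at a time by plus or minus 10% while holding the others fixed. The project structure, all figures, and the helper names are hypothetical; the thesis's construction project data are not reproduced.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[t] arrives at the end of year t (t = 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def project_cash_flows(p):
    """Hypothetical project: up-front investment, then equal annual net inflows."""
    return [-p["investment"]] + [p["revenue"] - p["cost"]] * p["years"]


def sensitivity(base, rate, variable, deltas):
    """NPV when one monetary input changes by each relative delta, others held at base."""
    results = {}
    for delta in deltas:
        inputs = dict(base)
        inputs[variable] = base[variable] * (1 + delta)
        results[delta] = npv(rate, project_cash_flows(inputs))
    return results


base = {"investment": 1000.0, "revenue": 400.0, "cost": 150.0, "years": 5}
risk_free, risk_premium = 0.03, 0.05   # hypothetical rates
rate = risk_free + risk_premium        # risk-adjusted discount rate
print(round(npv(rate, project_cash_flows(base)), 2))
for variable in ("revenue", "cost"):
    table = sensitivity(base, rate, variable, [-0.1, 0.0, 0.1])
    print(variable, {d: round(v, 2) for d, v in table.items()})
```

Raising the risk premium lowers the NPV of the same cash flows, so a project that looks viable without risk factors can turn marginal once risk is priced in.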
159

Avaliação Econômica de Causas Judiciais Sob a Ótica de um Investimento / Economic Evaluation of Legal Cases from an Investment Perspective

SILVA, Jaqueline Matias da 21 August 2015
CNPq / Em geral, as decisões sobre investimentos em negócios caracterizam-se pelo tratamento de problemas não estruturados, requisitando um alto nível de conhecimento conceitual por parte do decisor. Assim como todos os gestores, os profissionais da área jurídica têm de lidar com a natureza crítica dos riscos e das incertezas no processo de tomada de decisão. A atividade advocatícia caracteriza-se, em termos econômico-financeiros, por receitas imprevisíveis, de montante e de tempo, que devem cobrar despesas e custos fixos inadiáveis. Desta forma, a tomada de decisão está condicionada à otimização do uso dos recursos, o que exige a consideração dos investimentos e benefícios envolvidos. Diante do exposto, a proposta deste trabalho é sugerir um modelo que visa à estruturação do processo de análise de investimentos e de tomada de decisão com relação ao financiamento de causas judiciais. Os métodos utilizados para estruturação do modelo foram a Árvore de Decisão e a Simulação Monte Carlo, estabelecendo um processo de análise da viabilidade econômica, que permita aos escritórios de advocacia ou prestadores de serviços judiciários a análise da variabilidade do fluxo de caixa ao longo de um processo judicial, analisar o resultado econômico do investimento através de uma distribuição de probabilidade, bem como obter uma medida de risco que auxilie o decisor na tomada de decisão.
A partir do desenvolvimento do modelo e de sua aplicação, foi possível perceber que o método é capaz de responder sobre a viabilidade econômica de causas judiciais, bem como de fornecer informações acerca dos benefícios e dos riscos de se tomar determinada decisão, tendo em vista o retorno de uma causa judicial como tendo sido subsidiado pelo prestador de serviços advocatícios. / In general, business investment decisions involve unstructured problems and require a high level of conceptual knowledge from the decision maker. Like all managers, legal professionals have to deal with the critical nature of risks and uncertainties in the decision-making process. In economic and financial terms, legal practice is characterized by revenues that are unpredictable in both amount and timing, yet must cover expenses and unavoidable fixed costs. Decision making is therefore subject to the optimal use of resources, which requires considering the investments and benefits involved. Given the above, the purpose of this work is to propose a model that structures the investment analysis and decision-making process for financing legal cases. The model was structured using decision trees and Monte Carlo simulation, establishing an economic viability analysis that allows law firms or legal service providers to analyze the variability of cash flows over the course of a lawsuit, to analyze the economic result of the investment through a probability distribution, and to obtain a risk measure that supports the decision maker. The development and application of the model showed that the method can assess the economic viability of legal cases and provide information about the benefits and risks of a given decision, considering the return on a legal case that has been financed by the legal services provider.
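A minimal sketch of combining a decision tree with Monte Carlo simulation for a single case, assuming a simple tree with one decision node (finance the case or decline) and two chance nodes (case duration and outcome). Each trial draws a path through the chance nodes and records the discounted result; the resulting distribution yields the expected value, the probability of loss, and a 5th-percentile risk measure. The probabilities, fee, costs, discount rate, and function names are hypothetical and not taken from the dissertation's model.

```python
import random


def simulate_case(n_trials, p_win, fee_if_win, monthly_cost, min_months, max_months,
                  monthly_rate, seed=0):
    """Monte Carlo NPVs of financing one lawsuit, from the law firm's point of view.

    Costs are paid at the end of each month; the fee (if the case is won)
    arrives when the case ends.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        months = rng.randint(min_months, max_months)  # chance node: duration (inclusive range)
        pv_costs = sum(monthly_cost / (1 + monthly_rate) ** m for m in range(1, months + 1))
        won = rng.random() < p_win                    # chance node: outcome of the case
        pv_fee = fee_if_win / (1 + monthly_rate) ** months if won else 0.0
        outcomes.append(pv_fee - pv_costs)
    return outcomes


def summarize(outcomes):
    """Expected NPV, probability of a loss, and 5th-percentile NPV (a simple risk measure)."""
    ordered = sorted(outcomes)
    mean = sum(ordered) / len(ordered)
    prob_loss = sum(1 for x in ordered if x < 0) / len(ordered)
    p5 = ordered[int(0.05 * len(ordered))]
    return mean, prob_loss, p5


# Hypothetical case: 60% win chance, fee 30000, 800 per month for 12 to 36 months, 1% monthly rate
npvs = simulate_case(10_000, 0.6, 30_000, 800, 12, 36, 0.01, seed=42)
mean, prob_loss, p5 = summarize(npvs)
decision = "finance the case" if mean > 0 else "decline"  # decision node: compare with NPV 0
print(f"E[NPV]={mean:.0f}  P(loss)={prob_loss:.2%}  5th pct={p5:.0f}  -> {decision}")
```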
160

Processo de descoberta de conhecimento em bases de dados para a análise e o alerta de doenças de culturas agrícolas e sua aplicação na ferrugem do cafeeiro / Process of knowledge discovery in databases for analysis and warning of crop diseases and its application on coffee rust

Meira, Carlos Alberto Alves 13 June 2008
Orientador: Luiz Henrique Antunes Rodrigues / Tese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Agrícola / Resumo: Sistemas de alerta de doenças de plantas permitem racionalizar o uso de agrotóxicos, mas são pouco utilizados na prática. Complexidade dos modelos, dificuldade de obtenção dos dados necessários e custos para o agricultor estão entre as razões que inibem o seu uso. Entretanto, o desenvolvimento tecnológico recente - estações meteorológicas automáticas, bancos de dados, monitoramento agrometeorológico na Web e técnicas avançadas de análise de dados - permite se pensar em um sistema de acesso simples e gratuito. Uma instância do processo de descoberta de conhecimento em bases de dados foi realizada com o objetivo de avaliar o uso de classificação e de indução de árvores de decisão na análise e no alerta da ferrugem do cafeeiro causada por Hemileia vastatrix. Taxas de infecção calculadas a partir de avaliações mensais de incidência da ferrugem foram agrupadas em três classes: TX1 - redução ou estagnação; TX2 - crescimento moderado (até 5 p.p.); e TX3 - crescimento acelerado (acima de 5 p.p.). Dados meteorológicos, carga pendente de frutos do cafeeiro (Coffea arabica) e espaçamento entre plantas foram as variáveis independentes. O conjunto de treinamento totalizou 364 exemplos, preparados a partir de dados coletados em lavouras de café em produção, de outubro de 1998 a outubro de 2006. Uma árvore de decisão foi desenvolvida para analisar a epidemia da ferrugem do cafeeiro.
Ela demonstrou seu potencial como modelo simbólico e interpretável, permitindo a identificação das fronteiras de decisão e da lógica contidas nos dados, auxiliando na compreensão de quais variáveis e como as interações dessas variáveis condicionaram o progresso da doença no campo. As variáveis explicativas mais importantes foram a temperatura média nos períodos de molhamento foliar, a carga pendente de frutos, a média das temperaturas máximas diárias no período de incubação e a umidade relativa do ar. Os modelos de alerta foram desenvolvidos considerando taxas de infecção binárias, segundo os limites de 5 p.p. e 10 p.p. (classe '1' para taxas maiores ou iguais ao limite; classe '0', caso contrário). Os modelos são específicos para lavouras com alta carga pendente ou para lavouras com baixa carga. Os primeiros tiveram melhor desempenho na avaliação. A estimativa de acurácia, por validação cruzada, foi de até 83%, considerando o alerta a partir de 5 p.p. Houve ainda equilíbrio entre a acurácia e medidas importantes como sensitividade, especificidade e confiabilidade positiva ou negativa. Considerando o alerta a partir de 10 p.p., a acurácia foi de 79%. Para lavouras com baixa carga pendente, os modelos considerando o alerta a partir de 5 p.p. tiveram acurácia de até 72%. Os modelos para a taxa de infecção mais elevada (a partir de 10 p.p.) tiveram desempenho fraco. Os modelos mais bem avaliados mostraram ter potencial para servir como apoio na tomada de decisão referente à adoção de medidas de controle da ferrugem do cafeeiro. O processo de descoberta de conhecimento em bases de dados foi caracterizado, com a intenção de que possa vir a ser útil em aplicações semelhantes para outras culturas agrícolas ou para a própria cultura do café, no caso de outras doenças ou pragas / Abstract: Plant disease warning systems can help reduce the use of agricultural chemicals, but they have seen limited acceptance in practice.
The complexity of the models, difficulties in obtaining the required data, and costs for growers are among the reasons that inhibit their use. However, recent technological advances (automatic weather stations, databases, Web-based agrometeorological monitoring, and advanced data analysis techniques) make it possible to develop a system with simple and free access. An instance of the knowledge discovery in databases process was carried out to evaluate the use of classification and decision tree induction in the analysis and warning of coffee rust caused by Hemileia vastatrix. Infection rates calculated from monthly assessments of rust incidence were grouped into three classes: TX1 - reduction or stagnation; TX2 - moderate growth (up to 5 pp); and TX3 - accelerated growth (above 5 pp). Meteorological data, expected yield, and plant spacing were used as independent variables. The training data set contained 364 examples prepared from data collected in coffee-growing areas between October 1998 and October 2006. A decision tree was developed to analyze the coffee rust epidemics. The decision tree demonstrated its potential as a symbolic and interpretable model. Its model representation identified the decision boundaries in the data and the logic underlying them, helping to explain which variables, and which interactions between them, drove coffee rust epidemics in the field. The most important explanatory variables were mean temperature during leaf wetness periods, expected yield, mean of maximum temperatures during the incubation period, and relative air humidity. The warning models were developed using binary infection rates based on the 5 pp and 10 pp thresholds (class '1' for rates greater than or equal to the threshold; class '0' otherwise). These models are specific to growing areas with high expected yield or to areas with low expected yield. The former performed better in the evaluation.
The accuracy estimated by cross-validation was up to 83% for the warning at 5 pp and higher. Accuracy was also balanced with other important measures such as sensitivity, specificity, and positive or negative reliability. For the warning at 10 pp and higher, the accuracy was 79%. For growing areas with low expected yield, the models for the warning at 5 pp and higher reached an accuracy of up to 72%. The models for the higher infection rate (10 pp and higher) performed poorly. The best-evaluated models showed potential to support decision making about coffee rust control. The knowledge discovery in databases process was characterized so that it can be applied to similar problems in the application domain, with other crops or with other coffee diseases or pests / Doutorado / Planejamento e Desenvolvimento Rural Sustentável / Doutor em Engenharia Agrícola
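The binary warning classes and the evaluation measures above can be made concrete as follows. This sketch labels an observed infection-rate increase as class 1 when it reaches the threshold (5 pp by default) and computes accuracy, sensitivity, specificity, and positive/negative predictive values. Treating the abstract's "positive or negative reliability" as predictive values is an assumption, all numbers are hypothetical, and the thesis's cross-validation loop is omitted.

```python
def warning_label(rate_pp, threshold_pp=5.0):
    """Binary warning class: 1 if the infection-rate increase reaches the threshold, else 0."""
    return 1 if rate_pp >= threshold_pp else 0


def _ratio(num, den):
    return num / den if den else float("nan")


def binary_metrics(actual, predicted):
    """Accuracy, sensitivity, specificity and positive/negative predictive values."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),   # warnings issued when an outbreak occurred
        "specificity": _ratio(tn, tn + fp),   # silence when no outbreak occurred
        "ppv": _ratio(tp, tp + fp),           # share of warnings that were justified
        "npv": _ratio(tn, tn + fn),           # share of non-warnings that were justified
    }


# Hypothetical monthly infection-rate changes (percentage points) and a model's warnings
observed_pp = [0.5, 6.0, 12.0, 3.0, 7.5, -1.0]
actual = [warning_label(r) for r in observed_pp]
predicted = [0, 1, 1, 1, 0, 0]
print(binary_metrics(actual, predicted))
```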
