Global ETD Search

201	Uma comparação da aplicação de métodos computacionais de classificação de dados aplicados ao consumo de cinema no Brasil / A comparison of the application of data classification computational methods to the consumption of film at theaters in Brazil
Nathalia Nieuwenhoff, 13 April 2017
Machine learning techniques for data classification or categorization are increasingly used to extract information or patterns from voluminous databases in various application areas. At the same time, applying these computational methods to identify patterns and classify data related to the consumption of information goods is considered a complex task: such consumption decisions are tied to individual preferences and depend on a combination of individual characteristics and cultural, economic, and social variables, both segregated and grouped. The topic is also little explored in the Brazilian market.
In this context, this work carried out an experimental study applying the Knowledge Discovery (KDD) process, including the data selection and data mining steps, to a binary classification problem: Brazilian individuals who do and do not consume an information good, films at movie theaters. The study used microdata from the 2008-2009 Household Budget Survey (Pesquisa de Orçamento Familiar, POF) conducted by the Brazilian Institute of Geography and Statistics (IBGE). It produced a comparative analysis of two supervised machine learning techniques for data classification, Naïve Bayes (NB) and Support Vector Machine (SVM).
Initially, a systematic review aimed at identifying studies that apply machine learning techniques to the classification and identification of consumption patterns indicated that the use of these techniques in this context is not a mature, developed research topic, since it was not addressed in any of the works reviewed. The results of the comparative analysis suggest that the choice of machine learning algorithm for data classification is directly related to factors such as: (i) the importance of each class to the problem under study; (ii) the balance between classes; and (iii) the universe of attributes to be considered, in terms of their number and their degree of importance to the classifier. In addition, the attributes selected by the Information Gain variable selection algorithm suggest that the decision to consume culture, specifically the information good of films at theaters, is strongly related to individuals' income, educational level, and preferences for cultural goods.
Keywords: Classification algorithm; Consumption; Information goods; Naïve Bayes; Pattern recognition; Support Vector Machine; SVM
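Entry 201 mentions attribute selection by Information Gain. As a rough illustration of that criterion only (not the thesis's code, data, or results), the sketch below scores two made-up binary attributes against a made-up "goes to the cinema" label. The attribute names and values are assumptions invented for the example; the point is simply that an attribute which splits the classes more cleanly receives a higher score.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (invented for illustration): 1 = yes, 0 = no.
goes_to_cinema = [1, 1, 1, 0, 0, 0, 0, 0]
high_income = [1, 1, 1, 1, 0, 0, 0, 0]
owns_pet = [1, 0, 1, 0, 1, 0, 1, 0]

ig_income = information_gain(high_income, goes_to_cinema)
ig_pet = information_gain(owns_pet, goes_to_cinema)
print(f"high_income: {ig_income:.3f} bits, owns_pet: {ig_pet:.3f} bits")
```

In this toy data the income attribute scores higher than the unrelated attribute, so a selection step that ranks attributes by this score would keep it first.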
202	Seleção entre estratégias de geração automática de dados de teste por meio de métricas estáticas de softwares orientados a objetos / Selection between whole test generation strategies by analysing object oriented software static metrics
Gustavo da Mota Ramos, 09 October 2018
Software products of varying complexity are created daily in response to complex and varied demands under tight deadlines. High quality is expected of them, yet as products grow more complex, quality may fall below an acceptable level when the time available for testing does not keep pace with that complexity. Software testing and automatic test data generation aim to deliver high-quality products through low-cost, rapid testing activities. In this context, however, developers depend on automatic test generation strategies and, above all, on selecting the technique best suited to achieving the highest possible code coverage. This matters because each test data generation technique has particularities and problems that make it better suited to certain types of software.
Given this scenario, the present work proposes selecting the appropriate technique for each class of a software system based on the class's characteristics, expressed through object-oriented software metrics, using the Naive Bayes classification algorithm. Initially, a literature review of two generation algorithms, random search and genetic search, was carried out to understand their advantages and disadvantages in both implementation and execution. The CK metrics were also studied to understand how they can best describe the characteristics of a class. This knowledge made it possible to collect, for each class, test generation data such as code coverage and generation time for each technique, together with the CK metrics, so that these data could be analysed jointly and the classification algorithm then run.
The results showed that a reduced, selected set of CK metrics is more efficient and describes the characteristics of a class better than the complete set. They also showed that the CK metrics have little or no influence on the generation time of the test data or on the random search algorithm. However, the CK metrics showed a moderate correlation with, and influence on, the selection of the genetic algorithm, and thus contributed to its selection by the Naive Bayes algorithm.
Keywords: Code coverage; Genetic algorithm; CK metrics; Naive Bayes; Software testing; Test data generation
203	Immunhistochemisch gestützte Tumordiagnostik unter besonderer Berücksichtigung von Metastasen bei unbekanntem Primärtumor [Immunohistochemistry-supported tumor diagnostics with special reference to metastases of unknown primary tumor]
Kaufmann, Olaf, 06 November 2001
Immunohistochemical studies of metastatic carcinomas of unknown primary site are cost-effective and often allow specific identification of the tumor origin, especially when the metastases are adenocarcinomas by light microscopy. However, the selection of commercially available antibodies against marker proteins with well-documented high to very high specificity for particular primary tumors is limited. Such site-specific markers include prostate-specific antigen, thyroglobulin, thyroid transcription factor-1, uroplakin III, GCDFP-15, estrogen and progesterone receptors, α-fetoprotein, the A103 monoclonal antibody against MART-1, cytokeratins 7 and 20, basal cell-type cytokeratins, p63, carcinoembryonic antigen, CA-125, EMA, vimentin, HepPar-1, and S100 protein.
Most of these markers, however, are not absolutely specific for a particular primary site. Histopathologists interpreting staining results should therefore take into account the available clinical data and the histological features of the metastatic carcinoma. These data are needed to estimate the range of candidate carcinomas and their relative a priori probabilities. The more precisely these are estimated, the more precise the diagnostically relevant predictive values that Bayes' theorem yields from the staining results of the chosen markers.
Keywords: Carcinoma of unknown primary site; immunohistochemistry; antigen retrieval; Bayes' theorem
Classification: 610 Medicine; 33 Medicine; XH 2600; ddc:610
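The last step described in entry 203, turning a priori probabilities and a staining result into predictive values, is a direct application of Bayes' theorem. The sketch below shows the arithmetic for a single positive marker result. All numbers, the candidate sites, and the function name `posterior_given_positive` are hypothetical and chosen only for illustration; none of them come from the thesis or from any real marker.

```python
def posterior_given_positive(priors, positive_rates):
    """Bayes' theorem: P(site | marker positive) for each candidate primary site.

    priors: a priori probability of each site (should sum to 1).
    positive_rates: P(marker positive | site) for each site.
    """
    joint = {site: priors[site] * positive_rates[site] for site in priors}
    evidence = sum(joint.values())  # overall probability of a positive stain
    return {site: p / evidence for site, p in joint.items()}


# Hypothetical values for an unnamed marker (invented for illustration only).
priors = {"lung": 0.5, "colon": 0.3, "breast": 0.2}
positive_rates = {"lung": 0.75, "colon": 0.05, "breast": 0.05}

posterior = posterior_given_positive(priors, positive_rates)
for site, probability in posterior.items():
    print(f"{site}: {probability:.4f}")
```

With these invented inputs a positive stain raises the probability of the first site from 0.5 to about 0.94. Changing the priors changes the predictive value even though the marker's staining rates stay the same, which is why the abstract stresses estimating the priors from the clinical and histological findings.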
204	Empirical Bayes Methods for DNA Microarray Data
Lönnstedt, Ingrid, January 2005
cDNA microarrays are one of the first high-throughput gene expression technologies to have emerged within molecular biology for the purpose of functional genomics. They compare gene expression levels between cell samples for thousands of genes simultaneously.
Microarray technology poses new challenges for data analysis, since thousands of genes are examined in parallel but with very few replicates, yielding noisy estimates of gene effects and variances. Even when careful image analysis and normalisation of the data are applied, traditional methods of inference such as Student's t or Fisher's F-statistic fail to work.
This thesis presents four papers on empirical Bayes and full Bayesian methods for two-channel microarray data (such as cDNA). They contribute to showing that empirical Bayes methods are useful for overcoming these specific data problems. The sample distributions of all the genes involved in a microarray experiment are summarized into prior distributions, which improve the inference for each single gene.
The first part of the thesis gives the biological and statistical background of cDNA microarrays, with an overview of the steps of two-channel microarray analysis: experimental design, image analysis, normalisation, cluster analysis, discrimination, and hypothesis testing. The second part consists of the four papers. Paper I presents the empirical Bayes statistic B, which corresponds to a t-statistic. Paper II is based on a version of B that is extended to linear model effects. Paper III assesses the performance of empirical Bayes models by comparison with full Bayes methods. Paper IV provides extensions of B to what corresponds to F-statistics.
Keywords: Mathematical statistics; two-channel microarrays; differential expression; replication; empirical Bayes; factorial design; interaction; time trends; hierarchical Bayes; MCMC; simulations; ANOVA; F-statistics
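The idea in entry 204 that information pooled across all genes improves inference for each single gene can be illustrated with a simple variance-shrinkage sketch. This is not the thesis's B statistic. It is a generic moderated t-like statistic in which each gene's noisy variance estimate is pulled toward a prior variance estimated from all genes. The gene names, the log-ratio values, and the prior degrees of freedom `prior_df` are assumptions made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def ordinary_t(replicates):
    """One-sample t-statistic using only this gene's own variance."""
    n = len(replicates)
    return mean(replicates) / sqrt(variance(replicates) / n)


def moderated_t(replicates, prior_var, prior_df):
    """t-like statistic whose variance is shrunk toward a pooled prior variance."""
    n = len(replicates)
    df = n - 1
    shrunk_var = (prior_df * prior_var + df * variance(replicates)) / (prior_df + df)
    return mean(replicates) / sqrt(shrunk_var / n)


# Hypothetical log-ratios for three genes, three replicates each.
genes = {
    "gene_a": [0.10, 0.11, 0.10],  # tiny effect, accidentally tiny variance
    "gene_b": [1.8, 2.4, 2.1],     # large effect
    "gene_c": [-0.4, 0.5, 0.1],    # no clear effect
}

# Prior variance estimated from all genes; prior_df is a hypothetical tuning value.
prior_var = mean([variance(values) for values in genes.values()])
prior_df = 4

for name, values in genes.items():
    print(name, round(ordinary_t(values), 2), round(moderated_t(values, prior_var, prior_df), 2))
```

With few replicates, the ordinary statistic ranks the first gene highest purely because its variance estimate happens to be very small. After shrinkage, the gene with the large effect ranks first, which is the kind of stabilisation the abstract attributes to empirical Bayes methods.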
206	Bayesian Model Selection for High-dimensional High-throughput Data
Joshi, Adarsh, May 2010
Bayesian methods are often criticized on the grounds of subjectivity, and misspecified priors can have a deleterious effect on Bayesian inference. Noting that model selection is effectively a test of many hypotheses, Dr. Valen E. Johnson sought to eliminate the need for prior specification by computing Bayes' factors from frequentist test statistics. In pioneering work published in 2005, Dr. Johnson proposed using so-called local priors for computing Bayes' factors from test statistics. Dr. Johnson and Dr. Jianhua Hu used Bayes' factors for model selection in a linear model setting. In independent work, Dr. Johnson and another colleague, David Rossell, investigated two families of non-local priors for testing the regression parameter in a linear model setting. These non-local priors enable greater separation between the theories of the null and alternative hypotheses.
In this dissertation, I extend model selection based on Bayes' factors and use non-local priors to define Bayes' factors based on test statistics. With these priors, I have been able to reduce the problem of prior specification to setting just one scaling parameter, which can easily be set, for example, on the basis of the frequentist operating characteristics of the corresponding Bayes' factors. Furthermore, the loss of information from basing a Bayes' factor on a test statistic is minimal. Along with Dr. Johnson and Dr. Hu, I used Bayes' factors based on the likelihood ratio statistic to develop a method for clustering gene expression data. This method has performed well on both simulated examples and real datasets; an outline of that work is also included in this dissertation.
Further, I extend the clustering model to a subclass of the decomposable graphical model class, which is more appropriate for genotype data sets such as single-nucleotide polymorphism (SNP) data. Efficient FORTRAN programming has enabled me to apply the methodology to hundreds of nodes. For problems that produce computationally harder probability landscapes, I propose a modification of the Markov chain Monte Carlo algorithm to extract information about the important network structures in the data. This modified algorithm performs well in inferring complex network structures. I use this method to develop a prediction model for disease based on SNP data; it performs well in cross-validation studies.
Keywords: Bayes factors; Bayes factors based on test statistics; Bayesian Graphs; MCMC; Objective Bayesian Analysis; Bayesian Model Selection; Microarray data
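207	分析失去部分訊息的貝氏更新計算方法 / Bayesian updating methods for the analysis of censored data
范靜宜, Fan, Gin-Yi, date unknown
Bayesian treatments of partially classified data, or of categorical sampling with censored data, mostly rest on the assumptions of truthful reporting and non-informative censoring. Jiang (1995) and Jiang and Dickey (2006) dropped both restrictions, derived the Bayesian solution, and used a quasi-Bayes method to approximate it; Jiang and Ko (2004) likewise used the Gibbs sampler to approximate the Bayesian solution to this class of problems. This thesis first attempts to apply the "average variance sum" estimation method proposed by Kuroda, Geng and Niki (2001) to the Bayesian solution of this problem. For small samples the Bayesian solution can be computed numerically, so a second focus of the thesis is to compare the accuracy of the three methods' estimates in small samples and to examine how the choice of prior parameters affects the estimates. The thesis further proves that, for a certain special choice of prior parameters, the average variance sum method gives the same result as the quasi-Bayes estimate, and both equal the result of the exact Bayesian calculation.
Keywords: Bayes; quasi-Bayes; Average variance sum; Gibbs sampler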
208	Kausales Denken, Bayes-Netze und die Markov-Bedingung / Causal reasoning, Bayes nets, and the Markov condition
Mayrhofer, Ralf, 11 February 2009
No description available.
Keywords: Causal reasoning; Bayes nets; Markov condition
Classification: 150 Psychology; Mathematics and Computer Science; 77.31; FAB 000: Cognitive psychology
209	Essays on the robustness of growth and poverty determinants
Tsangarides, Charalambos G., January 2003
DC, George Washington Univ., Columbian College of Arts and Sciences, Diss., Washington, 2003. Copy published by UMI, Ann Arbor, Mich. Contains 3 contributions.
210	Structural modelling of operational risk in financial institutions: application of Bayesian networks and balanced scorecards to IT infrastructure risk modelling
Starobinskaya, Irina, January 2008
Also a dissertation: Munich, University, 2008.
