401 |
Geração automática de metadados: uma contribuição para a Web semântica. / Automatic metadata generation: a contribution to the Semantic Web. Eveline Cruz Hora Gomes Ferreira, 05 April 2006
Esta Tese oferece uma contribuição na área de Web Semântica, no âmbito da representação e indexação de documentos, definindo um Modelo de geração automática de metadados baseado em contexto, a partir de documentos textuais na língua portuguesa, em formato não estruturado (txt). Um conjunto teórico amplo de assuntos ligados à criação de ambientes digitais semânticos também é apresentado. Conforme recomendado em SemanticWeb.org, os documentos textuais aqui estudados foram automaticamente convertidos em páginas Web anotadas semanticamente, utilizando o Dublin Core como padrão para definição dos elementos de metadados, e o padrão RDF/XML para representação dos documentos e descrição dos elementos de metadados. Dentre os quinze elementos de metadados Dublin Core, nove foram gerados automaticamente pelo Modelo, e seis foram gerados de forma semi-automática. Os metadados Description e Subject foram os que necessitaram de algoritmos mais complexos, sendo obtidos através de técnicas estatísticas, de mineração de textos e de processamento de linguagem natural. A finalidade principal da avaliação do Modelo foi verificar o comportamento dos documentos convertidos para o formato RDF/XML, quando estes foram submetidos a um processo de recuperação de informação. Os elementos de metadados Description e Subject foram exaustivamente avaliados, uma vez que estes são os principais responsáveis por apreender a semântica de documentos textuais. A diversidade de contextos, a complexidade dos problemas relativos à língua portuguesa, e os novos conceitos introduzidos pelos padrões e tecnologias da Web Semântica, foram alguns dos fortes desafios enfrentados na construção do Modelo aqui proposto. Apesar de se ter utilizado técnicas não muito novas para a exploração dos conteúdos dos documentos, não se pode ignorar que os elementos inovadores introduzidos pela Web Semântica ofereceram avanços que possibilitaram a obtenção de resultados importantes nesta Tese. Como demonstrado aqui, a junção dessas técnicas com os padrões e tecnologias recomendados pela Web Semântica pode minimizar um dos maiores problemas da Web atual, e uma das fortes razões para a implementação da Web Semântica: a tendência dos mecanismos de busca de inundarem os usuários com resultados irrelevantes, por não levarem em consideração o contexto específico desejado pelo usuário. Dessa forma, é importante que se dê continuidade aos estudos e pesquisas em todas as áreas relacionadas à implementação da Web Semântica, dando abertura para que sistemas de informação mais funcionais sejam projetados. / This thesis offers a contribution to the Semantic Web area, in the scope of document representation and indexing, by defining a context-based model for automatic metadata generation from unstructured (plain-text) documents in Portuguese. A broad theoretical review of subjects related to the creation of semantic digital environments is also presented. As recommended by SemanticWeb.org, the textual documents studied here were automatically converted into semantically annotated Web pages, using Dublin Core as the standard for defining the metadata elements and RDF/XML as the standard for representing the documents and describing the metadata elements. Among the fifteen Dublin Core metadata elements, nine were generated automatically by the model and six were generated semi-automatically.
The Description and Subject elements were the ones that required the most complex algorithms, being obtained through statistical techniques, text mining and natural language processing. The main purpose of the model's evaluation was to verify the behavior of the documents converted to RDF/XML when they were submitted to an information retrieval process. The Description and Subject elements were evaluated exhaustively, since they are the ones chiefly responsible for capturing the semantics of textual documents. The diversity of contexts, the complexity of problems specific to the Portuguese language, and the new concepts introduced by Semantic Web standards and technologies were among the greatest challenges faced in building the model proposed here. Although the techniques used to explore the content of the documents are not particularly new, the innovative elements introduced by the Semantic Web offered advances that made the important results of this thesis possible. As demonstrated here, combining these techniques with the standards and technologies recommended for the Semantic Web can mitigate one of the biggest problems of the current Web, and one of the strong reasons for implementing the Semantic Web: the tendency of search engines to flood users with irrelevant results because they do not take into account the specific context desired by the user. Therefore, studies and research should continue in all areas related to the implementation of the Semantic Web, opening the way for more functional information systems to be designed.
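To make the metadata format concrete, the following minimal sketch (not part of the original thesis) shows how Dublin Core elements such as dc:subject and dc:description for a single document might be serialized as RDF/XML using Python's rdflib library; the document URI and all element values are hypothetical.

    # Illustrative sketch only (not the thesis's implementation): serializing
    # Dublin Core metadata for one document as RDF/XML with Python's rdflib.
    # The document URI and the literal values below are hypothetical examples.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    doc = URIRef("http://example.org/docs/tese-0001")  # hypothetical document URI

    g = Graph()
    g.bind("dc", DC)
    g.add((doc, DC.title, Literal("Geração automática de metadados", lang="pt")))
    g.add((doc, DC.creator, Literal("Autor Exemplo")))
    g.add((doc, DC.language, Literal("pt")))
    g.add((doc, DC.subject, Literal("Web Semântica; metadados; indexação", lang="pt")))
    g.add((doc, DC.description, Literal("Resumo gerado automaticamente do documento.", lang="pt")))

    print(g.serialize(format="xml"))  # emits RDF/XML with dc:* properties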
|
402 |
"O framework de integração do sistema DISCOVER" / The Discover integration frameworkPrati, Ronaldo Cristiano 04 April 2003 (has links)
Talvez uma das maiores capacidades do ser humano seja a sua habilidade de aprender a partir de observações e transmitir o que aprendeu para outros humanos. Durante séculos, a humanidade vem tentando compreender o mundo em que vive e, a partir desse novo conhecimento adquirido, melhorar o mundo em que vive. O desenvolvimento da tecnologia colocou a descoberta de conhecimento em um momento ímpar na história da humanidade. Com os progressos da Ciência da Computação, e, em particular, da Inteligência Artificial - IA - e Aprendizado de Máquina - AM, hoje em dia é possível, a partir de métodos de inferência indutiva e utilizando um conjunto de exemplos, descobrir algum tipo de conhecimento implícito nesses exemplos. Entretanto, por ser uma área de pesquisa relativamente nova, e por envolver um processo tanto iterativo quanto interativo, atualmente existem poucas ferramentas que suportam eficientemente a descoberta de conhecimento a partir dos dados. Essa falta de ferramentas se agrava ainda mais no que se refere ao seu uso por pesquisadores em Aprendizado de Máquina e Aquisição de Conhecimento. Esses fatores, além do fato que algumas pesquisas em nosso Laboratório de Inteligência Computacional - LABIC - têm alguns componentes em comum, motivaram a elaboração do projeto Discover, que consiste em uma estratégia de trabalho em conjunto, envolvendo um conjunto de ferramentas que se integram e interajam, e que supram as necessidades de pesquisa dos integrantes do nosso laboratório. O Discover também pode ser utilizado como um campo de prova para desenvolver novas ferramentas e testar novas idéias. Como o Discover tem como principal finalidade o seu uso e extensão por pesquisadores, uma questão principal é que a arquitetura do projeto seja flexível o suficiente para permitir que novas pesquisas sejam englobadas e, simultaneamente, deve impor determinados padrões que permitam a integração eficiente de seus componentes. Neste trabalho, é proposto um framework de integração de componentes que tem como principal objetivo possibilitar a criação de um sistema computacional a partir das ferramentas desenvolvidas para serem utilizadas no projeto Discover. Esse framework compreende um mecanismo de adaptação de interface que cria uma camada (interface horizontal) sobre essas ferramentas, um poderoso mecanismo de metadados, que é utilizado para descrever tanto os componentes que implementam as funcionalidades do sistema quanto as configurações de experimentos criadas pelo usuário, que serão executadas pelo framework, e um ambiente de execução para essas configurações de experimentos. / One of humanity's greatest capabilities is the ability to learn from observed instances of the world and to transmit what has been learned to others. For thousands of years, we have tried to understand the world and have used the acquired knowledge to improve it. Nowadays, due to progress in digital data acquisition and storage technology, as well as significant progress in the field of Artificial Intelligence (AI), particularly Machine Learning (ML), it is possible to apply inductive inference to huge databases in order to find, or discover, new knowledge in these data. The discipline concerned with this task has become known as Knowledge Discovery from Databases (KDD). However, this relatively new research area still offers few tools that can be used efficiently to acquire knowledge from data.
With this in mind, a group of researchers at the Computational Intelligence Laboratory (LABIC) is working on a system, called Discover, to support our research activities in KDD and ML. The aim of the system is to integrate the ML algorithms most used by the community with the data and knowledge processing tools developed as results of our work. The system can also be used as a workbench for new tools and ideas. Since the main concern of Discover is its use and extension by researchers, an important question is the flexibility of its architecture. The architecture should allow new tools to be easily incorporated and, at the same time, impose well-defined patterns that guarantee efficient component integration. In this work, we propose a component integration framework whose main goal is the development of an integrated computational environment using the tools already implemented in the Discover project. The framework was designed with the future integration of new tools in mind. It offers an interface adapter mechanism that creates a layer (a horizontal interface) over these tools, a powerful metadata mechanism used to describe both the components that implement the system's functionality and the experiment configurations created by the user, and an environment for executing those experiment configurations.
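The following minimal Python sketch (not the Discover project's actual code) illustrates the general pattern described above: tools wrapped behind a single horizontal interface, described by metadata, registered in a framework, and executed according to a user-defined experiment configuration. All component names and parameters are hypothetical.

    # A minimal, hypothetical sketch of the adapter-plus-metadata-registry pattern;
    # it is not the Discover implementation, and the component names are made up.
    from dataclasses import dataclass, field
    from typing import Any, Callable, Dict, List

    @dataclass
    class Component:
        name: str
        run: Callable[[Dict[str, Any]], Any]                     # horizontal interface: one entry point
        metadata: Dict[str, Any] = field(default_factory=dict)   # declared stage, inputs, outputs, etc.

    class Registry:
        def __init__(self) -> None:
            self._components: Dict[str, Component] = {}

        def register(self, component: Component) -> None:
            self._components[component.name] = component

        def execute(self, experiment: List[Dict[str, Any]]) -> List[Any]:
            """Run an experiment configuration: an ordered list of {component, params} steps."""
            results = []
            for step in experiment:
                comp = self._components[step["component"]]
                results.append(comp.run(step.get("params", {})))
            return results

    # Usage with made-up components and a made-up experiment configuration.
    registry = Registry()
    registry.register(Component("discretize", lambda p: f"discretized with {p}", {"stage": "preprocessing"}))
    registry.register(Component("induce_rules", lambda p: f"rules({p})", {"stage": "learning"}))
    print(registry.execute([{"component": "discretize", "params": {"bins": 5}},
                            {"component": "induce_rules"}]))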
|
403 |
Uma arquitetura híbrida para descoberta de conhecimento em bases de dados: teoria dos rough sets e redes neurais artificiais mapas auto-organizáveis. / A hybrid architecture for knowledge discovery in databases: rough sets theory and artificial neural network self-organizing maps. Sassi, Renato José, 28 November 2006
As bases de dados do mundo real contêm grandes volumes de dados, e entre eles escondem-se diversas relações difíceis de descobrir através de métodos tradicionais como planilhas de cálculo e relatórios informativos operacionais. Desta forma, os sistemas de descoberta de conhecimento (Knowledge Discovery in Data Bases - KDD) surgem como uma possível solução para dessas relações extrair conhecimento que possa ser aplicado na tomada de decisão em organizações. Mesmo utilizando um KDD, tal atividade pode continuar sendo extremamente difícil devido à grande quantidade de dados que deve ser processada. Assim, nem todos os dados que compõem essas bases servem para um sistema descobrir conhecimento. Em geral, costuma-se pré-processar os dados antes de serem apresentados ao KDD, buscando reduzir a sua quantidade e também selecionar os dados mais relevantes que serão utilizados pelo sistema. Este trabalho propõe o desenvolvimento, aplicação e análise de uma Arquitetura Híbrida formada pela combinação da Teoria dos Rough Sets (Teoria dos Conjuntos Aproximados) com uma arquitetura de rede neural artificial denominada Mapas Auto-Organizáveis ou Self-Organizing Maps (SOM) para descoberta de conhecimento. O objetivo é verificar o desempenho da Arquitetura Híbrida proposta na geração de clusters (agrupamentos) em bases de dados. Em particular, alguns dos experimentos significativos foram feitos para apoiar a tomada de decisão em organizações. / Real-world databases contain huge volumes of data in which many relations are hidden. These relations are difficult to discover by means of traditional methods such as spreadsheets and operational reports. Knowledge discovery in databases (KDD) systems therefore appear as a possible solution for extracting, from such relations, knowledge that can be applied to decision making in organizations. Even using a KDD system, this activity may still be extremely difficult because of the huge amount of data to be processed. Moreover, not all of the data in these databases is useful for a system to discover knowledge. In general, the data are preprocessed before being presented to a knowledge discovery system, in order to reduce their volume and to select the most relevant data to be used by the system. This research presents the development, application and analysis of a hybrid architecture that combines rough sets theory with an artificial neural network architecture known as Self-Organizing Maps (SOM) to discover knowledge. The objective is to verify the performance of the proposed hybrid architecture in generating clusters in databases. In particular, some of the significant experiments performed were aimed at supporting decision making in organizations.
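As a rough illustration of the SOM half of such a hybrid architecture, the sketch below trains a small self-organizing map with plain NumPy and assigns each record to its best-matching unit as a cluster label. It is a generic textbook-style sketch, not the thesis's implementation, and it omits the rough sets attribute reduction step; the grid size, learning rate and neighbourhood width are arbitrary illustrative values.

    # Generic SOM training sketch in NumPy (illustrative only, not the thesis's code).
    import numpy as np

    def train_som(data, grid=(10, 10), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
        rng = np.random.default_rng(seed)
        h, w = grid
        weights = rng.random((h, w, data.shape[1]))
        # Grid coordinates, used to compute neighbourhood distances on the map.
        coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
        for t in range(n_iter):
            x = data[rng.integers(len(data))]
            # Best-matching unit (BMU): the node whose weight vector is closest to x.
            bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(axis=2)), (h, w))
            lr = lr0 * np.exp(-t / n_iter)
            sigma = sigma0 * np.exp(-t / n_iter)
            dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            influence = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
            weights += lr * influence * (x - weights)   # pull the BMU's neighbourhood toward x
        return weights

    def assign_clusters(data, weights):
        flat = weights.reshape(-1, weights.shape[-1])
        return np.argmin(((data[:, None, :] - flat[None]) ** 2).sum(axis=2), axis=1)

    # Toy usage: cluster random 2-D points on a 10x10 map.
    X = np.random.default_rng(1).random((200, 2))
    clusters = assign_clusters(X, train_som(X))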
|
404 |
Integrando mineração de séries temporais e fractais para encontrar padrões e eventos extremos em bases de dados climáticas e de sensoriamento remoto / Integrating time series mining and fractals to discover patterns and extreme events in climate and remote sensing databases. Romani, Luciana Alvim Santos, 13 December 2010
Esta tese apresenta novos métodos baseados na teoria dos fractais e em técnicas de mineração de dados para dar suporte ao monitoramento agrícola em escala regional, mais especificamente áreas com plantações de cana-de-açúcar que tem um papel importante na economia brasileira como uma alternativa viável para a substituição de combustíveis fósseis. Uma vez que o clima tem um grande impacto na agricultura, os agrometeorologistas utilizam dados climáticos associados a índices agrometeorológicos e mais recentemente dados provenientes de satélites para apoiar a tomada de decisão. Neste sentido, foi proposto um método que utiliza a dimensão fractal para identificar mudanças de tendências nas séries climáticas juntamente com um módulo de análise estatística para definir quais atributos são responsáveis por essas alterações de comportamento. Além disso, foram propostos dois métodos de medidas de similaridade para auxiliar na comparação de diferentes regiões agrícolas representadas por múltiplas variáveis provenientes de dados meteorológicos e imagens de sensoriamento remoto. Diante da importância de se estudar os extremos climáticos que podem se intensificar dado os cenários que preveem mudanças globais no clima, foi proposto o algoritmo CLIPSMiner que identifica padrões relevantes e extremos em séries climáticas. CLIPSMiner também permite a identificação de correlação de múltiplas séries considerando defasagem de tempo e encontra padrões de acordo com parâmetros que podem ser calibrados pelos usuários. A busca por padrões de associação entre séries foi alcançada por meio de duas abordagens distintas. A primeira delas integrou o cálculo da correlação de dimensão fractal com uma técnica para tornar os valores contínuos das séries em intervalos discretos e um algoritmo de regras de associação gerando o método Apriori-FD. Embora tenha identificado padrões interessantes em relação a temperatura, este método não conseguiu lidar de forma apropriada com defasagem temporal. Foi proposto então o algoritmo CLEARMiner que de forma não-supervisionada minera padrões em uma série associando-os a padrões em outras séries considerando a possibilidade de defasagem temporal. Os métodos propostos foram comparados a técnicas similares e avaliados por um grupo composto por meteorologistas, agrometeorologistas e especialistas em sensoriamento remoto. Os experimentos realizados mostraram que a aplicação de técnicas de mineração de dados e fractais contribui para melhorar a análise dos dados agrometeorológicos e de satélite auxiliando no trabalho de pesquisadores, além de se configurar como uma ferramenta importante para apoiar a tomada de decisão no agronegócio. / This thesis presents new methods based on fractal theory and data mining techniques to support agricultural monitoring on a regional scale, specifically in areas planted with sugar cane, a crop that plays an important role in the Brazilian economy as a viable alternative for replacing fossil fuels. Since climate strongly influences agricultural production, researchers use climate data associated with agrometeorological indexes and, more recently, satellite data to support decision making. In this context, we proposed a method that uses the fractal dimension to identify trend changes in climate series, together with a statistical analysis module to define which attributes are responsible for the change in the behavior of the series.
Moreover, we also proposed two similarity measures to allow comparisons among different agricultural regions represented by multiple variables from meteorological data and remote sensing images. Given the importance of studying extreme weather events, which could increase in intensity, duration and frequency under the scenarios indicated by climate forecasting models, we proposed the CLIPSMiner algorithm to identify relevant patterns and extremes in climate series. CLIPSMiner also detects correlations among multiple time series considering time lags and finds patterns according to parameters that can be calibrated by the users. We applied two distinct approaches to discover association patterns in time series. The first one is the Apriori-FD method, which integrates an attribute selection algorithm based on the correlation fractal dimension, a discretization algorithm that converts the continuous values of the series into discrete intervals, and a well-known association rule algorithm (Apriori). Although Apriori-FD identified interesting patterns related to temperature, it failed to deal appropriately with time lag. As a solution, we proposed CLEARMiner, an unsupervised algorithm that mines association patterns in one time series and relates them to patterns in other series, taking possible time lags into account. The proposed methods were compared with similar techniques and assessed by a group of meteorologists and specialists in agrometeorology and remote sensing. The experiments showed that applying data mining techniques and fractal theory can improve the analysis of agrometeorological and satellite data. These techniques can aid researchers in their work and become important tools to support decision making in agribusiness.
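As a generic illustration of how a fractal, correlation-dimension-style exponent can flag behaviour changes in a series, the sketch below estimates the exponent in sliding windows and reports windows where it shifts abruptly. This is not CLIPSMiner, CLEARMiner or Apriori-FD; the window size, radii and threshold are arbitrary assumptions made only for the example.

    # Generic sketch (not the thesis's algorithms): estimate a correlation-dimension-style
    # exponent in sliding windows of a series and flag windows where it shifts abruptly.
    # Window size, radii and the 0.3 jump threshold are arbitrary illustrative choices.
    import numpy as np

    def correlation_exponent(x, radii):
        """Slope of log C(r) vs log r, where C(r) is the fraction of point pairs within distance r."""
        x = np.asarray(x, dtype=float).reshape(-1, 1)
        d = np.abs(x - x.T)[np.triu_indices(len(x), k=1)]   # pairwise distances
        c = np.array([max((d < r).mean(), 1e-12) for r in radii])
        slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
        return slope

    def trend_change_points(series, window=60, step=10, jump=0.3):
        radii = np.linspace(0.05, 0.5, 8) * (np.max(series) - np.min(series))
        starts = list(range(0, len(series) - window, step))
        dims = np.array([correlation_exponent(series[s:s + window], radii) for s in starts])
        # Flag windows whose exponent differs sharply from the previous window.
        return [starts[i] for i in range(1, len(dims)) if abs(dims[i] - dims[i - 1]) > jump]

    # Toy usage: a series whose behaviour changes halfway through.
    t = np.arange(400)
    series = np.concatenate([np.sin(t[:200] / 5.0),
                             0.5 * np.random.default_rng(0).standard_normal(200)])
    print(trend_change_points(series))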
|
405 |
A study of the impact of the "whole industry chain service provider for Chinese medicinal materials" model on the prices of Chinese medicinal materials under information asymmetry. January 2019
abstract: Traditional Chinese medicine is a treasure of Chinese civilization, and Chinese medicinal materials are at the core of its culture and industry. With the introduction of relevant national policies in recent years, the development of the Chinese medicinal materials industry has attracted wide attention. Because the industry chain for Chinese medicinal materials is long and has many levels, with information asymmetry between those levels, the market is widely plagued by counterfeit, disordered and adulterated products.
Company A's whole-industry-chain service provider model for Chinese medicinal materials integrates the main upstream dealers to build a consolidated procurement capability on its platform, and it has begun to win recognition from downstream pharmaceutical manufacturers and pharmacies, gradually establishing brand influence in the market. This dissertation empirically studies the effect of Company A's business model transformation on the market prices of Chinese medicinal materials, and then analyzes the positive role that the whole-industry-chain service provider model plays in the healthy development of the industry. The results show that a model combining upstream production with downstream sales can fully release signals of material quality, lubricate the trading market, raise purchase prices, increase market volatility and perform a price discovery function only after it reaches a certain procurement scale and exerts a measurable influence on market prices. Because information asymmetry in the Chinese medicinal materials market is high, when the production-sales integration model is still at an early stage, the quality signals it releases are not sufficient to comprehensively improve the information asymmetry. / Dissertation/Thesis / Doctoral Dissertation, Business Administration, 2019
|
406 |
THE DEVELOPMENT OF NOVEL NON-PEPTIDE PROTEASOME INHIBITORS FOR THE TREATMENT OF SOLID TUMORS. Miller, Zachary C., 01 January 2018
The proteasome is a large protein complex responsible for the majority of protein degradation in eukaryotes. Following FDA approval of the first proteasome inhibitor, bortezomib, for the treatment of multiple myeloma (MM) in 2003, there has been an increasing awareness of the significant therapeutic potential of proteasome inhibitors in the treatment of cancer. As of 2017, three proteasome inhibitors are approved for the treatment of MM, but in clinical trials with patients bearing solid tumors these existing proteasome inhibitors have demonstrated poor results. Notably, all three FDA-approved proteasome inhibitors rely on the combination of a peptide backbone and a reactive electrophilic warhead to target the proteasome, and all three primarily target the catalytic subunits conferring the proteasome's chymotrypsin-like (CT-L) activity.
It is our hypothesis that compounds with non-peptidic structures, non-covalent and reversible modes of action, and unique selectivity profiles against the proteasome’s distinct catalytic subunits could have superior pharmacodynamic and pharmacokinetic properties and may bear improved activity against solid tumors relative to existing proteasome inhibitors. In an effort to discover such compounds we have employed an approach which combines computational drug screening methods with conventional screening and classic medicinal chemistry.
Our efforts began with a computational screen performed in the lab of Dr. Chang-Guo Zhan. This virtual screen narrowed a library of over 300,000 drug-like compounds down to under 300 virtual hits which were then screened for proteasome inhibitory activity in an in vitro assay. Despite screening a relatively small pool of compounds, we were able to identify 18 active compounds. The majority of these hits were non-peptide in structure and lacked any hallmarks of covalent inhibition. The further development of one compound, a tri-substituted pyrazole, provided us with a proteasome inhibitor which demonstrated cytotoxic activity in a variety of human solid cancer cell lines as well as significant anti-tumor activity in a prostate cancer mouse xenograft model. We have also evaluated the in vitro pharmacokinetic properties of our lead compound and investigated its ability to evade cross-resistance phenomena in proteasome inhibitor-resistant cell lines.
We believe that our lead compound as well as our drug discovery approach itself will be of interest and use to other researchers. We hope that this research effort may aid in the further development of reversible non-peptide proteasome inhibitors and may eventually deliver new therapeutic options for patients with difficult-to-treat solid tumors.
|
407 |
An INNOVATIVE USE of TECHNOLOGY and ASSOCIATIVE LEARNING to ASSESS PRONE MOTOR LEARNING and DESIGN INTERVENTIONS to ENHANCE MOTOR DEVELOPMENT in INFANTS. Tripathi, Tanya, 01 January 2018
Since the introduction of the American Academy of Pediatrics Back to Sleep Campaign, infants have not met the recommendation that parents "incorporate supervised, awake 'prone play' in their infant's daily routine to support motor development and minimize the risk of plagiocephaly". Interventions are needed to increase infants' tolerance for the prone position and prone playtime, in order to reduce the risk of plagiocephaly and motor delays. Associative learning (AL) is the ability to understand causal relationships between events. Operant conditioning is a form of associative learning that occurs by associating a behavior with positive or negative consequences. Operant conditioning has been utilized to encourage behaviors such as kicking, reaching and sucking in infants by associating these behaviors with positive reinforcement. This dissertation is a compilation of three papers, each representing a study investigating potential play-based interventions to encourage prone motor skills in infants. The first paper describes a series of experiments used to develop the Prone Play Activity Center (PPAC) and the experimental protocols used in the other studies. The purpose of the second study was to determine the feasibility of a clinical trial comparing usual care (low tech) to a high-tech intervention based on the principles of operant conditioning to increase tolerance for prone and improve prone motor skills. Ten infants participated in the study, and parents of infants in the high-tech intervention group (n=5) used the PPAC for 3 weeks to practice prone play. Findings from this study suggested the proposed intervention is feasible, with some modifications, for a future large-scale clinical trial. The third study evaluated the ability of 3- to 6-month-old infants to demonstrate AL in prone and to remember the learned association a day later. Findings from this study suggested that a majority of infants demonstrated AL in prone, with poor retention of the association 24 hours later. Taken together, these three papers provide preliminary evidence that a clinical trial of the intervention is feasible and that associative learning could be used to reinforce specific prone motor behaviors in the majority of infants.
|
408 |
Comparative and integrative genomic approach toward disease gene identification: application to Bardet-Biedl Syndrome. Chiang, Annie Pei-Fen, 01 January 2006
The identification of disease genes (genes that, when mutated, cause human diseases) is an important and challenging problem. Proper diagnosis, prevention and patient care require an understanding of disease pathophysiology, which is best understood when the underlying causative gene(s) or genetic element(s) are identified. While the availability of the sequenced human genome helped lead to the discovery of more than 1,900 disease genes, disease gene discovery still proceeds at a slow pace. The use of genetic linkage methods has successfully led to the identification of numerous disease genes. However, linkage studies are ultimately restricted by the available meioses (clinical samples), which results in numerous candidate disease genes. This thesis addresses candidate gene prioritization in disease gene discovery, applied to a genetically heterogeneous disease known as Bardet-Biedl Syndrome (BBS). Specifically, it describes the integration of various kinds of functional information and the development of a novel comparative genomic approach (Computational Orthologous Prioritization, COP) that led to the identification of BBS3 and BBS11. Functional data integration and application of the COP method may be helpful in the identification of other disease genes.
|
409 |
Targeting dynamic enzymes for drug discovery efforts. Vance, Nicholas Robert, 01 August 2018
Proteins are dynamic molecules capable of performing complex biological functions necessary for life. The impact of protein dynamics in the development of medicines is often understated. Science is only now beginning to unravel the numerous consequences of protein flexibility on structure and function. This thesis will encompass two case studies in developing small molecule inhibitors targeting flexible enzymes, and provide a thorough evaluation of their inhibitory mechanisms of action.
The first case study focuses on caspases, a family of cysteine proteases responsible for executing the final steps of apoptosis. Consequently, they have been the subject of intense research due to the critical role they play in the pathogenesis of various cardiovascular and neurodegenerative diseases. A fragment-based screening campaign against human caspase-7 resulted in the identification of a novel series of allosteric inhibitors, which were characterized by numerous biophysical methods, including an X-ray co-crystal structure of an inhibitory fragment with caspase-7. The fragments described herein appear to have a significant impact on the substrate binding loop dynamics and the orientation of the catalytic Cys-His dyad, which appears to be the origin of their inhibition. This screening effort serves the dual purpose of laying the foundation for future medicinal chemistry efforts targeting caspase proteins and of probing the allosteric regulation of this interesting class of hydrolases.
The second case study focuses on glutamate racemase, another dynamic enzyme responsible for the stereoinversion of glutamate, providing the essential function of D-glutamate production for the crosslinking of peptidoglycan in all bacteria. Herein, I present a series of covalent inhibitors of an antimicrobial drug target, glutamate racemase. The application of covalent inhibitors has experienced a renaissance within drug discovery programs in the last decade. To leverage the superior potency and drug target residence time of covalent inhibitors, there have been extensive efforts to develop highly specific covalent modifications to reduce off-target liabilities. A combination of enzyme kinetics, mass spectrometry, and surface-plasmon resonance experiments details a highly specific 1,4-conjugate addition of a small molecule inhibitor with the catalytic Cys74 of glutamate racemase. Molecular dynamics simulations and quantum mechanics-molecular mechanics geometry optimizations reveal, with unprecedented detail, the chemistry of the conjugate addition. Two compounds from this series of inhibitors display antimicrobial potency comparable to β-lactam antibiotics, with significant activity against methicillin-resistant S. aureus strains. This study elucidates a detailed chemical rationale for covalent inhibition and provides a platform for the development of antimicrobials with a novel mechanism of action.
|
410 |
Regularized methods for high-dimensional and bi-level variable selection. Breheny, Patrick John, 01 July 2009
Many traditional approaches cease to be useful when the number of variables is large in comparison with the sample size. Penalized regression methods have proved to be an attractive approach, both theoretically and empirically, for dealing with these problems. This thesis focuses on the development of penalized regression methods for high-dimensional variable selection. The first part of this thesis deals with problems in which the covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. I introduce a framework for grouped penalization that encompasses the previously proposed group lasso and group bridge methods, sheds light on the behavior of grouped penalties, and motivates the proposal of a new method, group MCP.
The second part of this thesis develops fast algorithms for fitting models with complicated penalty functions such as grouped penalization methods. These algorithms combine the idea of local approximation of penalty functions with recent research into coordinate descent algorithms to produce highly efficient numerical methods for fitting models with complicated penalties. Importantly, I show these algorithms to be both stable and linear in the dimension of the feature space, allowing them to be efficiently scaled up to very large problems.
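For readers unfamiliar with this class of algorithms, the sketch below shows textbook cyclic coordinate descent for the plain lasso, where each coefficient is updated by soft-thresholding a partial-residual correlation. It illustrates the general idea only; it is not the thesis's grouped-penalty (group MCP) implementation, and the toy data are made up.

    # Textbook coordinate descent for the plain lasso (illustrative sketch only).
    import numpy as np

    def soft_threshold(z, gamma):
        return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

    def lasso_cd(X, y, lam, n_iter=200):
        n, p = X.shape
        beta = np.zeros(p)
        r = y - X @ beta                       # current residuals
        col_norms = (X ** 2).sum(axis=0)
        for _ in range(n_iter):
            for j in range(p):
                r += X[:, j] * beta[j]         # partial residual: remove feature j's current fit
                z = X[:, j] @ r / n
                beta[j] = soft_threshold(z, lam) / (col_norms[j] / n)
                r -= X[:, j] * beta[j]         # restore residual with the updated coefficient
        return beta

    # Toy usage on standardized features.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    X = (X - X.mean(0)) / X.std(0)
    y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.standard_normal(100)
    print(np.round(lasso_cd(X, y - y.mean(), lam=0.1), 3))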
In the third part of this thesis, I extend the idea of false discovery rates to penalized regression. The Karush-Kuhn-Tucker conditions describing penalized regression estimates provide testable hypotheses involving partial residuals. I use these hypotheses to connect the previously disparate fields of multiple comparisons and penalized regression, develop estimators for the false discovery rates of methods such as the lasso and elastic net, and establish theoretical results.
Finally, the methods from all three sections are studied in a number of simulations and applied to real data from gene expression and genetic association studies.
|