11

Finding, extracting and exploiting structure in text and hypertext / Att finna, extrahera och utnyttja strukturer i text och hypertext

Ågren, Ola January 2009 (has links)
Data mining is a fast-developing field of study, using computations to either predict or describe large amounts of data. The amount of data produced each year grows in step, requiring ever more efficient algorithms in order to find interesting information within a given time. In this thesis, we study methods for extracting information from semi-structured data, for finding structure within large sets of discrete data, and for efficiently ranking web pages in a topic-sensitive way. The information extraction research focuses on support for keeping both documentation and source code up to date at the same time. Our approach to this problem is to embed parts of the documentation within strategic comments of the source code and then extract them with a dedicated tool. The structures that our structure mining algorithms find among crisp data (such as keywords) are subsumptions, i.e. one keyword is a more general form of another. Since subsumptions are transitive, we can use them to build larger structures in the form of hierarchies or lattices. Our tool has been used mainly to produce input for data mining systems and to visualise data-sets. The main part of the research has been on ranking web pages in such a way that both the link structure between pages and the content of each page matter. We have created a number of algorithms and compared them to other algorithms in use today. Our focus in these comparisons has been on convergence rate, algorithm stability, and how relevant the answer sets from the algorithms are according to real-world users. The research has focused on the development of efficient algorithms for gathering and handling large data-sets of discrete and textual data. A proposed system of tools is described, all operating on a common database containing "fingerprints" and meta-data about items. This data can be searched by various algorithms to increase its usefulness or to find the real data more efficiently. All of the methods described handle data in a crisp manner, i.e. a word or a hyperlink either is or is not part of a record or web page. This means that we can model their existence in a very efficient way. The methods and algorithms that we describe all make use of this fact. / AlgExt, CHiC, ProT
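The subsumption relation described in this abstract lends itself to a compact set-based illustration. The sketch below is a hypothetical minimal example, not the thesis's actual tool: keyword A is taken to subsume keyword B when every document tagged with B is also tagged with A, and because subsumption is transitive, the resulting pairs can be assembled into a hierarchy or lattice. All keyword and document data are invented.

```python
# Minimal sketch of subsumption mining over crisp keyword data.
# Assumption: each keyword maps to the set of documents it occurs in;
# this illustrates the idea described in the abstract, not the actual tool.

docs_by_keyword = {
    "science":   {1, 2, 3, 4, 5},
    "physics":   {1, 2, 3},
    "mechanics": {1, 2},
    "biology":   {4, 5},
}

def subsumptions(index):
    """Yield (general, specific) pairs: 'general' subsumes 'specific'
    when every document containing the specific keyword also contains
    the general one (strict subset, so the pair is not symmetric)."""
    for a, docs_a in index.items():
        for b, docs_b in index.items():
            if a != b and docs_b < docs_a:   # strict subset test
                yield (a, b)

for general, specific in subsumptions(docs_by_keyword):
    print(f"{general} subsumes {specific}")
# Transitivity (science > physics > mechanics) is what lets these pairs
# be assembled into hierarchies or lattices, as the abstract notes.
```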
12

Adaptation of dosing regimen of chemotherapies based on pharmacodynamic models

Paule, Inès 29 September 2011 (has links) (PDF)
There is high variability in response to cancer chemotherapies among patients. Its sources are diverse: genetics, physiology, comorbidities, concomitant medications, environment, compliance, etc. As the therapeutic window of anticancer drugs is usually narrow, such variability may have serious consequences: severe (even life-threatening) toxicities or lack of therapeutic effect. Therefore, various approaches to individually tailor treatments and dosing regimens have been developed: a priori (based on genetic information, body size, drug elimination functions, etc.) and a posteriori (that is, using measurements of drug exposure and/or effects). Mixed-effects modelling of pharmacokinetics and pharmacodynamics (PK-PD), combined with Bayesian maximum a posteriori probability estimation of individual effects, is the method of choice for a posteriori adjustment of dosing regimens. In this thesis, a novel approach to adjusting doses on the basis of predictions given by a model for ordered categorical observations of toxicity was developed and investigated by computer simulations. More technical aspects concerning the estimation of individual parameters were analysed to determine the factors behind good performance of the method. This work was based on the example of capecitabine-induced hand-and-foot syndrome in the treatment of colorectal cancer. Moreover, a review of pharmacodynamic models for discrete data (categorical, count, time-to-event) was performed. Finally, PK-PD analyses of hydroxyurea in the treatment of sickle cell anemia were performed and used to compare different dosing regimens and determine the optimal measures for monitoring the treatment.
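Bayesian maximum a posteriori (MAP) estimation of individual effects, named in this abstract as the basis for a posteriori dose adjustment, can be sketched compactly. The example below is an illustrative assumption, not the thesis's capecitabine model: a deliberately simplified one-parameter one-compartment PK model with invented prior, data, and error values, where the individual log-clearance maximises the product of a population prior and the likelihood of that patient's observed concentrations.

```python
# Hedged sketch of Bayesian MAP estimation of an individual PK parameter.
# All model choices and numbers are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

# Population prior: log-clearance ~ Normal(log(10), 0.3^2)   (assumed)
mu_pop, omega = np.log(10.0), 0.3
# One patient's data: dose, sampling times, observed concentrations (invented)
dose, times = 100.0, np.array([1.0, 2.0, 4.0, 8.0])
y_obs = np.array([8.1, 6.5, 4.4, 2.0])
sigma = 0.5            # residual SD (assumed additive error)
V = 10.0               # volume of distribution, fixed for simplicity

def neg_log_posterior(log_cl):
    cl = np.exp(log_cl)
    pred = (dose / V) * np.exp(-(cl / V) * times)   # 1-compartment bolus
    loglik = -0.5 * np.sum(((y_obs - pred) / sigma) ** 2)
    logprior = -0.5 * ((log_cl - mu_pop) / omega) ** 2
    return -(loglik + logprior)

fit = minimize_scalar(neg_log_posterior, bounds=(np.log(1), np.log(100)),
                      method="bounded")
print("MAP individual clearance:", np.exp(fit.x))
# The MAP estimate shrinks toward the population mean when the patient's
# data are sparse or noisy -- the mechanism behind a posteriori adjustment.
```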
13

Some problems in the theory & application of graphical models

Roddam, Andrew Wilfred January 1999 (has links)
A graphical model is simply a representation of the results of an analysis of relationships between sets of variables. It can represent the dependence of one variable, or a set of variables, on another variable or set of variables, and can be extended to include variables that are intermediate to the others. This leads to the concept of representing these chains of relationships by means of a graph, where variables are represented by vertices and relationships between the variables are represented by edges. These edges can be either directed or undirected, depending upon the type of relationship being represented. The thesis investigates a number of outstanding problems in the area of statistical modelling, with particular emphasis on representing the results in terms of a graph. It studies models for multivariate discrete data and, in the case of binary responses, gives some theoretical results on the relationship between two common models. In the more general setting of multivariate discrete responses, a general class of models is studied and an approximation to the maximum likelihood estimates in these models is proposed. The thesis also addresses the problem of measurement error, investigating its effect on sample size calculations under a general measurement error specification in both linear and binary regression models. Finally, the thesis presents, in terms of a graphical model, a re-analysis of a set of childhood growth data collected in South Wales during the 1970s. Within this analysis, a new technique is proposed that allows the calculation of derived variables under the assumption that the joint relationships between the variables are constant at each of the time points.
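As a small illustration of the vertices-and-edges representation described in this abstract, a graph with both directed and undirected edges can be encoded directly as a data structure. The variable names below are invented for illustration only and do not come from the thesis's growth-data analysis.

```python
# Minimal sketch of a graphical model as vertices plus typed edges.
# Variable names are hypothetical, chosen only to echo a growth study.
from dataclasses import dataclass, field

@dataclass
class GraphicalModel:
    vertices: set = field(default_factory=set)
    directed: set = field(default_factory=set)    # (parent, child): dependence
    undirected: set = field(default_factory=set)  # frozenset {a, b}: association

    def add_directed(self, parent, child):
        self.vertices |= {parent, child}
        self.directed.add((parent, child))

    def add_undirected(self, a, b):
        self.vertices |= {a, b}
        self.undirected.add(frozenset((a, b)))

g = GraphicalModel()
g.add_directed("parental_height", "child_height")  # response depends on covariate
g.add_directed("diet", "child_height")
g.add_undirected("diet", "exercise")               # symmetric association
print(g.vertices, g.directed, g.undirected, sep="\n")
```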
14

Užití modelů diskrétních dat / Application of count data models

Reichmanová, Barbora January 2018 (has links)
When analysing data on plant growth in a row of a given length, we should consider both the probability that a seed grows successfully and the random number of seeds that were sown. Throughout the thesis we therefore analyse random sums, in which the number of independent, identically distributed summands is itself a random variable independent of them. The first part of the work covers the theoretical basis, defines the notion of a random sum, and presents its properties, such as numerical measures of location and the functional characteristics describing the distribution. Parameter estimation by maximum likelihood and generalized linear models are then discussed; the quasi-likelihood method is also briefly mentioned. This part is illustrated with examples related to the initial problem. The final chapter is devoted to an application to real data and the subsequent analysis.
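The random-sum setup in this abstract has a classical consequence worth making concrete: if N ~ Poisson(λ) seeds are sown and each grows independently with probability p, the number that grow is itself Poisson with mean λp (Poisson thinning), and in general E[S_N] = E[N]·E[X] by Wald's identity. The simulation sketch below uses invented parameter values and is only an illustration of the setup, not an analysis from the thesis.

```python
# Sketch: a random sum S = X_1 + ... + X_N where N is itself random.
# Seeds-in-a-row example from the abstract; lambda and p are invented.
import numpy as np

rng = np.random.default_rng(0)
lam, p, reps = 20.0, 0.7, 100_000

n_sown = rng.poisson(lam, size=reps)       # N: random number of seeds sown
# S | N ~ Binomial(N, p): how many of the sown seeds actually grow
n_grown = rng.binomial(n_sown, p)

print("mean of S:", n_grown.mean())        # ~ lam * p = 14, by Wald's identity
print("var  of S:", n_grown.var())         # ~ lam * p too: Poisson thinning
# Both moments match Poisson(lam * p), consistent with E[S_N] = E[N] * E[X].
```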
15

Comparações de populações discretas / Comparison of discrete populations

Watanabe, Alexandre Hiroshi 19 April 2013 (has links)
One of the main problems in hypothesis testing for the homogeneity of survival curves occurs when the failure rates (or intensity functions) are not proportional. Although the log-rank test is the most commonly used nonparametric test for comparing two or more populations subject to censored data, it has two restrictions. First, the asymptotic theory behind the log-rank test assumes that the populations involved have continuous, or at most mixed, distributions. Second, the log-rank test does not behave well when the intensity functions cross. The starting point of the analysis is to assume that the data are continuous, in which case suitable Gaussian processes can be used to test the hypothesis of homogeneity; here we cite the Renyi test and the Cramér-von Mises test for continuous data (CCVM), see Klein and Moeschberger (1997) [15]. Although these nonparametric tests give good results for continuous data, they can have problems with discrete or rounded data. In this work, we carry out a simulation study of the Cramér-von Mises (CVM) statistic proposed by Leão and Ohashi [16], which allows us to detect non-proportional (crossing) failure rates subject to arbitrary censoring for discrete or rounded data. We also propose a modification of the classical log-rank test for data arranged in a contingency table. When the statistics proposed in this work are applied to discrete or rounded data, the resulting test has a better power function than the usual tests.
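For context on the contingency-table formulation mentioned above, the classical two-sample log-rank statistic can be computed directly from tables of events and numbers at risk at each distinct (discrete) event time. The sketch below uses invented numbers and shows only the standard statistic, not the modified test proposed in the thesis.

```python
# Sketch of the classical two-sample log-rank statistic computed from a
# contingency-table layout of discrete event times. All data are invented.
import numpy as np
from scipy.stats import chi2

# Columns: time, events in group 1 (d1), at risk in group 1 (n1), d2, n2
table = np.array([
    [1, 3, 20, 1, 20],
    [2, 2, 16, 2, 18],
    [3, 2, 12, 1, 15],
    [4, 1,  8, 3, 12],
], dtype=float)

d1, n1, d2, n2 = table[:, 1], table[:, 2], table[:, 3], table[:, 4]
d, n = d1 + d2, n1 + n2

observed = d1.sum()
expected = (d * n1 / n).sum()                  # E[d1_j] under homogeneity
variance = (n1 * n2 * d * (n - d) / (n**2 * (n - 1))).sum()

stat = (observed - expected) ** 2 / variance   # ~ chi-square with 1 df
print("log-rank statistic:", stat, "p-value:", chi2.sf(stat, df=1))
# With heavy ties (discrete or rounded data) the continuous-data assumptions
# behind this approximation weaken, which motivates the modifications above.
```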
16

Pharmacometric Methods and Novel Models for Discrete Data

Plan, Elodie L January 2011 (has links)
Pharmacodynamic processes and disease progression are increasingly characterized with pharmacometric models. However, modelling options for discrete-type responses remain limited, although such response variables are commonly encountered clinical endpoints. Discrete data are generally ordinal (e.g. symptom severity), count (i.e. event frequency), or time-to-event (i.e. event occurrence). The underlying assumptions accompanying discrete data models need investigation, and possibly adaptation, in order to expand their use. Moreover, because these models are highly non-linear, estimation with linearization-based maximum likelihood methods may be biased. The aim of this thesis was to explore pharmacometric methods and novel models for discrete data through (i) investigating the benefits of treating discrete data with different modelling approaches, (ii) evaluating the performance of several estimation methods for discrete models, and (iii) developing novel models for handling complex discrete data recorded during (pre-)clinical studies. A simulation study indicated that approaches such as a truncated Poisson model and a logit-transformed continuous model were adequate for treating ordinal data ranked on a 0-10 scale. Features handling serial correlation and underdispersion were developed for the models to subsequently fit real pain scores. The performance of nine estimation methods was studied for dose-response continuous models. Other types of serially correlated count models were studied for the analysis of overdispersed data represented by the number of epilepsy seizures per day. For these types of models, the commonly used Laplace estimation method presented a bias, whereas the adaptive Gaussian quadrature method did not. Count models were also compared to repeated time-to-event models when the exact time of gastroesophageal symptom occurrence was known. Two new model structures handling repeated time-to-categorical events, i.e. events with an ordinal severity aspect, were introduced. Laplace and two expectation-maximisation estimation methods were found to perform well for frequent repeated time-to-event models. In conclusion, this thesis presents approaches, estimation methods, and diagnostics adapted for treating discrete data. Novel models and diagnostics were developed where lacking and applied to biological observations.
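Overdispersion, the property motivating the seizure-count models above, is easy to make concrete: a gamma-mixed Poisson process (marginally negative binomial) produces counts whose variance exceeds the mean, which a plain Poisson model cannot capture. The sketch below uses invented parameter values that merely echo the daily-seizure-count setting; it is not the thesis's model.

```python
# Sketch of overdispersed count data: gamma-mixed Poisson (negative binomial).
# Parameters are invented; the seizure-count framing echoes the abstract.
import numpy as np

rng = np.random.default_rng(1)
n_subjects, mean_rate, shape = 5_000, 2.0, 0.8

# Each subject draws an individual daily seizure rate from a gamma prior,
# then a Poisson count given that rate: marginally negative binomial.
rates = rng.gamma(shape, scale=mean_rate / shape, size=n_subjects)
counts = rng.poisson(rates)

print("mean:", counts.mean())   # ~ 2.0
print("var :", counts.var())    # ~ mu + mu^2/shape = 7.0 > mean: overdispersion
# A plain Poisson model forces var == mean; the between-subject mixing adds
# the extra variance that negative binomial (and related) models capture.
```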
17

Comparações de populações discretas / Comparison of discrete populations

Alexandre Hiroshi Watanabe 19 April 2013 (has links)
One of the main problems in hypothesis testing for the homogeneity of survival curves occurs when the failure rates (or intensity functions) are not proportional. Although the log-rank test is the most commonly used nonparametric test for comparing two or more populations subject to censored data, it has two restrictions. First, the asymptotic theory behind the log-rank test assumes that the populations involved have continuous, or at most mixed, distributions. Second, the log-rank test does not behave well when the intensity functions cross. The starting point of the analysis is to assume that the data are continuous, in which case suitable Gaussian processes can be used to test the hypothesis of homogeneity; here we cite the Renyi test and the Cramér-von Mises test for continuous data (CCVM), see Klein and Moeschberger (1997) [15]. Although these nonparametric tests give good results for continuous data, they can have problems with discrete or rounded data. In this work, we carry out a simulation study of the Cramér-von Mises (CVM) statistic proposed by Leão and Ohashi [16], which allows us to detect non-proportional (crossing) failure rates subject to arbitrary censoring for discrete or rounded data. We also propose a modification of the classical log-rank test for data arranged in a contingency table. When the statistics proposed in this work are applied to discrete or rounded data, the resulting test has a better power function than the usual tests.
18

Adaptation of dosing regimen of chemotherapies based on pharmacodynamic models / Adaptation de posologie de chimiothérapies basée sur des modèles pharmacodynamiques

Paule, Inès 29 September 2011 (has links)
There is high variability in response to cancer chemotherapies among patients. Its sources are diverse: genetics, physiology, comorbidities, concomitant medications, environment, compliance, etc. As the therapeutic window of anticancer drugs is usually narrow, such variability may have serious consequences: severe (even life-threatening) toxicities or lack of therapeutic effect. Therefore, various approaches to individually tailor treatments and dosing regimens have been developed: a priori (based on genetic information, body size, drug elimination functions, etc.) and a posteriori (that is, using measurements of drug exposure and/or effects). Mixed-effects modelling of pharmacokinetics and pharmacodynamics (PK-PD), combined with Bayesian maximum a posteriori probability estimation of individual effects, is the method of choice for a posteriori adjustment of dosing regimens. In this thesis, a novel approach to adjusting doses on the basis of predictions given by a model for ordered categorical observations of toxicity was developed and investigated by computer simulations. More technical aspects concerning the estimation of individual parameters were analysed to determine the factors behind good performance of the method. This work was based on the example of capecitabine-induced hand-and-foot syndrome in the treatment of colorectal cancer. Moreover, a review of pharmacodynamic models for discrete data (categorical, count, time-to-event) was performed. Finally, PK-PD analyses of hydroxyurea in the treatment of sickle cell anemia were performed and used to compare different dosing regimens and determine the optimal measures for monitoring the treatment.
