1 |
The graphical representation of structured multivariate data. Cottee, Michaela J. January 1996
During the past two decades or so, graphical representations have been used increasingly for the examination, summarisation and communication of statistical data. Many graphical techniques exist for exploratory data analysis (i.e. for deciding which model it is appropriate to fit to the data) and a number of graphical diagnostic techniques exist for checking the appropriateness of a fitted model. However, very few techniques exist for the representation of the fitted model itself. This thesis is concerned with the development of some new and existing graphical representation techniques for the communication and interpretation of fitted statistical models. The first part of this thesis takes the form of a general overview of the use in statistics of graphical representations for exploratory data analysis and diagnostic model checking. In relation to the concern of this thesis, particular consideration is given to the few graphical techniques which already exist for the representation of fitted models. A number of novel two-dimensional approaches are then proposed which go partway towards providing a graphical representation of the main effects and interaction terms for fitted models. This leads on to a description of conditional independence graphs, and consideration of the suitability of conditional independence graphs as a technique for the representation of fitted models. Conditional independence graphs are then developed further in accordance with the research aims. Since it becomes apparent that it is not possible to use any of the approaches taken in order to develop a simple two-dimensional pen-and-paper technique for the unambiguous graphical representation of all fitted statistical models, an interactive computer package based on the conditional independence graph approach is developed for the construction, communication and interpretation of graphical representations for fitted statistical models. This package, called the "Conditional Independence Graph Enhancer" (CIGE), does provide unambiguous graphical representations for all fitted statistical models considered.
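To illustrate the kind of display the conditional independence graph approach provides (a hedged sketch, not CIGE itself; the variable names and the fitted model's terms below are invented for the example), the following Python snippet derives a conditional independence graph from a hypothetical fitted log-linear model by joining every pair of variables that occurs together in some model term.

```python
from itertools import combinations

# hypothetical fitted log-linear model, written as a list of its terms
# (main effects and interactions); these variable names are invented
model_terms = [
    ("smoking",), ("exercise",), ("blood_pressure",), ("age",),
    ("smoking", "blood_pressure"),
    ("age", "blood_pressure"),
    ("age", "exercise"),
]

variables = sorted({v for term in model_terms for v in term})

# two variables are joined by an edge iff they occur together in some term;
# a missing edge asserts conditional independence given the remaining variables
edges = {frozenset(pair)
         for term in model_terms
         for pair in combinations(term, 2)}

print("Conditional independence graph")
for a, b in combinations(variables, 2):
    status = "joined" if frozenset((a, b)) in edges else "conditionally independent"
    print(f"  {a:14s} - {b:14s}: {status}")
```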
|
2 |
Statistical considerations in the design and analysis of cross-over trials. Morrey, Gilbert Heneage. January 1991
No description available.
|
3 |
Spatial information display subsystem. Witiuk, Sidney Wayne. January 1977
As our society becomes more and more complex, planners, politicians and other decision-makers are increasing their demands for relevant, accurate and timely statistical information on a "tailor-made" spatial basis. Government agencies responsible for the provision of statistical data in map form have found that conventional approaches to thematic mapping are not able to cope with these demands and are therefore turning to automated and semi-automated production systems. This thesis deals with the design objectives for an interactive spatial information display subsystem for Census data and reports upon efforts to integrate existing packages into this framework. Consideration is also given to the factors influencing the choice between building a special-purpose subsystem or adopting existing packages with similar or complementary objectives. Finally, a few typical results are appended to demonstrate the current level of operational capacity.
|
4 |
Análise da fidedignidade dos dados estatísticos hospitalares disponíveis na Secretaria do Estado de São Paulo em 1974 / Analysis of the reliability of hospital statistical data available at the State Department of São Paulo in 1974. Lebrao, Maria Lucia. 14 February 1977
The objective of this work was to analyse the accuracy of hospital statistical data collected by means of Bulletin 101, "Movimento de Pacientes Saídos" (the "Model 101 Report"), received by the Coordenadoria de Assistência Hospitalar of the São Paulo State Health Department (Secretaria de Estado da Saúde de São Paulo), Brazil. The "Model 101 Report" summarizes the data on discharged-patient records and is sent monthly to the Health Department by each hospital in the state of São Paulo. A one-stage systematic cluster sample of these data was drawn, restricted to general care hospitals: 31 hospitals were visited and 7,021 medical records of patients discharged in a given month were reviewed. The mean proportion of missing medical records at the sample hospitals was 19.64 per cent, reaching 40 per cent or more at some of them. The identifying information for each patient was correct in 97.08 per cent of cases (age) and 96.93 per cent of cases (area of residence). The administrative information was correct in 98.47 per cent of cases (date of admission) and 99.82 per cent of cases (discharge status); only the items of death and routine live discharge were checked. The date of discharge was in error in 7.43 per cent of cases on average, reaching 65.64 per cent and 82.51 per cent in two of the hospitals. Some factors potentially responsible for these results are analysed in the study. The number of diagnoses recorded per patient increased as the discharge summaries and medical records were examined against the "Model 101 Report", and diagnoses such as anemia, malnutrition and parasitosis appeared many times in the medical records without being transcribed to the discharge summary and, consequently, to the "Model 101 Report". In 1,404 cases (17.51 per cent) the diagnoses on the "Model 101 Report" did not agree with those noted in the medical records, and in 458 discharges (6.52 per cent) the only recorded diagnosis was an ill-defined morbid condition; by checking the medical records it was possible to reduce this figure to 174 cases (2.48 per cent). Only 69.21 per cent of the forms were entirely correct, i.e. had all items correctly and completely transcribed.
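The field-by-field concordance checking described above amounts to comparing each report entry with the corresponding medical record and tabulating the agreement rate per field. A minimal Python sketch of that comparison (the column names and toy records are invented, not the study's data) might look like this:

```python
import pandas as pd

# toy paired data (assumption): one row per discharge, report value vs. record value
report = pd.DataFrame({
    "age":       [34, 51, 7, 60, 29],
    "residence": ["SP", "SP", "Campinas", "Santos", "SP"],
    "discharge": ["alive", "alive", "alive", "death", "alive"],
})
record = pd.DataFrame({
    "age":       [34, 51, 8, 60, 29],
    "residence": ["SP", "SP", "Campinas", "Santos", "SP"],
    "discharge": ["alive", "alive", "alive", "death", "alive"],
})

# per-field concordance: share of discharges where report and record agree
for field in report.columns:
    agreement = (report[field] == record[field]).mean() * 100
    print(f"{field:10s} concordance: {agreement:5.1f} per cent")
```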
|
6 |
Lossless statistical data service over Asynchronous Transfer Mode. Van Luinen, Steven M. January 1999
Asynchronous Transfer Mode (ATM) can provide deterministic channels as required for real-time signals, as well as statistical multiplexing. For this reason, ATM has been chosen as the underlying technology for providing a Broadband Integrated Services Digital Network (B-ISDN). Two main classes of services are expected to be supported over a B-ISDN: real-time services and data services. Data services include computer communications (Local Area Network (LAN) interconnections) and general non-real-time traffic, such as file transfer and small transactions. The provision of data services over ATM is better served by statistical multiplexing, provided that the service is loss-free. For multiplexing to be loss-free and still statistical, while the maximum service rate is fixed, the multiplexer tributaries must be flow-controlled to ensure that the multiplexing buffer never overflows. Provision of a service over ATM is accomplished by an ATM-layer Transfer Capability (ATC). This thesis investigates and reports on the operating characteristics of an ATM-layer Transfer Capability proposed to the International Telecommunication Union (ITU) and called Controlled Cell Transfer (CCT). CCT uses credit-window-based flow control on links and quota-based control in switches, and gives loss-free statistical multiplexing for data. Other ITU-defined ATCs are examined with regard to data service provision and compared with CCT. It is found that only CCT can provide a fast and at the same time efficient data service. The thesis also examines the impact that support of the CCT capability would have on an ATM switch, through determination of the required functionality and mapping of the required functions into a switch design. Finally, an architecture and implementation of an ATM switch are described that would support CCT as well as the Deterministic Bit Rate (DBR) transfer capability, and would provide efficient data and real-time services.
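To make the credit-window idea behind CCT concrete, here is a small Python sketch (not from the thesis; the class name, buffer size and arrival pattern are illustrative assumptions) showing how a receiver that grants exactly one credit per free buffer slot keeps a statistically multiplexed buffer loss-free, with excess cells held back at the source instead of being dropped.

```python
import random

class CreditedLink:
    """Toy model of credit-window flow control on one multiplexer tributary.

    The receiver grants one credit per free buffer slot, and the sender may
    only transmit a cell while it holds a credit, so the shared buffer can
    never overflow no matter how bursty the tributary is.
    """

    def __init__(self, buffer_slots):
        self.buffer = []               # cells waiting to be served
        self.capacity = buffer_slots   # fixed buffer size at the multiplexer
        self.credits = buffer_slots    # initial credit window = free slots

    def try_send(self, cell):
        """Sender side: transmit only if a credit is available."""
        if self.credits == 0:
            return False               # back-pressure: cell stays at the source
        self.credits -= 1
        self.buffer.append(cell)
        assert len(self.buffer) <= self.capacity   # loss-free by construction
        return True

    def serve_one(self):
        """Server side: forward one cell and return its credit to the sender."""
        if self.buffer:
            self.credits += 1          # freed slot becomes a new credit
            return self.buffer.pop(0)
        return None

if __name__ == "__main__":
    link = CreditedLink(buffer_slots=8)
    forwarded = held_back = 0
    for t in range(1000):
        for _ in range(random.randint(0, 3)):      # bursty arrivals
            if link.try_send(f"cell-{t}"):
                forwarded += 1
            else:
                held_back += 1                     # delayed, not lost
        if random.random() < 0.7:                  # fixed-rate server drains the buffer
            link.serve_one()
    print(f"cells accepted into the buffer: {forwarded}, cells held back: {held_back}")
```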
|
7 |
Robust estimation of inter-chip variability to improve microarray sample size calculations. Knowlton, Nicholas Scott. January 2005
Thesis, University of Oklahoma. Bibliography: leaves 82-83.
|
8 |
Oblique decision trees in transformed spaces. Wickramarachchi, Darshana Chitraka. January 2015
Decision trees (DTs) play a vital role in statistical modelling. Simplicity and interpretability of the solution structure have made the method popular in a wide range of disciplines. In data classification problems, DTs recursively partition the feature space into disjoint sub-regions until each sub-region becomes homogeneous with respect to a particular class. Axis-parallel splits, the simplest form of splits, partition the feature space parallel to the feature axes. However, for some problem domains DTs with axis-parallel splits can produce complicated boundary structures. As an alternative, oblique splits are used to partition the feature space, potentially simplifying the boundary structure. Various approaches have been explored to find optimal oblique splits. One approach is based on optimisation techniques; this is considered the benchmark approach, but its major limitation is that the tree induction algorithm is computationally expensive. On the other hand, split-finding approaches based on heuristic arguments have gained popularity and have made improvements on benchmark methods. This thesis proposes a methodology to induce oblique decision trees in transformed spaces based on a heuristic argument.
As the first goal of the thesis, a new oblique decision tree algorithm, called HHCART (HouseHolder Classification and Regression Tree), is proposed. The proposed algorithm utilises a series of Householder matrices to reflect the training data at each non-terminal node during tree construction. Householder matrices are constructed using the eigenvectors of each class's covariance matrix. Axis-parallel splits in the reflected (or transformed) spaces provide an efficient way of finding oblique splits in the original space. Experimental results show that the accuracy and size of HHCART trees are comparable with some benchmark methods in the literature. The appealing features of HHCART are that it can handle both qualitative and quantitative features in the same oblique split, and that it is conceptually simple and computationally efficient.
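As a rough illustration of the reflection step described above (a minimal sketch under simplifying assumptions, not the thesis implementation; the two-class toy data, the use of the dominant eigenvector of one class, and the Gini-based split search are choices made for the example), the following Python code builds a Householder matrix from a class covariance eigenvector and compares axis-parallel split searches in the original and reflected spaces.

```python
import numpy as np

def householder_matrix(d):
    """Householder matrix H that reflects the unit vector d onto the first axis e1.

    H = I - 2 u u^T / (u^T u) with u = d - e1.  H is orthogonal and symmetric,
    so an axis-parallel split on the reflected data X @ H corresponds to an
    oblique split in the original space.
    """
    d = d / np.linalg.norm(d)
    e1 = np.zeros_like(d)
    e1[0] = 1.0
    u = d - e1
    if np.allclose(u, 0):
        return np.eye(len(d))          # d already lies along e1
    return np.eye(len(d)) - 2.0 * np.outer(u, u) / (u @ u)

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_axis_parallel_split(X, y):
    """Exhaustive axis-parallel split search minimising weighted Gini impurity."""
    best = (np.inf, None, None)        # (impurity, feature index, threshold)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, j, t)
    return best

# toy two-class data (assumption): classes separated by an oblique boundary
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 2)) @ np.array([[1.0, 0.8], [0.0, 0.6]])
X1 = X0 + np.array([1.5, 1.5])
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# reflect using the dominant eigenvector of class 0's covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(X0, rowvar=False))
H = householder_matrix(eigvecs[:, -1])
print("best split, original space :", best_axis_parallel_split(X, y))
print("best split, reflected space:", best_axis_parallel_split(X @ H, y))
```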
Data mining applications often come with massive example sets, and inducing oblique DTs for such example sets can consume considerable time. HHCART is a serial, memory-resident algorithm, which may be ineffective when handling massive example sets. As the second goal of the thesis, parallel-computing and disk-resident versions of the HHCART algorithm are presented so that HHCART can be used irrespective of the size of the problem.
HHCART is a flexible algorithm and the eigenvectors defining Householder matrices can be replaced by other vectors deemed effective in oblique split finding. The third endeavour of this thesis explores this aspect of HHCART. HHCART can be used with other vectors in order to improve classification results. For example, a normal vector of the angular bisector, introduced in the Geometric Decision Tree (GDT) algorithm, is used to construct the Householder reflection matrix. The proposed method produces better results than GDT for some problem domains. In the second case, Class Representative Vectors are introduced and used to construct Householder reflection matrices. The results of this experiment show that these oblique trees produce classification results competitive with those achieved with some benchmark decision trees.
DTs are constructed using two approaches, namely: top-down and bottom-up. HHCART is a top-down tree, which is the most common approach. As the fourth idea of the thesis, the concept of HHCART is used to induce a new DT, HHBUT, using the bottom-up approach. The bottom-up approach performs cluster analysis prior to the tree building to identify the terminal nodes. The use of the Bayesian Information Criterion (BIC) to determine the number of clusters leads to accurate and compact trees when compared with Cross Validation (CV) based bottom-up trees. We suggest that HHBUT is a good alternative to the existing bottom-up tree especially when the number of examples is much higher than the number of features.
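The BIC-based choice of the number of clusters can be sketched as follows (an illustration using scikit-learn Gaussian mixtures rather than the thesis code; the toy data and the candidate range of cluster counts are assumptions), with the selected component count playing the role of the number of terminal nodes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# toy data (assumption): three Gaussian clusters in two dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(150, 2))
               for c in ([0, 0], [3, 0], [1.5, 2.5])])

# fit mixtures with an increasing number of components and keep the BIC-minimising one
bics = []
for k in range(1, 9):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append((gmm.bic(X), k))
best_bic, best_k = min(bics)
print(f"BIC selects k = {best_k} clusters (BIC = {best_bic:.1f})")
```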
|
9 |
A systems approach to computational protein identification. Ramakrishnan, Smriti Rajan. 21 October 2010
Proteomics is the science of understanding the dynamic protein content of an organism's cells (its proteome), which is one of the largest current challenges in biology. Computational proteomics is an active research area that involves in-silico methods for the analysis of high-throughput protein identification data. Current methods are based on a technology called tandem mass spectrometry (MS/MS) and suffer from low coverage and accuracy, reliably identifying only 20-40% of the
proteome. This dissertation addresses recall, precision, speed and scalability of computational proteomics experiments.
This research goes beyond the traditional paradigm of analyzing MS/MS experiments in isolation, instead learning priors of protein presence from the joint analysis of various systems biology data sources. This integrative "systems" approach to protein identification is very effective, as demonstrated by two new methods. The first, MSNet, introduces a social model for protein identification and leverages functional dependencies from genome-scale, probabilistic, gene functional networks. The second, MSPresso, learns a gene expression prior from a joint analysis of mRNA and proteomics experiments on similar samples.
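One simple way to picture how a functional network can act as a prior on protein presence (a sketch only; MSNet's actual probabilistic model is more involved, and the protein names, edges, scores and blending weight below are invented) is to smooth each protein's MS/MS evidence score towards the mean score of its network neighbours.

```python
# toy functional network (assumption): undirected edges between proteins
edges = [("YFG1", "YFG2"), ("YFG1", "YFG3"), ("YFG2", "YFG3"), ("YFG4", "YFG5")]

# raw MS/MS identification scores in [0, 1] (invented)
ms_score = {"YFG1": 0.9, "YFG2": 0.2, "YFG3": 0.3, "YFG4": 0.1, "YFG5": 0.05}

# adjacency lists
neighbours = {g: set() for g in ms_score}
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)

def smoothed(protein, weight=0.6):
    """Blend a protein's own MS/MS score with the mean score of its neighbours.

    A protein with weak direct evidence but well-identified functional
    neighbours gets pulled up, mimicking a 'guilt by association' prior.
    """
    nbrs = neighbours[protein]
    if not nbrs:
        return ms_score[protein]
    neighbour_mean = sum(ms_score[n] for n in nbrs) / len(nbrs)
    return weight * ms_score[protein] + (1 - weight) * neighbour_mean

for p in sorted(ms_score):
    print(f"{p}: raw={ms_score[p]:.2f}  network-smoothed={smoothed(p):.2f}")
```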
These two sources of prior information result in more accurate estimates of protein presence, and increase protein recall by as much as 30% in complex samples, while also increasing precision. A comprehensive suite of benchmarking datasets is
introduced for evaluation in yeast. Methods to assess statistical significance in the absence of ground truth are also introduced and employed whenever applicable.
This dissertation also describes a database indexing solution to improve speed and scalability of protein identification experiments. The method, MSFound, customizes a metric-space database index and its associated approximate k-nearest-neighbor search algorithm with a semi-metric distance designed to match noisy spectra. MSFound achieves an order of magnitude speedup over traditional spectra database searches while maintaining scalability.
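As a brute-force stand-in for the nearest-neighbour spectrum lookup that MSFound accelerates with its metric-space index (the peak binning scheme, the cosine-style distance and the toy spectra below are assumptions, not MSFound's semi-metric), a library search over binned spectra can be sketched as follows.

```python
import numpy as np

def bin_spectrum(peaks, max_mz=2000.0, bin_width=1.0):
    """Turn a list of (m/z, intensity) peaks into a fixed-length, normalised vector."""
    vec = np.zeros(int(max_mz / bin_width))
    for mz, intensity in peaks:
        idx = min(int(mz / bin_width), len(vec) - 1)
        vec[idx] += intensity
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def spectral_distance(a, b):
    """1 - cosine similarity of two binned, normalised spectra."""
    return 1.0 - float(a @ b)

def k_nearest(query, library, k=3):
    """Brute-force k-NN over a spectral library; an index would prune this search."""
    dists = sorted((spectral_distance(query, vec), name) for name, vec in library.items())
    return dists[:k]

# toy library of theoretical spectra (invented peak lists)
library = {
    "PEPTIDER": bin_spectrum([(175.1, 1.0), (304.2, 0.7), (433.2, 0.5)]),
    "SAMPLEK":  bin_spectrum([(147.1, 1.0), (234.1, 0.6), (391.2, 0.4)]),
    "PROTEINR": bin_spectrum([(175.1, 0.9), (306.2, 0.8), (435.3, 0.3)]),
}

# noisy observed spectrum (invented) matched against the library
query = bin_spectrum([(175.2, 1.0), (304.0, 0.6), (433.5, 0.4), (512.3, 0.1)])
for dist, name in k_nearest(query, library, k=2):
    print(f"{name}: distance = {dist:.3f}")
```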
|
10 |
Détection, localisation et étude des propriétés spectrales de sursauts gamma observés à haute énergie avec l'expérience Fermi. / Detection, localization and spectral analyses of gamma-ray bursts observed at high energies with the Fermi space telescope. Pelassa, Véronique. 13 December 2010
Gamma-ray bursts (GRBs) are among the brightest gamma-ray sources in the sky. In the current standard framework, their prompt (X- and gamma-ray) emission is attributed to charged particles accelerated in relativistic jets launched at the formation of stellar-mass black holes, while the radio to X-ray afterglow emission is due to the interaction between these jets and the interstellar medium. The LAT, the pair-creation instrument onboard the Fermi gamma-ray space telescope, has performed unprecedented observations of the gamma-ray sky at energies from 20 MeV to over 300 GeV since June 2008. The GBM, Fermi's transient-source detector (8 keV to 40 MeV), has observed the prompt emission of ~450 gamma-ray bursts, of which ~18 were also studied up to GeV energies with the LAT. Accurate GRB localizations and Fermi's synergy with other observatories allow the study of the associated afterglows and therefore a better interpretation of the observations. Analyses of GRB emission from 8 keV up to GeV energies are presented. Localizations obtained with LAT data are studied, together with their uncertainties. Spectral analyses of the prompt emission combining GBM and LAT data are presented, along with their theoretical interpretation. An alternative analysis based on a relaxed selection of LAT data is presented and fully characterized; using events with energies below 100 MeV improves the temporal and spectral analysis of the prompt emission. Searches for long-lived high-energy emission from GRBs are presented, as is the study of the afterglow of GRB 090510, observed from the UV to the GeV domain by Fermi and Swift. Finally, a model of prompt emission by internal shocks, developed at the IAP, is compared with the Fermi observations.
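To make the joint GBM-LAT spectral fitting concrete (a sketch only, not the pipeline used in the thesis; the parameter values, energy grid, noise model and use of scipy's curve_fit are assumptions), the Band function commonly fitted to GRB prompt spectra can be written and fitted to a synthetic spectrum as follows.

```python
import numpy as np
from scipy.optimize import curve_fit

def band(E, A, alpha, beta, E0):
    """Band function: the empirical photon spectrum commonly fitted to GRB prompt emission.

    E and E0 are in keV, A in photons / (cm^2 s keV); the low- and high-energy
    power laws join smoothly at the break energy E_break = (alpha - beta) * E0.
    """
    Eb = (alpha - beta) * E0
    low = A * (E / 100.0) ** alpha * np.exp(-E / E0)
    high = A * (Eb / 100.0) ** (alpha - beta) * np.exp(beta - alpha) * (E / 100.0) ** beta
    return np.where(E < Eb, low, high)

# synthetic "observed" spectrum (assumed parameters, keV energy grid)
rng = np.random.default_rng(2)
E = np.geomspace(10.0, 1.0e4, 60)                        # 10 keV to 10 MeV
true_params = (0.05, -1.0, -2.3, 300.0)                  # A, alpha, beta, E0
flux = band(E, *true_params)
observed = flux * rng.normal(1.0, 0.05, size=E.size)     # 5 per cent multiplicative noise

# weighted least-squares fit of the four Band parameters; the bounds keep alpha > beta
popt, _ = curve_fit(band, E, observed,
                    p0=(0.1, -0.8, -2.2, 200.0),
                    sigma=0.05 * flux,
                    bounds=([1e-4, -1.9, -4.0, 50.0], [1.0, 1.0, -2.0, 5000.0]))
print("fitted A, alpha, beta, E0:", np.round(popt, 3))
```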
|