121 |
The Analysis of Big Data on Cities and Regions - Some Computational and Statistical Challenges
Schintler, Laurie A., Fischer, Manfred M. 28 October 2018
Big Data on cities and regions bring new opportunities and challenges to data analysts and city planners. On the one hand, they hold great promise for combining increasingly detailed data on each citizen with critical infrastructures in order to plan, govern and manage cities and regions, improve their sustainability, optimize processes and maximize the provision of public and private services. On the other hand, the massive sample size and high dimensionality of Big Data, together with their geo-temporal character, introduce unique computational and statistical challenges. This chapter gives an overview of the salient characteristics of Big Data and of how these features drive a paradigm change in data management and analysis, and in the computing environment. / Series: Working Papers in Regional Science
|
122 |
Testes para análise de vigor em sementes de girassol / Test for analysis of vigor in sunflower seeds
Haesbaert, Fernando Machado 28 February 2013
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Research on the sunflower crop currently emphasizes its use for oil extraction for biodiesel production. Because sunflower adapts to a wide range of conditions and cycles soil nutrients, it has been introduced into crop rotation systems. An important strategy for the success of any crop is the use of good-quality seed in order to obtain an adequate plant stand. However, appropriate tests for determining seed quality are often lacking, which makes it difficult to choose the best lots. The objective of this work is therefore to determine an appropriate methodology for analyzing the vigor of sunflower seeds through bulk and individual electrical conductivity tests and the exudate pH test, and to determine the sample size, in number of seeds, for evaluating the individual electrical conductivity of sunflower seeds. Experiments were conducted to evaluate the bulk electrical conductivity, individual electrical conductivity and exudate pH tests and to establish the relationship between these tests and the field emergence test. The bulk and individual electrical conductivity tests are promising for separating lots of sunflower seeds; the most suitable conditions for the bulk electrical conductivity test are 25 seeds, 25 ml of water and a reading taken after one hour of imbibition. For the individual electrical conductivity test, periods of 1 to 24 hours show a high correlation between field emergence and individual electrical conductivity. The sample size, in number of seeds, needed to evaluate the electrical conductivity of sunflower seeds depends on the seed imbibition time. Imbibition times of one hour allow the smallest sample sizes. Considering a range of 15 μS cm⁻¹ seed⁻¹, a sample size of 100 seeds is recommended.
|
123 |
Modelo estocástico para estimação da produtividade de soja no Estado de São Paulo utilizando simulação normal bivariada / Sthocastic model to estimate the soybean productivity in the State of São Paulo through bivaried normal simulation
Thomas Newton Martin 08 February 2007
Resources, both financial and human, are scarce. Regional planning that minimizes the use of resources should therefore be encouraged. Crop forecasting through modelling techniques should be carried out in advance on the basis of regional characteristics, thereby indicating the basic guidelines for research as well as for regional planning. The aims of this work are: (i) to characterize the climatic variables through different probability distributions; (ii) to verify the spatial and temporal homogeneity of the climatic variables; (iii) to use the bivariate normal distribution to simulate parameters used to estimate soybean crop productivity; and (iv) to propose a model for estimating the order of magnitude of the potential productivity (which depends on the interaction of genotype, air temperature, photosynthetically active radiation and photoperiod) and of the depleted productivity (which depends on the potential productivity, rainfall and soil water storage) of soybean grain, based on daily values of temperature, insolation and rainfall, for the State of São Paulo. The variables used in this study were minimum, maximum and average air temperature, insolation, solar radiation, photosynthetically active radiation and rainfall, on a daily scale, obtained from 27 stations located in the State of São Paulo and six stations located in neighboring states. First, the goodness of fit of the seven variables to five probability distributions (normal, log-normal, exponential, gamma and Weibull) was verified with the Kolmogorov-Smirnov test. The spatial and temporal homogeneity of the data was verified through cluster analysis using Ward's method, and the sample size (number of years) was estimated for the variables. Random numbers were generated by the Monte Carlo method. Photosynthetically active radiation and temperature data were simulated in three cases: (i) an asymmetric triangular distribution, (ii) a normal distribution truncated at 1.96 standard deviations from the mean, and (iii) a bivariate normal distribution. The simulated data were evaluated with Bartlett's test of homogeneity of variance, the F test, the t test, Willmott's index of agreement, the slope of the fitted line, Camargo's performance index (C) and goodness of fit to the (univariate) normal distribution. The model used to compute the potential productivity of the soybean crop was based on the de Wit model, including contributions from Van Heenst, Driessen, Konijn, de Vries and others. The computation of the depleted productivity depended on the potential, crop and actual evapotranspiration and on the coefficient of sensitivity to water deficit. The insolation and rainfall data were sampled from the normal distribution. The daily production of carbohydrate was thus depleted as a function of water stress and the number of daily hours of insolation. Interpolation of the data to cover the whole State of São Paulo was carried out by Kriging. Most of the variables were found to follow the normal distribution. Moreover, the variables show spatial and temporal variability, and the number of years required (sample size) for each of them varies considerably. Simulation using the bivariate normal distribution is the most appropriate because it better represents the climatic variables. The model for estimating potential and depleted soybean productivity produces results consistent with those reported in the literature.
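As an illustration of the bivariate normal simulation case mentioned in this abstract, the following is a minimal sketch, not the author's code. It draws correlated pairs by the standard two-variable Cholesky construction using only the Python standard library. The function names and all numeric values (means, standard deviations, correlation) are hypothetical placeholders, not figures from the thesis.

```python
import math
import random


def sample_bivariate_normal(mean_x, mean_y, sd_x, sd_y, rho, n, seed=0):
    """Draw n correlated (x, y) pairs from a bivariate normal distribution.

    Uses the 2x2 Cholesky factor: y shares the standard normal z1 with x
    in proportion rho, plus an independent component z2.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mean_x + sd_x * z1
        y = mean_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation of the simulated pairs, as a sanity check."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical daily temperature (x) and radiation (y) parameters.
    draws = sample_bivariate_normal(24.0, 9.0, 3.0, 2.0, 0.6, n=10000, seed=1)
    print(round(sample_correlation(draws), 2))
```

Unlike independent draws from two univariate distributions, this construction preserves the correlation between the two variables, which is the property the abstract credits for the better representation of the climatic variables.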
|
124 |
Aspectos estatísticos da amostragem de água de lastro / Statistical aspects of ballast water sampling
Eliardo Guimarães da Costa 01 March 2013
Ship ballast water is one of the main agents dispersing organisms harmful to human health and to the environment, and international standards require that the concentration of these organisms in the tank be below a prespecified value. Because of time and cost limitations, this control requires sampling. Under the assumption that the organism concentration in the tank is homogeneous, several authors have used the Poisson distribution for decision making based on hypothesis testing. Since this assumption is unrealistic, we extend the results to cases in which the organism concentration in the tank is heterogeneous, using stratification, nonhomogeneous Poisson processes, or the assumption that the concentration follows a Gamma distribution, which induces a Negative Binomial distribution for the number of sampled organisms. Furthermore, we propose a novel approach to the problem through estimation techniques based on the Negative Binomial distribution. For practical applications, we implemented computational routines in R.
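The step from a Gamma-distributed concentration to a Negative Binomial count can be checked numerically. The sketch below is an illustration in Python rather than the R routines the abstract mentions. It compares the Poisson and Negative Binomial probability mass functions at the same mean. The parameterization by `mean` and `shape` and the example values are assumptions made for this sketch.

```python
import math


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson count with the given mean (homogeneous tank)."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))


def negbin_pmf(x, mean, shape):
    """P(X = x) for a Poisson count whose rate is Gamma distributed.

    The Gamma rate has the given mean and shape; mixing it with a Poisson
    yields a Negative Binomial with success probability shape / (shape + mean).
    """
    p = shape / (shape + mean)
    log_pmf = (
        math.lgamma(x + shape)
        - math.lgamma(shape)
        - math.lgamma(x + 1)
        + shape * math.log(p)
        + x * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def mean_and_variance(pmf, upper=2000):
    """Numerically sum the first two moments of a count distribution."""
    mean = sum(x * pmf(x) for x in range(upper))
    second = sum(x * x * pmf(x) for x in range(upper))
    return mean, second - mean * mean


def cdf(pmf, limit):
    """P(X <= limit), e.g. the chance a sample count stays under a threshold."""
    return sum(pmf(x) for x in range(limit + 1))
```

With the same mean, the Negative Binomial variance is `mean + mean**2 / shape`, so a heterogeneous tank gives more dispersed sample counts than the Poisson model allows, and a decision rule calibrated under the Poisson assumption will not have its nominal error rates.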
|
125 |
Bildanalys inom Machine Vision : Nyquists samplingsteorem vid digital fotografering / Image analysis in Machine Vision: Nyquist's sampling theorem in digital photography
Lindström, Mattias January 2017
Within Machine Vision, it is essential that the camera can detect the details being sought. Aliasing is a problem in all digital photography and arises when the camera's resolution is too low relative to the details it tries to capture. This work analyzes the camera's limitations and their causes. A simple camera rig used for Machine Vision experiments is rebuilt from scratch for better control and resolution, and a new control system is created for it according to the client's specifications. An ISO 12233:2000 test pattern is then photographed in this rig. The result is analyzed and compared with the Nyquist sampling theorem as it applies to digital photography. The result shows that the camera's design, which registers colors through a filter in front of the image sensor and uses algorithms to calculate the color of each individual pixel, increases the sample size by a factor of 3 compared with the original theorem's doubled sampling frequency.
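The aliasing effect this abstract refers to can be illustrated with the textbook one-dimensional sampling relation. The sketch below is an illustration only and does not model the color filter effect the thesis reports. The function names and the units (cycles per millimetre for detail, samples per millimetre for the sensor) are assumptions chosen for the example.

```python
def nyquist_frequency(sampling_rate):
    """Highest frequency that can be represented without aliasing."""
    return sampling_rate / 2.0


def is_aliased(signal_frequency, sampling_rate):
    """True when the detail is finer than the sampling grid can resolve."""
    return signal_frequency > nyquist_frequency(sampling_rate)


def aliased_frequency(signal_frequency, sampling_rate):
    """Apparent frequency after sampling.

    The sampled signal is indistinguishable from one whose frequency is
    folded back to the distance from the nearest multiple of the sampling rate.
    """
    nearest_multiple = round(signal_frequency / sampling_rate) * sampling_rate
    return abs(signal_frequency - nearest_multiple)


if __name__ == "__main__":
    # Hypothetical sensor: 10 samples per mm; pattern: 7 cycles per mm.
    print(is_aliased(7.0, 10.0), aliased_frequency(7.0, 10.0))
```

A pattern below the Nyquist frequency is returned unchanged, while a finer pattern shows up in the image at a lower, false frequency.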
|
126 |
Multiplicité des tests, et calculs de taille d'échantillon en recherche clinique / Multiplicity of tests, and sample size determination of clinical trials
Riou, Jérémie 11 December 2013
This work addresses the problems inherent in multiple testing in the context of clinical trials. A growing number of clinical trials aim to observe a multifactorial effect of a product and therefore require multiple co-primary endpoints. The study is declared significant if and only if at least r of the m null hypotheses tested are rejected. In this context, statisticians must take into account the multiplicity induced by this practice. We first sought an exact correction for data analysis and sample size computation when r = 1. We then worked on sample size computation for any value of r, when single-step or stepwise procedures are used. Finally, we addressed the correction of the significance level inflated by the search for an optimal coding of a continuous explanatory variable in a generalized linear model.
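To see why the "at least r of m" rule needs a multiplicity correction, the following sketch computes the chance of declaring a study significant when every null hypothesis is true. It assumes the m tests are independent, which is a simplification made here for illustration. The thesis concerns exact corrections for clinical endpoints and this is not its method. The function names are hypothetical.

```python
import math


def prob_at_least_r_rejections(m, r, alpha):
    """P(at least r of m true nulls are rejected), each tested at level alpha.

    Under independence the number of false rejections is Binomial(m, alpha).
    """
    return sum(
        math.comb(m, k) * alpha**k * (1.0 - alpha) ** (m - k)
        for k in range(r, m + 1)
    )


def sidak_alpha(m, target):
    """Per-test level that keeps the r = 1 error rate at target (Sidak)."""
    return 1.0 - (1.0 - target) ** (1.0 / m)


if __name__ == "__main__":
    # Three co-primary endpoints, each tested at an unadjusted 5% level.
    print(prob_at_least_r_rejections(3, 1, 0.05))
    # The same rule with a Sidak-adjusted per-test level.
    print(prob_at_least_r_rejections(3, 1, sidak_alpha(3, 0.05)))
```

For r = 1 the unadjusted error rate grows with m, while requiring more rejections (larger r) lowers it, which is why the required correction and the resulting sample size depend on both r and m.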
|
127 |
Improved Criteria for Estimating Calibration Factors for Highway Safety Manual (HSM) Applications
Saha, Dibakar 14 November 2014
The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced at least a total of 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors.
In this dissertation, statewide segment and intersection data were first collected for most of the HSM recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but they are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually.
To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
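For readers unfamiliar with the quantity being estimated, an HSM calibration factor is the ratio of total observed crashes to total crashes predicted by the uncalibrated model over the selected sites. The sketch below shows that ratio only. The function name and the site data are hypothetical and are not taken from the dissertation.

```python
def calibration_factor(observed_crashes, predicted_crashes):
    """Ratio of total observed to total predicted crashes across sites."""
    if len(observed_crashes) != len(predicted_crashes):
        raise ValueError("one observed and one predicted value per site")
    total_predicted = sum(predicted_crashes)
    if total_predicted <= 0:
        raise ValueError("total predicted crashes must be positive")
    return sum(observed_crashes) / total_predicted


if __name__ == "__main__":
    # Hypothetical counts for four sites over the same study period.
    observed = [3, 0, 5, 2]
    predicted = [2.0, 1.0, 4.0, 1.0]
    print(calibration_factor(observed, predicted))
```

Because the factor is a ratio of sums over a sample of sites, its stability depends on how many sites and crashes enter the sums, which is the sample size question the dissertation examines.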
|
128 |
The Meaningfulness of Effect Sizes in Psychological Research: Differences Between Sub-Disciplines and the Impact of Potential Biases
Schäfer, Thomas, Schwarz, Marcus A. 15 April 2019
Effect sizes are the currency of psychological research. They quantify the results of a study to answer the research question and are used to calculate statistical power. The interpretation of effect sizes—when is an effect small, medium, or large?—has been guided by the recommendations Jacob Cohen gave in his pioneering writings starting in 1962: Either compare an effect with the effects found in past research or use certain conventional benchmarks. The present analysis shows that neither of these recommendations is currently applicable. From past publications without pre-registration, 900 effects were randomly drawn and compared with 93 effects from publications with pre-registration, revealing a large difference: Effects from the former (median r = 0.36) were much larger than effects from the latter (median r = 0.16). That is, certain biases, such as publication bias or questionable research practices, have caused a dramatic inflation in published effects, making it difficult to compare an actual effect with the real population effects (as these are unknown). In addition, there were very large differences in the mean effects between psychological sub-disciplines and between different study designs, making it impossible to apply any global benchmarks. Many more pre-registered studies are needed in the future to derive a reliable picture of real population effects.
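Since the abstract notes that effect sizes feed into power calculations, the sketch below shows how much the planned sample size changes between the two median effects it reports (r = 0.36 and r = 0.16). It uses the common Fisher z approximation for a two-sided test of a correlation. The choice of that approximation, the function name, and the defaults of 5% significance and 80% power are assumptions of this sketch and are not taken from the paper.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect a correlation r (two-sided test).

    Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))^2 + 3.
    """
    if r == 0 or abs(r) >= 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    fisher_z = math.atanh(abs(r))
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


if __name__ == "__main__":
    for median_r in (0.36, 0.16):
        print(median_r, required_n_for_correlation(median_r))
```

Under these assumptions, planning a study around the larger published effect yields a sample several times too small to detect an effect of the size seen in the pre-registered studies.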
|
129 |
How do Companies Reward their Employees
Cudjoe, Samuel January 2012
This study is distinctive in its location (Africa) and industrial setting (gold mining), since reward systems have mostly been studied in North American and European settings. It therefore considers rewards from the perspective of Africa and its natural resource industries, such as gold mining. The methodology was a case study at Golden Star (Bogoso/Prestea) Limited (GSB/PL), which has a population of 1029 employees, combining qualitative and quantitative data obtained through a questionnaire survey of a sample of 278 employees and a structured interview with the Human Resources and Administration Manager. The data collection thus represents methodological triangulation, and the data obtained are primary data. The study revealed that all three generational groups (Baby Boomers, GEN X and GEN Y) place higher priority on financial incentives (high salary and bonuses) than on any other incentive when asked to indicate the reward they prefer most. But when rewards were considered as a total package profile, a greater number of the Baby Boomers placed more priority on packages with highly flexible pension benefits, long-term job security and high internal promotion, even though the salary and bonus components of those packages were not as attractive. The GEN X and GEN Y groups still based their reward package preferences on high financial incentives, training and learning opportunities, personal growth and career advancement.
The study revealed that, aside from the strong preference for financial incentives such as high salary and bonuses in all generational groups, a few GEN X and GEN Y respondents also expressed other preferences, such as high personal growth, a flexible work schedule, attractive company policy and administration, career advancement, working environment, job security, and praise and recognition, in which the Baby Boomers indicated no interest. The study revealed that all three generational groups consider high salary and bonuses a factor that causes employee dissatisfaction when unavailable or unsatisfied but that does not motivate or cause satisfaction when available or satisfied, thus confirming Herzberg's Two-Factor theory that factors such as salary or remuneration, job security, working conditions and company policies only prevent employee dissatisfaction. The study revealed that all generational groups consider high salaries and bonuses a factor whose absence could lead to a lack of satisfaction and motivation in the employee's current role. This finding confirms the traditional belief that pay is the prime, or in some cases the only, source of motivation, but contradicts Herzberg's claim that pay (high salaries and bonuses) is only an extrinsic factor which, when available or satisfied, does not bring satisfaction and motivation but rather prevents dissatisfaction.
The study revealed that the GSB/PL reward system basically comprises extrinsic rewards such as high salary levels (pay increases), a bonus scheme, training and learning opportunities, job security, stock options, retirement/pension benefits such as social security and a provident fund, promotions, attractive company policies and administration, praise and recognition, a good working environment, a flexible work schedule, long service awards, and benefits such as housing, health insurance, vacation/annual leave, transportation/bussing service, messing (provision of meals to employees only when at work) and educational benefits (for employees' dependants). The study also revealed that the design and implementation of the GSB/PL reward system involves four distinct phases: assessment, design, execution and evaluation. Finally, a conclusion was drawn and a number of recommendations were proposed for the mining company to implement in order to safeguard the interests of both employees and the employer.
|
130 |
Confidence Intervals for a Ratio of Binomial Proportions Based on Paired Data
Bonett, Douglas, Price, Robert M. 15 September 2006
Four interval estimation methods for the ratio of marginal binomial proportions are compared in terms of expected interval width and exact coverage probability. Two new methods are proposed that are based on combining two Wilson score intervals. The new methods are easy to compute and perform as well as or better than the method recently proposed by Nam and Blackwelder. Two sample size formulas are proposed to approximate the sample size required to achieve an interval estimate with the desired confidence level and width.
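The building block named in this abstract, the Wilson score interval for a single binomial proportion, can be sketched as follows. This shows only the standard single-proportion interval. How the paper combines two such intervals into an interval for a ratio of paired proportions is not reproduced here. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical sample: 10 successes in 20 trials.
    lower, upper = wilson_interval(10, 20)
    print(round(lower, 3), round(upper, 3))
```

Unlike the simple Wald interval, the Wilson interval stays inside [0, 1] and remains usable when the observed proportion is 0 or 1, which makes it a convenient component for ratio intervals.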
|