Global ETD Search

111	Application of Spatiotemporal Data Mining to Air Quality Data Biancardi, Michael Anthony 05 1900 (has links) This thesis explores the use of spatiotemporal data mining in the air quality domain to understand causes of PM2.5 air pollution. PM2.5 refers to fine particulate matter less than 2.5 microns in diameter and is a major threat to human and environmental health. A review of air quality modeling methods is provided, emphasizing data-driven modeling techniques. While data mining methods have been applied to air quality data, including temporal sequence mining algorithms, spatiotemporal sequence mining methods have not been broadly applied to study air pollution. However, air pollution is highly spatial in nature, so such methods can offer new insights into air quality. This thesis applies one such method, the Spatiotemporal Sequence Miner (STS Miner) algorithm, to air quality data from a low-cost sensor network to explore causes and trends related to PM2.5. To facilitate the use of this method, an open-source library called OpenSTSMiner is developed to implement this algorithm. Various domain results are found; for instance, low temperature and low relative humidity are strongly associated with worsening levels of air quality. Lastly, to highlight the utility of the STS Miner algorithm, a comparison is presented between STS Miner and spatial Markov chains, another spatiotemporal modeling method used in the air quality domain. Spatiotemporal data mining air pollution air quality data mining spatial data mining spatial statistics Computer Science Environmental Sciences Geography
112	On the use of $\alpha$-stable random variables in Bayesian bridge regression, neural networks and kernel processes Jorge E Loria (18423207) 23 April 2024 (has links) The first chapter considers l_α-regularized linear regression, also termed Bridge regression. For α ∈ (0, 1), Bridge regression enjoys several statistical properties of interest, such as sparsity and near-unbiasedness of the estimates (Fan & Li, 2001). However, the main difficulty lies in the non-convex nature of the penalty for these values of α, which makes optimization challenging; usually only a local optimum can be found. To address this issue, Polson et al. (2013) took a sampling-based, fully Bayesian approach to this problem, using the correspondence between the Bridge penalty and a power exponential prior on the regression coefficients. However, their sampling procedure relies on Markov chain Monte Carlo (MCMC) techniques, which are inherently sequential and do not scale to large problem dimensions. Cross-validation approaches are similarly computation-intensive. To this end, our contribution is a novel non-iterative method to fit a Bridge regression model. The main contribution lies in an explicit formula for Stein’s unbiased risk estimate for the out-of-sample prediction risk of Bridge regression, which can then be optimized to select the desired tuning parameters, allowing us to completely bypass MCMC as well as computation-intensive cross-validation approaches. Our procedure yields results in a fraction of the computational time of iterative schemes, without any appreciable loss in statistical performance.
Next, we build upon the classical and influential work of Neal (1996), who proved that the infinite-width scaling limit of a Bayesian neural network with one hidden layer is a Gaussian process when the network weights have bounded prior variance. Neal’s result has been extended to networks with multiple hidden layers and to convolutional neural networks, also with Gaussian process scaling limits. The tractable properties of Gaussian processes then allow straightforward posterior inference and uncertainty quantification, considerably simplifying the study of the limit process compared to a network of finite width. Neural network weights with unbounded variance, however, pose unique challenges. In this case, the classical central limit theorem breaks down, and it is well known that the scaling limit is an α-stable process under suitable conditions. However, the current literature is primarily limited to forward simulations under these processes, and the problem of posterior inference under such a scaling limit remains largely unaddressed, unlike in the Gaussian process case. To this end, our contribution is an interpretable and computationally efficient procedure for posterior inference, using a conditionally Gaussian representation, which then allows full use of the Gaussian process machinery for tractable posterior inference and uncertainty quantification in the non-Gaussian regime.
Finally, we extend the previous chapter by considering a natural extension to deep neural networks through kernel processes. Kernel processes (Aitchison et al., 2021) generalize to deeper networks the notion proved by Neal (1996) by describing the non-linear transformation in each layer as a covariance matrix (kernel) of a Gaussian process. In this way, each successive layer transforms the covariance matrix of the previous layer by a covariance function. However, the covariance obtained by this process loses any possibility of representation learning, since the covariance matrix is deterministic. To address this, Aitchison et al. (2021) proposed deep kernel processes using Wishart and inverse Wishart matrices for each layer in deep neural networks. Nevertheless, the approach they propose requires using a process that does not emerge from the limit of a classic neural network structure. We introduce α-stable kernel processes (α-KP) for learning posterior stochastic covariances in each layer. Our results show that our method performs much better than the approach proposed by Aitchison et al. (2021) on both simulated data and the benchmark Boston dataset. Computational statistics Spatial statistics Statistical theory gaussian processes Bridge regression kernel processes deep neural networks Probabilistic Machine Learning Models
113	Applications of modern regression techniques in empirical economics März, Alexander 14 July 2016 (has links) No description available. 630 Bayesian geoadditive quantile regression Bayesian distributional regression Conditional dependence Farmland rental rates Heteroscedastic regression Intergenerational social mobility Real options Spatial statistics Land- und Forstwirtschaft (PPN621302791)
114	台灣地震散群之研究 吳東陽 Unknown Date (has links) 九二一地震是台灣數十年來傷亡最大的地震，根據中央氣象局的研究發現九二一地震之後半年至一年內發生的地震，大多數都是由其引發的餘震，然而一個地震屬於主震、或是某個地震的餘震又該如何判斷呢？本文是以統計資料分析之觀點來區分主震與餘震，而不是利用相關地震學理論來區分主震與餘震，本文主要研究的是比較四種區分主震與餘震的方法：整體距離(Global Distance)、負相關(Negative Correlation)、最近鄰區(Nearest Neighbors)、視窗(Window)。四種地震散群方法所需要給定的參數：時間與空間參數，要如何選取與決定，本文則是利用台灣自1991年1月1日至2003年12月31日之地震規模大於5.0以上的資料，定義地震減少比例(decreasing earthquake percent)來選取參數，以求出最適當的模型參數。套用選取得到的模型參數，利用電腦模擬地震來驗證比較方法的優劣，依據誤判主震(False Positive)、誤判餘震(False Negative)、分錯比例(Overall Error Rate)等準則比較各種地震散群方法的優劣，研究發現四種方法各有其優劣之處。關鍵詞：主震、餘震、空間統計、最近鄰區、電腦模擬 / The Chi-Chi earthquake caused some of the heaviest casualties in Taiwan in the past 100 years. According to the Central Weather Bureau in Taiwan, most of the earthquakes that occurred in the 6 to 12 months after the Chi-Chi earthquake were its aftershocks. But in general, how do we determine whether a given earthquake is a main shock or an aftershock? This study takes a statistical data analysis perspective, rather than relying on seismological theory, and compares four declustering methods: Global Distance, Negative Correlation, Nearest Neighbors and Window. Each method requires temporal and spatial parameters; these were selected using Taiwanese earthquakes of magnitude greater than 5.0 that occurred between 1 January 1991 and 31 December 2003, by defining a decreasing earthquake percent. Finally, with the selected parameters, a computer simulation is used to evaluate the performance of the four methods in terms of false positive, false negative, and overall error rates. Each of the four methods is found to have its own strengths and weaknesses. Key Words: Decluster, Aftershock, Spatial Statistics, Nearest Neighbors, Simulation 主震 餘震 空間統計 最近鄰區 電腦模擬 Decluster Aftershock Spatial Statistics Nearest Neighbors Simulation
115	The effects of alcohol access on the spatial and temporal distribution of crime Fitterer, Jessica Laura 15 March 2017 (has links) Increases in alcohol availability have caused crime rates to escalate across multiple regions around the world. As individuals consume alcohol they experience impaired judgment and a dose-response escalation in aggression that, for some, leads to criminal behaviour. By limiting alcohol availability it is possible to reduce crime; however, the literature remains mixed on the best practices for alcohol access restrictions. Variations in data quality and statistical methods have created an inconsistency in the reported effects of price, hour of sales, and alcohol outlet restrictions on crime. Most notably, the research findings are influenced by the different effects of alcohol establishments on crime. The objective of this PhD research was to develop novel quantitative approaches to establish the extent to which alcohol access (outlets) influences the frequency of crime (liquor, disorder, violent) at a fine level of spatial detail (x,y locations and block groups). Analyses were focused on British Columbia’s largest cities where policies are changing to allow greater alcohol access, but little is known about the crime-alcohol access relationship. Two reviews were conducted to summarize and contrast the effects of alcohol access restrictions (price, hours of sales, alcohol outlet density) on crime, and evaluate the state-of-the-art in statistical methods used to associate crime with alcohol availability. Results highlight key methodological limitations and fragmentation in alcohol policy effects on crime across multiple disciplines. Using a spatial data science approach, recommendations were made to increase spatial detail in modelling to limit the scale effects on crime-alcohol association.
To provide guidelines for alcohol-associated crime reduction, kernel density space-time change detection methods were also applied to provide the first evaluation of active policing on alcohol-associated crime in the Granville St. entertainment district of Vancouver, British Columbia. Foot patrols were able to reduce the spatial density of crime, but hot spots of liquor crime and violent assaults remained within 60 m of bars (nightclubs). To estimate the association of alcohol establishment size and type with disorder and violent crime reports in block groups across Victoria, British Columbia, a Poisson Generalized Linear Model with spatial lag effects was applied. Estimates provided the factor increase (1.0009) expected in crime for every additional patron seat added to an establishment's capacity, and indicated that establishments should be spaced more than 300 m apart to significantly reduce alcohol-associated crime. These results offer the first evaluation of seating capacity and establishment spacing on alcohol-associated crime for alcohol license decision making, and are pertinent at a time when alcohol policy reform is being prioritized by the British Columbia government. In summary, this dissertation 1) contributes cross-disciplinary policy and methodological reviews, 2) expands the application of spatial statistics to alcohol-attributable crime research, 3) advances knowledge on the local scale of effects of different alcohol establishment types on crime, and 4) develops transferable models to estimate the effects of alcohol establishment seating capacity and proximity between establishments on the frequency of crime. / Graduate / 2018-02-27 spatial statistics alcohol crime kernel density estimation spatial lag model alcohol outlets alcohol outlet density liquor assaults disorder time-series model bar pub nightclub alcohol policy health policy
116	Um modelo espaço-temporal contínuo para o preço de lançamentos imobiliários na cidade de São Paulo / A continuous space-time model for the price of real estate launches in the city of São Paulo Rocio, Vitor Dias 15 June 2018 (has links) Neste trabalho será feito um modelo espaço-temporal contínuo para preços de imóveis na cidade de São Paulo estimado através de métodos Bayesianos. Faremos uma decomposição da série em tendência e ciclo além de incorporar um conjunto de variáveis explicativas e efeitos aleatórios espaciais projetados no contínuo. Este modelo introduz um novo método para analisar a formação dos preços dos lançamentos imobiliários. Consideramos em nosso modelo hedônico, além das características intrínsecas, também as características da vizinhança e o ambiente econômico. Com este modelo, conseguimos observar os preços de equilíbrio para as respectivas localizações e uma interpretação mais clara da dinâmica de preços dos imóveis entre janeiro de 2000 e dezembro de 2013 para a cidade de São Paulo. / In this work, we build a continuous space-time model for real estate prices in the city of São Paulo, estimated using Bayesian methods. We decompose the series into trend and cycle and incorporate a set of explanatory variables and spatial random effects projected onto continuous space. This model introduces a new method to analyze the price formation of real estate launches. In our hedonic model we consider, besides the intrinsic characteristics, the characteristics of the neighborhood and the economic environment. With this model, we were able to observe the equilibrium prices for the respective locations and obtain a clearer interpretation of the dynamics of real estate prices between January 2000 and December 2013 for the city of São Paulo. Bayesian methods Co-Integração espacial Econometria espacial Estatística espacial Métodos Bayesianos Spatial Co-Integration Spatial econometrics Spatial statistics
117	Explorando recursos de estatística espacial para análise da acessibilidade da cidade de Bauru / Exploring spatial statistics tools for an accessibility analysis in the city of Bauru Krempi, Ana Paula 04 June 2004 (has links) A acessibilidade está relacionada com a maneira como a disponibilidade de transportes e os usos do solo afetam os indivíduos na realização de viagens para o desenvolvimento de suas atividades habituais. Freqüentemente se assume que os moradores de baixa renda da periferia são os mais afetados pela falta de acesso aos meios de transporte. A questão subjacente a esta afirmação, no entanto, permanece sem uma resposta definitiva: o nível de renda, por si só, seria um indicativo do nível de acessibilidade? O objetivo deste estudo é explorar a união de ferramentas de estatística espacial e SIG (Sistema de Informações Geográficas) com um propósito específico, que é o de analisar as relações entre aspectos da distribuição espacial de características da população (como a renda, por exemplo) de uma cidade média brasileira e os diversos níveis de acessibilidade por diferentes modos de transporte nela observados, buscando possíveis respostas para esta pergunta. Quando se utiliza procedimentos de visualização e classificação de dados espaciais comuns em SIG, nem sempre as informações são diretamente perceptíveis. Logo, deve-se utilizar ferramentas que ampliem as possibilidades de compreensão e análise dos dados. Inicialmente, as ferramentas selecionadas para uso neste trabalho são apresentadas e discutidas quanto à sua aplicação e utilização na análise proposta. Para tal foram utilizados dados coletados em uma pesquisa origemdestino (O-D) realizada na cidade de Bauru - SP, agrupados por setores censitários e adicionados ao SIG, aplicando técnicas de estatística espacial utilizadas para entidades do tipo área. Os resultados obtidos são apresentados na forma de mapas e de índices que medem a associação espacial global e local entre estas zonas. 
Uma das conclusões interessantes da aplicação foi a identificação de regiões da cidade com dinâmica particular, que contrariam o padrão global observado nas demais partes da área urbana. Pôde-se constatar ainda particularidades a respeito do uso de cada modo de transportes. O modo automóvel como motorista, por exemplo, possui agrupamento espacial bem definido no nível de renda alta tanto nas regiões de periferia, como nas de transição e central. Já o modo ônibus é predominantemente utilizado nas zonas de renda baixa das regiões de periferia e transição, enquanto que os modos não motorizados possuem uma dinâmica bem diversificada em toda a área urbana. Estes e outros resultados do estudo de caso deixam claro que as análises de estatística espacial em ambiente SIG criam uma ferramenta para ampliar a análise convencional de acessibilidade em transportes / Transportation accessibility is directly related to the level of transportation supply and to land uses, and to the way they affect individuals' trips to carry out their regular activities. It is often assumed that low-income segments of the population living at the periphery of cities are those most affected by poor transportation accessibility. An underlying question behind this statement, however, remains: can the income level or the location of an individual alone explain his/her accessibility level? In order to look for answers to this question, the aim of this study is to analyze, using spatial statistics tools in a GIS (Geographic Information System) environment, the relationships between accessibility and income and their geographical distributions in a medium-sized Brazilian city. The application of the most commonly used GIS resources, such as visualization and spatial data classification tools, does not always ensure a full comprehension of the phenomenon under analysis.
As a consequence, many problems require tools that enhance the possibilities of observation and analysis. The tools with this characteristic that were selected for this work are first introduced, and their application to the problem analyzed is then discussed. Data from an origin-destination (O-D) survey carried out in the city of Bauru, in the state of São Paulo, covering four different transportation modes, were used in this study. These data, grouped by census tract, were carefully examined in a Geographic Information System in order to look for spatial patterns of accessibility that are not visible in the traditional approaches. The results of the analysis are presented in maps and as indices that capture global and local spatial association patterns among these zones. One of the interesting outcomes of the application was the identification of regions with particular dynamics, which go against the pattern found in the overall urban area. Particularities of each transportation mode were also observed. The zones where the automobile is most used (by drivers, not by passengers) are spatially clustered in high-income areas, regardless of whether the zone is at the periphery, in the transition zone, or in the central area of the city. Bus trips are predominantly made in low-income areas of the periphery and transition rings, while the non-motorized modes (walking and cycling) show very diverse dynamics across the entire urban area. These and other results of the case study clearly indicate that spatial statistics analyses in a GIS environment create a powerful tool to extend conventional transportation accessibility analysis. accessibility acessibilidade análise espacial autocorrelação espacial estatística espacial GIS geographic information systems spatial analysis spatial autocorrelation spatial statistics
118	Localização industrial: uma aproximação usando processos pontuais espaciais / Firm location: an approach using spatial point process Morales, Adriano Barasal 08 June 2018 (has links) O objetivo desta pesquisa é mostrar como aproveitar novas bases de dados disponíveis e o avanço de métodos computacionais para extrair informações estatísticas sobre a localização espacial de firmas. Para isso, propomos uma aplicação de métodos de estatística espacial para modelar o padrão de localização de novas empresas de serviços no município de São Paulo. Neste trabalho, assumimos que a localização espacial dessas firmas foi gerada através de um processo pontual bidimensional e assim aplicamos dois modelos distintos: um baseado em intensidade não estocástica baseada no processo de Poisson, e um modelo de intensidade estocástica baseado processo de Cox log Gaussiano (Log Gaussian Cox Process - LGCP). A principal base de dados utilizada é base georeferenciada baseada no Cadastro Central de Empresas construída pelo Centro de Estudos da Metrópole (CEM), contendo observações de empresas na região metropolitana de São Paulo, para o ano base de 2000. Utilizamos como variáveis explicativas de localização informações advindas de sistemas de informações geográficas (SIG), o Censo demográfico e imagens de satélite do National Oceanic and Atmospheric Administration (NOAA). Os resultados encontrados mostram a importância dessa metodologia no processo de construção de modelos de localização espacial, combinando distintas fontes de dados e introduzindo novas perspectivas sobre o estudo empírico de economia urbana. / The objective of this research is to show how to take advantage of new available databases and computational methods to extract statistical information about the spatial location of firms. In this sense, we propose an application of spatial statistics methods to model the location patterns of new services firms in the city of São Paulo. 
In this paper, we assume that the spatial location of these firms was generated by a two-dimensional point process, and we apply two distinct models: one with non-stochastic intensity, based on the Poisson process, and one with stochastic intensity, based on the Log Gaussian Cox process (LGCP). The main input is a georeferenced database based on the Central Business Register, built by the Center for Metropolis Studies (CEM), containing data on firms in the metropolitan region of São Paulo for the base year 2000. As explanatory variables we use information from geographic information systems (GIS), the demographic census, and satellite imagery from the National Oceanic and Atmospheric Administration (NOAA). The results show the usefulness of this methodology in the construction of spatial location models, combining different data sources and introducing new perspectives on the empirical study of urban economics.
119	Regiões urbanas homogêneas e oferta de transportes / Homogeneous urban regions and transportation supply Manzato, Gustavo Garcia 09 March 2007 (has links) O objetivo deste trabalho é identificar regiões urbanas homogêneas por meio da aplicação de duas vertentes da análise espacial: a estatística espacial e uma estratégia de modelagem espacial baseada na comparação de informações oriundas de diferentes entidades espaciais, em níveis diversos de informação. Um método baseado em fluxos de viagens seria a melhor alternativa para o problema em questão, mas não há dados disponíveis para sua aplicação no Brasil. Em virtude disso, o método aqui apresentado identifica regiões que podem ser consideradas como uniformes em relação a uma variável a partir de técnicas de análise exploratória de dados espaciais, como por exemplo, o gráfico e o mapa de Moran. Em um estudo de caso para o estado de São Paulo, analisando-se as distribuições espaciais dos valores da densidade populacional por meio de sua representação em mapas temáticos classificados segundo os quadrantes do gráfico de Moran (ou box map), esse indicador permite caracterizar razoavelmente bem as regiões urbanas homogêneas existentes (inclusive as oficiais). Entretanto, ao tentar representar o seu comportamento em uma análise temporal por meio de modelos, o indicador populacional não foi capaz de descrever esse comportamento e, conseqüentemente, não serviu para elaborar estratégias de previsão para o futuro. Por outro lado, ao combinar essas informações com um indicador que representa a oferta de transportes, os resultados obtidos permitiram observar o alto desempenho dos modelos, dada a forte influência recíproca entre uso e ocupação do solo e oferta de transportes. Ao permitir a identificação de padrões e a projeção de tendências, este tipo de análise pode ser útil para o planejamento urbano e regional, tanto no contexto estudado como em uma visão mais abrangente. 
/ The objective of this work is to identify homogeneous urban regions through the application of two branches of spatial analysis: spatial statistics and a modeling strategy based on the comparison of information from different spatial entities at distinct levels. A commuting-based approach would be the best alternative in this case, but no data are available for its application in Brazil. Thus, the method presented here identifies regions that are uniform with respect to a particular variable through exploratory spatial data analysis techniques, such as the Moran scatter plot and box maps. In a case study carried out in the state of São Paulo, in which the spatial distribution of population density was analyzed through its representation in box maps, a reasonable identification of the existing homogeneous urban regions (including the official ones) was achieved. However, the models based only on the population density distribution did not perform well in analyses through time and were therefore not adequate for forecasting strategies. In contrast, when population density information was combined with an indicator of transportation supply, the performance of the models improved noticeably, which was likely caused by the strong reciprocal influence between land use and transportation supply. In conclusion, the method developed in this work should be useful for urban and regional planning at different scales, given its potential for pattern recognition and trend forecasting. Autômatos celulares Cellular automata ESDA ESDA Estatística espacial Homogeneous urban regions Land use Modelagem espacial Regiões urbanas homogêneas Spatial modeling Spatial statistics Uso e ocupação do solo
120	Regiões urbanas homogêneas e oferta de transportes / Homogeneous urban regions and transportation supply Gustavo Garcia Manzato 09 March 2007 (has links) O objetivo deste trabalho é identificar regiões urbanas homogêneas por meio da aplicação de duas vertentes da análise espacial: a estatística espacial e uma estratégia de modelagem espacial baseada na comparação de informações oriundas de diferentes entidades espaciais, em níveis diversos de informação. Um método baseado em fluxos de viagens seria a melhor alternativa para o problema em questão, mas não há dados disponíveis para sua aplicação no Brasil. Em virtude disso, o método aqui apresentado identifica regiões que podem ser consideradas como uniformes em relação a uma variável a partir de técnicas de análise exploratória de dados espaciais, como por exemplo, o gráfico e o mapa de Moran. Em um estudo de caso para o estado de São Paulo, analisando-se as distribuições espaciais dos valores da densidade populacional por meio de sua representação em mapas temáticos classificados segundo os quadrantes do gráfico de Moran (ou box map), esse indicador permite caracterizar razoavelmente bem as regiões urbanas homogêneas existentes (inclusive as oficiais). Entretanto, ao tentar representar o seu comportamento em uma análise temporal por meio de modelos, o indicador populacional não foi capaz de descrever esse comportamento e, conseqüentemente, não serviu para elaborar estratégias de previsão para o futuro. Por outro lado, ao combinar essas informações com um indicador que representa a oferta de transportes, os resultados obtidos permitiram observar o alto desempenho dos modelos, dada a forte influência recíproca entre uso e ocupação do solo e oferta de transportes. Ao permitir a identificação de padrões e a projeção de tendências, este tipo de análise pode ser útil para o planejamento urbano e regional, tanto no contexto estudado como em uma visão mais abrangente. 
/ The objective of this work is to identify homogeneous urban regions through the application of two branches of spatial analysis: spatial statistics and a modeling strategy based on the comparison of information from different spatial entities at distinct levels. A commuting-based approach would be the best alternative in this case, but no data are available for its application in Brazil. Thus, the method presented here identifies regions that are uniform with respect to a particular variable through exploratory spatial data analysis techniques, such as the Moran scatter plot and box maps. In a case study carried out in the state of São Paulo, in which the spatial distribution of population density was analyzed through its representation in box maps, a reasonable identification of the existing homogeneous urban regions (including the official ones) was achieved. However, the models based only on the population density distribution did not perform well in analyses through time and were therefore not adequate for forecasting strategies. In contrast, when population density information was combined with an indicator of transportation supply, the performance of the models improved noticeably, which was likely caused by the strong reciprocal influence between land use and transportation supply. In conclusion, the method developed in this work should be useful for urban and regional planning at different scales, given its potential for pattern recognition and trend forecasting. Autômatos celulares ESDA Estatística espacial Modelagem espacial Regiões urbanas homogêneas Uso e ocupação do solo Cellular automata ESDA Homogeneous urban regions Land use Spatial modeling Spatial statistics

Search results