1.
Expressive Forms of Topic Modeling to Support Digital Humanities. Gad, Samah Hossam Aldin, 15 October 2014.
Unstructured textual data is rapidly growing, and practitioners from diverse disciplines are experiencing a need to structure this massive amount of data. Topic modeling is one of the most widely used techniques for analyzing and understanding the latent structure of large text collections. Probabilistic graphical models are the main building block behind topic modeling; they are used to express assumptions about the latent structure of complex data. This dissertation addresses four problems related to drawing structure from high-dimensional data and improving the text mining process.
Studying the ebb and flow of ideas during critical events, e.g. an epidemic, is very important to understanding the reporting or coverage around the event and the impact of the event on society. This can be accomplished by capturing the dynamic evolution of the topics underlying a text corpus. We propose an approach to this problem that identifies segment boundaries marking significant shifts of topic coverage. To identify segment boundaries, we embed a temporal segmentation algorithm around a topic modeling algorithm to capture such shifts. A key advantage of our approach is that it integrates with existing topic modeling algorithms in a transparent manner; thus, more sophisticated algorithms can be readily plugged in as research in topic modeling evolves. We apply this algorithm to data from the iNeighbors system, examining six neighborhoods (three economically advantaged and three economically disadvantaged) to evaluate differences in conversations for statistical significance. Our findings suggest that social technologies may afford opportunities for democratic engagement in contexts that are otherwise less likely to support deliberation and participatory democracy. We also examine the progression in coverage of historical newspapers about the 1918 influenza epidemic by applying our algorithm to the Washington Times archives. The algorithm is successful in identifying important qualitative features of news coverage of the pandemic.
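The wrap-a-segmenter-around-a-topic-model idea can be illustrated with a toy sketch. This is not the dissertation's algorithm: the fixed topic count, scikit-learn's LDA, and the Jensen-Shannon divergence threshold are all illustrative assumptions of ours, but the sketch shows how a boundary test can plug transparently around an off-the-shelf topic model.

```python
# Hedged sketch: fit a topic model per time window, then flag a segment
# boundary wherever adjacent windows' average topic mixtures diverge.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def window_topic_mixture(docs, vectorizer, n_topics=3, seed=0):
    """Average document-topic distribution for one time window."""
    X = vectorizer.transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    theta = lda.fit_transform(X)                      # docs x topics
    return theta.mean(axis=0) / theta.mean(axis=0).sum()

def segment_boundaries(windows, threshold=0.3):
    """Indices i where coverage shifts significantly between window i and i+1."""
    vec = CountVectorizer().fit([d for w in windows for d in w])
    mixtures = [window_topic_mixture(w, vec) for w in windows]
    return [i for i in range(len(mixtures) - 1)
            if jensenshannon(mixtures[i], mixtures[i + 1]) > threshold]

# Toy corpus: coverage shifts from baseball to influenza between windows.
windows = [
    ["baseball game score", "team wins game", "score baseball team"],
    ["baseball team season", "game score team", "baseball game"],
    ["influenza epidemic spreads", "flu cases rise", "epidemic flu influenza"],
]
print(segment_boundaries(windows))
```

Because the topic model is only called through `fit_transform`, a more sophisticated model could be swapped in without touching the boundary test, which is the transparency property the abstract emphasizes.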
Visually conveying the results of data mining algorithms and models is crucial to analyzing them and drawing conclusions from them. We develop ThemeDelta, a visual analytics system for extracting and visualizing temporal trends, clustering, and reorganization in time-indexed textual datasets. ThemeDelta is supported by a dynamic temporal segmentation algorithm that integrates with topic modeling algorithms to identify change points where significant shifts in topics occur. This algorithm detects not only the clustering and associations of keywords in a time period, but also their convergence into topics (groups of keywords) that may later diverge into new groups. The visual representation of ThemeDelta uses sinuous, variable-width lines to show this evolution on a timeline, using color for categories and line width for keyword strength. We demonstrate how interaction with ThemeDelta helps capture the rise and fall of topics by analyzing archives of historical newspapers, of U.S. presidential campaign speeches, and of social messages collected through iNeighbors. ThemeDelta is evaluated in a qualitative expert user study involving three researchers from rhetoric and history using the historical newspapers corpus.
Time and location are key parameters in any event; neglecting them while discovering topics from a collection of documents results in missing valuable information. We propose a dynamic spatial topic model (DSTM), a true spatio-temporal model that enables disaggregating a corpus's coverage into location-based reporting, and understanding how such coverage varies over time. DSTM naturally generalizes traditional spatial and temporal topic models so that many existing formalisms can be viewed as special cases of DSTM. We demonstrate a successful application of DSTM to multiple newspapers from the Chronicling America repository. We demonstrate how our approach helps uncover key differences in the coverage of the flu as it spread through the nation, and provide possible explanations for such differences.
Major events that can change the flow of people's lives are important to predict, especially when we have powerful models and sufficient data available at our fingertips. Embedding the DSTM in a predictive setting is the last part of this dissertation. To predict events and their locations across time, we present a predictive dynamic spatial topic model that can predict future topics and their locations from unseen documents. We show the applicability of our proposed approach by applying it to streaming tweets from Latin America. The prediction approach was successful in identifying major events and their locations. Ph.D.
2.
Laser-based measurements of two-phase flashing propane jets. Allen, John Thomas, January 1998.
No description available.
3.
CFD-DEM modelling of two-phase pneumatic conveying with experimental validation. Ebrahimi, Mohammadreza, January 2014.
A wide range of industrial processes involve multiphase granular flows. These include catalytic reactions in fluidized beds, the pneumatic conveying of raw materials, and gas-particle separators. Due to the complex nature of multiphase flows and the lack of fundamental understanding of the phenomena in a multiphase system, appropriate design and optimized operation of such systems have remained a challenging field of research. Design of these processes is hampered by difficulties in upscaling pilot-scale results, in experimental measurement, and in finding reliable numerical modelling methods. Significant work has been carried out on numerical modelling of multiphase systems, but challenges remain, notably computational time, appropriate definition of boundary conditions, the relative significance of effects such as lift and turbulence, and the availability of reliable model validation. The work presented in this thesis encompasses experimental and numerical investigations of horizontal pneumatic conveying. In the experimental work, carefully controlled experiments were carried out in a 6.5 m long, 0.075 m diameter horizontal conveying line with the aid of laser Doppler anemometry (LDA). Initially, LDA measurements were performed to measure the gas velocity in clear flow. Good agreement was observed between theory and experimental measurements. For two-phase experiments, spherical and non-spherical particles with different sizes and densities were used to study the effect of particle size and solid loading ratio on the mean axial particle velocity. Three sizes of spherical glass beads, ranging from 0.9 mm to 2 mm, and cylindrical particles of size 1 x 1.5 mm were employed. It was found that increasing the particle size and solid loading ratio decreased the mean axial particle velocity.
Turbulence modulation of the carrier phase due to the presence of spherical particles was also investigated by measuring the fluctuating gas velocity for clear gas flow and particle-laden flow with different particle sizes and solid loading ratios. Results suggested that for the size ranges of particles tested, the level of gas turbulence intensity increased significantly when particles were added, and the higher the solid loading ratio, the higher the turbulence intensity. With the rapid advancement of computer resources and hardware, it is now possible to perform simulations of multiphase flows. For a fundamental understanding of the underlying phenomena in pneumatic conveying, the coupled Reynolds-averaged Navier-Stokes and discrete element method (RANS-DEM) was selected. The aim of the modelling section of this study was to evaluate the ability of coupled RANS-DEM to predict the phenomena occurring in a research-sized pneumatic conveying line. Simulations for both one-way and two-way RANS-DEM coupling were performed using the commercial coupled software FLUENT-EDEM in an Eulerian-Lagrangian framework, where the gas is simulated as a continuum while the solid phase is treated as discrete. In one-way coupling simulations, a considerable discrepancy in mean axial particle velocity was observed compared to the experimental results, meaning two-way coupling was required. It was further found that including the Magnus lift force due to particle rotation was essential to reproduce the general behaviour observed in the experiments. Turbulence modulation was also investigated numerically. Experimental and simulation results for gas and particle velocities were compared, showing that the RANS-DEM method is a promising way to simulate pneumatic conveying, although some discrepancy between simulation and experimental results remained. Most studies of two-phase flow have focused on spherical particles.
However, the majority of particles encountered in industry are non-spherical granules, which show considerably different transportation behaviour compared with spherical particles. Further modelling of cylindrical particles was therefore conducted using a multisphere model to represent cylindrical particles in the DEM code. The drag and lift force and torque equations were modified in the code to take the effect of particle orientation into account. The framework developed was evaluated on two test cases, showing good agreement with analytical and experimental results. The transportation of isometric (low-aspect-ratio) non-spherical particles in pneumatic conveying was also modelled. The simulated mean axial particle velocity agreed well with the experimental LDA measurements.
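One concrete ingredient of such a gas-particle coupling is the fluid-to-particle drag force. The sketch below uses the widely used Schiller-Naumann correlation for a sphere; this is an assumption for illustration, not necessarily the drag law used in the thesis, and the gas properties are illustrative values for air.

```python
# Hedged sketch: drag force on a spherical particle in a conveying gas
# stream, using the Schiller-Naumann correlation (valid for Re_p < ~1000).
import numpy as np

def drag_force(u_gas, v_particle, d_p, rho_g=1.2, mu_g=1.8e-5):
    """Drag force vector [N] on a spherical particle of diameter d_p [m]."""
    u_rel = np.asarray(u_gas, dtype=float) - np.asarray(v_particle, dtype=float)
    speed = np.linalg.norm(u_rel)
    re_p = rho_g * speed * d_p / mu_g                 # particle Reynolds number
    cd = 24.0 / re_p * (1.0 + 0.15 * re_p**0.687) if re_p > 0 else 0.0
    area = np.pi * d_p**2 / 4.0                       # projected frontal area
    return 0.5 * rho_g * cd * area * speed * u_rel

# A 1 mm glass bead lagging a 20 m/s conveying air stream by 5 m/s:
f = drag_force(u_gas=[20.0, 0, 0], v_particle=[15.0, 0, 0], d_p=1e-3)
print(f)   # drag acts along +x, accelerating the particle toward the gas velocity
```

In a DEM time step this force would be added to gravity and contact forces before integrating each particle's motion; in two-way coupling, its reaction would also be fed back to the gas momentum equation.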
4.
Predicting entrainment of mixed size sediment grains by probabilistic methods. Cunningham, Gavin James, January 2000.
The bedload transport of mixed-size sediment is an important process in river engineering. Bedload transport controls channel stability and has a significant bearing on the hydraulic roughness of the channel. The prediction of bedload transport traditionally relies upon defining some critical value of fluid force above which particles of a particular diameter are assumed to be put into transport. The underlying assumption is that the transport of bed material is size-dependent, with large grains more difficult to remove from the bed surface than small grains, and that all grains of the same size start to move under identical conditions. While it is relatively straightforward to assess the forces required to engender transport in a bed of uniform-size grains, it is not so simple where a number of different grain sizes are present. Numerous experimental studies have revealed that where a number of grain sizes are present, large grains tend to become mobilised under lower fluid forces, and small grains under higher fluid forces, than those required for beds of uniform material. These results led to the development of so-called hiding functions, which model the variation of a particle's mobility with its relative size within the mixture. These functions derive their name from the tendency of large grains to shelter smaller grains from the action of the flow. Determining the relative mobility of each fraction in a mixture under given hydraulic conditions is the key to predicting how the composition of the bedload will relate to that of the bed surface material. Experiments were carried out in a rectangular, glass-sided channel, in sediment recirculation mode, under varying hydraulic conditions with a set of six different sediment mixtures. Laser Doppler Anemometry (LDA) was used to obtain instantaneous velocity measurements at a number of locations in the flow.
A laser displacement meter was used to measure the detailed topography of small sections of the bed surface. Novel analysis techniques facilitated the determination of the grain size distribution of the bed surface by a grid-by-number method. The minimum force required to entrain each grain could also be estimated by a grain pivoting analysis. This information represents the resistance of the bed grains to erosion by flowing water. With the critical conditions for the bed grains known, it is possible to estimate the proportion of each fraction entrained from the bed surface under given hydraulic conditions. To estimate the bedload composition, it is first necessary to scale by the proportion each size comprises on the bed surface, and then by a function of grain diameter to account for the size dependency of travel velocity. For mean hydraulic conditions, the proportion of the bed mobilised can be determined simply by inspection of a cumulative distribution of critical conditions. In reality, although it may be possible to entrain some grains at the mean velocity or shear stress, the majority of transport may be expected to occur during high-magnitude events. Turbulence may be incorporated by adopting a probabilistic approach to the prediction of grain entrainment. By considering the joint probability distribution of bed shear stress and critical shear stress, one obtains the probability of grain entrainment. Comparing the probability of erosion of each fraction yields a prediction of the bedload composition. Results show that the probabilistic approach provides a significant improvement over deterministic methods for the prediction of bedload composition.
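The joint-probability step can be sketched under a simplifying assumption of ours (not necessarily the thesis's): if the turbulent bed shear stress and a grain's critical shear stress are independent and normally distributed, the entrainment probability P(tau > tau_c) has a closed form. The parameter values below are purely illustrative.

```python
# Hedged sketch of the probabilistic entrainment idea: entrainment occurs
# when the bed shear stress tau exceeds the grain's critical stress tau_c,
# so P(entrainment) = P(tau - tau_c > 0) for independent normal variables.
import math

def entrainment_probability(mu_tau, sigma_tau, mu_crit, sigma_crit):
    """P(tau > tau_c) for independent normal tau and tau_c."""
    mu_d = mu_tau - mu_crit                           # mean of the difference
    sigma_d = math.sqrt(sigma_tau**2 + sigma_crit**2) # std dev of the difference
    # Standard normal CDF evaluated at mu_d / sigma_d, via the error function.
    return 0.5 * (1.0 + math.erf(mu_d / (sigma_d * math.sqrt(2.0))))

# A small grain (low critical stress) vs a large grain, same turbulent flow:
p_small = entrainment_probability(mu_tau=2.0, sigma_tau=0.8, mu_crit=1.5, sigma_crit=0.3)
p_large = entrainment_probability(mu_tau=2.0, sigma_tau=0.8, mu_crit=3.0, sigma_crit=0.5)
print(p_small, p_large)   # the smaller grain is far more likely to move
```

Comparing such probabilities across size fractions, each scaled by its proportion on the bed surface, is exactly the route to a bedload-composition prediction that the abstract describes.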
5.
Topic modeling using latent Dirichlet allocation on disaster tweets. Patel, Virashree Hrushikesh, January 1900.
Master of Science / Department of Computer Science / Cornelia Caragea / Doina Caragea / Social media has changed the way people communicate information. It has been noted that social media platforms like Twitter are increasingly being used by people and authorities in the wake of natural disasters. The year 2017 was a historic one for the USA in terms of natural calamities and associated costs. According to NOAA (National Oceanic and Atmospheric Administration), during 2017 the USA experienced 16 separate billion-dollar disaster events, including three tropical cyclones, eight severe storms, two inland floods, a crop freeze, drought, and wildfire. During natural disasters, due to the collapse of infrastructure and telecommunication, it is often hard to reach out to people in need or to determine which areas are affected. In such situations, Twitter can be a lifesaving tool for local government and search and rescue agencies. Using the Twitter streaming API service, disaster-related tweets can be collected and analyzed in real time. Although tweets received from Twitter can be sparse, noisy, and ambiguous, some may contain useful information with respect to situational awareness. For example, some tweets express emotions, such as grief, anguish, or calls for help; other tweets provide information specific to a region, place, or person; while others simply help spread information from news or environmental agencies. To extract information useful for disaster response teams from tweets, disaster tweets need to be cleaned and classified into various categories. Topic modeling can help identify topics from a collection of such disaster tweets; subsequently, a topic (or a set of topics) is associated with each tweet. Thus, in this report, we use Latent Dirichlet Allocation (LDA) to accomplish topic modeling for a disaster tweets dataset.
6.
Non-quality and its impact on the production process at Volkswagen Autoeuropa. Oliveira, Quirina Verónica Ferreira, January 2011.
Integrated master's thesis. Industrial Engineering and Management. Universidade do Porto, Faculdade de Engenharia, 2011.
7.
Numerical Simulation of the Flow Field in 3D Eccentric Annular and 2D Centered Labyrinth Seals for Comparison with Experimental LDA Data. Vijaykumar, Anand, December 2010.
The flow field in an annular seal is simulated for synchronous circular whirl orbits with 60 Hz whirl frequency and a clearance/radius ratio of 0.0154 using the Fluent Computational Fluid Dynamics (CFD) code. Fluent's Moving Reference Frame (MRF) model is used to render the flow quasi-steady by transforming to a rotating frame. The computed flow fields for velocity, pressure, and shear stress are compared with the experimental data of Winslow, Thames, and Cusano. The CFD predictions are found to be in good agreement with the experimental results, and the present CFD methodology can be extended to other whirl frequencies and clearances. The dynamic wall pressure distributions in an annular seal for non-circular whirl orbits were also obtained using CFD. These simulations were performed with a time-dependent solver using Fluent's Dynamic Mesh model and User Defined Functions (UDFs). The wall pressure distributions obtained from the simulations are compared with the data of Cusano. The CFD simulations overpredicted the pressure field compared to the experimental results; however, the general trends in the pressure contours are similar. The flow fields for varying rotor eccentricities are also studied by performing coordinate transformations and rendering the flow quasi-steady at set eccentricities using Fluent's MRF model. The computed velocity and pressure fields are compared with the time-dependent solution obtained using Fluent's Dynamic Mesh model and UDFs at the same eccentricity. Good agreement in the velocity fields is obtained; however, the pressure fields require further investigation. 2D labyrinth seal simulations were performed for comparison with experimental LDA data from Johnson. The velocity fields match the experimental LDA data to a fair extent; however, the Fluent simulations underpredicted the secondary recirculation zones in the Labyrinth Backward Swirl (LBS) case.
8.
Experimental and Numerical Investigations of Velocity and Turbulent Quantities of a Jet Diffusion Flame. Piro, Markus Hans, 10 October 2007.
A turbulent diffusion flame of the type typically used in a thermal spray coating system was analyzed in this study, as part of a diagnostic and development program undertaken by a research group at Queen's University. This researcher's contributions were to investigate, numerically and experimentally, the velocity and turbulent fields of the gaseous phase of the jet. The numerical and experimental analyses build upon previous research, with improved numerical methods and advanced experimental instrumentation. Numerous numerical simulations were performed in both two-dimensional axisymmetric and three-dimensional wedge geometries, testing the dependence of the final solution on various physical models. The numerical analyses revealed the need to simulate this problem in three dimensions and to improve the turbulence modeling to account for relatively high levels of anisotropy. Velocity and turbulence measurements of non-reacting and combusting jets were made with a laser Doppler anemometer to validate the numerical models. Excellent agreement was found between predicted and measured velocity and turbulent quantities for the cold flow cases. However, numerical predictions did not agree as well with experiments on the flame, due to limitations in the modeling techniques and in the flow tracking abilities of the tracer particles used in experimentation. Thesis (Master's, Mechanical and Materials Engineering), Queen's University, 2007.
9.
Dimensionality Reduction of Hyperspectral Signatures for Optimized Detection of Invasive Species. Mathur, Abhinav, 13 December 2002.
The aim of this thesis is to investigate the use of hyperspectral reflectance signals for the discrimination of cogongrass (Imperata cylindrica) from other subtly different vegetation species. Receiver operating characteristic (ROC) curves are used to determine which spectral bands should be considered as candidate features. Multivariate statistical analysis is then applied to the candidate features to determine the optimum subset of spectral bands. Linear discriminant analysis (LDA) is used to compute the optimum linear combination of the selected subset to be used as a feature for classification. Similarly, for comparison purposes, ROC analysis, multivariate statistical analysis, and LDA are used to determine the most advantageous discrete wavelet coefficients for classification. The overall system was applied to hyperspectral signatures collected with a handheld spectroradiometer (ASD) and to simulated satellite signatures (Hyperion). Leave-one-out testing of a nearest-mean classifier on the ASD data shows that cogongrass can be detected amongst various other grasses with an accuracy as high as 87.86% using just the pure spectral bands, and with an accuracy of 92.77% using the Haar wavelet decomposition coefficients. Similarly, the Hyperion signatures resulted in classification accuracies of 92.20% using just the pure spectral bands and 96.82% using the Haar wavelet decomposition coefficients. These results show that hyperspectral reflectance signals can be used to reliably distinguish cogongrass from subtly different vegetation.
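The band-selection-then-LDA pipeline described above can be sketched end to end on synthetic data: rank bands by per-band ROC AUC, project the best bands with Fisher LDA, and score a nearest-mean classifier with leave-one-out testing. The band count, data, and every threshold below are illustrative assumptions of ours, not the thesis's values.

```python
# Hedged sketch of ROC-based band selection + LDA + nearest-mean LOO testing.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, bands = 60, 50
y = np.repeat([0, 1], n // 2)            # 0 = other grass, 1 = cogongrass
X = rng.normal(size=(n, bands))          # synthetic stand-in for reflectance
X[y == 1, 10:15] += 1.5                  # a few informative spectral bands

# 1) Candidate bands: the 10 highest per-band ROC AUC scores (two-sided).
auc = np.array([roc_auc_score(y, X[:, j]) for j in range(bands)])
best = np.argsort(np.maximum(auc, 1 - auc))[-10:]

# 2) LDA projection feeding a nearest-mean classifier, leave-one-out accuracy.
clf = make_pipeline(LinearDiscriminantAnalysis(), NearestCentroid())
hits = sum(
    clf.fit(X[tr][:, best], y[tr]).predict(X[te][:, best])[0] == y[te][0]
    for tr, te in LeaveOneOut().split(X)
)
print(f"LOO accuracy: {hits / n:.2%}")
```

For the wavelet variant, the columns of `X` would simply be replaced by Haar decomposition coefficients of each signature before the same selection and classification steps.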
10.
Infrared microspectroscopy of inflammatory processes and colon tumors. Lima, Fabricio Augusto de, 17 March 2016.
According to the last global burden of disease published by the World Health Organization, tumors were the third leading cause of death worldwide in 2004. Among the different types of tumors, colorectal cancer ranks as the fourth most lethal. To date, tumor diagnosis is based mainly on the identification of morphological changes in tissues. Considering that these changes appear only after many biochemical reactions, the development of vibrational techniques may contribute to the early detection of tumors, since they are able to detect such reactions. The present study aimed to develop a methodology based on infrared microspectroscopy to characterize colon samples, providing complementary information to the pathologist and facilitating the early diagnosis of tumors. The study groups were composed of human colon samples obtained from paraffin-embedded biopsies, divided into normal (n=20), inflammation (n=17), and tumor (n=18). Two adjacent slices were acquired from each block. The first was subjected to chemical dewaxing and H&E staining. Infrared imaging was performed on the second slice, which was neither dewaxed nor stained. A computational preprocessing methodology was employed to identify the paraffin in the images and to perform spectral baseline correction; it was adapted to include two types of spectral quality control. After the preprocessing step, spectra belonging to the same image were analyzed and grouped according to their biochemical similarities. A pathologist associated each resulting group with a histological structure based on the H&E-stained slice. This analysis highlighted the biochemical differences between the three studied groups. Results showed that severe inflammation presents biochemical features similar to those of tumors, indicating that tumors can develop from inflammatory processes. A spectral database was constructed containing the biochemical information identified in the previous step.
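Two steps of such a pipeline, baseline correction and similarity-based grouping of spectra, can be sketched as follows. The straight-line baseline and k-means grouping here are simplifying stand-ins of ours for the thesis's actual preprocessing and clustering, and the spectra are synthetic.

```python
# Illustrative sketch: linear baseline correction per spectrum, then
# k-means grouping of spectra by similarity. Real FTIR preprocessing
# (paraffin handling, quality control) is considerably more involved.
import numpy as np
from sklearn.cluster import KMeans

def baseline_correct(spectrum):
    """Subtract the straight line joining a spectrum's two endpoints."""
    baseline = np.linspace(spectrum[0], spectrum[-1], len(spectrum))
    return spectrum - baseline

rng = np.random.default_rng(1)
wn = np.linspace(1000, 1800, 200)             # wavenumber axis (cm^-1)

# Two synthetic "tissue types": absorption peaks at different wavenumbers,
# each spectrum riding on a random sloped baseline.
def peak(center):
    return np.exp(-((wn - center) / 30.0) ** 2)

spectra = np.array(
    [peak(1230) + rng.uniform(0, 0.5) * (wn - 1000) / 800 for _ in range(10)] +
    [peak(1650) + rng.uniform(0, 0.5) * (wn - 1000) / 800 for _ in range(10)]
)

corrected = np.array([baseline_correct(s) for s in spectra])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(corrected)
print(labels)   # spectra sharing the same peak should share a cluster label
```

In the study itself, each such group was shown to a pathologist, who tied it back to a histological structure on the adjacent H&E-stained slice.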
Spectra obtained from new samples were compared against the database, leading to their classification into one of the three groups: normal, inflammation, or tumor. Internal and external validation were performed based on classification sensitivity, specificity, and accuracy. Comparison between the classification results and the H&E-stained sections revealed some discrepancies. Some histologically normal regions were identified as inflammation by the classification algorithm; similarly, some regions presenting inflammatory lesions in the stained section were classified into the tumor group. These differences were counted as misclassifications, but they may actually be evidence that biochemical changes are under way in the analyzed sample. In the latter case, the method developed throughout this thesis would have proven able to identify early stages of inflammatory and tumor lesions. Additional experiments are needed to resolve this discrepancy between the classification results and the morphological features. One solution would be immunohistochemistry with specific markers for tumor and inflammation. Another option is to recover the medical records of the patients who participated in this study in order to check, at times later than the biopsy collection, whether they actually developed the lesions supposedly detected in this research.
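The sensitivity, specificity, and accuracy used for validation above are simple functions of the confusion-matrix counts; the counts below are hypothetical, not results from the study.

```python
# Hedged sketch: the three validation metrics from a 2x2 confusion matrix.
def binary_metrics(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# e.g. "tumor" vs "not tumor" for 100 hypothetical spectra:
sens, spec, acc = binary_metrics(tp=40, fn=5, fp=8, tn=47)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# prints: sensitivity=0.89 specificity=0.85 accuracy=0.87
```

For the three-class problem (normal, inflammation, tumor), each metric would be computed one-vs-rest per class in the same way.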