151 |
PDF document search within a very large database
Wang, Lizhong January 2017
A digital search engine, which takes a search request from a user and returns a result in response, is indispensable for modern people who are used to surfing the Internet. At the same time, the PDF digital document format is accepted by more and more people and has become widely used because of its convenience and effectiveness. As a result, the traditional library has already started to be replaced by the digital one. Taken together, these two factors create an urgent need for a document-based search engine that can query a digital document database with an input file. This thesis is a software development project that aims to design and implement a prototype of such a search engine and to propose potential optimization methods for Loredge. The research is divided into two parts: prototype development and optimization analysis. It involves an analytical study of sample documents provided by Loredge and a multi-perspective performance analysis. The prototype consists of reading, preprocessing and similarity measurement. The reading part reads in a PDF file using the imported Java library Apache PDFBox. The preprocessing part processes the document that has been read in and generates a document fingerprint. The similarity measurement is the final stage; it measures the similarity between the input fingerprint and all the document fingerprints in the database. The optimization analysis aims to balance resource consumption, which involves response time, accuracy rate and memory consumption. According to the performance analysis, the shorter the document fingerprint, the better the search program performs. Moreover, a permanent feature database and a similarity-based filtration mechanism are proposed to further optimize the program. This project has laid a solid foundation for further study of document-based search engines by providing a feasible prototype and sufficient relevant experimental data. The study concludes that future work should focus mainly on improving the effectiveness of database access, which involves data entry labeling and search algorithm optimization.
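The fingerprint-and-similarity pipeline described in this abstract can be illustrated with a minimal sketch. It assumes the text has already been extracted from the PDF (the thesis uses Apache PDFBox for that step), and it assumes a fingerprint made of the most frequent words compared by cosine similarity; the abstract does not state which fingerprint or similarity measure the prototype uses, so the function names, the `size` parameter and both choices are hypothetical.

```python
import math
import re
from collections import Counter


def fingerprint(text, size=5):
    """Return the `size` most frequent words of `text` with their counts.

    Assumption: a word-frequency fingerprint; the thesis may use another one.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(words).most_common(size))


def cosine_similarity(fp_a, fp_b):
    """Cosine similarity between two sparse term-count fingerprints."""
    dot = sum(count * fp_b.get(term, 0) for term, count in fp_a.items())
    norm_a = math.sqrt(sum(c * c for c in fp_a.values()))
    norm_b = math.sqrt(sum(c * c for c in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_text, database, size=5):
    """Rank database entries (name -> text) by similarity to the query."""
    query_fp = fingerprint(query_text, size)
    scores = {
        name: cosine_similarity(query_fp, fingerprint(text, size))
        for name, text in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A smaller `size` shortens every fingerprint, which reduces memory use and comparison time at the possible cost of accuracy; this is the trade-off the performance analysis in the abstract refers to. Storing the fingerprints instead of recomputing them on every query would correspond to the proposed permanent feature database.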
|
152 |
Modeling and Analysis of a Feedstock Logistics Problem
Judd, Jason D. 02 May 2012
Recently, there has been a surge in the research and application of "Green energy" in the United States. This has been driven by the following three objectives: (1) to reduce the nation's reliance on foreign oil, (2) to mitigate greenhouse gas emissions, and (3) to create an economic stimulus within the United States. Switchgrass is the biomass of choice for the Southeastern United States. In this dissertation, we address a feedstock logistics problem associated with the delivery of switchgrass for conversion into biofuel. In order to satisfy the continual demand for biomass at a bioenergy plant, production fields within a 48-km radius of its location are assumed to be attracted into production. The bioenergy plant is expected to receive as many as 50-400 loads of biomass per day. As a result, an industrialized transportation system must be introduced as early as possible in order to remove bottlenecks and reduce the total system cost. Additionally, we consider locating multiple bioenergy plants within a given region for the production of biofuel. We develop mixed integer programming formulations for the feedstock logistics problem that we address and for some related problems, and we solve them either through the use of decomposition-based methods or directly through the use of CPLEX 12.1.0.
The feedstock logistics problem that we address spans the entire system, from the growing of switchgrass to the transporting of bio-crude oil, a high energy density intermediate product, to a refinery for conversion into a final product. To facilitate understanding, we present the reader with a case study that includes a preliminary cost analysis of a real-life-based instance in order to provide appropriate insight into the logistics system before applying optimization techniques for its solution. First, we consider the benefits of active versus passive ownership of the production fields. This is followed by a discussion on the selection of baler type, and then a discussion of contracts between various business entities. The advantages of storing biomass at a satellite storage location (SSL) and the interactions between the operations performed at the production field and those performed at the storage locations are then established. We also provide a detailed description of the operations performed at an SSL. Three potential equipment options are presented for transporting biomass from the SSLs to a utilization point, defined in this study as a Bio-crude Plant (BcP). The details of the entire logistics chain are presented in order to highlight the need for making decisions in view of the entire chain rather than basing them on its segments.
We model the feedstock logistics problem as a combination of a 2-level facility location-allocation problem and a multiple asymmetric traveling salesmen problem (mATSP). The 2-level facility location-allocation problem pertains to the allocation of production fields to SSLs and of SSLs to one of the multiple bioenergy plants. The mATSP arises because of the need for scheduling unloading operations at the SSLs. To this end, we provide a detailed study of 13 formulations of the mATSP and their reformulations as ATSPs. First, we assume that the SSLs are always full, regardless of when they are scheduled to be unloaded. We then relax this assumption by providing precedence constraints on the availability of the SSLs. This precedence is defined in two different ways and is then effectively modeled utilizing all the formulations for the mATSP and ATSP.
Given the location of a BcP for the conversion of biomass to bio-crude oil, we develop a feedstock logistics system that relies on the use of SSLs for temporary storage and loading of round bales. Three equipment systems are considered for handling biomass at the SSLs; they are either placed permanently or are mobile and thereby travel from one SSL to another. We use a mathematical programming-based approach to determine SSLs and equipment routes in order to minimize the total cost incurred. The mathematical program is applied to a real-life production region in South-central Virginia (Gretna, VA), and it clearly reveals the benefits of using SSLs as a part of the logistics system. Finally, we provide a sensitivity analysis on the input parameters that we used. This analysis highlights the key cost factors in the model, and it emphasizes areas where the biggest gains can be achieved for further cost reduction.
For a more general scenario, where multiple BcPs have to be located, we use a nested Benders' decomposition-based method. First, we prove the validity of using this method. We then employ this method for the solution of a potential real-life instance. Moreover, we successfully solve problems that are more than an order of magnitude larger than those solved directly by CPLEX 12.1.0.
Finally, we develop a Benders' decomposition-based method for the solution of a problem that gives rise to a binary sub-problem. The difficulty arises because of the sub-problem being an integer program for which the dual solution is not readily available. Our approach consists of first solving the integer sub-problem, and then, generating the convex hull at the optimal integer point. We illustrate this approach for an instance for which such a convex hull is readily available, but otherwise, it is too expensive to generate for the entire problem. This special instance is the solution of the mATSP (using Benders' decomposition) for which each of the sub-problems is an ATSP. The convex hull for the ATSP is given by the Dantzig, Fulkerson, and Johnson constraints. These constraints at a given integer solution point are only polynomial in number. With the inclusion of these constraints, a linear programming solution and its corresponding dual solution can now be obtained at the optimal integer points. We have proven the validity of using this method. However, the success of our algorithm is limited because of a large number of integer problems that must be solved at every iteration. While the algorithm is theoretically promising, the advantages of the decomposition do not seem to outweigh the additional cost resulting from solving a larger number of decomposed problems. / Ph. D.
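The Dantzig, Fulkerson, and Johnson (DFJ) constraints mentioned above require, for every proper subset S of at least two nodes, that the arcs selected inside S number at most |S| - 1. The following minimal sketch shows why only a few of them matter at an integer point: an integer assignment solution splits into disjoint cycles, and each cycle that is not a full tour violates exactly the DFJ constraint on its own node set. The `successor` dictionary representation and both function names are assumptions made for illustration; this is not the dissertation's algorithm, which additionally uses the constraints to recover dual information inside Benders' decomposition.

```python
def find_cycles(successor):
    """Split an assignment solution (node -> next node) into its cycles.

    Assumes `successor` is a permutation: every node has exactly one
    successor and one predecessor.
    """
    unvisited = set(successor)
    cycles = []
    while unvisited:
        node = min(unvisited)
        cycle = []
        while node in unvisited:
            unvisited.remove(node)
            cycle.append(node)
            node = successor[node]
        cycles.append(cycle)
    return cycles


def violated_dfj_cuts(successor):
    """Return (subset, rhs) pairs for DFJ constraints the solution violates.

    Each pair encodes: sum of x[i][j] over i, j in subset <= rhs.
    """
    cycles = find_cycles(successor)
    if len(cycles) == 1:
        return []  # a single Hamiltonian cycle violates no subtour constraint
    return [(sorted(cycle), len(cycle) - 1) for cycle in cycles]
```

For example, the solution `{0: 1, 1: 0, 2: 3, 3: 4, 4: 2}` contains the subtours 0-1 and 2-3-4, so the sketch reports one constraint for each of those two node sets.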
|
153 |
Generation and Optimization of Local Shape Descriptors for Point Matching in 3-D Surfaces
Taati, Babak 01 September 2009
We formulate Local Shape Descriptor selection for model-based object recognition in range data as an optimization problem and offer a platform that facilitates a solution. The goal of object recognition is to identify and localize objects of interest in an image. Recognition is often performed in three phases: point matching, where correspondences are established between points on the 3-D surfaces of the models and the range image; hypothesis generation, where rough alignments are found between the image and the visible models; and pose refinement, where the accuracy of the initial alignments is improved. The overall efficiency and reliability of a recognition system are strongly influenced by the effectiveness of the point matching phase. Local Shape Descriptors are used for establishing point correspondences by encapsulating local shape, such that similarity between two descriptors indicates geometric similarity between their respective neighbourhoods.
We present a generalized platform for constructing local shape descriptors that subsumes a large class of existing methods and allows for tuning descriptors to the geometry of specific models and to sensor characteristics. Our descriptors, termed Variable-Dimensional Local Shape Descriptors, are constructed as multivariate observations of several local properties and are represented as histograms. The optimal set of properties, which maximizes the performance of a recognition system, depends on the geometry of the objects of interest and the noise characteristics of range image acquisition devices, and is selected through pre-processing the models and sample training images. Experimental analysis confirms the superiority of optimized descriptors over generic ones in recognition tasks in LIDAR and dense stereo range images. / Thesis (Ph.D, Electrical & Computer Engineering) -- Queen's University, 2009-09-01 11:07:32.084
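A histogram-based local descriptor of the kind described above can be sketched as follows. The two properties chosen here (distance to the reference point and the cosine between surface normals), the bin count, and the histogram-intersection similarity are assumptions made for illustration; the thesis selects the property set by optimization, and its actual properties and comparison measure are not given in the abstract.

```python
import math


def descriptor(point, normal, neighbours, radius, bins=4):
    """Joint histogram of two local properties around `point`.

    neighbours: list of (position, unit_normal) pairs.
    Property 1: distance from `point`, in [0, radius].
    Property 2: cosine between the neighbour normal and `normal`, in [-1, 1].
    """
    hist = [[0.0] * bins for _ in range(bins)]
    count = 0
    for pos, nrm in neighbours:
        dist = math.dist(point, pos)
        if dist > radius:
            continue  # outside the support region of the descriptor
        cos_angle = sum(a * b for a, b in zip(normal, nrm))
        cos_angle = max(-1.0, min(1.0, cos_angle))
        i = min(int(dist / radius * bins), bins - 1)
        j = min(int((cos_angle + 1.0) / 2.0 * bins), bins - 1)
        hist[i][j] += 1.0
        count += 1
    if count:
        hist = [[v / count for v in row] for row in hist]
    return hist


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical normalised histograms."""
    return sum(
        min(a, b)
        for row_a, row_b in zip(hist_a, hist_b)
        for a, b in zip(row_a, row_b)
    )
```

Adding or removing a property changes the dimensionality of the histogram, which is one way to read the term "variable-dimensional" in the abstract.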
|
154 |
Analýza dat síťové komunikace mobilních zařízení / Analysis of Mobile Devices Network Communication Data
Abraham, Lukáš January 2020
The thesis first describes the DNS and SSL/TLS protocols, focusing on communication between devices that use them. It then covers data preprocessing and data cleaning, followed by basic data mining techniques such as classification, association rules, information retrieval, regression analysis and cluster analysis. The next chapter explains how mobile devices can be identified on a network. The data sets used in the practical part, which contain data collected from communication over the protocols mentioned above, are then evaluated. Next comes the design of a system for analyzing network communication data, together with a description of the libraries used and of the whole system implementation. Finally, a large number of experiments are performed and evaluated.
|
155 |
Vyhledání význačných bodů v rastrovém obraze / Searching for Points of Interest in Raster Image
Kaněčka, Petr Unknown Date
This document deals with the possibilities for detecting points of interest in images, with an emphasis on corner detectors. Many computer vision applications need these points as a necessary step in image processing. The document explains why finding these points is useful and presents some basic methods for finding them. The features of these methods are compared at the end.
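As one concrete example of a corner detector, the sketch below computes the Harris corner response in plain Python. The abstract does not say which detectors the thesis covers, so the choice of Harris, the constant `k`, the window size and the function name are all assumptions made for illustration.

```python
def harris_response(image, k=0.05, window=1):
    """Harris corner response for a 2-D list of grey values.

    Gradients use central differences; the structure tensor is summed over a
    (2 * window + 1) x (2 * window + 1) neighbourhood. Pixels too close to
    the border to have a full neighbourhood of valid gradients stay at 0.
    """
    h, w = len(image), len(image[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    margin = window + 1
    for y in range(margin, h - margin):
        for x in range(margin, w - margin):
            sxx = syy = sxy = 0.0
            for dy in range(-window, window + 1):
                for dx in range(-window, window + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response
```

The response is positive where the intensity changes in two directions (a corner), negative where it changes in only one (an edge), and zero in flat regions, so thresholding the positive values yields candidate points of interest.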
|
156 |
Human Pose and Action Recognition using Negative Space Analysis
Janse Van Vuuren, Michaella 12 1900
This thesis proposes a novel approach to extracting pose information from image sequences. Current state-of-the-art techniques focus exclusively on the image space occupied by the body for pose and action recognition. The method proposed here, however, focuses on the negative spaces: the areas surrounding the individual. This has resulted in the colour-coded negative space approach, an image preprocessing step that circumvents the need for complicated model fitting or template matching methods. The approach can be described as follows: negative spaces surrounding the human silhouette are extracted using horizontal and vertical scanning processes. These negative space areas are more numerous, and undergo more radical changes in shape, than the single area occupied by the figure of the person performing an action. The colour-coded negative space representation is formed using the four binary images produced by the scanning processes. Features are then extracted from the colour-coded images. These are based on the percentage of area occupied by distinct coloured regions as well as the bounding box proportions. Pose clusters are identified using feedback from an independent action set. Subsequent images are classified using a simple Euclidean distance measure. An image sequence is thus temporally segmented into its corresponding pose representations. Action recognition simply becomes the detection of a temporally ordered sequence of poses that characterises the action. The method is purely vision-based, utilising monocular images with no need for body markers or special clothing. Two datasets were constructed using several actors performing different poses and actions. Some of these actions included actors waving their arms, sitting down or kicking a leg. These actions were recorded against a monochrome background to simplify the segmentation of the actors from the background. The actions were then recorded on DV cam and digitised into a database.
The silhouette images from these actions were isolated and placed in a frame or bounding box. The next step was to highlight the negative spaces using a directional scanning method. This scanning method colour-codes the negative spaces of each action. What became immediately apparent is that very distinctive colour patterns formed for different actions. To emphasise the action, different colours were allocated to the negative spaces surrounding the image. For example, the space between the legs of an actor standing in a T-pose with legs apart would be allocated yellow, while the space below the arms was allocated different shades of green. The space surrounding the head would be different shades of purple. During an action in which the actor moves one leg up in a kicking fashion, the yellow colour would increase. Conversely, when the actor closes his legs and puts them together, the yellow colour filling the negative space would decrease substantially. What also became apparent is that these coloured negative spaces are interdependent and that they influence each other during the course of an action. For example, when an actor lifts one of his legs, increasing the yellow-coded negative space, the green space between that leg and the arm decreases. This interrelationship between colours holds true for all poses and actions presented in this thesis. In terms of pose recognition, it is significant that these colour-coded negative spaces, and the way they change during an action or a movement, are substantial and instantly recognisable. Compare, for example, looking at someone lifting an arm with seeing a vast negative space changing shape. In a controlled research environment, several actors were instructed to perform a number of different actions. After colour coding the negative spaces, it became apparent that every action can be recognised by a unique colour-coded pattern.
The challenge is to ascribe a numerical representation, a mathematical formulation, to extract the essence of what is so visually apparent. The essence of pose recognition and its measurability lies in the relationship between the colours in these negative spaces and how they impact on each other during a pose or an action. The simplest way of measuring this relationship is by calculating the percentage of each colour present during an action. These calculated percentages become the basis of pose and action recognition. Plotting these percentages on a graph confirms that the essence of these different actions and poses can in fact be captured and recognised. Despite variations in these traces caused by time differences, personal appearance and mannerisms, what emerged is a clear, recognisable pattern that can be married to an action or different parts of an action. Actors might lift their left leg, some slightly higher than others, some slower than others, and these variations in terms of colour percentages would be recorded as a trace, but there would be very specific stages during the action where the traces would correspond, making the action recognisable. In conclusion, using negative space as a tool in human pose and tracking recognition presents an exciting research avenue because it is influenced less by variations such as differences in personal appearance and changes in the angle of observation. This approach is also simple and does not rely on complicated models and templates.
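The feature extraction and classification steps described in this abstract (colour percentages, nearest pose cluster by Euclidean distance, temporal segmentation into poses) can be sketched as follows. The sketch assumes each frame has already been colour-coded into a 2-D grid of labels; the label names, the pose centres and all function names are hypothetical, and the bounding-box features mentioned in the abstract are omitted.

```python
import math
from collections import Counter


def colour_percentages(label_image, colours):
    """Percentage of pixels carrying each colour label, in `colours` order."""
    counts = Counter(label for row in label_image for label in row)
    total = sum(len(row) for row in label_image)
    return [100.0 * counts[c] / total for c in colours]


def classify_pose(features, pose_centres):
    """Return the pose whose cluster centre is nearest in Euclidean distance."""
    return min(
        pose_centres,
        key=lambda pose: math.dist(features, pose_centres[pose]),
    )


def segment_sequence(frames, colours, pose_centres):
    """Label every frame, then merge runs of equal labels into pose segments."""
    segments = []
    for frame in frames:
        pose = classify_pose(colour_percentages(frame, colours), pose_centres)
        if not segments or segments[-1] != pose:
            segments.append(pose)
    return segments
```

An action can then be recognised by checking whether the list returned by `segment_sequence` contains the ordered pose sequence that characterises it, for instance standing, kicking, standing.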
|
157 |
GIS-based Episode Reconstruction Using GPS Data for Activity Analysis and Route Choice Modeling / GIS-based Episode Reconstruction Using GPS Data
Dalumpines, Ron 26 September 2014
Most transportation problems arise from individual travel decisions. In response, transportation researchers have been studying individual travel behavior, a growing trend that requires activity data at the individual level. Global positioning systems (GPS) and geographical information systems (GIS) have been used to capture and process individual activity data, from determining activity locations to mapping routes to these locations. Potential applications of GPS data seem limitless, but our tools and methods for making these data usable lag behind. In response to this need, this dissertation presents a GIS-based toolkit to automatically extract activity episodes from GPS data and derive information related to these episodes from additional data (e.g., road network, land use).
The major emphasis of this dissertation is the development of a toolkit for extracting information associated with the movements of individuals from GPS data. To be effective, the toolkit has been developed around three design principles: transferability, modularity, and scalability. Two substantive chapters focus on selected components of the toolkit (map-matching, mode detection); another covers the entire toolkit. The final substantive chapter demonstrates the toolkit’s potential by comparing route choice models of work and shop trips using inputs generated by the toolkit.
There are several tools and methods that capitalize on GPS data, developed within different problem domains. This dissertation contributes to that repository of tools and methods by presenting a suite of tools that can extract all possible information that can be derived from GPS data. Unlike existing tools cited in the transportation literature, the toolkit has been designed to be complete (covers preprocessing up to extracting route attributes), and can work with GPS data alone or in combination with additional data. Moreover, this dissertation contributes to our understanding of route choice decisions for work and shop trips by looking into the combined effects of route attributes and individual characteristics. / Dissertation / Doctor of Philosophy (PhD)
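The idea of extracting episodes from raw GPS data can be illustrated with a minimal sketch that splits a trace into stop and travel episodes using a speed threshold. The threshold value, the tuple layout and the function names are assumptions made for illustration; the dissertation's toolkit is GIS-based and also performs map-matching and mode detection, which this sketch does not attempt.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    earth_radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius * math.asin(math.sqrt(a))


def extract_episodes(points, stop_speed=0.5):
    """Group consecutive GPS fixes into 'stop' and 'travel' episodes.

    points: list of (timestamp_s, lat, lon), sorted by time.
    stop_speed: speed in m/s below which a segment counts as a stop
                (an assumed threshold, not taken from the dissertation).
    Returns a list of (kind, start_time, end_time).
    """
    episodes = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order fixes
        speed = haversine_m(la0, lo0, la1, lo1) / (t1 - t0)
        kind = "stop" if speed < stop_speed else "travel"
        if episodes and episodes[-1][0] == kind:
            episodes[-1] = (kind, episodes[-1][1], t1)
        else:
            episodes.append((kind, t0, t1))
    return episodes
```

Each travel episode could then be passed to later stages such as map-matching against a road network, while stop episodes are candidates for activity locations.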
|
158 |
Railway curve squeal: Statistical analysis of train speed impact on squeal noise
Asplund, Ruben January 2024
No description available.
|