  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

Bioinformatics-inspired binary image correlation: application to bio-/medical-images, microarrays, fingerprints and signature classifications

Unknown Date (has links)
The efforts addressed in this thesis refer to assaying the extent of local features in 2D images for the purpose of recognition and classification. The approach compares a test image against a template in binary format. It is a bioinformatics-inspired approach, pursued and presented as the deliverables of this thesis summarized below: 1. By applying the 'Smith-Waterman (SW) local alignment' and 'Needleman-Wunsch (NW) global alignment' approaches of bioinformatics, a test 2D image in binary format is compared against a reference image so as to recognize the differential features that reside locally in the images being compared. 2. The SW- and NW-based binary comparison involves converting the one-dimensional sequence-alignment procedure (traditionally indicated for molecular sequence comparison in bioinformatics) to operate on 2D image matrices. 3. The relevant computational algorithms are implemented as MATLAB code. 4. The test images considered are real-world bio-/medical images, synthetic images, microarrays, biometric fingerprints (thumb impressions) and handwritten signatures. Based on the results, conclusions are enumerated and inferences are made, with directions for future studies. / by Deepti Pappusetty. / Thesis (M.S.C.S.)--Florida Atlantic University, 2011. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2011. Mode of access: World Wide Web.
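The alignment idea above can be illustrated with a minimal sketch of the classical one-dimensional Smith-Waterman local alignment applied to rows of a binary image, i.e., the 1D procedure the thesis extends to 2D image matrices. The scoring parameters (match, mismatch, gap) and the example rows are illustrative assumptions, not values from the thesis.

```python
# Minimal 1-D Smith-Waterman local alignment on binary sequences.
# Scoring parameters are illustrative assumptions, not the thesis's values.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

# Two rows of a binary image, compared as 0/1 strings; the one-cell
# shift between them still yields a strong local alignment.
template_row = "0011110000"
test_row = "0001111000"
score = smith_waterman(template_row, test_row)  # 18: nine matched cells
```

Because the score is floored at zero, the alignment restarts wherever the rows disagree, which is what localizes the comparison to shared features.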
162

Lexical and sublexical processing in Chinese character recognition. / 汉字认知中的词汇与亚词汇加工 / CUHK electronic theses & dissertations collection / Han zi ren zhi zhong de ci hui yu ya ci hui jia gong

January 2013 (has links)
Mo, Deyuan. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 153-167). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese; appendixes include Chinese.
163

Audio-video based handwritten mathematical content recognition

Vemulapalli, Smita 12 November 2012 (has links)
Recognizing handwritten mathematical content is a challenging problem, and more so when such content appears in classroom videos. However, given that in such videos the handwritten text and the accompanying audio refer to the same content, a combination of video- and audio-based recognizers has the potential to significantly improve the content recognition accuracy. This dissertation, using a combination of video- and audio-based recognizers, focuses on improving the recognition accuracy associated with handwritten mathematical content in such videos. Our approach uses a video recognizer as the primary recognizer, and a multi-stage assembly, developed as part of this research, is used to facilitate effective combination with an audio recognizer. Specifically, we address the following challenges related to audio-video based handwritten mathematical content recognition: (1) Video Preprocessing - generates a timestamped sequence of segmented characters from the classroom video in the face of occlusions and shadows caused by the instructor, (2) Ambiguity Detection - determines the subset of input characters that may have been incorrectly recognized by the video-based recognizer and forwards this subset for disambiguation, (3) A/V Synchronization - establishes correspondence between the handwritten character and the spoken content, (4) A/V Combination - combines the synchronized outputs from the video- and audio-based recognizers and generates the final recognized character, and (5) Grammar-Assisted A/V-Based Mathematical Content Recognition - utilizes a base mathematical speech grammar for both character and structure disambiguation. Experiments conducted using videos recorded in a classroom-like environment demonstrate the significant improvements in recognition accuracy that can be achieved using our techniques.
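The A/V Combination stage described above can be sketched as a simple score-level fusion of the two recognizers' confidences. This is a hypothetical illustration: the candidate characters, confidence values and the 0.6/0.4 weighting are invented, not taken from the dissertation.

```python
# Hypothetical score-level fusion of a video-based and an audio-based
# character recognizer. Candidate scores and the 0.6/0.4 weighting are
# illustrative assumptions, not the dissertation's actual combination rule.

def combine_recognizers(video_scores, audio_scores, w_video=0.6, w_audio=0.4):
    """Return the candidate with the highest weighted combined confidence."""
    candidates = set(video_scores) | set(audio_scores)
    combined = {c: w_video * video_scores.get(c, 0.0)
                   + w_audio * audio_scores.get(c, 0.0)
                for c in candidates}
    return max(combined, key=combined.get)

# The video recognizer narrowly prefers the multiplication sign, but the
# synchronized audio evidence for the spoken letter resolves the ambiguity.
video = {"x": 0.48, "*": 0.52}
audio = {"x": 0.90, "*": 0.10}
best = combine_recognizers(video, audio)  # "x": 0.648 vs 0.352
```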
164

Statistical pattern recognition approaches for retrieval-based machine translation systems

Mansjur, Dwi Sianto 01 November 2011 (has links)
This dissertation addresses the problem of Machine Translation (MT), which is defined as an automated translation of a document written in one language (the source language) to another (the target language) by a computer. The MT task requires various types of knowledge of both the source and target language, e.g., linguistic rules and linguistic exceptions. Traditional MT systems rely on an extensive parsing strategy to decode the linguistic rules and use a knowledge base to encode those linguistic exceptions. However, the construction of the knowledge base becomes an issue as the translation system grows. To overcome this difficulty, real translation examples are used instead of a manually crafted knowledge base. This design strategy is known as the Example-Based Machine Translation (EBMT) principle. Traditional EBMT systems utilize a database of word or phrase translation pairs. The main challenge of this approach is the difficulty of combining the word or phrase translation units into a meaningful and fluent target text. A novel Retrieval-Based Machine Translation (RBMT) system, which uses a sentence-level translation unit, is proposed in this study. An advantage of using the sentence-level translation unit is that the boundary of a sentence is explicitly defined and its meaning is precise in both the source and target language. The main challenge of using a sentential translation unit is the limited coverage, i.e., the difficulty of finding an exact match between a user query and sentences in the source database. Using an electronic dictionary and a topic modeling procedure, we develop a procedure to obtain clusters of sensible variations for each example in the source database. The coverage of our MT system improves because an input query text is matched against a cluster of sensible variations of translation examples instead of being matched against an original source example.
In addition, pattern recognition techniques are used to improve the matching procedure, i.e., the design of optimal pattern classifiers and the incorporation of subjective judgments. A high performance statistical pattern classifier is used to identify the target sentences from an input query sentence in our MT system. The proposed classifier is different from the conventional classifier in terms of the way it addresses the generalization capability. A conventional classifier addresses the generalization issue using the parsimony principle and may encounter the possibility of choosing an oversimplified statistical model. The proposed classifier directly addresses the generalization issue in terms of training (empirical) data. Our classifier is expected to generalize better than the conventional classifiers because our classifier is less likely to use over-simplified statistical models based on the available training data. We further improve the matching procedure by the incorporation of subjective judgments. We formulate a novel cost function that combines subjective judgments and the degree of matching between translation examples and an input query. In addition, we provide an optimization strategy for the novel cost function so that the statistical model can be optimized according to the subjective judgments.
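The retrieval step described above can be sketched, under simplifying assumptions, as matching an input query against clusters of sentence variations; here a plain bag-of-words cosine similarity stands in for the statistical pattern classifier the dissertation actually develops, and the example sentences and cluster ids are invented.

```python
# Hypothetical sketch of the retrieval step: match an input query against
# clusters of sentence variations using bag-of-words cosine similarity.
# The sentences, cluster ids and the similarity measure are invented
# stand-ins for the statistical classifier the dissertation develops.

import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, clusters):
    """Return the id of the cluster whose variation best matches the query."""
    q = Counter(query.lower().split())
    best_id, best_sim = None, -1.0
    for cid, variations in clusters.items():
        for sentence in variations:
            sim = cosine(q, Counter(sentence.lower().split()))
            if sim > best_sim:
                best_id, best_sim = cid, sim
    return best_id

# Each source example is stored with a cluster of sensible variations,
# so a paraphrased query can still hit the right translation example.
clusters = {
    "greeting": ["how are you", "how are you doing"],
    "ask_time": ["what time is it", "do you know the time"],
}
match = retrieve("what is the time", clusters)  # "ask_time"
```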
165

Genetic Programming Based Multicategory Pattern Classification

Kishore, Krishna J 03 1900 (has links)
Nature has created complex biological structures that exhibit intelligent behaviour through an evolutionary process. Thus, intelligence and evolution are intimately connected. This has inspired evolutionary computation (EC), which simulates the evolutionary process to develop powerful techniques such as genetic algorithms (GAs), genetic programming (GP), evolutionary strategies (ES) and evolutionary programming (EP) to solve real-world problems in learning, control, optimization and classification. GP discovers the relationship among data and expresses it as a LISP S-expression, i.e., a computer program. Thus the goal of program discovery as a solution for a problem is addressed by GP in the framework of evolutionary computation. In this thesis, we address for the first time the problem of applying GP to multicategory pattern classification. In supervised pattern classification, an input vector of m dimensions is mapped onto one of the n classes. It has a number of application areas such as remote sensing, medical diagnosis, etc. A supervised classifier is developed by using a training set that contains representative samples of the various classes present in the application. Supervised classification has been done earlier with the maximum likelihood classifier (MLC), neural networks and fuzzy logic. The major considerations in applying GP to pattern classification are listed below: (i) GP-based techniques are data-distribution-free, i.e., no a priori knowledge is needed about the statistical distribution of the data, nor does any assumption such as a normal distribution need to be made as in MLC. (ii) GP can directly operate on the data in its original form. (iii) GP can detect the underlying but unknown relationship that exists among the data and express it as a mathematical LISP S-expression. The generated LISP S-expressions can be directly used in the application environment.
(iv) GP can either discover the most important discriminating features of a class during evolution, or it requires only minor post-processing of the LISP S-expression to discover the discriminant features. In a neural network, the knowledge learned about the data distributions is embedded in the interconnection weights, and it requires a considerable amount of post-processing of the weights to understand the decision of the neural network. In 2-category pattern classification, a single GP expression is evolved as a discriminant function. The output of the GP expression can be +1 for samples of one class and -1 for samples of the other class. When the GP paradigm is applied to an n-class problem, the following questions arise: Q1. As a typical GP expression returns a value (+1 or -1) for a 2-class problem, how does one apply GP to the n-class pattern classification problem? Q2. What should be the fitness function during evolution of the GP expressions? Q3. How does the choice of a function set affect the performance of GP-based classification? Q4. How should training sets be created for evaluating fitness during the evolution of GP classifier expressions? Q5. How does one improve learning of the underlying data distributions in a GP framework? Q6. How should conflict resolution be handled before assigning a class to the input feature vector? Q7. How does GP compare with other classifiers for an n-class pattern classification problem? The research described here seeks to answer these questions. We show that GP can be applied to an n-category pattern classification problem by considering it as n 2-class problems. The suitability of this approach is demonstrated by considering a real-world problem based on remotely sensed satellite images and Fisher's Iris data set. In a 2-class problem, simple thresholding is sufficient for a discriminant function to divide the feature space into two regions.
This means that one genetic programming classifier expression (GPCE) is sufficient to say whether or not a given input feature vector belongs to that class; i.e., the GP expression returns a value (+1 or -1). As the n-class problem is formulated as n 2-class problems, n GPCEs are evolved. Hence, n GPCE-specific training sets are needed to evolve these n GPCEs. For the sake of illustration, consider a 5-class pattern classification problem. Let nj be the number of samples that belong to class j, and Nj be the number of samples that do not belong to class j (j = 1,..., 5). Thus, N1 = n2+n3+n4+n5, N2 = n1+n3+n4+n5, N3 = n1+n2+n4+n5, N4 = n1+n2+n3+n5, N5 = n1+n2+n3+n4. Thus, when the five-class problem is formulated as five 2-class problems, we need five GPCEs as discriminant functions to resolve between n1 and N1, n2 and N2, n3 and N3, n4 and N4, and lastly n5 and N5. Each of these five 2-class problems is handled as a separate 2-class problem with simple thresholding. Thus, GPCE#1 resolves between samples of class#1 and the remaining n - 1 classes. A training set is needed to evaluate the fitness of a GPCE during its evolution. If we directly create the training set, it leads to skewness (as n1 < N1). To overcome the skewness, an interleaved data format is proposed for the training set of a GPCE. For example, in the training set of GPCE#1, samples of class#1 are placed alternately between samples of the remaining n - 1 classes. Thus, the interleaved data format is an artifact to create a balanced training set. Conventionally, all the samples of a training set are fed to evaluate the fitness of every member of the population in each generation. We call this "global" learning, as GP tries to learn the entire training set at every stage of the evolution. We have introduced incremental learning to simplify the task of learning for the GP paradigm. A subset of the training set is fed and the size of the subset is gradually increased over time to cover the entire training data.
The basic motivation for incremental learning is to improve learning during evolution, as it is easier to learn a smaller task and then to progress from a smaller task to a bigger task. Experimental results are presented to show that the interleaved data format and incremental learning improve the performance of the GP classifier. We also show that GPCEs evolved with an arithmetic function set are able to track variation in the input better than GPCEs evolved with function sets containing logical and nonlinear elements. Hence, we have used the arithmetic function set, incremental learning, and the interleaved data format to evolve GPCEs in our simulations. As each GPCE is trained to recognize samples belonging to its own class and reject samples belonging to other classes, a strength of association measure is associated with each GPCE to indicate the degree to which it can recognize samples belonging to its own class. The strength of association measures are used for assigning a class to an input feature vector. To reduce misclassification of samples, we also show how heuristic rules can be generated in the GP framework, unlike in either the MLC or the neural network classifier. We have also studied the scalability and generalizing ability of the GP classifier by varying the number of classes. We also analyse the performance of the GP classifier by considering the well-known Iris data set. We compare the performance of classification rules generated from the GP classifier with those generated from the neural network classifier, the C4.5 method and a fuzzy classifier for the Iris data set. We show that the performance of GP is comparable to that of the other classifiers for the Iris data set. We notice that the classification rules can be generated with very little post-processing, and they are very similar to the rules generated from the neural network and C4.5 for the Iris data set.
Incremental learning influences the number of generations available for GP to learn the data distribution of classes whose desired output d is -1 in the interleaved data format. This is because the samples belonging to the true class (desired output d is +1) are alternately placed between samples belonging to other classes, i.e., they are repeated to balance the training set in the interleaved data format. For example, in the evolution of the GPCE for class#1, the fitness function can be fed initially with samples of class#2 and subsequently with the samples of class#3, class#4 and class#5. So in the evaluation of the fitness function, the samples of class#5 will not be present when the samples of class#2 are present in the initial stages. However, in the later stages of evolution, when samples of class#5 are fed, the fitness function will utilize the samples of both class#2 and class#5. As learning in evolutionary computation is guided by the evaluation of the fitness function, GPCE#1 gets fewer generations to learn how to reject data of class#5 than data of class#2. This is because the termination criterion (i.e., the maximum number of generations) is defined a priori. It is clear that there are (n-1)! ways of ordering the samples of classes whose d is -1 in the interleaved data format. Hence a heuristic is presented to determine a possible order in which to feed data of different classes for the GPCEs evolved with incremental learning and the interleaved data format. The heuristic computes an overlap index for each class based on its spatial spread and the distribution of data in the region of overlap with respect to other classes in each feature. The heuristic determines the order in which classes whose desired output d is -1 should be placed in each GPCE-specific training set for the interleaved data format.
This ensures that GP gets more generations to learn the data distribution of a class with a higher overlap index than a class with a lower overlap index. The ability of the GP classifier to learn the data distributions depends upon the number of classes and the spatial spread of the data. As the number of classes increases, the GP classifier finds it difficult to resolve between classes. So there is a need to partition the feature space and identify subspaces with a reduced number of classes. The basic objective is to divide the feature space into subspaces, and hence the data set that contains representative samples of n classes into subdata sets corresponding to the subspaces of the feature space, so that some of the subdata sets/spaces can have data belonging to only p classes (p < n). The GP classifier is then evolved independently for the subdata sets/spaces of the feature space. This results in localized learning, as the GP classifier has to learn the data distribution in only a subspace of the feature space rather than in the entire feature space. By integrating the GP classifier with feature space partitioning (FSP), we improve classification accuracy due to localized learning. Although serial computers have increased steadily in their performance, the quest for parallel implementation of a given task has continued to be of interest in any computationally intensive task, since a parallel implementation leads to faster execution than a serial implementation. As fitness evaluation, the selection strategy and population structures are used to evolve a solution in GP, there is scope for a parallel implementation of the GP classifier. We have studied distributed GP and massively parallel GP for our approach to GP-based multicategory pattern classification. We present experimental results for distributed GP with the Message Passing Interface on an IBM SP2 to highlight the speedup that can be achieved over the serial implementation of GP.
We also show how data parallelism can be used to further speed up fitness evaluation and hence the execution of the GP paradigm for multicategory pattern classification. We conclude that GP can be applied to n-category pattern classification, and that its potential lies in its simplicity and scope for parallel implementation. The GP classifier developed in this thesis can be looked upon as an addition to the earlier statistical, neural and fuzzy approaches to multicategory pattern classification.
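The interleaved data format described in this abstract can be sketched as follows; the class labels and sample names are invented for illustration, and the exact repetition scheme is an assumption based on the description above.

```python
# Sketch of the interleaved data format: samples of the true class
# (desired output +1) are repeated and placed alternately between samples
# of the remaining classes, balancing each GPCE's 2-class training set.
# Class labels and sample names are invented for illustration.

from itertools import cycle

def interleave(true_samples, other_samples):
    """Alternate (repeated) true-class samples with the other classes' samples."""
    repeated = cycle(true_samples)
    training = []
    for other in other_samples:
        training.append((next(repeated), +1))  # desired output d = +1
        training.append((other, -1))           # desired output d = -1
    return training

class1 = ["a1", "a2"]                    # n1 = 2 samples of class#1
others = ["b1", "b2", "c1", "c2", "d1"]  # N1 = 5 samples of the other classes
training = interleave(class1, others)    # 10 samples, labels alternate +1/-1
```

The resulting set contains as many +1 as -1 labels even though n1 < N1, which is exactly the skewness the interleaved format is meant to remove.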
166

A product family design methodology employing pattern recognition

Freeman, Dane Fletcher 13 January 2014 (has links)
Sharing components in a product family requires a trade-off between the individual products' performances and overall family costs. It is critical for a successful family to identify which components are similar, so that sharing does not compromise the individual products' performances. This research formulates two commonality identification approaches for use in product family design and investigates their applicability in a generic product family design methodology. Having a commonality identification approach reduces the combinatorial sharing problem and allows more high-quality family alternatives to be considered. The first approach is based on the pattern recognition technique of fuzzy c-means clustering in component subspaces. If components from different products are similar enough to be grouped into the same cluster, then those components could possibly become the same platform. Fuzzy equivalence relations that show the binary relationship from one product's component to a different product's component can be extracted from the cluster membership functions. The second approach builds a Bayesian network representing the joint distribution of a design space exploration. Using this model, a series of inferences can be made based on product performance and component constraints. Finally, the posterior design variable distributions can be processed using a similarity metric such as the earth mover's distance to identify which products' components are similar to another's.
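A minimal sketch of the first approach follows: fuzzy c-means clustering applied to hypothetical component feature vectors. The data, the choice of c = 2 clusters and the fuzzifier m = 2 are illustrative assumptions, not values from the thesis.

```python
# Minimal fuzzy c-means sketch for grouping similar components across
# products, assuming each component is described by a small feature vector.
# The data, c = 2 clusters and fuzzifier m = 2 are illustrative assumptions.

import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Return (centers, U), where U[i, k] is sample i's membership in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                   # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # standard FCM membership update
    return centers, U

# Feature vectors (e.g. length, diameter) for four hypothetical components
# drawn from two products; components 0,1 and 2,3 end up sharing clusters,
# making those pairs candidates for a common platform.
X = np.array([[1.0, 1.1], [1.1, 0.9], [5.0, 5.2], [5.1, 4.9]])
centers, U = fuzzy_c_means(X)
```

Unlike hard k-means, the rows of U are graded memberships, which is what allows fuzzy equivalence relations between components to be read off afterwards.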
167

Statistical processing for telecommunication networks applied to ATM traffic monitoring

Villegas, Ruben M. M. January 1997 (has links)
Within the fields of network operation and performance measurement, it is a common requirement that the technologies involved must provide the basis for an effective, reliable, measurable and controllable service. In order to comply with the service performance criteria, the constraints often lead to very complex techniques and methodologies for the simulation, control, test, and measurement processes. This thesis addresses some of the factors that contribute to the overall spectrum of statistical performance measurements in telecommunication services. Specifically, it is concerned with the development of three low-complexity and effective techniques for real-time traffic generation, control and measurement. These techniques have proved to be accurate and near-optimal. In all three cases the work starts with a literature survey of known methodologies, after which new techniques are proposed and investigated by simulating the processes involved. The work is based on the use of high-speed Asynchronous Transfer Mode (ATM) networks. The problem of developing a fast traffic generation technique for the simulation of Variable Bit Rate traffic sources is considered in the first part of this thesis. For this purpose, statistical measures are obtained from the analysis of different traffic profiles or from the literature. With the aid of these measures, a model for the fast generation of Variable Bit Rate traffic at different time resolutions is developed. The simulated traffic is then analysed in order to obtain the equivalent set of statistical measures, and these are compared against those observed in real traffic traces. The subject of traffic control comprises a very wide area in communication networks. It refers to the generalised classification of actions such as Connection Admission and Flow Control, and Traffic Policing and Shaping. In the second part of this thesis, a method to modify the instantaneous traffic profile of a variable rate source is developed.
It is particularly useful for services which have a hard bound on the cell loss probability, but a soft bound on the admissible delay, matching the characteristics of some of the services provided by ATM networks. Finally, this thesis is also concerned with a particular aspect of the operation and management of high speed networks, or OAM functions plane, namely with the monitoring of network resources. A monitoring technique based on numerical approximation and statistical sampling methods is developed and later used to characterise a particular traffic stream, or a particular connection, within a high speed network. The resulting algorithms are simple and computationally inexpensive, but effective and accurate at the same time, and are suitable for real-time processing.
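The monitoring idea above can be sketched, under invented assumptions, as estimating a connection's mean cell rate from a random sample of time slots rather than an exhaustive count; the slot counts, the sampling fraction and the on/off source model are all illustrative, not the thesis's algorithms.

```python
# Hypothetical sketch of monitoring by statistical sampling: estimate a
# connection's mean cell rate from a random sample of time slots instead
# of counting every cell. Slot counts and sampling fraction are invented.

import random

def estimate_rate(slots, fraction=0.2, seed=42):
    """Estimate the mean cells/slot from a random sample of slots."""
    rng = random.Random(seed)
    k = max(1, int(len(slots) * fraction))
    return sum(rng.sample(slots, k)) / k

# A synthetic on/off source: bursts of 3 cells/slot, silent otherwise.
slots = [3] * 200 + [0] * 800
random.Random(0).shuffle(slots)
true_rate = sum(slots) / len(slots)  # 0.6 cells/slot exactly
approx = estimate_rate(slots)        # close to 0.6 from a 20% sample
```

The appeal, as in the thesis, is that the estimator is computationally cheap enough for real-time processing while its sampling error is controllable through the sample size.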
168

Machine vision for the determination of identity, orientation and position of two dimensional industrial components

English, Jonathan January 1996 (has links)
No description available.
169

Analysis and recognition of Persian and Arabic handwritten characters /

Hosseini, Habib Mir Mohamad. January 1997 (has links) (PDF)
Thesis (Ph.D.)--University of Adelaide, Dept. of Electrical and Electronic Engineering, 1997. / Bibliography: leaves 146-159.
170

Bus real-time arrival prediction using statistical pattern recognition technique /

Vu, Nam Hoai, January 1900 (has links)
Thesis (Ph.D.) - Carleton University, 2007. / Includes bibliographical references (p. 219-233). Also available in electronic format on the Internet.
