
Artificial training samples for the improvement of pattern recognition systems

Pattern recognition is the assignment of a label to a given input value or instance according to a specific learning algorithm. Recognition performance is directly linked to the quality and size of the training data. However, in many real pattern recognition applications, such as face recognition or Chinese character recognition, it is difficult or inconvenient to collect enough samples to train the classifier.

In view of the shortage of training samples, the main objective of our research is to investigate the generation and use of artificial samples for improving recognition performance. Besides enhancing learning, artificial samples are also used in a novel way so that a conventional Chinese character recognizer can read half or combined Chinese character segments. This greatly simplifies the segmentation procedure and reduces the errors introduced by segmentation.

Two novel generation models have been developed to evaluate the effectiveness of supplementing the training data with artificial samples. One model generates artificial faces with various facial expressions or lighting conditions by morphing and warping two given sample faces. We tested our face generation model on three popular 2D face databases, which contain both grayscale and color images. Experiments show that the generated faces look quite natural and that they improve the recognition rates by a large margin.
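The abstract does not give the details of the face generation model, so the following is only a minimal sketch of the general idea of morphing two sample faces. It assumes two aligned, equally sized grayscale images stored as lists of rows and a set of matching landmark points; the names `cross_dissolve` and `interpolate_landmarks` and all sample values are hypothetical. Only the intensity blend and the landmark interpolation are shown; the geometric warp that would move pixels toward the interpolated landmarks is omitted.

```python
def interpolate_landmarks(pts_a, pts_b, alpha):
    """Linearly interpolate matching landmark points of two faces."""
    return [((1 - alpha) * xa + alpha * xb, (1 - alpha) * ya + alpha * yb)
            for (xa, ya), (xb, yb) in zip(pts_a, pts_b)]


def cross_dissolve(img_a, img_b, alpha):
    """Blend two equally sized grayscale images (lists of rows) pixel by pixel."""
    same_rows = len(img_a) == len(img_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same size")
    return [[(1 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(img_a, img_b)]


# Hypothetical 2x2 "faces" and two landmarks each, for illustration only.
face_a = [[0, 50], [100, 150]]
face_b = [[200, 150], [100, 50]]
marks_a = [(10.0, 20.0), (30.0, 20.0)]
marks_b = [(12.0, 24.0), (34.0, 22.0)]

# Three in-between faces that could be added to a training set.
artificial_faces = [cross_dissolve(face_a, face_b, a) for a in (0.25, 0.5, 0.75)]
# Landmark positions a warp step would target for the halfway face.
mid_marks = interpolate_landmarks(marks_a, marks_b, 0.5)
```

Varying `alpha` yields a family of intermediate samples from a single pair of real faces, which is what makes this kind of generation attractive when few training images are available.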

The other model uses stroke and radical information to build new Chinese characters. Artificial Chinese characters are produced by Bezier curves passing through specified points. Because it supports both stroke-level and radical-level variations, this model is more flexible in generating artificial handwritten characters than merely distorting real samples. Another feature of this character generation model is that it does not require any real handwritten character samples. In other words, we can train a conventional character classifier and perform character recognition tasks without collecting handwritten samples. Experimental results have validated its feasibility, and the recognition rate is still
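As an illustration of drawing a stroke with a Bezier curve that passes through specified points, the sketch below fits a quadratic Bezier curve through a start, middle, and end point and then perturbs those points to obtain stroke-level variations. This is an assumption about one simple way to do it, not the thesis's actual model: the curve order, the choice of t = 0.5 for the middle point, the uniform jitter, and all function names are hypothetical.

```python
import random


def control_point_through(p0, p1, p2):
    """Control point of the quadratic Bezier from p0 to p2 that hits p1 at t = 0.5."""
    return (2 * p1[0] - 0.5 * (p0[0] + p2[0]),
            2 * p1[1] - 0.5 * (p0[1] + p2[1]))


def quadratic_bezier(p0, c, p2, t):
    """Evaluate the quadratic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    return (u * u * p0[0] + 2 * u * t * c[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * c[1] + t * t * p2[1])


def stroke_polyline(p0, p1, p2, n=9):
    """Sample n points along the curve through p0, p1 and p2."""
    c = control_point_through(p0, p1, p2)
    return [quadratic_bezier(p0, c, p2, i / (n - 1)) for i in range(n)]


def jitter(points, scale, rng):
    """Perturb the specified points to create a stroke-level variation."""
    return [(x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))
            for x, y in points]


# Hypothetical stroke: start, middle and end of a slightly bowed horizontal stroke.
stroke = [(0.0, 0.0), (4.0, 1.0), (8.0, 0.0)]
rng = random.Random(0)  # fixed seed so the variants are reproducible
clean = stroke_polyline(*stroke)
variants = [stroke_polyline(*jitter(stroke, 0.3, rng)) for _ in range(5)]
```

The control point follows from requiring B(0.5) = P1, which gives C = 2·P1 − (P0 + P2)/2. Because the strokes are defined by points rather than by scanned images, variants can be produced without any real handwritten sample.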


Besides tackling the small sample size problem in face recognition and isolated character recognition, we improve the performance of a bank check legal amount recognizer by proposing character segment recognition and applying a Hidden Markov Model (HMM).
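The abstract does not describe the HMM's states, observations, or parameters. The sketch below is therefore a generic Viterbi decoder with an entirely hypothetical toy model: two hidden states (`digit` and `unit`, reflecting the assumption that legal amounts tend to alternate numeral and unit characters) and coarse recognizer outputs `d` and `u` for each character segment. It only illustrates how sequence context in an HMM can override a doubtful per-segment decision.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence and its probability."""
    # best[t][s] = (probability, path) of the best path ending in state s at step t
    best = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                ((best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs],
                  best[-1][prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            layer[s] = (prob, path + [s])
        best.append(layer)
    prob, path = max(best[-1].values(), key=lambda item: item[0])
    return path, prob


# Hypothetical toy model; none of these numbers come from the thesis.
states = ["digit", "unit"]
start_p = {"digit": 0.9, "unit": 0.1}
trans_p = {"digit": {"digit": 0.1, "unit": 0.9},
           "unit": {"digit": 0.8, "unit": 0.2}}
emit_p = {"digit": {"d": 0.7, "u": 0.3},
          "unit": {"d": 0.2, "u": 0.8}}

# The third segment looks unit-like to the recognizer, but context suggests a digit.
segments = ["d", "u", "u", "u"]
path, prob = viterbi(segments, states, start_p, trans_p, emit_p)
print(path)  # ['digit', 'unit', 'digit', 'unit']
```

With these toy numbers, the transition probabilities outweigh the ambiguous third observation, so the decoded sequence alternates between the two states.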

It is hoped that this thesis can provide some insights for future research in artificial sample generation, face morphing, Chinese character segmentation, text recognition, and related issues.

published_or_final_version / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy

  1. 10.5353/th_b4784964
  2. b4784964
Date: January 2012
Creators: Ni, Zhibo (倪志博)
Contributors: Leung, CH
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Source Sets: Hong Kong University Theses
Detected Language: English
Rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works; Creative Commons: Attribution 3.0 Hong Kong License
Relation: HKU Theses Online (HKUTO)
