  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

A Semi Supervised Support Vector Machine for a Recommender System : Applied to a real estate dataset

Méndez, José January 2021 (has links)
Recommender systems are widely used on e-commerce websites to improve the customer's buying experience. In recent years, e-commerce has expanded quickly, and its growth accelerated during the COVID-19 pandemic, when customers and retailers were asked to keep their distance and observe lockdowns. There is therefore an increasing demand for good item recommendations that improve users' shopping experience. In this master’s thesis, a recommender system for a real-estate website is built, based on Support Vector Machines (SVM). The main characteristic of the model is that it is trained with a few labelled samples, with the remaining samples unlabelled, using a semi-supervised machine learning paradigm. The model is constructed step by step, from the simple SVM to the semi-supervised Nested Cost-Sensitive Support Vector Machine (NCS-SVM). We then compare the model under four different kernel functions: Gaussian, second-degree polynomial, fourth-degree polynomial, and linear. We also compare a user with strict housing requirements against a user with vague requirements. We finish with a discussion focusing principally on parameter tuning, and briefly on the model's downsides and ethical considerations.
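The four kernel families named in the abstract can be illustrated with their textbook definitions. This is a minimal sketch, not the thesis's implementation: the function names, the `gamma` bandwidth and the `coef0` offset are assumptions chosen for illustration, and the abstract does not state which parameter values were used.

```python
import math


def dot(x, y):
    """Plain inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """Linear kernel: k(x, y) = <x, y>."""
    return dot(x, y)


def polynomial_kernel(x, y, degree, coef0=1.0):
    """Polynomial kernel: k(x, y) = (<x, y> + coef0) ** degree.

    degree=2 and degree=4 give the second- and fourth-degree variants.
    coef0 is an assumed offset; the abstract does not specify one.
    """
    return (dot(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    gamma is an assumed bandwidth parameter.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

Swapping one of these functions for another changes only how similarity between two listings is measured; the SVM training procedure around it stays the same.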
162

Action Recognition with Knowledge Transfer

Choi, Jin-Woo 07 January 2021 (has links)
Recent progress on deep neural networks has shown remarkable action recognition performance from videos. The remarkable performance is often achieved by transfer learning: training a model on a large-scale labeled dataset (source) and then fine-tuning the model on the small-scale labeled datasets (targets). However, existing action recognition models do not always generalize well on new tasks or datasets, for the following two reasons. i) Current action recognition datasets have a spurious correlation between action types and background scene types. The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor generalization performance. ii) Directly testing the model trained on the source data on the target data leads to poor performance as the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. For the first problem, I propose to learn scene-invariant action representations to mitigate the scene bias in action recognition models. Specifically, I augment the standard cross-entropy loss for action classification with 1) an adversarial loss for the scene types and 2) a human mask confusion loss for videos where the human actors are invisible. These two losses encourage learning representations unsuitable for predicting 1) the correct scene types and 2) the correct action types when there is no evidence. I validate the efficacy of the proposed method by transfer learning experiments. I transfer the pre-trained model to three different tasks, including action classification, temporal action localization, and spatio-temporal action detection. The results show consistent improvement over the baselines for every task and dataset. 
I formulate human action recognition as an unsupervised domain adaptation (UDA) problem to handle the second problem. In the UDA setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene, to learn domain-invariant action representations. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Then I explore the semi-supervised video action recognition, where we have a lot of labeled videos as source data and sparsely labeled videos as target data. The semi-supervised setting is practical as sometimes we can afford a little bit of cost for labeling target data. I propose multiple video data augmentation methods to inject photometric, geometric, temporal, and scene invariances to the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks. / Doctor of Philosophy / Recent progress on deep learning has shown remarkable action recognition performance. The remarkable performance is often achieved by transferring the knowledge learned from existing large-scale data to the small-scale data specific to applications. However, existing action recognition models do not always work well on new tasks and datasets because of the following two problems. i) Current action recognition datasets have a spurious correlation between action types and background scene types. 
The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor performance on the new datasets and tasks. ii) Directly testing the model trained on the source data on the target data leads to poor performance as the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. To tackle the first problem, I propose to learn scene-invariant action representations that mitigate the background scene bias of human action recognition models. Specifically, the proposed method learns representations that cannot predict the scene types and the correct actions when there is no evidence. I validate the proposed method's effectiveness by transferring the pre-trained model to multiple action understanding tasks. The results show consistent improvement over the baselines for every task and dataset. To handle the second problem, I formulate human action recognition as an unsupervised learning problem on the target data. In this setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Here, we have many labeled videos as source data and sparsely labeled videos as target data. 
The setting is practical as sometimes we can afford a little bit of cost for labeling target data. I propose multiple video data augmentation methods to inject color, spatial, temporal, and scene invariances to the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks.
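The abstract describes adding a "human mask confusion loss" to the standard cross-entropy loss so that the model cannot confidently predict an action when the human actor is hidden. A minimal sketch of one plausible form is shown below. It assumes the confusion loss is the negative entropy of the predicted action distribution on human-masked clips, which is an assumption made for illustration, not a detail given in the abstract. The adversarial scene loss is omitted because it needs a trainable network with gradient reversal; `weight` and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard cross-entropy loss for a single labeled clip."""
    return -math.log(softmax(logits)[label])


def entropy(logits):
    """Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def confusion_loss(masked_logits):
    """Assumed form: negative entropy on a human-masked clip.

    Minimising this value pushes the prediction toward uniform,
    i.e. the model should be unsure when no human is visible.
    """
    return -entropy(masked_logits)


def total_loss(action_logits, action_label, masked_logits, weight=0.5):
    """Cross-entropy on the original clip plus weighted confusion loss."""
    return cross_entropy(action_logits, action_label) + weight * confusion_loss(masked_logits)
```

Under this assumed form, a confident prediction on a masked clip is penalised more than a uniform one, which matches the stated goal of not predicting actions "when there is no evidence".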
163

Label-Efficient Visual Understanding with Consistency Constraints

Zou, Yuliang 24 May 2022 (has links)
Modern deep neural networks are proficient at solving various visual recognition and understanding tasks, as long as a sufficiently large labeled dataset is available at training time. However, the progress of these visual tasks is limited by the number of manual annotations. Annotating visual data is usually time-consuming and error-prone, which makes it challenging to scale up human labeling for many visual tasks. Fortunately, it is easy to collect large-scale, diverse unlabeled visual data from the Internet, and we can effortlessly acquire a large amount of synthetic visual data with annotations from game engines. In this dissertation, we explore how to utilize the unlabeled data and synthetic labeled data for various visual tasks, aiming to replace or reduce the direct supervision from the manual annotations. The key idea is to encourage deep neural networks to produce consistent predictions across different transformations (e.g., geometric, temporal, photometric, etc.). We organize the dissertation as follows. In Part I, we propose to use the consistency over different geometric formulations and a cycle consistency over time to tackle the low-level scene geometry perception tasks in a self-supervised learning setting. In Part II, we tackle the high-level semantic understanding tasks in a semi-supervised learning setting, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly-augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains with one single forward pass, without model training or optimization at inference time. / Doctor of Philosophy / Recently, deep learning has emerged as one of the most powerful tools to solve various visual understanding tasks. 
However, the development of deep learning methods is significantly limited by the amount of manually labeled data. Annotating visual data is usually time-consuming and error-prone, so the human labeling process does not scale easily. Fortunately, it is easy to collect large-scale, diverse raw visual data from the Internet (e.g., search engines, YouTube, Instagram, etc.), and we can effortlessly acquire a large amount of synthetic visual data with annotations from game engines. In this dissertation, we explore how we can utilize the raw visual data and synthetic data for various visual tasks, aiming to replace or reduce the direct supervision from the manual annotations. The key idea behind this is to encourage deep neural networks to produce consistent predictions of the same visual input across different transformations (e.g., geometric, temporal, photometric, etc.). We organize the dissertation as follows. In Part I, we propose using the consistency over different geometric formulations and a forward-backward cycle consistency over time to tackle the low-level scene geometry perception tasks, using unlabeled visual data only. In Part II, we tackle the high-level semantic understanding tasks using both a small amount of labeled data and a large amount of unlabeled data jointly, with the constraint that different augmented views of the same visual input maintain consistent semantic information. In Part III, we tackle the cross-domain image segmentation problem. By encouraging an adaptive segmentation model to output consistent results for a diverse set of strongly-augmented synthetic data, the model learns to perform test-time adaptation on unseen target domains.
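The key idea stated above, consistent predictions for the same input under different transformations, is commonly expressed as a penalty on the difference between two predicted distributions. The sketch below assumes a mean squared difference between softmax outputs for two augmented views. The dissertation's actual loss is not given in the abstract, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_view_a, logits_view_b):
    """Mean squared difference between the class distributions
    predicted for two augmented views of the same input.

    The loss is zero when both views yield the same distribution and
    grows as the predictions disagree. No label is needed, which is
    why such a term can be computed on unlabeled data.
    """
    probs_a = softmax(logits_view_a)
    probs_b = softmax(logits_view_b)
    return sum((p - q) ** 2 for p, q in zip(probs_a, probs_b)) / len(probs_a)
```

In a semi-supervised setup this term would typically be added to a supervised loss computed on the small labeled subset.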
164

Handling Domain Shift in 3D Point Cloud Perception

Saltori, Cristiano 10 April 2024 (has links)
This thesis addresses the problem of domain shift in 3D point cloud perception. In the last decades, there has been tremendous progress in within-domain training and testing. However, the performance of perception models degrades when they are trained on a source domain and tested on a target domain sampled from a different data distribution. As a result, a change in sensor or geo-location can lead to a harmful drop in model performance. While solutions exist for image perception, the problem remains unresolved for point clouds. The focus of this thesis is the study and design of solutions for mitigating domain shift in 3D point cloud perception. We identify several settings differing in the level of target supervision and the availability of source data. We conduct a thorough study of each setting and introduce a new method to address domain shift in each configuration. In particular, we study three novel settings in domain adaptation and domain generalization and propose five new methods for mitigating domain shift in 3D point cloud perception. Our methods are used by the research community, and at the time of writing, some of the proposed approaches hold state-of-the-art results. In conclusion, this thesis provides a valuable contribution to the computer vision community, setting the groundwork for the development of future works in cross-domain conditions.
165

A Study on the Use of Unsupervised, Supervised, and Semi-supervised Modeling for Jamming Detection and Classification in Unmanned Aerial Vehicles

Margaux Camille Marie Catafort--Silva (18477354) 02 May 2024 (has links)
In this work, unsupervised machine learning is first studied for detecting and classifying jamming attacks targeting unmanned aerial vehicles (UAVs) operating in the 2.4 GHz band. Three scenarios are developed with a dataset of samples extracted from meticulous experimental routines using various unsupervised learning algorithms, namely K-means, density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering (AGG) and Gaussian mixture model (GMM). These routines characterize attack scenarios entailing barrage (BA), single-tone (ST), successive-pulse (SP), and protocol-aware (PA) jamming in three different settings. In the first setting, all features extracted from the original dataset are used (nine in total). In the second setting, Spearman correlation is used to reduce the number of these features. In the third setting, principal component analysis (PCA) is used to reduce the dimensionality of the dataset and so minimize complexity. The metrics used to compare the algorithms are homogeneity, completeness, V-measure, adjusted mutual information (AMI) and adjusted Rand index (ARI). The optimum model scored 1.00, 0.949, 0.791, 0.722, and 0.791, respectively, allowing the detection and classification of these four jamming types with an acceptable degree of confidence.
Second, in a separate study, supervised learning (random forest modeling) is developed to achieve a binary classification that accurately separates samples into two distinct classes: clean and jamming. Following this supervised classification, two-class and three-class unsupervised learning is implemented considering three of the four jamming types: BA, ST, and SP. In this initial step, the four aforementioned algorithms are used. This study is intended to facilitate the visualization of the performance of each algorithm; for example, AGG achieves a homogeneity of 1.0, a completeness of 0.950, a V-measure of 0.713, an ARI of 0.557 and an AMI of 0.713, and GMM yields 1, 0.771, 0.645, 0.536 and 0.644, respectively. Lastly, to improve the classification, semi-supervised learning is adopted instead of unsupervised learning, considering the same algorithms and dataset. In this case, GMM achieves 1, 0.688, 0.688, 0.786 and 0.688, whereas DBSCAN achieves 0, 0.036, 0.028, 0.018 and 0.028, for homogeneity, completeness, V-measure, ARI and AMI, respectively. Overall, unsupervised learning is approached as a method for jamming classification, addressing the challenge of identifying newly introduced samples.
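Of the clustering metrics listed above, the adjusted Rand index (ARI) is straightforward to compute from pair counts. The following sketch uses the standard pair-counting definition with only the Python standard library. It is an illustration of the metric, not the code used in the study, and the jamming labels in the usage note are made up.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between a reference labeling and a clustering.

    1.0 means identical partitions (up to renaming of clusters), values
    near 0 mean agreement no better than chance, and negative values
    mean agreement worse than chance.
    """
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: partitions trivially agree

    # Pairs placed together in both partitions, in the reference only,
    # and in the predicted clustering only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (together_both - expected) / (maximum - expected)
```

For example, `adjusted_rand_index(["BA", "BA", "ST", "ST"], [1, 1, 0, 0])` treats the cluster ids as arbitrary names, so a clustering that recovers the jamming types under different ids still scores as a perfect match.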
166

Interactive Transcription of Old Text Documents

Serrano Martínez-Santos, Nicolás 09 June 2014 (has links)
Nowadays, there are huge collections of handwritten text documents in libraries all over the world. The high demand for these resources has led to the creation of digital libraries in order to facilitate the preservation of and provide electronic access to these documents. However, text transcriptions of these document images are not always available to allow users to quickly search information, or computers to process the information, search patterns or draw out statistics. The problem is that manual transcription of these documents is an expensive task from both economical and time viewpoints. This thesis presents a novel approach for efficient Computer Assisted Transcription (CAT) of handwritten text documents using state-of-the-art Handwriting Text Recognition (HTR) systems. The objective of CAT approaches is to efficiently complete a transcription task through human-machine collaboration, as the effort required to generate a manual transcription is high, and automatically generated transcriptions from state-of-the-art systems still do not reach the accuracy required. This thesis is centered on a special application of CAT, that is, the transcription of old text documents when the quantity of user effort available is limited, and thus, the entire document cannot be revised. In this approach, the objective is to generate the best possible transcription by means of the user effort available. This thesis provides a comprehensive view of the CAT process from feature extraction to user interaction. First, a statistical approach to generalise interactive transcription is proposed. As its direct application is unfeasible, some assumptions are made to apply it to two different tasks: first, the interactive transcription of handwritten text documents, and next, the interactive detection of the document layout. Next, the digitisation and annotation process of two real old text documents is described. 
This process was carried out because of the scarcity of similar resources and the need for annotated data to thoroughly test all the tools and techniques developed in this thesis. These two documents were carefully selected to represent the general difficulties that are encountered when dealing with HTR. Baseline results are presented on these two documents to establish a benchmark with a standard HTR system. Finally, these annotated documents were made freely available to the community. It must be noted that all the techniques and methods developed in this thesis have been assessed on these two real old text documents. Then, a CAT approach for HTR when user effort is limited is studied and extensively tested. The ultimate goal of applying CAT is achieved by putting together three processes. Given a recognised transcription from an HTR system, the first process consists in locating (possibly) incorrect words and employing the user effort available to supervise them (if necessary). As most words are not expected to be supervised due to the limited user effort available, only a few are selected to be revised. The system presents to the user a small subset of these words according to an estimation of their correctness, or to be more precise, according to their confidence level. Next, the second process starts once these low-confidence words have been supervised. This process updates the recognition of the document taking user corrections into consideration, which improves the quality of those words that were not revised by the user. Finally, the last process adapts the system from the partially revised (and possibly not perfect) transcription obtained so far. In this adaptation, the system intelligently selects the correct words of the transcription. As a result, the adapted system will better recognise future transcriptions. Transcription experiments using this CAT approach show that this approach is mostly effective when user effort is low. 
The last contribution of this thesis is a method for balancing the final transcription quality and the supervision effort applied using our previously described CAT approach. In other words, this method allows the user to control the amount of errors in the transcriptions obtained from a CAT approach. The motivation of this method is to let users decide on the final quality of the desired documents, as partially erroneous transcriptions can be sufficient to convey the meaning, and the user effort required to transcribe them might be significantly lower when compared to obtaining a totally manual transcription. Consequently, the system estimates the minimum user effort required to reach the amount of error defined by the user. Error estimation is performed by computing separately the error produced by each recognised word, and thus, asking the user to only revise the ones in which most errors occur. Additionally, an interactive prototype is presented, which integrates most of the interactive techniques presented in this thesis. This prototype has been developed to be used by palaeographic experts, who do not have any background in HTR technologies. After slight fine-tuning by an HTR expert, the prototype lets transcribers manually annotate the document or employ the CAT approach presented. All automatic operations, such as recognition, are performed in the background, detaching the transcriber from the details of the system. The prototype was assessed by an expert transcriber and was shown to be adequate and efficient for its purpose. The prototype is freely available under a GNU Public Licence (GPL). / Serrano Martínez-Santos, N. (2014). Interactive Transcription of Old Text Documents [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37979
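The first CAT process described above, presenting only the least reliable words to the user when effort is limited, can be sketched as a selection by confidence under a fixed budget. This is a simplified illustration under assumed inputs: per-word confidence scores in [0, 1] and a budget counted in words. The function name and data layout are hypothetical and are not taken from the thesis or its prototype.

```python
def select_for_supervision(confidences, budget):
    """Return the positions of the words a user should review.

    confidences: one recogniser confidence score per recognised word.
    budget: how many words the user is willing to supervise.

    The lowest-confidence words are picked first, and the result is
    returned in reading order so the user can work through the page.
    """
    if budget <= 0:
        return []
    by_confidence = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return sorted(by_confidence[:budget])
```

With recognised words `["the", "qvick", "brown", "f0x"]` and confidences `[0.98, 0.41, 0.90, 0.35]`, a budget of two selects positions 1 and 3, the two words most likely to be wrong.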
167

Federated Online Learning with Streaming Data for Intrusion Detection Systems : Comparing Federated and Centralized Learning Methods in Online and Offline Settings

Arvidsson, Victor January 2024 (has links)
Background. With increased pressure from both regulatory bodies and end-users, interest in privacy-preserving machine learning methods has increased among companies and researchers in the last few years. One of the main areas of research regarding this is federated learning. Further, with the current situation in the world, interest in cybersecurity is also at an all-time high, where intrusion detection systems are one component of interest. With anomaly-based intrusion detection systems using machine learning methods, it is desirable that these can adapt automatically over time as the network patterns change, resulting in online learning being highly relevant for this application. Previous research has studied offline federated intrusion detection systems. However, there has been very little work on online federated learning for intrusion detection systems. Objectives. The objective of this thesis is to evaluate the performance of online federated machine learning methods for intrusion detection systems. Furthermore, the thesis will study the performance relationship between offline and online models for both centralized and federated learning, in order to draw conclusions about the ability to extrapolate from results between the different types of models. Methods. This thesis uses a quasi-experiment to evaluate two different types of models, Naive Bayes and Semi-supervised Federated Learning on Evolving Data Streams (SFLEDS), on three different datasets, NSL-KDD, UNSW-NB15, and CIC-IDS2017. For each model, four variants are implemented: centralized offline, centralized online, federated offline and federated online, and in the federated setting the models are evaluated with 20, 30, and 40 clients. Results. The results show that the best performing model in general is the federated online SFLEDS. They also highlight an important problem with using imbalanced datasets without proper care for data preprocessing and model design. 
Finally, the results show that there are no general relationships between offline and online models that hold in both the centralized and federated settings in terms of prediction performance. Conclusions. The main conclusion of the thesis is that online federated learning has a lot of potential for the application of intrusion detection systems, but more research is required to find the optimal models and parameters that result in satisfactory performance. / Background. With increased pressure from both regulatory bodies and end users, interest in privacy-preserving machine learning has grown among companies and researchers in recent years. One of the main areas of research on this is federated learning. Furthermore, given the current state of the world, interest in cybersecurity is higher than ever, with intrusion detection systems among the components of interest. For anomaly-based intrusion detection systems that use machine learning, it is desirable that they adapt automatically over time as network patterns change, which makes online machine learning highly relevant to the field. Previous research has studied offline federated intrusion detection systems, but there is very little research on online federated machine learning for intrusion detection systems. Objectives. The aim of this work is to evaluate the performance of online federated machine learning for intrusion detection systems. The work will also study the performance relationship between offline and online models for both centralized and federated learning, in order to draw conclusions about the ability to extrapolate results between different types of models. Methods. This work uses a quasi-experiment to evaluate two different models, Naive Bayes and Semi-supervised Federated Learning on Evolving Data Streams (SFLEDS), on three different datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. 
For each model, four variants are implemented: centralized offline, centralized online, federated offline, and federated online. The federated models are evaluated with 20, 30, and 40 clients. Results. The results show that the best model overall is online SFLEDS. They also highlight an important problem with using imbalanced datasets without sufficient attention to data preprocessing and model design. Finally, the results show that there is no general relationship between offline and online models that holds for both centralized and federated learning in terms of model performance. Conclusions. The main conclusion of the work is that online federated machine learning has great potential for intrusion detection systems, but more research is needed to find the best model and the best parameters for reaching a satisfactory result.
168

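Enkele tegnieke vir die ontwikkeling en benutting van etiketteringhulpbronne vir hulpbronskaars tale [Some techniques for the development and utilisation of tagging resources for resource-scarce languages] / A.C. Griebenow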

Griebenow, Annick January 2015 (has links)
Because the development of resources in any language is an expensive process, many languages, including the indigenous languages of South Africa, can be classified as being resource scarce, or lacking in tagging resources. This study investigates and applies techniques and methodologies for optimising the use of available resources and improving the accuracy of a tagger, using Afrikaans as a resource-scarce language. It aims to i) determine whether combination techniques can be effectively applied to improve the accuracy of a tagger for Afrikaans, and ii) determine whether structural semi-supervised learning can be effectively applied to improve the accuracy of a supervised learning tagger for Afrikaans. In order to realise the first aim, existing methodologies for combining classification algorithms are investigated. Four taggers, trained using MBT, SVMlight, MXPOST and TnT respectively, are then combined into a combination tagger using weighted voting. Weights are calculated by means of total precision, tag precision and a combination of precision and recall. Although the combination of taggers does not consistently lead to an error rate reduction with regard to the baseline, it manages to achieve an error rate reduction of up to 18.48% in some cases. In order to realise the second aim, existing semi-supervised learning algorithms, with specific focus on structural semi-supervised learning, are investigated. Structural semi-supervised learning is implemented by means of the SVD-ASO algorithm, which attempts to extract the shared structure of untagged data using auxiliary problems before training a tagger. The use of untagged data during the training of a tagger leads to an error rate reduction with regard to the baseline of 1.67%. Even though the error rate reduction does not prove to be statistically significant in all cases, the results show that it is possible to improve the accuracy in some cases. 
/ MSc (Computer Science), North-West University, Potchefstroom Campus, 2015
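The weighted voting step described in the abstract, where four taggers each propose a tag and votes are weighted by a precision-based score, can be sketched as follows. The weights and tags in the usage note are invented for illustration. The abstract does not give the actual weight values, and the tie-breaking rule here (the tag that first reaches the top score wins) is an assumption. A helper for relative error rate reduction is included because the abstract reports results in that form.

```python
from collections import defaultdict


def weighted_vote(predicted_tags, weights):
    """Combine one tag per tagger into a single tag.

    predicted_tags: the tag each tagger proposes for the same token.
    weights: one weight per tagger (e.g. its total precision).
    Each tagger adds its weight to the tag it proposes, and the tag
    with the highest accumulated weight is returned.
    """
    scores = defaultdict(float)
    for tag, weight in zip(predicted_tags, weights):
        scores[tag] += weight
    return max(scores, key=scores.get)


def error_rate_reduction(baseline_error, new_error):
    """Relative error rate reduction with regard to a baseline."""
    return (baseline_error - new_error) / baseline_error
```

For instance, `weighted_vote(["N", "V", "N", "ADJ"], [0.90, 0.95, 0.30, 0.20])` returns `"N"`: the single most precise tagger votes for `"V"`, but the two taggers voting `"N"` accumulate more weight together.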
170

應用共變異矩陣描述子及半監督式學習於行人偵測 / Semi-supervised learning for pedestrian detection with covariance matrix feature

黃靈威, Huang, Ling Wei Unknown Date (has links)
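Pedestrian detection is a highly challenging topic in the field of object detection. The main difficulty lies in the variability of human pose and clothing, compounded by widely differing illumination conditions, which greatly increases the difficulty of recognition. In this thesis, we propose using the covariance matrix descriptor together with an online learning classifier that combines a naïve Bayes classifier and a cascade support vector machine, in order to improve the precision and recall of pedestrian recognition. Experimental results show that the proposed online learning strategy can effectively raise precision and recall by up to 14 percent on some datasets where recognition performs poorly. In addition, even under the same initial training conditions, on the three datasets USC Pedestrian Detection Test Set, INRIA Person dataset and Penn-Fudan Database for Pedestrian Detection and Segmentation, the precision and recall of this study are better than those of the HOG with AdaBoost pedestrian recognition approach. / Pedestrian detection is an important yet challenging problem in object classification due to flexible body pose, loose clothing and ever-changing illumination. In this thesis, we employ the covariance feature and propose an on-line learning classifier which combines a naïve Bayes classifier and a cascade support vector machine (SVM) to improve the precision and recall rate of pedestrian detection in a still image. Experimental results show that our on-line learning strategy can improve precision and recall rate by about 14% in some difficult situations. Furthermore, even under the same initial training condition, our method outperforms HOG + AdaBoost on the USC Pedestrian Detection Test Set, the INRIA Person dataset and the Penn-Fudan Database for Pedestrian Detection and Segmentation.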
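A covariance matrix descriptor summarises an image region by the sample covariance of per-pixel feature vectors. The sketch below computes that matrix with the standard library only. Which per-pixel features the thesis uses is not stated in the abstract, so the feature vectors here are placeholders (for example position, intensity and gradient values would be a typical assumed choice).

```python
def covariance_descriptor(features):
    """Sample covariance matrix of a region's per-pixel feature vectors.

    features: list of equal-length numeric vectors, one per pixel.
    Returns a d x d matrix (list of lists), where d is the number of
    features per pixel. Requires at least two pixels.
    """
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])

    # Mean of each feature over the region.
    mean = [sum(f[j] for f in features) / n for j in range(d)]

    # Accumulate outer products of the deviations from the mean.
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        dev = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += dev[a] * dev[b]

    # Unbiased estimate: divide by n - 1.
    return [[value / (n - 1) for value in row] for row in cov]
```

The resulting matrix has a fixed size regardless of how many pixels the region contains, which is what makes it convenient as a region descriptor for a classifier.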
