1.
Large Data Clustering And Classification Schemes For Data Mining
Babu, T Ravindra, 12 1900
Data Mining deals with extracting valid, novel, potentially useful, easily understood, and general abstractions from large data. Data are large when the number of patterns, the number of features per pattern, or both are large; in practice, largeness is characterized by a size that exceeds the main-memory capacity of a computer. Data Mining is an interdisciplinary field involving database systems, statistics, machine learning, visualization, and computational aspects, and its algorithms focus on scalability and efficiency. Clustering and classification of large data are important activities in Data Mining. Clustering algorithms are predominantly iterative and require multiple scans of the dataset, which is very expensive when the data reside on disk.
In the current work we propose different schemes that have both theoretical validity and practical utility in dealing with such large data. The schemes broadly encompass data compaction, classification, prototype selection, use of domain knowledge, and hybrid intelligent systems. The proposed approaches can be broadly classified as: (a) compressing the data losslessly and, through a novel algorithm, clustering as well as classifying the patterns directly in their compressed form; (b) compressing the data in a lossy fashion such that a very high degree of compression and abstraction is obtained in terms of 'distinct subsequences', and classifying the data in this compressed form to improve prediction accuracy; (c) obtaining simultaneous prototype and feature selection with the help of incremental clustering, a lossy compression scheme, and a rough-set approach; (d) demonstrating that prototype selection and data-dependent techniques can reduce the number of comparisons in a multiclass classification scenario using SVMs; and (e) showing that, by making use of domain knowledge of the problem and the data under consideration, a very high classification accuracy is obtained with fewer iterations of AdaBoost.
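The abstract does not spell out the lossless compression algorithm of approach (a). As an illustrative sketch only (function names and the choice of run-length encoding are our assumptions, not the thesis's stated method), run-length encoding of binary patterns allows a dissimilarity such as the Hamming distance to be computed directly on the compressed form, without decompression:

```python
def rle(bits):
    """Run-length encode a binary sequence as (value, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(v, n) for v, n in runs]

def hamming_rle(a, b):
    """Hamming distance between two equal-length binary patterns,
    computed directly on their run-length encodings: walk both run
    lists in parallel, counting positions where the run values differ."""
    dist = 0
    i = j = 0
    va, na = a[i]
    vb, nb = b[j]
    while True:
        step = min(na, nb)          # overlap of the two current runs
        if va != vb:
            dist += step
        na -= step
        nb -= step
        if na == 0:
            i += 1
            if i == len(a):
                break
            va, na = a[i]
        if nb == 0:
            j += 1
            if j == len(b):
                break
            vb, nb = b[j]
    return dist
```

For example, `hamming_rle(rle([1, 1, 0, 0, 1]), rle([1, 0, 0, 1, 1]))` gives the same answer as a position-by-position comparison of the uncompressed patterns, but the cost scales with the number of runs rather than the pattern length.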
The schemes have pragmatic utility. The prototype selection algorithm is incremental, requires a single scan of the dataset, and has linear time and space requirements. We provide results obtained with a large, high-dimensional handwritten (hw) digit dataset. The compression algorithm is based on simple concepts; we demonstrate that classifying the compressed data reduces the computation time by a factor of 5, with the prediction accuracy on the compressed and original data being exactly the same, 92.47%. With the proposed lossy compression scheme and pruning methods, we demonstrate that even with a reduction of distinct subsequences by a factor of about 6 (690 to 106), the prediction accuracy improves: with the original data containing 690 distinct subsequences the classification accuracy is 92.47%, whereas with an appropriate choice of pruning parameters the number of distinct subsequences reduces to 106 with a corresponding classification accuracy of 92.92%. The best classification accuracy of 93.3% is obtained with 452 distinct subsequences. With the scheme of simultaneous feature and prototype selection, we improve the classification accuracy beyond that obtained with kNNC, viz., to 93.58%, while significantly reducing the number of features and prototypes, achieving a compaction of 45.1%. In the case of hybrid schemes based on SVMs, prototypes, and a domain-knowledge-based tree (KB-Tree), we demonstrate a reduction in SVM training time by 50% and in testing time by about 30% as compared to the complete data, together with an improvement of the classification accuracy to 94.75%. In the case of AdaBoost the classification accuracy is 94.48%, which is better than those obtained with NNC and kNNC on the entire data; the training time is reduced because prototypes are used instead of the complete data. Another important aspect of the work is the design of a KB-Tree (with a maximum depth of 4) that classifies 10-category data in just 4 comparisons.
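The abstract characterizes the prototype selection algorithm only by its properties: incremental, a single dataset scan, linear time and space. A classic scheme with exactly these properties is leader-style clustering; the sketch below illustrates that general idea under stated assumptions (a Euclidean distance and a user-chosen threshold), not the thesis's exact algorithm:

```python
import math

def leader_prototypes(patterns, threshold):
    """Single-scan, incremental prototype selection (leader-style).

    Each pattern is compared against the prototypes found so far; if one
    lies within `threshold` distance, the pattern is absorbed by it,
    otherwise the pattern itself becomes a new prototype.  The data are
    scanned exactly once, and time/space grow linearly with the number
    of patterns (for a bounded number of prototypes).
    """
    prototypes = []
    for x in patterns:
        for p in prototypes:
            if math.dist(x, p) <= threshold:
                break                  # x is represented by prototype p
        else:
            prototypes.append(x)       # x starts a new cluster
    return prototypes
```

Note that the result depends on the presentation order of the patterns and on the threshold, which is the usual trade-off accepted in exchange for the single-scan property.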
In addition to the hw data, we applied the schemes to network intrusion detection data (the 10% dataset of KDDCUP99) and demonstrated that the proposed schemes incur a lower overall cost than the reported values.
2.
Numerical study of wave transformation in the nearshore zone, from the shoaling zone to the surf and swash zones
Tissier, Marion, 15 December 2011
In this thesis, we introduce a new numerical model able to describe wave transformation from the shoaling zone to the swash zone, including overtopping. The model is based on the Serre Green-Naghdi (S-GN) equations, which are the basic fully nonlinear Boussinesq-type equations. These equations can accurately describe wave dynamics prior to breaking, but their application to the surf zone usually requires complex parameterizations and remains an open research area. We propose a new approach to describe wave breaking in S-GN models, based on the representation of breaking wave fronts as shocks. This method has been successfully applied to the Nonlinear Shallow Water (NSW, or Saint-Venant) equations and allows for an easy treatment of wave breaking and shoreline motions; however, the NSW equations can only be applied after breaking. In this work, we extend the validity domain of the NSW model SURF-WB (Marche et al. 2007) to the shoaling zone by adding the S-GN dispersive terms to the governing equations. Local switches to the NSW equations are then performed in the vicinity of the breaking fronts, allowing the waves to break and dissipate their energy. The resulting model, called SURF-GN, is extensively validated against laboratory data for different types of incident waves and beaches. It is then applied to study tsunami-like undular bore dynamics in the nearshore: we show that SURF-GN can describe the different bore types, from undular non-breaking to purely breaking, over a large range of Froude numbers, and we consider the effects of the transformation of a tsunami-like wave into a train of undulations on wave run-up over a sloping beach. We finally present an in-situ study of broken-wave celerity, based on the ECORS Truc Vert 2008 field experiment; in particular, we quantify the effects of nonlinearities and evaluate the predictive ability of several nonlinear celerity models.
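For orientation, the NSW (Saint-Venant) system underlying SURF-WB can be written in 1D conservative form; the notation here is assumed for illustration ($h$ water depth, $u$ depth-averaged velocity, $b$ bottom elevation, $g$ gravity), since the abstract does not fix a notation:

```latex
\begin{aligned}
\partial_t h + \partial_x (hu) &= 0,\\
\partial_t (hu) + \partial_x\!\left(hu^2 + \tfrac{1}{2}\,g h^2\right) &= -\,g h\,\partial_x b .
\end{aligned}
```

The S-GN equations add dispersive terms to the momentum balance; dropping those terms recovers the NSW system, which is precisely what makes the local switch to NSW at breaking fronts (and the resulting shock-based energy dissipation) possible.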