
Enhanced Machine Learning Engine Engineering Using Innovative Blending, Tuning, and Feature Optimization

<p>Motivated by Ensemble Machine Learning (<i>ML</i>) techniques, this thesis addresses performance, consistency, and integrity issues in <i>ML</i> models, such as overfitting, underfitting, predictive errors, the accuracy paradox, and poor generalization. Ensemble <i>ML</i> methods have shown promising outcomes when a single algorithm fails to approximate the true prediction function. Using meta-learning, a super learner is engineered by combining weak learners. In Supervised Learning (<i>SL</i>), several methods are generally evaluated to find the best fit to the underlying data and the predictive-analytics task (<i>i.e.</i>, the relevance of the &ldquo;<i>No Free Lunch</i>&rdquo; theorem). This thesis addresses three main challenges: <i>i</i>) determining the optimum blend of algorithms/methods for enhanced <i>SL</i> ensemble models, <i>ii</i>) engineering the selection and grouping of features that aggregate to the highest possible predictive and non-redundant value in the training data set, and <i>iii</i>) addressing performance-integrity issues such as the accuracy paradox. To this end, an enhanced Machine Learning Engine Engineering (<i>eMLEE</i>) is constructed with built-in parallel processing and specially designed novel constructs for error and gain functions that optimally score the classifier elements, improving the training experience and validation procedures. <i>eMLEE</i>, based on stochastic thinking, is built on: <i>i</i>) one centralized unit, the Logical Table unit (<i>LT</i>); <i>ii</i>) two explicit units, enhanced Algorithm Blend and Tuning (<i>eABT</i>) and enhanced Feature Engineering and Selection (<i>eFES</i>); and <i>iii</i>) two implicit constructs, enhanced Weighted Performance Metric (<i>eWPM</i>) and enhanced Cross Validation and Split (<i>eCVS</i>). Hence, it proposes an enhancement to the internals of <i>SL</i> ensemble approaches. 
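The super-learner idea above (combining weak learners via meta-learning into a weighted blend) can be sketched in miniature. This is an illustrative sketch only, not the thesis's eABT construct: the threshold learners, the accuracy-based weighting, and the toy data are all hypothetical.

```python
# Illustrative sketch: blend weak learners by weighting each one
# according to its training accuracy, then predict by weighted vote.
# Learners, weighting rule, and data are hypothetical, not from eMLEE.

def make_threshold_learner(feature_index, threshold):
    """A weak learner: predicts 1 if a single feature exceeds a threshold."""
    return lambda x: 1 if x[feature_index] > threshold else 0

def blend(learners, X, y):
    """Score each weak learner on the training data and return
    (learner, weight) pairs -- the 'blend' used by the super learner."""
    weighted = []
    for learn in learners:
        acc = sum(learn(x) == t for x, t in zip(X, y)) / len(y)
        weighted.append((learn, acc))
    return weighted

def predict(weighted, x):
    """Weighted vote of the blended learners."""
    score = sum(w * (1 if learn(x) == 1 else -1) for learn, w in weighted)
    return 1 if score > 0 else 0

# Tiny synthetic data set: the label is 1 when the second feature is large.
X = [(0.1, 0.9), (0.8, 0.2), (0.4, 0.7), (0.9, 0.1)]
y = [1, 0, 1, 0]

learners = [make_threshold_learner(0, 0.5), make_threshold_learner(1, 0.5)]
ensemble = blend(learners, X, y)
```

In this toy blend the uninformative first learner earns zero weight, so the vote is dominated by the learner that actually fits the data, which is the essence of scoring classifier elements before combining them.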
</p><p>Motivated by nature-inspired metaheuristic algorithms (such as <i>GA, PSO, ACO</i>, etc.), the feedback mechanisms are improved by introducing a specialized function, <i>Learning from the Mistakes</i> (<i>LFM</i>), to mimic the human learning experience. <i>LFM</i> has shown significant improvement in refining predictive accuracy on the testing data by using the computational processing of wrong predictions to increase the weighted scoring of the weak classifiers and features. <i>LFM</i> further ensures that the training layer experiences the maximum number of mistakes (<i>i.e.</i>, errors) for optimum tuning. With this design in the engine, stochastic modeling/thinking is implicitly implemented. </p><p>Motivated by the OOP paradigm in high-level programming, <i>eMLEE</i> provides an interface infrastructure that uses <i>LT</i> objects so the main units (<i>i.e.</i>, Unit A and Unit B) can call the functions on demand during the classifier learning process. This approach also allows real-world predictive-modeling applications to use the <i>eMLEE</i> API to further customize the classifier learning process and the trade-offs among tuning elements, subject to the data type and the end model in view. </p><p>Motivated by higher-dimensional processing and analysis (<i>i.e.</i>, <i>3D</i>) for improved analytics and learning mechanics, <i>eMLEE</i> incorporates <i>3D</i> modeling of fitness metrics, with <i>x</i> for overfit, <i>y</i> for underfit, and <i>z</i> for optimum fit, and then creates logical cubes using <i>LT</i> handles to locate the optimum space during the ensemble process. This approach ensures fine tuning of the ensemble learning process with an improved accuracy metric. 
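One plausible reading of the LFM feedback loop described above is that mistakes made during training are fed back to re-weight what the engine concentrates on in later rounds. The sketch below is purely illustrative: the update rule, the `boost` factor, and the data are hypothetical, not the thesis's actual LFM construct.

```python
# Illustrative sketch of a "Learning from the Mistakes" style feedback
# step: mispredicted examples are up-weighted so subsequent tuning
# focuses on them. Update rule and data are hypothetical.

def lfm_reweight(weights, predictions, labels, boost=2.0):
    """Multiply the weight of every mispredicted example by `boost`,
    then renormalize so the weights sum to 1."""
    updated = [
        w * boost if p != t else w
        for w, p, t in zip(weights, predictions, labels)
    ]
    total = sum(updated)
    return [w / total for w in updated]

# Four training examples, uniform initial weights.
weights = [0.25, 0.25, 0.25, 0.25]
predictions = [1, 0, 1, 1]   # what the current classifier predicted
labels      = [1, 0, 0, 1]   # ground truth: example 2 was a mistake

weights = lfm_reweight(weights, predictions, labels)
```

After one step the mispredicted example carries twice the weight of each correctly predicted one, mirroring the idea that errors, rather than successes, drive the tuning.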
</p><p>To support the construction and implementation of the proposed scheme, mathematical models (<i>i.e.</i>, <i>Definitions, Lemmas, Rules</i>, and <i>Procedures</i>), the governing algorithms&rsquo; definitions (with <i>pseudo-code</i>), and illustrations (<i>to assist in elaborating the concepts</i>) are provided. Diverse data sets are used to improve the generalization of the engine and to tune the underlying constructs during the development-testing phases. To show the practicality and stability of the proposed scheme, several results are presented with a comprehensive analysis of the outcomes for the engine&rsquo;s metrics (<i>i.e.</i>, <i>via integrity, corroboration</i>, and <i>quantification</i>). Two approaches are followed to corroborate the engine: <i>i</i>) testing the inner layers (<i>i.e.</i>, the internal constructs <i>Unit-A, Unit-B</i>, and <i>C-Unit</i>) to stabilize and test the fundamentals, and <i>ii</i>) testing the outer layer (<i>i.e.</i>, the <i>engine as a black box</i>) against standard measuring metrics for real-world endorsement. Comparisons with various existing techniques in the state of the art are also reported. Based on the extensive literature review, the research undertaken, the investigative approach, the engine construction and tuning, the validation approach, the experimental study, and the results visualization, <i>eMLEE</i> is found to outperform the existing techniques most of the time in terms of classifier learning, generalization, metric trade-offs, optimum fitness, feature engineering, and validation.</p>
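The 3D fitness modeling described earlier, with <i>x</i> for overfit, <i>y</i> for underfit, and <i>z</i> for optimum fit, can be pictured as placing each candidate configuration at a point in a unit cube and searching for the point nearest the ideal corner. The sketch below is a hypothetical illustration of that geometric idea, not the thesis's LT-handle mechanism, and the candidate scores are invented.

```python
# Illustrative sketch: each candidate blend is a point (x, y, z) in the
# fitness cube (x = overfit, y = underfit, z = optimum fit); the "optimum
# space" is approximated here as the point nearest the ideal corner
# (0, 0, 1). Scores below are hypothetical.
import math

def distance_to_optimum(point, ideal=(0.0, 0.0, 1.0)):
    """Euclidean distance from an (overfit, underfit, fit) point
    to the ideal corner of the fitness cube."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point, ideal)))

def locate_optimum(candidates):
    """Return the (name, point) pair whose fitness point lies
    nearest to (0, 0, 1)."""
    return min(candidates, key=lambda kv: distance_to_optimum(kv[1]))

# Hypothetical (name, (x, y, z)) fitness scores for three blends.
candidates = [
    ("blend-A", (0.30, 0.10, 0.70)),
    ("blend-B", (0.05, 0.05, 0.90)),
    ("blend-C", (0.10, 0.40, 0.60)),
]

best_name, best_point = locate_optimum(candidates)
```

Framing the overfit/underfit/fit trade-off as a distance in one space makes the "logical cube" search a single scalar minimization, which is what allows the ensemble process to be tuned toward the optimum region rather than toward any one metric in isolation.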

Identifer: oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:13427950
Date: 21 March 2019
Creators: Uddin, Muhammad Fahim
Publisher: University of Bridgeport
Source Sets: ProQuest.com
Language: English
Detected Language: English
Type: thesis
