151

A Nonlinear Statistical Algorithm to Predict Daily Lightning in Mississippi

Thead, Erin Amanda 15 December 2012 (has links)
Recent improvements in numerical weather model resolution open the possibility of producing lightning forecasts from indirect lightning-threat indicators well in advance of an event. This research examines the feasibility of using a statistical machine-learning algorithm known as a support vector machine (SVM) to provide a probabilistic lightning forecast for Mississippi at 9 km resolution up to one day in advance of a thunderstorm event. Although the results indicate that the SVM is not consistently accurate for single-day lightning forecasts, it performs skillfully on a data set consisting of many forecast days. It is plausible that errors in the numerical forecast model are responsible for the poorer performance of the SVM on individual forecasts. More research is needed into using SVMs for lightning prediction with input data from a variety of numerical weather models.
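
As a rough illustration of the technique described above (not the author's implementation), the sketch below shows how a probabilistic SVM lightning forecast could be set up with scikit-learn. The indirect threat-indicator features, grid-cell labels, and model settings are hypothetical stand-ins.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

# Hypothetical stand-in data: rows are 9 km grid cells on past forecast days,
# columns are indirect lightning-threat indicators (e.g. CAPE, lifted index,
# precipitable water); y is 1 if lightning was observed in that cell.
X = rng.normal(size=(2000, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF-kernel SVM with Platt scaling (probability=True) to yield a
# probabilistic lightning forecast per grid cell.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))
model.fit(X_train, y_train)

prob_lightning = model.predict_proba(X_test)[:, 1]
print("Brier score:", brier_score_loss(y_test, prob_lightning))
```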
152

Can a Support Vector Machine identify poor performance of dyslectic children playing a serious game?

Lemon, Viktor January 2021 (has links)
This paper has been part of the development of the serious game Kunna, a web-based game with exercises targeting children diagnosed with dyslexia. The game currently consists of five exercises intended to train reading and writing without a therapist or neuropsychologist present. Because Kunna can be used anywhere, tools are needed to understand each individual's capacities and difficulties. This paper therefore presents how a serious game and a support vector machine were used to identify children who performed poorly in Kunna's exercises. Due to the coronavirus pandemic, Kunna could only be tested on children not diagnosed with dyslexia, so this paper should be seen as a proof of concept. As an initial step, several variables were identified to measure the performance of dyslexic children. The variables were then implemented in Kunna and tested on 16 Spanish-speaking children, and the results were analyzed to determine how poor performance could be recognized from the identified variables. As a final step, the data for each exercise was divided into two groups, one of which contained the participants who appeared to perform poorly: those with clearly outlying values for the number of errors and the duration. A Support Vector Machine (SVM) was then trained and evaluated on its ability to separate the two groups and thereby identify the participants who performed poorly. The discussion concludes that the SVM is not the most efficient choice for this aim; instead, it is suggested that future work consider multi-class classification algorithms.
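
As an illustration of the final step described above (not the thesis code), the sketch below trains and evaluates an SVM that separates a small group of poorly performing participants from the rest, using two hypothetical per-exercise variables: number of errors and duration. All values are invented stand-ins; with only 16 participants, leave-one-out cross-validation is used for the evaluation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical per-exercise data for 16 participants: number of errors and
# completion time in seconds; label 1 marks the clearly outlying ("poor
# performance") group identified by inspection.
X = np.array([
    [1, 40], [2, 45], [0, 38], [3, 50], [1, 42], [2, 47], [1, 39], [2, 44],
    [0, 41], [3, 49], [1, 43], [2, 46], [9, 120], [11, 140], [8, 110], [12, 150],
], dtype=float)
y = np.array([0] * 12 + [1] * 4)

# With only 16 samples, leave-one-out cross-validation gives a less optimistic
# estimate than a single train/test split.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, class_weight="balanced"))
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("Leave-one-out accuracy:", scores.mean())
```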
153

Support Vector Machines for Classification and Imputation

Rogers, Spencer David 16 May 2012 (has links) (PDF)
Support vector machines (SVMs) are a powerful tool for classification problems. SVMs have been developed only in the last 20 years, with the availability of cheap and abundant computing power. SVMs are a non-statistical approach and make no assumptions about the distribution of the data. Here, support vector machines are applied to a classic data set from the machine learning literature, and the out-of-sample misclassification rates are compared to those of other classification methods. Finally, an algorithm that uses support vector machines to address the difficulty of imputing missing categorical data is proposed, and its performance is demonstrated under three different scenarios using data from the 1997 National Labor Survey.
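
As a sketch of the imputation idea described above (not the thesis algorithm), the code below trains an SVM on the rows where a categorical column is observed and uses it to predict the missing entries; the data frame and columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical data frame with numeric predictors and a categorical column
# that has missing entries.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(300, 3)), columns=["x1", "x2", "x3"])
df["category"] = np.where(df["x1"] + df["x2"] > 0, "A", "B")
df.loc[rng.choice(300, size=40, replace=False), "category"] = np.nan

observed = df["category"].notna()

# Train an SVM classifier on the complete rows, then predict the missing
# categories from the numeric predictors.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(df.loc[observed, ["x1", "x2", "x3"]], df.loc[observed, "category"])
df.loc[~observed, "category"] = clf.predict(df.loc[~observed, ["x1", "x2", "x3"]])

print(df["category"].value_counts())
```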
154

Automated Prediction of CMEs Using Machine Learning of CME – Flare Associations

Qahwaji, Rami S.R., Colak, Tufan, Al-Omari, M., Ipson, Stanley S. 06 December 2007 (has links)
In this work, machine learning algorithms are applied to explore the relation between significant flares and their associated CMEs. The NGDC flare catalogue and the SOHO/LASCO CME catalogue are processed to associate X- and M-class flares with CMEs based on timing information. Automated systems are created to process and associate years of flare and CME data, which are later arranged in numerical training vectors and fed to machine learning algorithms to extract the embedded knowledge and provide learning rules that can be used for the automated prediction of CMEs. Different properties are extracted from all the associated (A) and not-associated (NA) flares, representing the intensity, flare duration, duration of decline and duration of growth. Cascade Correlation Neural Networks (CCNN) are used in our work. The flare properties are converted to numerical formats suitable for the CCNN, which then predicts whether a certain flare is likely to initiate a CME given its properties. Intensive experiments using the jack-knife technique are carried out, and the system achieves a prediction accuracy of 65.3%. The prediction performance is analysed and recommendations for enhancing it are provided.
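
Cascade Correlation Neural Networks are not available in common Python libraries, so the sketch below uses a small feed-forward network from scikit-learn as a stand-in learner while mirroring the jack-knife (leave-one-out) evaluation of a flare-to-CME association classifier described above. The flare features and labels are synthetic stand-ins, not the NGDC/LASCO data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical training vectors: one row per X/M-class flare with its peak
# intensity, total duration, rise time and decay time (arbitrary units);
# label 1 means the flare was associated with a CME.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 4))
y = (X[:, 0] + 0.7 * X[:, 1] + rng.normal(scale=0.8, size=80) > 0).astype(int)

# A small feed-forward network stands in for the CCNN; the jack-knife
# (leave-one-out) evaluation mirrors the protocol described in the abstract.
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("Jack-knife prediction accuracy:", scores.mean())
```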
155

Using computational methods for the prediction of drug vehicles

Mistry, Pritesh, Palczewska, Anna Maria, Neagu, Daniel, Trundle, Paul R. January 2014 (has links)
Drug vehicles are chemical carriers that aid a drug's passage through an organism. Whilst they possess no intrinsic efficacy, they are designed to achieve desirable characteristics, which can include improving a drug's permeability and/or solubility, targeting a drug to a specific site, or reducing a drug's toxicity, all ideally achieved without compromising the efficacy of the drug. Whilst the majority of drug vehicle research is focused on the solubility and permeability issues of a drug, significant progress has been made on using vehicles for toxicity reduction. Achieving this can enable safer and more effective use of a potent drug against diseases such as cancer. From a molecular perspective, drugs activate or deactivate biochemical pathways through interactions with cellular macromolecules, resulting in toxicity. For newly developed drugs such pathways are not always clearly understood, but toxicity endpoints are still required as part of a drug's registration. An understanding of which vehicles could be used to ameliorate the unwanted toxicities of newly developed drugs would therefore be highly desirable to the pharmaceutical industry. In this paper we demonstrate the use of different classifiers as a means to select the vehicles best suited to avert a drug's toxic effects when no other information about the drug's characteristics is known. Through analysis of data acquired from the Developmental Therapeutics Program (DTP) we establish a link between a drug's toxicity and the vehicle used, and we demonstrate that classification and selection of an appropriate vehicle can be made based on the similarity of drug choice.
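
As a loose illustration of framing vehicle selection as a classification problem (not the paper's models or data), the sketch below trains a classifier to suggest a vehicle class from hypothetical molecular descriptors; the descriptor values, vehicle class names and labels are all invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical training set: each row is a drug described by simple molecular
# descriptors (e.g. molecular weight, logP, polar surface area); the label is
# the vehicle class that gave the lowest observed toxicity for that drug.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 3))
vehicles = np.array(["saline", "cremophor", "cyclodextrin"])
y = vehicles[(X[:, 1] > 0).astype(int) + (X[:, 2] > 0.5).astype(int)]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A random forest stands in for the "different classifiers" compared in the
# paper; given a new drug's descriptors it suggests a vehicle class.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print("Suggested vehicle for a new drug:", clf.predict([[0.1, -0.4, 0.9]])[0])
```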
156

Detection of breast cancer microcalcifications in digitized mammograms. Developing segmentation and classification techniques for the processing of MIAS database mammograms based on the Wavelet Decomposition Transform and Support Vector Machines.

Al-Osta, Husam E.I. January 2010 (has links)
Mammography is used to aid early detection and diagnosis systems. It takes an x-ray image of the breast and can provide a second opinion for radiologists. The earlier detection is made, the better treatment works. Digital mammograms are processed by Computer Aided Diagnosis (CAD) systems that can detect and analyze abnormalities in a mammogram. The purpose of this study is to investigate how to categorize cropped regions of interest (ROI) from digital mammogram images into two classes: normal regions and abnormal regions (which contain microcalcifications). The work proposed in this thesis is divided into three stages to provide a concept system for classification between normal and abnormal cases. The first stage is the segmentation process, which applies thresholding filters to separate the abnormal objects (foreground) from the breast tissue (background). The study has been carried out on mammogram images, mainly on cropped ROI images of different sizes representing individual microcalcifications and ROIs representing clusters of microcalcifications. The second stage is feature extraction, which makes use of the segmented ROI images to extract characteristic features that help in identifying regions of interest; the wavelet transform has been utilized for this process as it provides a variety of features that could be examined in future studies. The third and final stage is classification, where machine learning is applied to distinguish between normal ROI images and ROI images that may contain microcalcifications. The results indicated that by combining the wavelet transform and an SVM we can distinguish between regions of normal breast tissue and regions that include microcalcifications.
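
As an illustration of the wavelet-plus-SVM idea described above (not the thesis pipeline), the sketch below computes wavelet subband energies for cropped ROI patches with PyWavelets and classifies them with an SVM; the synthetic patches simply have bright spots injected to stand in for microcalcifications.

```python
import numpy as np
import pywt  # PyWavelets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def wavelet_energy_features(roi, wavelet="db2", level=3):
    """Energy of each wavelet subband of a cropped ROI, used as its feature vector."""
    coeffs = pywt.wavedec2(roi, wavelet, level=level)
    feats = [np.sum(coeffs[0] ** 2)]                      # approximation energy
    for cH, cV, cD in coeffs[1:]:                         # detail energies per level
        feats += [np.sum(cH ** 2), np.sum(cV ** 2), np.sum(cD ** 2)]
    return np.array(feats)

# Hypothetical stand-in ROIs (32x32): "abnormal" patches get a few bright,
# high-frequency spots resembling microcalcifications.
rng = np.random.default_rng(4)
rois, labels = [], []
for i in range(120):
    roi = rng.normal(loc=100, scale=5, size=(32, 32))
    label = i % 2
    if label:                                             # inject bright spots
        ys, xs = rng.integers(0, 32, size=(2, 5))
        roi[ys, xs] += 60
    rois.append(wavelet_energy_features(roi))
    labels.append(label)

X, y = np.array(rois), np.array(labels)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```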
157

Prisförändringar vid förändrad försörjningskedja för livsmedel

Javenius, Hugo, Nerman, Hugo January 2021 (has links)
Global food prices are currently rising at a rapid pace. The current supply chain involves a number of steps, each of which adds a price markup that is ultimately paid by the consumer. Modern technology, such as machine learning and smart logistics, enables alternative supply chains. This report examines the possibility of designing a model that, with the help of change scenarios based on previous studies and the client's vision, can make predictions of future food prices. The report is based on the supply chain and current prices for potatoes. The models used are ARIMA, SVR with different kernels, linear regression, Ridge regression and Lasso regression, evaluated with the error measures Mean Absolute Error, Mean Squared Error, Root Mean Squared Error and R². The best-performing models, with which the prediction was then carried out, were ARIMA and SVR with a linear kernel. The predictions and calculations showed drastically reduced food prices and a large reduction in unnecessary food waste, especially in the scenario that involves an overall transformation of the supply chain. This has major macroeconomic effects, as food prices affect inflation. The analysis also shows the importance of the industry's actors working with analysis and strategy to handle a future shift that entails higher uncertainty in the market. Uncertainties remain about the effect on other supply chains, as well as the net effect of a shift, since the costs of such a transition are unknown.
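
As an illustration of the two best-performing model families named above (not the report's data or tuning), the sketch below fits an ARIMA model and a linear-kernel SVR on a synthetic price series and reports the same error measures; the series, ARIMA order and lag length are arbitrary stand-ins.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical monthly potato price series with trend, seasonality and noise.
rng = np.random.default_rng(5)
n = 120
price = 8 + 0.02 * np.arange(n) + np.sin(np.arange(n) * 2 * np.pi / 12) + rng.normal(scale=0.3, size=n)
train, test = price[:100], price[100:]

# ARIMA forecast of the remaining 20 months.
arima_fit = ARIMA(train, order=(2, 1, 2)).fit()
arima_pred = arima_fit.forecast(steps=len(test))

# SVR with a linear kernel on lagged values (autoregressive framing).
def lagged(series, lags=12):
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    return X, series[lags:]

X_train, y_train = lagged(train)
svr = SVR(kernel="linear", C=10.0).fit(X_train, y_train)
history = list(train[-12:])
svr_pred = []
for _ in range(len(test)):                      # recursive multi-step forecast
    nxt = svr.predict(np.array(history[-12:]).reshape(1, -1))[0]
    svr_pred.append(nxt)
    history.append(nxt)

for name, pred in [("ARIMA", arima_pred), ("SVR-linear", np.array(svr_pred))]:
    mse = mean_squared_error(test, pred)
    print(name, "MAE", mean_absolute_error(test, pred), "MSE", mse,
          "RMSE", np.sqrt(mse), "R2", r2_score(test, pred))
```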
158

A machine learning based spatio-temporal data mining approach for coastal remote sensing data

Gokaraju, Balakrishna 07 August 2010 (has links)
Continuous monitoring of coastal ecosystems aids in a better understanding of their dynamics and inherent harmful effects. As many of these ecosystems extend over space and time, there is a need to mine this spatio-temporal information to build accurate monitoring and forecast systems. Harmful Algal Blooms (HABs) pose an enormous threat to U.S. marine habitats and economy in the coastal waters. Federal and state coastal administrators have been devising state-of-the-art monitoring and forecasting systems for these HAB events, and the efficacy of such a system relies on the performance of HAB detection. A Machine Learning based Spatio-Temporal data mining approach for the detection of HAB events (STML-HAB) in the Gulf of Mexico region is proposed in this work. The spatio-temporal cubical neighborhood around each training sample is considered to retrieve the relevant spectral information pertaining to both the HAB and non-HAB classes. A unique relevant feature subset combination is derived through an evolutionary computation technique toward better classification of HAB versus non-HAB, and kernel-based feature transformation and classification are used in developing the model. The STML-HAB model gave significant performance improvements over current optical-detection-based techniques, greatly reducing the false alarm rate while achieving an accuracy of 0.9642 on SeaWiFS data. The developed model is used for prediction on new datasets for further spatio-temporal analyses, such as the seasonal variations of HABs and the sequential occurrence of algal blooms. New variability visualizations are introduced to illustrate the dynamic behavior and seasonal variations of HABs from large spatio-temporal datasets. The results outperformed an ensemble of the currently available empirical methods for HAB detection; the ensemble is implemented by a new approach that combines the empirical models using a probabilistic neural network. The model is also compared with the results obtained using various feature extraction techniques, spatial neighborhoods and classifiers.
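
As an illustration of the spatio-temporal cubical neighborhood idea described above (not the STML-HAB implementation), the sketch below flattens a small (time, row, column, band) neighborhood around each labelled sample and feeds it to a kernel SVM; the image stack, sample locations and labels are synthetic stand-ins, and the evolutionary feature-selection step is omitted.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical stack of ocean-colour imagery: (time, rows, cols, bands).
rng = np.random.default_rng(6)
cube = rng.normal(size=(10, 64, 64, 4))

def cubical_neighborhood(stack, t, r, c, radius=1):
    """Flatten the spatio-temporal neighborhood of radius `radius` around a sample."""
    t0, t1 = max(t - radius, 0), min(t + radius + 1, stack.shape[0])
    r0, r1 = max(r - radius, 0), min(r + radius + 1, stack.shape[1])
    c0, c1 = max(c - radius, 0), min(c + radius + 1, stack.shape[2])
    return stack[t0:t1, r0:r1, c0:c1, :].ravel()

# Hypothetical labelled sample locations away from the borders, so every
# neighborhood has the same size.
samples = [(rng.integers(1, 9), rng.integers(1, 63), rng.integers(1, 63)) for _ in range(200)]
X = np.array([cubical_neighborhood(cube, t, r, c) for t, r, c in samples])
# Stand-in labels: HAB (1) where the first band at the sample location is elevated.
y = np.array([int(cube[t, r, c, 0] > 0) for t, r, c in samples])

# Kernel-based classification of HAB vs non-HAB from the neighborhood features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```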
159

Semantics-Enabled Framework for Knowledge Discovery from Earth Observation Data

Durbha, Surya Srinivas 09 December 2006 (has links)
Earth observation data has increased significantly over the last decades, with satellites collecting and transmitting to Earth receiving stations in excess of three terabytes of data a day. This data acquisition rate is a major challenge to existing data exploitation and dissemination approaches, and the lack of content- and semantics-based interactive information searching and retrieval capabilities in image archives is an impediment to the use of the data. The proposed framework (Intelligent Interactive Image Knowledge Retrieval, I3KR) is built around a concept-based model using domain-dependent ontologies. An unsupervised segmentation algorithm is employed to extract homogeneous regions and calculate primitive descriptors for each region. An unsupervised Kernel Principal Components Analysis (KPCA) is then performed, which extracts components of the features that are nonlinearly related to the input variables, followed by a Support Vector Machine (SVM) classification to generate models for the object classes. The assignment of concepts in the ontology to the objects is achieved by a Description Logics (DL) based inference mechanism. This research also proposes new methodologies for domain-specific rapid image information mining (RIIM) modules for disaster response activities. In addition, several organizations and individuals are involved in the analysis of Earth observation data, and the results of this analysis are often presented as derivative products in various classification systems (e.g. land use/land cover, soils, hydrology, wetlands). The generated thematic data sets are highly heterogeneous in syntax, structure and semantics. The second framework developed as part of this research, Semantics-Enabled Thematic data Integration (SETI), focuses on identifying and resolving semantic conflicts such as confounding conflicts, scaling and units conflicts, and naming conflicts between data in different classification schemes. The shared-ontology approach presented in this work facilitates the reclassification of information items from one information source into the application ontology of another source. Reasoning is performed by a DL reasoner that allows classification of data from one context to another by equality and subsumption. This enables the proposed system to provide enhanced knowledge discovery, query processing and searching in a way that is not possible with keyword-based searches.
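
As an illustration of the KPCA-followed-by-SVM stage described above (not the I3KR system), the sketch below chains scikit-learn's KernelPCA and an SVM on hypothetical region descriptors with invented class names; the segmentation, ontology and Description Logics components are outside its scope.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical primitive descriptors for segmented image regions (mean band
# values, texture and shape measures) with land-cover style labels.
rng = np.random.default_rng(7)
X = rng.normal(size=(600, 8))
classes = np.array(["water", "forest", "urban"])
y = classes[(X[:, 0] > 0).astype(int) + (X[:, 1] > 0.5).astype(int)]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Unsupervised Kernel PCA extracts nonlinear components of the descriptors,
# then an SVM builds the object-class models.
model = make_pipeline(StandardScaler(),
                      KernelPCA(n_components=5, kernel="rbf", gamma=0.1),
                      SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```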
160

A Wavelet-Based Approach to Primitive Feature Extraction, Region-Based Segmentation, and Identification for Image Information Mining

Shah, Vijay Pravin 11 August 2007 (has links)
Content- and semantic-based interactive mining systems describe remote sensing images by means of relevant features. Region-based retrieval systems have been proposed to capture the local properties of an image. Existing systems use computationally intensive methods to extract primitive features based on color, texture (spatial gray-level dependency, SGLD, matrices) and shape from the segmented homogeneous regions. The use of wavelet transform techniques has recently gained momentum in multimedia image archives to expedite the retrieval process; however, the current semantics-enabled framework for geospatial data uses computationally intensive methods for feature extraction and image segmentation. Hence, this dissertation presents the use of wavelet-based feature extraction in a semantics-enabled framework to expedite knowledge discovery in geospatial data archives. Geospatial data has different characteristics than multimedia images and poses more challenges: the experimental assumptions, such as the wavelet decomposition level and mother wavelet selected for multimedia data archives, might not prove efficient for the retrieval of geospatial data. Discrete wavelet transforms (DWT) introduce aliasing effects due to subband decimation at a certain decomposition level. This dissertation addresses the issue of selecting a suitable wavelet decomposition level, and a systematic selection process is developed for image segmentation. To validate the applicability of this method, a synthetic image is generated to assess the performance qualitatively and quantitatively. In addition, results for a Landsat 7 ETM+ imagery archive are illustrated, and the F-measure is used to assess the feasibility of this method for the retrieval of different classes. This dissertation also introduces a new feature set, obtained by coalescing wavelet and independent component analysis, for image information mining. Feature-level fusion is performed to include the missing high-detail information from the panchromatic image. Results show that the presented feature set is computationally less expensive and more efficient at capturing the spectral and spatial texture information when compared to traditional approaches. After extensive experimentation with different types of mother wavelets, it can be concluded that reverse biorthogonal wavelets of shorter length and the simple Haar filter provided better results for image information mining from the database used in this study.
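
As an illustration of comparing mother wavelets and decomposition levels (not the dissertation's selection procedure), the sketch below uses PyWavelets to compute the share of detail energy at each level of a synthetic two-texture image for the Haar filter and a short reverse biorthogonal wavelet; the image and the level count are arbitrary stand-ins.

```python
import numpy as np
import pywt  # PyWavelets

# Synthetic image with a fine-textured left half and a coarse-textured right
# half, used to inspect how detail energy is distributed across levels.
rng = np.random.default_rng(8)
img = np.zeros((256, 256))
img[:, :128] = np.sin(np.arange(256) * 0.8)[None, :128] * 20   # fine texture
img[:, 128:] = np.sin(np.arange(128) * 0.1)[None, :] * 20      # coarse texture
img += rng.normal(scale=1.0, size=img.shape)

def detail_energy_by_level(image, wavelet, max_level=4):
    """Fraction of total detail energy captured at each decomposition level."""
    coeffs = pywt.wavedec2(image, wavelet, level=max_level)
    total = sum(np.sum(c ** 2) for level in coeffs[1:] for c in level)
    # coeffs[1] is the coarsest detail level, coeffs[-1] the finest.
    return [sum(np.sum(c ** 2) for c in level) / total for level in coeffs[1:]]

for wavelet in ("haar", "rbio1.3"):
    energies = detail_energy_by_level(img, wavelet)
    print(wavelet, ["%.3f" % e for e in energies])
```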
