391

Optimisation des techniques de compression d'images fixes et de vidéo en vue de la caractérisation des matériaux : applications à la mécanique / Optimization of compression techniques for still images and video for characterization of materials : mechanical applications

Eseholi, Tarek Saad Omar 17 December 2018 (has links)
This PhD thesis focuses on the optimization of still-image and video compression techniques for the characterization of materials in mechanical science applications. It is part of the MEgABIt (MEchAnic Big Images Technology) research project supported by the Université Polytechnique Hauts-de-France (UPHF). The scientific objective of the MEgABIt project is to investigate the ability to compress the large data flows produced by mechanical instrumentation of deformations, with large volumes in both the spatial and frequency domains. We propose to design original algorithms for processing in the compressed domain in order to make the evaluation of mechanical parameters computationally feasible, while preserving as much as possible of the information provided by the acquisition systems (high-speed imaging, 3D tomography). To be relevant, image compression should allow the optimal computation of morpho-mechanical parameters without losing the essential characteristics of the mechanical surface images, which could otherwise lead to erroneous analysis or classification. In this thesis, we use the state-of-the-art HEVC (High Efficiency Video Coding) standard prior to the analysis, classification or processing used to evaluate the mechanical parameters. We first quantify the impact of compression on video sequences from a high-speed camera. The experimental results show that compression ratios of up to 100:1 can be applied without significant degradation of the mechanical surface response of the material as measured by the VIC-2D analysis tool. We then develop an original classification method in the compressed domain for a database of surface topography images. The topographical image descriptor is obtained from the prediction modes computed by intra-picture prediction during lossless HEVC compression of the images. A support vector machine (SVM) is also introduced to strengthen the performance of the proposed system. Experimental results show that the compressed-domain classifier is robust for classifying our six categories of mechanical topographies, based on either single-scale or multi-scale analysis methodologies, with achieved lossless compression ratios of up to 6:1 depending on image complexity. We also evaluate the effects of the type of surface filtering (high-pass, low-pass and band-pass filters) and of the analysis scale on the efficiency of the proposed classifier. The large scale of the high-frequency components of the surface profile is the most appropriate for classifying our topographic image database, with an accuracy reaching 96%.
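As a hedged illustration of the compressed-domain classification idea described above (not the thesis implementation), the sketch below builds a descriptor from the histogram of HEVC intra-prediction modes of each image and feeds it to an SVM. The mode maps are assumed to have been extracted beforehand with an HEVC encoder, and all names here are illustrative.

```python
# Minimal sketch: classify surface topographies from HEVC intra-prediction
# mode statistics. `mode_maps` is assumed to be a list of 2-D integer arrays
# (one per image) containing the 35 HEVC intra modes chosen during encoding.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

N_MODES = 35  # planar, DC and 33 angular intra modes in HEVC

def mode_histogram(mode_map):
    """Normalized histogram of intra-prediction modes for one image."""
    hist = np.bincount(mode_map.ravel(), minlength=N_MODES).astype(float)
    return hist / hist.sum()

def classify_topographies(mode_maps, labels):
    X = np.vstack([mode_histogram(m) for m in mode_maps])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    return cross_val_score(clf, X, labels, cv=5)  # accuracy per fold
```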
392

Impact Assessment Of Climate Change On Hydrometeorology Of River Basin For IPCC SRES Scenarios

Anandhi, Aavudai 12 1900 (has links)
There is ample growth in scientific evidence about climate change. Since hydrometeorological processes are sensitive to climate variability and change, ascertaining the linkages and feedbacks between the climate and the hydrometeorological processes becomes critical for environmental quality, economic development, social well-being, etc. As the river basin integrates important systems such as ecological and socio-economic systems, knowledge of the plausible implications of climate change on the hydrometeorology of a river basin will not only increase awareness of how the hydrological systems may change over the coming century, but also prepare us for adapting to the impacts of climate change on water resources for sustainable management and development. In general, quantitative climate impact studies are based on several meteorological variables and possible future climate scenarios. Among the meteorological variables, six "cardinal" variables are identified as the most commonly used in impact studies (IPCC, 2001): maximum and minimum temperature, precipitation, solar radiation, relative humidity and wind speed. Climate scenarios refer to plausible future climates constructed explicitly for investigating the potential consequences of anthropogenic climate alteration, in addition to natural climate variability. Among the climate scenarios adopted in impact assessments, General Circulation Model (GCM) projections based on the marker scenarios given in the Intergovernmental Panel on Climate Change's (IPCC's) Special Report on Emissions Scenarios (SRES) have become the standard scenarios. GCMs are run at coarse resolutions, and therefore the output climate variables for the various scenarios of these models cannot be used directly for impact assessment at a local (river basin) scale. Hence, several methodologies such as downscaling and disaggregation have been developed to transfer information on atmospheric variables from the GCM scale to surface meteorological variables at the local scale. The most commonly used downscaling approaches are based on transfer functions that represent the statistical relationships between the large-scale atmospheric variables (predictors) and the local surface variables (predictands). Recently, the support vector machine (SVM) has been proposed and theoretically shown to have advantages over other techniques in use, such as transfer functions. The SVM implements the structural risk minimization principle, which guarantees the global optimum solution. Further, for SVMs, the learning algorithm automatically decides the model architecture. These advantages make SVM a plausible choice for downscaling hydrometeorological variables. The literature review on the use of transfer functions for downscaling revealed that, though a diverse range of transfer functions has been adopted, only a few studies have evaluated the sensitivity of such downscaling models. Further, no studies have so far been carried out in India for downscaling hydrometeorological variables to the river basin scale, nor was there any prior work aimed at downscaling CGCM3 simulations to these variables at the river basin scale for the various IPCC SRES emission scenarios. The research presented in this thesis is motivated by the need to assess the impact of climate change on streamflow at the river basin scale for the various IPCC SRES scenarios (A1B, A2, B1 and COMMIT), by integrating the implications of climate change on all six cardinal variables.
The catchment of the Malaprabha river (upstream of the Malaprabha reservoir) in India is chosen as the study area to demonstrate the effectiveness of the developed models, as it is considered a climatically sensitive region: though the river originates in a region of high rainfall, it feeds arid and semi-arid regions downstream. Data from the National Centers for Environmental Prediction (NCEP), the third-generation Canadian Global Climate Model (CGCM3) of the Canadian Centre for Climate Modelling and Analysis (CCCma), observed hydrometeorological variables, a digital elevation model (DEM), a land use/land cover map, and a soil map prepared from merged PAN and LISS III satellite images are used in the developed models. The thesis is broadly divided into four parts. The first part comprises a general introduction and the data, techniques and tools used. The second part describes the assessment of the implications of climate change on monthly values of each of the six cardinal variables in the study region using SVM downscaling models and a k-nearest neighbor (k-NN) disaggregation technique. Further, the sensitivity of the SVM downscaling models to the choice of predictors, predictand, calibration period, season and location is evaluated. The third part describes the impact assessment of climate change on streamflow in the study region using the SWAT hydrologic model and the SVM downscaling models. The fourth part presents a summary of the work, the conclusions drawn, and the scope for future research. The development of an SVM downscaling model begins with the selection of probable predictors (large-scale atmospheric variables). For this purpose, cross-correlations are computed between the probable predictor variables in the NCEP and GCM data sets, and between the probable predictor variables in the NCEP data set and the predictand. A pool of potential predictors is then stratified (which is optional and variable dependent) based on season and/or location by specifying threshold values for the computed cross-correlations. The data on potential predictors are first standardized for a baseline period to reduce systematic bias (if any) in the mean and variance of the predictors in the GCM data relative to those in the NCEP reanalysis data. The standardized NCEP predictor variables are then processed using principal component analysis (PCA) to extract principal components (PCs), which are orthogonal and preserve more than 98% of the variance originally present. A feature vector is formed for each month using the PCs. The feature vector forms the input to the SVM model, and the contemporaneous value of the predictand is its output. Finally, the downscaling model is calibrated to capture the relationship between the NCEP data on potential predictors (i.e., the feature vectors) and the predictand. A grid search procedure is used to find the optimum range for each of the parameters. Subsequently, the optimum values of the parameters are obtained from the selected ranges using the stochastic search technique of a genetic algorithm. The SVM model is then validated and used to obtain projections of the predictand from simulations of CGCM3. Results show that precipitation, maximum and minimum temperature, relative humidity and cloud cover are projected to increase in the future for the A1B, A2 and B1 scenarios, whereas no trend is discerned with the COMMIT scenario. The projected increase in the predictands is highest for the A2 scenario and least for the B1 scenario.
Wind speed is not projected to change in the future for the study region under any of the aforementioned scenarios. Solar radiation is projected to decrease in the future for the A1B, A2 and B1 scenarios, whereas no trend is discerned with the COMMIT scenario. To assess the monthly streamflow responses to climate change, two methodologies are considered in this study, namely (i) downscaling and disaggregating the meteorological variables for use as inputs to SWAT, and (ii) directly downscaling streamflow using SVM. SWAT is a physically based, distributed, continuous-time hydrological model that operates on a daily time scale. The hydrometeorological variables obtained using the SVM downscaling models are disaggregated to the daily scale using the k-nearest neighbor method developed in this study. The other inputs to SWAT are the DEM, land use/land cover map and soil map, which are considered to be the same for the present and future scenarios. The SWAT model projects an increase in future streamflows for the A1B, A2 and B1 scenarios, whereas no trend is discerned with the COMMIT scenario. The monthly projections of streamflow at the river basin scale are also obtained using two SVM-based downscaling models. The first SVM model (called the one-stage SVM model) takes feature vectors prepared from monthly values of large-scale atmospheric variables as inputs, whereas the second SVM model (called the two-stage SVM model) takes feature vectors prepared from the monthly projections of the cardinal variables as inputs. The trend in streamflows projected using the two-stage SVM model is found to be similar to that projected by SWAT for each of the scenarios considered. Streamflow is not projected to change under any of the scenarios considered with the one-stage SVM downscaling model. The relative performance of SWAT and the two SVM downscaling models in simulating observed streamflows is evaluated. In general, all three models are able to simulate the streamflows well; nevertheless, the performance of the SWAT model is better. Further, among the two SVM models, the performance of the one-stage streamflow downscaling model is marginally better than that of the two-stage streamflow downscaling model.
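The downscaling pipeline described above (standardization, PCA retaining roughly 98% of the variance, SVM regression with a grid search over hyper-parameters) can be sketched with scikit-learn as follows. This is an assumed, simplified analogue of the thesis's Matlab workflow; the genetic-algorithm refinement step is omitted, and `X_ncep` and `y` are illustrative inputs.

```python
# Sketch of the SVM downscaling model: standardize NCEP predictors, keep the
# principal components explaining ~98% of the variance, and fit an SVM
# regression whose parameters are chosen by grid search.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def fit_downscaling_model(X_ncep, y):
    """X_ncep: (months x predictors) array; y: monthly predictand values."""
    pipe = Pipeline([
        ("scale", StandardScaler()),        # remove bias in mean/variance
        ("pca", PCA(n_components=0.98)),    # PCs preserving 98% of variance
        ("svr", SVR(kernel="rbf")),
    ])
    grid = {"svr__C": [1, 10, 100], "svr__gamma": [1e-3, 1e-2, 1e-1],
            "svr__epsilon": [0.01, 0.1]}
    search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X_ncep, y)
    return search.best_estimator_  # later applied to CGCM3 predictors
```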
393

基植於非負矩陣分解之華語流行音樂曲式分析 / Chinese popular music structure analysis based on non-negative matrix factorization

黃柏堯, Huang, Po Yao Unknown Date (has links)
Music structure analysis is helpful for music information retrieval, music education, and the alignment between lyrics and music. This thesis investigates techniques of music structure analysis for Chinese popular music. Our work analyzes music form automatically in three steps: main melody finding, sentence discovery, and music form discovery. First, we extract the main melody using a support vector machine trained on user-labeled samples. Then, the boundaries of musical sentences are detected by two-way classification, also using a support vector machine. To discover the music form, a sentence-based self-similarity matrix is constructed for each music feature. Non-negative matrix factorization is employed to extract new features and to construct a second-level self-similarity matrix, on which checkerboard kernel correlation is used to find music form boundaries. Experiments on eighty Chinese popular songs are performed to evaluate the proposed approaches. For main melody finding, our learning-based approach outperforms existing methods. Segmenting the songs into sentences beforehand gives better results than leaving them unsegmented; the proposed approaches achieve an F-score of 82% for sentence discovery and 71% for music form discovery.
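A rough sketch of the form-discovery step, under the assumption that each sentence is already represented by a non-negative feature vector (e.g. a pitch-class histogram): build a sentence-level self-similarity matrix, factorize it with NMF, build a second-level self-similarity matrix from the activations, and locate boundaries with a checkerboard-kernel novelty curve. Function and parameter names are illustrative, not the thesis code.

```python
# Sketch: NMF-based music form boundary detection on a self-similarity matrix.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

def novelty_curve(ssm, half=4):
    """Correlate a checkerboard kernel along the SSM diagonal."""
    k = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((half, half)))
    n = ssm.shape[0]
    nov = np.zeros(n)
    for i in range(half, n - half):
        nov[i] = np.sum(ssm[i - half:i + half, i - half:i + half] * k)
    return nov

def find_boundaries(features, n_components=4):
    """features: (n_sentences x n_dims) non-negative array."""
    ssm = cosine_similarity(features)                  # first-level SSM
    w = NMF(n_components=n_components, init="nndsvda",
            max_iter=500).fit_transform(ssm)           # basis activations
    ssm2 = cosine_similarity(w)                        # second-level SSM
    nov = novelty_curve(ssm2)
    return np.where(nov > nov.mean() + nov.std())[0]   # candidate boundaries
```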
394

Técnicas de Sistemas Automáticos de Soporte Vectorial en la Réplica del Rating Crediticio / Support vector machine techniques for credit rating replication

Campos Espinoza, Ricardo Álex 10 July 2012 (has links)
Proper credit rating of an issuer is a critical factor in the current economy. Professionals and academics agree on this, and the media have widely reported high-impact events caused by rating agencies. The analysis of issuers carried out by financial experts therefore consumes significant resources at investment consulting firms and rating agencies. Nowadays, many methodological and technical advances exist to support the work of the professionals who rate the credit quality of issuers. However, there are still many gaps to fill and areas to develop for this task to be as precise as needed. Moreover, machine learning systems based on kernel functions, particularly support vector machines (SVMs), have given good results in classification problems when the data are not linearly separable or when the patterns are noisy. In addition, by using structures based on kernel functions it is possible to handle any data space, expanding the possibilities of finding relationships between patterns, a task that is not easy with conventional statistical techniques. The purpose of this thesis is to examine the contributions made to rating replication and to examine different alternatives for improving the performance of SVM-based replication. To do this, we first reviewed the financial literature to obtain an overview of the models used to measure credit risk. We reviewed individual credit risk measurement approaches, used mainly for granting bank loans and for the individual assessment of investments in fixed-income securities. Asset portfolio models have also been reviewed, both those proposed by academia and those sponsored by financial institutions. In addition, we have reviewed the contributions made to assessing credit risk using statistical techniques and machine learning systems, with particular emphasis on machine learning methods and on the methodologies used to replicate ratings adequately. To improve the performance of the replication, a discretization technique has been chosen for the variables, under the assumption that, when issuing their technical rating opinion on companies, financial experts intuitively evaluate company characteristics in terms of intervals. In this thesis, rating replication is performed on a data sample of companies from developed countries. Different types of SVM have been used for replication, and the goodness of the results has been discussed and compared with two other statistical techniques widely used in the financial literature. Special attention has been given to measuring the goodness of fit of the models in terms of hit rates and how the errors are distributed. According to the results, it can be argued that the performance of the SVMs is better than that of the statistical techniques used in this thesis. In addition, it has been shown that no relevant information is lost in the discretization of the input data. This supports the idea that financial experts instinctively perform a similar discretization of financial information when delivering their credit opinion on the rated companies.
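To make the discretization-plus-SVM idea concrete, here is a hedged sketch (not the thesis's exact experimental setup) in which financial ratios are binned into intervals, an SVM is trained on the discretized features, and its hit rate is compared with a conventional statistical baseline. `X` and `y` are assumed inputs.

```python
# Sketch: rating replication with interval-discretized financial ratios.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def compare_replicators(X, y, n_bins=5):
    """X: (companies x ratios) array; y: agency rating classes."""
    svm_model = make_pipeline(
        KBinsDiscretizer(n_bins=n_bins, encode="onehot-dense",
                         strategy="quantile"),   # interval discretization
        SVC(kernel="rbf", C=10.0, gamma="scale"),
    )
    baseline = make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))
    return {
        "svm_hit_rate": cross_val_score(svm_model, X, y, cv=5).mean(),
        "baseline_hit_rate": cross_val_score(baseline, X, y, cv=5).mean(),
    }
```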
395

Frequency Analysis of Droughts Using Stochastic and Soft Computing Techniques

Sadri, Sara January 2010 (has links)
In the Canadian Prairies, recurring droughts are one of the realities that can have significant economic, environmental, and social impacts. For example, the droughts of 1997 and 2001 cost different sectors over $100 million. Drought frequency analysis is a technique for analyzing how frequently a drought event of a given magnitude may be expected to occur. In this study, the state of the science related to frequency analysis of droughts is reviewed and studied. The main contributions of this thesis include the development of a model in Matlab that uses the qualities of Fuzzy C-Means (FCM) clustering and corrects the formed regions to meet the criteria of effective hydrological regions. In FCM, each site has a degree of membership in each of the clusters. The developed algorithm is flexible enough to take the number of regions and the return period as inputs and to show the final corrected clusters as output for most scenarios. Since drought is considered a bivariate phenomenon, with the two statistical variables of duration and severity to be analyzed simultaneously, an important step in this study is increasing the complexity of the initial Matlab model to correct regions based on L-comoment statistics (as opposed to L-moments). Implementing a reasonably straightforward approach to bivariate drought frequency analysis using bivariate L-comoments and copulas is another contribution of this study. Quantile estimation at ungauged sites for return periods of interest is studied by introducing two classes of neural network and machine learning techniques: radial basis functions (RBF) and support vector machine regression (SVM-R). These two techniques are selected based on their good reviews in the literature on function estimation and nonparametric regression. The functionalities of RBF and SVM-R are compared with the traditional nonlinear regression (NLR) method. As well, a nonlinear regression with regionalization method, in which catchments are first regionalized using FCM, is applied and its results are compared with the other three models. Drought data from 36 natural catchments in the Canadian Prairies are used in this study. This study provides a methodology for bivariate drought frequency analysis that can be practiced in any part of the world.
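The regionalization step relies on fuzzy c-means clustering, in which every site has a degree of membership in every cluster. The following compact numpy sketch implements plain FCM under stated assumptions; the thesis's subsequent correction of the regions against homogeneity criteria is not reproduced here.

```python
# Sketch: fuzzy c-means clustering of catchment attributes.
import numpy as np

def fuzzy_c_means(sites, n_clusters, m=2.0, n_iter=100, seed=0):
    """sites: (n_sites x n_attributes) array; m: fuzziness exponent."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(sites), n_clusters))
    u /= u.sum(axis=1, keepdims=True)          # membership degrees sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um.T @ sites / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(sites[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-10)            # avoid division by zero
        u = 1.0 / (dist ** (2 / (m - 1)))
        u /= u.sum(axis=1, keepdims=True)      # each site belongs partly to all clusters
    return centers, u
```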
396

Applications of Soft Computing for Power-Quality Detection and Electric Machinery Fault Diagnosis

Wu, Chien-Hsien 20 November 2008 (has links)
With the deregulation of the power industry and market competition, a stable and reliable power supply is a major concern of the independent system operator (ISO). Power quality (PQ) has lately become an increasingly important subject of study. Harmonics, voltage swell, voltage sag, and power interruption can degrade service quality. In recent years, high speed railway (HSR) and mass rapid transit (MRT) systems have developed rapidly, with widespread applications of semiconductor technologies in their traction systems. The harmonic distortion level worsens due to this increased use of electronic equipment and non-linear loads. To ensure PQ, the detection of power-quality disturbances (PQD) becomes important; a detection method with classification capability is helpful for identifying disturbance locations and types. Electric machinery fault diagnosis is another issue receiving considerable attention from utilities and customers. ISOs need to provide high-quality service to retain their customers. Fault diagnosis of turbine-generators has a great effect on the profitability of power plants: a generator fault not only damages the generator itself, but also causes outages and loss of profits. Under high temperature, high pressure, and factors such as thermal fatigue, many components may fail, leading not only to great economic loss but sometimes also to threats to public safety. Therefore, it is necessary to detect generator faults and take immediate action to cut the losses. Induction motors also play a major role in power systems; to save costs, it is important to run periodic inspections to detect incipient faults inside the motor, since preventive techniques for early detection can identify incipient faults and avoid outages. This dissertation develops various soft computing (SC) algorithms for PQD detection, turbine-generator fault diagnosis, and induction motor fault diagnosis. The proposed SC algorithms include the support vector machine (SVM), grey clustering analysis (GCA), and the probabilistic neural network (PNN). By integrating the proposed diagnostic procedures with existing monitoring instruments, a well-monitored power system can be constructed without extra devices. All the methods in the dissertation provide reasonable and practical estimation; compared with conventional methods, the test results show high accuracy, good robustness, and faster processing.
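One of the soft-computing classifiers named above, the probabilistic neural network, can be sketched as a Parzen-window classifier: each class is scored by a Gaussian-kernel density over its training patterns and the highest-scoring class wins. The sketch below assumes feature vectors have already been extracted from the monitored waveforms; it is an illustration, not the dissertation's implementation.

```python
# Minimal probabilistic neural network (PNN) sketch.
import numpy as np

class PNN:
    def __init__(self, sigma=0.1):
        self.sigma = sigma                     # kernel spread (smoothing)

    def fit(self, X_train, y_train):
        self.classes = np.unique(y_train)
        self.groups = [X_train[y_train == c] for c in self.classes]
        return self

    def predict(self, X):
        preds = []
        for x in np.atleast_2d(X):
            scores = []
            for g in self.groups:
                d2 = np.sum((g - x) ** 2, axis=1)
                # average Gaussian kernel over the class's training patterns
                scores.append(np.mean(np.exp(-d2 / (2 * self.sigma ** 2))))
            preds.append(self.classes[int(np.argmax(scores))])
        return np.array(preds)
```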
397

New support vector machine formulations and algorithms with application to biomedical data analysis

Guan, Wei 13 June 2011 (has links)
The Support Vector Machine (SVM) classifier seeks to find the separating hyperplane wx = r that maximizes the margin distance 1/||w||₂. It can be formalized as an optimization problem that minimizes the hinge loss Σᵢ(1 − yᵢ f(xᵢ))₊ plus the L₂-norm of the weight vector. SVM is now a mainstay method of machine learning. The goal of this dissertation work is to solve different biomedical data analysis problems efficiently using extensions of SVM, in which we augment the standard SVM formulation based on the application requirements. The biomedical applications we explore in this thesis include: cancer diagnosis, biomarker discovery, and energy function learning for protein structure prediction. Ovarian cancer diagnosis is problematic because the disease is typically asymptomatic, especially at early stages of progression and/or recurrence. We investigate a sample set consisting of 44 women diagnosed with serous papillary ovarian cancer and 50 healthy women or women with benign conditions. We profile the relative metabolite levels in the patient sera using a high-throughput ambient ionization mass spectrometry technique, Direct Analysis in Real Time (DART). We then reduce the diagnostic classification on these metabolic profiles into a functional classification problem and solve it with the functional Support Vector Machine (fSVM) method. The assay distinguished between the cancer and control groups with an unprecedented 99% accuracy (100% sensitivity, 98% specificity) under leave-one-out cross-validation. This approach has significant clinical potential as a cancer diagnostic tool. High-throughput technologies provide simultaneous evaluation of thousands of potential biomarkers to distinguish different patient groups. In order to assist biomarker discovery from these low-sample-size, high-dimensional cancer data, we first explore a convex relaxation of the L₀-SVM problem and solve it using mixed-integer programming techniques. We further propose a more efficient L₀-SVM approximation, the fractional norm SVM, by replacing the L₂-penalty with an L_q-penalty (q ∈ (0,1)) in the optimization formulation. We solve it through the Difference of Convex functions (DC) programming technique. Empirical studies on synthetic data sets as well as real-world biomedical data sets support the effectiveness of our proposed L₀-SVM approximation methods over other commonly used sparse SVM methods such as the L₁-SVM method. A critical open problem in ab initio protein folding is protein energy function design. We reduce the problem of learning an energy function for ab initio folding to a standard machine learning problem, learning-to-rank. Based on the application requirements, we constrain the reduced ranking problem with non-negative weights and develop two efficient algorithms for non-negativity constrained SVM optimization. We conduct an empirical study on an energy data set for random conformations of 171 proteins that fall into the ab initio folding class. We compare our approach with the optimization approach used in the protein structure prediction tool TASSER. Numerical results indicate that our approach is able to learn energy functions with improved rank statistics (evaluated by pairwise agreement) as well as improved correlation between the total energy and structural dissimilarity.
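For reference, the baseline formulation quoted above (hinge loss plus an L₂ penalty on the weights) can be minimized with a few lines of subgradient descent. This sketch illustrates only the standard objective, not the L₀/L_q approximations or the non-negativity constrained variants developed in the thesis; function and parameter names are illustrative.

```python
# Sketch: linear SVM by subgradient descent on
#   min_w  (1/n) * sum_i max(0, 1 - y_i * (w.x_i + b)) + lam * ||w||^2
import numpy as np

def linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """X: (n, d) features; y: labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                    # points violating the margin
        grad_w = 2 * lam * w - (y[active, None] * X[active]).sum(0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```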
398

重疊法應用於蛋白質質譜儀資料 / Overlap Technique on Protein Mass Spectrometry Data

徐竣建, Hsu, Chun-Chien Unknown Date (has links)
Cancer has been the leading cause of death in Taiwan for the past 24 years. Early detection of the disease would significantly reduce the mortality rate. The database adopted in this study comes from protein mass spectrometry data sets acquired with the Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS) technique, comprising two high-dimensional data sets: one for prostate cancer and one for head-and-neck cancer. Because of its high dimensionality, analyzing the raw data is not easy. The purpose of this thesis is therefore to find a feasible method that reduces dimensionality and minimizes classification errors at the same time, in the hope of improving the accuracy of cancer case classification. The data sets are separated into experimental and control groups. For the experimental group, the first step is dimension reduction by principal component analysis (PCA), followed by classification with a support vector machine (SVM); finally, the overlap method is used to reduce classification errors. For comparison, the control group uses SVM for classification directly. The empirical results indicate that the improvement from the overlap method is significant for the prostate cancer data but not for the head-and-neck data. We also study the mass range suggested by expert opinion to check whether it is consistent with our analysis. In the prostate cancer case, relevant information appears to be hidden outside the expert-suggested mass range; in the head-and-neck case, the data outside the suggested range do not help the analysis.
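A brief sketch of the experimental-group pipeline, assuming the spectra are given as an intensity matrix: PCA for dimension reduction followed by an SVM, evaluated with leave-one-out cross-validation. The thesis's overlap method is specific to that work and is not reproduced here; names are illustrative.

```python
# Sketch: PCA + SVM classification of mass spectrometry profiles with LOOCV.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loo_accuracy(spectra, y, n_components=20):
    """spectra: (n_samples x n_mz_values) intensity matrix; y: class labels."""
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=n_components),
                          SVC(kernel="rbf", gamma="scale"))
    scores = cross_val_score(model, spectra, y, cv=LeaveOneOut())
    return scores.mean()   # fraction of correctly classified left-out samples
```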
400

應用探勘技術於社會輿情以預測捷運週邊房地產市場之研究 / A Study of Applying Public Opinion Mining to Predict the Housing Market Near the Taipei MRT Stations

吳佳芸, Wu, Chia Yun Unknown Date (has links)
Thanks to the convenience and immediacy of the Internet, online news has become one of the main channels through which the public receives and spreads information, and the accumulated mass of news can reflect the public's immediate reactions to particular topics, their popularity, and sentiment trends. This study therefore applies opinion mining and sentiment analysis to extract valuable relationships from domain-specific news and, combined with traditional machine learning, builds a prediction model for the housing market as a reference for home-buying decisions. We collected 11,150 real estate news articles published between January 1, 2010 and June 30, 2014, together with 8,165 transaction records for houses within 250 meters of Taipei MRT stations. Opinion mining is used to extract sentiment words, from which time series of housing market sentiment and of transaction price and volume are built; half-year moving averages, second-order moving averages, and growth slopes are then used to gauge whether public opinion on the housing market is optimistic or pessimistic, to analyze the relationship between public sentiment and actual transactions, and to identify entry points into the housing market. Sentiment and environmental factors are further combined to build a station-level prediction model with a support vector machine. The empirical results show that fluctuations in housing sentiment and in transaction price and volume exhibit a certain periodicity and correlation, and that the year before a new MRT line opens affects fluctuations in the MRT-area housing market as a whole; when the transaction line crosses above the sentiment line and both slopes are rising, this can serve as a suitable entry point. The prediction model built from station sentiment and environmental variables reaches an average accuracy of 69.2% for stations on new MRT lines and 78% for hotspot stations on new lines, showing that the model has good predictive ability for housing hotspots.
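The entry-point rule described above can be sketched with pandas under assumed column names: smooth the monthly sentiment and transaction series with a half-year moving average and flag the months where the transaction line crosses above the sentiment line while both slopes are positive.

```python
# Sketch: moving-average crossover rule for housing-market entry points.
import pandas as pd

def entry_points(df, window=6):
    """df: DataFrame with monthly 'sentiment' and 'transactions' columns."""
    ma = df[["sentiment", "transactions"]].rolling(window).mean()
    slope = ma.diff()                                     # month-over-month growth
    above = ma["transactions"] > ma["sentiment"]
    cross_up = above & ~above.shift(1, fill_value=False)  # crossing from below
    rising = (slope["transactions"] > 0) & (slope["sentiment"] > 0)
    return df.index[cross_up & rising]
```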
