121 |
Detection, Localization, and Recognition of Faults in Transmission Networks Using Transient Currents
Perera, Nuwan, 18 September 2012
The fast clearing of faults is essential for preventing equipment damage and preserving the stability of power transmission systems that operate with smaller margins. This thesis examines the application of fault-generated transients for the fast detection and isolation of faults in a transmission system. The basis of the transient-based protection scheme developed and implemented in this thesis is the set of fault current directions identified by relays located at different nodes of the system. The direction of the fault currents relative to a relay location is determined by comparing the signs of the wavelet coefficients of the currents measured in all branches connected to the node. The faulted segment can be identified by combining the fault directions identified at different locations in the system. To facilitate this, each relay is linked with the relays at adjacent nodes through a telecommunication network.
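A minimal sketch of this direction rule is given below; it is an illustration rather than the thesis implementation, and the wavelet family (`db4`), the single-level decomposition and the interpretation of the sign pattern are all assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def branch_directions(branch_currents, wavelet="db4"):
    """For each branch current measured at one node, take the sign of the
    first-level wavelet detail coefficient with the largest magnitude,
    i.e. the polarity of the initial fault transient seen on that branch."""
    signs = {}
    for branch, current in branch_currents.items():
        _, detail = pywt.dwt(np.asarray(current, dtype=float), wavelet)
        k = int(np.argmax(np.abs(detail)))  # strongest transient sample
        signs[branch] = int(np.sign(detail[k]))
    return signs

# Illustrative reading: if every branch shows the same polarity, the transient
# flows through the node (external fault); a branch with the opposite polarity
# is carrying current towards an internal fault.
directions = branch_directions({"line_1": np.random.randn(512),
                                "line_2": np.random.randn(512)})
print(directions)
```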
To prevent possible malfunctioning of the relays due to transients originating from non-fault events, a transient recognition system that supervises the relays is proposed. The applicability of different classification methods for building a reliable transient recognition system was examined, and a hidden Markov model classifier that uses the energies of the wavelet coefficients of the measured currents as input features was selected as the most suitable solution.
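A sketch of such a recognition stage follows; the window length, decomposition depth and the use of continuous Gaussian HMMs from `hmmlearn` are assumptions standing in for the discrete-observation models of the thesis.

```python
import numpy as np
import pywt
from hmmlearn.hmm import GaussianHMM

def wavelet_energy_sequence(current, wavelet="db4", level=4, win=128):
    """Per-scale wavelet detail energies over consecutive windows of one
    current record; each row is one observation vector for the HMM."""
    obs = []
    for start in range(0, len(current) - win + 1, win):
        coeffs = pywt.wavedec(current[start:start + win], wavelet, level=level)
        obs.append([float(np.sum(c ** 2)) for c in coeffs[1:]])  # details only
    return np.array(obs)

# One HMM per event class (fault, capacitor switching, transformer
# energisation, ...), each trained on labelled transient records, e.g.
#   models[name] = GaussianHMM(n_components=3).fit(np.vstack(seqs), lengths)
def recognise(record, models):
    """Assign the record to the class whose model best explains it."""
    obs = wavelet_energy_sequence(record)
    return max(models, key=lambda name: models[name].score(obs))
```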
The performance of the protection scheme was evaluated using a high-voltage transmission system simulated in PSCAD/EMTDC. The custom models required to simulate the complete protection scheme were implemented in PSCAD/EMTDC. The effects of various factors such as fault impedance, signal noise, fault inception angle and current transformer saturation were investigated. The performance of the protection scheme was also tested with field-recorded signals.
Hardware prototypes of the fault direction identification scheme and the transient classification system were implemented and tested under different practical scenarios, using input signals generated with a real-time waveform playback instrument. The test results presented in this thesis demonstrate the potential of using transient signals embedded in currents for the fast and reliable detection, localization and recognition of faults in transmission networks.
|
122 |
Development of a classification model in disability sport
Wu, Sheng Kuang, January 1999
The principal aim of this study was to develop a classification model in disability sport. Using disability swimming as an example, methods of participant observation, interview, survey and document analysis were undertaken in three empirical studies to develop and clarify the classification model and three elements in swimming classification: (a) the classification process, (b) classifiers and (c) the classification system. First, the swimming classification process was identified as a social process in which the members socially interact. The detailed classification process was described, interpreted and discussed, and several of its features were identified: interaction among social actors, routinization, rules in the process, resources used by classifiers, power relations among social actors, allocation of rewards and sanctions, and conflicts among social actors. Second, the role of classifiers as agents of social control in disability swimming was examined. The resources used by medical and technical classifiers to maintain their role and social order, and the socialization of classifiers in swimming, were specifically explored. In addition, the important characteristics of swimming classifiers were identified. Third, classification outcomes in disability swimming were monitored to evaluate the effectiveness of the classification system, using both performance and impairment approaches. Data on the performances and types of impairment of Paralympic swimmers were analysed. The results revealed that the swimming classification system was generally fair, but that some classes needed to be fine-tuned. Finally, the elements of the classification model were clarified by integrating the results of the three empirical studies with the classification literature. It is suggested that researchers may use the concepts of the classification model for further investigation in disability sport classification, and that disability sport committees may apply the model to systematically evaluate their own classification systems, processes and classifiers.
|
123 |
Some problems in high dimensional data analysis
Pham, Tung Huy, January 2010
The boom in economics and technology has had an enormous impact on society. Along with these developments, human activities nowadays produce massive amounts of data that can be collected at relatively low cost with the aid of new technologies. Many examples can be mentioned here, including web term-document data, sensor arrays, gene expression, finance data, imaging and hyperspectral analysis. Because of the enormous amount of data from various new sources, more and more challenging scientific problems appear, and these have changed the types of problems on which mathematical scientists work.

In traditional statistics, the dimension of the data, p say, is low, with many observations, n say. In this case, classical results such as the Central Limit Theorem are often applied to obtain some understanding of the data. A new challenge for statisticians today is dealing with a different setting, in which the data dimension is very large and the number of observations is small. The mathematical assumption now could be p > n, or even p tending to infinity with n fixed; for example, there may be few patients but many genes. In these cases, classical methods fail to produce a good understanding of the nature of the problem, so new methods need to be found, together with mathematical explanations that generalize these cases.

The research presented in this thesis covers two problems, variable selection and classification, in the case where the dimension is very large. The work on variable selection, in particular the Adaptive Lasso, was completed by June 2007; the research on classification was carried out throughout 2008 and 2009; and the research on the Dantzig selector and the Lasso was finished in July 2009. The thesis is therefore divided into two parts. In the first part we study the Adaptive Lasso, the Lasso and the Dantzig selector: Chapter 2 presents some results for the Adaptive Lasso, and Chapter 3 provides two examples showing that neither the Dantzig selector nor the Lasso is uniformly better than the other. The second part is organized as follows. In Chapter 5 we construct the model setting. In Chapter 6 we summarize and prove some results on the scaled centroid-based classifier. Because of the similarities between the Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD) classifiers, Chapter 8 introduces a class of distance-based classifiers that can be considered a generalization of both. Chapters 9 and 10 are about the SVM and DWD classifiers, and Chapter 11 demonstrates the performance of these classifiers on simulated data sets and some cancer data sets.
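As a concrete illustration of the first of these topics, a minimal sketch of the two-stage Adaptive Lasso follows; the ridge pilot estimate, the weight exponent `gamma` and the regularisation levels are illustrative assumptions rather than choices made in the thesis.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, gamma=1.0, alpha=0.1):
    """Two-stage Adaptive Lasso: a pilot estimate defines per-coefficient
    weights, and the weighted Lasso is solved by rescaling the columns of X."""
    pilot = Ridge(alpha=1.0).fit(X, y).coef_   # ridge pilot (safe when p > n)
    w = 1.0 / (np.abs(pilot) ** gamma + 1e-8)  # adaptive weights
    fit = Lasso(alpha=alpha).fit(X / w, y)     # lasso on the rescaled design
    return fit.coef_ / w                       # map back to the original scale

# Example: n = 20 observations, p = 100 variables, 3 of them active.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.1 * rng.standard_normal(20)
print(np.nonzero(adaptive_lasso(X, y))[0][:10])
```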
|
125 |
Off-line signature verification using classifier ensembles and flexible grid features
Swanepoel, Jacques Philip, 2009
Thesis (MSc (Mathematical Sciences)), presented in partial fulfilment of the requirements for the degree of Master of Science at Stellenbosch University, 2009.

ENGLISH ABSTRACT: In this study we investigate the feasibility of combining an ensemble of eight continuous base classifiers for the purpose of off-line signature verification. This work is mainly inspired by the process of cheque authentication within the banking environment. Each base classifier is constructed by utilising a specific local feature in conjunction with a specific writer-dependent signature modelling technique. The local features considered are pixel density, gravity centre distance, orientation and predominant slant. The modelling techniques considered are dynamic time warping and discrete observation hidden Markov models. In this work we focus on the detection of high-quality (skilled) forgeries.

Feature extraction is achieved by superimposing a grid with predefined resolution onto a signature image, after which a single local feature is extracted from each signature sub-image corresponding to a specific grid cell. After the signature image is encoded into a matrix of local features, each column within said matrix represents a feature vector (observation) within a feature set (observation sequence). We propose a novel flexible grid-based feature extraction technique and show that it outperforms existing rigid grid-based techniques.

The performance of each continuous classifier is depicted by a receiver operating characteristic (ROC) curve, where each point in ROC space represents the true positive rate and false positive rate of a threshold-specific discrete classifier. The objective is therefore to develop a combined classifier for which the area under the curve (AUC) is maximised, or for which the equal error rate (EER) is minimised.

Two disjoint data sets, in conjunction with a cross-validation protocol, are used for model optimisation and model evaluation. This protocol avoids possible model overfitting and also scrutinises the generalisation potential of each classifier. During the first optimisation stage, the grid configuration which maximises proficiency is determined for each base classifier. During the second optimisation stage, the most proficient ensemble of optimised base classifiers is determined for several classifier fusion strategies. During both optimisation stages only the optimisation data set is utilised. During evaluation, each optimal classifier ensemble is combined using a specific fusion strategy, then retrained and tested on the separate evaluation data set. We show that the performance of the optimal combined classifiers is significantly better than that of the optimal individual base classifiers.

Both score-based and decision-based fusion strategies are investigated, including a novel extension to an existing decision-based fusion strategy. The existing strategy is based on ROC statistics of the base classifiers and maximum likelihood estimation. We show that the proposed elitist maximum attainable ROC-based strategy outperforms the existing one.
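A minimal sketch of the rigid variant of the grid-based feature extraction described above is given below, using pixel density as the local feature; the grid resolution is an assumption, and the thesis's flexible grid, which adapts cell boundaries to the signature, is not reproduced here.

```python
import numpy as np

def grid_pixel_density(signature, rows=10, cols=10):
    """Superimpose a rows x cols grid on a binarised signature image and
    return the pixel density of each cell; column j of the result is the
    j-th observation vector of the observation sequence."""
    img = (np.asarray(signature) > 0).astype(float)
    h, w = img.shape
    feats = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            cell = img[i * h // rows:(i + 1) * h // rows,
                       j * w // cols:(j + 1) * w // cols]
            feats[i, j] = cell.mean()
    return feats

# Each signature is thus encoded as a matrix whose columns feed a
# writer-dependent DTW template or hidden Markov model.
```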
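For the score-based fusion strategies, one simple candidate is sketched below (mean-score fusion, evaluated by AUC and EER); the thesis investigates several strategies, and this particular one is only an illustrative baseline.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def fuse_scores(base_scores):
    """Score-level fusion: average the continuous scores that the base
    classifiers assign to each questioned signature."""
    return np.mean(np.asarray(base_scores), axis=0)

def equal_error_rate(y_true, scores):
    """EER: the point on the ROC curve where the false positive rate equals
    the false negative rate (1 - true positive rate)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    i = int(np.argmin(np.abs(fpr - (1 - tpr))))
    return (fpr[i] + (1 - tpr[i])) / 2

# y: 1 for genuine signatures, 0 for skilled forgeries;
# base_scores: one row of scores per base classifier.
# fused = fuse_scores(base_scores)
# print(roc_auc_score(y, fused), equal_error_rate(y, fused))
```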
|
126 |
Linguistic Innovations in Chinese: Internal and External Factors
Peng, Xinjia, 06 September 2017
This dissertation seeks to deepen understanding of language change by answering three questions: What is the unit of change? What is the manner of change? What are the factors of change? Three cases of linguistic innovation in the Chinese language are examined. Adopting a usage-based approach, I analyze the language data of these three innovations, and the results provide consistent answers to the three questions. First, the basic unit of language change is a construction, which can be of any length: phrasal, clausal or discourse-length. Second, these cases present a scenario of change led by high-frequency exemplars, demonstrating that language change can be abrupt rather than gradual. Third, the external factors giving rise to the exemplars prove crucial in reconstructing language change in progress. All three case studies present linguistic innovation as a response to a changing material reality. I thus advocate a usage-based constructionist approach that considers external factors in the investigation of language change, as it allows us to develop a more comprehensive understanding of the process.
|
127 |
Social training: aprendizado semi-supervisionado utilizando funções de escolha social / Social-Training: Semi-Supervised Learning Using Social Choice Functions
Alves, Matheus, January 2017
Given the huge quantity of data currently being generated, just a small portion of it can be manually labeled by human experts. This is a challenge for machine learning applications. Semi-supervised learning addresses this problem by handling unlabeled data alongside labeled data. However, if only a limited quantity of labeled examples is available, the performance of the machine learning task (e.g., classification) can be very unsatisfactory. Many solutions address this issue by using a classifier ensemble, because this increases diversity. Algorithms such as co-training and tri-training use multiple views or multiple learning algorithms in order to improve the classification of unlabeled instances through simple majority agreement. There are also approaches that extend this idea and adopt less trivial voting processes to define the labels, such as weighted majority voting. Nevertheless, these solutions require some confidence level on a label in order to use it for training; hence not all information is used, i.e., information associated with low confidence levels is disregarded completely. An approach called social-training is proposed, which uses all information available in the semi-supervised learning task. For this, multiple heterogeneous classifiers are trained with the labeled data and generate diverse classifications for the same unlabeled instances.
Social-training then aggregates these results into a single label by means of social choice functions that perform rank aggregation over the instances. The solution addresses binary classification cases. The results show that working with the full ranking, i.e., labeling all unlabeled instances, is able to reduce the classification error for some of the UCI data sets used.
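A minimal sketch of this aggregation step follows, using the Borda count as the social choice function; the actual functions evaluated in the thesis may differ, and the thresholding of the consensus ranking into labels is also an assumption.

```python
import numpy as np

def borda_consensus(scores):
    """scores: (n_classifiers, n_instances) class-1 probabilities from the
    heterogeneous ensemble. Each classifier ranks the unlabeled instances;
    Borda points are summed into a single consensus ranking."""
    n_clf, n_inst = scores.shape
    points = np.zeros(n_inst)
    for s in scores:
        points[np.argsort(s)] += np.arange(n_inst)  # 0 points for lowest score
    return np.argsort(-points)  # instance indices, most confidently positive first

# Binary labels from the full ranking: the top half of the consensus ranking
# is labeled positive, the bottom half negative, and both are added to the
# training set in a co-training style loop.
ranking = borda_consensus(np.random.rand(5, 8))
labels = np.zeros(8, dtype=int)
labels[ranking[:4]] = 1
print(ranking, labels)
```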
|
128 |
Predicting context specific enhancer-promoter interactions from ChIP-Seq time course data
Dzida, Tomasz, January 2017
We develop machine learning approaches to predict context-specific enhancer-promoter interactions using evidence from changes in genomic protein occupancy over time. Occupancy of estrogen receptor alpha (ER-alpha), RNA polymerase II (Pol II) and the histone marks H2AZ and H3K4me3 was measured over time using ChIP-Seq experiments in MCF7 cells stimulated with estrogen. Two Bayesian classifiers were developed, one unsupervised and one supervised. The supervised approach uses the correlation of temporal binding patterns at enhancers and promoters, together with genomic proximity, as features to predict interactions. The method was trained using experimentally determined interactions from the same system and achieves much higher precision than predictions based on the genomic proximity of the nearest ER-alpha binding site. We use the method to identify a confident set of ER-alpha target genes and their regulatory enhancers genome-wide. Validation with publicly available GRO-Seq data shows that our predicted targets are much more likely to show early nascent transcription than predictions based on genomic ER-alpha binding proximity alone. The accuracy of the predictions from the supervised model was compared against the second, more complex unsupervised generative approach, which uses a proximity-based prior and temporal binding patterns at enhancers and promoters to infer protein-mediated regulatory complexes involving individual genes and their networks of multiple distant regulatory enhancers.
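A minimal sketch of the supervised feature construction is shown below; the two features shown, and the use of scikit-learn's Gaussian naive Bayes as the classifier, are simplifying assumptions rather than the thesis's exact Bayesian model.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def pair_features(enhancer_ts, promoter_ts, distance_bp):
    """Features for one candidate enhancer-promoter pair: the correlation of
    the ChIP-Seq time courses of protein occupancy at the two elements, and
    the log genomic distance between them."""
    r = np.corrcoef(enhancer_ts, promoter_ts)[0, 1]
    return [r, np.log10(distance_bp + 1.0)]

# X: one feature row per candidate pair; y: 1 if the pair is among the
# experimentally determined interactions used for training.
# clf = GaussianNB().fit(X_train, y_train)
# interaction_prob = clf.predict_proba(X_candidates)[:, 1]
```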
|
129 |
News Feed Classifications to Improve Volatility Predictions
Pogodina, Ksenia, January 2018
This thesis analyzes various text classification techniques in order to assess whether knowledge of published news articles about selected companies can improve the modelling and forecasting of their stock return volatility. We examine the content of the textual news releases and derive the news sentiment (polarity and strength) employing three different approaches: the supervised machine learning Naive Bayes algorithm, a lexicon-based approach representative of the linguistic school, and a hybrid Naive Bayes. In the hybrid Naive Bayes we consider only the words contained in a specific lexicon rather than the whole set of words from the article. For the lexicon-based approach we independently used two lexicons, one with binary and another with multiclass labels. The training set for the Naive Bayes was labeled by the author. Comparing the classifiers from the machine learning approach, we can conclude that all of them performed similarly, with a slight advantage for the hybrid Naive Bayes combined with the multiclass lexicon. The resulting quantitative data, in the form of sentiment scores, are then incorporated into GARCH volatility modelling. The findings suggest that information contained in news feeds does bring additional explanatory power to the traditional GARCH model and is able to improve its forecast. On the...
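A minimal sketch of the hybrid variant is given below; the toy lexicon is purely hypothetical, standing in for the sentiment lexicons actually used, and the scikit-learn pipeline is a stand-in implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical sentiment lexicon; the thesis draws on real finance lexicons.
LEXICON = ["gain", "growth", "profit", "upgrade", "loss", "decline", "risk"]

# Hybrid Naive Bayes: the bag-of-words is restricted to lexicon entries, so
# word likelihoods are learned only for sentiment-bearing terms.
hybrid_nb = make_pipeline(
    CountVectorizer(vocabulary=LEXICON, lowercase=True),
    MultinomialNB(),
)

# articles: list of news texts; labels: hand-annotated sentiment classes.
# hybrid_nb.fit(train_articles, train_labels)
# sentiment_scores = hybrid_nb.predict_proba(new_articles)
```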
|
130 |
Aplicação da lógica nebulosa em um classificador para identificação de perfis por aspectos cognitivos / Application of fuzzy logic in a classifier for identification of profiles by cognitive aspects
Oliveira, Alciano Gustavo Genovez [UNESP], 29 November 2016
Most educational institutions now offer distance-learning courses in order to enable individuals who cannot attend classes regularly to conduct their studies. In many cases, individuals drop out of this mode of teaching without completing their studies; one contributing factor is the difficulty in understanding the content due to the way it is presented. This dissertation presents the use of fuzzy logic in a computational classifier that aims to classify individuals by cognitive aspects related to Howard Gardner's Theory of Multiple Intelligences. The outcome of such a classifier makes it possible to direct individuals to learning environments in which the content is presented according to their main cognitive profile. The tests were performed using an academic data mining tool that determined cognitive patterns for each individual from questionnaire input data, returning the cognitive aspects most pronounced in each individual. After validation, about 67% of the classification outcomes were in accordance with the cognitive aspects identified in classroom observations.
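A minimal sketch of the fuzzification step follows; the 0-10 score scale, the piecewise-linear membership functions and the intelligence names are illustrative assumptions, not the membership functions calibrated in the dissertation.

```python
import numpy as np

def membership(x, points):
    """Piecewise-linear fuzzy membership function defined by (x, mu) pairs."""
    xs, mus = zip(*points)
    return float(np.interp(x, xs, mus))

def fuzzify(score):
    """Fuzzy memberships of a 0-10 questionnaire score in three sets."""
    return {"low": membership(score, [(0, 1), (5, 0)]),
            "medium": membership(score, [(0, 0), (5, 1), (10, 0)]),
            "high": membership(score, [(5, 0), (10, 1)])}

def dominant_intelligences(scores, cut=0.5):
    """Cognitive profile: the intelligences whose 'high' membership exceeds
    the cut, e.g. to route the student to a matching learning environment."""
    return [name for name, s in scores.items() if fuzzify(s)["high"] >= cut]

print(dominant_intelligences({"linguistic": 8.0, "logical": 4.5,
                              "spatial": 7.5, "musical": 2.0}))
```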
|