1 |
Aplicação de técnicas de aprendizado de máquina no reconhecimento de classes estruturais de proteínas [Application of machine learning techniques to the recognition of protein structural classes]
Bittencourt, Valnaide Gomes. 25 November 2005
Previous issue date: 2005-11-25 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Classifying proteins into structural classes, which involves inferring patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason is that the function of a protein is intrinsically related to its spatial conformation, yet such conformations are very difficult to obtain experimentally in the laboratory. The problem has therefore drawn the attention of many researchers in Bioinformatics. Given the large gap between the number of known protein sequences and the number of experimentally determined three-dimensional structures, the demand for automated techniques for the structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential for dealing with this problem.
In this work, the following ML techniques are used to recognize protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and are widely used in the Bioinformatics literature. To improve on the performance of these techniques (the individual, or base, classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are built on top of them. Moreover, since the protein database used in this work suffers from imbalanced classes, artificial class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are applied to reduce the imbalance and improve classifier performance. The ML methods are evaluated with a cross-validation procedure in which the accuracy of each classifier is measured by its mean classification error rate on independent test sets. These means are compared pairwise with a hypothesis test to determine whether the difference between them is statistically significant.
Among the individual classifiers, Support Vector Machine achieved the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, performance superior or similar to that of the individual classifiers used as their base, especially Boosting with Decision Tree and StackingC with Linear Regression as meta-classifier. The Voting method, despite its simplicity, also proved adequate for the problem addressed in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, they did improve the classification error for the minority class, which is difficult to learn. Among them, the NCL technique proved the most appropriate for balancing the classes of the protein database.
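As an illustration of one of the class-balancing techniques named in the abstract, the following is a minimal sketch of Tomek-link removal. It is not the thesis's implementation: the function names, the Euclidean distance, and the choice to drop only the majority-class member of each link are assumptions made for the example. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor.

```python
from math import dist  # Euclidean distance (Python 3.8+); assumed metric


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_majority_in_links(points, labels, majority):
    """Drop the majority-class member of every Tomek link (hypothetical helper)."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority
    }
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]
```

Removing only the majority-class sample of each link thins the class boundary without discarding minority samples, which is consistent with the abstract's observation that balancing mainly helps the minority class rather than the global error.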
|
2 |
CenterPoint-based 3D Object Detection in ONCE Dataset
Du, Yuwei. January 2022
Efficient 3D object detection from point clouds is important for autonomous driving. Detection from point-cloud data is inherently more complex and difficult than the 2D task on images, and improving its performance in autonomous driving scenarios remains an active research area. This report presents an optimized point-cloud 3D object detection model based on the CenterPoint method. CenterPoint detects object centers using a keypoint detector on top of a voxel-based backbone, then regresses the remaining attributes. Building on this, the modified model features an improved Region Proposal Network (RPN) with an extended receptive field, an added sub-head that produces an IoU-aware confidence score, and box-ensemble inference strategies that yield more accurate predictions. These enhancements, together with class-balanced data pre-processing, lead to a competitive accuracy of 72.02 mAP on the ONCE Validation Split and 79.09 mAP on the ONCE Test Split. The model placed fifth in the ICCV 2021 Workshop SSLAD Track 3D Object Detection Challenge.
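The center-detection step described in the abstract can be pictured as finding local maxima in a score heatmap. The sketch below is only an illustration under stated assumptions, not the report's model: it takes a plain 2D list of per-cell scores (standing in for one class channel of a bird's-eye-view heatmap), and the 3x3 neighborhood and the `threshold` value are hypothetical choices.

```python
def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for cells that reach the threshold and are
    at least as large as every cell in their 3x3 neighborhood.

    `threshold` is an assumed value chosen for illustration only.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Collect the up-to-8 neighbors, clipping at the grid border.
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score >= n for n in neighbors):
                peaks.append((r, c, score))
    return peaks
```

In a full detector, each peak location would then index into regression outputs (size, height, orientation, and so on) to assemble a 3D box; that part is omitted here because the abstract does not specify it.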
|