  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

Humanness and classifiers in Mandarin Chinese: a corpus-based study of anthropocentric classification

Frankowsky, Maximilian, Ke, Dan January 2016
Mandarin Chinese numeral classifiers receive considerable attention in linguistic research. The status of the general classifier 个 gè remains unresolved. Many linguists suggest that the use of 个 gè as a noun classifier is arbitrary. This view is challenged in the current study. Relying on the CCL-Corpus of Peking University and data from Google, we investigated which nouns for living beings are most likely classified by the general classifier 个 gè. The results suggest that the use of the classifier 个 gè is motivated by an anthropocentric continuum as described by Köpcke and Zubin in the 1990s. We tested Köpcke and Zubin’s approach with Chinese native speakers. We examined 76 animal expressions to explore the semantic interdependence of numeral classifiers and the nouns. Our study shows that nouns with the semantic feature [+ animate] are more likely to be classified by 个 gè if their denotatum is located either very close to or very far from the anthropocentric center. In contrast, animate nouns whose denotata are located at some intermediate distance from the anthropocentric center are less likely to be classified by 个 gè.
132

Klasifikace příspěvků ve webových diskusích / Classification of Web Forum Entries

Margold, Tomáš January 2008
This thesis deals with text classification in the context of the internet. It describes the available methods for classifying and partitioning text messages. Part of the thesis is an implementation of the naive Bayes algorithm and of a classifier based on neural networks. The selected methods are compared in terms of their error rate and other evaluation measures.
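To make the naive Bayes approach mentioned in this abstract concrete, here is a minimal sketch of a multinomial naive Bayes text classifier with Laplace smoothing. It is not the thesis implementation: the class name `NaiveBayesText`, the whitespace tokenizer, and the smoothing parameter `alpha` are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Illustrative multinomial naive Bayes for short texts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            # Naive whitespace tokenization; a real system would do more.
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many word probabilities are multiplied; skipping out-of-vocabulary words is one of several reasonable conventions.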
133

Early Stopping of a Neural Network via the Receiver Operating Curve.

Yu, Daoping 13 August 2010
This thesis presents the area under the ROC (Receiver Operating Characteristic) curve, abbreviated AUC, as an alternative measure for evaluating the predictive performance of ANN (Artificial Neural Network) classifiers. Conventionally, neural networks are trained until the total error converges to zero, which may give rise to over-fitting. To ensure that they do not overfit the training data and then fail to generalize to new data, it appears effective to integrate ROC/AUC analysis into the training process and to stop training as soon as the AUC is sufficiently large. To reduce learning costs on imbalanced data sets with uneven class distributions, random sampling and k-means clustering are used to draw a smaller subset of representatives from the original training data set. Finally, a confidence interval for the AUC is estimated using a non-parametric approach.
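The idea of stopping training once the AUC is large enough can be sketched as follows. This is an illustration under stated assumptions, not the thesis procedure: the function names, the `target_auc` threshold, and the use of a held-out validation set are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) identity.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Fraction of positive/negative pairs ranked correctly; ties count one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_with_auc_stopping(train_step, predict, val_x, val_y,
                            target_auc=0.95, max_epochs=100):
    """Call train_step() once per epoch; stop when validation AUC reaches target_auc.

    train_step and predict are caller-supplied callables (assumed interface).
    Returns the number of epochs run and the AUC history.
    """
    history = []
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        score = auc(val_y, [predict(x) for x in val_x])
        history.append(score)
        if score >= target_auc:
            break
    return epoch, history
```

The pairwise AUC is quadratic in the number of examples, which is acceptable for a small validation subset; a rank-based computation would scale better on large sets.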
134

Classifying Germinal Center Derived Lymphomas: Navigate a Complex Transcriptional Landscape

Loeffler-Wirth, Henry, Kreuz, Markus, Schmidt, Maria, Ott, German, Siebert, Reiner, Binder, Hans 30 October 2023
Classification of lymphoid neoplasms is based mainly on histologic, immunologic, and (more rarely) genetic features. It has been supplemented by gene expression profiling (GEP) in the last decade. Despite considerable success, particularly in associating lymphoma subtypes with specific transcriptional programs and classifier signatures of up- or downregulated genes, different groups have often proposed competing molecular classifiers in the literature for the same classification tasks, e.g., to distinguish BL from DLBCL or to distinguish different DLBCL subtypes. Moreover, rarer sub-entities such as MYC and BCL2 “double hit lymphomas” (DHL), IRF4-rearranged large cell lymphoma (IRF4-LCL), and Burkitt-like lymphomas with 11q aberration pattern (mnBLL-11q) have attracted interest, while their relatedness to the major classes is still unclear in many respects. We explored the transcriptional landscape of 873 lymphomas covering a wide spectrum of subtypes by applying self-organizing maps (SOM) machine learning. The landscape reveals a continuum of transcriptional states activated in the different subtypes, without clear-cut borderlines between them, which prevents their unambiguous classification. These states show striking parallels with single cell gene expression of the active germinal center (GC), which is characterized by the cyclic progression of B-cells. The expression patterns along the GC trajectory are discriminative for distinguishing different lymphoma subtypes. We show that the rare subtypes take intermediate positions between BL, DLBCL, and FL as considered by the 5th edition of the WHO classification of haemato-lymphoid tumors in 2022. Classifier gene signatures extracted from these states as modules of coregulated genes are competitive with literature classifiers. They provide functionally defined classifiers with the option of consolidating redundant classifiers from the literature.
We discuss alternative classification schemes of different granularity and functional impact as possible avenues toward personalization and improved diagnostics of GC-derived lymphomas.
135

Decoding communication of non-human species - Unsupervised machine learning to infer syntactical and temporal patterns in fruit-bats vocalizations.

Assom, Luigi January 2023
Decoding non-human species communication offers a unique chance to explore alternative intelligence forms using machine learning. This master thesis focuses on discreteness and grammar, two of five linguistic areas machine learning can support, and tackles inferring syntax and temporal structures from bioacoustics data annotated with animal behavior. The problem lies in a lack of species-specific linguistic knowledge, time-consuming feature extraction and availability of limited data; additionally, unsupervised clustering struggles to discretize vocalizations continuous to human perception due to unclear parameter tuning to preprocess audio. This thesis investigates unsupervised learning to generalize deciphering syntax and short-range temporal patterns in continuous-type vocalizations, specifically fruit-bats, to address the research questions: How does dimensionality reduction affect unsupervised manifold learning to quantify size and diversity of the animal repertoire? and How do syntax and temporal structure encode contextual information? An experimental strategy is designed to improve effectiveness of unsupervised clustering for quantifying the repertoire and to investigate linguistic properties with classifiers and sequence mining; acoustic segments are collected from a dataset of fruit-bat vocalizations annotated with behavior. The methodology keeps clustering methods constant while varying dimensionality reduction techniques on spectrograms and their latent representations learnt by Autoencoders. Uniform Manifold Approximation and Projection (UMAP) embeds data into a manifold; density-based clusterings are applied to its embeddings and compared with agglomerative-based labels, used as ground-truth proxy to test robustness of models. Vocalizations are encoded into label sequences. Syntactic rules and short-range patterns in sequences are investigated with classifiers (Support Vector Machines, Random Forests); graph-analytics and prefix-suffix trees. 
Reducing the temporal dimension of Mel-spectrograms outperformed previous clustering baseline (Silhouette score > 0.5, 95% assignment accuracy). UMAP embeddings from sequential autoencoders showed potential advantages over convolutional autoencoders. The study revealed a repertoire between seven and approximately 20 vocal-units characterized by combinatorial patterns: context-classification achieved F1-score > 0.9 also with permuted sequences; repetition characterized vocalizations of isolated pups. Vocal-unit distributions were significantly different (p < 0.05) across contexts; a truncated-power law (alpha < 2) described the distribution of maximal repetitions. This thesis contributed to unsupervised machine learning in bioacoustics for decoding non-human communication, aiding research in language evolution and animal cognition.
136

Novel Architectures for Human Voice and Environmental Sound Recognitionusing Machine Learning Algorithms

Dhakal, Parashar January 2018
No description available.
137

Identifying Induced Bias in Machine Learning

Chowdhury Mohammad Rakin Haider (18414885) 22 April 2024
The last decade has witnessed an unprecedented rise in the application of machine learning in high-stakes automated decision-making systems such as hiring, policing, bail sentencing, medical screening, etc. The long-lasting impact of these intelligent systems on human life has drawn attention to their fairness implications. A majority of subsequent studies targeted the existing historically unfair decision labels in the training data as the primary source of bias and strived toward either removing them from the dataset (de-biasing) or avoiding learning discriminatory patterns from them during training. In this thesis, we show label bias is not a necessary condition for unfair outcomes from a machine learning model. We develop theoretical and empirical evidence showing that biased model outcomes can be introduced by a range of different data properties and components of the machine learning development pipeline.
In this thesis, we first prove that machine learning models are expected to introduce bias even when the training data doesn’t include label bias. We use the proof-by-construction technique in our formal analysis. We demonstrate that machine learning models, trained to optimize for joint accuracy, introduce bias even when the underlying training data is free from label bias but might include other forms of disparity. We identify two data properties that lead to the introduction of bias in machine learning: the group-wise disparity in feature predictivity and the group-wise disparity in the rates of missing values. The experimental results suggest that a wide range of classifiers trained on synthetic or real-world datasets are prone to introducing bias under feature disparity and missing value disparity, independently of or in conjunction with label bias. We further analyze the trade-off between fairness and established techniques to improve the generalization of machine learning models such as adversarial training, increasing model complexity, etc. We report that adversarial training sacrifices fairness to achieve robustness against noisy (typically adversarial) samples. We propose a fair re-weighted adversarial training method to improve the fairness of adversarially trained models while sacrificing minimal adversarial robustness. Finally, we observe that although increasing model complexity typically improves generalization accuracy, it doesn’t linearly improve the disparities in the prediction rates.
This thesis unveils a vital limitation of machine learning that has yet to receive significant attention in FairML literature. Conventional FairML literature reduces the ML fairness task to something as simple as de-biasing or avoiding learning discriminatory patterns. However, the reality is far from that. Everything from deciding which features to collect to algorithmic choices such as optimizing for robustness can act as a source of bias in model predictions. This calls for detailed investigations of the fairness implications of machine learning development practices. In addition, identifying sources of bias can facilitate pre-deployment fairness audits of machine learning driven automated decision-making systems.
138

Técnicas de visão computacional aplicadas ao reconhecimento de cenas naturais e locomoção autônoma em robôs agrícolas móveis / Computer vision techniques applied to natural scenes recognition and autonomous locomotion of agricultural mobile robots

Lulio, Luciano Cássio 09 August 2011
The use of computer systems in Precision Agriculture (PA) promotes the automation of processes and tasks in this area, specifically the inspection and analysis of agricultural crops and the guided/autonomous locomotion of mobile robots. In this context, the present work proposes the application of computer vision techniques to these tasks, developed in two distinct approaches, to be applied on an agricultural mobile robot platform under development at NEPAS/EESC/USP. For the robot locomotion problem (first approach), an architecture for image acquisition, processing and analysis was developed in order to segment, classify and recognize navigation patterns of planting rows, as guidance references for the mobile robot, among orange, maize and sugarcane plantations. In the second approach, the same image processing techniques are also applied to the inspection and location of the orange crop (primary) and the maize crop (secondary), for the analysis of their natural features, location and quantification. For both approaches, the image processing steps include: spatial-domain filtering of the acquired images; pre-processing in the RGB and HSV color spaces; unsupervised JSEG segmentation, customized for color quantization in non-homogeneous regions of these color spaces; normalization and feature extraction from the histograms of the pre-processed images for the training and test sets, by means of principal component analysis (PCA); and pattern recognition with cognitive and statistical classification.
The methodology used, for each approach, databases of between 700 and 900 images of natural scenes acquired under different conditions. The segmentation algorithm gave significant results in both approaches, though to a lesser degree for locating grasses: maize requires other segmentation techniques, not only those applied to the quantization of non-homogeneous regions. Statistical classification (Bayes and Naive Bayes) outperformed cognitive classification (ANN and Fuzzy) in both approaches and in the subsequent construction of class maps in the HSV color space. In this same color space, the quantification and localization of fruits gave better results than in RGB. Thus, the natural scenes in both approaches were properly processed, according to the materials and methods employed in segmentation, classification and pattern recognition, providing intrinsic and distinct features of the computer vision techniques proposed for each approach.
139

Metodologia de estimação de idade óssea baseada em características métricas utilizando mineradores de dados e classificador neural / Methodology for bone age estimation based on metric characteristics using data mining and neural classifier

Raymundo, Evandra Maria 29 September 2009
This work presents a methodology for bone age estimation based on metric characteristics, using the carpal image database of the Engineering School of São Carlos (EESC-USP). The images were segmented to obtain the area, perimeter and length of each bone, generating a metric database named CarpEven. The CarpEven data were submitted to two data miners: StARMiner (Statistical Association Rules Miner), a data mining methodology created by a group of researchers at ICMC-USP, and Weka (Waikato Environment for Knowledge Analysis), developed by the University of Waikato in New Zealand. The data were submitted to neural classifiers, contributing to the creation of a new methodology for bone age estimation. Finally, the results are compared with those obtained in other studies.
140

Planejamento, gerenciamento e análise de dados de microarranjos de DNA para identificação de biomarcadores de diagnóstico e prognóstico de cânceres humanos / Planning, management and analysis of DNA microarray data aiming at discovery of biomarkers for diagnosis and prognosis of human cancers.

Simões, Ana Carolina Quirino 12 May 2009
In this PhD thesis, we present our strategies for developing a mathematical and computational environment for large-scale analysis of gene expression data obtained with DNA microarray technology. The analyses focused mainly on identifying molecular markers for the diagnosis and prognosis of human cancers. We show the results of several analyses implemented in this environment, which led to the development of a computational tool for automatic annotation of DNA microarray platforms and of a tool for tracking data analyses performed in the R environment. eXtreme Programming (XP) was used as a technique for planning and managing the gene expression analysis projects. All data sets were obtained by our collaborators using two different DNA microarray platforms: the first is enriched in non-coding regions of the human genome, particularly intronic regions, and the second represents exonic regions of human genes. Using the first platform, we evaluated gene expression profiles of human prostate and kidney tumors; analyses using SAM (Significance Analysis of Microarrays) yielded a set of 49 sequences as potential biomarkers for the prognosis of prostate tumors. The second platform was used to investigate gene expression in samples of sarcomas, epidermoid carcinomas, and head and neck epidermoid carcinomas. A set of 12 genes was identified as potential biomarkers of local aggressiveness and metastasis in sarcoma. In addition, the analyses of head and neck epidermoid carcinomas identified 7 potential biomarkers of lymph node metastasis.
