Global ETD Search

1	Um novo algoritmo de clustering para a organização tridimensional de dados de expressão gênica / A new clustering algorithm for tridimensional gene expression data
Lopes, Tiago José da Silva (29 March 2007)
In this study we developed a new clustering algorithm for gene expression data. Traditional approaches use a dataset in the form of a two-dimensional table, where the rows are the genes and the columns are the experimental conditions. We instead used a three-dimensional structure that adds time slices. We implemented the algorithm and tested it on synthetic and real data, using validation indices to compare our results with those produced by the TriCluster algorithm. The results show that our algorithm performs well on three-dimensional gene expression data and can be applied to data from other domains.
Keywords: Bioinformatics; Clustering; Gene expression; Microarray
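The abstract does not say which coherence criterion the thesis algorithm uses, so the following is only an illustration of what "three-dimensional" gene expression data means in practice. It stores a genes x conditions x time-slices dataset as nested lists, cuts out a candidate tricluster, and scores it with a three-dimensional extension of the mean squared residue known from biclustering. The scoring choice, the function names `subcube` and `mean_squared_residue_3d`, and the data are all hypothetical assumptions, not the method from the thesis.

```python
from statistics import fmean


def subcube(data, genes, conds, times):
    """Extract the tricluster given by index lists on the gene, condition and time axes."""
    return [[[data[g][c][t] for t in times] for c in conds] for g in genes]


def mean_squared_residue_3d(cube):
    """Mean squared residue of a cube indexed as cube[gene][condition][time].

    Illustrative scoring only (assumed, not taken from the thesis). Under an
    additive model d = mu + a_g + b_c + e_t, the residue
        d[g][c][t] - gene_mean[g] - cond_mean[c] - time_mean[t] + 2 * overall
    is exactly 0, so a perfectly coherent tricluster scores 0.
    """
    G, C, T = len(cube), len(cube[0]), len(cube[0][0])
    overall = fmean(v for plane in cube for row in plane for v in row)
    gene_mean = [fmean(v for row in plane for v in row) for plane in cube]
    cond_mean = [fmean(cube[g][c][t] for g in range(G) for t in range(T)) for c in range(C)]
    time_mean = [fmean(cube[g][c][t] for g in range(G) for c in range(C)) for t in range(T)]
    return fmean(
        (cube[g][c][t] - gene_mean[g] - cond_mean[c] - time_mean[t] + 2 * overall) ** 2
        for g in range(G) for c in range(C) for t in range(T)
    )


# Hypothetical additive data: 4 genes x 3 conditions x 5 time slices.
gene_eff = [0.0, 1.0, 2.5, -1.0]
cond_eff = [0.5, -0.5, 2.0]
time_eff = [0.0, 0.3, 0.6, 0.9, 1.2]
data = [[[10 + gene_eff[g] + cond_eff[c] + time_eff[t] for t in range(5)]
         for c in range(3)] for g in range(4)]

tri = subcube(data, genes=[0, 1, 3], conds=[0, 2], times=[1, 2, 3, 4])
print(mean_squared_residue_3d(tri))  # close to 0 because the data are additive
```

A lower score means the selected genes rise and fall together across both conditions and time slices; perturbing any single cell of `tri` makes the score strictly positive.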
3	Clustering of Financial Account Time Series Using Self Organizing Maps / Klustring av Finansiella Konton med Kohonen-kartor
Nordlinder, Magnus (January 2021)
This thesis aims to cluster financial account time series by extracting global features from the series, reducing their dimensionality with one of two methods, Kohonen Self-Organizing Maps or principal component analysis, and then clustering the reduced series with K-means. The results are then used to further cluster the financial services that customers of a financial institution use, to determine whether some sets of services coincide with the time series clusters. Several sets of services are found to be prevalent in the different time series clusters. The resulting method can be used to understand the dynamics between deposit variability (account balances) and customers' usage of different services, and to analyse whether a service is used more in some time series clusters than in others.
Keywords: Kohonen; Financial accounts; Self-organizing maps; Clustering; Time series; Mathematics
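The pipeline in this abstract (global features, then dimensionality reduction, then K-means) can be sketched under several assumptions. The four features below (mean, standard deviation, lag-1 autocorrelation, linear trend slope) are hypothetical choices, since the abstract does not list the thesis features, and the sketch skips the Self-Organizing Map or PCA step entirely, clustering the standardized features directly with a plain Lloyd's K-means. All function names and data are illustrative.

```python
import math
import random
from statistics import fmean, pstdev


def global_features(series):
    """Hypothetical global features: mean, std, lag-1 autocorrelation, trend slope."""
    n = len(series)
    mu, sd = fmean(series), pstdev(series)
    ss = n * sd * sd  # sum of squared deviations from the mean
    acf1 = 0.0 if ss == 0 else sum(
        (series[i] - mu) * (series[i + 1] - mu) for i in range(n - 1)) / ss
    t_mean = (n - 1) / 2
    # Least-squares slope of the series against its time index.
    slope = sum((i - t_mean) * (x - mu) for i, x in enumerate(series)) / sum(
        (i - t_mean) ** 2 for i in range(n))
    return [mu, sd, acf1, slope]


def standardize(rows):
    """Z-score each feature column so no single feature dominates the distances."""
    stats = [(fmean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means with random initial centroids drawn from the data."""
    rng = random.Random(seed)
    centroids = [p[:] for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([fmean(col) for col in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


# Hypothetical monthly balances: five flat accounts and five steadily growing ones.
rng = random.Random(1)
flat = [[100 + rng.gauss(0, 1) for _ in range(24)] for _ in range(5)]
growing = [[50 + 10 * t + rng.gauss(0, 1) for t in range(24)] for _ in range(5)]

X = standardize([global_features(s) for s in flat + growing])
labels, _ = kmeans(X, k=2)
print(labels)
```

Summarizing each series with a fixed-length feature vector is what makes K-means applicable to series of arbitrary length; in the thesis setting, a SOM or PCA projection would sit between `standardize` and `kmeans`.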
4	Text Curation for Clustering of Free-text Survey Responses / Textbehandling för klustring av fritextsresponer i enkäter
Gefvert, Anton (January 2023)
When issuing surveys, offering free-text answer fields is feasible only when the number of respondents is small, because summarizing the answers becomes unmanageable as the number of responses grows. Using NLP techniques to cluster and summarize these answers would let more survey creators include free-text answers in their surveys without making their workload too large. Academic work in this domain is sparse, especially for smaller languages such as Swedish. The Swedish company iMatrics is regularly hired to do this kind of summarizing, specifically for workplace-related surveys. Its clustering method has been semiautomatic, requiring both manual preprocessing and manual postprocessing. This thesis explores whether more advanced unsupervised NLP text representation methods, namely SentenceBERT and Sent2Vec, can improve on these results and reduce the manual work needed. Specifically, it addresses three questions. First, do the methods give good results? Second, can they remove the time-consuming postprocessing step of merging a large number of clusters into a smaller number? Third, can a model be found whose internal (unsupervised) clustering metrics correlate with its real-world usability, which would indicate that these metrics can be used to optimize the model for new data? To answer these questions, several models (Sent2Vec, SentenceBERT, and traditional baseline models) are trained, applied, and compared using both internal and external metrics.
A manual evaluation procedure is performed to assess the real-world usability of the resulting clusterings, both to measure how well the models perform and to check whether this result correlates with the internal clustering metrics. The results indicate that improving the text representation step is not sufficient to fully automate this task. Some of the models show promise in the human evaluation, but given the unsupervised nature of the problem and the large variance between models, it is difficult to predict their performance on new data. The models can therefore improve the workflow, but manual work is still needed.
Keywords: Natural Language Processing; NLP; Sentence representations; Sentence representation models; Surveys; Clustering; Computer Sciences; Other Computer and Information Science
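The question of whether internal metrics track real-world usability depends on what an internal metric actually measures. The abstract does not name the internal metrics used, so as an assumed example the sketch below computes the mean silhouette coefficient over sentence-embedding vectors using cosine distance. The toy three-dimensional vectors are hypothetical stand-ins for SentenceBERT or Sent2Vec embeddings.

```python
import math
from statistics import fmean


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def silhouette(embeddings, labels):
    """Mean silhouette coefficient (needs at least two clusters); singletons score 0."""
    clusters = sorted(set(labels))
    scores = []
    for i, x in enumerate(embeddings):
        own = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = fmean(cosine_distance(x, embeddings[j]) for j in own)
        # b: mean distance to the nearest other cluster.
        b = min(
            fmean(cosine_distance(x, embeddings[j]) for j, lab in enumerate(labels) if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return fmean(scores)


# Hypothetical embeddings of four survey answers: two about topic A, two about topic B.
emb = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
good = silhouette(emb, [0, 0, 1, 1])
bad = silhouette(emb, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Scores near 1 indicate compact, well-separated clusters and negative scores indicate mixed ones, but a high silhouette only describes geometry in embedding space, which is why the thesis checks it against human evaluation.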
