151

Big data management for periodic wireless sensor networks / Gestion de données volumineuses dans les réseaux de capteurs périodiques

Medlej, Maguy 30 June 2014
The research presented in this thesis concerns periodic sensor networks. It covers the study and implementation of distributed algorithms and protocols dedicated to big data management, in particular data collection, aggregation and mining. The data-collection approach lets each node adapt its sampling rate to the dynamic evolution of the environment; this model reduces over-sampling and, consequently, the amount of energy consumed. It is based on studying the dependence of the variance of measurements captured during one period, or even across several different periods. Then, to save more energy, a model for adapting the data-collection rate is studied, based on Bézier curves and taking application requirements into account. Second, we study data aggregation, a technique for reducing the volume of massive data. The goal is to identify all neighbouring nodes that generate similar data series; the method relies on similarity functions between sets of measurements and a frequency-based filtering model. The third part is devoted to data mining: we propose an adaptation of k-means clustering that classifies data into similar clusters by operating only on the prefixes of the measurement series instead of the complete series. Finally, all the proposed approaches underwent in-depth performance studies through simulation (OMNeT++) and were compared with existing approaches in the literature. / This thesis proposes novel big data management techniques for periodic sensor networks, embracing the limitations imposed by WSNs and the nature of sensor data.
First, we proposed an adaptive sampling approach for periodic data collection, allowing each sensor node to adapt its sampling rate to the changing physical dynamics. It is based on the dependence of the conditional variance of measurements over time. Then, we propose a multiple-level activity model that uses behavioural functions modelled by modified Bézier curves to define application classes and allow an adaptive sampling rate. Moving forward, we shift gears to address periodic data aggregation at the level of sensor-node data. For this purpose, we introduced two tree-based, bi-level periodic data aggregation techniques for periodic sensor networks. The first looks, on a periodic basis, at each measurement captured in the first tier and cleans it periodically while conserving the number of occurrences of each measure. Second, data aggregation is performed between groups of nodes at the aggregator level while preserving the quality of the information. We proposed a new data aggregation approach aiming to identify near-duplicate nodes that generate similar sets of collected data in periodic applications. We suggested a prefix filtering approach to optimise the computation of similarity values, and we defined a new filtering technique based on the quality of information to overcome the data-latency challenge. Last but not least, we propose a new data mining method, building on the existing k-means clustering algorithm, to mine the aggregated data while avoiding its high computational cost: a new multilevel optimised version of k-means based on the prefix filtering technique. In the end, all the proposed approaches for data management in periodic sensor networks are validated through simulation results based on real data generated by a periodic wireless sensor network.
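The prefix idea above, running k-means on the first few values of each measurement series instead of the whole series, can be sketched as follows. This is a hypothetical illustration: the data, the prefix length and the Euclidean distance are all assumptions, not details taken from the thesis.

```python
import random
from math import dist

def kmeans(points, k, iters=50, seed=0):
    # plain k-means on tuples of floats
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        # recompute each centre as the mean of its cluster (keep stale centre if empty)
        centers = [tuple(sum(vals) / len(cl) for vals in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

# cluster measurement series by a short prefix instead of the full series
series = [(1.0, 1.1, 1.0, 5.0), (1.1, 1.0, 0.9, 9.0),
          (8.0, 8.2, 8.1, 1.0), (7.9, 8.1, 8.0, 0.5)]
PREFIX = 3  # assumed prefix length
prefixes = [s[:PREFIX] for s in series]
centers = kmeans(prefixes, k=2)
nearest = [min(range(2), key=lambda i: dist(p, centers[i])) for p in prefixes]
```

With well-separated series, clustering on the three-value prefixes recovers the same grouping the full series would give, at a fraction of the distance computations.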
152

Algoritmy pro shlukování textových dat / Text data clustering algorithms

Sedláček, Josef January 2011
The thesis deals with text mining. It describes the theory of text document clustering as well as the algorithms used for clustering. This theory serves as a basis for developing an application for clustering text data. The application is developed in the Java programming language and contains three clustering methods; the user can choose which method will be used for clustering the collection of documents. The implemented methods are K-medoids, bisecting K-medoids (BiSec K-medoids), and SOM (self-organizing maps). The application also includes a validation set, created specially for the diploma thesis, which is used for testing the algorithms. Finally, the algorithms are compared according to the obtained results.
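A minimal sketch of the K-medoids idea mentioned above, using cosine distance on bag-of-words vectors. The distance choice, the initialisation and the toy documents are assumptions made for illustration; this is not the thesis's Java implementation.

```python
from collections import Counter
from math import sqrt

def cosine_dist(a, b):
    # cosine distance between two bag-of-words term-frequency vectors
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1 - (num / den if den else 0.0)

def k_medoids(docs, k, iters=10):
    vecs = [Counter(d.lower().split()) for d in docs]
    medoids = list(range(k))  # simplification: first k documents as initial medoids
    for _ in range(iters):
        # assign every document to its nearest medoid
        labels = [min(medoids, key=lambda m: cosine_dist(vecs[i], vecs[m]))
                  for i in range(len(vecs))]
        # move each medoid to the member minimising total within-cluster distance
        new = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            new.append(min(members, key=lambda c: sum(cosine_dist(vecs[c], vecs[i])
                                                      for i in members)))
        if new == medoids:
            break
        medoids = new
    return labels

docs = ["cats and dogs", "dogs and cats play",
        "stocks and bonds", "bonds market stocks"]
labels = k_medoids(docs, k=2)  # labels are medoid document indices
```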
153

Shluková analýza signálu EKG / ECG Cluster Analysis

Pospíšil, David January 2013
This diploma thesis deals with the use of cluster-analysis methods on the ECG signal in order to sort QRS complexes, according to their morphology, into normal and abnormal. Agglomerative hierarchical clustering and the non-hierarchical K-means method are used, for which an application was developed in the MathWorks MATLAB environment. The first part covers the theory of the ECG signal and cluster analysis; the second covers the design, implementation and evaluation of the developed software applied to the ECG signal for the automatic division of QRS complexes into clusters.
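The agglomerative step can be sketched as average-linkage merging of QRS feature vectors until two clusters (normal vs. abnormal) remain. The two-dimensional features (width, amplitude) and the toy beats below are assumptions for illustration, not the thesis's MATLAB code.

```python
from math import dist

def agglomerative(points, n_clusters=2):
    # average-linkage agglomerative clustering, merging until n_clusters remain
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # pick the pair of clusters with the smallest average inter-point distance
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: sum(dist(p, q)
                                      for p in clusters[ab[0]]
                                      for q in clusters[ab[1]])
                   / (len(clusters[ab[0]]) * len(clusters[ab[1]])))
        clusters[i] += clusters.pop(j)
    return clusters

# toy QRS descriptors: (width in ms, amplitude in mV), assumed features
beats = [(80, 1.2), (82, 1.1), (78, 1.3), (140, 0.6), (150, 0.5)]
normal, abnormal = agglomerative(beats, n_clusters=2)
```

The three narrow, high-amplitude beats merge first; the two wide, low-amplitude beats form the second cluster.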
154

Numerické metody pro klasifikaci metagenomických dat / Numerical methods for classification of metagenomic data

Vaněčková, Tereza January 2016
This thesis deals with metagenomics and numerical methods for the classification of metagenomic data. A review of alignment-free methods based on nucleotide word frequency is provided, as these appear to be effective for processing metagenomic sequence reads produced by next-generation sequencing technologies. To evaluate these methods, selected features based on k-mer analysis were tested on a simulated dataset of metagenomic sequence reads. The data in the original feature space were then subjected to hierarchical clustering, and PCA-processed data were clustered by the K-means algorithm. The analysis was performed for different lengths of nucleotide words and evaluated in terms of classification accuracy.
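The k-mer feature extraction underlying such alignment-free methods can be sketched as follows; the reads and the word length k = 2 are assumed for illustration.

```python
from collections import Counter
from itertools import product
from math import dist

def kmer_profile(seq, k=2):
    # relative frequency of every possible k-mer: the alignment-free feature vector
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts[km] / total for km in kmers]

# toy reads (assumed for illustration); k = 2 gives 16-dimensional profiles
reads = ["ATATATAT", "TATATATA", "GCGCGCGC", "CGCGCGCG"]
profiles = [kmer_profile(r) for r in reads]
```

Reads dominated by the same k-mers end up with nearby profiles, which is what the subsequent hierarchical or K-means clustering exploits.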
155

Analýza 3D CT obrazových dat se zaměřením na detekci a klasifikaci specifických struktur tkání / Analysis of 3D CT image data aimed at detection and classification of specific tissue structures

Šalplachta, Jakub January 2017
This thesis deals with the segmentation and classification of paraspinal muscle and subcutaneous adipose tissue in 3D CT image data, in order to use them subsequently as internal calibration phantoms for measuring the bone mineral density (BMD) of vertebrae. The chosen methods were tested and then evaluated in terms of classification correctness and overall suitability for subsequent BMD calculation. The algorithms were tested in the MATLAB® programming environment on a purpose-built patient database containing the lumbar spines of twelve patients. The thesis also contains a theoretical survey of bone mineral density measurement and of segmentation and classification methods, together with a description of the practical part of the work.
156

Predicting Quality of Experience from Performance Indicators : Modelling aggregated user survey responses based on telecommunications networks performance indicators / Estimering av användarupplevelse från prestanda indikatorer

Vestergaard, Christian January 2022
As user experience can be a competitive edge, it lies in the interest of businesses to be aware of how users perceive the services they provide. For telecommunications operators, how network performance influences user experience is critical. To attain this knowledge, one can survey users; however, users are sometimes not available or willing to answer. For this reason, there is interest in estimating the quality of user experience without asking users directly. Previous research has studied how the relationship between network performance and quality of experience can be modelled over time through a fixed-window classification approach. This work aims to extend that research by investigating the applicability of a regression approach, without the fixed-window limitation, through the application of a Long Short-Term Memory (LSTM) based machine learning model. Aggregation of both network elements and user feedback, using three different clustering techniques, was applied to overcome challenges of user-feedback sparsity, and the performance obtained with each clustering technique was evaluated. It was found that all three methods can outperform a baseline based on the weekly average of the user feedback. The effect of applying different levels of detrending was also examined: detrending the time series based on a smaller superset may increase overall performance but hinder relative model improvement, indicating that some helpful information may be lost in the process. The results should inspire future work to consider a regression approach to modelling Quality of Experience as a function of network performance. This work should also motivate further research into the generalizability of models trained on network elements in areas with different urban and rural conditions.
/ User experience can constitute a competitive advantage, so it is in market actors' interest to be aware of how users experience the services they offer. For telecommunications operators it is critical to understand how network performance influences the user experience. To obtain this information, operators can ask users directly; this can be difficult, however, as users may not be available or willing to answer the operator's questions. There is therefore an interest in estimating the customer experience without asking customers directly. Earlier studies have examined modelling the relationship between network performance and customer experience with classification methods applied to fixed time windows. This work aims to extend that research by studying the applicability of regression methods without the limitation of a fixed time window, using a Long Short-Term Memory based machine learning model. By aggregating both network elements and user feedback in a process using three different clustering techniques, the challenge of sparsely distributed user feedback was handled, and the results obtained with each clustering technique were evaluated. The evaluation found that all three methods perform better than a baseline consisting of the weekly average of the users' feedback. The effect of applying different levels of aggregation to remove trends in the data was also examined: the models performed better when the superset used to remove the trend in a given subset was closer to that subset, but the relative improvement of the models decreased as the difference between subset and superset shrank, which suggests that useful information is lost in the detrending process.
The results obtained should inspire future studies to keep regression models in mind when user experience is to be modelled as a function of network performance. This work should also motivate further research into whether models trained on network elements located in urban or rural areas generalise to network elements in other areas.
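The weekly-average baseline that the learned models are compared against can be sketched as follows; the daily scores and the seven-day window are assumptions for illustration.

```python
from statistics import mean

def weekly_average_baseline(feedback, week_len=7):
    # predict each day's score as the average of its own week:
    # the baseline the learned models must beat
    preds = []
    for start in range(0, len(feedback), week_len):
        week = feedback[start:start + week_len]
        preds += [mean(week)] * len(week)
    return preds

def mae(y_true, y_pred):
    # mean absolute error between observed and predicted scores
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))

# two weeks of hypothetical daily satisfaction scores (1-5)
scores = [3.0, 3.5, 4.0, 3.5, 3.0, 2.5, 3.5,
          4.0, 4.5, 4.0, 4.5, 5.0, 4.5, 4.5]
baseline = weekly_average_baseline(scores)
baseline_error = mae(scores, baseline)
```

A regression model is only interesting here if its error on held-out data beats `baseline_error`.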
157

Analyse de trajectoires, perte d'autonomie et facteurs prédictifs : Modélisation de trajectoires / Trajectory analysis, loss of independence and predictive factors : Trajectory modeling

Bimou, Charlotte 09 October 2019
As life expectancy continues to rise for the baby-boom generations in developed countries, ageing is often accompanied by functional limitations and disability, increasingly observed in the geriatric population. The general objective of this thesis was to contribute to knowledge of the evolution of the functional independence of older people in a heterogeneous population. The first aim was to identify homogeneous groups, within a heterogeneous population of older people, following the same functional-independence trajectory over a two-year period, together with potential predictive factors. The second was to analyse the clinical consequences of the trajectories and patient survival over the same observation period. The SMAF (Système de Mesure de l'Autonomie Fonctionnelle) and the ADL (Activities of Daily Living) scales were used as indicators of independence. In this context, data from 221 patients of the UPSAV (Unité de Prévention, de Suivi et d'Analyse du Vieillissement) cohort were analysed. We used three trajectory-analysis methods: GBTM (Group-Based Trajectory Modeling), k-means and agglomerative hierarchical clustering. The results revealed three distinct trajectories of functional independence: stable; stable for a time then deteriorating; and continuously impaired. The predictive factors of the trajectories, obtained by logistic regression, are socio-demographic, medical and biological criteria. Older people assigned to the loss-of-independence trajectory (the continuously impaired trajectory) showed high proportions of injurious falls. Using a Cox model, neurocognitive disorders, heart failure, involuntary weight loss and alcohol were identified as predictors of death.
We conclude from this work that the longitudinal analysis over two years of follow-up identified homogeneous subgroups of older people in terms of the evolution of functional independence. Whatever the level of independence, UPSAV prevention is useful, even if the level of utility is not the same. Prevention and screening for loss of independence of older people followed in their place of residence must be anticipated in order to delay deterioration and maintain independence at home. Later analyses should explore larger cohorts of older people to confirm and generalise this work. / The increase in life expectancy of the baby-boom generations in developed countries is often accompanied by functional limitations and disability, increasingly observed in the geriatric population. The general objective of this thesis was to contribute to the knowledge of the evolution of the functional independence of older people in a heterogeneous population. First, it was to identify homogeneous groups in a heterogeneous population of elderly people following the same functional-independence trajectory over a two-year period, and potential predictive factors. Second, it was to analyze the clinical consequences of trajectories and patient survival over the same observation period. The SMAF (Système de Mesure de l'Autonomie Fonctionnelle) and ADL (Activities of Daily Living) scales were used as indicators for measuring independence. Analyses were performed on a sample of 221 patients of the UPSAV (Unit for Prevention, Monitoring and Analysis of Aging) cohort. We used three trajectory-analysis methods: GBTM (Group-Based Trajectory Modeling), k-means and agglomerative hierarchical classification. The results suggest three distinct trajectories of functional independence: stable; stable then declining; continuously declining.
The predictors of the trajectories, obtained using logistic regression, are socio-demographic, medical and biological criteria. Patients assigned to the loss-of-independence trajectory (the continuously declining trajectory) showed high proportions of injurious falls. Based on a Cox model, neurocognitive disorders, heart failure, involuntary weight loss and alcohol were revealed as predictors of death. We conclude from this work that the two-year longitudinal analysis identified homogeneous subgroups of elderly people in terms of changes in functional independence. UPSAV prevention is useful whatever the level of independence, even if the level of utility is not the same. Prevention and screening for the loss of independence of elderly people followed at home must be anticipated in order to delay deterioration and maintain autonomy. Future analyses should explore larger cohorts of older people to confirm and generalize our research.
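The three trajectory shapes can be illustrated with a simplified rule comparing early, middle and late averages of an impairment score. This is a stand-in for the GBTM / k-means trajectory grouping, on a hypothetical score scale where higher means more impaired.

```python
from statistics import mean

def trajectory_pattern(scores, drop=10):
    # label a follow-up series of impairment scores (hypothetical scale,
    # higher = more impaired) by comparing early, middle and late averages;
    # a simplified stand-in for GBTM / k-means trajectory grouping
    third = len(scores) // 3
    early = mean(scores[:third])
    mid = mean(scores[third:2 * third])
    late = mean(scores[-third:])
    if late - early < drop and mid - early < drop:
        return "stable"
    if mid - early < drop:  # flat at first, markedly worse at the end
        return "stable then decline"
    return "continuous decline"

patients = {  # synthetic quarterly scores over two years, one series per pattern
    "stable":              [10, 10, 11, 10, 10, 11, 10, 11, 10],
    "stable then decline": [10, 10, 10, 10, 10, 10, 25, 25, 25],
    "continuous decline":  [10, 10, 10, 20, 20, 20, 30, 30, 30],
}
labels = {name: trajectory_pattern(s) for name, s in patients.items()}
```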
158

Optimisation of warehouse for second-hand items using Machine Learning / Optimering av lager för second-hand varor med hjälp av Maskininlärning

Osnes, Simon January 2023
Warehouse management and organisation often use the popularity of items to assign placements in the warehouse and predict sales. However, when dealing with only unique second-hand items, another solution is needed. This master's thesis therefore aims to identify influential features for customers buying items together and use this information to optimise Sellpy's warehouse management. Through data analyses, three features were identified as most influential: brand, demography, and type. These features were used to create a K-Means clustering model to group items in the warehouse, and the resulting model was evaluated against a demography baseline and random assignment. Additionally, a second K-Means model was trained using the selected features and the age of items to differentiate between fast-moving and slow-moving items. The results of the analyses showed that the demography baseline performed the best when picking only one order at a time, while the K-Means models performed equally well when picking multiple orders simultaneously. Furthermore, organising items in the warehouse based on the K-Means clustering algorithm could significantly improve efficiency by reducing walking distances for warehouse workers compared to the random approach used today. In conclusion, this thesis highlights the importance of data analysis and clustering in optimising warehouse management for Sellpy. The identified influential features and K-Means clustering models provide a solid foundation for enhancing Sellpy's warehouse management. / Warehouse management and organisation often use the popularity of goods for placement in the warehouse and to estimate sales, but when dealing with unique second-hand items another solution must be used. This master's thesis therefore aims to identify influential features for when customers buy several items at once and to use this to optimise Sellpy's warehouse management.
Through data analyses, three features were identified as the most influential: demography, brand and type of item. These features were used to train a K-Means clustering model to group the items in the warehouse, and the resulting model was evaluated against a predetermined grouping using only demography, as well as against random grouping. In addition, a further K-Means model was trained with the influential features plus the age of the items, to distinguish items that sell quickly from items that sell slowly. The results of the analyses showed that the predetermined demography grouping performed best when only one order was to be picked, while the K-Means models performed equally well when several orders were picked together in a picking list. Organising the warehouse based on the K-Means clustering algorithm could improve efficiency considerably by reducing walking distances for warehouse workers compared with the random approach used today. In summary, this thesis has shown the importance of data analysis and clustering for optimising warehouse management for Sellpy. The identified influential features and the K-Means clustering model provide a good foundation for improving Sellpy's warehouse management.
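A sketch of the clustering step: one-hot encode the three categorical features (brand, demography, type) and run a small k-means. The item data, the deterministic initialisation and k = 2 are assumptions for illustration, not Sellpy's actual model.

```python
def one_hot(items, fields):
    # encode the categorical (brand, demography, type) features as binary vectors
    values = [sorted({item[f] for item in items}) for f in fields]
    return [[1.0 if item[f] == v else 0.0
             for f, vals in zip(fields, values) for v in vals]
            for item in items]

def kmeans_labels(vecs, k, iters=20):
    # tiny k-means; deterministic init on the first k distinct vectors
    # (a simplification of the usual random initialisation)
    centers = []
    for v in vecs:
        if v not in centers:
            centers.append(list(v))
        if len(centers) == k:
            break
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
                  for v in vecs]
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

items = [  # hypothetical second-hand items, not Sellpy data
    {"brand": "Acme", "demography": "women", "type": "dress"},
    {"brand": "Acme", "demography": "women", "type": "dress"},
    {"brand": "Bolt", "demography": "men", "type": "jacket"},
    {"brand": "Bolt", "demography": "men", "type": "jacket"},
]
labels = kmeans_labels(one_hot(items, ["brand", "demography", "type"]), k=2)
```

Items with the same cluster label would then be stored in the same warehouse zone, shortening picking walks for multi-item orders.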
159

Customer segmentation of retail chain customers using cluster analysis / Kundsegmentering av detaljhandelskunder med klusteranalys

Bergström, Sebastian January 2019
In this thesis, cluster analysis was applied to data comprising customer spending habits at a retail chain in order to perform customer segmentation. The method used was a two-step cluster procedure in which the first step consisted of feature engineering, a square-root transformation of the data in order to handle big spenders in the data set, and finally principal component analysis in order to reduce the dimensionality of the data set. This was done to reduce the effects of high dimensionality. The second step consisted of applying clustering algorithms to the transformed data. The methods used were K-means clustering, Gaussian mixture models in the MCLUST family, t-distributed mixture models in the tEIGEN family and non-negative matrix factorization (NMF). For the NMF clustering a slightly different data pre-processing step was taken; specifically, no PCA was performed. Clustering partitions were compared on the basis of the Silhouette index, the Davies-Bouldin index and subject-matter knowledge, which revealed that K-means clustering with K = 3 produces the most reasonable clusters. This algorithm was able to separate the customers into different segments depending on how many purchases they made overall, and in these clusters some minor differences in spending habits are also evident. In other words, there is some support for the claim that the customer segments have some variation in their spending habits. / In this thesis, cluster analysis was applied to data on customer spending habits at a retail chain in order to perform customer segmentation. The method was a two-step clustering procedure: the first step consisted of feature engineering, a square-root transformation of the data to handle customers who spend far more than average, and finally principal component analysis to reduce the dimensionality of the data, mitigating the effects of a high-dimensional data set.
The second step consisted of applying clustering algorithms to the transformed data. The methods used were K-means clustering, Gaussian mixture models from the MCLUST family, t-distributed mixture models from the tEIGEN family, and non-negative matrix factorisation (NMF). For clustering with NMF a different pre-processing of the data was used; more specifically, no PCA was performed. Cluster partitions were compared based on silhouette values, the Davies-Bouldin index and subject-matter knowledge, which revealed that K-means clustering with K = 3 produces the most reasonable results. This algorithm managed to separate the customers into segments depending on how many purchases they made overall, and within these segments there are some differences in spending habits. In other words, there is some support for the claim that the customer segments vary somewhat in their spending habits.
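The comparison step can be sketched as follows: square-root-transform the spending vectors to damp big spenders, then score candidate partitions with the silhouette index. The spending data and the partitions are assumed for illustration; PCA and the other indices are omitted.

```python
from math import dist, sqrt

def silhouette(points, labels):
    # mean silhouette coefficient: (b - a) / max(a, b) per point, averaged
    def mean_dist(p, members):
        ds = [dist(p, q) for q in members if q is not p]
        return sum(ds) / len(ds) if ds else 0.0
    scores = []
    for p, lab in zip(points, labels):
        a = mean_dist(p, [q for q, m in zip(points, labels) if m == lab])
        b = min(mean_dist(p, [q for q, m in zip(points, labels) if m == other])
                for other in set(labels) if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

# hypothetical spending vectors: two low spenders, two big spenders
spend = [(1, 4), (4, 1), (100, 81), (81, 100)]
X = [tuple(sqrt(v) for v in p) for p in spend]  # sqrt damps the big spenders
good = silhouette(X, [0, 0, 1, 1])  # low vs. high spenders
bad = silhouette(X, [0, 1, 0, 1])   # arbitrary split
```

The partition that separates low from high spenders scores close to 1, while the arbitrary split scores negative, which is how the thesis's index-based comparison ranks candidate clusterings.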
160

[en] METHODOLOGY FOR EVALUATING THE CONTINUITY OF THE DISTRIBUTION SERVICE IN LOCATIONS WITH ACCESS RESTRICTIONS DUE TO RECORDS OF VIOLENCE / [pt] METODOLOGIA PARA AVALIAÇÃO DA CONTINUIDADE DO SERVIÇO DE DISTRIBUIÇÃO EM LOCAIS COM RESTRIÇÃO DE ACESSO POR REGISTROS DE VIOLÊNCIA

THAIS ROUPE BORGES 30 October 2023
[pt] The generation, transmission and distribution segments constitute the production chain of the electricity sector, with the consumer, or load, being the final link that the distribution companies must serve. The perception of quality, and consequently customer satisfaction, is intrinsically related, among other factors, to the continuity of supply ensured by the concessionaires. In Brazil, the National Electric Energy Agency (ANEEL) is responsible for regulating the distribution sector and establishing reference indicators with the aim of assessing the efficiency of the concessionaires in terms of reliability and quality of the service provided. Several factors can affect the continuity of energy distribution; some are better known and manageable by the companies, such as objects falling on the network or equipment overload. Other factors, however, such as restricted access to certain areas due to violence and territorial control by criminal groups, present complex challenges that are beyond the distributors' ability to manage. These limitations hinder the prompt restoration of service in emergency situations, resulting in longer failure durations and negatively affecting the continuity indicators monitored by ANEEL, as well as consumer satisfaction. In this context, this dissertation proposes a methodology focused on identifying the distributor's assets located in areas with evidence of violence, which implies limited access for field teams. The distributor's geographic database (BDGD) is used to identify the transformer units in areas with evidence of violence, also delineated by public data platforms. Clustering techniques and statistical tests are then used to assess whether the continuity indices in these areas are significantly different from, and higher than, those of places with no records of violence.
Distribution systems of the states of Rio de Janeiro and Pernambuco are used to test the effectiveness of the proposed methodology. Several tests are carried out and the results obtained are fully discussed. / [en] The segments of generation, transmission and distribution constitute the production chain of the electricity sector, with the consumer or load being the last link that must be served by the distributors. The perception of quality, and consequently customer satisfaction, is intrinsically related, among other factors, to the continuity of supply ensured by the concessionaires. In Brazil, the National Electric Energy Agency (ANEEL) is responsible for regulating the distribution sector and establishing benchmarks in order to assess the efficiency of concessionaires in terms of reliability and quality of service provided. Several factors can impact the continuity of energy distribution, some of which are better known and manageable by companies, such as falling objects on the network or overloading equipment. However, other factors, such as access restrictions to certain areas due to violence and territorial control by criminal groups, present complex challenges and non-existent manageability on the part of the distributors. These limitations make it difficult to promptly restore the service in emergency situations, resulting in longer failure durations and negatively affecting the continuity indicators monitored by ANEEL, as well as consumer satisfaction. In this context, this dissertation proposes a methodology focused on identifying the distributor's assets located in areas with evidence of violence, which implies limited access by field service teams. The distribution company's geographic database (BDGD) is used to identify transformer units in areas with evidence of violence, also delineated by public data platforms.
Clustering techniques and statistical tests are then used to assess whether the continuity indices in these areas are significantly different from, and higher than, those in places where there are no records of violence. Distribution systems in the states of Rio de Janeiro and Pernambuco are used to test the effectiveness of the proposed methodology. Several tests are carried out and the results obtained are fully discussed.
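The statistical-test step can be sketched with a rank-based comparison of continuity indices between restricted and unrestricted areas. Below is a minimal Mann-Whitney U statistic in Python, with hypothetical interruption durations; a full test would also convert U to a p-value.

```python
def rank_sum_u(x, y):
    # Mann-Whitney U statistic for sample x vs. sample y,
    # computed from rank sums with average ranks for ties
    combined = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        for t in range(i, j):
            ranks[t] = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        i = j
    r_x = sum(ranks[t] for t, (v, g) in enumerate(combined) if g == 0)
    return r_x - len(x) * (len(x) + 1) / 2

# hypothetical interruption durations (hours) per feeder
restricted = [8.5, 7.0, 9.2, 6.8, 7.7]    # areas with records of violence
unrestricted = [2.1, 3.0, 1.8, 2.6, 3.3]  # areas without
u = rank_sum_u(restricted, unrestricted)
```

Here every restricted duration exceeds every unrestricted one, so U reaches its maximum of n1 * n2, the pattern the dissertation's tests look for in the continuity indices.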
