Global ETD Search

51	DATA MINING IN PRACTICE : An application of the CRISP-DM framework in healthcare Lind, Emma, Glas, Sofi January 2022 With extensive data available in today's organizations, it has become increasingly important to secure valuable insights through data. As a result, the management of data to support decision-making processes is receiving increasing attention in organizations' IT strategies. The healthcare sector is no exception. However, there is an urgent need for tools that help organizations extract valuable insights from rapidly growing volumes of data, and data mining is one of the most important steps in that extraction. So far, the healthcare sector has not found a way to harness the full potential of its data, owing to limited methods for extracting the useful knowledge hidden in large data sets. Knowledge gained from data mining can help healthcare providers serve patients better, but there is only a limited picture of how data mining processes are applied in healthcare. Against this background, the purpose of this study is to investigate practical dimensions of the data mining process in healthcare and to identify barriers that can inhibit this process. To answer our research question, we conducted a qualitative case study with semi-structured interviews based on the CRISP-DM framework. Our findings indicate barriers that can inhibit the data mining process; these relate to the objectives, data availability and final reports. Data governance; Data mining methodology; The CRISP-DM Framework; Data mining in healthcare; Human Aspects of ICT
52	Robustness of Machine Learning algorithms applied to gas turbines / Robusthet av maskininlärningsalgoritmer i gasturbiner Cardenas Meza, Andres Felipe January 2024 This thesis demonstrates the successful development of a software sensor for Siemens Energy's SGT-700 gas turbines using machine learning algorithms. Our goal was to enhance the robustness of measurements and add redundancy, enabling early detection of sensor or turbine malfunctions and contributing to predictive maintenance methodologies. The research is based on a real-world case study, implementing the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology in an industrial setting. The thesis details the process from dataset preparation and data exploration to algorithm development and evaluation, providing a comprehensive view of the development process. This work is a step towards integrating machine learning into gas turbine systems. The data preparation process highlights the challenges that arise in the industrial application of data-driven methodologies due to inevitable data quality issues. It provides insight into potential future improvements, such as the constraint programming approach used for dataset construction in this thesis, which remains a valuable tool for future research. The algorithms proposed for the software sensor range from basic to more complex methods, including shallow networks, ensemble methods and recurrent neural networks. Our findings explore the limitations and potential of the proposed algorithms, providing valuable insights into the practical application of machine learning in gas turbines. This includes assessing the reliability of these solutions, their role in monitoring machine health over time, and the importance of clean, usable data in driving accurate and satisfactory estimates of different variables in gas turbines.
The research underscores that, while replacing a physical sensor with a software sensor is not yet feasible, integrating these solutions into gas turbine systems for health monitoring is possible. This work lays the groundwork for future advancements and discoveries in the field. Gas turbines; machine learning; deep learning; predictive maintenance; software sensor; data quality; CRISP-DM; Reliability; Ensemble Methods; Computer Sciences; Computer Engineering
53	Praktické uplatnění technologií data mining ve zdravotních pojišťovnách / Practical applications of data mining technologies in health insurance companies Kulhavý, Lukáš January 2010 This thesis focuses on data mining technology and its possible practical use in health insurance companies. The thesis defines the term data mining and its relation to the term knowledge discovery in databases. Data mining is explained, among other things, through methodologies that describe the individual phases of the knowledge discovery process (CRISP-DM, SEMMA). The thesis also surveys possible practical applications and the technologies and products available on the market, both free and commercial. An introduction to the main data mining methods and specific algorithms (decision trees, association rules, neural networks and other methods) serves as the theoretical basis on which practical applications to real data from real health insurance companies are built. These applications seek the causes of increased remittances and predict churn. I solved these tasks in the freely available systems Weka and LISP-Miner. The objective is to introduce and demonstrate data mining capabilities on this type of data, and to demonstrate the ability of Weka and LISP-Miner to solve tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to cloud and grid computing in conjunction with data mining. It offers insight into the possibilities of these technologies and their benefits for data mining. The possibilities of cloud computing are presented on the Amazon EC2 system; grid computing can be used through the Weka Experimenter interface.
54	Purification and characterisation of Tex31, a conotoxin precursor processing protease, isolated from the venom duct of Conus textile Milne, Trudy Jane January 2008 The venom of cone snails (predatory marine molluscs of the genus Conus) has yielded a rich source of novel neuroactive peptides or “conotoxins”. Conotoxins are bioactive peptides found in the venom duct of Conus spp. Like other neuropeptides, conotoxins are expressed as propeptides that undergo posttranslational proteolytic processing. Peptides derived from propeptides are typically cleaved at a pair of dibasic residues (Lys-Arg, Arg-Arg, Lys-Lys or Arg-Lys) by proteases found in secretory vesicles. However, many precursor peptides contain multiple sets of basic residues, suggesting that highly substrate-specific or differentially expressed proteases can determine processing outcomes. As many of the substrate-specific proteases remain unidentified, predicting new bioactive peptides from cDNA sequences is presently difficult, if not impossible. In order to understand more about the substrate specificity of conotoxin substrate-specific proteases, a characterisation study of one such endoprotease isolated from the venom duct of Conus textile was undertaken. The C. textile mollusc was chosen as a good source from which to isolate the endoprotease for two reasons: first, these cone shells are found in great abundance on the Great Barrier Reef (Queensland, Australia) and are readily obtainable; second, a number of conotoxin precursors and their cleavage products have been previously identified in the venom duct. In order to purify the endoprotease, an activity-guided fractionation protocol that included a para-nitroanilide (p-NA) substrate assay was developed. The p-NA substrate mimicked the cleavage site of the conotoxin TxVIA, a member of the C. textile O-superfamily of toxins.
The protocol included a number of chromatographic techniques, including ion exchange, size-exclusion and reversed-phase HPLC, and resulted in isolation of an active protease, termed Tex31, to >95% purity. The purification of microgram quantities of Tex31 made it possible to characterise the proteolytic nature of Tex31 and to further characterise the O-superfamily conopeptide propeptide cleavage site specificity. Specificity experiments showed Tex31 requires a minimum of four residues including a leucine in the P4 position (LNKR↓) for efficient substrate processing. The complete sequence of Tex31 was determined from cDNA. A BLAST search revealed Tex31 to have high amino acid sequence similarity to the CAP (abbreviated from CRISP (Cysteine-rich secretory protein), Antigen 5 and PR-1 (pathogenesis-related protein)) superfamily and to be most closely related to the CRISP family of mammalian and venom proteins that, like Tex31, have a cysteine-rich C-terminal domain. The CAP superfamily is widely distributed in the animal, plant and fungal kingdoms, and is implicated in processes as diverse as human brain tumour growth and plant pathogenesis. This is the first report of a biological role for the N-terminal domain of CAP proteins. A homology model of Tex31 constructed from two PR-1 proteins, Antigen 5 and P14a, revealed the highly conserved and likely catalytic residues, His78, Ser99 and Glu115. These three amino acids fall within a structurally conserved N-terminal domain found in all CAP proteins. It is possible that other CAP proteins are also substrate-specific proteases. With no homology to any known proteases, Tex31 may belong to a new class of protease. The sequence alignment of five Tex31-like proteins cloned from C. marmoreus, C. litteratus, C. arenatus, C. planorbis, and C. omaria shows very high sequence similarity to Tex31 (~80%), but only one weakly conserved serine residue was identified when the conserved residues of the new Tex31-like protein sequences were aligned with members of the CAP superfamily. Future work to identify members of a catalytic dyad or triad, e.g. by site-directed mutagenesis, will rely on the expression of active recombinant Tex31. In this study neither Escherichia coli nor Pichia pastoris expression systems yielded active recombinant Tex31 protein, possibly due to the number of cysteine residues hindering the expression of correctly folded active Tex31. This study has shown Tex31 to be highly sequence specific in its cleavage site and it is likely that this high substrate specificity has confounded previous attempts to identify the proteolytic nature of other CAP proteins. With the proteolytic nature of one member of the CAP protein family confirmed, it is hoped this important discovery may lead the way to discovering the role of other CAP family members.
55	Building a decision support system to predict the number of visitors to an amusement park : Using an Artificial Neural Network and Statistical Analysis Johansson, Benjamin, Almqvist, Elias January 2018 In this thesis, we develop a decision support system for the amusement park Skara Sommarland. The aim is to predict how many visitors will come to the park in order to help the management allocate the correct number of personnel on any given day. To achieve this, the widely used Cross-Industry Standard Process for Data Mining (CRISP-DM) framework was applied to build a multiple linear regression (MLR) function and an artificial neural network (ANN) model. The data used to develop the models were Skara Sommarland’s historical business data and historical weather data for the surrounding area. Additionally, a fully functional web application was built which allowed the management at Skara Sommarland to use the predictions in their daily operations. The ANN outperformed the MLR and managed to achieve about 80% accuracy in predicting the number of visitors, reaching the initial data mining goal set by the project group. The conclusion of this thesis is that an ANN can be used to predict the number of visitors to an amusement park similar to Skara Sommarland. The final IT artifact produced can realistically help improve an amusement park’s operations by avoiding over- and under-staffing. Artificial Neural Network; ANN; Multiple Linear Regression; MLR; Decision Support System; CRISP-DM; Visitor Prediction; Amusement Park; Information Systems, Social aspects
56	Uma aplicação de mineração de dados ao programa bolsa escola da prefeitura da cidade do Recife / An application of data mining to the Bolsa Escola program of the Recife city government Tabosa Florencio Filho, Roberto 31 January 2009 Faculdade dos Guararapes / Data mining involves a set of statistical and artificial intelligence techniques aimed at discovering information not found by the tools commonly used to extract and store data in large databases. Data mining can be applied in any field of knowledge (exact sciences, humanities, social sciences, biology, health, agriculture and others), yielding previously unknown information and knowledge in any of them. This work presents an application of data mining to the Bolsa Escola program of the Recife City Government (Prefeitura da Cidade do Recife, PCR), in particular to investigating the situation of the beneficiary families, with the aim of offering the municipal administration a decision support tool capable of improving the process of granting benefits. A body of socio-economic data, initially covering about 60 thousand families registered in the program, was analyzed. A MultiLayer Perceptron (MLP) artificial neural network was used to classify the beneficiary families based on their socio-economic characteristics. The performance evaluation and the results obtained, together with the feedback from the domain expert, demonstrate the viability of this application in the process of granting benefits under the Bolsa Escola program of the Recife City Government. Data mining; Neural networks; Municipal Bolsa Escola program (PBEM)
57	Smart info: sistema inteligente para extração de informação de comentários em lojas de aplicativos móveis / Smart Info: an intelligent system for extracting information from reviews in mobile app stores MOREIRA, Átila Valgueiro Malta 23 February 2016 CAPES / SMART INFO is a knowledge discovery system that analyzes reviews written by mobile game users in virtual stores such as Google Play and iTunes, with the goals of automatically detecting flaws that might shorten the game's lifespan and of collecting suggestions made by users. This system is of vital importance for the new development paradigm, in which games stop being treated as products and start being treated as services and must respect the ARM cycle, which consists of three stages: Acquisition, Retention and Monetization. To achieve this, Knowledge Discovery in Text (KDT) was applied through an adaptation of CRISP-DM, together with the KDT process. Retention; Monetization; Game as a service; CRISP-DM; Knowledge discovery; Artificial intelligence; Natural language processing; Data mining; Opinion mining
58	Využití technik Data Mining v různých odvětvích / Using Data Mining in Various Industries Fabian, Jaroslav January 2014 This master’s thesis concerns the use of data mining techniques in the banking, insurance and shopping centre industries. It describes, at a theoretical level, data mining algorithms and the CRISP-DM methodology for data mining processes. Drawing on this theoretical knowledge and these methods, the thesis proposes possible solutions for the various industries within business intelligence processes.
59	Short-Term electricity consumption prediction: Elområde 4, Sweden Kothapalli, Anil Kumar January 2021 This thesis is part of the coursework for the Master's Programme in Data Science at LTU. Its main focus is a review of the published literature to identify state-of-the-art methodologies for predicting short-term electricity consumption. This includes exploring features and models and discussing the results attained. A further aim is to identify opportunities to improve forecast results for the southern bidding area Elområde 4, Sweden. Different modern methods for forecasting electricity consumption were studied and experimented with. The work followed CRISP-DM, a data science methodology. Time series prediction; Time series forecasting; CRISP-DM; Data Science; Machine Learning; Sequence Model; SMHI; SCB; ENTSO-E; Computer Sciences
60	Estudo de fissão e espalação em núcleos actinídeos e pré-actinídeos a energias intermediárias / Study of fission and spallation of pre-actinide and actinide nuclei at intermediate energies. Lorenzo, Carlos David Gonzales 21 May 2015 In this work we present a study of spallation reactions at intermediate energies in actinide and pre-actinide nuclei. For this purpose we used the Monte Carlo model CRISP (Rio-São Paulo Collaboration), which in this study was important for reproducing the mass distribution of residual products and the fission and spallation cross sections. These observables are important for the study of Accelerator Driven System (ADS) hybrid reactors, considered promising devices for the transmutation of nuclear waste. The physical models needed for a correct simulation of experimental data were already implemented in CRISP, such as the evaporation model for particle emission described by Weisskopf in 1937 and, for fission, the classical Bohr/Wheeler model of 1939. To obtain the fission fragment mass distribution, CRISP also has a model based on the multimodal fission parametrization, which simulates the symmetric and asymmetric fission processes that are predominant at high and low energies, respectively. The CRISP results obtained after applying these models were the mass yields of the residual fragments, which were analyzed to calculate the fission and spallation cross sections using a formula implemented in the model. From these results, the mass distribution was plotted for each reaction analyzed. One of the reactions studied was induced by Bremsstrahlung photons with endpoint energies of 50 and 3500 MeV on a 181Ta target; the calculated fission and spallation mass distributions show good agreement with the experimental data. For proton-induced reactions, the fission and spallation cross sections were calculated, as well as the corresponding mass distributions of the residual products. Two such reactions were studied: p (1 GeV) + 208Pb and p (660 MeV) + 238U. For the reaction with lead, the CRISP results were compared with experimental data and with the results of the MCNPX-Bertini model from the 2003 work of Baylac-Domengetroy, which simulated the same reaction. CRISP gave better results, but overestimated the data at the end of the calculated distribution. For uranium it was necessary to use so-called superasymmetric fission, because the experimental mass distribution is more complex and the classical multimodal model is not sufficient to simulate it correctly. Reactions induced by deuterons were also studied with the CRISP model, yielding the mass distributions for 197Au and 208Pb, with some limitations of the model for this type of reaction. Actinides; Asymmetric; Bremsstrahlung; CRISP; Evaporation; Mass Distribution; MCNPX-Bertini; Monte Carlo; Multimodal; Photofission; Pre-actinides; Spallation; Super-Asymmetric; Symmetric
