431 |
O uso das informações de big data na gestão de crise de marca / The use of big data information in brand crisis management. Salvador, Alexandre Borba, 06 August 2015.
As crises de marca não só experimentam um crescimento em quantidade como também passam a ter sua visibilidade aumentada pelas redes sociais. A repercussão de uma crise de imagem de marca afeta negativamente tanto o brand equity como as vendas no curto prazo. Mais do que isso, gera custosas campanhas para minimização dos efeitos negativos. Se por um lado o avanço tecnológico aumenta a visibilidade da crise, por outro, possibilita acesso a uma série de informações, internas e externas, que podem ajudar na definição de um plano de ação. Big Data é um termo recentemente criado para designar o crescimento das informações, grandes em volume, diversificadas em formato e recebidas em alta velocidade. No ambiente de marketing, o sistema de informação de marketing (SIM) tem por objetivo fornecer as informações ao tomador de decisão de marketing. Informação relevante, confiável e disponibilizada em um curto espaço de tempo é fundamental para que as decisões sejam tomadas rapidamente, garantindo a liderança do processo de gestão de crise. A partir da pergunta "qual o uso das informações provenientes do big data na gestão de crise de marca?" e com o objetivo de "verificar como gestores fazem uso das informações provenientes de big data na gestão de crise", elaborou-se este estudo exploratório, empírico, qualitativo e com uso de entrevistas em profundidade com executivos de marketing com experiência em gestão de crise de marca. As entrevistas com seis gestores com experiência em crise e dois especialistas possibilitaram verificar uma grande diferença no uso das informações de big data na gestão de crises de marca, nas diferentes etapas da crise identificadas no referencial teórico: identificação e prevenção, gestão da crise, recuperação e melhorias e aprendizados. / Brand crises are not only growing in number but also gaining visibility through social networks. The repercussion of a brand image crisis negatively affects both brand equity and short-term sales, and it generates costly campaigns to minimize the negative effects. If technological advancement increases the visibility of a crisis, it also provides access to a wealth of internal and external information that can help define an action plan. The term Big Data refers to information that grows in volume, diversifies in format and arrives in real time, at a rate that traditional processing systems cannot store and analyze. In the marketing environment, the marketing information system (MIS) aims to provide information to the marketing decision maker. Relevant, reliable information made available within a short time is critical for fast decision making and for keeping the lead in the crisis management process. Starting from the question "what is the use of information from big data in brand crisis management?" and with the objective of "verifying how managers make use of information from big data in crisis management", this exploratory, empirical and qualitative study was built around in-depth interviews with marketing executives experienced in brand crisis management. Interviews with six managers with crisis experience and two specialists revealed a large difference in the use of big data information in brand crisis management across the crisis stages identified in the theoretical framework: identification and prevention, crisis management, recovery, and improvement and learning.
|
432 |
Developing a data quality scorecard that measures data quality in a data warehouse. Grillo, Aderibigbe, January 2018.
The main purpose of this thesis is to develop a data quality scorecard (DQS) that aligns the data quality needs of the data warehouse (DW) stakeholder group with selected data quality dimensions. To comprehend the research domain, a general and systematic literature review (SLR) was carried out, after which the research scope was established. Using Design Science Research (DSR) as the methodology to structure the research, three iterations were carried out to achieve the research aim highlighted in this thesis. In the first iteration, following the DSR paradigm, the artefact was built from the results of the general and systematic literature review: a data quality scorecard (DQS) was conceptualised. The results of the SLR and the recommendations for designing an effective scorecard provided the input for the development of the DQS. Using the System Usability Scale (SUS) to validate the usability of the DQS, the results of the first iteration suggest that the DW stakeholders found the DQS useful. The second iteration evaluated the DQS further through a run-through in the FMCG domain followed by semi-structured interviews. The thematic analysis of the interviews showed that the stakeholder participants found the DQS to be transparent, a useful additional reporting tool, well integrated, easy to use and consistent, and that it increases confidence in the data. However, the timeliness data dimension was found to be redundant, necessitating a modification to the DQS. The third iteration followed similar steps to the second, but with the modified DQS in the oil and gas domain. The results from the third iteration suggest that the DQS is a useful tool that is easy to use on a daily basis. The research contributes to theory by demonstrating a novel approach to DQS design. This was achieved by ensuring that the design of the DQS aligns with the data quality concern areas of the DW stakeholders and the data quality dimensions. Further, this research lays a good foundation for the future by establishing a DQS model that can be used as a base for further development.
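As a rough illustration of how a scorecard of this kind can report one score per data quality dimension, the Python sketch below computes completeness, uniqueness and validity for a small table. The dimension checks, the column names (customer_id, country) and the allowed-value set are illustrative assumptions, not the DQS developed in the thesis.

import pandas as pd

def completeness(df: pd.DataFrame) -> float:
    # Share of non-missing cells across the whole table
    return 1.0 - df.isna().mean().mean()

def uniqueness(df: pd.DataFrame, key: str) -> float:
    # Share of distinct values in the assumed key column
    return df[key].nunique() / len(df)

def validity(df: pd.DataFrame, column: str, allowed: set) -> float:
    # Share of values falling inside an assumed reference list
    return df[column].isin(allowed).mean()

def scorecard(df: pd.DataFrame) -> pd.DataFrame:
    scores = {
        "completeness": completeness(df),
        "uniqueness": uniqueness(df, key="customer_id"),
        "validity": validity(df, "country", {"SE", "NO", "DK", "FI"}),
    }
    return pd.DataFrame({"dimension": list(scores), "score": [round(s, 3) for s in scores.values()]})

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "country": ["SE", "XX", None, "NO"],
})
print(scorecard(df))

In a real scorecard the dimension list, weights and thresholds would come from the stakeholder requirements, which is exactly the alignment the thesis argues for.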
|
433 |
Performance assessment of Apache Spark applications. AL Jorani, Salam, January 2019.
This thesis addresses the challenges of large software and data-intensive systems. We discuss a Big Data software stack that consists of a fair amount of Linux configuration, some Scala coding and a set of frameworks that work together to achieve smooth performance of the system. The thesis focuses on the Apache Spark framework and the challenge of measuring the lazy evaluation of Spark's transformation operations. Investigating these challenges is essential for performance engineers, as it increases their ability to study how the system behaves and to make decisions in early design iterations. We therefore carried out experiments and measurements to achieve this goal. After analysing the results, we derived a formula that helps engineers predict the performance of the system in production.
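The measurement difficulty mentioned above stems from the fact that Spark transformations are only evaluated when an action is called, so wall-clock timing has to bracket an action rather than the transformation chain itself. The following PySpark fragment is a minimal sketch of that idea; the dataset size, the map/filter pipeline and the local master are arbitrary assumptions for illustration, not the measurement setup used in the thesis.

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("lazy-eval-demo").getOrCreate()
sc = spark.sparkContext
rdd = sc.parallelize(range(10_000_000))

t0 = time.time()
transformed = rdd.map(lambda x: x * 2).filter(lambda x: x % 3 == 0)  # lazy: no job runs here
t1 = time.time()
result = transformed.count()  # the action triggers the actual computation
t2 = time.time()

print(f"building the transformation chain: {t1 - t0:.4f} s")
print(f"executing the count() action: {t2 - t1:.4f} s ({result} rows kept)")
spark.stop()

Timing only the first block would suggest the transformations are nearly free, which is precisely why naive measurements of Spark pipelines can be misleading.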
|
434 |
Os dados como base à criação de um método de planejamento de propaganda / Data as basis for developing an advertising planning method. Lima, Carlos Eduardo de, 14 March 2018.
O presente estudo visa identificar as inúmeras transformações que o planejamento de propaganda tem enfrentado desde o advento da Internet e das tecnologias de comunicação e informação baseadas em big data, Machine Learning, cluster e outras ferramentas de inteligência de dados. Dessa forma, buscou-se fazer um levantamento histórico e documental sobre os modelos de planejamento de propaganda e briefs criativos. Percebeu-se fundamental traçar uma breve documentação histórica sobre a concepção da disciplina de planejamento para o planejador e a forma como esse processo foi desenvolvido no Brasil, assim como sua evolução. Fez-se necessário também definir conceitos sobre big data e inovação, buscando identificar como afetam a estrutura e as metodologias até então usadas pelo planejamento. Com isso, objetivou-se poder entender como o planejador está sendo levado a desenvolver novas competências que abordam diferentes disciplinas, além das que já eram aplicadas no processo de investigação e criação do planejamento. Dessa forma, foram utilizadas metodologias de pesquisa de campo com entrevistas em profundidade com heads e diretores de planejamento de agências de comunicação e players reconhecidos por sua competência e experiência no planejamento de propaganda. Sendo assim, esta pesquisa apresenta uma proposta de um método de planejamento que, por meio de ferramentas baseadas em softwares e aplicativos, permita que o profissional de planejamento possa gerar ideias inovadoras e propor uma nova cultura de pensamento à agência. / This study aims to identify the countless transformations that advertising planning has been going through since the advent of the Internet and of communication and information technologies based on big data, Machine Learning, clustering and other data intelligence tools. Along these lines, a historical and documentary survey was undertaken of advertising planning models and creative briefs. It proved essential to trace a brief historical record of how the planning discipline was conceived for the planner and of the way this process developed in Brazil, as well as its evolution. It was also necessary to define concepts around big data and innovation, seeking to identify how they affect the structure and the methodologies used by planning until then. The goal is thus to understand how the planner is being led to develop new skills spanning different disciplines, beyond those already applied in the investigation and creation process of planning. Field research methodologies were applied, with in-depth interviews with heads and directors of planning at communication agencies and market players recognised for their competence and experience in advertising planning. This research therefore proposes a planning method which, through tools based on software and applications, enables the planning professional to generate innovative ideas and to propose a new culture of thinking to the agency.
|
435 |
Réutilisation de données hospitalières pour la recherche d'effets indésirables liés à la prise d'un médicament ou à la pose d'un dispositif médical implantable / Reuse of hospital data to seek adverse events related to drug administration or the placement of an implantable medical device. Ficheur, Grégoire, 11 June 2015.
Introduction : les effets indésirables associés à un traitement médicamenteux ou à la pose d'un dispositif médical implantable doivent être recherchés systématiquement après le début de leur commercialisation. Les études réalisées pendant cette phase sont des études observationnelles qui peuvent s'envisager à partir des bases de données hospitalières. L'objectif de ce travail est d'étudier l'intérêt de la ré-utilisation de données hospitalières pour la mise en évidence de tels effets indésirables. Matériel et méthodes : deux bases de données hospitalières sont ré-utilisées pour les années 2007 à 2013 : une première contenant 171 000 000 de séjours hospitaliers incluant les codes diagnostiques, les codes d'actes et des données démographiques, ces données étant chaînées selon un identifiant unique de patient ; une seconde issue d'un centre hospitalier contenant les mêmes types d'informations pour 80 000 séjours ainsi que les résultats de biologie médicale, les administrations médicamenteuses et les courriers hospitaliers pour chacun des séjours. Quatre études sont conduites sur ces données afin d'identifier d'une part des évènements indésirables médicamenteux et d'autre part des évènements indésirables faisant suite à la pose d'un dispositif médical implantable. Résultats : la première étude démontre l'aptitude d'un jeu de règles de détection à identifier automatiquement les effets indésirables à type d'hyperkaliémie. Une deuxième étude décrit la variation d'un paramètre de biologie médicale associée à la présence d'un motif séquentiel fréquent composé d'administrations de médicaments et de résultats de biologie médicale. Un troisième travail a permis la construction d'un outil web permettant d'explorer à la volée les motifs de réhospitalisation des patients ayant eu une pose de dispositif médical implantable. Une quatrième et dernière étude a permis l'estimation du risque thrombotique et hémorragique faisant suite à la pose d'une prothèse totale de hanche. Conclusion : la ré-utilisation de données hospitalières dans une perspective pharmacoépidémiologique permet l'identification d'effets indésirables associés à une administration de médicament ou à la pose d'un dispositif médical implantable. L'intérêt de ces données réside dans la puissance statistique qu'elles apportent ainsi que dans la multiplicité des types de recherches d'association qu'elles permettent. / Introduction: The adverse events associated with drug administration or with the placement of an implantable medical device should be sought systematically after the beginning of commercialisation. Studies conducted in this phase are observational studies that can be performed from hospital databases. The objective of this work is to study the value of re-using hospital data for the identification of such adverse events. Materials and methods: Two hospital databases were re-used for the years 2007 to 2013: the first contains 171,000,000 inpatient stays including diagnostic codes, procedure codes and demographic data, linked through a single patient identifier; the second, from one hospital centre, contains the same kinds of information for 80,000 stays together with the laboratory results, drug administrations and hospital letters for each inpatient stay. Four studies were conducted on these data to identify, on the one hand, adverse drug events and, on the other, adverse events following the placement of an implantable medical device. Results: The first study demonstrates the ability of a set of detection rules to automatically identify adverse drug events involving hyperkalaemia. The second study describes the variation of a laboratory parameter associated with the presence of a frequent sequential pattern composed of drug administrations and laboratory results. The third piece of work led to the construction of a web tool for exploring on the fly the patterns of rehospitalisation of patients who received an implantable medical device. The fourth and final study estimates the thrombotic and bleeding risks following a total hip replacement. Conclusion: The re-use of hospital data in a pharmacoepidemiological perspective allows the identification of adverse events associated with drug administration or with the placement of an implantable medical device. The value of this data lies in the statistical power it brings, as well as in the multiplicity of the types of association studies it allows.
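As a purely illustrative sketch of the kind of detection rule mentioned in the first study, the Python fragment below pairs administrations of potassium-raising drugs with a later elevated potassium result within the same stay. The threshold, drug list, 72-hour window and column names are assumptions chosen for illustration, not the rule set used in the thesis.

from datetime import timedelta
import pandas as pd

HYPERKALAEMIA_THRESHOLD = 5.3   # mmol/L, assumed cut-off
WINDOW = timedelta(hours=72)    # assumed exposure window
SUSPECT_DRUGS = {"spironolactone", "potassium chloride", "enalapril"}  # assumed list

def flag_adverse_events(administrations: pd.DataFrame, labs: pd.DataFrame) -> pd.DataFrame:
    """Pair each suspect drug administration with a later high potassium result in the same stay."""
    drugs = administrations[administrations["drug"].isin(SUSPECT_DRUGS)]
    high_k = labs[(labs["test"] == "potassium") & (labs["value"] > HYPERKALAEMIA_THRESHOLD)]
    merged = drugs.merge(high_k, on="stay_id", suffixes=("_drug", "_lab"))
    in_window = (merged["time_lab"] > merged["time_drug"]) & (merged["time_lab"] <= merged["time_drug"] + WINDOW)
    return merged[in_window]

administrations = pd.DataFrame({
    "stay_id": [1, 2],
    "drug": ["spironolactone", "paracetamol"],
    "time": pd.to_datetime(["2013-03-01 08:00", "2013-03-01 09:00"]),
})
labs = pd.DataFrame({
    "stay_id": [1, 2],
    "test": ["potassium", "potassium"],
    "value": [5.8, 4.1],
    "time": pd.to_datetime(["2013-03-02 07:30", "2013-03-02 07:30"]),
})
print(flag_adverse_events(administrations, labs))  # flags stay 1 only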
|
436 |
Internet, big data e discurso de ódio: reflexões sobre as dinâmicas de interação no Twitter e os novos ambientes de debate político [Internet, big data and hate speech: reflections on interaction dynamics on Twitter and the new environments of political debate]. Cappi, Juliano, 23 November 2017.
The present research analyzes the relations between, on the one hand, the interaction dynamics that have consolidated in digital social networks and, on the other, the increase of online hate speech within these environments. The objectives of the research are, firstly, to investigate the unfolding of the increasingly widespread use of social network environments in political debates through the lens of cultural diversity, and to investigate possible patterns of dissemination of hate speech in the new sphere of debate that emerges in these environments. The violence manifested in social networks has shown contours of racial prejudice, misogyny, homophobia and totalitarianism, often surpassing the limits of cyberspace. The analysis of the debates that took place on Twitter, through the posts around the conviction of former President Luis Inacio Lula da Silva, showed that the phenomenon of filter bubbles was present and followed the patterns already identified in international research. The analysis of the posts suggests that the construction of mutual identifications between groups of users ends up authorizing a systematic discourse of disrespect for dignity, built on characteristics that portray Lula as a whore, a drunkard, a hobo and a thief. The theoretical framework uses the notion of communicational environment proposed by Baitello to support the assumption that the construction of identity, and therefore the notion of alterity, is increasingly related to the environment created by the Internet applications present in our daily life. If the environment is a construction associated with subjectivity, an atmosphere generated by the availability of subjects - people and things - and by their intentionality of establishing bonds, then the interaction environments of cyberspace contribute to the structuring of the bonds so important for the construction of identity. The research also used the concept of filter bubbles in the terms of the work of Eli Pariser. The author proposes that the new digital environments for browsing the Internet are bubbles of familiarity, structured by algorithmic systems of collection, analysis, classification and distribution of information, in which users find themselves inserted. Pariser disputes the widely accepted belief that the Internet environment is conducive to contact with the diversity of expressions. The research was able to extend this approach by proposing that the bubbles often manifest themselves through ideological approximation. Finally, Eugenio Trivinho's concept of cybercultural dromocracy grounds the violent condition in which the recognition of alterity takes place in modern society. The methodological framework that guided the investigation is centered on social network analysis (SNA), based on the works of Raquel Recuero / A presente pesquisa analisa as relações entre, de um lado, as dinâmicas de interação que se consolidaram nas redes sociais digitais e, de outro, o avanço do discurso de ódio nesses ambientes. Os objetivos da pesquisa são, em primeiro lugar, investigar os desdobramentos do uso cada vez mais disseminado dos ambientes das redes sociais digitais nos debates políticos pela lente da diversidade cultural e investigar possíveis padrões de disseminação do discurso de ódio na nova esfera de debates que emerge nesses ambientes. A violência que se manifesta nas redes sociais digitais tem apresentado contornos de preconceito racial, misoginia, homofobia e totalitarismo, muitas vezes ultrapassando os limites do ciberespaço. A análise dos debates que tiveram lugar no Twitter, através das postagens em torno da condenação do ex-presidente Luis Inácio Lula da Silva, mostrou que o fenômeno da formação de bolhas ideológicas se manifesta e tem acompanhado os padrões já identificados em pesquisas internacionais. A análise das postagens sugere que a construção de identificações mútuas entre grupos de usuários acaba por autorizar o discurso sistemático de desrespeito à dignidade a partir de características que identificam Lula como uma puta, como bêbado, vagabundo e ladrão. O referencial teórico emprega a noção de ambiente comunicacional proposta por Baitello para fundamentar o pressuposto de que a construção da identidade e, portanto, da noção de alteridade está relacionada cada vez mais intimamente com o ambiente disponibilizado pelas aplicações Internet presentes no nosso dia a dia. Se o ambiente é uma construção associada à subjetividade, uma atmosfera gerada pela disponibilidade dos sujeitos – pessoas e coisas – por sua intencionalidade de estabelecer vínculos, então os ambientes de interação do ciberespaço têm contribuição na estruturação dos vínculos tão importantes para a construção da identidade. Igualmente foi utilizado na pesquisa o conceito de filtros invisíveis da Internet nos termos do trabalho de Eli Pariser. O autor propõe que os novos ambientes digitais de navegação na Internet são bolhas de familiaridade, estruturadas por sistemas de coleta, análise, classificação e distribuição de informações com o uso de algoritmos, nas quais os usuários se encontram inseridos. Pariser contesta a crença amplamente aceita de que o ambiente da Internet propicia o contato com a diversidade de expressões. A pesquisa pôde ampliar essa abordagem ao propor que as bolhas se manifestam muitas vezes por aproximação ideológica. Por fim, o conceito de dromocracia cibercultural de Eugênio Trivinho fundamenta a condição violenta na qual se processa o reconhecimento da alteridade na sociedade moderna. O referencial metodológico que orientou a investigação está centrado na análise de redes sociais (ARS), a partir dos trabalhos de Raquel Recuero
|
437 |
A importância dos 2 Vs – Velocidade e Variedade – do Big Data em situações de busca da internet: um estudo envolvendo alunos do ensino superior [The importance of Big Data's 2 Vs - Velocity and Variety - in internet search situations: a study involving higher education students]. Kadow, André Luis Dal Santo, 06 December 2017.
The amount of digital data, structured or not, generated daily is enormous, and it is possible to observe within society how the companies that hold this information use mass information from the past to try to predict, or even facilitate, future behaviour on the part of users. It is the concept of Big Data working on increasingly diverse fronts, driven by algorithms, the Internet of Things (IoT) and other technologies. However, holding a large amount of data is, by itself, not as significant as understanding and exploiting its use within the processes involved. Authors such as Turing, Searle and Andersen foresaw the importance and impacts of this myriad of data long ago. The analysis of these different points of view is the starting point for understanding how autonomous systems learn through data inputs captured from, or given spontaneously by, users, and it was the essence for reaching the central idea of this thesis: to analyse and understand the influence of two of Big Data's Vs - Velocity and Variety - on the digital life of young university students, using a class from a specific faculty as the case study / A quantidade de dados digitais, estruturados ou não, gerada diariamente é enorme e é possível observar dentro da sociedade como as empresas que detêm essas informações acabam usando as informações em massa do passado para tentar prever ou, até mesmo, facilitar comportamentos futuros por parte dos usuários. É o conceito do Big Data trabalhando em frentes cada vez mais diversificadas, impulsionado por algoritmos, Internet das Coisas (IoT) entre outros. Contudo, possuir uma grande quantidade de dados por si só não tem tanta representatividade quanto entender e explorar a sua utilização dentro dos processos envolvidos. Autores como Turing, Searle e Andersen já anteviram a importância e os impactos desta miríade de dados há tempos. A análise desses diferentes pontos de vista se faz como um local de partida para o entendimento da atuação de como sistemas autônomos que aprendem através de entradas de dados capturados ou cedidos espontaneamente por parte dos usuários foram a essência para chegar à ideia central desta tese, que é analisar e entender a influência de dois Vs do Big Data – Velocidade e Variedade – dentro da vida digital dos jovens universitários, usando uma classe de uma faculdade específica para o estudo de caso
|
438 |
Utilização da estatística e Big Data na Copa do Mundo FIFA 2014 / Use of statistics and Big Data at the 2014 FIFA World Cup. Benetti, Felipe Nogueira, 12 December 2017.
The objective of this study was to show the importance of statistical analysis and Big Data for the development of sport, especially football, and the results obtained by the German national team (specifically, the 2014 FIFA World Cup title in Brazil). The work covers the emergence of statistics and the types of analysis most used to obtain results with Big Data, including its definition and its contributions to the daily lives of the population and of companies with access to the internet and smartphones. It also mentions which sporting disciplines use high-volume data processing with statistical analysis as a contribution to improving training sessions and matches. Finally, it discusses the importance that the use of Big Data had for the German football team in winning the World Cup in Brazil, what motives drove this investment and what results were obtained from this partnership. The whole work was developed in accordance with the standards of the Brazilian Association of Technical Standards (ABNT) / O objetivo de estudo desta pesquisa foi mostrar a importância das análises estatísticas e do Big Data para o desenvolvimento do esporte, principalmente do futebol, e os resultados obtidos pela seleção alemã (especificamente, a conquista da Copa do Mundo FIFA, em 2014). O trabalho abordou o surgimento da estatística e os tipos de análises mais utilizadas para a obtenção de resultados com Big Data, passando por sua definição e contribuições para o cotidiano da população e das empresas que possuem acesso à internet e a smartphones. Também foi mencionado quais modalidades esportivas utilizam o processamento de volume de dados com análises estatísticas como contribuição para melhorar treinos e partidas. Por fim, foi discutida a importância que o uso do Big Data deu à seleção alemã de futebol na conquista da Copa do Mundo no Brasil, quais motivos moveram este investimento e quais resultados foram obtidos com essa parceria. Todo o trabalho foi desenvolvido de acordo com a normatização da Associação Brasileira de Normas Técnicas (ABNT)
|
439 |
Opportunities and challenges of Big Data Analytics in healthcare: An exploratory study on the adoption of big data analytics in the management of Sickle Cell Anaemia. Saenyi, Betty, January 2018.
Background: With increasing technological advancements, healthcare providers are adopting electronic health records (EHRs) and new health information technology systems. Consequently, data from these systems is accumulating at a faster rate, creating a need for more robust ways of capturing, storing and processing the data. Big data analytics is used to extract insight from such large amounts of medical data and is increasingly becoming a valuable practice for healthcare organisations. Could these strategies be applied in disease management, especially in rare conditions like Sickle Cell Disease (SCD)? The study answers the following research questions: 1. What data management practices are used in sickle cell anaemia management? 2. What areas in the management of sickle cell anaemia could benefit from the use of big data analytics? 3. What are the challenges of applying big data analytics in the management of sickle cell anaemia? Purpose: The purpose of this research was to serve as a pre-study establishing the opportunities and challenges of applying big data analytics in the management of SCD. Method: The study adopted both deductive and inductive approaches. Data was collected through interviews based on a framework modified specifically for this study, and then inductively analysed to answer the research questions. Conclusion: Although there is a lot of potential for big data analytics in SCD, in areas like population health management, evidence-based medicine and personalised care, its adoption is not a certainty. This is because of the lack of interoperability between existing systems and the strenuous legal compliance processes involved in data acquisition.
|
440 |
Development of computational approaches for whole-genome sequence variation and deep phenotyping. Haimel, Matthias, January 2019.
The rare disease pulmonary arterial hypertension (PAH) results in high blood pressure in the lung caused by narrowing of the lung arteries. Genes causative in PAH were discovered through family studies and very often harbour rare variants. However, the genetic cause in heritable (31%) and idiopathic (79%) PAH cases is not yet known, but is suspected to involve rare variants. Advances in high-throughput sequencing (HTS) technologies have made it possible to detect variants in 98% of the human genome, and a drop in sequencing costs made it feasible to sequence 10,000 individuals, including 1,250 subjects diagnosed with PAH and their relatives, as part of the NIHR BioResource - Rare Diseases (BR-RD) study. This large cohort allows the genome-wide identification of rare variants in a case-control design to discover novel causative genes associated with PAH and advance our understanding of the underlying aetiology. In the first part of my thesis, I establish a phenotype capture system that allows research nurses to record clinical measurements and other patient-related information for PAH patients recruited to the NIHR BR-RD study. The implemented extensions provide programmatic data transfer and an automated data-release pipeline for analysis-ready data. The second part is dedicated to the discovery of novel disease genes in PAH. I focus on one well-characterised PAH disease gene to establish variant filter strategies that enrich for rare disease-causing variants. I apply these filter strategies to all known PAH disease genes and describe the phenotypic differences based on clinically relevant values. Genome-wide results from the different filter strategies are tested for association with PAH. I describe the findings of the rare variant association tests and provide a detailed interrogation of two novel disease genes. The last part describes the data characteristics of variant information and available NoSQL implementations, and evaluates the suitability and scalability of distributed compute frameworks for storing and analysing population-scale variation data. Based on the evaluation, I implement a variant analysis platform that incrementally merges samples, annotates variants and enables the analysis of 10,000 individuals in minutes. An incremental design for variant merging and annotation has not been described before. Using the framework, I develop a quality score to reduce technical variation and other biases. The results from the rare variant association tests are compared with traditional methods.
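To make the idea of a rare-variant filter strategy concrete, the Python sketch below keeps variants that are rare in a reference population, predicted to alter the protein, and of good call quality. The field names, the allele-frequency cut-off, the consequence list and the use of BMPR2 (a well-known PAH gene) in the example records are illustrative assumptions, not the pipeline implemented in the thesis.

# Assumed consequence terms treated as potentially damaging
DAMAGING = {"stop_gained", "frameshift_variant", "missense_variant", "splice_donor_variant"}
MAX_POP_AF = 0.0001  # assumed population allele-frequency threshold for "rare"

def passes_rare_variant_filter(variant: dict) -> bool:
    rare = variant.get("gnomad_af", 0.0) <= MAX_POP_AF        # rare in the reference population
    damaging = variant.get("consequence") in DAMAGING          # predicted protein-altering
    high_quality = variant.get("filter") == "PASS"             # passed the caller's quality filters
    return rare and damaging and high_quality

variants = [
    {"gene": "BMPR2", "consequence": "stop_gained", "gnomad_af": 0.0, "filter": "PASS"},
    {"gene": "BMPR2", "consequence": "synonymous_variant", "gnomad_af": 0.002, "filter": "PASS"},
]
kept = [v for v in variants if passes_rare_variant_filter(v)]
print([v["gene"] for v in kept])  # only the first, protein-truncating variant survives

In a case-control setting, the variants surviving such a filter would then be aggregated per gene and tested for association between cases and controls, which is the role of the rare variant association tests described above.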
|