  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Informatic strategies for the discovery and characterization of peptidic natural products

Merwin, Nishanth 06 1900 (has links)
Microbial natural products have served a key role in the development of clinically relevant drugs. Despite significant interest, traditional strategies for their characterization have led to diminishing returns, leaving the field stagnant. Recently developed technologies such as low-cost, high-throughput genome sequencing and high-resolution mass spectrometry enable a much richer experimental strategy, allowing data to be gathered at an unprecedented scale. Even naive analyses of genomic data have revealed the wealth of natural products encoded within diverse bacterial phylogenies. Herein, I leverage these technologies by developing specialized computational platforms, cognizant of existing natural products and their biosynthesis, to reinvigorate drug discovery protocols. First, I present a strategy for the targeted isolation of novel and structurally divergent ribosomally synthesized and post-translationally modified peptides (RiPPs). Specifically, this software platform directly compares genomically encoded RiPPs to previously characterized chemical scaffolds, allowing the identification of bacterial strains producing these specialized and previously unstudied metabolites. Further, using metabolomics data, I developed a strategy that facilitates the direct identification and targeted isolation of these uncharacterized RiPPs. Through this set of tools, we successfully isolated a structurally unique lasso peptide from a previously unexplored Streptomyces isolate. With the technological rise of genomic sequencing, it is now possible to survey polymicrobial environments in remarkable detail. Through metagenomics we can survey the presence and abundance of bacteria, and metatranscriptomics can further reveal the expression of their biosynthetic pathways. Here, I developed a platform that identifies microbial peptides found exclusively within the human microbiome and further characterizes their putative antimicrobial properties. Through this endeavour, we identified a bacterially encoded peptide that can effectively protect against pathogenic Clostridium difficile infections. Given the wealth of publicly available multi-omics datasets, these works in conjunction demonstrate the potential of informatics strategies to advance natural product discovery. / Thesis / Master of Science (MSc) / Biochemistry is the study of the diverse chemistry, and chemical interactions, upon which life is built. Some of these chemicals are not essential for maintaining basic metabolism but are instead tailored for alternative functions best suited to their environment. Often, these molecules mediate biological warfare, allowing organisms to compete and establish dominance amongst their neighbours. Accordingly, several of these molecules have been exploited in our modern pharmaceutical regimen as effective antibiotics. Given the ever-rising reality of antibiotic resistance, we are in dire need of novel antibiotics. With this goal, I have developed several software tools that both identify these molecules encoded within bacterial genomes and predict their effects on neighbouring bacteria. Through these computational tools, I provide an updated strategy for the discovery and characterization of these biologically derived chemicals.
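As a rough illustration of the comparison step described above, the toy sketch below scores hypothetical genomically predicted core peptides against a couple of known RiPP scaffolds using a simple string-similarity ratio. The sequences, the 0.7 threshold, and the scoring function are illustrative assumptions, not the thesis's actual platform or data; a real pipeline would rely on alignment-based scores and curated scaffold databases.

```python
from difflib import SequenceMatcher

# Hypothetical predicted core peptides (e.g. from genome mining) and known RiPP scaffolds.
predicted_cores = {
    "strainA_orf12": "GSGGIWGLIDTITGAAGG",
    "strainB_orf07": "CTGWLCVVTVGCSSCG",
}
known_scaffolds = {
    "lassopeptide_X": "GGAGQYKEVEAGRWSDR",
    "lanthipeptide_Y": "CTGWLCVVTVGCTSCG",
}

def similarity(a: str, b: str) -> float:
    """Crude sequence similarity in [0, 1]; real workflows would use alignment scores."""
    return SequenceMatcher(None, a, b).ratio()

# Flag predicted peptides with no close match to any known scaffold as candidates for
# novel chemistry (threshold chosen arbitrarily for illustration).
for name, core in predicted_cores.items():
    best_name, best_seq = max(known_scaffolds.items(), key=lambda kv: similarity(core, kv[1]))
    score = similarity(core, best_seq)
    status = "known-like" if score > 0.7 else "potentially novel"
    print(f"{name}: closest {best_name} (similarity {score:.2f}) -> {status}")
```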
52

Clustering Web Users by Mouse Movement to Detect Bots and Botnet Attacks

Morgan, Justin L 01 March 2021 (has links) (PDF)
Efficiently and accurately detecting the presence of web bots has proven to be a challenging problem for website administrators. As modern web bots grow more sophisticated, particularly in their ability to closely mimic human behavior, detection schemes quickly lose effectiveness and become obsolete. Though machine learning-based detection schemes have been successful in recent implementations, web bots can apply similar machine learning tactics to mimic human users and thereby bypass such schemes. This work addresses the issue of machine learning-based bots bypassing machine learning-based detection schemes by introducing a novel unsupervised learning approach that clusters users based on behavioral biometrics. The idea is that, by differentiating users based on their behavior, for example how they use the mouse or type on the keyboard, website administrators gain information for making more informed decisions about whether a user is a human or a bot, much as login requirements on modern websites do. An added benefit of this approach is that it is a human observational proof (HOP), meaning it does not inconvenience the user (user friction) with human interactive proofs (HIP) such as CAPTCHA, or with login requirements.
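A minimal sketch of the general idea, assuming client-side mouse events have already been collected as (timestamp, x, y) samples per session: derive a few kinematic features and cluster the sessions. The feature set, cluster count, and random placeholder data are assumptions, not the thesis's actual pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def mouse_features(events: np.ndarray) -> np.ndarray:
    """events: array of (t, x, y) samples for one session; returns a small feature vector."""
    dt = np.diff(events[:, 0])
    dxy = np.diff(events[:, 1:3], axis=0)
    speed = np.linalg.norm(dxy, axis=1) / np.maximum(dt, 1e-6)
    accel = np.diff(speed) / np.maximum(dt[1:], 1e-6)
    return np.array([speed.mean(), speed.std(), np.abs(accel).mean(),
                     len(events), events[-1, 0] - events[0, 0]])

# sessions: list of per-session (t, x, y) arrays; random placeholders stand in for real telemetry.
rng = np.random.default_rng(0)
sessions = [np.cumsum(rng.random((200, 3)), axis=0) for _ in range(50)]

X = StandardScaler().fit_transform(np.array([mouse_features(s) for s in sessions]))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Cluster sizes; an administrator would inspect clusters (e.g. against known-human sessions)
# rather than automatically blocking one of them.
print(np.bincount(labels))
```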
53

(Intelligentes) Text Mining in der Marktforschung

Stützer, Cathleen M., Wachenfeld-Schell, Alexandra, Oglesby, Stefan 24 November 2021 (has links)
Extracting information from text, especially from unstructured text data such as forums, review portals, and open-ended survey responses, poses a particular challenge for market researchers today. On the one hand, new methodological know-how is needed to handle these complex data sets, both during collection and during evaluation. On the other hand, in the context of digital research into new customer insights, technical as well as organizational infrastructures must be created, among other things to establish such business models in the workflows and operating processes of companies, institutions, and organizations. The contributions in this volume not only discuss a wide range of methods and procedures for automatic text extraction, but also highlight both the relevance of such innovative approaches for online market research and the challenges associated with their use. Contents: C. M. Stützer, A. Wachenfeld-Schell & S. Oglesby: Digitale Transformation der Marktforschung; A. Lang & M. Egger, Insius UG: Wie Marktforscher durch kooperatives Natural Language Processing bei der qualitativen Inhaltsanalyse profitieren können; M. Heurich & S. Štajner, Symanto Research: Durch Technologie zu mehr Empathie in der Kundenansprache – Wie Text Analytics helfen kann, die Stimme des digitalen Verbrauchers zu verstehen; G. Heisenberg, TH Köln & T. Hees, Questback GmbH: Text Mining-Verfahren zur Analyse offener Antworten in Online-Befragungen im Bereich der Markt- und Medienforschung; T. Reuter, Cogia Intelligence GmbH: Automatische semantische Analysen für die Online-Marktforschung; P. de Buren, Caplena GmbH: Offenen Nennungen gekonnt analysieren
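As a generic illustration of automatic text extraction from open-ended responses (not any specific method from the volume's contributions), the sketch below uses TF-IDF features and a small non-negative matrix factorization to surface recurring themes. The example responses and parameter choices are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Placeholder open-ended survey answers; a real study would load the collected responses.
responses = [
    "Delivery was fast but the packaging was damaged",
    "Great customer service, very friendly support team",
    "The app keeps crashing when I try to pay",
    "Support answered quickly and solved my billing problem",
    "Damaged box again, shipping needs improvement",
]

vectorizer = TfidfVectorizer(stop_words="english", min_df=1)
X = vectorizer.fit_transform(responses)

# Factor the TF-IDF matrix into a small number of "topics".
nmf = NMF(n_components=2, init="nndsvda", random_state=0)
weights = nmf.fit_transform(X)

terms = vectorizer.get_feature_names_out()
for k, comp in enumerate(nmf.components_):
    top = [terms[i] for i in comp.argsort()[::-1][:4]]
    print(f"topic {k}: {', '.join(top)}")
```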
54

Extending Synthetic Data and Data Masking Procedures using Information Theory

Tyler J Lewis (15361780) 26 April 2023 (has links)
The two primary methodologies discussed in this thesis are the nonparametric entropy-based synthetic timeseries (NEST) and Directed infusion of data (DIOD) algorithms.
The former is a novel synthetic data algorithm that is shown to outperform similar state-of-the-art approaches, including generative networks, in terms of utility and data consistency. The majority of the data used are open-source and are cited where appropriate.
DIOD is a novel data masking paradigm that preserves the utility, privacy, and efficiency required by the current industrial paradigm, and offers a cheaper alternative to many state-of-the-art approaches. The data used include simulation data (source code cited), equations-based data, and open-source images (cited as needed).
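The abstract gives no algorithmic detail, so the sketch below only illustrates the generic building block the NEST name points to: a nonparametric (histogram plug-in) entropy estimate that could be used as one coarse consistency check between an original series and a synthetic candidate. It is not the NEST or DIOD algorithm; the series, bin count, and comparison are placeholders.

```python
import numpy as np

def histogram_entropy(x: np.ndarray, bins: int = 32) -> float:
    """Nonparametric (plug-in) Shannon entropy estimate of a 1-D sample, in nats."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(1)
real = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)       # stand-in "real" series
synthetic = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)  # stand-in synthetic series

# Compare entropy of the original and a candidate synthetic series.
print(histogram_entropy(real), histogram_entropy(synthetic))
```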
55

Anonymization Techniques for Privacy-preserving Process Mining

Fahrenkrog-Petersen, Stephan A. 30 August 2023 (has links)
Process Mining ermöglicht die Analyse von Event Logs. Jede Aktivität ist durch ein Event in einem Trace erfasst, welcher jeweils einer Prozessinstanz entspricht. Traces können sensible Daten, z.B. über Patienten, enthalten. Diese Dissertation adressiert Datenschutzrisiken für Trace-Daten und Process Mining. Eine empirische Studie zum Re-Identifikationsrisiko in öffentlichen Event Logs zeigt die hohe Gefahr auf, aber auch weitere Risiken sind von Bedeutung. Anonymisierung ist entscheidend, um Risiken zu adressieren, aber schwierig, weil gleichzeitig die Verhaltensaspekte des Event Logs erhalten werden sollen. Dies führt zu einem Privacy-Utility-Trade-Off. Dieser wird durch neue Algorithmen wie SaCoFa und SaPa angegangen, die Differential Privacy garantieren und gleichzeitig Utility erhalten. PRIPEL ergänzt die anonymisierten Control-Flows um Kontextinformationen und ermöglicht so die Veröffentlichung von vollständigen, geschützten Logs. Mit PRETSA wird eine Algorithmenfamilie vorgestellt, die k-Anonymity garantiert. Dafür werden privacy-verletzende Traces miteinander vereint, mit dem Ziel, ein möglichst syntaktisch ähnliches Log zu erzeugen. Durch Experimente kann eine bessere Utility-Erhaltung gegenüber existierenden Lösungen aufgezeigt werden. / Process mining analyzes business processes using event logs. Each activity execution is recorded as an event in a trace, representing a process instance's behavior. Traces often hold sensitive information such as patient data. This thesis addresses privacy concerns arising from trace data and process mining. A re-identification risk study on public event logs reveals high risk, but other threats exist as well. Anonymization is vital to address these issues, yet challenging because the behavioral aspects needed for analysis must be preserved, leading to a privacy-utility trade-off. New algorithms, SaCoFa and SaPa, are introduced for trace anonymization, using noise to achieve differential privacy while maintaining utility. PRIPEL supplements anonymized control flows with trace context information to produce complete protected logs. For k-anonymity, the PRETSA algorithm family merges privacy-violating traces based on a prefix representation of the event log while maintaining syntactic similarity. Empirical evaluations demonstrate utility improvements over existing techniques.
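For readers unfamiliar with the differential privacy building block that algorithms such as SaCoFa and SaPa rely on, here is a minimal, generic sketch of the Laplace mechanism applied to directly-follows counts from an event log. It assumes each trace contributes at most once to each count (sensitivity 1); the counts and epsilon value are placeholders, and this is not the thesis's actual algorithm.

```python
import numpy as np

def laplace_private_counts(counts: dict, epsilon: float, rng=None) -> dict:
    """Release counts under epsilon-differential privacy via the Laplace mechanism.
    Assumes sensitivity 1, i.e. adding or removing one trace changes each count by at most one."""
    rng = rng or np.random.default_rng()
    noisy = {key: value + rng.laplace(scale=1.0 / epsilon) for key, value in counts.items()}
    # Clamp and round so the released values still look like counts.
    return {key: max(0, round(value)) for key, value in noisy.items()}

# Toy directly-follows counts from an event log (placeholder activities and values).
df_counts = {("register", "triage"): 120, ("triage", "treat"): 95, ("treat", "discharge"): 90}
print(laplace_private_counts(df_counts, epsilon=0.5))
```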
56

Statistical Modelling of Plug-In Hybrid Fuel Consumption : A study using data science methods on test fleet driving data / Statistisk Modellering av Bränsleförbrukning För Laddhybrider : En studie gjord med hjälp av data science metoder baserat på data från en test flotta

Matteusson, Theodor, Persson, Niclas January 2020 (has links)
The automotive industry is taking major technological steps in an effort to reduce emissions and fight climate change. To reduce reliance on fossil fuels, a great deal of research is invested in electric motors (EM) and their applications. One such application is plug-in hybrid electric vehicles (PHEV), in which an internal combustion engine (ICE) and an EM are used in combination, taking turns to propel the vehicle based on driving conditions. The main optimization problem of a PHEV is deciding when to use which motor. If this optimization is done with respect to emissions, the entire electric charge should be used up before the end of the trip; but if the charge is used up too early, later driving segments for which the EM would have been the optimal choice must instead be driven using the ICE. To address this optimization problem, we studied fuel consumption under different driving conditions. These driving conditions are characterized by hundreds of sensors that continuously collect data about the state of the vehicle while driving. From these data, we constructed 150-second segments, including e.g. vehicle speed, before new descriptive features were engineered for each segment, e.g. maximum vehicle speed. Using the characteristics of typical driving conditions specified by the Worldwide Harmonized Light Vehicles Test Cycle (WLTC), segments were labelled as highway or city road segments. To reduce the dimensionality without losing information, principal component analysis was conducted, and a Gaussian mixture model was used to uncover hidden structures in the data. Three machine learning regression models were trained and tested: a linear mixed model, a kernel ridge regression model with a linear kernel function, and a kernel ridge regression model with an RBF kernel function. By splitting the data into a training set and a test set, the models were evaluated on data they had not been trained on. Performance metrics such as R2, mean absolute error, and mean squared error were compared to find the best model. The study shows that fuel consumption can be modelled from the sensor data of a PHEV test fleet, with six features contributing to an explanation ratio of 0.5 and thus having the highest impact on fuel consumption. One needs to keep in mind that the data were collected during the Covid-19 outbreak, when travel patterns were not considered normal; no regression model can explain the real world better than the underlying data does. / Fordonsindustrin vidtar stora tekniska steg för att minska utsläppen och bekämpa klimatförändringar. För att minska tillförlitligheten på fossila bränslen investeras en hel del forskning i elmotorer (EM) och deras tillämpningar. En sådan applikation är laddhybrider (PHEV), där förbränningsmotorer (ICE) och EM används i kombination, och turas om för att driva fordonet baserat på rådande körförhållanden. PHEV:s huvudoptimeringsproblem är att bestämma när man ska använda vilken motor. Om denna optimering görs med avseende på utsläpp bör hela den elektriska laddningen användas innan resan är slut. Men om laddningen används för tidigt måste senare delar av resan, för vilka det optimala valet hade varit att använda EM, göras med ICE. För att ta itu med detta optimeringsproblem studerade vi bränsleförbrukningen under olika körförhållanden. Dessa körförhållanden kännetecknas av hundratals sensorer som samlar in data om fordonets tillstånd kontinuerligt vid körning.
Från dessa data konstruerade vi 150-sekunderssegment, inkluderandes exempelvis fordonshastighet, innan nya beskrivande attribut konstruerades för varje segment, exempelvis högsta fordonshastighet. Genom att använda egenskaperna för typiska körförhållanden som specificerats av Worldwide Harmonized Light Vehicles Test Cycle (WLTC), märktes segment som motorvägs- eller stadsvägssegment. För att minska dimensionerna på data utan att förlora information användes principal component analysis, och en Gaussian mixture model användes för att avslöja dolda strukturer i data. Tre maskininlärningsregressionsmodeller skapades och testades: en linjär blandad modell, en kernel ridge regression-modell med linjär kernelfunktion och slutligen en kernel ridge regression-modell med RBF-kernelfunktion. Genom att dela upp informationen i ett tränings set och ett test set utvärderades de tre modellerna på data som de inte har tränats på. För utvärdering och förklaringsgrad av varje modell användes R2, Mean Absolute Error och Mean Squared Error. Studien visar att bränsleförbrukningen kan modelleras av sensordata för en PHEV-testflotta där sex attribut har en förklaringsgrad av 0.5 och därmed har störst inflytande på bränsleförbrukningen. Man måste komma ihåg att all data samlades in under Covid-19-utbrottet där resmönster inte ansågs vara normala och att ingen regressionsmodell kan förklara den verkliga världen bättre än vad underliggande data gör.
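A minimal sketch of the modelling pipeline described above (standardize, reduce with PCA, explore structure with a Gaussian mixture, then fit a kernel ridge regressor with an RBF kernel), using random placeholder data in place of the test-fleet segment features. The feature count, component numbers, and hyperparameters are assumptions, not those used in the study.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

# Placeholder segment features (e.g. max speed, mean speed, altitude change, ...) and fuel use.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X[:, :6] @ rng.normal(size=6) + 0.5 * rng.normal(size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=6).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

# Unsupervised structure check, loosely mirroring the highway/city split.
gmm = GaussianMixture(n_components=2, random_state=0).fit(Z_train)
print("cluster sizes:", np.bincount(gmm.predict(Z_train)))

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(Z_train, y_train)
pred = model.predict(Z_test)
print(f"R2={r2_score(y_test, pred):.2f}  MAE={mean_absolute_error(y_test, pred):.2f}")
```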
57

Learning Predictive Models from Electronic Health Records

Zhao, Jing January 2017 (has links)
The ongoing digitization of healthcare, which has been much accelerated by the widespread adoption of electronic health records, generates unprecedented amounts of clinical data in a readily computable form. This, in turn, affords great opportunities for making meaningful secondary use of clinical data in the endeavor to improve healthcare, as well as to support epidemiology and medical research. To that end, there is a need for techniques capable of effectively and efficiently analyzing large amounts of clinical data. While machine learning provides the necessary tools, learning effective predictive models from electronic health records comes with many challenges due to the complexity of the data. Electronic health records contain heterogeneous and longitudinal data that jointly provides a rich perspective of patient trajectories in the healthcare process. The diverse characteristics of the data need to be properly accounted for when learning predictive models from clinical data. However, how best to represent healthcare data for predictive modeling has been insufficiently studied. This thesis addresses several of the technical challenges involved in learning effective predictive models from electronic health records. Methods are developed to address the challenges of (i) representing heterogeneous types of data, (ii) leveraging the concept hierarchy of clinical codes, and (iii) modeling the temporality of clinical events. The proposed methods are evaluated empirically in the context of detecting adverse drug events in electronic health records. Various representations of each type of data that account for its unique characteristics are investigated and it is shown that combining multiple representations yields improved predictive performance. It is also demonstrated how the information embedded in the concept hierarchy of clinical codes can be exploited, both for creating enriched feature spaces and for decomposing the predictive task. Moreover, incorporating temporal information leads to more effective predictive models by distinguishing between event occurrences in the patient history. Both single-point representations, using pre-assigned or learned temporal weights, and multivariate time series representations are shown to be more informative than representations in which temporality is ignored. Effective methods for representing heterogeneous and longitudinal data are key for enhancing and truly enabling meaningful secondary use of electronic health records through large-scale analysis of clinical data.
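As a small illustration of one of the temporal representations discussed, the sketch below builds a single-point feature vector from a patient history by summing exponentially decayed weights over event occurrences, so recent events count more than old ones. The codes, half-life, and weighting scheme are hypothetical; the thesis also considers learned weights and multivariate time series representations.

```python
import numpy as np
from collections import defaultdict

def decay_weighted_features(events, half_life_days: float = 30.0) -> dict:
    """Single-point representation of a patient history: each clinical code receives the sum of
    exponentially decayed weights of its occurrences. `events` is a list of
    (code, days_before_index_date) pairs."""
    decay = np.log(2) / half_life_days
    feats = defaultdict(float)
    for code, days_before in events:
        feats[code] += np.exp(-decay * days_before)
    return dict(feats)

# Hypothetical patient history: diagnosis and drug codes with days before the index date.
history = [("I10", 400), ("I10", 35), ("N02BE01", 10), ("N02BE01", 3), ("E11", 200)]
print(decay_weighted_features(history))
```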
58

Identifying and Evaluating Early Stage Fintech Companies: Working with Consumer Internet Data and Analytic Tools

Dymov, Khasan 24 January 2018 (has links)
The purpose of this project is to work as an interdisciplinary team whose primary role is to mentor a team of WPI undergraduate students completing their Major Qualifying Project (MQP) in collaboration with Vestigo Ventures, LLC. (“Vestigo Ventures”) and Cogo Labs. We worked closely with the project sponsors at Vestigo Ventures and Cogo Labs to understand each sponsor’s goals and desires, and then translated those thoughts into actionable items and concrete deliverables to be completed by the undergraduate student team. As a graduate student team with a diverse set of educational backgrounds and a range of academic and professional experiences, we provided two primary functions throughout the duration of this project. The first function was to develop a roadmap for each individual project, with concrete steps, justification, goals and deliverables. The second function was to provide the undergraduate team with clarification and assistance throughout the implementation and completion of each project, as well as provide our opinions and thoughts on any proposed changes. The two teams worked together in lock-step in order to provide the project sponsors with a complete set of deliverables, with the undergraduate team primarily responsible for implementation and final delivery of each completed project.
59

Identifying and Evaluating Early Stage Fintech Companies: Working with Consumer Internet Data and Analytic Tools

Shoop, Alexander 24 January 2018 (has links)
The purpose of this project is to work as an interdisciplinary team whose primary role is to mentor a team of WPI undergraduate students completing their Major Qualifying Project (MQP) in collaboration with Vestigo Ventures, LLC. (“Vestigo Ventures”) and Cogo Labs. We worked closely with the project sponsors at Vestigo Ventures and Cogo Labs to understand each sponsor’s goals and desires, and then translated those thoughts into actionable items and concrete deliverables to be completed by the undergraduate student team. As a graduate student team with a diverse set of educational backgrounds and a range of academic and professional experiences, we provided two primary functions throughout the duration of this project. The first function was to develop a roadmap for each individual project, with concrete steps, justification, goals and deliverables. The second function was to provide the undergraduate team with clarification and assistance throughout the implementation and completion of each project, as well as provide our opinions and thoughts on any proposed changes. The two teams worked together in lock-step in order to provide the project sponsors with a complete set of deliverables, with the undergraduate team primarily responsible for implementation and final delivery of each completed project.
60

Estratégia computacional para apoiar a reprodutibilidade e reuso de dados científicos baseado em metadados de proveniência. / Computational strategy to support the reproducibility and reuse of scientific data based on provenance metadata.

Silva, Daniel Lins da 17 May 2017 (has links)
A ciência moderna, apoiada pela e-science, tem enfrentado desafios de lidar com o grande volume e variedade de dados, gerados principalmente pelos avanços tecnológicos nos processos de coleta e processamento dos dados científicos. Como consequência, houve também um aumento na complexidade dos processos de análise e experimentação. Estes processos atualmente envolvem múltiplas fontes de dados e diversas atividades realizadas por grupos de pesquisadores geograficamente distribuídos, que devem ser compreendidas, reutilizadas e reproduzíveis. No entanto, as iniciativas da comunidade científica que buscam disponibilizar ferramentas e conscientizar os pesquisadores a compartilharem seus dados e códigos-fonte, juntamente com as publicações científicas, são, em muitos casos, insuficientes para garantir a reprodutibilidade e o reuso das contribuições científicas. Esta pesquisa objetiva definir uma estratégia computacional para o apoio ao reuso e a reprodutibilidade dos dados científicos, por meio da gestão da proveniência dos dados durante o seu ciclo de vida. A estratégia proposta nesta pesquisa é apoiada em dois componentes principais, um perfil de aplicação, que define um modelo padronizado para a descrição da proveniência dos dados, e uma arquitetura computacional para a gestão dos metadados de proveniência, que permite a descrição, armazenamento e compartilhamento destes metadados em ambientes distribuídos e heterogêneos. Foi desenvolvido um protótipo funcional para a realização de dois estudos de caso que consideraram a gestão dos metadados de proveniência de experimentos de modelagem de distribuição de espécies. Estes estudos de caso possibilitaram a validação da estratégia computacional proposta na pesquisa, demonstrando o seu potencial no apoio à gestão de dados científicos. / Modern science, supported by e-science, has faced challenges in dealing with the large volume and variety of data generated primarily by technological advances in the processes of collecting and processing scientific data. As a consequence, the complexity of analysis and experimentation processes has also increased. These processes currently involve multiple data sources and numerous activities performed by geographically distributed research groups, and they must be understandable, reusable, and reproducible. However, initiatives by the scientific community to provide tools and to encourage researchers to share the data and source code behind their findings, along with their scientific publications, are often insufficient to ensure the reproducibility and reuse of scientific results. This research aims to define a computational strategy to support the reuse and reproducibility of scientific data through data provenance management throughout the data life cycle. The proposed strategy rests on two main components: an application profile, which defines a standardized model for describing provenance metadata, and a computational architecture for managing provenance metadata, which enables the description, storage, and sharing of these metadata in distributed and heterogeneous environments. We developed a functional prototype and carried out two case studies on the management of provenance metadata for species distribution modeling experiments. These case studies validated the computational strategy proposed in this research, demonstrating its potential to support the management of scientific data.
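To make the idea of a provenance application profile concrete, here is a minimal, hypothetical sketch of what one provenance record for a species distribution modeling run might look like. The field names are loosely inspired by W3C PROV terms and are illustrative assumptions, not the profile actually defined in the thesis.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class ProvenanceRecord:
    """Minimal provenance description of one processing step (illustrative fields only)."""
    entity_id: str                                   # the dataset or result being described
    activity: str                                    # the process that generated it
    agent: str                                       # who or what ran the process
    used: list = field(default_factory=list)         # input datasets
    generated_at: str = ""
    parameters: dict = field(default_factory=dict)

record = ProvenanceRecord(
    entity_id="sdm_output_2017_05",
    activity="species_distribution_model_run",
    agent="orcid:0000-0000-0000-0000",  # placeholder identifier
    used=["occurrence_points_v3", "bioclim_layers_v2"],
    generated_at=datetime.now(timezone.utc).isoformat(),
    parameters={"algorithm": "maxent", "replicates": 10},
)

# Serialized records like this could be stored and shared across distributed repositories.
print(json.dumps(asdict(record), indent=2))
```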
