  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Designing Surveys on Youth Immigration Reform: Lessons from the 2016 CCES Anomaly

Calkins, Saige 18 December 2020
Despite the clear advantages of internet-based survey research, it remains uncertain which survey methods are best suited to an online platform. Most survey-methods literature, whether it focuses on online, telephone, or in-person formats, tends to find little to no difference in results across survey modes. Even so, little research has examined the interaction between survey formatting, in terms of design and framing, and public opinion on social issues, specifically child immigration policies, a recent topic of popular debate. This paper examines an anomalous result in the 2016 Cooperative Congressional Election Study (CCES) public opinion immigration question on a DACA-related policy, where support was evenly split on a policy that is typically highly favored. To explain this unprecedented result, an experimental survey was conducted via Qualtrics, comparing various survey formats (single-style, forced choice, Likert scale) and inclusionary policy details against the original CCES “select all that apply” matrix style. The “select all that apply” matrix again produced anomalous results, while the other formats produced a breakdown similar to typical DACA-related polling data. These findings have important implications for future survey design and for those examining public opinion on child immigration policies.
202

Identification of alkaline fens using convolutional neural networks and multispectral satellite imagery

Jernberg, John January 2021
The alkaline fen is a particularly valuable type of wetland with unique characteristics. Due to anthropogenic risk factors and the sensitive nature of the fens, protection is highly prioritized, with identification and mapping of current locations being important parts of this process. To accomplish this in a cost-effective manner for large areas, remote sensing methods using satellite images might be very effective. Following the rapid development in computer vision, deep learning using convolutional neural networks (CNN) is the current state of the art for satellite image classification. Accordingly, this study evaluates the combination of different CNN architectures and multispectral Sentinel 2 satellite images for identification of alkaline fens using semantic segmentation. The implemented models are different variations of the proven U-net network design. In addition, a Random Forest classifier was trained for baseline comparison. The best result was produced by a spatial attention U-net with an IoU score of 0.31 for the alkaline fen class and a mean IoU score of 0.61. These findings suggest that identification of alkaline fens is possible with the current method even with a small dataset. However, an optimal solution to this task may require deeper research. The results also further establish deep learning as the superior choice over traditional machine learning algorithms for satellite image classification.
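The IoU (intersection over union) scores quoted in this abstract can be illustrated with a minimal sketch. It assumes flat per-pixel label sequences and the usual definition of per-class IoU; the function names and the toy labels are hypothetical and are not taken from the thesis.

```python
def iou_per_class(y_true, y_pred, num_classes):
    """Return the IoU of each class for two flat sequences of pixel labels."""
    scores = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        # Pixels where both the reference and the prediction say class c.
        intersection = sum(1 for t, p in pairs if t == c and p == c)
        # Pixels where either the reference or the prediction says class c.
        union = sum(1 for t, p in pairs if t == c or p == c)
        scores.append(intersection / union if union else float("nan"))
    return scores


def mean_iou(scores):
    """Average the per-class IoU values, skipping classes that never occur."""
    valid = [s for s in scores if s == s]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid)


# Hypothetical toy labels: 0 = background, 1 = alkaline fen.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
scores = iou_per_class(y_true, y_pred, num_classes=2)
print(scores, mean_iou(scores))
```

A class-wise score such as the 0.31 reported for the fen class can sit well below the mean when the other classes are easier to segment.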
203

Propuesta de mercados alternativos y potenciales para la empresa Sociedad Agrícola Drokasa S.A

Gonzales Lanasca, Felix Junior, Mejia Mendoza, Jimmy Gerson, Otoya Pagan, Angela Katia 30 November 2020
This research project analyzes the company Agrokasa from a business and statistical perspective. Its main objective is to find new, growing alternative markets that would allow the company to increase the volume of its avocado exports and its profitability per kilogram exported. To that end, a business analysis was first carried out to understand the company's context and sector. The methodology of data science was then applied to find destination countries that are attractive to Agrokasa. The data set was obtained from public and private sources such as Veritrade, Trade Map and Adex Data Trade, and made it possible to identify three alternative and potential markets: Russia, China and South Korea. Different technological tools, such as Excel, Power BI and Python, were used to compile, clean, process and visualize the data. This demonstrated the importance of seeing all the variables in a visualization that makes the behavior of the data understandable and serves as a basis for decision-making. China had the highest total FOB value exported in the period analyzed, 2018-2020; although its linear regression showed a negative trend, its average price per kilogram of avocado remains favorable. Russia was the market with the best growth prospects, and South Korea the one with the best price per kilogram. Finally, for all markets, a supervised-learning data science technique with a predictive approach was used to forecast each market's imports in order to establish commercial strategies to penetrate them. / Research paper
204

SEARCHING THE EDGES OF THE PROTEIN UNIVERSE USING DATA SCIENCE

Mengmeng Zhu (8775917) 30 April 2020
Data science uses the latest techniques in statistics and machine learning to extract insights from data. With the increasing amount of protein data, a number of novel research approaches have become feasible.
Micropeptides are an emerging field in the protein universe. They are small proteins with ≤ 100 amino acid residues (aa) and are translated from small open reading frames (sORFs) of ≤ 303 base pairs (bp). Traditionally, their existence was ignored because of the technical difficulties in isolating them. With technological advances, a growing number of micropeptides have been characterized and shown to play vital roles in many biological processes. Yet we lack bioinformatics methods for predicting them directly from DNA sequences, which could substantially facilitate research in this field at minimal cost. With the increasing amount of data, developing new methods to address this need becomes possible. We therefore developed MiPepid, a machine-learning-based method specifically designed for predicting micropeptides from DNA sequences, by curating a high-quality dataset and training MiPepid using logistic regression with 4-mer features. MiPepid performed exceptionally well on holdout test sets and much better than existing methods. MiPepid is available for download, easy to use, and runs sufficiently fast.
Long noncoding RNAs (lncRNAs) are transcripts of > 200 bp that do not encode a protein. Contrary to their “noncoding” definition, an increasing number of lncRNAs have been found to be translated into functional micropeptides. Therefore, whether most lncRNAs are translated is an open question of great significance. To address this question, by harnessing the availability of large-scale human variation data, we explored the relationships between lncRNAs, micropeptides, and canonical regular proteins (> 100 aa) from the perspective of genetic variation, which has long been used to study natural selection and to infer functional relevance. Through rigorous statistical analyses, we find that lncRNAs share a similar genetic variation profile with proteins regarding single nucleotide polymorphism (SNP) density, SNP spectrum, enrichment of rare SNPs, etc., suggesting that lncRNAs are under negative selection of similar strength to that acting on proteins. Our study revealed similarities between micropeptides, lncRNAs, and canonical proteins and is the first attempt to explore the relationships between the three groups from a genetic variation perspective.
Deep learning has been tremendously successful in 2D image recognition. Protein binding ligand prediction is a fundamental topic in protein research, as most proteins bind ligands to function. Proteins are 3D structures and can be considered as 3D images. Prediction of the binding ligands of proteins can then be converted to a 3D image classification problem. In addition, a large amount of protein structure data is now available. We therefore used deep learning to predict protein binding ligands by designing a 3D convolutional neural network from scratch and building a large 3D image dataset of protein structures. The trained model achieved an average F1 score of over 0.8 across 151 classes on the holdout test set and performed better than existing methods. In summary, we showed the feasibility of deploying deep learning in protein structure research.
In conclusion, by exploring various edges of the protein universe from the perspective of data science, we showed that the increasing amount of data and the advancement of data science methods have made it possible to address a wide variety of pressing biological questions. We showed that for a successful data science study, three components (goal, data, and method) are all indispensable. We provided three successful data science studies: careful data cleaning and selection of a machine learning algorithm led to the development of MiPepid, which meets the urgent need for a micropeptide prediction method; identifying the question and exploring it from a different angle led to the key insight that lncRNAs resemble micropeptides; and applying deep learning to protein structure data led to a new approach to the long-standing question of protein-ligand binding. The three studies serve as examples of solving a wide range of data science problems with a variety of issues.
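The "4-mer features" mentioned for MiPepid can be pictured with a short sketch. The abstract does not say how the features are computed, so this assumes overlapping sliding windows and normalised frequencies over the 256 possible DNA 4-mers; every name below is hypothetical and this is not MiPepid's actual code.

```python
from itertools import product

K = 4
ALPHABET = "ACGT"
# All 4**4 = 256 possible 4-mers in a fixed order, so every sequence maps
# to a feature vector of the same length.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(seq):
    """Map a DNA sequence to a 256-dimensional vector of 4-mer frequencies."""
    seq = seq.upper()
    counts = [0.0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in INDEX:  # skip windows containing N or other symbols
            counts[INDEX[kmer]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


# Hypothetical toy sequence; real sORFs are far longer.
features = kmer_frequencies("ATGAAATGA")
print(len(features), features[INDEX["ATGA"]])
```

A vector like this could then be passed to any logistic regression implementation as the input features.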
205

Deep Learning Based User Models for Interactive Optimization of Watershed Designs

Andrew Paul Hoblitzell (8086769) 11 December 2019
This dissertation combines stakeholder and analytical intelligence for consensus decision-making via an interactive optimization process. It outlines techniques for developing user models of the subjective criteria of human stakeholders for an environmental decision support system called WRESTORE. The dissertation compares several user modeling techniques and develops methods for incorporating such user models selectively for interactive optimization, combining multiple objective and subjective criteria.
This dissertation describes additional functionality for our watershed planning system, WRESTORE (Watershed REstoration Using Spatio-Temporal Optimization of REsources) (http://wrestore.iupui.edu). Techniques for performing the interactive optimization process in the presence of limited data are described. This work adds a user modeling component that develops a computational model of a stakeholder’s preferences and then integrates that component into the decision support system.
Our system is one of many decision support systems and depends on stakeholder interaction. The user modeling component within the system uses deep learning, which can be challenging with limited data. Our work combines user models built from limited data with application-specific techniques to address some of these challenges. The dissertation describes steps for implementing accurate virtual stakeholder models based on limited training data.
Another method for dealing with limited data, based on computing training data uncertainty, is also presented. The results show more stable convergence in fewer iterations when using an uncertainty-based incremental sampling method than when using stability-based sampling or random sampling. The technique is described in additional detail.
The dissertation also discusses non-stationary reinforcement-based feature selection for the interactive optimization component of our system. The results indicate that the proposed feature selection approach can effectively mitigate superfluous and adversarial dimensions, which, if left untreated, can degrade both computational performance and interactive optimization performance against analytically determined environmental fitness functions.
The contribution of this dissertation lays the foundation for developing a framework for multi-stakeholder consensus decision-making in the presence of limited data.
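As a rough illustration of uncertainty-based sampling in general, the sketch below picks the candidates whose predicted class probabilities have the highest entropy, so that the stakeholder is asked about the designs the user model is least sure of. The dissertation's own uncertainty measure is not given in the abstract, so the entropy criterion, the function names and the example designs are all assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions, k):
    """Return the k candidate names whose predictions have the highest entropy."""
    ranked = sorted(predictions, key=lambda name: entropy(predictions[name]), reverse=True)
    return ranked[:k]


# Hypothetical user-model outputs: P(stakeholder likes design), P(dislikes design).
predictions = {
    "design_a": [0.9, 0.1],
    "design_b": [0.5, 0.5],
    "design_c": [0.7, 0.3],
}
to_label = most_uncertain(predictions, k=2)
print(to_label)
```

Random sampling, one of the baselines named in the abstract, would instead draw `k` names without looking at the predictions.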
206

Knot Flow Classification and its Applications in Vehicular Ad-Hoc Networks (VANET)

Schmidt, David 01 May 2020
Intrusion detection systems (IDSs) play a crucial role in the identification and mitigation of attacks on host systems. Of these systems, vehicular ad hoc networks (VANETs) are difficult to protect due to the dynamic nature of their clients and their need for constant interaction with their respective cyber-physical systems. Currently, there is a need for a VANET-specific IDS that meets this criterion. To this end, a spline-based intrusion detection system has been pioneered as a solution. By combining clustering with spline-based general linear model classification, this knot flow classification (KFC) method allows for robust intrusion detection. Due to its design and the manner in which it is constructed, KFC holds great potential for implementation across a distributed system. The purpose of this thesis was to explain and extrapolate the aforementioned IDS, highlight its effectiveness, and discuss the conceptual design of the distributed system for use in future research.
207

Building predictive models for dynamic line rating using data science techniques

Doban, Nicolae January 2016
Traditional power systems are statically rated, and renewable energy sources (RES) are sometimes curtailed so as not to exceed this static rating. RES are curtailed because of their intermittent character, which makes it difficult to predict their output at specific times of day. Dynamic Line Rating (DLR) technology can overcome this constraint by leveraging the available weather data and the technical parameters of the transmission line. The main goal of the thesis is to present prediction models of DLR capacity two days ahead and one day ahead. The models are evaluated based on their error rate profiles. DLR makes it possible to up-rate the line(s) according to the environmental conditions and always has a much higher profile than the static rating. By implementing DLR, a power utility can increase the efficiency of the power system, decrease RES curtailment and optimize the integration of RES within the grid. DLR depends mainly on the weather parameters; specifically, at high wind speeds and low ambient temperatures, DLR registers its highest profile. This is especially profitable for wind energy producers, who can both produce more (until pitch control) and transmit more during periods of high wind speed over the same line(s), thus increasing energy efficiency. The DLR was calculated by employing modern data science and machine learning tools and techniques, and leveraged historical weather and transmission line data provided by SMHI and Vattenfall respectively. An initial phase of Exploratory Data Analysis (EDA) was carried out to understand data patterns and relationships between different variables, as well as to determine the most predictive variables for DLR. All the predictive models and data processing routines were built in open source R and are available on GitHub.
Three types of models were built: for historical data, for the one-day-ahead horizon and for the two-days-ahead horizon. The models built for the two forecast horizons registered a low error rate profile of 9% (one day ahead) and 11% (two days ahead). As expected, the predictive models built on historical data were more accurate, with an error as low as 2%-3%. In conclusion, the implemented models met Vattenfall's requirement of a maximum error of 20%, and they can be applied in the control room for that specific line. Moreover, predictive models can also be built for other lines if the required data is available. Therefore, the findings and outcomes of this master's thesis project can be reproduced for other power lines and geographic locations in order to achieve a more efficient power system and an increased share of RES in the energy mix.
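The abstract reports error rates in percent without naming the metric. As one plausible reading, the sketch below computes a mean absolute percentage error (MAPE) and checks it against the 20% limit stated above. The metric choice, the function name and the sample ratings are assumptions; the thesis itself was implemented in R, and Python is used here only for illustration.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


MAX_ERROR_PERCENT = 20.0  # requirement stated in the abstract

# Hypothetical line ratings in amperes; not data from the thesis.
actual = [1000.0, 1200.0, 800.0, 1100.0]
predicted = [900.0, 1260.0, 880.0, 1100.0]

mape = mean_absolute_percentage_error(actual, predicted)
meets_requirement = mape <= MAX_ERROR_PERCENT
print(round(mape, 2), meets_requirement)
```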
208

Machine learning and statistical analysis in fuel consumption prediction for heavy vehicles / Maskininlärning och statistisk analys för prediktion av bränsleförbrukning i tunga fordon

Almér, Henrik January 2015
I investigate how to use machine learning to predict fuel consumption in heavy vehicles. I examine data from several sources describing road, vehicle, driver and weather characteristics, and I use the collected data to fit a regression to fuel consumption measured in liters per distance. The thesis was carried out for Scania and uses data sources available to Scania. I evaluate which machine learning methods are most successful, how data collection frequency affects the prediction, and which features are most influential for fuel consumption. I find that a lower collection frequency of 10 minutes is preferable to a higher collection frequency of 1 minute. I also find that the evaluated models are comparable in their performance and that the most important features for fuel consumption are related to the road slope, vehicle speed and vehicle weight.
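To make the comparison of collection frequencies concrete, the sketch below aggregates 1-minute samples into 10-minute averages. The abstract does not describe how the two frequencies were obtained, so window averaging, the function name and the sample values are assumptions.

```python
def downsample_mean(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    averaged = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]  # the last chunk may be shorter
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Hypothetical 1-minute speed readings (km/h) covering 20 minutes.
one_minute_speed = [float(v) for v in range(20)]
ten_minute_speed = downsample_mean(one_minute_speed, window=10)
print(ten_minute_speed)
```

Averaging over longer windows smooths out short-lived fluctuations, which is one possible reason a lower collection frequency could give a more stable prediction target.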
209

Aplicación de Data Science para el análisis de una campaña de respuesta directa televisiva de UNICEF Perú

Olivera Taboada, Luis Angel, Sialer Puelles, Melissa Aurora, Velarde Gonzales, Juan José 07 December 2020
This research work addresses a novel television campaign deployed by UNICEF Peru to encourage donor affiliation to its Soy Socio program, for the benefit of children in vulnerable situations. Based on the information from the first television promotion campaign, various insights were extracted by applying data science tools and techniques, in order to analyze the campaign's main results and to evaluate the performance of the telephone service provider as well as the contribution of the television groups hired. The study also sought to identify the main characteristics of the new associates. A linear regression model was used to project the level of effectiveness as a function of the volume of calls and donations captured (conversion level) and the number of days the campaign ran. The mean squared error was then used to assess the goodness of fit of the model, which showed a positive relationship, although one subject to the influence of other exogenous factors. To build the donor profile, a clustering technique was used, which made it possible to recognize and group relevant characteristics of the new associates so that the institution can target its future campaigns more efficiently. Finally, various visualizations were developed in line with the objectives of the research, drawing on the knowledge acquired in the courses that make up the data science specialization. / Research paper
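The combination of a linear regression with a mean squared error check can be sketched as follows. This is a minimal single-variable least-squares fit; the variable names and the numbers are hypothetical and are not UNICEF campaign data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(ys, predictions):
    """Average squared difference between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical data: campaign day versus donations captured that day.
days = [1, 2, 3, 4, 5]
donations = [2, 6, 7, 8, 12]

slope, intercept = fit_line(days, donations)
predictions = [slope * d + intercept for d in days]
mse = mean_squared_error(donations, predictions)
print(slope, intercept, mse)
```

A positive slope corresponds to the positive relationship the abstract describes, while the size of the mean squared error hints at how much variation is left to other factors.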
210

A Text Analysis of Data Science Career Opportunities and U.S. iSchool Curriculum

Durr, Angel Krystina 12 1900
Data science employment opportunities of varied complexity and environment are in growing demand across the globe. Data science as a discipline potentially offers a wealth of jobs to prospective employees, while traditional information science-based roles continue to decrease as budgets get cut across the U.S. Since data is related closely to information historically, this research will explore the education of U.S. iSchool professionals and compare it to traditional data science roles being advertised within the job market. Through a combination of latent semantic analysis of over 1600 job postings and iSchool course documentation, it is our aim to explore the intersection of library and information science and data science. Hopefully these research findings will guide future directions for library and information science professionals into data science driven roles, while also examining and highlighting the data science techniques currently driven by the education of iSchool professionals. In addition, it is our aim to understand how data science could benefit from a mutually symbiotic relationship with the field of information science as statistically data scientists spend far too much time working on data preparation and not nearly enough time conducting scientific inquiry. The results of this examination will potentially guide future directions of iSchool students and professionals towards more cooperative data science roles and guide future research into the intersection between iSchools and data science and possibilities for partnership.
