  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Designing Incentive for Cooperative Problem Solving in Crowdsourcing / クラウドソーシングにおける協調問題解決のためのインセンティブ設計

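Jiang, Huan 23 March 2015
Kyoto University / 0048 / New system, course-based doctorate / Doctor of Informatics / Kō No. 19120 / Jōhaku No. 566 / 新制||情||99 (University Library) / 32071 / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Associate Professor Shigeo Matsubara, Professor Keishi Tajima, Professor Hisashi Kashima / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM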
32

Crowdsourcing i små företag : En kvalitativ studie om hur små företag upplever och arbetar med crowdsourcing / Crowdsourcing in small enterprises : A qualitative study on how small enterprises experience and use crowdsourcing

Wallenborg Björk, Amanda, Weinheimer, Hanna January 2023
Background: A globalized world and new technologies have given enterprises more opportunities to access labour and skills. Crowdsourcing can be used by enterprises to hire self-employed freelancers via digital platforms in a flexible and cost-effective way. For small businesses, which are characterized by limited capital, labour, and skills, crowdsourcing can be a way to survive in an ever-changing world while remaining competitive in the market. Purpose and method: The purpose of this study is to increase the understanding of the opportunities and challenges that small enterprises in the Swedish context have with crowdsourcing. The study uses a qualitative method based on eight semi-structured interviews. Theoretical frame of reference: The theoretical frame of reference consists of a description of the crowdsourcing process, a framework of factors that companies take into account when deciding to crowdsource, and the opportunities and challenges that small companies in other contexts experience with crowdsourcing. Results and conclusion: The study shows that small enterprises in the Swedish context find a variety of opportunities and challenges with crowdsourcing; reputation as a challenge is this study's contribution to the research from the Swedish context. These opportunities and challenges are strongly linked to the decision factors that influence small enterprises' decision to work with crowdsourcing: the enterprises take the existing challenges into account and work on them in order to benefit from the opportunities that crowdsourcing offers. When deciding to work with crowdsourcing, the opportunities are therefore valued more highly than the challenges that small businesses must deal with.
33

Crowd and Hybrid Algorithms for Cost-Aware Classification

Krivosheev, Evgeny 28 May 2020
Classification is a pervasive problem in research that aims at grouping items into categories according to established criteria. There are two prevalent ways to classify items of interest: i) to train and exploit machine learning (ML) algorithms or ii) to resort to human classification (via experts or crowdsourcing). Machine learning algorithms have been improving rapidly, with impressive performance on complex problems such as object recognition and natural language understanding. However, in many cases they cannot yet deliver the required levels of precision and recall, typically because of the difficulty of the problem and the lack of sufficiently large and clean datasets. Research in crowdsourcing has also made impressive progress in the last few years, and the crowd has been shown to perform well even in difficult tasks [Callaghan et al., 2018; Ranard et al., 2014]. However, crowdsourcing remains expensive, especially when aiming at high levels of accuracy, which often implies collecting more votes per item to make classification more robust to workers' errors. Recently, a third direction has been emerging rapidly: hybrid crowd-machine classification, which can achieve superior performance by combining the cost-effectiveness of automatic machine classifiers with the accuracy of human judgment. In this thesis, we focus on designing crowdsourcing strategies and hybrid crowd-machine approaches that optimize the item classification problem in terms of results and budget. We start by investigating crowd-based classification under a budget constraint with different loss implications, i.e., when false positive and false negative errors carry different harm to the task. Further, we propose and validate a probabilistic crowd classification algorithm that iteratively estimates the statistical parameters of the task and data to efficiently manage the accuracy vs. cost trade-off.
We then investigate how the crowd and machines can support each other in tackling classification problems. We present and evaluate a set of hybrid strategies that balance investing money in building machine classifiers against exploiting them jointly with crowd-based classifiers. While analyzing our results on crowd and hybrid classification, we found it relevant to study the quality of crowd observations and their confusions, as well as another promising direction: linking entities from structured and unstructured sources of data. We propose crowd-based and neural-network-based algorithms to cope with these challenges, followed by a rich evaluation on synthetic and real-world datasets.
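The idea of classifying with crowd votes when false positives and false negatives carry different costs can be illustrated with a minimal sketch. This is not the algorithm from the thesis: it assumes a single, known worker accuracy and a fixed prior (the thesis describes estimating such parameters iteratively), and the names `posterior_positive`, `classify`, `cost_fp`, and `cost_fn` are hypothetical.

```python
def posterior_positive(votes, worker_accuracy=0.8, prior=0.5):
    """Posterior probability that an item is positive, given boolean crowd votes.

    Assumption: every worker is correct with the same known probability and
    votes are independent. Both are simplifications made for illustration.
    """
    like_pos = prior
    like_neg = 1.0 - prior
    for vote in votes:
        like_pos *= worker_accuracy if vote else 1.0 - worker_accuracy
        like_neg *= 1.0 - worker_accuracy if vote else worker_accuracy
    return like_pos / (like_pos + like_neg)


def classify(votes, cost_fp=1.0, cost_fn=1.0, worker_accuracy=0.8, prior=0.5):
    """Return True (positive) when that label has the lower expected loss.

    Labelling negative risks a false negative with probability p;
    labelling positive risks a false positive with probability 1 - p.
    """
    p = posterior_positive(votes, worker_accuracy, prior)
    return p * cost_fn >= (1.0 - p) * cost_fp


votes = [True, False, False]
print(posterior_positive(votes))              # posterior that the item is positive
print(classify(votes, cost_fp=1, cost_fn=1))  # symmetric costs
print(classify(votes, cost_fp=1, cost_fn=5))  # missing a positive is 5x worse
```

With symmetric costs the two negative votes win; once a false negative is weighted five times more heavily, the same votes lead to a positive label, which is the kind of trade-off the abstract refers to.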
34

Towards Best Practices for Crowdsourcing Ontology Alignment Benchmarks

Amini, Reihaneh 15 August 2016
No description available.
35

Methods for the spatial modeling and evaluation of tree canopy cover

Datsko, Jill Marie 24 May 2022
Tree canopy cover is an essential measure of forest health and productivity, which is widely studied due to its relevance to many disciplines. For example, declining tree canopy cover can be an indicator of forest health, insect infestation, or disease. This dissertation consists of three studies, focused on the spatial modeling and evaluation of tree canopy cover, drawing on recent developments and best practices in the fields of remote sensing, data collection, and statistical analysis.
The first study evaluates how well harmonic regression variables derived at the pixel level using a time-series of all available Landsat images predict values of tree canopy cover. Harmonic regression works to approximate the reflectance curve of a given band across time. Therefore, the coefficients that result from the harmonic regression model estimate relate to the phenology of the area of each pixel. We use a time-series of all available cloud-free observations in each Landsat pixel for NDVI, SWIR1 and SWIR2 bands to obtain harmonic regression coefficients for each variable and then use those coefficients to estimate tree canopy cover at two discrete points in time. This study compares models estimated using these harmonic regression coefficients to those estimated using Landsat median composite imagery, and combined models. We show that (1) harmonic regression coefficients that use a single harmonic coefficient provided the best quality models, (2) harmonic regression coefficients from Landsat-derived NDVI, SWIR1, and SWIR2 bands improve the quality of tree canopy cover models when added to the full suite of median composite variables, (3) the harmonic regression constant for the NDVI time-series is an important variable across models, and (4) there is little to no additional information in the full suite of predictors compared to the harmonic regression coefficients alone based on the information criterion provided by principal components analysis.
The second study evaluates the use of crowdsourcing with Amazon's Mechanical Turk platform to obtain photointerpreted tree canopy cover data. We collected multiple interpretations at each plot from both crowd and expert interpreters, and sampled these data using a Monte Carlo framework to estimate a classification model predicting the "reliability" of each crowd interpretation using expert interpretations as a benchmark, and identified the most important variables in estimating this reliability. The results show low agreement between crowd and expert groups, as well as between individual experts. We found that variables related to fatigue had the most bearing on the "reliability" of crowd interpretations, followed by whether the interpreter used false color or natural color composite imagery during interpretation. Recommendations for further study and future implementations of crowdsourced photointerpretation are also provided. In the final study, we explored sampling methods for the purpose of model validation. We evaluated a method of stratified random sampling with optimal allocation using measures of prediction uncertainty derived from random forest regression models by comparing the accuracy and precision of estimates from samples drawn using this method to estimates from samples drawn using other common sampling protocols, using three large, simulated datasets as case studies. We further tested the effect of reduced sample sizes on one of these datasets and demonstrated a method to report the accuracy of continuous models for domains that are either regionally constrained or numerically defined based on other variables or the modeled quantity itself. We show that stratified random sampling with optimal allocation provides the most precise estimates of the mean of the reference Y and the RMSE of the population. We also demonstrate that all sampling methods provide reasonably accurate estimates on average.
Additionally we show that, as sample sizes are increased with each sampling method, the precision generally increases, eventually reaching a level of convergence where gains in estimate precision from adding additional samples would be marginal. / Doctor of Philosophy / Tree canopy cover is an essential measure of forest health, which is widely studied due to its relevance to many disciplines. For example, declining tree canopy cover can be an indicator of forest health, insect infestation, or disease. This dissertation consists of three studies, focused on the spatial modeling and evaluation of tree canopy cover, drawing on recent developments and best practices in the fields of remote sensing, data collection, and statistical analysis. The first study is an evaluation of the utility of harmonic regression coefficients from time-series satellite imagery, which describe the timing and magnitude of green-up and leaf loss at each location, to estimate tree canopy cover. This study compares models estimated using these harmonic regression coefficients to those estimated using median composite imagery, which obtain the median value of reflectance values across time data at each location, and models which used both types of variables. 
We show that (1) harmonic regression coefficients that use a simplified formula provided higher quality models compared to more complex alternatives, (2) harmonic regression coefficients improved the quality of tree canopy cover models when added to the full suite of median composite variables, (3) the harmonic regression constant, which is the coefficient that determines the average reflectance over time, based on time-series vegetation index data, is an important variable across models, and (4) there is little to no additional information in the full suite of predictors compared to the harmonic regression coefficients alone.
The second study evaluates the use of crowdsourcing, which engages non-experts in paid online tasks, with Amazon's Mechanical Turk platform to obtain tree canopy cover data, as interpreted from aerial images. We collected multiple interpretations at each location from both crowd and expert interpreters, and sampled these data using a repeated sampling framework to estimate a classification model predicting the "reliability" of each crowd interpretation using expert interpretations as a benchmark, and identified the most important variables in estimating this "reliability". The results show low agreement between crowd and expert groups, as well as between individual experts. We found that variables related to fatigue had the most bearing on the reliability of crowd interpretations, followed by variables related to the display settings used to view imagery during interpretation. Recommendations for further study and future implementations of crowdsourced photointerpretation are also provided. In the final study, we explored sampling methods for the purpose of model validation.
We evaluated a method of stratified random sampling with optimal allocation, a sampling method that is specifically designed to improve the precision of sample estimates, using measures of prediction uncertainty, describing the variability in predictions from different models in an ensemble of regression models. We compared the accuracy and precision of estimates from samples drawn using this method to estimates from samples drawn using other common sampling protocols using three large, mathematically simulated data products as case studies. We further tested the effect of smaller sample sizes on one of these data products and demonstrated a method to report the accuracy of continuous models for different land cover classes and for classes defined using 10% tree canopy cover intervals. We show that stratified random sampling with optimal allocation provides the most precise sample estimates. We also demonstrate that all sampling methods provide reasonably accurate estimates on average and we show that, as sample sizes are increased with each sampling method, the precision generally increases, eventually leveling off where gains in estimate precision from adding additional samples would be marginal.
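Stratified random sampling with optimal allocation, as mentioned in the abstract above, can be sketched under stated assumptions. The sketch uses Neyman allocation (the equal-cost form of optimal allocation), where each stratum receives a share of the sample proportional to its size times its standard deviation. The abstract does not give the exact formula used in the dissertation, so the function name `neyman_allocation` and the example numbers are hypothetical; `strata_sds` stands in for a per-stratum spread of the prediction-uncertainty measure.

```python
def neyman_allocation(strata_sizes, strata_sds, total_n):
    """Split a total sample size across strata in proportion to N_h * S_h.

    strata_sizes: population size N_h of each stratum
    strata_sds:   standard deviation S_h of the target variable in each stratum
    total_n:      total number of samples to draw
    Note: this sketch does not cap an allocation at the stratum size.
    """
    weights = [size * sd for size, sd in zip(strata_sizes, strata_sds)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one stratum needs a non-zero size and spread")

    raw = [total_n * w / total_weight for w in weights]
    allocation = [int(r) for r in raw]  # round down first

    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = total_n - sum(allocation)
    by_fraction = sorted(range(len(raw)),
                         key=lambda i: raw[i] - allocation[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        allocation[i] += 1
    return allocation


# Three hypothetical strata: the most variable stratum gets the most samples
# even though it is the smallest.
print(neyman_allocation([500, 300, 200], [1.0, 2.0, 5.0], 210))
```

Strata where predictions are more variable receive more samples, which is why this kind of allocation tends to yield more precise estimates than allocating in proportion to stratum size alone.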
36

Crowd Compositions for Bias Detection and Mitigation in Predicting Recidivism

Mhatre, Sakshi Manish 30 September 2024
This thesis explores an approach to predicting recidivism by leveraging crowdsourcing, contrasting traditional judicial discretion and algorithmic models. Instead of relying on judges or algorithms, participants predicted the likelihood of re-offending using the COMPAS dataset, which includes demographic and criminal record information. The study analyzed both quantitative and qualitative data to assess biases in human versus algorithmic predictions. Findings reveal that homogeneous crowds reflect the biases of their composition, leading to more pronounced gender and racial biases. In contrast, heterogeneous crowds, with equal and random distributions, present a more balanced view, though underlying biases still emerge. Both gender and racial biases influence how re-offending risk is perceived, significantly impacting risk evaluations. Specifically, crowds rated African American offenders as less likely to re-offend compared to COMPAS, which assigned them higher risk scores, while Caucasian and Hispanic offenders were perceived as more likely to re-offend by crowds. Gender differences also emerged, with males rated as less likely to re-offend and females as more likely. This study highlights crowdsourcing's potential to mitigate biases and provides insights into balancing consistency and fairness in risk assessments. / Master of Science / Within the criminal justice system, predicting whether someone will re-offend has typically depended on the judgment of judges and computerized systems. This thesis investigates another avenue for predicting re-offending by using crowdsourcing, which gathers input from a group of people. In this study, participants were asked to predict the likelihood of re-offending for several offenders using demographic and criminal record information from the publicly available COMPAS dataset. Participants provided scores, and some also explained their reasoning. 
Bias, defined as a systematic unfairness that leads to prejudiced outcomes, was a key focus. To understand bias, the study created different groups within the participant crowd based on age, gender, and race, and compared their predictions with COMPAS scores. The analysis revealed important insights into the biases present in both human and algorithmic predictions. A homogeneous crowd is associated with minimal differences in ratings across genders and races, suggesting a consistent but potentially biased perspective, while a diverse crowd leads to varied ratings without a clear trend, reflecting a broader range of viewpoints but also increased variability. This suggests that while a diverse crowd may help reduce bias, it can also result in less predictable assessments.
37

Harar : En ny företagskategori som växer utan att rekrytera / Hares: A new company category that grows without recruiting

Rosén, Joel January 2018
Purpose – The literature describes a strong relationship between sales growth and employee growth, and distinguishes two growth categories: traditional companies, which actively avoid growth, and gazelles, which grow at a high rate. Footway, however, combines a low number of employees with a high rate of sales growth, and thereby represents a new phenomenon that shares traits with both categories. This calls into question the relationship between sales growth and employee growth; Footway instead relies on the scalability concepts of automation, crowdsourcing and outsourcing. This study aimed to examine how these concepts can be used to achieve sales growth without employee growth.
Method – The study was a qualitative case study of the e-commerce company Footway, conducted with attention to credibility, transferability and confirmability. Data were collected through a total of 14 semi-structured interviews in two phases: first nine internal interviews at the case company to understand its process, and then five external interviews with similar companies to assess how generally the phenomenon is accepted. The study was limited to e-commerce, and the data were analyzed through open, axial, and selective coding.
Result – The empirical findings showed acceptance of the phenomenon, with the internal approach and culture viewed as vital. Certain strategic and organizational conditions were identified that allow the scalable concepts to be applied in an agile way to solve company-specific problems. Automation was viewed as suitable for simple and, in some cases, complicated problems; outsourcing for complicated problems; and crowdsourcing for complex problems. By identifying success factors for these conditions and for automation, outsourcing and crowdsourcing, the study could show how sales growth without employee growth can be reached.
Theoretical implications – The study identifies a new company growth category that could fill the gap between traditional companies and gazelles. It is defined here as Hares: like traditional companies, they choose to remain small, but like gazelles they show fast sales growth. Hares have emerged because of an increasing focus on scalability, with an internal focus on control rather than production.
Practical implications – The study presents a number of approaches that companies could consider in order to become Hares, describing the Hares' approach to conditions, automation, outsourcing and crowdsourcing.
38

Quality based approach for updating geographic authoritative datasets from crowdsourced GPS traces / Une approche basée sur la qualité pour mettre à jour les bases de données géographiques de référence à partir de traces GPS issues de la foule

Ivanovic, Stefan 19 January 2018
Nowadays, the need for up-to-date authoritative spatial data has increased significantly. To meet this need, authoritative spatial datasets must be updated continuously, a task that has become highly demanding both technically and financially. Within road networks, three types of roads are particularly challenging to update continuously: footpaths, tractor roads, and bicycle roads. They are challenging because of their intermittent nature (they appear and disappear very often) and the variety of landscapes they cross (forest, high mountains, seashore, etc.).
Simultaneously, GPS data voluntarily collected by the crowd is widely available in large quantities. The number of people recording GPS data, such as GPS traces, has been steadily increasing, especially during sport and leisure activities. The traces are made openly available and popularized on social networks, blogs, and the websites of sport and tourist associations. However, their current use is limited to very basic metrics such as the total time of a trace, average speed, or average elevation. The main reasons are the high variation in spatial quality from point to point within a trace and the lack of protocols and metadata (e.g. the precision of the GPS device used).
The global context of our work is the use of GPS hiking and mountain bike traces collected by volunteers (VGI traces) to detect potential updates of footpaths, tractor roads, and bicycle roads in authoritative datasets. Particular attention is paid to roads that exist in reality but are not represented in authoritative datasets (missing roads). The approach we propose consists of three phases. The first phase consists of evaluating and improving the quality of VGI traces. The quality of traces was improved by filtering outlying points (using a machine learning based approach) and points that result from secondary human behaviour (activities off the main itinerary). The remaining points are then evaluated in terms of their accuracy by classifying them as low-accuracy or high-accuracy points using rule-based machine learning classification. The second phase deals with the detection of potential updates. For that purpose, a growing buffer data matching solution is proposed. The size of the buffers is adapted to the results of the GPS point accuracy classification in order to handle the large variations in VGI trace accuracy. As a result, the parts of traces that are not matched to the authoritative road network are obtained and considered as candidates for missing roads.
Finally, in the third phase, we propose a decision method that determines whether the "missing road" candidates should be accepted as updates. This decision is made in a multi-criteria process in which potential missing roads are qualified according to their degree of confidence. The approach was tested on multi-sourced VGI GPS traces from the Vosges area. Missing roads in the IGN authoritative database BDTopo® were successfully detected and proposed as potential updates.
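The growing buffer matching described in the second phase can be sketched roughly as follows. This is an illustration under assumptions, not the method from the thesis: the buffer radii in `BUFFER_STEPS`, the planar coordinates in metres, and the function names are all hypothetical, and a real implementation would work on projected geographic data with a spatial index.

```python
from math import hypot

# Hypothetical buffer radii in metres, tried from smallest to largest.
# Points classified as low accuracy are allowed wider buffers.
BUFFER_STEPS = {"high": (5.0, 10.0), "low": (10.0, 20.0, 30.0)}


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment
        return hypot(px - ax, py - ay)
    # Projection of p onto the line, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_point(point, accuracy_class, road_segments):
    """Return the first buffer radius that reaches a road, or None if unmatched."""
    nearest = min(point_segment_distance(point, a, b) for a, b in road_segments)
    for radius in BUFFER_STEPS[accuracy_class]:
        if nearest <= radius:
            return radius
    return None


def unmatched_points(trace, road_segments):
    """Points of a trace that no buffer could match: missing-road candidates."""
    return [pt for pt, cls in trace if match_point(pt, cls, road_segments) is None]


roads = [((0.0, 0.0), (100.0, 0.0))]
trace = [((10.0, 3.0), "high"), ((50.0, 25.0), "low"),
         ((50.0, 25.0), "high"), ((80.0, 60.0), "low")]
print(unmatched_points(trace, roads))
```

The same position 25 m from the road is matched when the point is classified as low accuracy but left unmatched when it is classified as high accuracy, which shows how adapting the buffer to the accuracy class changes what counts as a candidate for a missing road.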
39

Uso de aprendizado supervisionado para análise de confiabilidade de dados de crowdsourcing sobre posicionamento de ônibus / Use of supervised learning to analyze reliability of crowdsourcing bus location data

Diego Vieira Neves 16 October 2018 (has links)
Researchers from several fields are studying the development of what we call Smart Cities: the integration of Information and Communication Systems with Internet of Things technologies to use a city's resources more intelligently. One of the main objectives of smart cities is to solve problems related to urban mobility, which significantly affects the population's quality of life. An observable problem in large cities is the quality of their public transport services, especially the bus mode. The lack of reliable information, combined with the poor quality of the public transport services on offer, leads users to avoid public transport, which aggravates urban social and environmental problems. To reverse this scenario, smart city initiatives propose the use of Intelligent Transport Systems, which can use various sensors and equipment to collect different types of data on public transport services. Capturing and processing these data allows citizens, in theory, to use public transport with reliability and predictability. However, these data can be insufficient or of poor quality for real-time use. This master's work investigates the use of crowdsourced data as a complement to this information. To mitigate the uncertainties introduced by crowdsourcing, this research proposes and compares different machine learning techniques for creating a reliability analysis model for the collected data, specialized for the public bus system of the city of São Paulo. The results show that the Decision Tree and Gaussian Naive Bayes algorithms were more effective and efficient at classifying the data obtained through crowdsourcing. The Decision Tree algorithm presented the best performance indicators in terms of accuracy (94.34%) and F-score (99%), and the second-best execution time (0.023074 seconds). The Gaussian Naive Bayes algorithm was the most efficient, with an average execution time of 0.003182 seconds, and had the fourth-best result in terms of accuracy (98.18%) and F-score (97%).
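The abstract compares classifiers on accuracy, F-score, and execution time. As a minimal sketch of how these three indicators can be computed, the following uses only the Python standard library. The toy samples, the labels, and the `threshold_rule` classifier are hypothetical stand-ins for illustration. They are not the thesis's data or its Decision Tree and Gaussian Naive Bayes models, and the F-score is assumed here to be the F1 variant, which the abstract does not specify.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def timed_predict(classify, samples):
    """Classify every sample and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    predictions = [classify(s) for s in samples]
    return predictions, time.perf_counter() - start


# Hypothetical toy data: each sample is the distance in metres between a
# crowdsourced bus position and a reference position.
# Label 1 = reliable report, 0 = unreliable report.
samples = [5, 12, 300, 8, 450, 20, 700, 15]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]


def threshold_rule(distance_m):
    """Hypothetical stand-in classifier: reports within 50 m count as reliable."""
    return 1 if distance_m < 50 else 0


y_pred, elapsed = timed_predict(threshold_rule, samples)
acc = accuracy(y_true, y_pred)
f1 = f_score(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f} time={elapsed:.6f}s")
```

Running each candidate classifier through the same three functions gives directly comparable figures. On real data, the execution time would normally be averaged over repeated runs.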
