41 |
Prediction of the future trend of e-commerce / Prognostisering av trender inom e-handel i Sverige
Engström, Freja, Nilsson Rojas, Disa January 2021
In recent years, more companies have invested in electronic commerce (e-commerce) as more customers use the internet as a tool for shopping. The basics of marketing still apply to online stores, however, so companies need to analyse both their customers and the online market in order to target customers successfully and launch strong marketing campaigns. In this report, we propose using machine learning, a tool that has received considerable attention for its ability to tackle a wide range of data problems, to predict future trends of e-commerce in Sweden. More precisely, we predict the future share of e-commerce users, both in general and for specific demographics. We build three different models: polynomial regression, support vector regression (SVR) and ARIMA. The forecasts show clear differences between customer demographics and between groups within a given demographic. Furthermore, the forecasts were more accurate when modelling a specific demographic than when modelling the entire population. Companies may therefore be able to use the models to predict the behaviour of smaller market segments and use those predictions in their marketing to attract these customers.
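As a rough illustration of the simplest of the three models, polynomial regression, the sketch below fits a least-squares polynomial to a yearly series and extrapolates one step ahead. The data values, the degree, and the function names `fit_polynomial` and `predict` are assumptions made for this example; they are not taken from the report.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs  # coeffs[k] multiplies x**k


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Hypothetical yearly shares of e-commerce users (fractions), indexed from year 0.
years = [0, 1, 2, 3, 4, 5]
shares = [0.50, 0.55, 0.61, 0.66, 0.70, 0.73]
coeffs = fit_polynomial(years, shares, degree=2)
forecast = predict(coeffs, 6)  # one step beyond the observed range
```

Low-degree polynomials extrapolate poorly far outside the observed range, which is one reason a forecast like this would normally be compared against other models.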
|
42 |
Assessing and predicting stream-flow at different time scales in the context of climate change: Case of the upper Senegal River basin
Diop, Lamine 30 October 2017
No description available.
|
43 |
Stock Price Prediction Using SVR with Stock Price, Macroeconomic and Microeconomic Data
Ece Korkmaz, Idil, Sandberg, Simon January 2021
A wide variety of machine learning algorithms have been used to predict stock prices. The aim of this project was to implement a machine learning algorithm based on support vector regression that predicts the stock price of two well-known companies, Apple and Microsoft, one day into the future using the current day's stock price, macroeconomic data and microeconomic data, and to compare the prediction error across the different data inputs. The results show that adding macroeconomic and microeconomic data did not reduce the prediction error. This suggests that the macroeconomic and microeconomic data used in this project do not contain additional information about future stock prices. The results also show that support vector regression performs worse than linear regression; however, no definite conclusion can be drawn here, since only one kernel and a handful of parameter values were considered when training and testing the algorithm. These results might also suggest that the current day's data alone is not sufficient to predict the non-linear relationships. / Bachelor's thesis project in electrical engineering 2021, KTH, Stockholm
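Support vector regression is trained with an epsilon-insensitive loss, which ignores errors smaller than a tolerance epsilon. The sketch below evaluates that loss, next to the mean squared error, for a one-day-ahead setup. The price series, the value of epsilon, and the persistence baseline (tomorrow's price equals today's) are assumptions made for this example and do not come from the project.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean epsilon-insensitive loss: errors inside the epsilon tube cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def mean_squared_error(y_true, y_pred):
    """Mean squared error, shown for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical closing prices on consecutive trading days.
prices = [100.0, 101.0, 100.5, 102.0, 101.0]

# One-day-ahead setup: today's price is the input, tomorrow's price is the target.
today = prices[:-1]
tomorrow = prices[1:]

# Persistence baseline (an assumption of this sketch): predict no change.
persistence = list(today)

eps_loss = epsilon_insensitive_loss(tomorrow, persistence, epsilon=0.5)
mse = mean_squared_error(tomorrow, persistence)
```

Any model, whether SVR or linear regression, can be scored the same way by replacing `persistence` with its predictions.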
|
44 |
Cost Modeling Based on Support Vector Regression for Complex Products During the Early Design Phases
Huang, Guorong 04 September 2007
The purpose of a cost model is to provide designers and decision-makers with accurate cost information so that they can assess and compare multiple alternatives, obtain the optimal solution and control cost. The cost models developed in the design phases are the most important and the most difficult to develop. It is therefore necessary to identify appropriate cost drivers and employ appropriate modeling techniques to estimate cost accurately and guide designers. The objective of this study is to provide more accurate cost estimates to guide designers in the early design phases of complex products.
After a generic cost estimation model is presented and the existing methods for identifying cost drivers and the different cost modeling techniques are reviewed, the dissertation first proposes new methodologies to identify and select cost drivers: the Causal-Associated (CA) method and the Tabu-Stepwise selection approach. The CA method improves the understanding and explanation of the cost analysis and helps avoid missing cost drivers. The Tabu-Stepwise selection approach is used to select significant cost drivers and eliminate irrelevant ones in nonlinear situations. A case study illustrates their procedure and benefits. The test data show that they can improve predictive capacity.
Second, this dissertation introduces Tabu-SVR, a nonparametric approach based on support vector regression (SVR) for cost estimation of complex products in the early design phases. Tabu-SVR determines the parameters of SVR via a tabu search algorithm improved by the author. To verify and validate the performance of Tabu-SVR, five common basic cost characteristics are summarized: accumulation, linear function, power function, step function, and exponential function. Based on these five characteristics and the Flight Optimization Systems (FLOPS) cost module (engine part), seven test data sets are generated to test Tabu-SVR and to compare it with other traditional methods (parametric modeling, neural networks and case-based reasoning). The results show that Tabu-SVR significantly improves performance compared to SVR based on empirical study. The radial basis function (RBF) kernel, which is much more robust, often performs better than the linear and polynomial kernel functions. Compared with other traditional cost estimating approaches, Tabu-SVR with the RBF kernel has strong predictive capability and is able to capture nonlinearities and discontinuities along with interactions among cost drivers.
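The idea of choosing SVR parameters with tabu search can be sketched as follows. This is a generic, textbook-style tabu search over a small grid of exponents, not the author's improved algorithm. The grid ranges, the neighbourhood, the tenure, and the stand-in objective `toy_validation_error` are all assumptions made for this example; in practice the objective would be a cross-validated SVR error.

```python
from collections import deque

# Hypothetical search grid: C = 10**c_exp and gamma = 10**g_exp for an RBF-kernel SVR.
C_EXPONENTS = range(-2, 7)
GAMMA_EXPONENTS = range(-6, 3)


def grid_neighbors(point):
    """Points one grid step away in either parameter, kept inside the grid."""
    c, g = point
    steps = [(c - 1, g), (c + 1, g), (c, g - 1), (c, g + 1)]
    return [(ci, gi) for ci, gi in steps
            if ci in C_EXPONENTS and gi in GAMMA_EXPONENTS]


def toy_validation_error(point):
    """Stand-in for a validation error; its minimum is at (3, -2)."""
    c, g = point
    return (c - 3) ** 2 + (g + 2) ** 2


def tabu_search(objective, start, neighbors, iterations=50, tenure=5):
    """Move to the best non-tabu neighbour each step and remember the best point seen."""
    current = start
    best, best_score = start, objective(start)
    tabu = deque([start], maxlen=tenure)  # recently visited points are forbidden
    for _ in range(iterations):
        candidates = [p for p in neighbors(current) if p not in tabu]
        if not candidates:
            break  # every neighbour is tabu, so stop early
        current = min(candidates, key=objective)  # accepted even if worse than before
        tabu.append(current)
        score = objective(current)
        if score < best_score:
            best, best_score = current, score
    return best, best_score


best, err = tabu_search(toy_validation_error, start=(-2, -6), neighbors=grid_neighbors)
```

Accepting the best admissible move even when it is worse, while forbidding recently visited points, is what lets tabu search leave a local minimum that a plain greedy descent would get stuck in.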
The third part of this dissertation focuses on semiparametric cost estimating approaches. Extensive studies are conducted on three semiparametric algorithms based on SVR. Three data sets are produced by combining the aforementioned five common basic cost characteristics. The experiments show that Semiparametric Algorithm 1 is the best approach in most situations: it achieves better cost estimating accuracy than both the pure nonparametric approach and the pure parametric approach. Model complexity influences the estimating accuracy of Semiparametric Algorithm 2 and Algorithm 3. If inexact functional forms are used as the parametric component of a semiparametric algorithm, they often bring no improvement in cost estimating accuracy over the pure nonparametric approach and can even worsen performance.
The last part of this dissertation introduces two existing methods for sensitivity analysis to improve the explanatory capability of the SVR-based cost estimating approach. These methods can show the contribution of cost drivers, determine their effect, establish their profiles, and support monotonic analysis. Ultimately, they can help designers conduct trade-off studies and answer “what-if” questions. / Ph. D.
|
45 |
Integrative Modeling and Analysis of High-throughput Biological Data
Chen, Li 21 January 2011
Computational biology is an interdisciplinary field that focuses on developing mathematical models and algorithms to interpret biological data and thereby understand biological problems. With the current development of high-throughput technology, different types of biological data can be measured on a large scale, which calls for more sophisticated computational methods to analyze and interpret the data. In this dissertation, we propose novel methods to integrate, model and analyze multiple types of biological data, including microarray gene expression data, protein-DNA interaction data and protein-protein interaction data. These methods will help improve our understanding of biological systems.
First, we propose a knowledge-guided multi-scale independent component analysis (ICA) method for biomarker identification on time course microarray data. Guided by a knowledge gene pool related to a specific disease under study, the method can determine disease relevant biological components from ICA modes and then identify biologically meaningful markers related to the specific disease. We have applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data. The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification.
Second, we propose a novel method for transcriptional regulatory network identification by integrating gene expression data and protein-DNA binding data. The approach is built upon a multi-level analysis strategy designed for suppressing false positive predictions. With this strategy, a regulatory module becomes increasingly significant as more relevant gene sets are formed at finer levels. At each level, a two-stage support vector regression (SVR) method is utilized to reduce false positive predictions by integrating binding motif information and gene expression data; a significance analysis procedure is followed to assess the significance of each regulatory module. The resulting performance on simulation data and yeast cell cycle data shows that the multi-level SVR approach outperforms other existing methods in the identification of both regulators and their target genes. We have further applied the proposed method to breast cancer cell line data to identify condition-specific regulatory modules associated with estrogen treatment. Experimental results show that our method can identify biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer.
Third, we propose a bootstrapping Markov Random Field (MRF)-based method for subnetwork identification on microarray data by incorporating protein-protein interaction data. Methodologically, an MRF-based network score is first derived by considering the dependency among genes to increase the chance of selecting hub genes. A modified simulated annealing search algorithm is then utilized to find the optimal or suboptimal subnetworks with maximal network score. A bootstrapping scheme is finally implemented to generate confident subnetworks. Experimentally, we have compared the proposed method with other existing methods, and the resulting performance on simulation data shows that the bootstrapping MRF-based method outperforms other methods in identifying the ground-truth subnetworks and hub genes. We have then applied our method to breast cancer data to identify significant subnetworks associated with drug resistance. The identified subnetworks not only show good reproducibility across different data sets, but also indicate several pathways and biological functions potentially associated with the development of breast cancer and drug resistance. In addition, we propose to develop network-constrained support vector machines (SVM) for cancer classification and prediction, taking the network structure into account when constructing classification hyperplanes. The simulation study demonstrates the effectiveness of our proposed method. The study on real microarray data sets shows that our network-constrained SVM, together with the bootstrapping MRF-based subnetwork identification approach, can achieve better classification performance than conventional biomarker selection approaches and SVMs.
We believe that the research presented in this dissertation provides novel and effective methods to model and analyze different types of biological data. The extensive experiments on several real microarray data sets and their results also show the potential to improve the understanding of biological mechanisms related to cancers by generating novel hypotheses for further study. / Ph. D.
|
46 |
L'interaction 3D adaptative : une approche basée sur les méthodes de traitement de données multi-capteurs / Adaptive 3D interaction: an approach based on multi-sensor data processing methods
Boudoin, Pierre 06 October 2010
Virtual reality is a field that touches on several disciplines. Through 3D interaction, a person can accomplish tasks in a virtual environment using 3D interaction techniques. These techniques are often single-task and rely on very specific virtual reality hardware. Switching from one 3D interaction task to another is most often left to the user, or else to the programmer. However, these so-called virtual reality systems suffer from many problems. Hardware problems, most often caused by the technologies used, can induce erratic system behaviour. Moreover, the 3D interaction techniques may be ill-suited to the virtual reality application, or too complex for a novice to use. All these problems harm the user's immersion in the virtual environment as well as the performance of the 3D interaction, and therefore the accomplishment of the task in the virtual reality application. The objective of this work is to propose an adaptive 3D interaction system. By adaptive 3D interaction, we mean a 3D interaction that is continuous both at the data level and when switching from one task to another. We therefore modelled and designed a set of components to achieve this objective. We modelled a 3D interaction technique that can be used continuously, even when switching tasks. We also designed a system that automates the switch from one 3D interaction task to another by estimating the task the user wishes to accomplish. Finally, a last component improves the accuracy and guarantees the continuity of the tracking.
|
47 |
Técnicas de transferência de aprendizagem aplicadas a modelos QSAR para regressão / Transfer learning techniques applied to QSAR models for regression
Simões, Rodolfo da Silva 10 April 2018
To develop a new drug, researchers must analyze the biological targets of a given disease and discover and develop drug candidates for those targets while, in parallel, running laboratory tests to validate the effectiveness and side effects of the chemical substance. The quantitative study of structure-activity relationships (QSAR) involves building regression models that relate a set of descriptors of a chemical compound to its biological activity with respect to one or more targets in the organism. The datasets that researchers handle in QSAR analysis are generally characterized by a small number of instances, which makes building predictive models more difficult. In this context, transferring knowledge from other QSAR models that have more data available for the same biological target would be desirable, since it reduces the effort and cost of generating new models of chemical compound descriptors. This work presents an inductive (parameter-based) transfer learning approach built on a variation of support vector regression adapted for transfer learning, in which transfer is achieved by bringing the models generated separately for each task closer together. An instance-based transfer learning method, TrAdaBoost, is also considered. Experimental results show that the transfer learning approaches perform well when applied to benchmark datasets and to chemical datasets.
|
48 |
Técnicas de transferência de aprendizagem aplicadas a modelos QSAR para regressão / Transfer learning techniques applied to QSAR models for regression
Rodolfo da Silva Simões 10 April 2018
To develop a new drug, researchers must analyze the biological targets of a given disease and discover and develop drug candidates for those targets while, in parallel, running laboratory tests to validate the effectiveness and side effects of the chemical substance. The quantitative study of structure-activity relationships (QSAR) involves building regression models that relate a set of descriptors of a chemical compound to its biological activity with respect to one or more targets in the organism. The datasets that researchers handle in QSAR analysis are generally characterized by a small number of instances, which makes building predictive models more difficult. In this context, transferring knowledge from other QSAR models that have more data available for the same biological target would be desirable, since it reduces the effort and cost of generating new models of chemical compound descriptors. This work presents an inductive (parameter-based) transfer learning approach built on a variation of support vector regression adapted for transfer learning, in which transfer is achieved by bringing the models generated separately for each task closer together. An instance-based transfer learning method, TrAdaBoost, is also considered. Experimental results show that the transfer learning approaches perform well when applied to benchmark datasets and to chemical datasets.
|
49 |
Development of an Innovative System for the Reconstruction of New Generation Satellite Images
LORENZI, Luca 29 November 2012
Remote sensing satellites have become indispensable to civil society. Satellite images have been successfully exploited in several applications, notably environmental monitoring and the prevention of natural disasters. In recent years, the increasing availability of very high spatial resolution (VHR) remote sensing images has led to new, potentially relevant applications related to land-use monitoring and environmental management. However, because optical sensors directly acquire reflected sunlight, they can suffer from the presence of clouds in the sky and/or shadows on the ground. This is the missing-data problem, which is important and crucial, particularly for VHR images, where the increase in geometric detail leads to a large loss of information. In this thesis, new methodologies for detecting and reconstructing regions with missing data in VHR images are proposed and applied to areas contaminated by clouds and/or shadows.
In particular, the proposed methodological contributions include: i) a multiresolution inpainting strategy for reconstructing cloud-contaminated images; ii) a new combination of radiometric information and spatial position information in two dedicated kernels to better reconstruct cloud-contaminated regions using support vector regression (SVR); iii) the exploitation of compressive sensing theory with three different strategies (orthogonal matching pursuit, basis pursuit, and a compressive sensing solution based on a genetic algorithm) for reconstructing cloud-contaminated images; iv) a complete processing chain that uses a support vector machine (SVM) to classify and detect shadow areas, followed by linear regression to reconstruct them; and v) several evaluation criteria for assessing the reconstruction performance on shadow areas. All these methods were developed specifically to work with very high resolution images. Experimental results on real data are presented to demonstrate and confirm the validity of all the proposed methods. They suggest that, despite the complexity of the problems, the missing areas hidden by clouds or corrupted by shadows can be recovered acceptably.
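The linear-regression step for shadow areas can be pictured with a one-band sketch: fit a straight line that maps shadowed intensities to sunlit intensities, then apply it to shadowed pixels. The sample values and the assumption that paired shadow and sunlit samples of the same surface type are available are hypothetical and made for this example; the thesis abstract does not specify how the regression is set up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired intensities of one surface type in a single band.
shadow_samples = [10.0, 20.0, 30.0, 40.0]
sunlit_samples = [35.0, 55.0, 75.0, 95.0]

slope, intercept = fit_line(shadow_samples, sunlit_samples)

# Apply the fitted mapping to shadowed pixels that need to be restored.
shadow_pixels = [15.0, 25.0]
restored = [slope * value + intercept for value in shadow_pixels]
```

A separate line would typically be fitted per surface class and per spectral band, since the relationship between shadowed and sunlit intensities differs between them.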
|
50 |
Predicting The Effect Of Hydrophobicity Surface On Binding Affinity Of Pcp-like Compounds Using Machine Learning Methods
Yoldas, Mine 01 April 2011
This study aims to predict the binding affinity of PCP-like compounds by means of molecular hydrophobicity, an important property that affects the binding affinity of molecules. The hydrophobicity values of the molecules are obtained on a three-dimensional coordinate system. Our aim is to reduce the number of points on the hydrophobicity surface of the molecules, which is done using self-organizing maps (SOM) and k-means clustering. The feature sets obtained from SOM and from k-means clustering are each used separately to predict the binding affinity of the molecules. Support vector regression and partial least squares regression are used for prediction.
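Reducing a set of surface points with k-means clustering can be sketched as below: the points are grouped into k clusters and each cluster is replaced by its centroid. The coordinates, the choice of k, and the deterministic initialization from the first k points are assumptions made for this example, not details of the study.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids


# Hypothetical (x, y, z) points sampled on a molecular surface.
surface_points = [
    (0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0),
    (5.0, 5.0, 5.0), (5.2, 5.0, 5.0), (5.0, 5.2, 5.0),
]

# Six points are reduced to two representative centroids.
reduced = kmeans(surface_points, k=2)
```

The centroids (optionally with an averaged hydrophobicity value per cluster) then form a fixed-length feature set that a regression model can take as input.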
|