  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

應用文字探勘分析網路團購商品群集之研究 -以美食類商品為例 / The study of analyzing group-buying goods clusters by using text mining – exemplified by the group-buying foods

趙婉婷 Unknown Date
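Online group buying has become a popular way to shop. As the online group-buying market gains acceptance, more and more consumers shop through group buying, and the range of group-buying products keeps growing. To make it easier for online group-buying consumers to find products they are interested in, this study performs cluster analysis on group-buying products. The study uses the well-known domestic group-buying website 「愛合購」 as its case and focuses on popular food products in the desserts and cakes category. Customers' group-buying blog posts about each product were found by product name and collected into a database. Among the top 1,000 products by popularity, 268 products had customer group-buying blog posts, 586 posts in total. Text mining techniques were used to extract product-feature information from these posts, and a kNN clusterer based on the k-nearest neighbors method was built to perform the cluster analysis. Clustering was run with different values of k and different clustering thresholds. Large clusters were further clustered in stages, and single-item clusters were merged by centroid, in order to obtain better clustering results. The results show that after four stages of cluster analysis with the kNN clusterer, the 268 group-buying products form 28 clusters, and intra-cluster similarity rises from 0.029834 before clustering to 0.177428. After the first stage, the products fall into three main clusters: "bread", "cakes", and "other textures". After all four stages, "bread" splits into two types of clusters, "bread products" and "products with bread-like characteristics", while "cakes" splits into separate cake clusters by flavor. Unlike keywords in ordinary articles, important product-feature terms do not recur frequently within a post, so feature-term filtering should avoid removing too many product-feature terms. Each cluster's characteristics can be described by taking the top 20% of terms by weight and selecting representative product-feature terms through manual filtering and product occurrence frequency. The clustering results can guide group-buying consumers when they choose products, and they can also help group-buying website operators plan more suitable marketing activities. The study also proposes several directions for future research. / Group buying is prevailing, and the range of merchandise has recently become very diverse. To help consumers find the commodities they are interested in, this research focuses on cluster analysis of group-buying products and clusters the products by their features. We collect blog posts about the products written by customers, use text mining to retrieve the product features, and then build a kNN clustering device to cluster them. The research tests different threshold values, re-clusters large groups in multiple stages, and merges small groups by centroid, with the aim of obtaining the best-quality clusters. The results show that 268 group-buying food items can be divided into 28 clusters, and the mean intra-cluster similarity also improves. The 28 clusters fall under three main clusters: Bread, Cake, and Other mouthfeel foods. Each cluster can be defined and named from the top twenty percent of its keywords. The results could help buyers find similar commodities that they like, and also help sellers plan better marketing activities.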
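The abstract reports an intra-cluster similarity score and merges single-item clusters by centroid, but it defines neither. The following minimal Python sketch rests on several assumptions. Each product is an equal-length term-weight vector. Similarity is cosine similarity. Intra-similarity is the mean pairwise cosine within a cluster. `merge_singletons` and its `threshold` parameter are hypothetical names and do not reproduce the thesis's kNN clusterer.

```python
import math

def cosine(u, v):
    """Cosine similarity of two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def intra_similarity(cluster):
    """Mean pairwise cosine similarity within one cluster.
    A singleton scores 1.0 by the convention chosen here."""
    n = len(cluster)
    if n < 2:
        return 1.0
    sims = [cosine(cluster[i], cluster[j]) for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims)

def merge_singletons(clusters, threshold=0.0):
    """Attach each single-item cluster to the multi-item cluster whose centroid
    is most similar (centroids are computed before any merging), provided the
    similarity exceeds `threshold`; otherwise the singleton stays on its own."""
    big = [list(c) for c in clusters if len(c) > 1]  # copies, inputs untouched
    cents = [centroid(c) for c in big]
    leftovers = []
    for c in clusters:
        if len(c) != 1:
            continue
        item = c[0]
        sims = [cosine(item, m) for m in cents]
        if sims and max(sims) > threshold:
            big[sims.index(max(sims))].append(item)
        else:
            leftovers.append([item])
    return big + leftovers
```

Raising `threshold` keeps dissimilar singletons separate instead of forcing every product into a larger group.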
22

Modelling Bitcell Behaviour

Sebastian, Maria Treesa January 2020
As technology advances, transistor dimensions keep shrinking. This shrinks memory bitcells and makes them more sensitive to process variations introduced during manufacturing. The failure of a single bitcell can cause the failure of an entire memory. Careful statistical analysis is therefore essential for estimating the highest reliable performance of a bitcell before it is used in memory design. Because a memory repeats the bitcell so many times, traditional Monte Carlo simulation would need a long time to estimate rare failure events accurately. A more practical approach is importance sampling, in which more samples are drawn from the failure region. Importance sampling is much faster than Monte Carlo simulation, but it is still fairly time-consuming because it requires an iterative search, which makes it impractical for large simulation sets. This thesis proposes two machine learning models for estimating the performance of a bitcell. The first model predicts the time the bitcell takes for a read or write operation. The second model predicts the minimum voltage required to maintain bitcell stability. The models were trained using the k-nearest neighbors algorithm and Gaussian process regression. Three sparse approximations were implemented in the time prediction model because a larger dataset was available for it. The results show that the models trained with Gaussian process regression are promising.
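A toy problem shows why importance sampling helps with rare failures. This Python sketch stands a single standard-normal variable in for a bitcell performance metric, and "failure" means that variable exceeds a threshold. The proposal is shifted so that its mean equals the threshold. That is a common textbook choice, not necessarily the one used in the thesis.

```python
import math
import random

def mc_tail(threshold, n, rng):
    """Plain Monte Carlo estimate of P(X > threshold) for X ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n

def is_tail(threshold, n, rng, shift=None):
    """Importance-sampling estimate of P(X > threshold) for X ~ N(0, 1).

    Samples come from N(shift, 1), centred on the failure region. Each failing
    sample is weighted by the likelihood ratio phi(y) / phi(y - shift)."""
    mu = threshold if shift is None else shift
    total = 0.0
    for _ in range(n):
        y = rng.gauss(mu, 1.0)
        if y > threshold:
            # phi(y) / phi(y - mu) simplifies to exp(-mu * y + mu**2 / 2)
            total += math.exp(-mu * y + 0.5 * mu * mu)
    return total / n

def exact_tail(threshold):
    """Closed-form standard normal tail probability, for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

With 20 000 samples and a threshold of 4, the true failure probability is about 3.2 × 10⁻⁵. Plain Monte Carlo then usually sees zero or one failure, while the weighted estimate is typically within a few percent of the exact value.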
23

Swedish Stock and Index Price Prediction Using Machine Learning

Wik, Henrik January 2023
Machine learning is a steadily growing area of computer science, with applications in fields such as finance, biology, and computer vision. Common applications include stock price prediction, analysis of DNA expression data, and optical character recognition. This thesis uses machine learning techniques to predict prices for different stocks and indices on the Swedish stock market, and then compares the techniques to see which performs best and why. To do this, we applied some of the most popular models to sets of historical stock and index data. Our best-performing models are linear regression and neural networks, because they handle best the large price spikes that occur in certain cases. However, all models are affected by overfitting, which indicates that feature selection and hyperparameter optimization could be improved.
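The abstract does not say which features its linear regression used. As one hedged illustration, this NumPy sketch fits ordinary least squares to the previous `n_lags` closing prices plus an intercept and makes a one-step-ahead forecast. The lagged-price design is an assumption made here, and the function names are hypothetical.

```python
import numpy as np

def lagged_design(prices, n_lags):
    """Build X with rows [p[t-1], ..., p[t-n_lags], 1] and targets y = p[t]."""
    p = np.asarray(prices, dtype=float)
    cols = [p[n_lags - k - 1 : len(p) - k - 1] for k in range(n_lags)]
    X = np.column_stack(cols + [np.ones(len(p) - n_lags)])
    y = p[n_lags:]
    return X, y

def fit_lagged_ols(prices, n_lags):
    """Least-squares coefficients [w_1, ..., w_n_lags, intercept]."""
    X, y = lagged_design(prices, n_lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next(prices, coef, n_lags):
    """One-step-ahead forecast from the most recent n_lags prices."""
    p = np.asarray(prices, dtype=float)
    x = np.append(p[::-1][:n_lags], 1.0)  # [p[-1], ..., p[-n_lags], 1]
    return float(x @ coef)
```

To expose the overfitting the abstract mentions, score a fit like this on a later, held-out time period rather than on shuffled rows. Shuffling leaks future prices into the training set.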
24

Assessing Machine Learning Algorithms to Develop Station-based Forecasting Models for Public Transport : Case Study of Bus Network in Stockholm

Movaghar, Mahsa January 2022
Public transport is essential for both residents and city planners because of its environmental and economic benefits. During the past decade, climate change, together with fuel and energy crises, has drawn significant attention to public transportation. Growing demand for public transport, combined with its complexity, has made optimal network design quite challenging for city planners. Ridership is affected by numerous variables and features, such as space and time. These fluctuations, together with inherent uncertainties caused by different travel behaviors, make this task challenging. Any mismatch between demand and supply can cause considerable user dissatisfaction and wasted energy. In recent years, new technologies for recording and storing data and advances in data analysis techniques have significantly improved our ability to find patterns in historical data and to predict ridership from it. This study aims to develop forecasting models by regressing boardings on population, time of day, month, and station. Using the available boarding dataset for blue bus line number 4 in Stockholm, Sweden, seven machine learning algorithms were assessed for prediction: Multiple Linear Regression, Decision Tree, Random Forest, Bayesian Ridge Regression, Neural Networks, Support Vector Machines, and K-Nearest Neighbors. The models were trained and tested on data from 2012 to 2019, before the start of the pandemic. KNN, with an average R-squared of 0.65 in 10-fold cross-validation, was selected as the best model. This model was then used to predict the reduced ridership during the pandemic in 2020 and 2021. The results showed a reduction of 48.93% in 2020 and 82.24% in 2021 for the studied bus line.
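The winning model here is k-nearest neighbors regression, scored with 10-fold cross-validation. The following minimal standard-library Python sketch shows that evaluation loop. It assumes numeric feature vectors. Categorical inputs such as station or month would need encoding, and features on different scales would need normalizing. The function names are illustrative.

```python
import math
import random

def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points nearest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumes y_true is not constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def cross_val_r2(X, y, k, n_folds=10, seed=0):
    """Mean R-squared of kNN regression over n_folds shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        preds = [knn_predict(train_X, train_y, X[j], k) for j in test]
        scores.append(r_squared([y[j] for j in test], preds))
    return sum(scores) / len(scores)
```

In practice, k is chosen by repeating `cross_val_r2` over several candidate values and keeping the one with the highest mean score.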
25

Comparison of Recommendation Systems for Auto-scaling in the Cloud Environment

Boyapati, Sai Nikhil January 2023
Background: The rapid growth of cloud computing has highlighted the need for efficient resource allocation. Cloud platforms offer scalability and cost-effectiveness for a variety of applications, but managing resources to match dynamic workloads remains a challenge. Auto-scaling, the dynamic allocation of resources in response to real-time demand and performance metrics, has emerged as a solution. Traditional rule-based methods struggle with the increasing complexity of cloud applications. Machine Learning models offer promising accuracy by learning from performance metrics and adapting resource allocations accordingly. Objectives: This thesis addresses auto-scaling recommendations in cloud environments, with an emphasis on integrating Machine Learning models with significant application metrics. Its primary objectives are to determine the critical metrics for accurate recommendations and to evaluate the best recommendation techniques for auto-scaling. Methods: Through thorough experimentation and analysis, the study first identifies the crucial metrics, such as CPU usage and memory consumption, that have a substantial impact on auto-scaling decisions. Machine Learning (ML) techniques are selected based on a literature review and then evaluated further through experimentation and analysis. These findings establish a foundation for the subsequent evaluation of ML techniques for auto-scaling recommendations. Results: This research investigates the performance of Random Forests (RF), K-Nearest Neighbors (KNN), and Support Vector Machines (SVM). The results show that RF achieves higher accuracy, precision, and recall, which is consistent with the significance of the metrics identified earlier. Conclusions: This thesis improves the understanding of auto-scaling recommendations by combining the findings on metric importance with those on recommendation technique performance.
The findings show the complex interactions between metrics and recommendation methods, paving the way for adaptive auto-scaling systems that improve resource efficiency and application functionality.
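Accuracy, precision, and recall, the metrics compared above, all reduce to counts over true and predicted labels. This Python sketch treats one recommendation label as the positive class. The label strings (`"scale_up"`, `"hold"`) are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy over all classes, plus precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct share of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives found
    return accuracy, precision, recall

y_true = ["scale_up", "scale_up", "hold", "hold", "scale_up"]
y_pred = ["scale_up", "hold", "hold", "scale_up", "scale_up"]
print(classification_metrics(y_true, y_pred, positive="scale_up"))
```

Here, precision measures how many recommended scale-ups were justified, and recall measures how many needed scale-ups the model caught.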
26

Klassificering av refuger baserat på spatiala vektorpolygoner i vägnät : En fallstudie om utmaningar och lösningar till att klassificera företeelser till det norska vägnätet / Classifying traffic islands based on spatial vector polygons in a road network : A case study on challenges and solutions when classifying features to the Norwegian road network

Andersson, Jens, Berg, Marcus January 2022
Geographic information systems are becoming increasingly important in today's society, where spatial data can be stored, retrieved, analysed, and visualized. By compiling spatial data, a picture of reality can be abstracted. Detailed information about road networks and features (traffic islands, noise barriers, signs, etc.) for analysis leads to more efficient operation and maintenance work, which in turn improves accessibility for road users. The technology company Triona has a map application in which problems have arisen with the algorithmic linking of surveyed traffic islands (referred to as the Norway dataset) to the Norwegian road network. A traffic island is a raised section of the street that separates lanes and looks similar to a sidewalk. This case study addressed a sub-problem: classifying the traffic islands could make the linking easier and improve the basis for analysis. The purpose of the study can be summarized as proposing methods for classifying the traffic islands with supervised machine learning. The possibility of classifying the traffic islands automatically was studied with the K-nearest neighbors (KNN) and Decision tree algorithms. Each traffic island was a vector polygon, that is, a list of coordinates. The polygon's corners were latitude and longitude coordinate pairs. The Norway dataset had not been categorized in advance into its eleven types and therefore could not be used. A dataset of 2157 traffic islands of seven types from Portland, USA was used instead. The spatial vector polygons were transformed with Elliptical Fourier Descriptors (EFD). The machine learning models were trained to classify the traffic islands from mathematical approximations of their contours produced by EFD. Conclusions were drawn by analysing the contours of the traffic island types and observing the models' performance. Performance was evaluated on the Portland dataset in terms of accuracy, complemented by precision and recall. Accuracy is the proportion of traffic islands classified correctly.
KNN achieved 64% and Decision tree 69% accuracy. Because both datasets consisted of real traffic islands in road networks, it could be assumed that accuracy would not be much higher if the study's method were applied to the Norway dataset. The models' performance was therefore not judged good enough for a recommendation. / Geographical information systems are becoming increasingly important in today's society, where spatial data can be stored, collected, analysed, and visualized. By compiling spatial data, reality can be abstracted. Detailed information on road networks and objects (traffic islands, noise barriers, signs, etc.) for analysis leads to more efficient operation and maintenance work, which in turn improves accessibility for road users. The technology company Triona has a map application in which the algorithmic connection of traffic islands (the Norway dataset) to the Norwegian road network has been challenging. A traffic island is a raised section of the street that delimits lanes and resembles a sidewalk. This case study addressed a sub-problem in which classifying traffic islands could facilitate the connection and improve the prerequisites for analysis. The aim was to present methods that could classify the traffic islands with supervised machine learning. The possibility of classifying the traffic islands automatically was studied with the K-nearest neighbors (KNN) and Decision tree algorithms. Each traffic island was a vector polygon, a list storing its corners (latitude and longitude). The Norway dataset had not previously been labelled into its eleven types, so a dataset of 2157 traffic islands of seven types from Portland, USA was used instead. The traffic islands were transformed with Elliptical Fourier Descriptors, which extracted approximations of their contours on which the machine learning models were trained. Conclusions were drawn by analysing the contours and observing performance.
Performance was evaluated on the Portland dataset in terms of accuracy, together with precision and recall. Accuracy is the proportion of correct classifications. KNN achieved 64% and Decision Tree 69% accuracy. As both datasets contained real traffic islands in road networks, it could be assumed that accuracy would not be much higher if the method were applied to the Norway dataset. The result was not considered sufficient for a recommendation.
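Elliptical Fourier Descriptors turn a closed contour into a fixed-length vector that KNN or a decision tree can use as input. This Python sketch computes the Kuhl-Giardina coefficients of a closed polygon and rests on two assumptions. First, the vertices have already been projected to planar coordinates, because raw latitude and longitude distort distances. Second, the size, rotation, and starting-point normalization often applied before classification is omitted. The abstract does not say whether the thesis normalized the coefficients.

```python
import math

def elliptic_fourier_coeffs(points, order):
    """Kuhl-Giardina elliptic Fourier coefficients of a closed polygon.

    points: list of (x, y) vertices in planar coordinates, first vertex not repeated.
    Returns a list of (a_n, b_n, c_n, d_n) tuples for n = 1..order.
    """
    m = len(points)
    dx = [points[(i + 1) % m][0] - points[i][0] for i in range(m)]
    dy = [points[(i + 1) % m][1] - points[i][1] for i in range(m)]
    dt = [math.hypot(a, b) for a, b in zip(dx, dy)]  # segment lengths
    t = [0.0]  # cumulative contour length at each vertex
    for seg in dt:
        t.append(t[-1] + seg)
    perimeter = t[-1]

    coeffs = []
    for n in range(1, order + 1):
        scale = perimeter / (2 * n * n * math.pi ** 2)
        w = 2 * n * math.pi / perimeter
        a = b = c = d = 0.0
        for p in range(m):
            if dt[p] == 0.0:
                continue  # skip repeated vertices
            dcos = math.cos(w * t[p + 1]) - math.cos(w * t[p])
            dsin = math.sin(w * t[p + 1]) - math.sin(w * t[p])
            a += dx[p] / dt[p] * dcos
            b += dx[p] / dt[p] * dsin
            c += dy[p] / dt[p] * dcos
            d += dy[p] / dt[p] * dsin
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs

def efd_features(points, order):
    """Flatten the coefficients into one feature vector of length 4 * order."""
    return [v for row in elliptic_fourier_coeffs(points, order) for v in row]
```

The coefficients depend only on vertex differences, so translating a polygon leaves them unchanged. Scaling a polygon scales them by the same factor, which is one reason a size normalization step is common.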
27

Real-Time Estimation of Traffic Stream Density using Connected Vehicle Data

Aljamal, Mohammad Abdulraheem 02 October 2020
The macroscopic measure of traffic stream density is crucial in advanced traffic management systems. However, traffic stream density is difficult to measure in the field because it is a spatial measurement. In this dissertation, several approaches are developed to estimate the traffic stream density on signalized approaches using connected vehicle (CV) data. First, the dissertation introduces a novel variable estimation interval that allows higher estimation precision, because each updating interval always contains a fixed number of CVs. The dissertation then develops model-driven approaches, such as a linear Kalman filter (KF), a linear adaptive KF (AKF), and a nonlinear particle filter (PF), to estimate the traffic stream density using CV data only. The proposed model-driven approaches are evaluated using empirical and simulated data; the empirical data were collected along a signalized approach in downtown Blacksburg, VA. Results indicate that the density estimates produced by the linear KF approach are the most accurate. A sensitivity analysis of the estimation approaches is presented for several factors: the level of market penetration (LMP) of CVs, the initial conditions, the number of particles in the PF approach, traffic demand levels, traffic signal control methods, and vehicle length. Results show that the accuracy of the density estimate increases as the LMP increases. The KF is the least sensitive to the initial traffic density estimate, while the PF is the most sensitive. The results also show that the proposed approaches work better at higher demand levels, because more CVs are present for the same LMP scenario. 
For traffic signal control methods, estimation accuracy is higher with fixed signal timings at low traffic demand levels, and higher with the adaptive phase split optimizer activated at high traffic demand levels. The dissertation also investigates the sensitivity of the KF approach to vehicle length and shows that longer vehicles (e.g., trucks) on the traffic link reduce estimation accuracy. Data-driven approaches are also developed to estimate the traffic stream density: an artificial neural network (ANN), a k-nearest neighbor (k-NN) model, and a random forest (RF). These approaches likewise use only CV data. Results show that the ANN approach outperforms the k-NN and RF approaches. Lastly, the dissertation compares the model-driven and the data-driven approaches and finds that the ANN approach produces the most accurate estimates. However, the ANN needs considerable computational time to train and a large amount of data, and its performance is uncertain when new traffic behaviors (e.g., incidents) are observed. The linear KF approach is therefore highly recommended for traffic density estimation because of its simplicity and applicability in the field. / Doctor of Philosophy / Estimating the number of vehicles (vehicle counts) on a road segment is crucial in advanced traffic management systems. However, counting the vehicles on a road segment in the field is difficult because it requires installing multiple detection sensors along that segment. In this dissertation, several approaches are developed to estimate the number of vehicles on signalized roadways using connected vehicle (CV) data. A CV is defined as a vehicle that can share its instantaneous location at every time step t. 
The dissertation develops model-driven approaches, such as a linear Kalman filter (KF), a linear adaptive KF (AKF), and a nonlinear particle filter (PF), to estimate the number of vehicles using CV data only. The proposed model-driven approaches are evaluated using real and simulated data; the real data were collected along a signalized roadway in downtown Blacksburg, VA. Results indicate that the vehicle counts produced by the linear KF approach are the most accurate. The results also show that the KF approach is the least sensitive to the initial conditions. Machine learning approaches are also developed to estimate the number of vehicles: an artificial neural network (ANN), a k-nearest neighbor (k-NN) model, and a random forest (RF). These approaches also use CV data only. Results show that the ANN approach outperforms the k-NN and RF approaches. Finally, the dissertation compares the model-driven and the machine learning approaches and finds that the ANN approach produces the most accurate estimates. However, the ANN needs considerable computational time to train and a huge amount of data, and its performance is uncertain when new traffic behaviors (e.g., incidents) are observed. The KF approach is therefore highly recommended for vehicle count estimation because of its simplicity and applicability in the field.
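A Kalman filter of the kind recommended above alternates a prediction step with a measurement update. The abstract does not give the dissertation's state-space model, so this Python sketch assumes the simplest scalar case. Density follows a random walk with process variance `q`, and each interval yields a noisy density measurement with variance `r`. The helper `naive_density` is a hypothetical way to form that measurement: it scales the CV count by an assumed market penetration rate.

```python
def naive_density(cv_count, penetration, link_length_km):
    """Hypothetical raw measurement: observed CVs scaled up by the assumed
    market penetration rate, divided by link length (vehicles per km)."""
    return cv_count / penetration / link_length_km

def kalman_filter(measurements, x0, p0, q, r):
    """Scalar Kalman filter with a random-walk state model.

    State:       x_k = x_{k-1} + w_k,  w_k ~ N(0, q)
    Measurement: z_k = x_k + v_k,      v_k ~ N(0, r)
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: uncertainty grows, mean unchanged
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update: move toward the measurement
        p = (1.0 - k) * p      # updated uncertainty
        estimates.append(x)
    return estimates
```

A small `q` relative to `r` makes the filter smooth heavily. A larger `q` lets it track sudden changes in density faster, at the cost of passing through more measurement noise.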
28

Aceleração de uma variação do problema k-nearest neighbors / Acceleration of a variation of the K-nearest neighbors problem

Morais Neto, Jorge Peixoto de 29 January 2014
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Let $M$ be a metric space and let $P$ be a subset of $M$. The well-known k-nearest neighbors problem (KNN) consists of finding, given $q \in M$, the $k$ elements of $P$ that are closest to $q$ according to the metric of $M$. We discuss a variation of KNN for a particular class of pseudo-metric spaces, described as follows. Let $m \in \mathbb{N}$ be a natural number and let $d$ be the Euclidean distance in $\mathbb{R}^m$. Given $p \in \mathbb{R}^m$, $p := (p_1, \ldots, p_m)$, let $C(p)$ be the set of the $m$ rotations of $p$'s coordinates: $C(p) := \{(p_1, \ldots, p_m), (p_2, \ldots, p_m, p_1), \ldots, (p_m, p_1, \ldots, p_{m-1})\}$. We define the special distance $d_e$ as $d_e(p, q) := \min_{p' \in C(p)} d(p', q)$. Then $d_e$ is a pseudo-metric, and $(\mathbb{R}^m, d_e)$ is a pseudo-metric space. The class of pseudo-metric spaces under discussion is $\{(\mathbb{R}^m, d_e) \mid m \in \mathbb{N}\}$. The brute-force approach is too costly for instances of practical size.
We present a more efficient solution that employs parallelism, the FFT (fast Fourier transform), and the fast elimination of unfavorable training vectors. We describe a program, named CyclicKNN, which implements this solution. We report the speedup of this program over serial brute-force search on reference datasets. / Let $M$ be a metric space and $P$ a subset of $M$. The well-known k nearest neighbors problem (k-nearest neighbors, KNN) consists of finding, given $q \in M$, the $k$ elements of $P$ closest to $q$ according to the metric of $M$. We address a variation of the KNN problem for a particular class of pseudo-metric spaces, described below. Let $m \in \mathbb{N}$ be a natural number and let $d$ be the Euclidean distance in $\mathbb{R}^m$. Given a vector $p \in \mathbb{R}^m$, $p := (p_1, \ldots, p_m)$, let $C(p)$ be the set of the $m$ rotations of the coordinates of $p$: $C(p) := \{(p_1, \ldots, p_m), (p_2, \ldots, p_m, p_1), \ldots, (p_m, p_1, \ldots, p_{m-1})\}$. We define the special distance $d_e$ as $d_e(p, q) := \min_{p' \in C(p)} d(p', q)$. Then $d_e$ is a pseudo-metric, and $(\mathbb{R}^m, d_e)$ is a pseudo-metric space. The class of pseudo-metric spaces addressed is $\{(\mathbb{R}^m, d_e) \mid m \in \mathbb{N}\}$. The brute-force solution is too expensive for instances of practical size. We present a more efficient solution that employs parallelism, the FFT (fast Fourier transform), and the fast elimination of unfavorable training vectors. We developed a program, called CyclicKNN, that implements this solution. We report the speedup of this program compared with sequential brute force on reference datasets.
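The abstract credits the FFT for part of the speedup. One standard way the FFT applies to $d_e$ is through the following identity. For the rotation of $p$ by $s$ positions, $\lVert p_s - q\rVert^2 = \lVert p\rVert^2 + \lVert q\rVert^2 - 2\,c[s]$, where $c$ is the circular cross-correlation of $p$ and $q$. All $m$ values of $c$ come from one FFT-based product in $O(m \log m)$ time instead of $O(m^2)$. This NumPy sketch illustrates that identity. It is not the CyclicKNN code, and it omits the parallelism and the elimination of unfavorable training vectors.

```python
import numpy as np

def cyclic_distance_bruteforce(p, q):
    """d_e(p, q): Euclidean distance from q to the closest rotation of p, O(m^2)."""
    return min(np.linalg.norm(np.roll(p, -s) - q) for s in range(len(p)))

def cyclic_distance_fft(p, q):
    """Same quantity in O(m log m).

    corr[s] = sum_i p[(i + s) % m] * q[i] is the circular cross-correlation,
    obtained as ifft(fft(p) * conj(fft(q))) for real vectors.
    """
    corr = np.fft.ifft(np.fft.fft(p) * np.conj(np.fft.fft(q))).real
    sq = np.dot(p, p) + np.dot(q, q) - 2.0 * corr.max()
    return float(np.sqrt(max(sq, 0.0)))  # clamp tiny negative round-off

def cyclic_knn(train, q, k):
    """Indices of the k training vectors closest to q under d_e (no pruning)."""
    dists = np.array([cyclic_distance_fft(p, q) for p in train])
    return np.argsort(dists)[:k]
```

In a real search, the FFT of each training vector would be computed once and reused for every query, leaving one FFT per query and one inverse FFT per training vector.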
29

以文件分類技術預測股價趨勢 / Predicting Trends of Stock Prices with Text Classification Techniques

陳俊達, Chen, Jiun-da Unknown Date
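Rises and falls in stock prices result from the investment decisions of the many different investors in the securities market. The factors that move stock prices are numerous and complex, and news is one of them. News events are the main medium through which investors learn about the operations of a listed company. They are also one of the main factors that lead investors to adopt or change their stock investment strategies. This study proposes news documents as the foundation of a stock price up/down prediction system. Using text mining and classification techniques, it builds a system that predicts the direction of an individual stock's closing price on the same day. The study proposes three classification models: a naïve Bayes model, a k-nearest neighbors model, and a hybrid model. It designs three sets of experiments to examine the system's predictive performance: a comparison of classifier performance, a comparison of news sample depth, and a comparison of news sample breadth. The experimental results show that the proposed models effectively address a problem seen in related studies, where overall accuracy is high but predictive performance varies greatly across categories. The categories "up" and "down" are the ones that decide whether investors profit. On average predictive performance for these two categories, all three proposed models also perform well, and they can serve as an effective reference for investors' decisions. / Stocks' closing price levels can provide hints about investors' aggregate demand and aggregate supply in the stock market. If a stock's closing price is higher than its previous closing price, aggregate demand was stronger than aggregate supply on that trading day; otherwise, aggregate demand was weaker than aggregate supply. Predicting an individual stock's closing price level would therefore be profitable. For example, suppose a stock's current price is lower than its previous closing price. If we can predict the stock's closing price level correctly in advance, we can adopt the proper strategy (buy or sell) to gain a profit. In this thesis, we propose and evaluate three models for predicting individual stocks' closing prices in the Taiwan stock market: a naïve Bayes model, a k-nearest neighbors model, and a hybrid model. Experimental results show that the proposed methods perform better than the NewsCATS system for the "UP" and "DOWN" categories.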
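Multinomial naïve Bayes, one of the three models named above, fits in a few lines of standard-library Python. This sketch assumes the news articles have already been split into word tokens (Chinese text needs a word segmenter first) and uses add-one (Laplace) smoothing. The labels `"UP"`/`"DOWN"` and the toy headlines below are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns (label counts, word counts, vocabulary)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior (Laplace-smoothed)."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label, count in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

news = [("profit rises growth".split(), "UP"), ("record profit".split(), "UP"),
        ("loss falls".split(), "DOWN"), ("lawsuit loss".split(), "DOWN")]
model = train_nb(news)
print(predict_nb(model, ["profit", "growth"]))
```

Working in log space avoids floating-point underflow when an article contains many tokens.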
30

Data-Driven Predictions of Heating Energy Savings in Residential Buildings

Lindblom, Ellen, Almquist, Isabelle January 2019
The increasing use of intermittent electricity sources, such as wind and solar, brings a growing demand for user flexibility. This has opened a new market for services that give electricity customers energy-saving solutions. These range from sophisticated control of the customers' home equipment to information on how to adjust their consumption behavior to save energy. This master's thesis contributes to this field by investigating an additional incentive: predictions of future energy savings related to indoor temperature. Five different machine learning models were tuned and used to predict monthly heating energy consumption for a given set of homes. Model tuning and performance evaluation used 10-fold cross-validation. The best-performing model was then used to predict how much heating energy each household could save by lowering its indoor temperature by 1°C during the heating season. The highest prediction accuracy (about 78%) was achieved with support vector regression (SVR), closely followed by neural networks (NN). The simpler regression models that were implemented are, however, not far behind. According to the SVR model, the average household is expected to lower its heating energy consumption by approximately 3% if the indoor temperature is decreased by 1°C.
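The savings estimate above comes from querying a fitted model twice: once as-is and once with the indoor temperature lowered by 1°C. This Python sketch accepts any prediction function. The feature key `"indoor_temp"` and the linear stand-in model in the usage are hypothetical and only take the place of the fitted SVR.

```python
def seasonal_saving(predict, monthly_features, temp_key="indoor_temp", delta=1.0):
    """Fractional drop in predicted heating energy over a heating season when
    the indoor-temperature feature is lowered by `delta` degrees in every month.

    predict: any callable mapping a feature dict to predicted monthly energy.
    monthly_features: one feature dict per heating-season month.
    """
    baseline = sum(predict(f) for f in monthly_features)
    # Copy each month's features, changing only the indoor temperature.
    lowered = sum(predict({**f, temp_key: f[temp_key] - delta}) for f in monthly_features)
    return (baseline - lowered) / baseline

# Hypothetical linear stand-in for a fitted SVR: energy grows with the indoor/outdoor gap.
def toy_model(f):
    return 10.0 * (f["indoor_temp"] - f["outdoor_temp"])

season = [{"indoor_temp": 21.0, "outdoor_temp": 1.0},
          {"indoor_temp": 21.0, "outdoor_temp": 11.0}]
print(seasonal_saving(toy_model, season))
```

Summing over the season before taking the ratio weights cold months more heavily, which matches how a household's total heating bill responds.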
