51

Modeling Success Factors for Start-ups in Western Europe through a Statistical Learning Approach / Modellering av framgångsfaktorer för startups i Västeuropa genom statistisk inlärning

Kamal, Adib, Sabani, Kenan January 2021 (has links)
The purpose of this thesis was to use a quantitative method to expand on previous research in the field of start-up success prediction. This was accomplished by including more criteria in the study than earlier work, made possible by the Crunchbase database, the largest available information source on start-ups. Furthermore, the data used in this thesis was limited to Western European start-ups in order to study the effect of restricting the data to a specific geographical region on the prediction models, which to our knowledge has not been done before in this type of research. The quantitative method used was machine learning; specifically, the three predictors used in this thesis were Logistic Regression, Random Forest, and K-Nearest Neighbor (KNN). All three models proposed and evaluated have better prediction accuracy than guessing the outcome at random. When tested on data previously unseen by the model, Random Forest produced the best results, predicting a successful company as a success and a failed company as a failure with 79 percent accuracy. Logistic Regression and K-Nearest Neighbor (KNN) followed with accuracies of 65 percent and 59 percent, respectively.
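As an illustration of the kind of modelling pipeline described in this record, the following is a minimal sketch comparing the three classifiers on a Crunchbase-style feature table. The feature names, the binary success label, and the synthetic data are assumptions for the example, not the thesis's actual variables or results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
# Hypothetical Crunchbase-style features for Western European start-ups.
df = pd.DataFrame({
    "funding_rounds": rng.integers(1, 8, n),
    "total_funding_usd": rng.lognormal(14, 1.5, n),
    "founders": rng.integers(1, 5, n),
    "company_age_years": rng.uniform(1, 15, n),
})
# Hypothetical success label, loosely tied to funding (illustration only).
y = (np.log(df["total_funding_usd"]) + df["funding_rounds"]
     + rng.normal(0, 2, n) > 17).astype(int)

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy {accuracy_score(y_test, model.predict(X_test)):.2f}")
```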
52

Novel control of a high performance rotary wood planing machine

Chamberlain, Matthew January 2013 (has links)
Rotary planing and moulding operations have been employed within the woodworking industry for many years. Due to the rotational nature of the machining process, cuttermarks, in the form of waves, are created on the machined timber surface, and it is the nature of these cuttermarks that determines the surface quality of the machined timber. It has been established that cutting-tool inaccuracies and vibrations are a prime factor in the form of the cuttermarks on the timber surface. A principal aim of this thesis is to create a control architecture suitable for the adaptive operation of a wood planing machine in order to improve the surface quality of the machined timber. Improving the surface quality requires a thorough understanding of the principles of wood planing; these principles are stated within this thesis, and the ability to manipulate the rotary wood planing process in order to achieve a higher surface quality is shown. An existing test rig facility is utilised within this thesis; however, to facilitate higher cutting and feed speeds, as well as possible future extensions such as extended cutting regimes, the test rig has been modified and enlarged. The rig allows the centre of rotation of the cutterhead to be positioned dynamically during a cutting operation through the use of piezoelectric actuators with a displacement range of ±15 μm. A new controller for the system has been developed. Within this controller are a number of tuneable parameters, and it was found that these parameters depend on a large number of external factors, such as operating speeds and run-out of the cutting knives. A novel approach to the generation of these parameters has been developed and implemented within the overall system. Both cutterhead inaccuracies and vibrations can be overcome, to some degree, by vertical displacement of the cutterhead; however, a crucial piece of information, the particular displacement profile, is not known. Therefore a novel approach, consisting of a subtle change to the displacement profile followed by a pattern matching step, has been implemented on the test rig. Within the pattern matching approach the surface profiles are simplified to a basic form, which allows a much simplified pattern matching procedure whilst producing a result suitable for the subtle-change approach. In order to compress the data, a Principal Component Analysis was performed on the measured surface data. Patterns were found to be present in the resulting data matrix, and so investigations into defect classification techniques were carried out using both K-Nearest Neighbour techniques and Neural Networks. The application of these novel approaches has yielded higher system performance, at no additional cost to the mechanical components of the wood planing machine, in terms of both wood throughput and machined timber surface quality.
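To make the final classification step concrete, here is a minimal sketch of PCA-based compression of measured surface profiles followed by K-nearest-neighbour defect classification. The profile length, number of components, and the synthetic cuttermark data are assumptions for illustration, not values taken from the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
n_profiles, profile_len = 400, 200
x = np.linspace(0, 4 * np.pi, profile_len)

# Synthetic surface profiles: a regular cuttermark wave, with a "defective"
# class whose wave amplitude is perturbed (a stand-in for knife run-out).
labels = rng.integers(0, 2, n_profiles)                  # 0 = acceptable, 1 = defective
amp = 1.0 + labels[:, None] * (0.3 + 0.3 * rng.random((n_profiles, 1)))
profiles = amp * np.sin(x) + rng.normal(0, 0.1, (n_profiles, profile_len))

X_train, X_test, y_train, y_test = train_test_split(profiles, labels, random_state=0)

# Compress the surface data with PCA, then classify defects with KNN.
pca = PCA(n_components=5).fit(X_train)
knn = KNeighborsClassifier(n_neighbors=7).fit(pca.transform(X_train), y_train)
print("defect classification accuracy:", knn.score(pca.transform(X_test), y_test))
```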
53

Application of Machine Learning Techniques for Real-time Classification of Sensor Array Data

Li, Sichu 15 May 2009 (has links)
There is a significant need to identify approaches for classifying chemical sensor array data with high success rates, which would enhance sensor detection capabilities. The present study attempts to fill this need by investigating six machine learning methods to classify a dataset collected using a chemical sensor array: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Classification and Regression Trees (CART), Random Forest (RF), Naïve Bayes Classifier (NB), and Principal Component Regression (PCR). A total of 10 predictors, associated with the responses from 10 sensor channels, are used to train and test the classifiers. A training dataset of 4 classes containing 136 samples is used to build the classifiers, and a dataset of 4 classes with 56 samples is used for testing. The results generated with the six different methods are compared and discussed. RF, CART, and KNN are found to have success rates greater than 90% and to outperform the other methods.
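As a rough illustration of the comparison pipeline described in this abstract, the sketch below trains six classifiers of the kinds listed on a synthetic stand-in for the sensor-array data (10 predictors, 4 classes, 136 training and 56 test samples). The data, hyperparameters, and the use of a PCA-plus-logistic-regression pipeline as a stand-in for Principal Component Regression are assumptions for the example, not the study's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 10 sensor-channel predictors, 4 analyte classes.
X, y = make_classification(n_samples=192, n_features=10, n_informative=8,
                           n_redundant=0, n_classes=4, random_state=0)
# Split sized like the study: 136 samples to build, 56 to test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=136, test_size=56, stratify=y, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "CART": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "NB": GaussianNB(),
    # Stand-in for PCR: project onto principal components, then fit a
    # linear classifier on the component scores.
    "PCR-like": make_pipeline(PCA(n_components=5), LogisticRegression(max_iter=1000)),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    rate = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name:8s} success rate: {rate:.2%}")
```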
54

Incremental Support Vector Machine Approach for DoS and DDoS Attack Detection

Seunghee Lee (6636224) 14 May 2019 (has links)
Support Vector Machines (SVMs) have generally been effective in detecting instances of network intrusion. However, from a practical point of view, a standard SVM is not able to handle large-scale data efficiently due to the computational complexity of the algorithm and its extensive memory requirements. To cope with this limitation, this study presents an incremental SVM method combined with a k-nearest neighbors (KNN) based candidate support vector (CSV) selection strategy in order to speed up the training and test process. The proposed incremental SVM method constructs or updates the pattern classes by incrementally incorporating new signatures, without having to load and access the entire previous dataset, in order to cope with evolving DoS and DDoS attacks. Performance of the proposed method is evaluated with experiments and compared with the standard SVM method and a simple incremental SVM method in terms of precision, recall, F1-score, and training and test duration.
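A minimal sketch of the general idea, as it might look with scikit-learn: when a new batch of traffic arrives, keep the current support vectors, add the k nearest previously seen points of the new samples as candidate support vectors, and retrain on that reduced set only. The batch sizes, k, and the synthetic features are assumptions for the example, not the thesis's actual procedure or data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
batches = np.array_split(np.arange(len(X)), 5)   # data arrives in 5 batches

svm = SVC(kernel="rbf")
X_keep, y_keep = X[batches[0]], y[batches[0]]
svm.fit(X_keep, y_keep)

for idx in batches[1:]:
    X_new, y_new = X[idx], y[idx]
    # Candidate support vectors: the current SVs plus the k nearest old
    # points to each new point (the ones most likely to affect the boundary).
    nn = NearestNeighbors(n_neighbors=5).fit(X_keep)
    _, neigh = nn.kneighbors(X_new)
    cand = np.unique(np.concatenate([svm.support_, neigh.ravel()]))
    X_keep = np.vstack([X_keep[cand], X_new])
    y_keep = np.concatenate([y_keep[cand], y_new])
    svm.fit(X_keep, y_keep)          # retrain on the reduced set only
    print("training set size after update:", len(y_keep))
```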
55

Evaluating and Reducing the Effects of Misclassification in a Sequential Multiple Assignment Randomized Trial (SMART)

He, Jun 01 January 2018 (has links)
SMART designs tailor individual treatment by re-randomizing patients to subsequent therapies based on their response to initial treatment. However, the classification of patients as responders/non-responders may be inaccurate and thus lead to inappropriate treatment assignment. In a two-step SMART design, assuming equal randomization and equal variances for misclassified and correctly classified patients, we evaluated the effects of misclassification on the mean, variance, and type I error/power of single sequential treatment outcomes (SSTs), dynamic treatment regimes (DTRs), and the overall outcome. The results showed that misclassification could introduce bias into the estimates of treatment effect for all types of outcome. Though the magnitude of the bias varied across settings, a few conclusions held throughout: 1) for any fixed sensitivity, the bias of the mean of SST responders always approached 0 as specificity increased to 1, and for any fixed specificity, the bias of the mean of SST non-responders always approached 0 as sensitivity increased to 1; 2) for any fixed specificity there was a monotonic nonlinear relationship between the bias of the mean of SST responders and sensitivity, and for any fixed sensitivity there was likewise a monotonic nonlinear relationship between the bias of the mean of SST non-responders and specificity; 3) the bias of the variance of SSTs was always a non-monotone nonlinear function; 4) the variance of SSTs under misclassification was always over-estimated; 5) the maximum absolute relative bias of the variance of SSTs was always ¼ of the squared mean difference between misclassified and correctly classified patients divided by the true variance, though it might not be attained within the sensitivity and specificity range (0, 1); 6) with respect to sensitivity and specificity, the bias of the mean of DTRs or of the overall outcome was always a linear function, and the bias of their variance was always a non-monotone nonlinear function; 7) the relative bias of the mean/variance of DTRs or of the overall outcome could approach 0 without sensitivity or specificity necessarily being 1. Furthermore, the results showed that misclassification could affect statistical inference: power could be smaller or larger than the planned 80% under misclassification, and showed either a monotonic or non-monotonic pattern as sensitivity or specificity decreased. To mitigate these adverse effects, patient observations can be weighted by the likelihood that their response was correctly classified. We investigated both normal-mixture-model (NM) and k-nearest-neighbor (KNN) strategies to reduce the bias of the mean and variance and improve inference on the final-stage outcome. The NM approach estimated each patient's early-stage probability of being a responder by optimizing the likelihood function with the EM algorithm, while KNN estimated these probabilities based upon the classifications of the k nearest observations. Simulations were used to compare the performance of these approaches.
The results showed that 1) KNN and NM produced modest reductions in the bias of the point estimates of SSTs; 2) both strategies reduced the bias of the point estimates of DTRs when the misclassified and correctly classified patients from the same initial treatment had unequal means; 3) NM reduced the bias of the point estimate of the overall outcome more than KNN did; 4) in general, there was little effect on power; 5) type I error should always be preserved at 0.05 regardless of misclassification when the same response rate and the same treatment effects among responders or among non-responders are assumed, but the observed type I error tended to be less than 0.05; 6) KNN preserved the type I error at 0.05, whereas NM could inflate it. Even though both the KNN and NM strategies improved the point estimates in most SMART-design settings where misclassification might be involved, the trade-offs were an increased type I error rate and little effect on power. Our work showed that misclassification should be taken into account in SMART designs because it introduces bias, but KNN or NM strategies applied at the final stage could not completely remove the bias of the point estimates or improve power. However, by adjusting for covariates in the future, these two strategies might be used to improve the classification accuracy of the early-stage outcomes.
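To give a flavour of the KNN weighting idea described above, here is a minimal sketch in which each patient's probability of truly being a responder is estimated from the k nearest observations on a hypothetical covariate and then used to weight a final-stage mean estimate. The covariate, k, and the simulated data are assumptions for the example, not the thesis's simulation design.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 500
true_responder = rng.binomial(1, 0.5, n)                              # latent true status
biomarker = rng.normal(loc=true_responder, scale=1.0, size=(n, 1))    # hypothetical covariate
# Observed classification with imperfect sensitivity/specificity.
sens, spec = 0.8, 0.85
observed = np.where(true_responder == 1,
                    rng.binomial(1, sens, n),
                    rng.binomial(1, 1 - spec, n))
outcome = 10 + 3 * true_responder + rng.normal(0, 2, n)               # final-stage outcome

# KNN estimate of P(responder | biomarker), used to weight the
# observations that were classified as responders.
knn = KNeighborsClassifier(n_neighbors=25).fit(biomarker, observed)
p_resp = knn.predict_proba(biomarker)[:, 1]

mask = observed == 1
naive = outcome[mask].mean()                                # unweighted responder mean
weighted = np.average(outcome[mask], weights=p_resp[mask])  # weighted by estimated P(correct)
print(f"naive responder mean: {naive:.2f}, KNN-weighted mean: {weighted:.2f}")
```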
56

Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases

Cui, Bin, Shen, Heng Tao, Shen, Jialie, Tan, Kian Lee 01 1900 (has links)
In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases. In high-dimensional spaces, the computational cost of the distance (e.g., Euclidean distance) between two points contributes a dominant portion of the overall query response time for in-memory processing. To reduce the distance computation, we first propose a structure (BID) that uses BIt-Difference to answer approximate KNN queries. BID employs one bit to represent each feature value of a point, and the number of bit differences is used to prune points that are farther away. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, a cluster-adapted bitcoder, and dimensional weights, yielding BID⁺. Extensive experiments show that our proposed method yields significant performance advantages over existing index structures on both real-life and synthetic high-dimensional datasets. / Singapore-MIT Alliance (SMA)
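A minimal sketch of the bit-difference idea: encode each point with one bit per dimension (here, above or below that dimension's mean), then use the Hamming distance between signatures to skip full Euclidean-distance computations for points whose bit difference exceeds a threshold. The encoding rule, threshold, and data are assumptions for illustration; the actual BID/BID⁺ design differs in its bitcoder, clustering, and weighting.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(10000, 64))       # 64-dimensional points
query = rng.normal(size=64)

# One-bit-per-dimension signatures: 1 if the value is above the column mean.
means = data.mean(axis=0)
sig_data = data > means
sig_query = query > means

# Bit difference (Hamming distance) between the query and every point.
bit_diff = (sig_data != sig_query).sum(axis=1)

# Prune: compute exact distances only for points with few differing bits.
threshold = 24
cand = np.flatnonzero(bit_diff <= threshold)
dists = np.linalg.norm(data[cand] - query, axis=1)

k = 10
knn = cand[np.argsort(dists)[:k]]         # approximate k nearest neighbors
print(f"evaluated {len(cand)} of {len(data)} points; top-{k} ids: {knn}")
```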
57

Minimisation de fonctions de perte calibrée pour la classification des images / Minimizing calibrated loss functions for image classification

Bel Haj Ali, Wafa 11 October 2013 (has links) (PDF)
Image classification is today a large-scale challenge: it concerns, on the one hand, the millions or even billions of images found all over the web and, on the other hand, images for critical real-time applications. This classification generally relies on learning methods and classifiers that must deliver both accuracy and speed. Such learning problems now touch a great number of application domains: the web (profiling, targeting, social networks, search engines), "Big Data", and of course computer vision, including object recognition and image classification. This thesis falls into the latter category and presents supervised learning algorithms based on the minimization of so-called "calibrated" loss functions for two types of classifiers: k-Nearest Neighbors (kNN) and linear classifiers. These learning methods were tested on large image databases and subsequently applied to biomedical images. The thesis first reformulates a boosting algorithm for kNN, and then presents a second method for learning these NN classifiers that uses a Newton descent approach for faster convergence. In a second part, the thesis introduces a new stochastic Newton descent learning algorithm for linear classifiers, which are known for their simplicity and computational speed. Finally, these three methods were applied to a medical problem concerning the classification of cells in biology and pathology.
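As a rough illustration of minimizing a calibrated loss with Newton steps, the sketch below fits a linear classifier by full Newton descent on the logistic loss (a standard calibrated surrogate). The data, ridge damping, and the choice of the logistic loss are assumptions for the example, not the specific calibrated losses or the stochastic Newton variant developed in the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
y = 2 * y - 1                                  # labels in {-1, +1}
Xb = np.hstack([X, np.ones((len(X), 1))])      # add a bias column
w = np.zeros(Xb.shape[1])

def logistic_loss(w):
    # mean log(1 + exp(-y * <x, w>)), computed stably
    return np.mean(np.logaddexp(0.0, -y * (Xb @ w)))

for it in range(10):                           # Newton iterations
    margins = y * (Xb @ w)
    p = 1.0 / (1.0 + np.exp(np.clip(margins, -35, 35)))   # sigma(-margin)
    grad = -(Xb.T @ (y * p)) / len(y)
    # Hessian of the logistic loss: X^T diag(p(1-p)) X / n, plus a small ridge
    H = (Xb.T * (p * (1 - p))) @ Xb / len(y) + 1e-6 * np.eye(Xb.shape[1])
    w -= np.linalg.solve(H, grad)
    print(f"iter {it}: loss = {logistic_loss(w):.4f}")
```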
58

應用文字探勘技術萃取設計概念之研究 / A study of using text mining on design concept extraction

羅康維, Luo, Kang Wei Unknown Date (has links)
In recent years, design has become one of the key tools for raising the added value of products and increasing profit. Under global competitive pressure, companies actively develop innovative products through design capability, and with strong government promotion many traditional industries have been matched with design firms. How to translate and communicate product-innovation requirements into design concepts has therefore become an extremely important and difficult problem. To communicate design concepts effectively, this study collected the descriptions of all table, chair, and cabinet products that won the German iF product design award and the RedDot design award between 2005 and 2012. Text mining techniques were applied to filter the product descriptions and identify the corresponding feature terms, i.e., design elements, and the KNN technique was then used to group the design elements into clusters from which design concepts were extracted. The 260 design documents for tables, chairs, and cabinets were divided into 16 design-concept clusters, using an intra-cluster average similarity greater than 0.05 as the threshold for forming a concept. The 16 design concepts were named "distinctive components with varied feel", "traditional and modern wooden chairs", "system-based luxury furniture", "wave-form fashion", "sofas with diverse design sense", "multi-form cross-legged chairs", "biomimetic ergonomics", "parent-child", "comfortable reclining", "design-oriented indoor/outdoor chairs", "backrest-focused design", "multi-angle symmetry", "tabletops and sofas in various shapes", "shell-back chairs", "traditional Chinese", and "location-oriented" concepts. Through the design elements of their requirements, clients can map to the relevant design-concept clusters and communicate effectively with designers, so the intended product is understood more quickly and designers can greatly shorten the time and effort spent in the requirements stage. Finally, some directions for future research are proposed. Keywords: text mining, kNN, design concept, extraction
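A minimal sketch of the grouping-and-thresholding step: vectorize product descriptions with TF-IDF, link each description to its nearest neighbour in cosine space, take connected components as candidate groups, and keep a group as a design concept only if its average intra-cluster similarity exceeds 0.05. The tiny English example corpus and the nearest-neighbour-graph grouping are assumptions for illustration; the thesis works on Chinese award-entry descriptions with its own KNN-based procedure.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import kneighbors_graph

# Tiny stand-in corpus (the thesis uses 260 descriptions of award-winning
# tables, chairs, and cabinets).
docs = [
    "curved wooden chair with ergonomic backrest",
    "traditional wooden chair with modern joinery",
    "modular sofa system with varied cushions",
    "luxury modular cabinet system",
    "wave shaped table with glass top",
    "shell shaped chair with moulded backrest",
]

X = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(X)

# Link each description to its nearest neighbour and group by connected components.
graph = kneighbors_graph(X, n_neighbors=1, metric="cosine")
_, labels = connected_components(graph, directed=False)

for c in np.unique(labels):
    idx = np.flatnonzero(labels == c)
    if len(idx) < 2:
        continue
    pairs = sim[np.ix_(idx, idx)][np.triu_indices(len(idx), k=1)]
    # Keep the group as a design concept only if the average intra-cluster
    # similarity exceeds the 0.05 threshold mentioned in the abstract.
    if pairs.mean() > 0.05:
        print(f"concept cluster {c}: docs {idx.tolist()}, avg similarity {pairs.mean():.2f}")
```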
59

K-nearest neighbors queries in time-dependent road networks: analyzing scenarios where points of interest move to the query point

Chucre, Mirla Rafaela Rafael Braga January 2015 (has links)
CHUCRE, Mirla Rafaela Rafael Braga. K-nearest neighbors queries in time-dependent road networks: analyzing scenarios where points of interest move to the query point. 2015. 65 f. Dissertation (Master's in Computer Science) – Universidade Federal do Ceará, Fortaleza, 2015. / A kNN query retrieves the k points of interest that are closest to the query point, where proximity is computed from the query point to the points of interest. Time-dependent road networks are represented as weighted graphs, where the weight of an edge depends on the time at which one traverses that edge. In this way we can model periodic congestion during rush hour and similar effects. Travel time on road networks depends heavily on traffic, and, typically, the time a moving object takes to traverse a segment depends on the departure time. In time-dependent networks, a kNN query, called TD-kNN, returns the k points of interest with minimum travel time from the query point. As a more concrete example, consider a tourist in Paris who wants to visit the tourist attraction closest to him/her, and consider two points of interest in the city, the Eiffel Tower and the Cathedral of Notre Dame. If he/she asks for the attraction reachable by the fastest path at that moment, the answer depends on the departure time: at 10h it takes 10 minutes to reach the Cathedral, making it the nearest attraction, but if the same query is asked at 22h from the same spatial point, the nearest attraction is the Eiffel Tower. In this work, we identify a variation of nearest neighbors queries in time-dependent road networks that has wide applications and requires novel processing algorithms. Differently from TD-kNN queries, we aim at minimizing the travel time from the points of interest to the query point. With this approach, a cab company can find the taxi nearest in time to a passenger requesting transportation. More specifically, we address the following query: find the k points of interest (e.g. taxi drivers) which can move to the query point (e.g. a taxi user) in the minimum amount of time. Previous works have proposed solutions to answer kNN queries considering the time dependency of the network, but not computing the proximity from the points of interest to the query point. We propose and discuss a solution to this type of query which is based on previously proposed incremental network expansion and uses the A∗ search algorithm equipped with suitable heuristic functions, with the goal of reducing the fraction of the network explored during the search. We also discuss the design and correctness of our algorithm and present experimental results that show the efficiency and effectiveness of our solution.
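A minimal sketch of the kind of time-dependent search involved: edge travel times are functions of departure time, and an A∗ expansion from a point of interest toward the query point uses a lower-bound heuristic (e.g., free-flow travel time), so the answer can change with the departure time. The tiny graph, travel-time functions, and heuristic values are assumptions for illustration, not the dissertation's network or algorithm.

```python
import heapq

# Tiny time-dependent graph: each edge's travel time (minutes) depends on the
# departure time t (hours), with a congestion surcharge during rush hours.
def congested(base, peak):
    return lambda t: base + (peak if 8 <= t % 24 <= 10 or 17 <= t % 24 <= 19 else 0)

graph = {
    "poi":   [("a", congested(5, 10)), ("b", congested(7, 0))],
    "a":     [("query", congested(6, 8))],
    "b":     [("query", congested(6, 0))],
    "query": [],
}
# Admissible heuristic: free-flow lower bound (minutes) to the query point.
h = {"poi": 10, "a": 6, "b": 6, "query": 0}

def td_a_star(source, target, depart_hour):
    """Minimum travel time (minutes) from source to target, leaving at depart_hour."""
    open_set = [(h[source], 0.0, source)]
    best = {source: 0.0}
    while open_set:
        _, cost, node = heapq.heappop(open_set)
        if node == target:
            return cost
        for nxt, travel_time in graph[node]:
            new_cost = cost + travel_time(depart_hour + cost / 60.0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(open_set, (new_cost + h[nxt], new_cost, nxt))
    return float("inf")

for hour in (10, 22):   # rush hour vs. night: the fastest route differs
    print(f"departing at {hour}h: {td_a_star('poi', 'query', hour):.0f} minutes")
```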
60

SPSR: Efficient Processing of Socially k-Nearest Neighbors with Spatial Range Filter

January 2016 (has links)
Social media has become popular in the past decade; Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating a lot of data, everyone is constantly looking for ways to leverage the knowledge in social networks to make their systems more personalized for end users, and with the rapid increase in the usage of mobile phones and wearables, social media data is being tied to spatial networks. This research document proposes an efficient technique that answers socially k-nearest-neighbor queries with a spatial range filter. The proposed approach performs a joint search on both the social and spatial domains, which radically improves performance compared to straightforward solutions. The research document proposes a novel index that combines social and spatial indexes; in other words, graph data is stored in an organized manner so that it can be filtered at query time by spatial constraints (a region of interest) and social constraints (the top-k closest vertices). This allows unnecessary paths to be pruned during the social graph traversal, returning only the top-k socially close venues. The research document then experimentally shows that the proposed approach outperforms existing baseline approaches by at least a factor of three, and compares how each of the proposed algorithms performs under various conditions on a real geo-social dataset extracted from Yelp. / Dissertation/Thesis / Masters Thesis Computer Science 2016
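A rough sketch of the query semantics described above: given a query user and a rectangular region of interest, keep only the venues inside the region and rank them by social distance (hops in the friendship graph) from the query user to users who checked in there, returning the top-k. The toy graph, check-ins, and coordinates are assumptions for illustration; the actual SPSR index interleaves the social and spatial pruning rather than filtering sequentially.

```python
from collections import deque

# Toy friendship graph, venue coordinates, and user check-ins (all hypothetical).
friends = {"q": ["u1", "u2"], "u1": ["q", "u3"], "u2": ["q"], "u3": ["u1"]}
venues = {"cafe": (1, 1), "bar": (2, 3), "museum": (9, 9), "park": (3, 2)}
checkins = {"u1": ["cafe"], "u2": ["bar"], "u3": ["park"], "q": []}

def social_hops(src):
    """BFS hop distance from src to every reachable user."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        u = queue.popleft()
        for v in friends.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def socially_knn_with_range(query_user, rect, k):
    xmin, ymin, xmax, ymax = rect
    hops = social_hops(query_user)
    # Spatial range filter: keep only venues inside the rectangle.
    in_range = {v for v, (x, y) in venues.items()
                if xmin <= x <= xmax and ymin <= y <= ymax}
    # Social score of a venue: fewest hops to any user who checked in there.
    scores = {}
    for user, vs in checkins.items():
        for v in vs:
            if v in in_range and user in hops:
                scores[v] = min(scores.get(v, float("inf")), hops[user])
    return sorted(scores, key=scores.get)[:k]

print(socially_knn_with_range("q", rect=(0, 0, 5, 5), k=2))  # e.g. ['cafe', 'bar']
```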
