111

Disc : Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs

Bergholm, Marcus, Kronvall, Viktor January 2016 (has links)
Recent development in Graphics Processing Units (GPUs) has enabled inexpensive high-performance computing for general-purpose applications. The K-Nearest Neighbors problem is widely used in applications ranging from classification to the gathering of photons in the Photon Mapping algorithm. Using the Euclidean distance measure when gathering photons can cause false bleeding of colors between surfaces. Ellipsoidal search boundaries for photon gathering have been shown to reduce artifacts due to this false bleeding. Shifted Sorting has been found to yield high performance on GPUs while retaining a high approximation rate. This study presents an algorithm for approximately solving the K-Nearest Neighbors problem, modified to use a distance measure that creates an ellipsoidal search boundary. The ellipsoidal search boundary is used to alleviate the issue of false bleeding of colors between surfaces in Photon Mapping. The approximate K-Nearest Neighbors algorithm presented is a modification of the Shifted Sorting algorithm. The algorithm is found to be highly parallelizable and processes 86% as many queries per millisecond as a reference implementation using the spherical search boundaries implied by the Euclidean distance. The compression factor along the query point's surface normal, from spherical to ellipsoidal search boundary, is appropriately chosen in the range 3.0 to 7.0. The algorithm is found to scale well with respect to increases in both the number of data points and the number of query points.
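As an illustration of the gathering step described above, the following is a minimal CPU-side NumPy sketch of nearest-photon search under an anisotropic distance that compresses the search region along the surface normal. The function and parameter names (ellipsoidal_knn, compression) are hypothetical, and this is not the thesis's GPU shifted-sorting implementation.

```python
import numpy as np

def ellipsoidal_knn(photons, query, normal, k, compression=5.0):
    """Indices of the k photons nearest to `query` under an anisotropic
    distance that shrinks the search region along the surface normal,
    giving an ellipsoidal rather than spherical gathering boundary."""
    n = normal / np.linalg.norm(normal)
    diff = photons - query                      # (N, 3) offsets from the query point
    along = diff @ n                            # signed distance along the normal
    tangential = diff - np.outer(along, n)      # component in the tangent plane
    # Scaling the normal component penalizes photons that sit off the surface,
    # which is what suppresses false bleeding from nearby parallel surfaces.
    dist2 = np.einsum('ij,ij->i', tangential, tangential) + (compression * along) ** 2
    return np.argpartition(dist2, k)[:k]

# Hypothetical usage: gather 50 photons around a shading point on a z-up surface,
# with the compression factor chosen in the 3.0 to 7.0 range suggested above.
rng = np.random.default_rng(0)
photons = rng.uniform(-1.0, 1.0, size=(10_000, 3))
idx = ellipsoidal_knn(photons, query=np.zeros(3),
                      normal=np.array([0.0, 0.0, 1.0]), k=50, compression=5.0)
```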
112

DISTRIBUTED NEAREST NEIGHBOR CLASSIFICATION WITH APPLICATIONS TO CROWDSOURCING

Jiexin Duan (11181162) 26 July 2021 (has links)
The aim of this dissertation is to study two problems of distributed nearest neighbor classification (DiNN) systematically. The first compares two DiNN classifiers based on different schemes: majority voting and weighted voting. The second is an extension of the DiNN method to the crowdsourcing application, which allows each worker's data to have a different size and noisy labels due to low worker quality. Both statistical guarantees and numerical comparisons are studied in depth.

The first part of the dissertation focuses on distributed nearest neighbor classification in big data. The sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This has imposed a considerable hurdle for nearest neighbor predictions since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of the regret, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples needed for the distributed nearest neighbor classifier to reach the optimal convergence rate. It is interesting to note that the weighted voting scheme allows a larger number of subsamples than the majority voting one.

The second part of the dissertation extends the DiNN methods to the application in crowdsourcing. The noisy labels in crowdsourcing data and the different sizes of worker data deteriorate the performance of DiNN methods. We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue. Our proposed method achieves the same regret as its oracle version on expert data of the same size. We also propose two algorithms to estimate the worker quality if it is unknown in practice. One method constructs estimators of worker quality based on worker labels denoised by applying a kNN classifier to the expert data. Unlike previous worker quality estimation methods, which have no statistical guarantee, it achieves the same regret as the ENN with observed worker quality. The other method estimates the worker quality iteratively based on ENN, and it works well without the expert data required by most previous methods.
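To make the majority-voting versus weighted-voting distinction concrete, here is a small Python sketch of a distributed k-NN prediction. It is a generic illustration under assumed names (dinn_predict, with a "soft" probability-averaging option standing in for a weighted scheme), not the estimators analyzed in the dissertation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def dinn_predict(X_parts, y_parts, X_test, k=5, weighting="majority"):
    """Distributed nearest-neighbor prediction for a binary problem: fit a
    local k-NN on each data subsample ("machine") and combine the local votes.
    "majority" counts each local hard label equally; "soft" averages the local
    class-probability estimates as a stand-in for a weighted voting scheme."""
    votes = []
    for X, y in zip(X_parts, y_parts):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        if weighting == "majority":
            votes.append(clf.predict(X_test))          # hard local labels (0/1)
        else:
            votes.append(clf.predict_proba(X_test))    # soft local votes
    if weighting == "majority":
        votes = np.stack(votes)                        # (n_machines, n_test)
        return (votes.mean(axis=0) > 0.5).astype(int)  # binary majority vote
    return np.argmax(np.mean(votes, axis=0), axis=1)   # average probabilities

# Hypothetical usage with a random binary problem split over 4 "machines".
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_parts, y_parts = np.array_split(X, 4), np.array_split(y, 4)
pred = dinn_predict(X_parts, y_parts, X[:100], k=5, weighting="majority")
```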
113

Methods for Efficient Synthesis of Large Reversible Binary and Ternary Quantum Circuits and Applications of Linear Nearest Neighbor Model

Hawash, Maher Mofeid 30 May 2013 (has links)
This dissertation describes the development of automated synthesis algorithms that construct reversible quantum circuits for reversible functions with a large number of variables. Specifically, the research is focused on reversible, permutative and fully specified binary and ternary specifications and the applicability of the resulting circuit to the physical limitations of existing quantum technologies. Automated synthesis of arbitrary reversible specifications is an NP-hard, multiobjective optimization problem, where 1) the amount of time and computational resources required to synthesize the specification, 2) the number of primitive quantum gates in the resulting circuit (quantum cost), and 3) the number of ancillary qubits (variables added to hold intermediate calculations) are all minimized while 4) the number of variables is maximized. Some of the existing algorithms in the literature ignored objective 2 by focusing on the synthesis of a single solution without the addition of any ancillary qubits, while others attempted to explore every possible solution in the search space in an effort to discover the optimal solution (i.e., sacrificed objectives 1 and 4). Other algorithms resorted to adding a huge number of ancillary qubits (counter to objective 3) in an effort to minimize the number of primitive gates (objective 2). In this dissertation, I first introduce the MMDSN algorithm, which is capable of synthesizing binary specifications of up to 30 variables, does not add any ancillary variables, produces better quantum cost (8-50% improvement) than algorithms that limit their search to a single solution, and runs in a minimal amount of time compared to algorithms that perform exhaustive search (seconds vs. hours). The MMDSN algorithm introduces an innovative method of using the Hasse diagram to construct candidate solutions that are guaranteed to be valid and then selects the solution with the minimal quantum cost out of this subset. I then introduce the Covered Set Partitions (CSP) algorithm, which expands the search space of valid candidate solutions and allows for exploring solutions outside the range of MMDSN. I show a method of subdividing the expansive search landscape into smaller partitions and demonstrate the benefit of focusing on partition sizes that are around half of the number of variables (15% to 25% improvements over MMDSN for functions of fewer than 12 variables, and more than 1000% improvement for functions with 12 and 13 variables). For a function of n variables, the CSP algorithm theoretically requires n times more time to synthesize; however, by focusing on the middle partitions (sizes around half the number of variables), it discovers solutions beyond the reach of MMDSN, typically with lower quantum cost. I also show that using a Tabu search for selecting the next set of candidates from the CSP subset results in discovering solutions with even lower quantum costs (up to 10% improvement over CSP with random selection). In Chapters 9 and 10 I question the predominant methods of measuring quantum cost and their applicability to physical implementation of quantum gates and circuits. I counter the prevailing literature by introducing a new standard for measuring the performance of quantum synthesis algorithms, enforcing the Linear Nearest Neighbor Model (LNNM) constraint imposed by today's leading implementations of quantum technology.
In addition to enforcing physical constraints, the new LNNM quantum cost (LNNQC) allows for a level comparison among all methods of synthesis, specifically between methods that add a large number of ancillary variables and ones that add no additional variables. I show that, when LNNM is enforced, the quantum cost for methods that add a large number of ancillary qubits increases significantly (up to 1200%). I also extend the Hasse-based method to the ternary domain and demonstrate synthesis of specifications of up to 9 ternary variables (compared to the 3 ternary variables found in the literature). I introduce the concept of ternary precedence order and its implication for the construction of the Hasse diagram and the construction of valid candidate solutions. I also provide a case study comparing the performance of ternary logic synthesis of large functions using both a CUDA graphics processor with 1024 cores and an Intel i7 processor with 8 cores. In the process of exploring large ternary functions I introduce, to the literature, eight families of ternary benchmark functions along with a multiple-valued file specification (the Extended Quantum Specification, XQS). I also introduce a new composite quantum gate, the multiple-valued Swivel gate, which swaps the information of qubits around a centrally located pivot point. In summary, my research objectives are as follows:
* Explore and create automated synthesis algorithms for reversible circuits, both in binary and ternary logic, for a large number of variables.
* Study the impact of enforcing the Linear Nearest Neighbor Model (LNNM) constraint for every interaction between qubits for reversible binary specifications.
* Advocate for a revised metric for measuring the cost of a quantum circuit in concordance with LNNM, where, on the one hand, such a metric provides a way for balanced comparison between the various flavors of algorithms, and, on the other hand, represents a realistic cost of a quantum circuit with respect to an ion trap implementation.
* Establish an open source repository for sharing the results, software code and publications with the scientific community.
With the dwindling expectations for a new lifeline for silicon-based technologies, quantum computation has the potential of becoming the future workhorse of computation. Similar to the automated CAD tools of classical logic, my work lays the foundation for creating automated tools for constructing quantum circuits from reversible specifications.
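As a rough illustration of the kind of cost a linear-nearest-neighbor constraint adds, here is a small Python sketch that counts the adjacent SWAPs needed to run two-qubit gates on a linear chain of qubits. The function name and per-gate costs are hypothetical placeholders, not the dissertation's LNNQC metric.

```python
def lnn_quantum_cost(gates, gate_cost=5, swap_cost=3):
    """Rough LNN-model cost estimate for a circuit on a linear qubit chain.
    `gates` is a list of (control, target) logical qubit indices.  Whenever a
    gate acts on non-adjacent qubits, adjacent SWAPs are inserted to bring the
    control next to the target, and the qubits keep their new positions.
    The per-gate costs are illustrative placeholders only."""
    n = max(max(c, t) for c, t in gates) + 1
    pos = list(range(n))               # pos[logical qubit] = physical location
    total = 0
    for c, t in gates:
        distance = abs(pos[c] - pos[t])
        swaps = distance - 1           # adjacent swaps to make the pair neighbors
        total += swaps * swap_cost + gate_cost
        # Move qubit c toward t, updating the layout as each swap executes.
        step = 1 if pos[t] > pos[c] else -1
        for _ in range(swaps):
            neighbor = next(q for q in range(n) if pos[q] == pos[c] + step)
            pos[c], pos[neighbor] = pos[neighbor], pos[c]
    return total

# Hypothetical 4-qubit circuit: CNOTs between qubits (0,3), (1,2), (0,1).
print(lnn_quantum_cost([(0, 3), (1, 2), (0, 1)]))
```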
114

Discovery of Outlier Points and Dense Regions in Large Data-Sets Using Spark Environment

Nadella, Pravallika 04 October 2021 (has links)
No description available.
115

Practical Web-scale Recommender Systems

Tagami, Yukihiro 25 September 2018 (has links)
Kyoto University / 0048 / New system, doctoral program / Doctor of Informatics / Kō No. 21390 / Jōhaku No. 676 / Shinsei||Jō||117 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hisashi Kashima, Professor Akihiro Yamamoto, Professor Hidetoshi Shimodaira / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
116

Classification of Dense Masses in Mammograms

Naram, Hari Prasad 01 May 2018 (has links) (PDF)
This dissertation details the techniques developed to aid in the classification of tumors, non-tumors, and dense masses in a mammogram. Characteristics such as texture in a mammographic image are used to identify the regions of interest as part of classification. Pattern recognition techniques such as the nearest-mean classifier and the support vector machine classifier are used to classify the features. The initial stages include the processing of the mammographic image to extract the relevant features necessary for classification, and in the final stage the features are classified using the pattern recognition techniques mentioned above. The goal of this research work is to provide medical experts and researchers with an effective method to aid them in identifying the tumors, non-tumors, and dense masses in a mammogram. First, the breast region is extracted from the entire mammogram. The extraction is carried out by creating masks and using them to extract the region of interest pertaining to the tumor. A chain code is employed to extract the various regions; the extracted regions could potentially be classified as tumors, non-tumors, or dense regions. Adaptive histogram equalization is employed to enhance the contrast of an image. Applying adaptive histogram equalization several times yields a saturated image containing only the bright spots of the mammogram, which appear like dense regions. These dense masses could be potential tumors that would need treatment. Texture characteristics of the mammographic image are used for feature extraction, and the nearest-mean and support vector machine classifiers are used for classification. A total of thirteen Haralick features are used to classify the three classes. The support vector machine classifier is used for two-class problems with a radial basis function (RBF) kernel, and a search is performed for the best possible (C, gamma) values. Results obtained in this research suggest that the best classification accuracy was achieved by the support vector machines for both tumor vs. non-tumor and tumor vs. dense masses. The maximum accuracy achieved for tumor vs. non-tumor is above 90%, and for dense masses it is 70.8%, using 11 features with support vector machines. Support vector machines performed better than the nearest-mean majority classifier in the classification of the classes. Various case studies were performed using two distinct datasets, each consisting of 24 patients' data in two views: the craniocaudal view and the mediolateral oblique view. From these views, the regions of interest that could possibly be tumors, non-tumors, or dense masses were extracted.
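A compressed sketch of the texture-feature and SVM pipeline outlined above, using GLCM properties from scikit-image (version 0.19+ spelling) as a stand-in for the full set of thirteen Haralick features, and a grid search over the (C, gamma) values of an RBF-kernel SVM. The data here is random placeholder imagery, not mammograms, and all names are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def glcm_features(roi, levels=64):
    """Texture features from a gray-level co-occurrence matrix for one
    region of interest (an 8-bit grayscale patch).  scikit-image exposes a
    subset of Haralick-style properties, not all thirteen."""
    roi = (roi / 256 * levels).astype(np.uint8)              # requantize to `levels`
    glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation", "dissimilarity"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# Hypothetical: rois is a list of extracted patches, labels 0 = non-tumor, 1 = tumor.
rng = np.random.default_rng(2)
rois = [rng.integers(0, 256, size=(64, 64)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)
X = np.array([glcm_features(r) for r in rois])

# RBF-kernel SVM with a grid search over (C, gamma), as mentioned in the abstract.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}, cv=5)
grid.fit(X, labels)
```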
117

Machine learning as a tool to grade language-learning conversations between robot and human

Melander, Gustav, Wänlund, Robin January 2019 (has links)
The Swedish company Furhat Robotics has created a robot called Furhat, which is able to interact with humans in a language café setting. The purpose of the robot-led conversation is for the participants to develop their language skills. After the conversation, the participants answer a survey about what they thought of the conversation with Furhat. A question that has arisen from this is whether it is possible to predict the survey answers based on the conversation alone. The purpose of this paper is to analyze whether it is possible to quantify the conversations linked to the survey answers and, by doing so, predict the answers for new conversations with a machine learning approach. The data set used was obtained from an earlier study in Collaborative Robot Assisted Language Learning. The result was an RMSE greater than the variance of the average conversation score, which indicates that the model is not very effective. However, it performed better on some predictions when each survey answer was predicted separately, indicating that the model could be used for certain question formulations.
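A minimal sketch of the kind of evaluation described above: cross-validated predictions of one survey answer from quantified conversation features, with the model's RMSE compared against the spread of the answers themselves. The feature set, the random-forest regressor, and all names are assumptions for illustration; the thesis's actual features and model are not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Hypothetical conversation features (e.g. turn counts, utterance lengths,
# pause statistics) and one survey question scored 1-5 per conversation.
rng = np.random.default_rng(3)
X = rng.normal(size=(120, 8))                      # 120 conversations, 8 features
y = rng.integers(1, 6, size=120).astype(float)     # survey answers for one question

model = RandomForestRegressor(n_estimators=200, random_state=0)
pred = cross_val_predict(model, X, y, cv=5)        # out-of-fold predictions
rmse = np.sqrt(np.mean((pred - y) ** 2))

# One natural baseline: always predicting the mean answer gives RMSE = std(y).
baseline_rmse = np.std(y)
print(f"model RMSE {rmse:.2f} vs mean-prediction baseline {baseline_rmse:.2f}")
```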
118

A Direct Algorithm for the K-Nearest-Neighbor Classifier via Local Warping of the Distance Metric

Neo, TohKoon 30 November 2007 (has links) (PDF)
The k-nearest neighbor (k-NN) pattern classifier is a simple yet effective learner. However, it has a few drawbacks, one of which is the large model size. There are a number of algorithms that are able to condense the model size of the k-NN classifier at the expense of accuracy. Boosting is therefore desirable for increasing the accuracy of these condensed models. Unfortunately, there does not exist a boosting algorithm that works well with k-NN directly. We present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted on 10 standard databases from the UCI repository shows that this new Boosted k-NN algorithm has increased generalization accuracy in the majority of the datasets and never performs worse than standard k-NN.
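To illustrate the idea of boosting k-NN through locally warped distances, here is a small NumPy sketch in which each boosting round shrinks the distance scale of training points the current ensemble misclassifies, so those points capture more queries. The functions and parameters are hypothetical, and this is only a loose analogue of the algorithm presented in the thesis.

```python
import numpy as np

def knn_predict(X_train, y_train, scales, X_query, k=3):
    """k-NN with a per-training-point distance scale: shrinking a point's
    scale warps the metric locally so that the point attracts more queries."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2) * scales
    idx = np.argpartition(d, k, axis=1)[:, :k]     # k nearest under the warped metric
    votes = y_train[idx]
    return (votes.mean(axis=1) > 0.5).astype(int)  # binary majority vote

def boosted_knn(X, y, rounds=5, k=3, shrink=0.8):
    """Illustrative boosting loop: each round shrinks the scales of training
    points the current member misclassifies, and the final prediction averages
    the members' votes.  Hypothetical, not the thesis's exact algorithm."""
    scales = np.ones(len(X))
    members = []
    for _ in range(rounds):
        members.append(scales.copy())
        pred = knn_predict(X, y, scales, X, k=k)
        scales = np.where(pred != y, scales * shrink, scales)
    def predict(X_query):
        all_votes = np.stack([knn_predict(X, y, s, X_query, k=k) for s in members])
        return (all_votes.mean(axis=0) > 0.5).astype(int)
    return predict

# Hypothetical usage on a toy circular decision boundary.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)
predict = boosted_knn(X, y)
accuracy = (predict(X) == y).mean()
```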
119

Generating Exploration Mission-3 Trajectories to a 9:2 NRHO Using Machine Learning

Guzman, Esteban 01 December 2018 (has links) (PDF)
The purpose of this thesis is to design a machine learning algorithm platform that provides expanded knowledge of mission availability through a launch season by improving trajectory resolution and introducing launch mission forecasting. The specific scenario addressed in this paper is one in which data is provided for four deterministic translational maneuvers through a mission to a Near Rectilinear Halo Orbit (NRHO) with a 9:2 synodic frequency. Current launch availability knowledge under NASA's Orion Orbit Performance Team is established by altering optimization variables associated with given reference launch epochs. This current method can be an abstract task and relies on an orbit analyst to structure a mission based on an established mission design methodology associated with the performance of Orion and NASA's Space Launch System. Introducing a machine learning algorithm trained to construct mission scenarios within the feasible range of known trajectories reduces the required interaction of the orbit analyst by removing the step of optimizing the orbit to fit an expected translational response required of the spacecraft. In this study, k-Nearest Neighbor and Bayesian Linear Regression successfully predicted classical orbital elements for the launch windows observed. However, both algorithms had limitations due to their approaches to model fitting. Training machine learning algorithms on classical orbital elements introduced a repeatable approach to reconstructing mission segments for different arrival opportunities through the launch window and can prove to be a viable method of launch window scan generation for future missions.
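A small sketch of the two regression approaches named above, predicting a set of classical orbital elements from launch-window inputs. The input features, data, and dimensions here are synthetic placeholders, not Exploration Mission-3 trajectory data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: launch epoch (days into the season) and time of
# flight as inputs, six classical orbital elements as outputs.
rng = np.random.default_rng(5)
X = rng.uniform(0, 30, size=(200, 2))          # epoch, TOF (made-up values)
Y = rng.normal(size=(200, 6))                  # a, e, i, RAAN, argp, nu (placeholders)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

knn = KNeighborsRegressor(n_neighbors=5).fit(X_tr, Y_tr)
blr = MultiOutputRegressor(BayesianRidge()).fit(X_tr, Y_tr)   # one model per element

for name, model in [("k-NN", knn), ("Bayesian ridge", blr)]:
    rmse = np.sqrt(np.mean((model.predict(X_te) - Y_te) ** 2, axis=0))
    print(name, np.round(rmse, 3))             # per-element prediction error
```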
120

Uncertainty Analysis : Severe Accident Scenario at a Nordic Nuclear Power Plant

Hedly, Josefin, De Young, Mikaela January 2023 (has links)
Nuclear power plants (NPPs) undergo fault and sensitivity analysis with scenario modelling to predict catastrophic events, specifically releases of cesium-137 (Cs-137). The purpose of this thesis is to find which of the 108 input features from the Modular Accident Analysis Program (MAAP) simulation code are important when there is a large release of Cs-137. The features are tested all together and in their groupings. To find important features, the machine learning (ML) model Random Forest (RF) is used, which has a built-in attribute that ranks feature importance. The results of the RF classification are corroborated with Support Vector Machines (SVM) and K-Nearest Neighbors (KNN), and k-fold cross-validation is used to improve and validate the results, yielding near 90% accuracy for the three ML models. RF is successful at identifying important features related to Cs-137 emissions: the classification model first identifies the top features, which are then used to further train the models. The discovered input features are important both within their individual groups and when all features are included simultaneously. The large number of features did not disrupt RF much, but the skewed dataset, with few classified extreme events, kept the accuracy near 90%.
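A compact sketch of the workflow described above: a random forest's built-in feature importances rank the input features, and SVM and k-NN classifiers with k-fold cross-validation corroborate the classification on the top-ranked features. The data below is a synthetic stand-in for the 108 MAAP input features, and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for 108 simulation input features and a binary label
# marking runs with a large Cs-137 release (the minority class).
rng = np.random.default_rng(6)
X = rng.normal(size=(500, 108))
y = (X[:, 3] + 0.5 * X[:, 47] + rng.normal(scale=0.5, size=500) > 1.5).astype(int)

# Random forest's built-in importances rank the input features ...
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:10]

# ... then the classification is corroborated with SVM and k-NN on the
# top-ranked features, using k-fold cross-validation.
for name, clf in [("RF", RandomForestClassifier(n_estimators=300, random_state=0)),
                  ("SVM", SVC()), ("KNN", KNeighborsClassifier())]:
    scores = cross_val_score(clf, X[:, top], y, cv=5)
    print(name, round(scores.mean(), 3))
```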
