401

Analyse automatique de données par Support Vector Machines non supervisés [Automatic data analysis with unsupervised Support Vector Machines]

D'Orangeville, Vincent January 2012 (has links)
This dissertation presents a set of algorithms enabling fast, robust, and automatic use of unsupervised Support Vector Machines (SVM) in a data analysis context. Unsupervised SVMs come in two promising algorithmic forms, Support Vector Clustering (SVC) and Support Vector Domain Description (SVDD), which respectively address two important problems in data analysis: finding homogeneous groups (clustering) and recognizing atypical elements (novelty/anomaly detection) in a data set. This research proposes concrete solutions to three fundamental limitations of these two algorithms, namely 1) the lack of an efficient optimization algorithm for training SVDD and SVC on large data sets within an acceptable time, 2) the lack of efficiency and robustness of existing data-partitioning algorithms for SVC, and 3) the absence of automatic hyperparameter-selection strategies for SVDD and SVC to control the complexity and noise tolerance of the generated models. Resolving each of these three limitations forms the three main axes of this doctoral thesis, each the subject of a scientific article proposing strategies and algorithms that allow fast, robust, and parameter-free use of SVDD and SVC on arbitrary data sets.
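As a rough illustration of the kind of unsupervised SVM described above, the sketch below runs one-class SVM novelty detection in Python; with an RBF kernel the ν-parameterized one-class SVM is closely related to SVDD. The data, ν, and γ values are illustrative assumptions, not the automatic hyperparameter-selection strategy the thesis proposes.

```python
# Minimal sketch: one-class SVM novelty detection in the spirit of SVDD.
# nu and gamma below are illustrative guesses, not the automatic
# hyperparameter selection the thesis proposes.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))              # "normal" data
X_test = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),    # inliers
                    rng.uniform(-6.0, 6.0, size=(5, 2))])  # likely outliers

model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5)      # nu ~ outlier fraction
model.fit(X_train)

labels = model.predict(X_test)  # +1 = inside the learned support, -1 = atypical
print("flagged as atypical:", np.where(labels == -1)[0])
```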
402

Clustering student interaction data using Bloom's Taxonomy to find predictive reading patterns

2016 January 1900 (has links)
In modern educational technology we have the ability to capture click-stream interaction data from a student as they work on educational problems within an online environment. This provides us with an opportunity to identify student behaviours within the data (captured by the online environment) that are predictive of student success or failure. The constraints that exist within an educational setting make it possible to associate these student behaviours with specific educational outcomes. This information could then be used to inform environments that support student learning while improving a student's metacognitive skills. In this dissertation, we describe how reading behaviour clusters were extracted in an experiment in which students were embedded in a learning environment where they read documents and answered questions. We tracked their keystroke-level behaviour and then applied clustering techniques to find pedagogically meaningful clusters. The key to finding these clusters was categorizing the questions by their level in Bloom's educational taxonomy: different behaviour patterns predicted success and failure in answering questions at different levels of Bloom's taxonomy. The clusters found in the first experiment were confirmed through two further experiments that explored variations in the number, type, and length of documents and the kinds of questions asked. In the final experiment, we also went beyond the actual keystrokes and explored how the pauses between keystrokes as a student answers a question can be utilized in the process of determining student success. This research suggests that it should be possible to diagnose learner behaviour even in "ill-defined" domains like reading. It also suggests that Bloom's taxonomy can be an important (even necessary) input to such diagnosis.
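A hedged sketch of the general approach described above: cluster per-attempt reading features and cross-tabulate cluster membership against success at each Bloom level. The feature names, data, and cluster count are hypothetical, not the study's actual pipeline.

```python
# Hypothetical illustration: cluster per-attempt reading features, then see
# how cluster membership relates to success at different Bloom levels.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "reread_count": rng.poisson(2, n),          # times the student revisited the text
    "time_on_doc_s": rng.gamma(3.0, 40.0, n),   # seconds spent reading
    "mean_pause_ms": rng.gamma(2.0, 150.0, n),  # average pause between keystrokes
    "bloom_level": rng.integers(1, 4, n),       # 1=remember, 2=understand, 3=apply
    "correct": rng.integers(0, 2, n),
})

X = StandardScaler().fit_transform(df[["reread_count", "time_on_doc_s", "mean_pause_ms"]])
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Success rate per (Bloom level, behaviour cluster): a pattern that predicts
# success at one level need not predict it at another.
print(df.groupby(["bloom_level", "cluster"])["correct"].mean().unstack())
```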
403

Graph analysis combining numerical, statistical, and streaming techniques

Fairbanks, James Paul 27 May 2016 (has links)
Graph analysis uses graph data collected on physical, biological, or social phenomena to shed light on the underlying dynamics and behavior of the agents in that system. Many fields contribute to this topic, including graph theory, algorithms, statistics, machine learning, and linear algebra. This dissertation advances a novel framework for dynamic graph analysis that combines numerical, statistical, and streaming algorithms to provide deep insight into evolving networks. For example, one might be interested in how the influence structure changes over time. These disparate techniques each contribute a fragment to understanding the graph; their combination, however, allows us to understand dynamic behavior and graph structure. Spectral partitioning methods rely on eigenvectors for solving data analysis problems such as clustering. Eigenvectors of large sparse systems must be approximated with iterative methods. This dissertation analyzes how data analysis accuracy depends on the numerical accuracy of the eigensolver. This leads to new bounds on the residual tolerance necessary to guarantee correct partitioning. We present a novel stopping criterion for spectral partitioning guaranteed to satisfy the Cheeger inequality, along with an empirical study of the performance on real-world networks such as web, social, and e-commerce networks. This work bridges the gap between numerical analysis and computational data analysis.
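A minimal sketch of spectral partitioning with an iterative eigensolver, for orientation only: it approximates the Fiedler vector of the normalized Laplacian under a finite residual tolerance, splits the graph by its sign, and reports the cut's conductance, the quantity that Cheeger's inequality ties to the second eigenvalue. The graph, tolerance, and sign threshold are illustrative assumptions, not the dissertation's stopping criterion.

```python
# Minimal sketch (not the dissertation's stopping criterion): approximate the
# Fiedler vector of the normalized Laplacian, split by sign, report conductance.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Two 20-node cliques joined by a single edge: an obvious two-way partition.
n = 40
A = sp.lil_matrix((n, n))
A[:20, :20] = 1
A[20:, 20:] = 1
A[19, 20] = A[20, 19] = 1
A.setdiag(0)
A = sp.csr_matrix(A)

deg = np.asarray(A.sum(axis=1)).ravel()
d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
L = sp.identity(n) - d_inv_sqrt @ A @ d_inv_sqrt      # normalized Laplacian

# 'tol' plays the role of the eigensolver residual tolerance studied above.
vals, vecs = eigsh(L, k=2, which="SA", tol=1e-6)
order = np.argsort(vals)
lam2 = vals[order][1]
fiedler = vecs[:, order[1]]

part = fiedler >= 0                                   # sign-based partition
cut = A[part][:, ~part].sum()                         # edges crossing the cut
vol = min(deg[part].sum(), deg[~part].sum())
conductance = cut / vol
# Cheeger's inequality: lam2 / 2 <= conductance <= sqrt(2 * lam2)
print(f"lambda_2 ~ {lam2:.4f}, conductance ~ {conductance:.4f}")
```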
404

Tracing large-scale structure with radio sources

Lindsay, Samuel Nathan January 2015 (has links)
In this thesis, I investigate the spatial distribution of radio sources, and quantify their clustering strength over a range of redshifts, up to z ≈ 2.2, using various forms of the correlation function measured with data from several multi-wavelength surveys. I present the optical spectra of 30 radio AGN (S_1.4 > 100 mJy) in the GAMA/H-ATLAS fields, for which emission-line redshifts could be deduced, from observations of 79 target sources with the EFOSC2 spectrograph on the NTT. The mean redshift of these sources is z = 1.2; 12 were identified as quasars (40 per cent), and 6 redshifts (out of 24 targets) were found for AGN hosts of multiple radio components. While obtaining spectra for hosts of these multi-component sources is possible, their lower success rate highlights the difficulty in achieving a redshift-complete radio sample. Taking an existing spectroscopic redshift survey (GAMA) and radio sources from the FIRST survey (S_1.4 > 1 mJy), I then present a cross-matched radio sample with 1,635 spectroscopic redshifts with a median value of z = 0.34. The spatial correlation function of this sample is used to find the redshift-space (s_0) and real-space correlation lengths (r_0 ≈ 8.2 h^-1 Mpc), and a mass bias of ≈1.9. Insight into the redshift dependence of these quantities is gained by using the angular correlation function and Limber inversion to measure the same spatial clustering parameters. Photometric redshifts from SDSS/UKIDSS are incorporated to produce a larger matched radio sample at z ≈ 0.48 (and low- and high-redshift subsamples at z ≈ 0.30 and z ≈ 0.65), while their redshift distribution is subtracted from that taken from the SKADS radio simulations to estimate the redshift distribution of the remaining unmatched sources (z ≈ 1.55). The observed bias evolution over this redshift range is compared with model predictions based on the SKADS simulations, with good agreement at low redshift. The bias found at high redshift significantly exceeds these predictions, however, suggesting a more massive population of galaxies than expected, either due to the relative proportions of different radio sources, or a greater typical halo mass for the high-redshift sources. Finally, the reliance on a model redshift distribution to reach higher redshifts is removed, as the angular cross-correlation function is used with deep VLA data (S_1.4 > 90 μJy) and optical/IR data from VIDEO/CFHTLS (Ks < 23.5) over 1 square degree. With high-quality photometric redshifts up to z ≈ 4, and a high signal-to-noise clustering measurement (due to the ≈100,000 Ks-selected galaxies), I am able to find the bias of a matched sample of only 766 radio sources (as well as of the VIDEO sources), divided into 4 redshift bins reaching a median bias at z ≈ 2.15. Again, at high redshift, the measured bias appears to exceed the prediction made from the SKADS simulations. Applying luminosity cuts to the radio sample at L > 10^23 W Hz^-1 and higher (removing any non-AGN sources), I find a bias of 8–10 at z ≈ 1.5, considerably higher than for the full sample, and consistent with the more numerous FRI AGN having similar mass to the FRIIs (M ≈ 10^14 M_⊙), contrary to the assumptions made in the SKADS simulations. Applying this adjustment to the model bias produces a better fit to the observations for the FIRST radio sources cross-matched with GAMA/SDSS/UKIDSS, as well as for the high-redshift radio sources in VIDEO.
Therefore, I have shown that we require a more robust model of the evolution of AGN, and their relation to the underlying dark matter distribution. In particular, understanding these quantities for the abundant FRI population is crucial if we are to use such sources to probe the cosmological model as has been suggested by a number of authors (e.g. Raccanelli et al., 2012; Camera et al., 2012; Ferramacho et al., 2014).
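For orientation, a minimal sketch of the Landy–Szalay estimator w(θ) = (DD − 2DR + RR) / RR that underlies angular clustering measurements like those in the abstract above; the catalogues are synthetic and the flat-sky binning is an illustrative assumption, not the thesis's analysis.

```python
# Hedged sketch of the Landy-Szalay angular correlation estimator on a small
# flat-sky patch; the catalogues and binning are synthetic illustrations.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
data = rng.uniform(0.0, 1.0, size=(2000, 2))    # positions in a ~1 deg^2 patch
rand = rng.uniform(0.0, 1.0, size=(20000, 2))   # random (unclustered) catalogue

bins = np.logspace(-3, -0.5, 11)                # angular bins in degrees

def pair_counts(a, b, bins):
    """Cumulative cross pair counts within each radius, differenced into bins."""
    ta, tb = cKDTree(a), cKDTree(b)
    return np.diff(ta.count_neighbors(tb, bins))

dd = pair_counts(data, data, bins).astype(float)
dr = pair_counts(data, rand, bins).astype(float)
rr = pair_counts(rand, rand, bins).astype(float)

# Normalize by the number of pairs in each catalogue combination.
nd, nr = len(data), len(rand)
dd /= nd * (nd - 1)
dr /= nd * nr
rr /= nr * (nr - 1)

w_theta = (dd - 2.0 * dr + rr) / rr             # Landy-Szalay estimator
print(np.round(w_theta, 3))
```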
405

Bayesian-based techniques for tracking multiple humans in an enclosed environment

ur-Rehman, Ata January 2014 (has links)
This thesis deals with the problem of online visual tracking of multiple humans in an enclosed environment. The focus is to develop techniques to deal with the challenges of a varying number of targets, inter-target occlusions, and interactions when every target gives rise to multiple measurements (pixels) in every video frame. This thesis contains three different contributions to research in multi-target tracking. Firstly, a multiple target tracking algorithm is proposed which focuses on mitigating the inter-target occlusion problem during complex interactions. This is achieved with the help of a particle filter, multiple video cues and a new interaction model. A Markov chain Monte Carlo particle filter (MCMC-PF) is used along with a new interaction model which helps in modeling the interactions of multiple targets. This helps to overcome tracking failures due to occlusions. A new weighted Markov chain Monte Carlo (WMCMC) sampling technique is also proposed which assists in achieving a reduced tracking error. Although effective, this technique aggregates measurements into features to accommodate the multiple measurements (pixels) produced by every target, which results in information loss. In the second contribution, a novel variational Bayesian clustering-based multi-target tracking framework is proposed which can associate multiple measurements to every target without aggregating them into features. It copes with complex inter-target occlusions by maintaining the identity of targets during their close physical interactions, and it efficiently handles a time-varying number of targets. The proposed multi-target tracking framework consists of background subtraction, clustering, data association and particle filtering. A variational Bayesian clustering technique groups the extracted foreground measurements, while an improved feature-based joint probabilistic data association filter (JPDAF) is developed to associate clusters of measurements to every target. The data association information is used within the particle filter to track multiple targets. The clustering results are further utilised to estimate the number of targets. The proposed technique improves the tracking accuracy. However, the proposed feature-based JPDAF technique causes the computational complexity of the overall framework to grow exponentially with the number of targets. In the final work, a novel data association technique for multi-target tracking is proposed which assigns multiple measurements to every target more efficiently, with a reduced computational complexity. A belief propagation (BP) based cluster-to-target association method is proposed which exploits the inter-cluster dependency information. Both the location and features of clusters are used to re-identify the targets when they emerge from occlusions. The proposed techniques are evaluated on benchmark data sets and their performance is compared with state-of-the-art techniques using quantitative and global performance measures.
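An illustrative sketch of the variational Bayesian clustering step described above, using a Dirichlet-process-style mixture so that the effective number of clusters can follow a varying number of targets; the synthetic pixel data, component cap, and weight threshold are assumptions, not the thesis's framework.

```python
# Illustrative sketch (not the thesis's full framework): variational Bayesian
# clustering of foreground pixel coordinates; the effective number of active
# mixture components stands in for the estimated number of targets.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# Foreground pixel (x, y) coordinates from three simulated "people".
pixels = np.vstack([rng.normal([40, 100], 6, size=(300, 2)),
                    rng.normal([120, 95], 6, size=(300, 2)),
                    rng.normal([200, 110], 6, size=(300, 2))])

vb = BayesianGaussianMixture(
    n_components=10,                                  # upper bound on targets
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01,                  # favours few active clusters
    random_state=0,
).fit(pixels)

active = vb.weights_ > 0.02                           # clusters that received mass
print("estimated number of targets:", int(active.sum()))
print("estimated centroids:\n", np.round(vb.means_[active], 1))
```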
406

On the Autoregressive Conditional Heteroskedasticity Models

Stenberg, Erik January 2016 (has links)
No description available.
407

News Feeds Clustering Research Study

Abuel-Futuh, Haytham 01 April 2015 (has links)
With over 0.25 billion web pages hosted on the World Wide Web, it is virtually impossible to navigate the Internet unaided. Many applications try to help users with this task. For example, search engines build indexes to make the entire World Wide Web searchable, and news curators allow users to browse topics of interest on different structured sites. One problem that arises for these applications, and others with similar goals, is identifying documents with similar contents. Solving it helps the applications show users documents with unique contents as well as group similar documents under common topics. Much effort has gone into algorithms that can achieve this task. Prior research includes Yang, Pierce & Carbonell (1998), who looked at the problem of identifying news events by exploiting chronological order; Nallapati et al. (2004), who built a dependency model for news events; and Shah & Elbahesh (2004), who used the Jaccard coefficient to generate a flat list of topics. This research will identify training and testing datasets, and it will train and evaluate the Pera & Ng algorithm. The chosen algorithm is a hierarchical clustering algorithm that incorporates many of the ideas researched earlier. In the evaluation phase, error will be measured as the ratio of mis-categorized documents to the total number of documents. The research will show that the error can be as low as 0.03 with a model built on a single node processing 1,000 random distinct documents. The experiments will show that Pera & Ng's fuzzy-equivalence algorithm produces acceptable results when compared against Google News as a reference. The algorithm, however, requires a huge amount of memory to hold the trained model, which renders it unsuitable for portable devices.
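A minimal sketch of Jaccard-based hierarchical clustering of news items, in the spirit of the approach the study evaluates; the documents and the dendrogram cut threshold are illustrative assumptions.

```python
# Minimal sketch: Jaccard-distance agglomerative clustering of short news texts.
# The documents and the cut threshold are illustrative, not the study's data.
from itertools import combinations
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

docs = [
    "central bank raises interest rates to curb inflation",
    "interest rates rise again as inflation stays high",
    "local team wins championship after dramatic final",
    "championship final ends in dramatic win for local team",
    "new smartphone model announced with larger battery",
]
tokens = [set(d.lower().split()) for d in docs]

n = len(docs)
dist = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    jaccard = len(tokens[i] & tokens[j]) / len(tokens[i] | tokens[j])
    dist[i, j] = dist[j, i] = 1.0 - jaccard          # Jaccard distance

Z = linkage(squareform(dist), method="average")       # agglomerative clustering
labels = fcluster(Z, t=0.8, criterion="distance")     # cut the dendrogram
for doc, label in zip(docs, labels):
    print(label, doc)
```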
408

Automated protein-family classification based on hidden Markov models

Frisk, Christoffer January 2015 (has links)
The aim of the project presented in this paper was to investigate the possibility of automatically sub-classifying the superfamily of Short-chain Dehydrogenases/Reductases (SDR). This was done based on an algorithm previously designed to sub-classify the superfamily of Medium-chain Dehydrogenases/Reductases (MDR). While the SDR family is interesting and important to sub-classify, there was also a focus on making the process as automatic as possible so that future families can also be classified using the same methods. To validate the generated results, they were compared to previous sub-classifications of the SDR family. The results proved promising, and the work conducted here can be seen as a good initial part of a more comprehensive full investigation.
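A hedged sketch of the final assignment step in HMM-based family classification: given per-subfamily profile-HMM scores for each sequence (hypothetical numbers standing in for real scores), each sequence is assigned to the best-scoring subfamily if it clears a threshold, otherwise it is left unclassified. The subfamily names, scores, and cutoff are all assumptions.

```python
# Hypothetical illustration of the assignment step: per-subfamily HMM scores
# (stand-ins for real profile-HMM output) mapped to subfamily labels.
import numpy as np

families = ["SDR_subfam_A", "SDR_subfam_B", "SDR_subfam_C"]   # hypothetical names
sequences = ["seq1", "seq2", "seq3", "seq4"]

# Hypothetical log-odds scores: rows = sequences, columns = subfamily HMMs.
scores = np.array([
    [152.3,  12.1,   8.7],
    [ 10.4, 133.9,  15.2],
    [  9.8,  11.0,  14.6],    # no strong hit anywhere
    [ 98.5, 101.2,  20.3],    # ambiguous between A and B
])

THRESHOLD = 50.0              # illustrative acceptance cutoff

for name, row in zip(sequences, scores):
    best = int(np.argmax(row))
    if row[best] >= THRESHOLD:
        print(f"{name} -> {families[best]} (score {row[best]:.1f})")
    else:
        print(f"{name} -> unclassified (best score {row[best]:.1f})")
```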
409

The role of talent in firm location decision: A multiple-case study of clean-tech firms in Uppsala

Schröder, Catharina, Azargoon, Sara January 2016 (has links)
The shift from an industrial to a knowledge-based economy has impacted market conditions and created a demand for a talented and skilled workforce in knowledge-intensive industries. This paper investigates the role of talent when firms decide on a location, by carrying out two studies. First, an extensive literature review was conducted, in which three firm location decision factors were identified: clustering, soft and hard factors, and personal networks. The role of talent repeatedly emerged in the literature in relation to these three factors and appeared to be intertwined with them. Consequently, these factors and the role of talent were conceptualized in an analytical framework. Thereafter, the analytical framework was applied in the second study, a multiple-case study of three clean-tech firms in Uppsala, Sweden, in order to investigate the role of talent in the firms' location decisions. The findings of the multiple-case study revealed that talent played an important role in the location decisions of all three firms, intertwined with the identified location factors. Thus, the multiple-case study confirms that talent impacts firm location decisions by being intertwined with the identified firm location decision factors.
410

Optimization of Nodes in Mixed Network Using Three Distance Measures

Woldearegay, Yonas, Traore, Oumar 10 1900 (has links)
ITC/USA 2011 Conference Proceedings / The Forty-Seventh Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2011 / Bally's Las Vegas, Las Vegas, Nevada / This paper presents a method for the management of mixed networks as envisioned in future iNET applications and develops a scheme for globally optimal performance across features that include Signal-to-Noise Ratio (SNR), Quality of Service (QoS), and interference. This scheme demonstrates potential for significant performance enhancement in the dense traffic environments envisioned in future telemetry applications. Previous research conducted at Morgan State University proposed a cellular and ad hoc mixed network for optimum capacity and coverage using two distance measures: QoS and SNR. This paper adds interference as a third distance measure, using an analytical approach and extensive simulation in MATLAB. The paper also addresses solutions where the performance parameters are correlated and where they are uncorrelated. The simulations show the optimization of mixed-network nodes using distance, traffic, and interference measures simultaneously. This has great potential in mobile communication and iNET.
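An illustrative sketch, not the paper's algorithm: combine three normalized "distance" measures (SNR, QoS delay, and interference) into a single weighted cost per candidate node and select the best one. The node values and weights are hypothetical.

```python
# Hypothetical illustration: weighted combination of three normalized measures
# into one cost per candidate node; values and weights are made up.
import numpy as np

nodes = ["N1", "N2", "N3", "N4"]
snr_db = np.array([22.0, 15.0, 28.0, 18.0])                 # higher is better
qos_delay_ms = np.array([40.0, 25.0, 60.0, 30.0])           # lower is better
interference_dbm = np.array([-80.0, -70.0, -85.0, -75.0])   # lower is better

def normalize(x):
    """Scale a measure linearly onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Convert every measure to "cost" form (lower is better), then weight and sum.
cost = (0.4 * (1.0 - normalize(snr_db))       # penalize low SNR
        + 0.3 * normalize(qos_delay_ms)       # penalize long delay
        + 0.3 * normalize(interference_dbm))  # penalize high interference

best = int(np.argmin(cost))
print("per-node cost:", np.round(cost, 3))
print("selected node:", nodes[best])
```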
