401 |
Model-based Learning: t-Families, Variable Selection, and Parameter Estimation
Andrews, Jeffrey Lambert 27 August 2012
The phrase model-based learning describes the use of mixture models in machine learning problems. This thesis focuses on a number of issues surrounding the use of mixture models in statistical learning tasks, including clustering, classification, discriminant analysis, variable selection, and parameter estimation. After motivating the importance of statistical learning via mixture models, five papers are presented. For ease of consumption, the papers are organized into three parts: mixtures of multivariate t-families, variable selection, and parameter estimation. / Natural Sciences and Engineering Research Council of Canada through a doctoral postgraduate scholarship.
|
402 |
Automatic text summarization in digital libraries
Mlynarski, Angela, University of Lethbridge. Faculty of Arts and Science January 2006
A digital library is a collection of services and information objects for storing, accessing, and retrieving digital objects. Automatic text summarization presents salient information in a condensed form suitable for user needs. This thesis amalgamates digital libraries and automatic text summarization by extending the Greenstone Digital Library software suite to include the University of Lethbridge Summarizer. The tool generates summaries, nouns, and noun phrases for use as metadata for searching and browsing digital collections. Digital collections of newspapers, PDFs, and eBooks were created with summary metadata. PDF documents were processed the fastest at 1.8 MB/hr, followed by the newspapers at 1.3 MB/hr, with eBooks being the slowest at 0.9 MB/hr. Qualitative analysis of four genres (newspaper, M.Sc. thesis, novel, and poetry) revealed that narrative newspapers were most suitable for automatically generated summarization. The other genres suffered from incoherence and information loss. Overall, summaries for digital collections are suitable when used with newspaper documents and unsuitable for other genres. / xiii, 142 leaves ; 28 cm.
|
403 |
Market segmentation and factors affecting stock returns on the JSE.
Chimanga, Artwell S. January 2008
This study examines the relationship between stock returns and market segmentation. Monthly returns of stocks listed on the JSE from 1997 to 2007 are analysed, mostly using factor analysis and cluster analysis techniques. Evidence supporting the use of multi-index models in explaining the return generating process on the JSE is found. The results provide additional support for Van Rensburg's (1997) hypothesis on market segmentation on the JSE.
|
404 |
Nonnegative matrix factorization for clustering
Kuang, Da 27 August 2014
This dissertation shows that nonnegative matrix factorization (NMF) can be extended to a general and efficient clustering method. Clustering is one of the fundamental tasks in machine learning. It is useful for unsupervised knowledge discovery in a variety of applications such as text mining and genomic analysis. NMF is a dimension reduction method that approximates a nonnegative matrix by the product of two lower-rank nonnegative matrices, and has shown great promise as a clustering method when a data set is represented as a nonnegative data matrix. However, challenges in the widespread use of NMF as a clustering method lie in its correctness and efficiency: first, we need to know why and when NMF can detect the true clusters and be guaranteed to deliver good clustering quality; second, existing algorithms for computing NMF are expensive and often take longer than other clustering methods. We show that the original NMF can be improved in both respects in the context of clustering. Our new NMF-based clustering methods can achieve better clustering quality and run orders of magnitude faster than the original NMF and other clustering methods.
Like other clustering methods, NMF places an implicit assumption on the cluster structure. Thus, the success of NMF as a clustering method depends on whether the representation of data in a vector space satisfies that assumption. Our approach to extending the original NMF to a general clustering method is to switch from the vector space representation of data points to a graph representation. The new formulation, called Symmetric NMF, takes a pairwise similarity matrix as an input and can be viewed as a graph clustering method. We evaluate this method on document clustering and image segmentation problems and find that it achieves better clustering accuracy. In addition, for the original NMF, it is difficult but important to choose the right number of clusters. We show that consensus NMF, widely used in genomic analysis for choosing the number of clusters, has critical flaws and can produce misleading results. We propose a variation of the prediction strength measure arising from statistical inference to evaluate the stability of clusters and select the right number of clusters. Our measure shows promising performance in artificial simulation experiments.
Large-scale applications bring substantial efficiency challenges to existing algorithms for computing NMF. An important example is topic modeling, where users want to uncover the major themes in a large text collection. Our strategy for accelerating NMF-based clustering is to design algorithms that better suit the computer architecture and exploit the computing power of parallel platforms such as graphics processing units (GPUs). A key observation is that recursively applying rank-2 NMF, which partitions a data set into two clusters, is much faster than applying the original NMF to obtain a flat clustering. We take advantage of a special property of rank-2 NMF and design an algorithm that runs faster than existing algorithms due to continuous memory access. Combined with a criterion to stop the recursion, our hierarchical clustering algorithm runs significantly faster and achieves even better clustering quality than existing methods. Another bottleneck of NMF algorithms, which is also a common bottleneck in many other machine learning applications, is multiplying a large sparse data matrix by a tall-and-skinny dense matrix. We use GPUs to accelerate this routine for sparse matrices with an irregular sparsity structure. Overall, our algorithm shows significant improvement over popular topic modeling methods such as latent Dirichlet allocation, and runs more than 100 times faster on data sets with millions of documents.
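To make the basic idea concrete, the following is a minimal sketch of plain NMF used for clustering, not the dissertation's own algorithms. It assumes the classical multiplicative update rules for the Frobenius-norm objective; the function name `nmf_cluster`, its parameters, and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf_cluster(X, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix X (features x samples) as W @ H
    with W, H >= 0, then assign each sample (column) to the component
    with the largest coefficient in H.

    Assumption: classical multiplicative updates, not the faster
    algorithms described in the abstract above.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps  # m x k nonnegative basis
    H = rng.random((k, n)) + eps  # k x n nonnegative coefficients
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    labels = H.argmax(axis=0)  # cluster index for each column of X
    return W, H, labels

# Synthetic toy data: 4 features x 6 samples forming two obvious groups.
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1, 1]], dtype=float)
W, H, labels = nmf_cluster(X, k=2)
```

Reading the cluster off the largest entry in each column of `H` is one simple convention; which of the two label values each group receives depends on the random initialization.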
|
405 |
Clusteranalyse der Gemeinden in der Kernregion Mitteldeutschland [Cluster analysis of the municipalities in the core region of Central Germany]
Geyler, Stefan, Warner, Barbara, Brandl, Anja, Kuntze, Martina 19 September 2014
This volume presents a typology of the municipalities in the core region of Central Germany, derived by means of a cluster analysis. This multivariate method integrates aspects of spatial structure, demographic and economic development, technical and transport infrastructure, and public finances. The 16 indicators, selected from a larger data set, focus on important development trends, the current situation, and the framework conditions of the individual municipalities. The aim is to group municipalities with similar characteristics on this basis in order to identify reference municipalities with exemplary starting conditions and problems. In the further course of the research, these will be used to analyse conflicts between planning and local-policy goals and to develop instrumental options for reducing the use of land for housing, commerce, and transport through stronger inter-municipal cooperation.
|
406 |
GIS and cluster analysis : understanding settlement systems in early Christian Ireland
Anderson, Jason Michael January 1997
Using cluster analysis and a geographic information system (GIS), this study attempted to identify a settlement system in the Dingle Peninsula of Early Christian Ireland based on the morphological variability of ringforts. Cluster analysis was used to determine whether an intuitive ringfort typological model created by the author had validity. The cluster analysis identified three distinct classes of univallate ringfort. Although these clusters have a higher variable mean than anticipated, they do appear to verify partial validity of the author's model. With the exception of Cluster 1, the assumption that as univallate ringfort banks increase in elaboration, so does their internal diameter, appears to hold. ARC/INFO, a GIS, was used to help test the hypothesized relationship between ringfort clusters. It was assumed that the univallate ringforts with the smallest banks would be very close to and in the line of sight of bivallate and multivallate ringforts. Those with an intermediate bank size would tend to be farther away and not in the line of sight of bivallate and multivallate ringforts. These assumptions were determined to be invalid. / Department of Anthropology
|
407 |
Skills and competencies employers require from supply chain graduates : A job advertisements content analysis
Grigoriadis, Nikolaos January 2014
Background: The skills and competencies of professionals in the supply chain sector have been highlighted as an area of academic interest since the 1960s. Recent reports and articles highlight a “skills gap” between employers’ requirements and business graduates. Meanwhile, youth unemployment is an acknowledged contemporary European problem, so there should be no gap between the supply of and demand for young talent. This raises the question of why employers report a lack of young talent while youth unemployment is on the rise. Purpose: This thesis answers part of that question by measuring employers’ expectations. It investigates, in a transparent and systematic way, the requirements that employers state they expect from business graduates within the supply chain function, as expressed in published job advertisements. Method: The empirical data consist of 60 publicly available job advertisements aimed at supply chain graduates. The data were analysed by means of quantitative content analysis followed by cluster analysis. Results and conclusion: The contemporary supply chain graduate is expected to demonstrate an all-around personality. The most frequently requested skills were teamwork, problem-solving ability, effective communication, English, and a responsible, mature and professional attitude. Suggestions for future research: A longitudinal study in a broader linguistic context would raise awareness of emerging skills and track changes over time.
|
408 |
Extending low-rank matrix factorizations for emerging applications
Zhou, Ke 13 January 2014
Low-rank matrix factorizations have become increasingly popular for projecting high-dimensional data into low-dimensional latent spaces in order to obtain a better understanding of the data and thus more accurate predictions. In particular, they have been widely applied to important applications such as collaborative filtering and social network analysis. In this thesis, I investigate applications and extensions of low-rank matrix factorization to solve several practically important problems arising from collaborative filtering and social network analysis.
A key challenge in recommendation system research is how to effectively profile new users, a problem generally known as cold-start recommendation. In the first part of this work, we extend low-rank matrix factorization by allowing the latent factors to have more complex structures, namely decision trees, to solve the problem of cold-start recommendations. In particular, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of adaptive interview construction based on low-rank matrix factorizations.
The second part of this work considers the efficiency problem of making recommendations in the context of large user and item spaces. Specifically, we address the problem through learning binary codes for collaborative filtering, which can be viewed as restricting the latent factors in low-rank matrix factorizations to be binary vectors that represent the binary codes for both users and items.
In the third part of this work, we investigate the applications of low-rank matrix factorizations in the context of social network analysis. Specifically, we propose a convex optimization approach to discover the hidden network of social influence with low-rank and sparse structure by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes, emphasizing the mutual-excitation nature of the dynamics of event occurrences. The proposed framework combines the estimation of mutually exciting processes and low-rank matrix factorization in a principled manner.
In the fourth part of this work, we estimate the triggering kernels for the Hawkes process. In particular, we focus on estimating the triggering kernels from an infinite-dimensional functional space with the Euler-Lagrange equation, which can be viewed as applying the idea of low-rank factorizations in the functional space.
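For readers unfamiliar with the model named in the third and fourth parts, the standard conditional intensity of a multi-dimensional Hawkes process can be written as below. The notation is an assumption made here for illustration, and the regularized objective is one common convex way to encourage a low-rank and sparse influence matrix; neither is quoted from the thesis.

```latex
% Conditional intensity of individual u in a U-dimensional Hawkes process.
% \mu_u \ge 0 : base rate of individual u
% a_{uv} \ge 0 : influence of individual v's events on individual u
% g : triggering kernel; t_i^{v} : time of the i-th past event of individual v
\lambda_u(t) = \mu_u + \sum_{v=1}^{U} a_{uv} \sum_{i \,:\, t_i^{v} < t} g\!\left(t - t_i^{v}\right)

% Assumed illustrative objective: negative log-likelihood L of the observed
% events plus a nuclear-norm (low-rank) and an l1 (sparsity) penalty on
% A = [a_{uv}], with hypothetical weights \rho_1, \rho_2 > 0.
\min_{A \ge 0,\; \mu \ge 0} \; -\log L(A, \mu) + \rho_1 \lVert A \rVert_{*} + \rho_2 \lVert A \rVert_{1}
```

For a fixed kernel g, the log-likelihood is concave in A and the base rates, so the penalized problem is convex.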
|
409 |
Cluster Potential In Industrial Sectors Of Samsun: Kutlukent Furniture Cluster Study
Bozkirlioglu, Ali 01 December 2004
The present study investigated whether cluster potentials could be identified in the geographical area within the boundaries of Samsun province and, if so, how such a potential could be promoted through corresponding support measures. Developing policy recommendations for the promotion of the identified cluster potential was the principal goal of the study. The course of the study was characterized by a cluster-based policy-making process in the policy environment, i.e. Samsun province. The process includes a descriptive part, i.e. cluster analysis, and a prescriptive part, i.e. determining policy goals and designing policy instruments. In the literature review, a guide to the field study was developed by reviewing various approaches to the cluster concept; common features of clusters and the competitive advantages these give rise to; various practices in cluster-based policy development; and various cluster analysis methods. The field study starts with the initial identification of the need for policy intervention, at which stage the rationale for pursuing a cluster-based policy in the specific conditions of Samsun and Turkey was discussed. The “clusters as sectors” approach was used to identify the region’s (potential) clusters and to select the cluster that would be the subject of analysis and policy development. The analysis of industrial sectors in Samsun’s economy was followed by selection of the target sector using various criteria that assess the importance of these sectors in terms of value added to the regional economy and their clustering potential. Accordingly, the furniture sector was selected, and the agglomeration of furniture sector enterprises in the Kutlukent locality was identified as the potential cluster to be the subject of analysis and policy development.
Following the identification of the potential cluster, the descriptive part was completed by a second-stage micro-level analysis of the identified potential cluster, which presented detailed information about it. At that phase, the cluster potential of the structure was assessed by examining the elements in the cluster value and production chain; public and private business support infrastructure; the flow of materials and goods in the chain; untraded relationships between the elements; characteristics of enterprises and workforce; and innovation performance. This comprehensive in-depth analysis provided the information required to identify the specific needs of the cluster for cluster-based policy intervention. In the last part of the thesis, i.e. the prescriptive part, cluster-oriented policy recommendations were developed, including the determination of the policy goal and the design and selection of policy instruments.
The necessary information was collected through two-stage expert interviews and an enterprise survey covering the whole cluster, which was carried out through interviews with all of the enterprises. Six experts and 283 enterprises participated in the study. The results of the analysis showed that, while the Kutlukent furniture cluster had some features that are common in effective cluster models, it lacked some critical features that are crucial for the effective functioning of a successful cluster. Hence, the Kutlukent furniture cluster was defined as a “potential” cluster, which should be promoted by utilizing the existing potentials and strengths, and by addressing the weaknesses and obstacles identified in the analysis of the cluster, via appropriate cluster-oriented policy measures, which were proposed in the prescriptive part of the policy-making process. Through these measures, the elements of the Kutlukent potential cluster would be able to realize the competitive advantages associated with clustering, as in successful cluster models.
|
410 |
Macroeconomic Study of Construction Firm's Profitability Using Cluster Analysis
Arora, Parth August 2012
This research aims to identify important factors contributing to a construction firm's profitability and to develop a prediction model that would help in determining the gross margin/profitability of a construction firm as a function of important parameters. All the data used in the research were taken from U.S. Census Bureau reports. The novelty of the research lies in its focus on the state level: states are divided into pertinent clusters and the trends in each cluster are then analyzed independently.
The research was divided into two phases. Phase 1 focused on identifying the most important factors contributing to the gross margin of a construction firm. The variables used were derived from the U.S. Census Bureau data. Based on the independent variables and gross margin, all the states were divided into three clusters. Subsequently, a prediction model was developed for each cluster using step-wise backward elimination, thus eliminating non-significant variables.
The results of Model 1 gave impetus to developing Model 2. Model 1 clearly showed that labor productivity was the most important variable in determining gross margin. Model 2 was developed to predict gross margin as a function of the single most important factor, labor productivity. As with Model 1, states were clustered based on their labor productivity and gross margin values, and a prediction model was developed for each cluster.
In this study, an Excel-embedded decision support tool was also developed. This tool would help decision-makers view a state's level of gross margin and labor productivity at a glance. The decision support tool took the form of color-coded maps, each of which was linked to a spreadsheet containing pertinent data.
The most important conclusion of the research was that there exists a positive linear relationship between labor productivity and gross margin at a state level in the construction industry. The research also identified and quantified other important factors like percent of rental equipment used, percent of construction work sub-contracted out and percent of cost of materials, components and supplies which affect gross margin.
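The per-cluster modeling step described for Model 2 can be sketched as follows. This is only an illustration under stated assumptions: the cluster labels are taken as already computed, the helper name `fit_line_per_cluster` is hypothetical, and the numbers are synthetic placeholders, not U.S. Census Bureau data or results from the thesis.

```python
import numpy as np

def fit_line_per_cluster(x, y, labels):
    """Fit y ~ slope * x + intercept separately within each cluster.

    x: predictor (standing in for labor productivity), y: response
    (standing in for gross margin), labels: precomputed cluster index
    per observation. All three are 1-D arrays of equal length.
    """
    fits = {}
    for c in np.unique(labels):
        mask = labels == c
        # np.polyfit returns coefficients from highest degree to lowest.
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        fits[int(c)] = (float(slope), float(intercept))
    return fits

# Synthetic placeholder values, for illustration only.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([3.0, 5.0, 7.0, 8.0, 8.5, 9.0])
labels = np.array([0, 0, 0, 1, 1, 1])
fits = fit_line_per_cluster(x, y, labels)
```

A positive slope within a cluster corresponds to the kind of positive linear relationship the abstract reports; the thesis's own models and variable selection are not reproduced here.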
|