Global ETD Search

241	Feature Ranking for Text Classifiers Makrehchi, Masoud January 2007 (has links) Feature selection based on feature ranking has received much attention from researchers in the field of text classification. The major reasons are the scalability, ease of use, and fast computation of feature ranking methods. However, compared to search-based feature selection methods such as wrappers and filters, they suffer from poor performance. This is linked to their major deficiencies, including: (i) feature ranking is problem-dependent; (ii) they ignore term dependencies, including redundancies and correlation; and (iii) they usually fail on unbalanced data. While using feature ranking methods for dimensionality reduction, we should be aware of these drawbacks, which arise from the function of feature ranking methods. In this thesis, a set of solutions is proposed to handle the drawbacks of feature ranking and boost its performance. First, an evaluation framework called feature meta-ranking is proposed to evaluate ranking measures. The framework is based on a newly proposed Differential Filter Level Performance (DFLP) measure. It was proved that, in ideal cases, the performance of a text classifier is a monotonic, non-decreasing function of the number of features. Then we theoretically and empirically validate the effectiveness of DFLP as a meta-ranking measure to evaluate and compare feature ranking methods. The meta-ranking framework is also examined on a stopword extraction problem. We use the framework to select an appropriate feature ranking measure for building domain-specific stoplists. The proposed framework is evaluated by SVM and Rocchio text classifiers on six benchmark data sets. The meta-ranking method suggests that in searching for a proper feature ranking measure, the backward feature ranking is as important as the forward one. Second, we show that the destructive effect of term redundancy gets worse as we decrease the feature ranking threshold. 
It implies that for aggressive feature selection, an effective redundancy reduction should be performed as well as feature ranking. An algorithm based on extracting term dependency links using an information theoretic inclusion index is proposed to detect and handle term dependencies. The dependency links are visualized by a tree structure called a term dependency tree. By grouping the nodes of the tree into two categories, hub nodes and link nodes, a heuristic algorithm is proposed to handle the term dependencies by merging or removing the link nodes. The proposed method of redundancy reduction is evaluated by SVM and Rocchio classifiers on four benchmark data sets. According to the results, redundancy reduction is more effective on weak classifiers since they are more sensitive to term redundancies. It also suggests that in those feature ranking methods which compact the information in a small number of features, aggressive feature selection is not recommended. Finally, to deal with class imbalance at the feature level using ranking methods, a local feature ranking scheme called the reverse discrimination approach is proposed. The proposed method is applied to a highly unbalanced social network discovery problem. In this case study, the problem of learning a social network is translated into a text classification problem using newly proposed actor and relationship modeling. Since social networks are usually sparse structures, the corresponding text classifiers become highly unbalanced. Experimental assessment of the reverse discrimination approach validates the effectiveness of the local feature ranking method in improving classifier performance when dealing with unbalanced data. The application itself suggests a new approach to learning social structures from textual data. Feature Ranking Feature Selection Information Theory Text Classifier Social Network Link Mining Electrical and Computer Engineering
242	Ranked Retrieval in Uncertain and Probabilistic Databases Soliman, Mohamed January 2011 (has links) Ranking queries are widely used in data exploration, data analysis and decision making scenarios. While most of the currently proposed ranking techniques focus on deterministic data, several emerging applications involve data that are imprecise or uncertain. Ranking uncertain data raises new challenges in query semantics and processing, making conventional methods inapplicable. Furthermore, the interplay between ranking and uncertainty models introduces new dimensions for ordering query results that do not exist in the traditional settings. This dissertation introduces new formulations and processing techniques for ranking queries on uncertain data. The formulations are based on a marriage of traditional ranking semantics with possible worlds semantics under widely adopted uncertainty models. In particular, we focus on studying the impact of tuple-level and attribute-level uncertainty on the semantics and processing techniques of ranking queries. Under the tuple-level uncertainty model, we introduce a processing framework leveraging the capabilities of relational database systems to recognize and handle data uncertainty in score-based ranking. The framework encapsulates a state space model and efficient search algorithms that compute query answers by lazily materializing the necessary parts of the space. Under the attribute-level uncertainty model, we give a new probabilistic ranking model, based on partial orders, to encapsulate the space of possible rankings originating from uncertainty in attribute values. We present a set of efficient query evaluation algorithms, including sampling-based techniques based on the theory of Markov chains and the Monte Carlo method, to compute query answers. We build on our techniques for ranking under attribute-level uncertainty to support rank join queries on uncertain data. 
We show how to extend current rank join methods to handle uncertainty in scoring attributes. We provide a pipelined query operator implementation of an uncertainty-aware rank join algorithm integrated with sampling techniques to compute query answers. Ranking Uncertainty Probabilistic Models Query Processing Top-k Partial Order Computer Science
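As a rough illustration of the possible worlds semantics mentioned in the abstract above, the sketch below enumerates every possible world of a tiny relation under tuple-level uncertainty and computes the probability that each tuple is ranked first. The tuple names, scores, membership probabilities, and the assumption that tuples are independent are all hypothetical and are not taken from the dissertation, which describes far more efficient search algorithms than this brute-force enumeration.

```python
from itertools import product


def top1_probabilities(tuples):
    """Probability that each tuple is the top-ranked one.

    tuples: list of (name, score, membership_probability).
    Assumption (ours, for illustration): tuples exist independently.
    """
    result = {name: 0.0 for name, _, _ in tuples}
    # Each possible world is one choice of present/absent for every tuple.
    for world in product([True, False], repeat=len(tuples)):
        world_prob = 1.0
        present = []
        for included, (name, score, prob) in zip(world, tuples):
            world_prob *= prob if included else 1.0 - prob
            if included:
                present.append((score, name))
        if present:
            _, best = max(present)  # highest score wins in this world
            result[best] += world_prob
    return result


# Hypothetical data: (name, score, probability that the tuple exists).
data = [("t1", 90, 0.5), ("t2", 80, 0.9), ("t3", 70, 1.0)]
probs = top1_probabilities(data)
```

Enumeration is exponential in the number of tuples, which is why it only serves to make the semantics concrete on a toy input.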
243	Variance Estimation in Steady-State Simulation, Selecting the Best System, and Determining a Set of Feasible Systems via Simulation Batur, Demet 11 April 2006 (has links) In this thesis, we first present a variance estimation technique based on the standardized time series methodology for steady-state simulations. The proposed variance estimator has competitive bias and variance compared to the existing estimators in the literature. We also present the technique of rebatching to further reduce the bias and variance of our variance estimator. Second, we present two fully sequential indifference-zone procedures to select the best system from a number of competing simulated systems when best is defined by the maximum or minimum expected performance. These two procedures have parabola-shaped continuation regions rather than the triangular continuation regions employed in several papers. The procedures we present accommodate unequal and unknown variances across systems and the use of common random numbers. However, we assume that basic observations are independent and identically normally distributed. Finally, we present procedures for finding a set of feasible or near-feasible systems among a finite number of simulated systems in the presence of multiple stochastic constraints, especially when the number of systems or constraints is large. Multiple performance measures Constraints Variance parameter estimation Ranking and selection Simulation output analysis Fully sequential procedures
244	A DEA-based Approach To Ranking Multi-criteria Alternatives Tuncer, Ceren 01 August 2006 (has links) (PDF) M.Sc., Department of Industrial Engineering. Supervisor: Prof. Dr. Murat Köksalan. August 2006, 88 pages. This thesis addresses the problem of ranking multi-criteria alternatives. A Data Envelopment Analysis (DEA)-based approach, the Method of the Area of the Efficiency Score Graph (AES), is proposed. Rather than assessing the alternatives with respect to the fixed original alternative set as done in the existing DEA-based ranking methods, AES considers the change in the efficiency scores of the alternatives while reducing the size of the alternative set. Producing a final score for each alternative that accounts for the progress of its efficiency score, AES favors alternatives that manage to improve quickly and maintain high levels of efficiency. The preferences of the Decision Maker (DM) are incorporated into the analysis in the form of weight restrictions. The utilization of the AES scores of the alternatives in an incremental clustering algorithm is also proposed. The AES Method is applied to rank MBA programs worldwide; sorting of the programs is also performed using their AES scores. Results are compared to another DEA-based ranking method. Keywords: Ranking, data envelopment analysis, weight restrictions.
245	Hybrid Ranking Approaches Based On Data Envelopment Analysis And Outranking Relations Eryilmaz, Utkan 01 December 2006 (has links) (PDF) In this study, two different hybrid approaches for ranking alternatives, based on data envelopment analysis (DEA) and outranking relations, are proposed. Outranking relations are widely used in Multicriteria Decision Making (MCDM) for ranking alternatives and are appropriate in situations where we have limited information on the preference structure of the decision maker (DM). Yet to apply these methods, the DM should provide exact values for method parameters (weights, thresholds, etc.) as well as basic information such as alternative scores. DEA is used for classification of decision making units according to their efficiency scores in a non-parametric way. The proposed hybrid approaches utilize PROMETHEE (a well-known method based on outranking relations) to construct outranking relations by pairwise comparisons and a technique similar to DEA cross-efficiency ranking for aggregating comparisons. While the first of the proposed approaches can deal with imprecise specification of criterion weights, the second approach can utilize imprecise weights and thresholds. HA By Region or Country 175-4737
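The pairwise-comparison step that PROMETHEE contributes to the approaches above can be sketched as follows. This is only the textbook PROMETHEE II net-flow computation with the simplest ("usual") preference function and exact weights; it is not the hybrid DEA-based aggregation proposed in the thesis. The alternatives, criterion values, and weights are hypothetical, and all criteria are assumed to be maximized.

```python
def promethee_net_flows(scores, weights):
    """Net outranking flow of each alternative (PROMETHEE II, usual criterion).

    scores:  {alternative: [criterion values]}, higher is better (assumption).
    weights: criterion weights summing to 1 (assumption: known exactly).
    """
    names = list(scores)
    n = len(names)

    def preference(a, b):
        # Weighted share of criteria on which a is strictly better than b.
        return sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)

    flows = {}
    for a in names:
        others = [b for b in names if b != a]
        positive = sum(preference(a, b) for b in others) / (n - 1)
        negative = sum(preference(b, a) for b in others) / (n - 1)
        flows[a] = positive - negative
    return flows


# Hypothetical alternatives scored on two criteria.
alternatives = {"A": [8, 6], "B": [5, 9], "C": [4, 4]}
flows = promethee_net_flows(alternatives, [0.6, 0.4])
ranking = sorted(flows, key=flows.get, reverse=True)
```

The need to fix `weights` in advance is exactly the requirement the abstract identifies as a burden on the decision maker.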
246	A Service Oriented Peer To Peer Web Service Discovery Mechanism With Categorization Ozorhan, Mustafa Onur 01 March 2010 (has links) (PDF) This thesis studies automated methods to achieve web service advertisement and discovery, and presents efficient search and matching techniques based on OWL-S. In the proposed system, service discovery and matchmaking are performed via a centralized peer-to-peer web service repository. The repository has the ability to run on a software cloud, which improves the availability and scalability of the service discovery. The service advertisement is done semi-automatically on the client side, with an automatic WSDL to OWL-S conversion and manual service description annotation. An OWL-S based unified ontology, the Suggested Upper Merged Ontology, is used during annotation to enhance the semantic matching abilities of the system. The service advertisement and availability are continuously monitored on the client side to improve the accuracy of the query results. User-agents generate query specifications using the system ontology, to provide semantic unification between the client and the system during service discovery. Query matching is performed via complex Hilbert Spaces composed of conceptual planes and categorical similarities for each web service. User preferences following the service queries are monitored and used to improve the service match scores in the long run. QA Computer Software 76.75-76.765
247	A Framework For Ranking And Categorizing Medical Documents Al Zamil, Mohammed Gh. I. 01 June 2010 (has links) (PDF) In this dissertation, we present a framework to enhance the retrieval, ranking, and categorization of text documents in the medical domain. The contributions of this study are the introduction of a similarity model to retrieve and rank medical text documents and the introduction of a rule-based categorization method based on lexical syntactic pattern features. We formulate the similarity model by combining three features to model the relationship among documents and to construct a document network. We aim to rank retrieved documents according to their topics, placing highly relevant documents at the top of the hit list. We have applied this model to the OHSUMED collection (TREC-9) in order to demonstrate its effectiveness in terms of topical ranking, recall, and precision metrics. In addition, we introduce ROLEX-SP (Rules Of LEXical Syntactic Patterns), a method for the automatic induction of rule-based text classifiers that relies on lexical syntactic patterns as a set of features to categorize text documents. The proposed method is designed to address multi-class classification and feature imbalance problems in domain-specific text documents. Furthermore, our proposed method is able to categorize documents according to a predefined set of characteristics, such as user-specific, domain-specific, and query-based categorization, which facilitates browsing documents in search engines and increases users' ability to choose among relevant documents. To demonstrate the applicability of ROLEX-SP, we have performed experiments on OHSUMED (categorization collection). The results indicate that ROLEX-SP outperforms state-of-the-art methods in categorizing short-text medical documents.
248	A New Framework For Evaluation Of Field Based Academic Performances Of Higher Education Institutions Omruuzun, Fatih 01 August 2011 (has links) (PDF) Measurement and evaluation of academic performance is a highly debated research area, and results of the studies in this area are closely followed by a large segment of society. In general, studies conducted in this domain evaluate higher education institutions as a whole, but such an approach actually represents an average performance of the research fields that are actively studied by the members of institutions. This may be misleading, because academic performance varies for each university depending on the field of research. However, people who are interested in the results of these studies require more detailed information about field based academic performances of institutions. One such study was carried out in 2011 by the University Ranking by Academic Performance (URAP) research laboratory, which was established at Middle East Technical University - Informatics Institute. In that study, 2000 universities around the world were ranked according to multiple criteria in terms of overall academic performance. The interest shown in the results of the system implemented by URAP revealed a need for a more comprehensive ranking system, which deals with the evaluation of field based academic performance. In this sense, within the scope of this study, universities ranked by the URAP research laboratory were evaluated in terms of their academic performance in the following six research fields: Agriculture & Environmental Sciences (AGE); Clinical Medicine (MED); Engineering, Computing & Technology (ENG); Life Sciences (LIFE); Natural Sciences (SCI); Social Sciences (SOC). Institutions in this study have been evaluated according to data collected from ISI - Web of Knowledge for the indicators listed below. 
Article Count (last year); Total Document Count (last 5 years); Cumulative Journal Impact (last 5 years); Total Citation Count (last 5 years); H-Index (average of last 5 years). The results indicate that the standing of universities in terms of academic performance varies according to the research field. ZA Databases 4450-4460
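Of the indicators listed above, the h-index has a standard definition that is easy to state in code: it is the largest h such that h documents have at least h citations each. The sketch below is a minimal illustration of that definition only; the citation counts are hypothetical, and how the study averages the indicator over five years or combines it with the other indicators is not specified in this abstract.

```python
def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` documents all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts for one institution in one research field.
example = h_index([10, 8, 5, 4, 3])
```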
249	Advances in ranking and selection: variance estimation and constraints Healey, Christopher M. 16 July 2010 (has links) In this thesis, we first show that the performance of ranking and selection (R&S) procedures in steady-state simulations depends highly on the quality of the variance estimates that are used. We study the performance of R&S procedures using three variance estimators (overlapping area, overlapping Cramér-von Mises, and overlapping modified jackknifed Durbin-Watson estimators) that show better long-run performance than other estimators previously used in conjunction with R&S procedures for steady-state simulations. We devote additional study to the development of the new overlapping modified jackknifed Durbin-Watson estimator and demonstrate some of its useful properties. Next, we consider the problem of finding the best simulated system under a primary performance measure, while also satisfying stochastic constraints on secondary performance measures, known as constrained ranking and selection. We first present a new framework that allows certain systems to become dormant, halting sampling for those systems as the procedure continues. We also develop general procedures for constrained R&S that guarantee a nominal probability of correct selection, under any number of constraints and correlation across systems. In addition, we address new topics critical to the efficiency of these procedures, namely the allocation of error between the feasibility check and selection, the use of common random numbers, and the cost of switching between simulated systems. Multiple performance measures Selection of the best system Steady-state simulation Ranking and selection (Statistics) Analysis of variance
250	A fast protein-ligand docking method Genheden, Samuel January 2006 (has links) In this dissertation a novel approach to protein-ligand docking is presented. First, an existing method to predict putative active sites is employed. These predictions are then used to cut down the search space of an algorithm that uses the fast Fourier transform to calculate the geometrical and electrostatic complementarity between a protein and a small organic ligand. A simplified hydrophobicity score is also calculated for each active site. The docking method could be applied either to dock ligands in a known active site or to rank several putative active sites according to their biological feasibility. The method was evaluated on a set of 310 protein-ligand complexes. The results show that, with respect to docking, the method with its initial parameter settings is too coarse-grained. The results also show that, with respect to ranking of putative active sites, the method works quite well. protein-ligand docking molecular modelling putative active sites ranking fast Fourier transform Bioinformatics Bioinformatik
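The reason the fast Fourier transform helps in the abstract above is that a complementarity score of the form "sum over grid cells of receptor value times shifted ligand value" is a correlation, and all shifts can be scored at once in the frequency domain. The sketch below shows this on a one-dimensional periodic toy grid in pure Python. The grids, the function names, and the reduction to one dimension are illustrative assumptions; the dissertation's actual three-dimensional grids and scoring terms are not described in this abstract.

```python
import cmath


def _fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = _fft(values[0::2], invert)
    odd = _fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def translation_scores(receptor, ligand):
    """score[s] = sum_i receptor[i] * ligand[(i - s) % n] for every shift s.

    Both inputs are real-valued lists of equal power-of-two length,
    treated as periodic grids (an assumption of this toy example).
    """
    n = len(receptor)
    r_hat = _fft([complex(v) for v in receptor])
    l_hat = _fft([complex(v) for v in ligand])
    # Correlation theorem: multiply by the conjugate of the ligand spectrum.
    spectrum = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [v.real / n for v in _fft(spectrum, invert=True)]


# Hypothetical 1-D grids: a two-cell pocket and a two-cell ligand.
receptor = [0, 0, 1, 1, 0, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]
scores = translation_scores(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
```

Scoring all n shifts this way costs on the order of n log n operations instead of n squared for the direct sum, which is the saving that makes exhaustive translational scans practical on fine grids.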

Search results