  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Mathematical modeling with applications in high-performance coding

Su, Yong, January 2005 (has links)
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xiv, 130 p.; also includes graphics (some col.). Includes bibliographical references (p. 125-130). Available online via OhioLINK's ETD Center
2

Inferring Network Status from Partial Observations

Rangudu, Venkata Pavan Kumar 09 February 2017 (has links)
In many network applications, such as the Internet and infrastructure networks, nodes fail or become congested dynamically, and tracking this information for all nodes in a network where dynamical processes are taking place is a fundamental problem. In this work, we study the problem of inferring the complete set of failed nodes when only a sample of the node failures is known. We consider the setting in which node failures are correlated, which has been studied in the case of many infrastructure networks. We formalize the inference problem using the Minimum Description Length (MDL) principle, show that finding solutions that minimize the MDL cost is in general hard, and develop efficient algorithms with rigorous performance guarantees for finding near-optimal MDL-cost solutions. We evaluate our methods on both synthetic and real-world datasets, including one from WAZE, a crowd-sourced road navigation tool that collects and presents traffic incident reports. We found that the proposed greedy algorithm recovers, on average, 80% of the failed nodes in a network from a partial sample of failures drawn from the true failure set at a predefined rate. Furthermore, we prove that this algorithm finds a solution whose MDL cost is within an additive log(n) of the optimum. / Master of Science
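For readers unfamiliar with the two-part MDL framing mentioned in this abstract, the sketch below shows the general idea on a toy graph: a candidate failure set is charged bits for describing the set (seed nodes plus connected growth, reflecting correlated failures) and bits for describing which of its members were sampled, and a greedy loop adds neighbouring nodes while the total cost drops. The cost terms, the sampling rate `p`, and the function names are illustrative assumptions, not the formulation or algorithm from the thesis.

```python
import math

def components(nodes, adj):
    """Number of connected components induced by `nodes` in the graph `adj`."""
    nodes, seen, count = set(nodes), set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            stack.extend(v for v in adj[u] if v in nodes and v not in seen)
    return count

def mdl_cost(candidate, observed, adj, n, p=0.5):
    """Illustrative two-part MDL cost: describe the failure set as
    'seeds + connected growth', then describe which members were sampled."""
    d_max = max(len(vs) for vs in adj.values())
    comp = components(candidate, adj)
    model_bits = comp * math.log2(n) + (len(candidate) - comp) * math.log2(d_max + 1)
    data_bits = (-len(observed) * math.log2(p)
                 - (len(candidate) - len(observed)) * math.log2(1 - p))
    return model_bits + data_bits

def greedy_infer(observed, adj, n, p=0.5):
    """Greedily add neighbouring nodes while the total MDL cost decreases."""
    candidate = set(observed)
    while True:
        frontier = {v for u in candidate for v in adj[u]} - candidate
        current = mdl_cost(candidate, observed, adj, n, p)
        best = min(frontier,
                   key=lambda v: mdl_cost(candidate | {v}, observed, adj, n, p),
                   default=None)
        if best is None or mdl_cost(candidate | {best}, observed, adj, n, p) >= current:
            return candidate
        candidate.add(best)

# Toy example: a 50-node path graph; nodes 10, 11, 12 failed but only 10 and 12 were observed.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 50] for i in range(50)}
print(greedy_infer({10, 12}, adj, n=50))  # -> {10, 11, 12}: the bridging node is inferred
```

In the toy run, the unobserved node 11 is added because merging the two observed failures into one connected cluster saves more model bits than the extra node costs to describe.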
3

Apport d'un algorithme de segmentation ultra-rapide et non supervisé pour la conception de techniques de segmentation d'images bruitées / Contribution of an ultrafast and unsupervised segmentation algorithm to the conception of noisy images segmentation techniques

Liu, Siwei 16 December 2014 (has links)
Image segmentation is an important step in many image processing systems, and many problems remain unsolved. It has recently been shown that, when an image is composed of two homogeneous regions, polygonal active contour techniques based on the minimization of a criterion derived from information theory yield an ultra-fast algorithm that requires neither a parameter to tune in the optimized criterion nor a priori knowledge of the gray-level fluctuations. This algorithm can then be used as a fast and unsupervised processing module. The objective of this thesis is to show how this ultra-fast, unsupervised algorithm can serve as a building block in the design of more complex segmentation techniques that overcome several limitations, in particular: to be robust to strong inhomogeneities in the image, which are often inherent to the acquisition process (non-uniform illumination, attenuation, etc.); to segment disconnected objects with a polygonal active contour without complicating the optimization strategy; and to segment multi-region images while estimating, in an unsupervised way, the number of homogeneous regions in the image. For each of these three problems, we obtain unsupervised segmentation techniques based on the optimization of Minimum Description Length criteria that require neither parameter tuning by the user nor a priori information on the kind of noise in the image. Moreover, we show that fast segmentation techniques with low implementation complexity can be built from this elementary module.
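The "criterion derived from information theory, with no parameter to tune and no noise model" can be illustrated, in a much simplified non-contour form, by scoring a two-region partition with the empirical entropy of each region's gray levels. This is only a sketch of that kind of criterion; the thesis's polygonal active contour, its optimization strategy, and its exact criterion are not reproduced here, and the function names and synthetic test image are assumptions.

```python
import numpy as np

def region_code_length(pixels, n_bins=256):
    """Empirical entropy coding length (in bits) for one region's gray levels."""
    hist, _ = np.histogram(pixels, bins=n_bins, range=(0, n_bins))
    probs = hist[hist > 0] / pixels.size
    return -pixels.size * np.sum(probs * np.log2(probs))

def two_region_criterion(image, mask):
    """Information-theoretic criterion for a two-region partition: total bits
    needed to encode inside and outside gray levels with their own empirical
    distributions. Lower is better; no noise model or weighting parameter."""
    inside, outside = image[mask], image[~mask]
    return region_code_length(inside) + region_code_length(outside)

# Synthetic test: a bright disc on a dark background with Poisson-like noise.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:128, 0:128]
truth = (xx - 64) ** 2 + (yy - 64) ** 2 < 30 ** 2
image = np.clip(rng.poisson(np.where(truth, 120, 40)), 0, 255)
good = two_region_criterion(image, truth)
bad = two_region_criterion(image, xx < 64)   # an arbitrary half-plane split
print(good < bad)                            # expected: True
```

The partition that matches the true object concentrates each region's gray-level distribution and therefore gets the shorter total code, which is the property such contour criteria exploit.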
4

Comparison of Description Length for Text Corpus

Huang, Chung-Hsiang 24 May 2012 (has links)
In this thesis, we compare the description lengths of different grammars and extend our research on automatic grammar learning to the grammar productions of the Stanford parser. In our earlier work, we showed how to minimize the description length of the grammar generated from the Academia Sinica Balanced Corpus (ASBC). Based on the concept of data compression, the encoding method in our research is effective in reducing the description length of a text corpus. We further discuss the description lengths of two special cases of context-free grammars: the exhaustive grammar, which derives exactly the distinct sentences in the corpus, and the recursive grammar, which covers all strings. In this thesis, we use the Stanford parser to parse sentences and generate grammar rules, and we compare the description length of the machine-parsed grammar with that of a manually corrected grammar. In one experiment, we use the Stanford parser to parse the ASBC corpus; the description length of the corpus is 53.0 Mb, while the description length of the rules is only 52,683. In the other experiment, we use the Stanford parser to parse the Sinica Treebank and compare the description length of the generated grammar with that of the original. The results show that the description length of the Sinica Treebank grammar is 2.76 Mb, while that of the grammar generated by the Stanford parser is 4.02 Mb.
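As a toy illustration of the description-length comparison this abstract relies on (not the thesis's grammar encoding or the Stanford parser pipeline), the sketch below measures a corpus two ways: spelling every token out verbatim versus paying once for a small model (the lexicon with counts) plus an entropy code for the token stream. The 16-bits-per-character assumption and the helper names are made up for the example.

```python
import math
from collections import Counter

def description_length(tokens, alphabet_bits=16):
    """Two-part description length (in bits) of a token sequence:
    model cost = listing each distinct token type and its count,
    data cost  = encoding the sequence under the empirical distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    model_bits = sum(alphabet_bits * (len(t) + 1) + math.ceil(math.log2(n + 1))
                     for t in counts)                       # spell out each type + its count
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    return model_bits + data_bits

def raw_length(tokens, alphabet_bits=16):
    """Baseline: spell every token out character by character, no model."""
    return sum(alphabet_bits * (len(t) + 1) for t in tokens)

corpus = "the dog chased the cat and the cat chased the mouse".split() * 50
print(round(description_length(corpus)), "<", raw_length(corpus))  # compression pays off on repetition
```

On repetitive text the model-plus-code total is far shorter than the raw spelling, which is the same effect the thesis measures when comparing grammars against the corpus they describe.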
5

Robust incremental relational learning

Westendorp, James, Computer Science & Engineering, Faculty of Engineering, UNSW January 2009 (has links)
Real-world learning tasks present a range of issues for learning systems: tasks can be complex and the training data noisy; when operating as part of a larger system, there may be limits on available memory and computational resources; and learners may be required to produce results from a stream. This thesis investigates the problem of incremental, relational learning from imperfect data under constrained time and memory resources. The learning process incrementally updates a theory whenever an example is presented that contradicts it. Contradictions occur when the theory is incorrect or the data are noisy. The learner cannot discriminate between the two possibilities, so both are considered and the better one is used. Additionally, all changes to the theory must be supported by multiple examples. These two principles allow learning from imperfect data. The Minimum Description Length principle is used both to select between possible worlds and to determine appropriate levels of additional justification. A new encoding scheme allows the use of MDL within the framework of Inductive Logic Programming. Examples must be stored to justify revisions without violating resource requirements; a new algorithm determines when to discard examples, minimising total usage while ensuring sufficient storage for justifications. Searching for revisions is the most computationally expensive part of the process, yet not all searches are successful; another new algorithm uses a notion of theory stability as a guide to occasionally disallow entire searches and reduce overall time. The approach has been implemented as a learner called NILE. Empirical tests include two challenging domains where this type of learner acts as one component of a larger task. The first involves recognising behaviour activation conditions in another agent as part of an opponent modelling task. The second, more challenging task is learning to identify objects in visual images by recognising relationships between image features. These experiments highlight NILE's strengths and limitations, as well as providing new domains for future work in ILP.
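The "possible worlds" trade-off described here, keep the current theory and blame the contradicting examples on noise, or accept a longer revised theory that explains them, can be sketched as a comparison of two description lengths. This is a generic illustration of that MDL choice, not NILE's encoding scheme; the clause representation, bit costs, and function names are assumptions.

```python
import math

def theory_bits(clauses, bits_per_literal=8):
    """Rough cost of encoding a logical theory: pay for every literal it uses."""
    return sum(bits_per_literal * len(clause) for clause in clauses)

def exception_bits(n_exceptions, n_examples):
    """Cost of pointing out which stored examples the theory gets wrong."""
    if n_exceptions == 0:
        return 0.0
    return n_exceptions * math.log2(n_examples)

def prefer_revision(old_theory, revised_theory, old_errors, new_errors, n_examples):
    """MDL choice between two worlds: keep the old theory and treat the
    contradicting examples as noise, or accept the (usually longer) revision
    that explains them. Returns True if the revision has the lower total cost."""
    keep_cost = theory_bits(old_theory) + exception_bits(old_errors, n_examples)
    revise_cost = theory_bits(revised_theory) + exception_bits(new_errors, n_examples)
    return revise_cost < keep_cost

# Hypothetical clauses represented as lists of literals (strings).
old = [["flies(X)", "bird(X)"]]
revised = [["flies(X)", "bird(X)", "not penguin(X)"]]
# One extra literal (8 bits) vs. pointing at 5 contradicting penguins (5 * log2(1000) bits).
print(prefer_revision(old, revised, old_errors=5, new_errors=0, n_examples=1000))  # True
```

A single contradiction would not justify the longer theory under this accounting, which matches the abstract's requirement that revisions be supported by multiple examples.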
6

Algorithmic Approaches to Pattern Mining from Structured Data / 構造データからのパターン発見におけるアルゴリズム論的アプローチ

Otaki, Keisuke 23 March 2016 (has links)
The contents of Chapter 6 are based on work published in IPSJ Transactions on Mathematical Modeling and Its Applications, vol. 9(1), pp. 32-42, 2016. / Kyoto University / 0048 / New-system course-based doctorate / Doctor of Informatics / 甲第19846号 / 情博第597号 / 新制||情||104 (University Library) / 32882 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Committee chair) Prof. 山本 章博, Prof. 鹿島 久嗣, Prof. 阿久津 達也 / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DGAM
7

Towards algorithmic theory construction in physics

Möller, Hampus January 2023 (has links)
This master's thesis explores the challenge of algorithmic hypothesis generation and its connection to potential inductive biases. In the scientific method, hypotheses are formulated and tested against data for validity; the process of generating hypotheses, however, is not explicitly formulated. A structured approach to hypothesis generation would allow a near-algorithmic process that could scale far beyond the current capabilities of science. The thesis explores entropy, symmetry, and minimum description length as candidate inductive biases. Two algorithms were implemented and evaluated: one for symmetry finding and one for symbolic regression. The theoretical results show a strong connection between entropy and minimum description length, with a weaker connection to symmetry. Both implementations indicate that paths exist to partial or full automation of the hypothesis generation process. In general, there do not appear to be fundamental obstacles to automating the process beyond the challenge of its implementation. The thesis thus demonstrates a clear path toward partial or complete automation of the scientific method for physics research.
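As a minimal illustration of minimum description length acting as an inductive bias in symbolic regression (one of the two algorithms mentioned), the sketch below scores candidate expressions by expression-size bits plus a Gaussian code length for the residuals and keeps the shortest total description. The scoring constants, candidate set, and data are assumptions for the example, not the thesis's implementation.

```python
import math
import numpy as np

def mdl_score(expr_fn, n_symbols, x, y):
    """MDL-style score for a candidate hypothesis: bits to write the expression
    (proportional to its size) plus a Gaussian code length for the residuals.
    The Gaussian term is a differential code length and can be negative; only
    differences between candidates matter for the comparison."""
    residuals = y - expr_fn(x)
    sigma2 = max(np.mean(residuals ** 2), 1e-12)
    complexity_bits = 8.0 * n_symbols                           # crude: 8 bits per symbol
    data_bits = 0.5 * len(y) * math.log2(2 * math.pi * math.e * sigma2)
    return complexity_bits + data_bits

# Noisy data generated from y = x**2; candidate hypotheses of different sizes.
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 200)
y = x ** 2 + rng.normal(0, 0.1, x.size)
candidates = {
    "x":            (lambda x: x,              1),
    "x**2":         (lambda x: x ** 2,         3),
    "x**2 + x - x": (lambda x: x ** 2 + x - x, 7),   # fits equally well but is longer
}
best = min(candidates, key=lambda k: mdl_score(candidates[k][0], candidates[k][1], x, y))
print(best)  # expected: "x**2"
```

The underfitting candidate loses on residual bits and the padded candidate loses on expression bits, so the shortest total description picks out the generating law.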
8

Mathematical modeling with applications in high-performance coding

Su, Yong 10 October 2005 (has links)
No description available.
9

Model Selection via Minimum Description Length

Li, Li 10 January 2012 (has links)
The minimum description length (MDL) principle originated in the data compression literature and has been used to derive statistical model selection procedures. Most existing methods based on the MDL principle focus on models for independent data, particularly in the context of linear regression. The data considered in this thesis take the form of repeated measurements, and the exploration of the MDL principle begins with classical linear mixed-effects models. We distinguish two kinds of research focus: one concerns the population parameters and the other the cluster/subject parameters. When the research interest is at the population level, we propose a class of MDL procedures that incorporate the dependence structure within each individual or cluster through data-adaptive penalties and enjoy the advantages of Bayesian information criteria. When the number of covariates is large, the penalty term is adjusted data-adaptively to mitigate the underselection issue of BIC and to mimic the behaviour of AIC. Theoretical justifications are provided from both data compression and statistical perspectives. Extensions to categorical responses modelled by generalized estimating equations and to functional data modelled by functional principal components are illustrated. When the interest is at the cluster level, we use the group LASSO to set up a class of candidate models and derive an MDL criterion for this group-wise LASSO technique to select the final model via the tuning parameters. Extensive numerical experiments demonstrate the usefulness of the proposed MDL procedures at both the population and cluster levels.
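For context on the BIC-like penalties this abstract refers to, the sketch below applies the classical two-part MDL criterion for linear regression, residual bits plus roughly (k/2) log2 n bits for the fitted coefficients, to a toy polynomial-degree selection. The thesis's data-adaptive, mixed-effects and group-LASSO criteria are more refined than this; the basic form is shown only to make the flavour of the penalty concrete.

```python
import numpy as np

def mdl_criterion(y, X):
    """Classical two-part MDL criterion for a linear model (same flavour as BIC):
    bits for the residuals plus bits for the k fitted coefficients."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return 0.5 * n * np.log2(rss / n) + 0.5 * k * np.log2(n)

# Toy selection: the quadratic model should win over underfit and overfit rivals.
rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 150)
y = 1.0 + 2.0 * x - 1.5 * x ** 2 + rng.normal(0, 0.3, x.size)
designs = {d: np.vander(x, d + 1, increasing=True) for d in (1, 6, 2)}
print(min(designs, key=lambda d: mdl_criterion(y, designs[d])))  # expected: 2
```

The degree-6 model lowers the residual term only slightly while paying several extra penalty terms, so the criterion settles on the true quadratic, the underselection-versus-overfitting balance the thesis tunes adaptively.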
