1

Comparison of methods to calculate measures of inequality based on interval data

Neethling, Willem Francois 12 1900 (has links)
Thesis (MComm)—Stellenbosch University, 2015. / ENGLISH ABSTRACT: In recent decades, economists and sociologists have taken an increasing interest in the study of income attainment and income inequality. Many of these studies have used census data, but social surveys are also increasingly used as sources for these analyses. In such surveys, respondents' incomes are usually not recorded as exact amounts but in categories, of which the last category is open-ended, because income is regarded as sensitive information and is sometimes difficult to report precisely. Continuous data divided into categories is often more difficult to work with than ungrouped data. In this study, we compare different methods for converting grouped data into data where each observation has a specific value. Some methods assign the same value to every observation in an interval; an example is the midpoint method, which assigns each observation the midpoint of its interval. Other methods are random, giving each observation a random point between the lower and upper bound of the interval. For some methods, random and non-random, a distribution is fitted to the data and a value is calculated according to that distribution. The non-random methods we use are the midpoint, Pareto means and lognormal means methods; the random methods are the random midpoint, random Pareto and random lognormal methods. Since our focus is on income data, which usually follow a heavy-tailed distribution, we use the Pareto and lognormal distributions in our methods. The above-mentioned methods are applied to simulated and real datasets whose raw values are known and are categorised into intervals; the methods are then applied to the interval data to convert them back to point data. To test the effectiveness of these methods, we calculate several measures of inequality: the Gini coefficient, the quintile share ratio (QSR), the Theil measure and the Atkinson measure. The estimated measures of inequality, calculated from each dataset obtained through these methods, are then compared with the true measures of inequality.
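To make the reconstruction step concrete, the following minimal Python sketch applies the midpoint method described above and computes the Gini coefficient from the reconstructed values. It is illustrative only: the toy income bands, and the rule of valuing the open-ended top interval at 1.5 times its lower bound, are assumptions, not choices made in the thesis.

```python
import numpy as np

def midpoint_reconstruction(lower, upper):
    """Assign each observation the midpoint of its income interval.

    The open-ended top interval (upper bound = inf) is valued at 1.5 times
    its lower bound; this factor is purely illustrative.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return np.where(np.isinf(upper), 1.5 * lower, (lower + upper) / 2.0)

def gini(x):
    """Sample Gini coefficient of a vector of incomes."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n

# Toy income bands; the last band is open-ended.
lower = [0, 1000, 5000, 5000, 10000]
upper = [1000, 5000, 10000, 10000, np.inf]
points = midpoint_reconstruction(lower, upper)
print(gini(points))
```

The same reconstructed vector could equally be fed into the QSR, Theil or Atkinson measures in place of the Gini coefficient.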
2

The Discovery and Retrieval of Temporal Rules in Interval Sequence Data

Winarko, Edi, edwin@ugm.ac.id January 2007 (has links)
Data mining is increasingly becoming an important tool for extracting interesting knowledge from large databases. Many industries now use data mining tools to analyse their large collections of data and to support business decisions. Many data mining problems involve temporal aspects, with examples ranging from engineering to scientific research, finance and medicine. Temporal data mining is an extension of data mining that deals with temporal data. Mining temporal data poses more challenges than mining static data: while the analysis of static data sets often comes down to relations between data items, temporal data admit many additional possible relations. One task in temporal data mining is pattern discovery, whose objective is to discover time-dependent correlations, patterns or rules between events in large volumes of data. To date, most temporal pattern discovery research has focused on events existing at a point in time rather than over a temporal interval. In comparison to static rules, mining with respect to time points provides semantically richer rules; accommodating temporal intervals yields rules that are richer still. This thesis addresses several issues related to pattern discovery from interval sequence data. Despite its importance, this area of research has received relatively little attention and many issues remain open. Three main issues considered in this thesis are the definition of what constitutes an interesting pattern in interval sequence data, the efficient mining of patterns in the data, and the identification of interesting patterns among a large number of discovered patterns. To deal with these issues, the thesis formulates the problem of discovering rules, termed richer temporal association rules, from interval sequence databases. It then develops an efficient algorithm, ARMADA, for discovering richer temporal association rules; the algorithm does not require candidate generation, uses a simple index, and requires at most two database scans. A retrieval system is proposed to facilitate the selection of interesting rules from the set of discovered richer temporal association rules. To this end, a high-level query language, TAR-QL, is proposed for specifying the criteria of the rules to be retrieved from the rule sets, and three low-level methods are developed to evaluate queries involving rule-format conditions. To improve the performance of these methods, signature-file-based indexes are proposed. In addition, the thesis proposes the discovery of inter-transaction relative temporal association rules from event sequence databases.
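As a rough illustration of what interval sequence data look like, the sketch below represents events as labelled time intervals and tests two of Allen's interval relations between them; counting how often such relations hold is the kind of support computation that temporal association rule mining builds on. The event labels and the choice of relations are hypothetical and are not taken from the thesis's definition of richer temporal association rules or from ARMADA itself.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str   # event type, e.g. a diagnosis code or machine state
    start: int   # interval start time
    end: int     # interval end time

def before(a: Event, b: Event) -> bool:
    """Allen's 'before' relation: a finishes before b starts."""
    return a.end < b.start

def overlaps(a: Event, b: Event) -> bool:
    """Allen's 'overlaps' relation: a starts first and the two intersect."""
    return a.start < b.start < a.end < b.end

# One interval sequence (for example, one patient's history).
seq = [Event("A", 1, 4), Event("B", 3, 7), Event("C", 9, 12)]

# Checks such as "A overlaps B" and "B before C" are the raw material from
# which support counts of temporal patterns are accumulated.
print(overlaps(seq[0], seq[1]), before(seq[1], seq[2]))
```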
3

Semi-Continuous Robust Approach for Strategic Infrastructure Planning of Reverse Production Systems

Assavapokee, Tiravat 06 April 2004 (has links)
Growing attention is being paid to the problem of efficiently designing and operating reverse supply chain systems to handle the return flows of production wastes, packaging, and end-of-life products. Because uncertainty plays a significant role in all fields of decision-making, solution methodologies are required for determining the strategic infrastructure of reverse production systems under uncertainty. This dissertation presents innovative optimization algorithms for designing a robust network infrastructure when uncertainty affects the outcomes of the decisions. In this context, robustness is defined as minimizing the maximum regret over all realizations of the uncertain parameters. These algorithms can be used effectively to design supply chain network infrastructure when the joint probability distributions of key parameters are unknown; they require only information on the potential ranges and possible discrete values of the uncertain parameters, which is often available in practice. The algorithms extend the state of the art in robust optimization, both in the structure of the problems they address and in the size of the formulations. An algorithm for problems with correlated uncertain parameters is also presented. Case studies in reverse production system infrastructure design are presented, and the approach generalizes to the robust design of supply chain networks with reverse production systems as one of their subsystems.
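For reference, a standard way to write the min-max regret criterion mentioned in the abstract is shown below; the notation is ours and the dissertation's exact formulation may differ. Here X is the set of feasible infrastructure decisions, S the set of scenarios (realizations of the uncertain parameters), and f(x, s) the cost of decision x under scenario s:

\[
\min_{x \in X} \; \max_{s \in S} \Bigl( f(x, s) \; - \; \min_{x' \in X} f(x', s) \Bigr).
\]

A robust decision is thus one whose worst-case gap to the best decision chosen in hindsight, once the scenario is known, is smallest.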
4

Intervalová data a výběrový rozptyl: výpočetní aspekty / Interval data and sample variance: computational aspects

Sokol, Ondřej January 2014 (has links)
This thesis deals with the calculation of the upper limit of the sample variance when the exact data are not known but intervals that certainly contain them are available. In general, finding the upper limit of the sample variance from interval data alone is an NP-hard problem, but under certain conditions on the input data an efficient algorithm can be used. In this work, algorithms were modified so that, even at the cost of exponential complexity, the optimal solution can always be found. The goal of the thesis is to compare selected algorithms for calculating the upper limit of the sample variance over interval data in terms of their average-case computational complexity on generated data. Simulations show that if the data meet certain conditions, the average-case complexity is polynomial.
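A brute-force version of the exponential fallback mentioned in the abstract can be sketched as follows. It relies on the fact that the sample variance is a convex function of the data vector, so its maximum over a box of intervals is attained with every observation at one of its endpoints; the code is a generic illustration, not one of the algorithms compared in the thesis.

```python
import itertools
import statistics

def max_sample_variance(intervals):
    """Exact upper bound of the sample variance over interval data.

    Enumerates all 2^n combinations of interval endpoints; exponential,
    but always finds the optimum because the maximum of a convex function
    over a box is attained at a vertex.
    """
    best = 0.0
    for corner in itertools.product(*intervals):
        best = max(best, statistics.variance(corner))
    return best

# Toy data: three observations known only up to intervals.
print(max_sample_variance([(1.0, 2.0), (2.0, 3.0), (1.5, 4.0)]))
```

The specialized algorithms compared in the thesis avoid this 2^n enumeration when the input intervals satisfy the required conditions.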
5

"Spaghetti "主成份分析之延伸-應用於時間相關之區間型台灣股價資料 / An extension of Spaghetti PCA for time dependent interval data

陳品達, Chen, Pin-Da Unknown Date (has links)
ABSTRACT: Principal component analysis for interval data is not yet mature in some areas, for example for stock price data, which are closely tied to time; for this reason the analysis of time dependent interval data was proposed (Irpino, 2006, Pattern Recognition Letters 27, 504-513). In this thesis, we apply this approach to stock price data in Taiwan. The original "Spaghetti" PCA of Irpino (2006) considers only the opening and closing prices for each week. In order to obtain more information, we propose three methods. Method 1 adds the weekly highest (or lowest) price, so the analysis uses three points instead of two. Method 2 considers both the highest and the lowest price, giving four points. Both methods recover information that the original method cannot, such as the degree of stability of a company, with Method 2 being the more accurate. Method 3 follows a suggestion in Irpino (2006) and changes the within-interval distribution from uniform to beta; its results differ little from the original ones. We collect stock price data for 37 semiconductor companies and 47 companies of the TSEC Taiwan 50 index on the Taiwan financial market over the 17 weeks from September 1 to December 26, 2008. For the TSEC Taiwan 50 index, the analysis indicates that the outlooks for Delta Electronics Incorporation (number 17) and Foxconn Electronics Incorporation (number 24) are positive, while those for Integrated Technology Express (number 10) and President Chain Store Corporation, 7-ELEVEn (number 35) are negative; the actual price movements of these four companies from January 5 to 7, 2009 were consistent with these indications. In addition, financial companies appear steadier than electronics companies. Keywords: Principal component analysis; Interval data; Time dependent
6

Statistická analýza intervalových dat / Statistical analysis of interval data

Troshkov, Kirill January 2011 (has links)
Traditional statistical analysis starts with computing basic statistical characteristics such as the population mean E, population variance V, covariance and correlation. In computing these characteristics, it is usually assumed that the corresponding data values are known exactly. In real life there are many situations in which more complete information can be obtained by describing a set of statistical units in terms of interval data. For example, daily temperatures registered as minimum and maximum values offer a more realistic view of variations in weather conditions than simple average values. In environmental analysis, we observe a pollution level x(t) in a lake at different moments of time t, and we would like to estimate standard statistical characteristics such as the mean, variance and correlation with other measurements. Another example is given by financial series: the minimum and maximum transaction prices recorded daily for a set of stocks provide more relevant information for experts evaluating the stocks' tendency and volatility within a day. The existing statistical algorithms must therefore be modified to process such interval data. In this work we analyse algorithms and their modifications for computing various statistics under...
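One part of this modification is straightforward: because the sample mean is monotone in each observation, its range over interval data is obtained directly from the endpoints. In notation of our own (not the thesis's), for observations known only as intervals [x̲_i, x̄_i],

\[
\underline{E} = \frac{1}{n}\sum_{i=1}^{n} \underline{x}_i,
\qquad
\overline{E} = \frac{1}{n}\sum_{i=1}^{n} \overline{x}_i ,
\]

and every value between these bounds is attainable. Bounds for the variance and the correlation are considerably harder to compute in general, which is what motivates the modified algorithms analysed in the work.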
7

區間模糊相關係數及其在數學成就評量 / Fuzzy correlation with interval data and its application in the evaluation of mathematical achievement

羅元佐, Ro, Yuan Tso Unknown Date (has links)
In statistics, we usually express the strength of the linear relation between two variables, as well as its direction, by means of Pearson's correlation coefficient. Traditionally, the correlation coefficient is computed from data consisting of exact real values, but when the data are fuzzy numbers the traditional approach is not suitable for computing a fuzzy correlation coefficient. This study investigates fuzzy correlation coefficients computed from samples of interval fuzzy data. First, we classify interval fuzzy data as discrete or continuous. Second, we define a fuzzy correlation coefficient for interval data and propose generalized error formulas that adjust the coefficient so that it is more reasonable and more accurate. In Chapter Three, we carry out an empirical study of factors affecting the evaluation of mathematical achievement and obtain a reasonable analysis. The definition of the coefficient and the generalized error formulas also apply when both data values, or only one of them, are real numbers, and can therefore explain more of the correlation phenomena encountered in practice.
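For reference, the crisp-data Pearson sample correlation coefficient that the study generalizes is

\[
r \;=\; \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} ,
\]

whereas the fuzzy version replaces the exact x_i and y_i with interval-valued observations; the thesis's generalized error formulas are not reproduced here.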
8

Cooperative Interval Games

Alparslan Gok, Sirma Zeynep 01 January 2009 (has links) (PDF)
Interval uncertainty affects our decision-making activities on a daily basis, making the data structure of intervals of real numbers increasingly popular in theoretical models and related software applications. Natural questions for people or businesses that face interval uncertainty in their data when cooperating are how to form coalitions and how to distribute the collective gains or costs. The theory of cooperative interval games is a suitable tool for answering these questions. In this thesis, the classical theory of cooperative games is extended to cooperative interval games. First, basic notions and facts from classical cooperative game theory and interval calculus are given. Then, the model of cooperative interval games is introduced together with its basic definitions. Solution concepts of selection type and interval type for cooperative interval games are studied in depth. Further, special classes of cooperative interval games, such as convex interval games and big boss interval games, are introduced and various characterizations are given. Some economic and operations research situations, such as airport, bankruptcy and sequencing situations with interval data, and the related interval games are also studied. Finally, some algorithmic aspects of the interval Shapley value and the interval core are considered.
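For context, the classical Shapley value that the interval Shapley value generalizes assigns player i in a game (N, v) the payoff

\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
\frac{|S|!\,\bigl(|N|-|S|-1\bigr)!}{|N|!}\,
\bigl(v(S \cup \{i\}) - v(S)\bigr).
\]

In the interval setting the characteristic function is interval-valued and the marginal contributions are handled with interval calculus, as developed in the thesis; the exact interval definitions are not reproduced here.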
9

"Spaghetti" 主成份分析應用於時間相關區間型資料的研究---以台灣股價為例 / A study of Spaghetti PCA for time dependent interval data applied to stock prices in Taiwan

邱大倞, Chiu, Ta Ching Unknown Date (has links)
Interval data are generally defined by the upper and lower values assumed by a unit for a continuous variable. In this study, we consider a special type of interval description that depends on time. The original idea (Irpino, 2006, Pattern Recognition Letters, 27, 504-513) is that each observation is characterized by an oriented interval of values with a starting and a closing value for each period of observation: for example, the opening and closing price of a stock in a week. Several factorial methods have been developed to treat interval data, but not yet oriented intervals; Irpino presented an extension of principal component analysis to time dependent interval data, or, in general, to oriented intervals. From a geometrical point of view, the approach can be considered an analysis of oriented segments (nicely called "spaghetti") defined in a multidimensional space identified by periods. In this thesis, we extend the approach further, using not only the opening and closing values but also the highest and lowest values in each week, to extract information that cannot be obtained from the original formulation. We also replace the uniform within-interval distribution with a beta distribution to see whether this brings any improvement over the original results. These extensions require computing the means, variances and covariances of the oriented intervals in order to obtain the correlation matrix for PCA. With regard to the value of the information obtained, the extension that includes the highest and lowest prices is the best choice.
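To fix ideas, the sketch below runs an ordinary correlation-matrix PCA on weekly open/high/low/close summaries for a handful of stocks. The numbers are invented and the computation treats the four prices as plain point variables; it does not reproduce Irpino's oriented-interval ("spaghetti") moments or the three- and four-point extensions proposed in the thesis.

```python
import numpy as np

# Invented weekly summaries for three stocks over two weeks; each group of
# four columns is (open, high, low, close) for one week.
X = np.array([
    [10.0, 11.2,  9.8, 11.0,   11.0, 11.5, 10.4, 10.6],  # stock 1
    [55.0, 57.0, 54.0, 56.5,   56.5, 58.0, 55.5, 57.5],  # stock 2
    [ 8.0,  8.1,  7.2,  7.3,    7.3,  7.9,  7.0,  7.8],  # stock 3
])

# Correlation matrix of the columns, then its eigendecomposition.
R = np.corrcoef(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)   # ascending order
order = np.argsort(eigenvalues)[::-1]           # leading component first

# Scores of each stock on the principal components of the standardized data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores = Z @ eigenvectors[:, order]
print(eigenvalues[order])
print(scores[:, 0])                             # first principal component
```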
10

Essays on methodologies in contingent valuation and the sustainable management of common pool resources

Kang, Heechan 15 March 2006 (has links)
No description available.
