261 |
Deduplikační metody v databázích / Deduplication methods in databases
Vávra, Petr January 2010 (has links)
In the present work we study the record deduplication problem as an issue of data quality. We define duplicates as records that differ in syntax but share the same semantics, i.e., that represent the same real-world entity. The main goal of this work is to provide an overview of existing deduplication methods with respect to their requirements, results and usability. We focus on comparing two groups of record deduplication methods: those that use domain knowledge and those that do not. The second part of this work is therefore dedicated to the implementation of our own method, which does not use any domain knowledge, and to comparing its results with those of a commercial tool that relies heavily on domain knowledge.
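To make the contrast concrete, below is a minimal sketch of a deduplication pass that uses no domain knowledge at all, only generic token overlap; the records, the tokenizer and the 0.7 threshold are made-up illustrations, not values or code from the thesis.

```python
import re

def token_jaccard(a, b):
    """Domain-agnostic similarity: Jaccard overlap of lower-cased word tokens."""
    ta, tb = set(re.findall(r"\w+", a.lower())), set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def find_duplicates(records, threshold=0.7):
    """Return index pairs whose similarity exceeds the threshold.
    Naive O(n^2) pairwise comparison; real systems add blocking or windowing."""
    return [(i, j)
            for i in range(len(records))
            for j in range(i + 1, len(records))
            if token_jaccard(records[i], records[j]) >= threshold]

# Same real-world entity written with different syntax (fabricated example data).
print(find_duplicates(["Smith, John - Acme Corp., Prague",
                       "John Smith, Acme Corp, Prague",
                       "Jane Doe, Brno Ltd."]))   # -> [(0, 1)]
```

A domain-aware tool would additionally normalise addresses, expand abbreviations and weight attributes such as identifiers or birth dates, which is precisely the kind of knowledge the comparison in the thesis is about.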
|
262 |
Performance of map matching and route tracking depending on the quality of the GPS data
Houda, Prokop January 2016 (has links)
Satellite positioning measurements are never perfectly unbiased. Because several types of errors affect signal transmission through open space and urban areas, each positioning measurement carries a certain degree of uncertainty. Satellite receivers also do not receive the signal continuously; the localization information arrives discretely. Sampling rate and positioning error therefore introduce uncertainty into the positioning algorithms used in localization, logistics and intelligent transport systems (ITS) applications. This thesis examines the effect of positioning error and sampling rate on geometric and topological map matching algorithms and on the precision of route tracking within these algorithms. The effect of different network densities on the performance of the algorithms is also evaluated, and a platform for simulating and evaluating map matching algorithms is created. Map matching is the process of attaching the initial positioning measurement to the network. Many authors have presented algorithms over the past decades, which shows how complex a topic map matching is, mostly because of changing environmental and network conditions. Geometric and topological map matching algorithms are chosen, modelled and simulated, and their response to different input combinations is evaluated. Recommendations for possible ITS applications are also given in terms of proposed receiver requirements. The results confirm the general expectation that map matching improves on the initial position error overall and thus serves as a form of error mitigation. The correlation between the increase of the original positioning error and the increase of the map matching error also holds for all the algorithms in the thesis. However, the comparison also showed large differences between the topological and geometric algorithms in their ability to cope with distorted input data. Whereas topological algorithms clearly performed better in scenarios with smaller initial error and smaller sampling rate, geometric matching proved more effective on heavily distorted or very sparsely sampled data sets, mostly because simple geometric algorithms can easily leave a wrongly mapped position, which in these situations is a comparative advantage. Future work should concentrate on including more algorithms in the comparison, which would produce more valuable results, and on simulating errors with known and improved error models, which could make the results more general.
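For orientation, the simplest geometric matcher discussed above just snaps each GPS fix to the nearest road segment; the sketch below shows that single step on toy coordinates. It is an illustration under stated assumptions, not the thesis's simulation platform, and topological matchers would additionally exploit network connectivity and the previously matched position.

```python
import math

def snap_to_network(point, segments):
    """Project a GPS fix onto the closest road segment (geometric map matching)."""
    def project(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        if dx == 0 and dy == 0:
            return a
        t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))              # clamp to the segment ends
        return (ax + t * dx, ay + t * dy)
    best = None
    for a, b in segments:
        q = project(point, a, b)
        d = math.dist(point, q)
        if best is None or d < best[0]:
            best = (d, q, (a, b))
    return best                                 # (distance, matched point, segment)

# Toy network in local metric coordinates (illustrative values only).
print(snap_to_network((3.0, 1.2), [((0, 0), (5, 0)), ((5, 0), (5, 5))]))
```

Even this naive matcher pulls the fix back onto the network whenever the true route lies on it, which is the error-mitigation effect the thesis measures.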
|
263 |
Mapování PMML a BKEF dokumentů v projektu SEWEBAR-CMS / Mapping of PMML and BKEF documents using PHP in the SEWEBAR CMS
Vojíř, Stanislav January 2010 (has links)
In the data mining process it is necessary to prepare the source dataset - for example, to select the cutting or grouping of continuous data attributes - and to use knowledge of the problem area. Such a preparation process can be guided by background (domain) knowledge obtained from experts. In the SEWEBAR project, we collect the knowledge from experts in a rich XML-based representation language called BKEF, using a dedicated editor, and save it into the database of our custom-tailored (Joomla!-based) CMS. Data mining tools are then able to generate, from this dataset, mining models represented in the standardized PMML format. It is then necessary to map a particular column (attribute) of the dataset (in PMML) to the relevant 'metaattribute' of the BKEF representation. This specific type of schema mapping problem is addressed in my thesis in terms of algorithms for automatically suggesting mappings from columns to metaattributes and from the values of these columns to BKEF 'metafields'. Manual corrections of the mapping by the user are also supported. The implementation is based on the PHP language and was tested on datasets describing courses taught at 5 universities in the U.S.A. from the Illinois Semantic Integration Archive. On these datasets, the auto-mapping suggestion process achieved a precision of about 70% and a recall of about 77% on unknown columns; when mapping previously user-mapped data (using the implemented learning module), the recall is between 90% and 100%.
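The heart of such an auto-mapping suggestion can be approximated with generic string similarity between field names. The sketch below is in Python rather than the thesis's PHP, uses hypothetical field names, and omits the value-to-metafield mapping and the learning from previously user-confirmed mappings, so it only illustrates the idea of ranked suggestions.

```python
from difflib import SequenceMatcher

def suggest_mappings(pmml_columns, bkef_metaattributes, threshold=0.6):
    """Suggest a BKEF metaattribute for each PMML data field by name similarity."""
    def sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    suggestions = {}
    for col in pmml_columns:
        best = max(bkef_metaattributes, key=lambda m: sim(col, m))
        # Suggest only when reasonably similar; otherwise leave it to the user.
        suggestions[col] = best if sim(col, best) >= threshold else None
    return suggestions

# Hypothetical field names, not taken from the Illinois datasets.
print(suggest_mappings(["course_title", "instructor_name"],
                       ["Course name", "Teacher", "Credits"]))
```

Precision and recall of the suggestions are then typically measured against a reference mapping, which is the kind of evaluation behind the 70%/77% figures above.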
|
264 |
Comparação entre uma solução combinatória e um método de planos-de-corte para o problema do emparelhamento de peso máximo / Comparison between a combinatorial solution and a plane-cut method for the maximum weight matching problem
Oliveira, Ander Conselvan de 10 December 2010 (has links)
A matching in a graph is a set of pairwise non-adjacent edges. Given a graph G with weights on its edges, the maximum weight matching problem is to find a matching whose sum of edge weights is maximum. In this work we study different solutions to this problem. We study combinatorial algorithms that solve the problem both in the case where G is bipartite and in the general case. Edmonds' algorithm is a polynomial algorithm with time complexity O(n^4), where n is the number of vertices of the graph G. We discuss our implementation of this algorithm. In a 1985 paper, Grötschel and Holland proposed the use of linear programming tools to solve the same problem. The so-called cutting-plane method is based on a result of Padberg and Rao showing that the separation problem associated with the matching polytope can be solved in polynomial time. In this work we implemented both methods and used them to solve several types of instances of the problem. Our conclusion is that the polyhedral method, despite using generic tools, is quite efficient in practice. / A matching in a graph G is a set of pairwise disjoint edges of G. Given a graph G with edge weights, we define the maximum weight matching problem as that of finding a matching which maximizes the sum of its weights. In this thesis we study different solutions to this problem. We studied combinatorial algorithms that solve this problem in the case where G is bipartite and also in the general case. Edmonds' algorithm [Edm65a] is a polynomial time algorithm with complexity O(n^4), where n is the number of vertices in the graph G. We discuss in this document our implementation of this algorithm. In a paper from 1985, Grötschel & Holland [GH85] discussed the use of linear programming tools for solving the maximum weight matching problem. This so-called cutting-plane method relies on a result by Padberg & Rao [PR82] proving that the separation problem associated with the matching polyhedron is solvable in polynomial time. In this work we implemented both methods and used them to solve different instances of the problem. Our conclusion is that the polyhedral method, although it uses generic tools, is very efficient in practice.
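As a quick point of reference for the problem being solved (not code from the thesis), a blossom-based combinatorial solver descending from Edmonds' algorithm is available off the shelf, e.g. NetworkX's max_weight_matching:

```python
import networkx as nx

# Tiny instance: vertices a, b, c, d with weighted edges (made-up numbers).
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 6), ("b", "c", 10), ("c", "d", 6),
    ("a", "d", 7), ("b", "d", 1),
])
matching = nx.max_weight_matching(G, maxcardinality=False, weight="weight")
total = sum(G[u][v]["weight"] for u, v in matching)
# The optimum pairs a-d with b-c for total weight 17, beating a-b plus c-d (12).
print(matching, total)
```

Both routes studied in the thesis, the O(n^4) combinatorial implementation and the cutting-plane approach with Padberg-Rao separation, compute exactly this object; the comparison is about how fast they get there on different instance families.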
|
265 |
Essays in Matching Theory and Mechanism Design
Bó, Inácio G. L. January 2014 (has links)
Thesis advisor: Utku Ünver / This dissertation consists of three chapters. The first chapter consists of a survey of the literature on affirmative action and diversity objectives in school choice mechanisms. It presents and analyzes some of the main papers on the subject, showing the evolution of our understanding of the effects that different affirmative action policies have on the welfare and fairness of student assignments and on the satisfaction of diversity objectives, as well as the domain of policies that allow for stable outcomes. The second chapter analyzes school choice mechanisms when policy-makers have objectives over the distribution of students by type across schools. I show that mechanisms currently available in the literature may fail to a great extent in satisfying those objectives, and I introduce a new one, which satisfies two properties. First, it produces assignments that satisfy a fairness criterion which incorporates the diversity objectives as an element of fairness. Second, it approximates the diversity objectives optimally while still satisfying the fairness criterion. We do so by embedding "preferences" for those objectives into the schools' choice functions in a way that satisfies the substitutability condition and then using the school-proposing deferred acceptance procedure. This leads to the equivalence of stability with the desired definition of fairness and to the maximization of the diversity objectives among the set of fair assignments. A comparative analysis also shows analytically that the proposed mechanism has a general ability to satisfy those objectives, while in many familiar classes of scenarios the alternatives yield segregated assignments. Finally, we analyze the incentives induced by the proposed mechanism in different market sizes and informational structures. The third chapter (co-authored with Orhan Aygün) presents an analysis of the Brazilian affirmative action initiative for access to federal public universities. In August 2012 the Brazilian federal government enacted a law mandating that defined proportions of the seats available in federal public universities be prioritized for students who claim to come from public high schools, to come from low-income families, or to belong to racial minorities. In this problem, individuals may belong to one or more of those groups, and students may choose not to claim some of the privileges associated with them. This turns out to be a problem not previously studied in the literature. We show that under the choice function induced by the current guidelines, students may be better off by not claiming privileges that they are eligible for. Moreover, the resulting assignments may not be fair or satisfy the affirmative action objectives, even when there are enough students claiming low-income and minority privileges. Also, any stable mechanism that uses the current choice functions is neither incentive compatible nor fair. We propose a new choice function to be used by the universities that guarantees that a student will not be worse off by claiming an additional privilege, is fair, and satisfies the affirmative action objectives whenever that is possible and there are enough applications claiming low-income and minority privileges. Next, we suggest a stable, incentive compatible and fair mechanism to create assignments for the entire system. / Thesis (PhD) — Boston College, 2014. / Submitted to: Boston College. Graduate School of Arts and Sciences.
/ Discipline: Economics.
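For readers unfamiliar with the machinery of the second chapter, the sketch below shows deferred acceptance driven by school choice functions. It is only a rough sketch under simplifying assumptions: it uses the student-proposing variant (the chapter uses the school-proposing one) and an invented reserve-style choice function, purely to illustrate how a diversity objective can live inside a school's choice rule; the student names and quota are hypothetical.

```python
def deferred_acceptance(student_prefs, school_choice):
    """Student-proposing deferred acceptance with arbitrary school choice functions.

    student_prefs: dict student -> list of schools, most preferred first.
    school_choice: dict school -> function(set of applicants) -> chosen set.
    """
    next_idx = {s: 0 for s in student_prefs}      # next school to apply to
    held = {c: set() for c in school_choice}      # tentatively held students
    free = set(student_prefs)                     # students who still need to apply
    while free:
        student = free.pop()
        prefs = student_prefs[student]
        if next_idx[student] >= len(prefs):
            continue                              # list exhausted: stays unmatched
        school = prefs[next_idx[student]]
        next_idx[student] += 1
        applicants = held[school] | {student}
        chosen = school_choice[school](applicants)
        held[school] = chosen
        free |= applicants - chosen               # rejected students apply again
    return held

# Invented choice rule: capacity 2, one seat reserved for "minority" applicants.
def reserve_choice(capacity=2, reserved=1, minority=frozenset({"m1"})):
    def choose(applicants):
        mins = sorted(a for a in applicants if a in minority)[:reserved]
        rest = sorted(set(applicants) - set(mins))[: capacity - len(mins)]
        return set(mins) | set(rest)
    return choose

print(deferred_acceptance({"m1": ["A"], "s1": ["A"], "s2": ["A"]},
                          {"A": reserve_choice()}))
```

With well-behaved (substitutable) choice functions, such a procedure reaches a stable assignment, which is the property the chapter builds its diversity-aware fairness notion on.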
|
266 |
Essays in Applied Microeconomic Theory
Raykov, Radoslav S. January 2012 (has links)
Thesis advisor: Utku Ünver / This dissertation consists of three essays in microeconomic theory: two focusing on insurance theory and one on matching theory. The first chapter is concerned with catastrophe insurance. Motivated by the aftermath of Hurricane Katrina, it studies a strategic model of catastrophe insurance in which consumers know that they may not get reimbursed if too many other people file claims at the same time. The model predicts that the demand for catastrophe insurance can "bend backwards" to zero, resulting in multiple equilibria and especially in market failure, which is always an equilibrium. This shows that a catastrophe market can fail entirely for demand-driven reasons, a result new to the literature. The model suggests that pricing is key for the credibility of catastrophe insurers: instead of increasing demand, price cuts may backfire and cause a "race to the bottom." However, small amounts of extra liquidity can restore the system to stable equilibrium, highlighting the importance of a functioning reinsurance market for large risks. These results remain robust both for expected utility consumer preferences and for expected utility's most popular alternative, rank-dependent expected utility. The second chapter develops a model of quality differentiation in insurance markets, focusing on two of their specific features: the fact that costs are uncertain, and the fact that firms are averse to risk. Cornerstone models of price competition predict that firms specialize in products of different quality (differentiate their products) as a way of softening price competition. However, real-world insurance markets feature very little differentiation. This chapter offers an explanation for this phenomenon by showing that cost uncertainty fundamentally alters the nature of price competition among risk-averse firms by creating a drive against differentiation. This force becomes particularly pronounced when consumers are picky about quality, and it is capable of reversing standard results, leading to minimum differentiation instead. The chapter concludes with a study of how the costs of quality affect differentiation by considering two benchmark cases: when quality is costless and when quality costs are convex (quadratic). The third chapter focuses on the theory of two-sided matching. Its main topic is the inefficiencies that arise when agent preferences permit indifferences. It is well known that two-sided matching under weak preferences can result in matchings that are stable but not Pareto efficient, which creates bad incentives for inefficiently matched agents to stay together. In this chapter I show that in one-to-one matching with weak preferences, the fraction of inefficiently matched agents decreases with market size if agents are sufficiently diverse; in particular, the proportion of agents who can Pareto improve in a randomly chosen stable matching approaches zero as the number of agents goes to infinity. This result shows that the relative degree of the inefficiency vanishes in sufficiently large markets, but it does not provide a "cure-all" solution in absolute terms, because inefficient individuals remain even when their fraction is vanishing. Agent diversity is represented by the diversity of each person's preferences, which are assumed to be drawn i.i.d. from the set of all possible weak preferences. To demonstrate its main result, the chapter relies on the combinatorial properties of random weak preferences.
/ Thesis (PhD) — Boston College, 2012. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Economics.
|
267 |
Essays in Market Design and Industrial Organization
Dimakopoulos, Philipp Dimitrios 27 April 2018
This dissertation consists of three independent chapters in the fields of matching market design, industrial organization and competition policy.
Chapter 1 deals with the matching market for legal trainee (Referendariat) positions in Germany. Because of excess demand, lawyers often have to wait before being assigned. The currently used algorithm does not take the lawyers' time preferences into account, so many desirable properties are not satisfied. Based on the matching with contracts model, I then propose a new mechanism that uses waiting time as the contractual term, so that the shortcomings of the current mechanism can be overcome.
In Chapter 2 I analyze competition between two-sided online platforms such as social networks or search engines. Advertisers pay money to place their ads, while users "pay" with their private data to gain access to the platform. I show that the equilibrium level of data collection is distorted, depending on the intensity of competition and on targeting benefits. Less competition on either market side leads to more data collection. If, however, platforms use monetary payments on both market sides, the efficient amount of data is collected.
Chapter 3 examines dynamic pricing in markets for airline or travel bookings, where competition takes place over a finite selling horizon with a deadline. Taking into account the intertemporal problems of firms and of forward-looking consumers, the equilibrium price paths depend on the number of unsold units of capacity and on the remaining selling time. I find that more consumer foresight increases consumer surplus but reduces efficiency. Furthermore, competition policy is particularly valuable when market capacities are excessive, and ex-ante capacity production can be inefficiently low. / This thesis consists of three independent chapters in the fields of matching market design, industrial organization and competition policy.
Chapter 1 covers the matching market for lawyer traineeship positions in Germany. Because of excess demand, lawyers often must wait before being allocated. The currently used algorithm does not take lawyers' time preferences into account; hence, many desirable properties are not satisfied. Based on the matching with contracts model, I then propose a new mechanism that uses waiting time as the contractual term, so that the shortcomings of the current mechanism can be overcome.
In Chapter 2 I analyze competition between two-sided online platforms, such as social networks or search engines. Advertisers pay money to place their ads, while users "pay" with their private data to gain access to the platform. I show that the equilibrium level of data collection is distorted, depending on the competition intensity and targeting benefits. Less competition on either market side leads to more data collection. However, if platforms use monetary payments on both market sides, data collection is efficient.
Chapter 3 studies dynamic pricing as in markets for airline or travel bookings, where competition takes place throughout a finite selling time with a deadline. Considering the inter-temporal problems of firms and forward-looking consumers, the equilibrium price paths depend on the number of unsold capacities and remaining selling time. I find that more consumer foresight increases consumer surplus yet reduces efficiency. Further, competition policy is especially valuable when market capacities are excessive. Moreover, ex-ante capacity production can be inefficiently low.
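As a stylised illustration of why prices in Chapter 3 depend on both unsold capacity and remaining time, the sketch below solves a single seller's finite-horizon pricing problem by backward induction. Everything in it is an assumption for illustration: myopic buyers, one arrival per period, a made-up linear purchase probability, and no competition, so it is far simpler than the chapter's model with forward-looking consumers.

```python
def optimal_prices(capacity=5, periods=10, prices=(0.2, 0.4, 0.6, 0.8)):
    """V[t][c] = expected revenue with c unsold units and t periods before the
    deadline, when one buyer arrives per period and buys at price p with
    probability 1 - p (illustrative demand curve)."""
    V = [[0.0] * (capacity + 1) for _ in range(periods + 1)]
    policy = [[None] * (capacity + 1) for _ in range(periods + 1)]
    for t in range(1, periods + 1):
        for c in range(1, capacity + 1):
            best_val, best_p = V[t - 1][c], None       # default: price everyone out
            for p in prices:
                buy = 1.0 - p
                val = buy * (p + V[t - 1][c - 1]) + (1.0 - buy) * V[t - 1][c]
                if val > best_val:
                    best_val, best_p = val, p
            V[t][c], policy[t][c] = best_val, best_p
    return V, policy

V, policy = optimal_prices()
# policy[t][c]: the optimal price rises when much time remains relative to the
# unsold stock and falls towards the static monopoly price near the deadline.
print(policy[10][5], policy[1][5])
```

Adding a competitor and forward-looking consumers who may delay purchase changes these paths, which is the margin the chapter's welfare and competition-policy results are about.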
|
268 |
Contributions to accurate and efficient cost aggregation for stereo matching
Chen, Dongming 12 March 2015
3D-based applications such as 3D movies, 3D printing, 3D mapping and 3D recognition are increasingly present in our daily life; they require 3D reconstruction, which thus appears as a key technique. In this thesis we focus on stereo matching, which lies at the heart of 3D acquisition. Despite the many publications on stereo matching, it remains a challenge because of the constraints on accuracy and computation time: autonomous driving requires real-time operation, while 3D object modelling demands high accuracy and resolution. The adaptive support weight method, based on the well-known bilateral filter, is a state-of-the-art local method which, despite its potential strengths, struggles to resolve the ambiguity induced by neighbouring pixels that have different disparities but similar colours. Our first contribution, based on a trilateral filter, is a relevant solution which, while retaining the advantages of the bilateral filter, resolves this ambiguity. Evaluated on the commonly accepted Middlebury reference benchmark, it ranked as the most accurate local method at the time of writing. Despite this performance, the complexity of our first contribution is high, as it depends on the size of the support window. We therefore proposed a recursive implementation of the trilateral filter, inspired by recursive filters. Here, the raw costs at each pixel are aggregated over a grid organised as a graph; four one-dimensional passes achieve a complexity of O(N), this time independent of the support window size, i.e. hundreds of times faster than the original method. To compute the support pixel weights, our trilateral filter based method introduces a new term, a function of the gradient magnitude. This term is pronounced at object boundaries, but also at colour and texture changes within objects, whereas only the former matters for depth estimation. The last contribution of this thesis therefore aims to distinguish object contours from those caused by colour changes within an object. Evaluations on Middlebury demonstrate the effectiveness of the proposed method, which is more accurate than the original trilateral filter based method as well as other local methods. / 3D-related applications such as 3D movies, 3D printing, 3D maps and 3D object recognition are becoming more and more popular in our daily life. Many of them require realistic 3D models, and 3D reconstruction is therefore a key technique behind them. In this thesis, we focus on a basic problem of 3D reconstruction, stereo matching, which searches for correspondences in a stereo pair or more images of a 3D scene. Although various stereo matching methods have been published over the past decades, it is still a challenging task because of the high requirements on accuracy and efficiency in practical applications. For example, autonomous driving demands real-time stereo matching, while 3D object modeling demands high-quality solutions. This thesis is dedicated to developing efficient and accurate stereo matching methods.
The well-known bilateral filter based adaptive support weight method represents the state of the art among local methods, but it can hardly resolve the ambiguity induced by nearby pixels at different disparities but with similar colors. We therefore proposed a novel trilateral filter based method that remedies such ambiguities by introducing a boundary strength term. As evaluated on the commonly accepted Middlebury benchmark, the proposed method proved to be the most accurate local stereo matching method at the time of submission (April 2013). The computational complexity of the trilateral filter based method is high and depends on the support window size. In order to improve its computational efficiency, we proposed a recursive trilateral filter method, inspired by recursive filters. The raw costs are aggregated on a grid graph by four one-dimensional aggregations, and the computational complexity proves to be O(N), independent of the support window size. The practical runtime of the proposed recursive trilateral filter based method on a 375 × 450 resolution image is roughly 260 ms on a PC with a 3.4 GHz Intel Core i7 CPU, hundreds of times faster than the original trilateral filter based method. The trilateral filter based method introduced a boundary strength term, computed from color edges, to handle the ambiguity induced by nearby pixels at different disparities but with similar colors. Color edges consist of two types of edges, depth edges and texture edges, and only depth edges are actually useful for the boundary strength term. We therefore presented a depth edge detection method, aiming to pick out depth edges, and proposed a depth edge trilateral filter based method. Evaluation on the Middlebury benchmark proves the effectiveness of the proposed depth edge trilateral filter method, which is more accurate than the original trilateral filter method and other local stereo matching methods.
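To give a feel for the O(N) idea, here is a heavily simplified sketch of edge-aware recursive 1-D cost aggregation: four directional passes whose per-pixel feedback weight shrinks across strong colour edges, so each pixel's cost is influenced by its whole row and column at constant cost per pixel. It is a generic illustration under made-up weights, not the trilateral or depth-edge formulation of the thesis, and it omits the normalisation a proper recursive filter would apply.

```python
import numpy as np

def recursive_aggregate(cost, guide, sigma=0.1):
    """Aggregate a matching-cost slice `cost` (H x W) with four 1-D recursive
    passes guided by colour similarity in `guide` (H x W x 3, values in [0, 1])."""
    agg = cost.astype(np.float64).copy()

    def weight(a, b):
        # Feedback coefficient: close to 1 on smooth regions, near 0 across edges.
        return np.exp(-np.linalg.norm(a - b, axis=-1) / sigma)

    h, w = cost.shape
    for x in range(1, w):                       # left -> right
        agg[:, x] += weight(guide[:, x], guide[:, x - 1]) * agg[:, x - 1]
    for x in range(w - 2, -1, -1):              # right -> left
        agg[:, x] += weight(guide[:, x], guide[:, x + 1]) * agg[:, x + 1]
    for y in range(1, h):                       # top -> bottom
        agg[y] += weight(guide[y], guide[y - 1]) * agg[y - 1]
    for y in range(h - 2, -1, -1):              # bottom -> top
        agg[y] += weight(guide[y], guide[y + 1]) * agg[y + 1]
    return agg

# Usage: aggregate each disparity's raw cost slice, then pick the disparity
# with the lowest aggregated cost per pixel (winner takes all).
```

The runtime is proportional to the number of pixels and does not depend on any window size, which is the property that makes the recursive variant hundreds of times faster than the windowed original.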
|
269 |
Broadband Impedance Matching of Antenna Radiators
Iyer, Vishwanath 29 September 2010
"In the design of any antenna radiator, single or multi-element, a significant amount of time and resources is spent on impedance matching. There are broadly two approaches to impedance matching; the first is the distributed impedance matching approach which leads to modifying the antenna geometry itself by identifying appropriate degrees of freedom within the structure. The second option is the lumped element approach to impedance matching. In this approach instead of modifying the antenna geometry a passive network attempts to equalize the impedance mismatch between the source and the antenna load. This thesis introduces a new technique of impedance matching using lumped circuits (passive, lossless) for electrically small (short) non-resonant dipole/monopole antennas. A closed form upper-bound on the achievable transducer gain (and therefore the reflection coefficient) is derived starting with the Bode-Fano criterion. A 5 element equalizer is proposed which can equalize all dipole/monopole like antennas. Simulation and experimental results confirm our hypothesis. The second contribution of this thesis is in the design of broadband, small size, modular arrays (2, 4, 8 or 16 elements) using the distributed approach to impedance matching. The design of arrays comprising a small number of elements cannot follow the infinite array design paradigm. Instead, the central idea is to find a single optimized radiator (unit cell) which if used to build the 2x1, 4x1, 2x2 arrays, etc. (up to a 4x4 array) will provide at least the 2:1 bandwidth with a VSWR of 2:1 and stable directive gain (not greater than 3 dB variation) in each configuration. Simulation and experimental results for a solution to the 2x1, 4x1 and 2x2 array configurations is presented. "
|
270 |
Essays on models of the labour market with on-the-job search
Gottfries, Axel January 2018 (has links)
In my first chapter, I provide a solution for how to model bargaining when there is on-the-job search and worker turnover depends on the wage. Bargaining is a standard feature in models without on-the-job search but, due to the endogeneity of the match surplus, a solution does not exist when worker turnover depends on the wage. My solution is based on wages being infrequently renegotiated. With renegotiation, the equilibrium wage distribution and the bargaining outcomes are both unique, and the model nests earlier models in the literature as limit cases when wages are either continuously or never renegotiated. Furthermore, the rate of renegotiation has important implications for the nature of the equilibrium. A higher rate of renegotiation lowers the response of the match duration to a wage increase, which decreases a firm's willingness to accept a higher wage. This results in a lower share of the match surplus going to the worker. Moreover, a high rate of renegotiation also lowers the positive wage spillovers from a minimum wage increase, since these spillovers rely on firms' incentives to use higher wages to reduce turnover. In the standard job ladder model, search is modelled via an employment-specific Poisson rate. The size of the Poisson rate governs the size of the search friction. The Poisson rate can represent the frequency of applications by workers or the rate at which firms post suitable vacancies. In the second chapter, which is co-authored with Jake Bradley, we set up a model which has both of these aspects: firms infrequently post vacancies and workers occasionally apply for these vacancies. The model nests the standard job ladder model and a version of the stock-flow model as special cases while remaining analytically tractable and easy to estimate empirically from standard panel data sets. The structurally estimated parameters are consistent with recent survey evidence on worker behavior. The model fits moments of the data that are inconsistent with the standard job ladder model and, in the process, reconciles the level of frictional wage dispersion in the data with the replacement ratios used in the macro labor literature. In my third chapter, which is co-authored with Coen Teulings, we develop a simple method to measure a worker's position on the job ladder in models with on-the-job search. The methodology uses two implications of models with on-the-job search: workers gradually select into better-paying jobs until they are laid off, at which point they start climbing the job ladder again. The measure relies on two sources of variation: (i) time variation in job-finding rates and (ii) individual variation in the time since the last layoff. We use the method to quantify the returns to on-the-job search and to establish the shape of the wage offer distribution by means of simple OLS regressions with wages as dependent variables. Moreover, we derive a simple prediction on the distribution of job durations. Applying the method to the NLSY 79, we find strong support for this class of models. We estimate the standard deviation of the wage offer distribution to be 12%. OJS accounts for 30% of the experience profile and 9% of the total wage dispersion.
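As a rough illustration of the job-ladder mechanics the third chapter exploits (workers climb to better-paying jobs between layoffs, so accepted wages rise with the time elapsed since the last layoff), here is a minimal simulation sketch; the offer rate, layoff rate and uniform wage-offer distribution are made-up parameters, not estimates from the NLSY.

```python
import random

def simulate_worker(months=120, offer_rate=0.15, layoff_rate=0.02, seed=0):
    """One worker in a stylised job ladder: offers arrive with prob. offer_rate
    per month from a uniform wage-offer distribution, are accepted when they
    beat the current wage, and layoffs (prob. layoff_rate) send the worker
    back to unemployment.  Returns (months since last layoff, wage) pairs."""
    rng = random.Random(seed)
    wage = None                      # None = unemployed
    since_layoff, history = 0, []
    for _ in range(months):
        if wage is not None and rng.random() < layoff_rate:
            wage, since_layoff = None, 0
        if rng.random() < offer_rate:
            offer = rng.random()
            if wage is None or offer > wage:
                wage = offer         # climb one rung of the ladder
        if wage is not None:
            since_layoff += 1
            history.append((since_layoff, wage))
    return history

# Pooling many workers reproduces the selection effect behind the chapter's
# measure: average accepted wages increase with time since the last layoff.
sample = [obs for s in range(500) for obs in simulate_worker(seed=s)]
```

Regressing wages on functions of time since the last layoff and of the intervening job-finding rates, as the chapter does by OLS, then recovers the returns to on-the-job search and the shape of the wage offer distribution.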
|