21

Application of SAR satellite images to studies of marine pollution and water dynamics in the north-western Mediterranean

Platonov, Alexei 19 July 2002
This thesis presents theoretical and practical applications of ERS-1/2 and RADARSAT images from the Synthetic Aperture Radar (SAR) sensor (nearly 330 images of the north-western Mediterranean (NWM) and 980 of European coastal waters), together with other types of satellite imagery, to studies of marine pollution and water dynamics in the NWM and other European areas. The principal source of information is a collection of SAR images acquired periodically during 1996-1998, mainly within the European Clean Seas project and also within the Oil Watch and ERS-1/2 SAR Exploitation Study in Catalonia projects. The geographical area of interest covers the NWM maritime zone bounded by the Balearic Islands - Ebro Delta - Cap de Creus - Gulf of Lion - Marseilles - Balearic Islands. In the course of this work we compiled a thematic collection of SAR images (full scenes and detailed fragments) of almost 300 oil spills and 42 coastal plumes detected in the NWM in 1996-1998, and produced thematic maps and statistical analyses of their topological characteristics and their temporal and spatial occurrence. The total area of all accidental spills and plumes in the NWM over 1996-1998 was also estimated (equivalent to a diameter of 146 km, with an oil mass of 4,477 metric tons). We carried out a comparative analysis of the Clean Seas results from different study zones (NWM, North Sea, Baltic Sea) and a statistical analysis of the occurrence of oil-spill disasters in European waters, based on the results of this work and on historical data from the past 34 years, relating them to Zipf's law. The general conclusion is that small, habitual oil spills play a significant role in overall marine pollution because of their very frequent occurrence. Regarding the dynamics of the NWM, we present a topological and spatial analysis of the vortices detected by satellite sensors, thematic maps, comparisons with laboratory experiments, a quantitative analysis of tidal particularities at different points of the NWM, examples of the application of multifractal analysis, and a practical method proposed to distinguish sea-surface structures of different origins. The results provide a general, statistically grounded picture of the level of marine pollution in the NWM and other European maritime zones, and yield quantitative information on the complex surface dynamics of the NWM, which can be useful for quantifying the surface-diffusion capacity of the ocean.
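As an illustration of the Zipf-type analysis mentioned above, the following sketch fits a rank-size power law to an invented set of spill areas (the data and variable names are placeholders, not the thesis's measurements):

```python
import numpy as np

# Hypothetical spill sizes (km^2); the thesis's real data are not reproduced here.
spill_areas = np.array([950.0, 410.0, 220.0, 130.0, 90.0, 60.0, 45.0, 30.0, 22.0, 15.0])

# Zipf-type rank-size analysis: sort descending and regress log(size) on log(rank).
sizes = np.sort(spill_areas)[::-1]
ranks = np.arange(1, len(sizes) + 1)
slope, intercept = np.polyfit(np.log(ranks), np.log(sizes), 1)

# A slope near -1 is the classical Zipf signature; the steeper the slope,
# the more a few large events dominate the total.
print(f"fitted exponent: {slope:.2f}")
```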
22

Ordinary least squares (OLS) in Brazilian scientific production: the interdisciplinarity between econometrics and the information metrics (bibliometrics, informetrics and scientometrics)

Santos, Levi Alã Neves dos 05 December 2017
This thesis analyses Brazilian scientific production (national articles, international articles, conference proceedings and books) using Ordinary Least Squares (OLS). It traces the historical development and application of the metrics that Information Science (IS) has been building, from the most fundamental of all, bibliometrics, which originated in librarianship, through modern views such as scientometrics, up to informetrics. It explains how econometrics builds its analytical model, used in economic research, and at the same time considers how this method can be brought into the information metrics. It presents the OLS estimation method for regression analysis, which is the proposal of this thesis. The study is applied, descriptive research with a quantitative approach, following a case-study design based on data collected from the 2010 CNPq Tabular Plan Portal (Portal do Plano Tabular do CNPq). The criteria for the research design were grounded, in the literature review, in references from IS as well as from bibliometrics, statistics and econometrics. Methodologically, the study combines the conceptual approach of bibliometrics and IS, seeking theories applicable to OLS studies, with an empirical application of OLS close to the econometric conception. The thesis concludes that regression functions estimated by OLS make it possible to build a prediction model for Brazilian scientific production. The model is built from the correlation and determination detected between the number of PhD holders and their scientific production in each Brazilian state. By applying econometric strategies (correlation coefficient, coefficient of determination, functional form of the regression curve and OLS estimation of the function's parameters), a prediction model could be constructed.
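As a hedged illustration of the OLS fit the abstract describes, the sketch below regresses invented per-state publication counts on PhD counts (placeholder numbers, not the 2010 CNPq data):

```python
import numpy as np

# Hypothetical per-state data: number of PhD holders vs. publication counts.
phds = np.array([1200.0, 3400.0, 560.0, 8900.0, 2100.0, 4700.0])
pubs = np.array([3100.0, 9000.0, 1500.0, 24000.0, 5600.0, 12500.0])

# OLS: stack a column of ones for the intercept and solve the least-squares problem.
X = np.column_stack([np.ones_like(phds), phds])
beta, *_ = np.linalg.lstsq(X, pubs, rcond=None)

# Coefficient of determination (R^2): share of variance the model explains.
residuals = pubs - X @ beta
r2 = 1 - residuals.var() / pubs.var()
print(f"intercept={beta[0]:.1f}, slope={beta[1]:.3f}, R^2={r2:.3f}")
```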
24

A population distribution correction model for Brazilian municipalities

Santos, Pedro João Costa Santos 12 August 2015
This thesis proposes a method to correct the distortion observed in the population distribution of Brazilian municipalities in Demographic Census data. The distortion consists of a heightened concentration of municipalities with populations close to the cutoff values of the Municipal Participation Fund (Fundo de Participação de Municípios, FPM) brackets. The method identifies candidate municipalities for adjustment, i.e. those with the largest distortions, via a Jackknife procedure, and suggests a corrected population according to a linear model following Zipf's law for the distribution of city populations (Zipf, 1949). After the proposed adjustment, the McCrary (2008) test detects a significant reduction in the discontinuities of the municipal population distribution for the years 2000, 2007, and 2010.
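A rough sketch of the rank-size (Zipf) regression such a correction can build on, using invented population figures; the Jackknife screening and the McCrary test from the dissertation are not reproduced here:

```python
import numpy as np

# Synthetic municipal populations (placeholders, not Census data).
pops = np.array([11_000_000, 6_300_000, 2_900_000, 1_800_000, 1_200_000,
                 800_000, 500_000, 300_000, 150_000, 80_000], dtype=float)

# Zipf's law for cities: log(population) is approximately linear in log(rank).
sizes = np.sort(pops)[::-1]
ranks = np.arange(1, len(sizes) + 1)
slope, intercept = np.polyfit(np.log(ranks), np.log(sizes), 1)

# A municipality flagged as distorted could be assigned the population that
# the fitted line predicts for its rank, rather than its reported count.
predicted = np.exp(intercept + slope * np.log(ranks))
print(np.round(predicted, -3))
```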
25

Extension and refinement of quantitative methods and classical laws in information retrieval: development of an automatic indexing and segmentation system for texts in Spanish

Rodríguez Luna, Manuela 29 July 2013
An automatic indexing and segmentation system for long texts in Spanish is developed and implemented, contributing to their text categorization and automatic indexing. For its development, the quantitative methods and classical laws of Information Retrieval are studied and refined, namely the models of the word-repetition process (Zipf, 1949; Mandelbrot, 1953) and of the vocabulary-growth process (Heaps, 1978). The circumstances under which these models apply are critically reviewed, and the stability of their parameters is studied experimentally through counts over texts and their fragments. A priori recommendations are given for the values of their parameters, depending on the circumstances of application and the type of text analysed. The behaviour of the formulas' parameters is observed, pointing to a direct relation with the typology of the text analysed. A new model (Log-%) is proposed for visualizing the word-frequency distribution of a text. The final objective is to detect the thematic shifts occurring within a document, in order to establish its thematic structure and obtain automatic indexing of each of its parts. In this way, the categorization of the text or document is obtained by enumerating its thematic parts as levels in a tree structure. Once the thematic parts of the text are established at their corresponding levels with the indexed terms, they are grouped into blocks distributed hierarchically as the document is broken down. The initial block describes the global content of the whole document with an initial set of words or descriptors. This initial block is then subdivided into several blocks, corresponding to different parts of the whole document, each of which likewise contains a set of words describing its content, and so on, down to the div.... / Rodríguez Luna, M. (2013). Ampliación y perfeccionamiento de los métodos cuantitativos y leyes clásicas en recuperación de la información: desarrollo de un sistema de indización y segmentación automática para textos en español [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31517
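The two classical laws the abstract builds on lend themselves to a quick empirical check; the sketch below fits both on a placeholder corpus (the text, tokenizer, and fit ranges are assumptions, not the thesis's system):

```python
import re
from collections import Counter

import numpy as np

# Placeholder corpus: substitute any long Spanish (or other) text here;
# on a toy string like this the fitted exponents are degenerate.
text = " ".join(["información recuperación texto palabra sistema modelo ley"] * 200)
words = re.findall(r"\w+", text.lower())

# Zipf (1949): the r-th most frequent word has frequency roughly C / r,
# i.e. log-frequency falls linearly in log-rank with slope near -1.
freqs = np.array(sorted(Counter(words).values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)
zipf_slope = np.polyfit(np.log(ranks), np.log(freqs), 1)[0]

# Heaps (1978): vocabulary size grows as V(n) ~ K * n^beta with beta < 1.
ns = np.linspace(10, len(words), 20, dtype=int)
heaps_beta = np.polyfit(np.log(ns), np.log([len(set(words[:n])) for n in ns]), 1)[0]

print(f"Zipf slope ~ {zipf_slope:.2f}; Heaps beta ~ {heaps_beta:.2f}")
```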
26

The Spatial Structures of Eastern Algeria: territorial, urban and road networks.

Raham, Djamel 11 April 2001
Regional analysis is a delicate undertaking, since a region is a heterogeneous and complex set of invariants and parameters, visible or invisible, mobile or inert, in continuous and interdependent relation. Capturing all the spatial components of a given region is close to impossible; the factors taken into account must therefore be essential and decisive, so that they bring out the main disparities and imbalances that characterize the region. The aim guiding this study has thus been to depict the past and present spatial configuration of Algeria through its eastern part, "Eastern Algeria". To this end, we considered three types of networks: the territories (wilayas and communes), the urban network (settlements of all sizes), and the communication network, comprising the road grid and the railways. For each type of network (territories, urban network, communication routes), it was necessary to trace its evolution since near antiquity, to study its present configuration, and then to confront it with theoretical models, using analytical and investigative tools developed in this context. It emerges that, whatever the network considered, Eastern Algeria consistently appears as a typical example of a space presenting identifiable opposing forms, whatever the method of analysis used. The region, that is, Eastern Algeria, is thus dominated by two dualistic spatial systems:
+ a traditional, classical system characterizing the peripheral regions (the western part of the Tell, the southern High Plains or Steppe, the Nememcha, the Hodna region, and the Saharan Atlas), which often show delays and negative gaps in all domains;
+ an inherited spatial system, bequeathed mainly by the colonial power, which appears overall as a region polarized linearly along the main communication routes linking the most important cities.
This is in fact the model of the anisotropic region, taking the form of a succession of sub-regions polarized around large urban centres and well connected by communication routes along a preferential axis. On either side of this inherited system, marginal spatial sub-systems persist.
27

Complex query processing and estimation of distribution skewness in Internet-scale distributed networks

Πιτουρά, Θεώνη 12 January 2009
Distributed, Internet-scale networks, and in particular peer-to-peer (p2p) networks, their most representative example, have recently attracted great interest from researchers and industry due to their distinctive properties, such as full decentralization, node autonomy, and scalability. Initially designed to support file-sharing applications with simple lookup operations, they soon developed into a new model of distributed system with many and growing possibilities for Internet applications, supporting complex applications over structured and semantically rich data. Our research approaches the area from two basic directions: (a) complex query processing and (b) estimation of the skewness of the various distributions found in these networks (e.g. load distribution, supply or consumption of a resource, data-value distributions), which, among other things, is an important tool in supporting complex query processing. Specifically, we address and solve three basic open problems. The first is range query processing in p2p systems based on distributed hash tables (DHTs), with simultaneous guarantees of access load balancing and fault tolerance. We propose an overlay DHT architecture, coined Saturn. Saturn uses a novel order-preserving hash function that places consecutive data values on successive nodes to provide efficient range query processing, and replication to guarantee access load balancing (vertical, load-driven replication) and fault tolerance (horizontal replication). Through extensive experimentation we evaluate and compare Saturn with two basic DHT networks (Chord and OP-Chord), and confirm its superiority in meeting the three requirements above, as well as its ability to tune the degree of replication to trade replication costs against access load balancing. The second open problem concerns the lack of appropriate metrics to express the degree of skewness of various distributions (for example, the fairness of a load distribution) in p2p networks, and the inefficient, offline-only exploitation of skewness metrics, which prevents any cooperation with corrective algorithms (for example, load-balancing algorithms). The problem matters because estimating distribution fairness contributes to system scalability and efficiency. First, after a comprehensive study and evaluation of popular skewness metrics, we propose three of them (the Gini coefficient, the fairness index, and the coefficient of variation), and then develop sampling techniques (three known and three novel) to estimate these metrics dynamically. With extensive experiments we comparatively evaluate both the proposed estimation algorithms and the three metrics, and show how these metrics, and especially the Gini coefficient, can easily be used online by higher-level algorithms, which can now know when it is best to intervene to correct unfairness. The third and last open problem concerns estimating the self-join size of a relation whose tuples are distributed over the data nodes of an overlay network. Self-join size has been used extensively in centralized databases for query optimization, and we argue that it can also serve a variety of other applications, especially in p2p networks (e.g. web clustering, web searching). Our contribution includes adaptations of five well-known centralized self-join size estimation techniques (sequential sampling, cross-sampling, adaptive and bifocal sampling, and sample-count) to the p2p environment, and a novel estimation technique based on the Gini coefficient. With mathematical analysis we show that estimates of the Gini coefficient can lead to estimates of the skewness of the underlying data distribution when it follows a power law or Zipf's law, and these in turn lead to self-join size estimates for the data relations. An extensive experimental study comparing all the above techniques shows that the proposed technique is very efficient in terms of accuracy, precision, and estimation cost compared with the other five methods.
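For concreteness, here is a minimal sketch of the Gini coefficient computation underlying the fairness metrics discussed above; it applies the exact order-statistics formula to an invented load vector, not the thesis's distributed sampling estimators:

```python
import numpy as np

def gini(values: np.ndarray) -> float:
    """Gini coefficient: 0 = perfectly even, approaching 1 = extremely skewed."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    # Classical formula based on the order statistics of the sample.
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

# Hypothetical per-node access loads in a DHT; a load balancer could be
# triggered once the estimated Gini exceeds some threshold.
loads = np.array([5.0, 7.0, 6.0, 120.0, 4.0, 8.0, 6.0, 90.0])
print(f"Gini = {gini(loads):.2f}")
```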
28

Modelling of extremes

Hitz, Adrien January 2016
This work focuses on statistical methods for understanding how frequently rare events occur and how large extreme values, such as large losses, can be. It lies in the field of extreme value analysis, whose scope is to support scientific decision-making when extreme observations are of particular importance, such as in environmental applications, insurance and finance. In the univariate case, I propose new techniques to model the tails of discrete distributions and illustrate them in an application to word frequency and multiple-birth data. Suitably rescaled, the limiting tails of some discrete distributions are shown to converge to a discrete generalized Pareto distribution and a generalized Zipf distribution, respectively. In the multivariate, high-dimensional case, I suggest modelling tail dependence between random variables by a graph, such that its nodes correspond to the variables and shocks propagate through the edges. Relying on the ideas of graphical models, I prove that if the variables satisfy a new notion called asymptotic conditional independence, then the density of the joint distribution can be simplified and expressed in terms of lower-dimensional functions. This generalizes the Hammersley-Clifford theorem and enables us to infer tail distributions from observations in reduced dimension. As an illustration, extreme river flows are modelled by a tree graphical model whose structure recovers almost exactly the actual river network. A fundamental concept in the study of limiting tail distributions is regular variation. I propose a new notion in the multivariate case, called one-component regular variation, to which Karamata's theorem and the representation theorem, two important results in the univariate case, are generalized. Eventually, I turn my attention to website visit data and fit a censored copula Gaussian graphical model, allowing users' behaviour to be visualized by a graph.
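As a hedged sketch of the discrete-tail fitting the abstract alludes to, the following fits a continuous generalized Pareto to threshold exceedances of synthetic counts and discretizes it by differencing the CDF (the data are simulated; this is not the author's estimator):

```python
import numpy as np
from scipy.stats import genpareto

# Synthetic heavy-tailed counts standing in for word frequencies; not the
# thesis's data. scipy's genpareto is parametrized as (shape c, loc, scale).
counts = np.floor(genpareto.rvs(0.7, scale=10.0, size=5000, random_state=0))

# Peaks-over-threshold: keep exceedances over a high threshold and fit a
# continuous generalized Pareto; a discrete GPD arises by differencing its CDF.
u = np.quantile(counts, 0.95)
exceedances = counts[counts > u] - u
c_hat, _, scale_hat = genpareto.fit(exceedances, floc=0.0)

# P(X = k | X > u) under the discretized fit, for the first few integers k.
k = np.arange(0, 5)
pmf = genpareto.cdf(k + 1, c_hat, scale=scale_hat) - genpareto.cdf(k, c_hat, scale=scale_hat)
print(f"shape ~ {c_hat:.2f}, scale ~ {scale_hat:.2f}, pmf head: {np.round(pmf, 3)}")
```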
