141 |
Quantifying Phonological Feature Co-occurrence
April Lynn Grotberg (13171419) 29 July 2022
<p>This study argues that the observed-over-expected ratio, or O/E, is an inadequate metric for measuring the strength of consonant co-occurrence in Similar Place Avoidance. I advocate for the use of Yule's Q, an odds-ratio-based statistic that is not influenced by the relative proportions of labials, coronals, and dorsals in the dataset. This position is advanced on general statistical and linguistic considerations as well as through the analysis of empirical data from 32 languages.</p>
<p><br></p>
<p>Parallel typological analyses are conducted using O/E and Yule's Q. Cross-linguistic comparisons using O/E suggest that CVC sequences with two coronals are the least marked of the homorganic pairs. The same analysis using Yule's Q suggests any place of articulation may be the least/most marked in a given language; there are no cross-linguistic preferences. The disagreement between the two statistics can be accounted for by the fact that O/E is sensitive to the margin totals: coronals only appear to pattern separately from the labials and dorsals in the O/E analysis because they are considerably more frequent than are labial and dorsal segments.</p>
<p><br></p>
<p>To advance the use of Yule's Q in the study of Similar Place Avoidance, the paper provides guidance on constructing confidence intervals, measuring/interpreting effect size, and appropriate use of significance testing. Two case studies on aspects of Similar Place Avoidance in Latin and Medieval Castilian illustrate the proposed methodology.</p>
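<p>The margin sensitivity described above can be illustrated with a small sketch. It assumes a 2x2 table in which a counts CVC sequences where both consonants have the place of interest, b and c count sequences where only the first or only the second does, and d counts the rest. The function names and the counts are hypothetical and are not taken from the study's data. Quadrupling the first row, which mimics a more frequent place of articulation, changes O/E but leaves Yule's Q unchanged.</p>

```python
def observed_over_expected(a, b, c, d):
    """O/E for cell a of a 2x2 table: observed count over the count
    expected under independence (row total * column total / N)."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return a / expected


def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc), a rescaled odds ratio in [-1, 1]."""
    return (a * d - b * c) / (a * d + b * c)


# Hypothetical counts (illustration only, not data from the thesis).
base = (5, 45, 45, 155)
# Same table with the first row multiplied by 4: the margins change,
# the odds ratio does not.
scaled = (20, 180, 45, 155)

for label, table in (("base", base), ("scaled", scaled)):
    print(label,
          "O/E =", round(observed_over_expected(*table), 3),
          "Q =", round(yules_q(*table), 3))
```

<p>Both tables have the same odds ratio, so Q agrees for the two, while O/E moves toward 1 as the row becomes more frequent.</p>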
|
142 |
A Descriptive Statistical Analysis of the Relationships Between Socioeconomic Status, Attendance Rates, Per Pupil Expenditures, Teacher Qualifications, and On-Time Educational Attainment Rates within the State of Virginia Including a Comparative Study of the Appalachian and Non-Appalachian School Division
Siers, Kevin W. 20 April 2010
PURPOSE
This study had two purposes: (a) to examine how well socioeconomic status, per pupil expenditures, percentage of highly qualified teachers, and attendance rates predict on-time educational attainment in the state of Virginia and (b) to compare the Appalachian school divisions of Virginia with the non-Appalachian school divisions on each of these variables.
METHOD
Data pertaining to socioeconomic status, per pupil expenditures, attendance rates, teacher qualifications, and on-time educational attainment were collected for the graduating cohorts of 2005, 2006, 2007, and 2008. A stepwise multiple regression analysis was conducted on these variables to address the first purpose. A general linear model repeated measures ANOVA was conducted for each variable to compare differences between the Appalachian, non-Appalachian divisions of similar size, non-Appalachian large school divisions, and the total non-Appalachian divisions to address the second purpose of the study.
RESULTS
Socioeconomic status and attendance rates were found to be the independent variables that were significantly able to predict on-time educational attainment rates. Socioeconomic status rates were found to be significantly higher in the Appalachian divisions than in the non-Appalachian large school divisions. Teacher qualification rates were found to be significantly higher in the Appalachian divisions than the non-Appalachian divisions of similar size. On-time educational attainment rates were found to be significantly higher in the Appalachian school divisions than in all three classifications of the non-Appalachian divisions. / Ed. D.
|
143 |
Author Profiling en Social Media: Identificación de Edad, Sexo y Variedad del Lenguaje
Rangel Pardo, Francisco Manuel 07 July 2016
[EN] Inferring people's traits from what they write is a field of growing interest known as author profiling. Inferring a user's gender, age, native language or personality traits simply by analysing their texts opens a wide range of possibilities for forensics, security and marketing.
Furthermore, the proliferation of social media, which enables new models of communication and human relations, extends this range of possibilities to an unprecedented degree. The idiosyncrasies of social media make them a special communication environment, where freedom of expression, informality and the spontaneous generation of topics and trends bring us closer to people's everyday use of language. However, the same idiosyncrasies make the application of linguistic techniques difficult or extremely costly.
In this work we propose EmoGraph, a graph-based approach that aims to model the way users express their emotions and the way they include them in their discourse, taking into account not only their frequency of occurrence but also their position and relationship with other elements of the discourse. Our starting hypothesis is that users express themselves and their emotions differently depending on their age and gender, and that this holds regardless of their language and of the social medium in which they write. We collaborated in the creation of a common evaluation framework at the PAN Lab of CLEF, generating resources that allowed us to verify our hypothesis and to achieve results comparable to and competitive with the best obtained by other researchers in the field.
In addition, we investigated whether the expression of emotions helps to differentiate among users of different varieties of the same language, for example Spanish from Spain, Mexico and Argentina, or Portuguese from Portugal and Brazil. Our hypothesis is that variation among language varieties is based more on lexical aspects, and we corroborated it by comparing EmoGraph with representations based on word patterns, distributed representations, and a representation that uses the whole vocabulary but reduces its dimensionality to only 6 features per class, which makes it suitable for big data environments such as social media. / Rangel Pardo, FM. (2016). Author Profiling en Social Media: Identificación de Edad, Sexo y Variedad del Lenguaje [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/67270
|
144 |
A critical analysis of equal remuneration claims in South African law
Ebrahim, Shamier 20 July 2015
The legislation relating to equal remuneration claims is a nuanced and consequently poorly understood area of law. It has proved an insurmountable obstacle for many claimants who have come before the South African courts. This is a direct result of the lack of an adequate legal framework for such claims in the Employment Equity Act 55 of 1998. The case law recognises two causes of action relating to equal remuneration. The first is equal remuneration for the same or similar work. The second is equal remuneration for work of equal value. The former is easily understood by both claimants and courts, but the latter is poorly understood and poses many difficulties. The aim of this dissertation is fourfold. Firstly, the problems and criticisms regarding equal remuneration claims will be briefly highlighted. Secondly, a comprehensive analysis of the current legal framework will be set out together with its inadequacies. Thirdly, an analysis of international law and the law of the United Kingdom relating to equal remuneration claims will be undertaken. Fourthly, this dissertation will conclude by proposing recommendations to rectify the inadequacies. / Mercantile Law / LL.M. (Labour law)
|
145 |
Partly exchangeable fragmentations
Chen, Bo January 2009
We introduce a simple tree growth process that gives rise to a new two-parameter family of discrete fragmentation trees that extends Ford's alpha model to multifurcating trees and includes the trees obtained by uniform sampling from Duquesne and Le Gall's stable continuum random tree. We call these new trees the alpha-gamma trees. In this thesis, we obtain their splitting rules and their dislocation measures, both in ranked order and in size-biased order, and we study their limiting behaviour. We further extend the underlying exchangeable fragmentation processes of such trees into partly exchangeable fragmentation processes by weakening the exchangeability. We obtain integral representations for the measures associated with partly exchangeable fragmentation processes and for the subordinator of the tagged fragments. We also embed the trees associated with such processes into continuum random trees and study their limiting behaviour. Finally, we generate a three-parameter family of partly exchangeable trees which contains the family of the alpha-gamma trees and another important two-parameter family based on Poisson-Dirichlet distributions.
|
146 |
Zdaňování provozovatelů hazardních her / Taxation of gambling operators
Duda, Václav January 2015
The aim of this thesis is to introduce the system of taxation of gambling operators currently applicable in the Czech Republic, together with the proposed bill on gambling and the proposed bill on the taxation of gambling. The first chapter opens by discussing the social costs of gambling addiction as a current social issue and its possible solutions from the viewpoint of law. Further parts of the chapter deal with European legislation on the public regulation of gambling, and with the institutes of the Act on Lotteries and Other Similar Games, as the primary piece of gambling legislation, and their relation to the Civil Code. A subchapter follows on the right of municipalities to regulate some types of gambling by means of municipal ordinances, including the assessment of this issue by the Constitutional Court. The last part of the first chapter contains a detailed analysis of the proposed bill on gambling, including a comparison of some of its institutes with the current legislation and a partial assessment of them. The second chapter focuses on the current system of legal regulation of the taxation of gambling operators, analyses the types of financial duties imposed on the operators and does not...
|
147 |
O controle do comportamento de escolha: um modelo experimental do merchandising no ponto de venda / The control of the choice behavior: an experimental model of the point-of-purchase merchandising
Parucker, Fabio 29 May 2006
One of the most widely used marketing tools for influencing consumer choice is point-of-purchase merchandising, which attempts to make a product stand out among very similar options, a similarity that results from the popularization of production technologies. From a behavior-analytic point of view, point-of-purchase merchandising can be seen as a form of stimulus control in which subjects (consumers) respond (choose among goods) differentially in the presence or absence of an exteroceptive stimulus. To test this arrangement of contingencies, six experimental subjects were used: adult male Wistar rats, experimentally naive at the beginning of the experiment. The bar-press response was first shaped in all subjects, which were then distributed into three groups. During the first part of the experiment, each group was trained to respond on one bar and was exposed to a multiple or mixed schedule of reinforcement in which VI alternated with CRF as a function of a fixed number of reinforcers obtained. Groups VsCn and VnCs were exposed to multiple schedules in which the stimulus (a light) was paired with VI and with CRF, respectively, while group VnCn was exposed to a mixed schedule that alternated VI and CRF with no presentation of the stimulus. During the second part, all subjects were exposed to a concurrent VI VI schedule of reinforcement (conc VI VI) to determine the baseline of responding when two identical operanda were available. The stimulus was not presented during this part. During the third part, the concurrent schedule was maintained and the stimulus was presented randomly over one bar or the other for two-minute periods, to evaluate how much control it exerted over the bar-press response. The results of groups VsCn and VnCs show that the stimulus acquired a certain degree of control over responding. Subject 85 responded more on the bar over which the stimulus was presented; this was not the case for subject 86, whose responding was controlled more by the position of the bar. Group VnCs responded more on Bar 1 in the periods following stimulus presentation, suggesting that control by the subjects' past history with the stimulus had been established; that is, having been trained on a multiple schedule in which the stimulus was paired with the CRF condition determined the subjects' current responding. The analysis of the control group (VnCn) data shows that control by the stimulus developed during part 3 only when it was presented over the bar for which the subject had shown a preference during part 2.
|
148 |
Résolution de deux types d’équations opératorielles et interactions / Solution of 2 kind of operator equations and interactions
Mansour, Abdelouahab 15 September 2016
This thesis concerns the solution of operator equations in the algebra B(H) of bounded linear operators on a Hilbert space H. We study those associated with generalized derivations. We also explore more general equations of the type AXB - XD = E or AXB - CXD = E, where A, B, C, D and E belong to B(H). Specifically, we describe the solutions of these equations when E belongs to a specific family (self-adjoint, normal, rank one, finite rank, compact, Fuglede-Putnam pair) and the operators A, B, C and D belong to good classes of operators (those that arise in applications, especially in physics), such as self-adjoint, normal and subnormal operators. Apart from the case where the spectra of A and B are disjoint, there is no general method for effectively constructing all solutions of the Sylvester equation AX - XB = C from the operators A, B and C. One objective of this thesis is to provide a constructive method for the case where A, B and C belong to good classes of operators. A spectral study of the solutions is also carried out. Besides this qualitative study, there is also a quantitative one: we obtain precise estimates of the operator norm (or Schatten norm) of the solutions in terms of the norms of the operators given as data. This led to results on some interesting inequalities for generalized derivations. Finally, some examples and properties of operators on a Banach space are also given.
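The role of disjoint spectra in the Sylvester equation AX - XB = C can be seen in the simplest finite-dimensional case. The sketch below assumes A and B are diagonal matrices, so that entry (i, j) of AX - XB equals (a_i - b_j) x_ij and the solution can be read off entry by entry whenever no a_i equals any b_j. This is only an illustration of the disjoint-spectra case, not the constructive method of the thesis; the function names and the numbers are hypothetical.

```python
def solve_diagonal_sylvester(a_diag, b_diag, c):
    """Solve A X - X B = C for A = diag(a_diag) and B = diag(b_diag).

    Entry (i, j) of A X - X B is (a_i - b_j) * x_ij, so
    x_ij = c_ij / (a_i - b_j) when the spectra are disjoint.
    """
    x = []
    for i, a_i in enumerate(a_diag):
        row = []
        for j, b_j in enumerate(b_diag):
            gap = a_i - b_j
            if gap == 0:
                # Shared eigenvalue: the equation has no unique solution.
                raise ValueError("spectra of A and B overlap")
            row.append(c[i][j] / gap)
        x.append(row)
    return x


def residual(a_diag, b_diag, x, c):
    """Largest absolute entry of A X - X B - C (0 for an exact solution)."""
    return max(
        abs(a_diag[i] * x[i][j] - x[i][j] * b_diag[j] - c[i][j])
        for i in range(len(a_diag))
        for j in range(len(b_diag))
    )


# Hypothetical data: spectra {1, 2} and {3, 5} are disjoint.
a_diag = [1.0, 2.0]
b_diag = [3.0, 5.0]
c = [[2.0, 4.0], [1.0, 3.0]]

x = solve_diagonal_sylvester(a_diag, b_diag, c)
print(x, residual(a_diag, b_diag, x, c))
```

When an eigenvalue is shared, the corresponding entry of X is either undetermined or impossible to satisfy, which is why the sketch raises an error in that case.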
|
149 |
Synthèse de mimes de mycolactones pour l’étude mécanistique de l’ulcère de Buruli / Synthesis of mycolactone mimetics for the mechanistic study of Buruli ulcer
Tresse, Cédric 29 September 2014
This research project focuses on Mycobacterium ulcerans infection (Buruli ulcer disease), a devastating skin disease characterized by the formation of progressive necrotic lesions and the lack of an acute inflammatory response. Although neglected, this infection is the third most common mycobacterial disease after tuberculosis and leprosy, and cases are reported in more than 30 countries worldwide. Mycobacterium ulcerans secretes a complex polyketide macrolide toxin, called mycolactone A/B, which is directly responsible for the pathogenic effects of the disease. Since its discovery, the unusual biological properties of this toxin have spurred research efforts in several fields. In this context, this project aims at elucidating the mechanism of action of mycolactones, using total synthesis as the main tool. To this end, our research team has developed an efficient and robust synthetic route to different mimetics of the toxin. This route was used to prepare thirteen mycolactone mimetics during this thesis. Our team is also interested in the synthesis of fluorinated mycolactone analogs, which are of particular interest for improving the understanding of the interactions between the toxin and its cellular target. Work in this area led to the development of a simple and general method for introducing a trifluoromethyl group onto a terminal alkyne, allowing novel modulation of the structure of the toxin.
|
150 |
在高度分散式環境下進行Top-k相似文件檢索 / Similar Top-k documents retrieval in highly distributed environments
王俊閎, Wang, Chun Hung Unknown Date
In query processing over document databases, the top-k similar documents query helps users retrieve, from a large collection, the documents most relevant to a query document. It ranks the documents in the database with a similarity function and returns the k documents with the highest similarity. However, centralized search engines suffer from limited coverage and scalability, which makes such rank-based queries costly in time and computation. Recently, using peer-to-peer (P2P) architectures to tackle these issues has become a trend, but supporting rank-based similar-document queries in a highly distributed environment is difficult because global knowledge and a suitable central coordinator are lacking.
In this study, we propose a framework to address these problems. We first perform local cluster pre-processing on each peer, then use a zone-partitioning approach [1] to divide the P2P network into sub-zones, and then construct a feature index table. During query processing, the index table speeds up the selection of the top-k similar clusters and ensures an adequate number of returned results. In the experiments, we compare our method with a traditional centralized search engine and with the SON-based approach [1]; in a highly distributed environment, our method performs better than both on top-k similar document queries.
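As a rough illustration of what a top-k similar documents query computes when the documents are spread over several peers, the following sketch scores documents by cosine similarity, takes a local top-k on each peer, and merges the candidates. It deliberately leaves out the clustering, zone creation and feature index table of the proposed framework; the term-frequency dictionaries, document identifiers and function names are assumptions made for this example.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_top_k(query, docs, k):
    """Top-k (score, doc_id) pairs among the documents held by one peer."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored)


def global_top_k(query, peers, k):
    """Merge each peer's local top-k; the global top-k is always
    contained in the union of the local top-k lists."""
    candidates = []
    for docs in peers:
        candidates.extend(local_top_k(query, docs, k))
    return heapq.nlargest(k, candidates)


# Hypothetical peers, each holding a few term-frequency vectors.
peers = [
    {"d1": {"p2p": 2, "search": 1}, "d2": {"cooking": 3}},
    {"d3": {"p2p": 1, "index": 1}, "d4": {"search": 2, "index": 1}},
]
query = {"p2p": 1, "search": 1}

for score, doc_id in global_top_k(query, peers, 2):
    print(doc_id, round(score, 3))
```

Contacting every peer, as this sketch does, is exactly the cost that an index over clusters or zones is meant to avoid.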
|