21

Detecting Near-Duplicate Documents using Sentence-Level Features and Machine Learning

Liao, Ting-Yi 23 October 2012 (has links)
Effectively finding near-duplicate documents within a large collection has become a very important issue. In this paper, we propose a new method to detect near-duplicate documents in large-scale datasets. Our method is divided into three parts: feature selection, similarity measurement, and discriminant derivation. In feature selection, documents are first preprocessed: punctuation, stop words, and similar tokens are removed. We then measure the weight of each term within a sentence and choose the terms with the highest weights; these selected terms form the document's feature set. Similarity measurement applies a similarity function to compute the similarity value between two feature sets. Discriminant derivation uses a support vector machine (SVM), a supervised learning strategy that trains a classifier from labeled training patterns, to decide whether a document is a near-duplicate or not. Given the characteristics of documents, sentence-level features are more effective than term-level features. Moreover, learning a discriminant with an SVM avoids the trial-and-error effort required by conventional methods, in which a threshold (a discriminant value defining the relation between documents) must be found by repeated experimentation. The final experimental analysis shows that our method is more effective at near-duplicate document detection than other methods.
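As a rough illustration of the three-part pipeline described above, the following Python sketch extracts the highest-weighted terms of each sentence as features, scores a pair of documents with a set-overlap (Jaccard) similarity, and feeds that score to an SVM classifier. The TF-IDF weighting, the Jaccard measure, the `top_k` parameter, and the toy training pairs are assumptions made for illustration; the thesis does not prescribe these exact choices.

```python
# Minimal sketch of sentence-level feature selection + similarity + SVM.
# Weighting scheme, similarity function, and data are illustrative assumptions.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

STOP_WORDS = ["the", "a", "an", "of", "and", "to", "in", "is"]

def sentence_features(doc, top_k=5):
    """Pick the top_k highest-weighted terms from each sentence of a document."""
    sentences = [s for s in re.split(r"[.!?]", doc) if s.strip()]
    vec = TfidfVectorizer(stop_words=STOP_WORDS)
    weights = vec.fit_transform(sentences)
    terms = vec.get_feature_names_out()
    features = set()
    for row in weights:                       # one row per sentence
        ranked = row.toarray().ravel().argsort()[::-1][:top_k]
        features.update(terms[i] for i in ranked)
    return features

def jaccard(a, b):
    """Set-overlap similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Train an SVM on the similarity of labeled pairs (1 = near-duplicate, 0 = distinct),
# then classify an unseen pair by its similarity score.
pairs = [
    ("data cleaning removes errors from records", "data cleaning fixes errors in records", 1),
    ("data cleaning removes errors from records", "cats sleep for most of the day", 0),
]
X = [[jaccard(sentence_features(a), sentence_features(b))] for a, b, _ in pairs]
y = [label for _, _, label in pairs]
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.8]]))  # a high similarity score is classified as near-duplicate
```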
22

Adaptive windows for duplicate detection

Draisbach, Uwe, Naumann, Felix, Szott, Sascha, Wonneberg, Oliver January 2012 (has links)
Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity. This task is difficult, because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records, and (ii) data sets might have a high volume, making a pair-wise comparison of all records infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare all record pairs only within each partition. One well-known such approach is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data, comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition of such adaptive windows is that there might be regions of high similarity suggesting a larger window size and regions of lower similarity suggesting a smaller window size. We propose and thoroughly evaluate several adaptation strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons).
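To make the adaptive-window idea concrete, here is a small Python sketch of a Sorted Neighborhood pass whose window grows while duplicates keep appearing and shrinks otherwise. The specific grow/shrink rule, the `SequenceMatcher`-based similarity, and the toy records are illustrative assumptions, not the strategies evaluated in the report.

```python
# Hedged sketch of an adaptively sized Sorted Neighborhood pass.
from difflib import SequenceMatcher

def adaptive_snm(records, key, is_similar, min_win=2, max_win=10):
    """Sort by key, then slide a window whose size grows in regions of
    high similarity and shrinks again when matches stop appearing."""
    data = sorted(records, key=key)
    pairs, win = [], min_win
    for i in range(len(data) - 1):
        matched = False
        for j in range(i + 1, min(i + win, len(data))):
            if is_similar(data[i], data[j]):
                pairs.append((data[i], data[j]))
                matched = True
        # enlarge the window while duplicates keep appearing, shrink otherwise
        win = min(win + 1, max_win) if matched else max(win - 1, min_win)
    return pairs

# Example: near-identical names, sorted by a simple lowercase key
people = ["Jon Smith", "John Smith", "Johnny Smyth", "Mary Jones"]
dupes = adaptive_snm(
    people,
    key=str.lower,
    is_similar=lambda a, b: SequenceMatcher(None, a.lower(), b.lower()).ratio() > 0.8,
    min_win=3,
)
print(dupes)  # the Smith/Smyth variants are paired; Mary Jones is not
```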
23

Modeling and Querying Uncertainty in Data Cleaning

Beskales, George January 2012 (has links)
Data quality problems such as duplicate records, missing values, and violations of integrity constraints frequently appear in real-world applications. Such problems cost enterprises billions of dollars annually and might have unpredictable consequences in mission-critical tasks. The process of data cleaning refers to detecting and correcting errors in data in order to improve the data quality. Numerous efforts have been made towards improving the effectiveness and the efficiency of data cleaning. A major challenge in the data cleaning process is the inherent uncertainty about the cleaning decisions that should be taken by the cleaning algorithms (e.g., deciding whether two records are duplicates or not). Existing data cleaning systems deal with the uncertainty in data cleaning decisions by selecting one alternative, based on some heuristics, while discarding (i.e., destroying) all other alternatives, which results in a false sense of certainty. Furthermore, because of the complex dependencies among cleaning decisions, it is difficult to reverse the process of destroying some alternatives (e.g., when new external information becomes available). In most cases, restarting the data cleaning from scratch is inevitable whenever we need to incorporate new evidence. To address the uncertainty in the data cleaning process, we propose a new approach, called probabilistic data cleaning, that views data cleaning as a random process whose possible outcomes are possible clean instances (i.e., repairs). Our approach generates multiple possible clean instances to avoid the destructive aspect of current cleaning systems. In this dissertation, we apply this approach in the context of two prominent data cleaning problems: duplicate elimination and repairing violations of functional dependencies (FDs). First, we propose a probabilistic cleaning approach for the problem of duplicate elimination. We define a space of possible repairs that can be efficiently generated. To achieve this goal, we concentrate on a family of duplicate detection approaches that are based on parameterized hierarchical clustering algorithms. We propose a novel probabilistic data model that compactly encodes the defined space of possible repairs. We show how to efficiently answer relational queries using the set of possible repairs. We also define new types of queries that reason about the uncertainty in the duplicate elimination process. Second, in the context of repairing violations of FDs, we propose a novel data cleaning approach that allows sampling from a space of possible repairs. Initially, we contrast the existing definitions of possible repairs, and we propose a new definition of possible repairs that can be sampled efficiently. We present an algorithm that randomly samples from this space, and we present multiple optimizations to improve the performance of the sampling algorithm. Third, we show how to apply our probabilistic data cleaning approach in scenarios where both data and FDs are unclean (e.g., due to data evolution or inaccurate understanding of the data semantics). We propose a framework that simultaneously modifies the data and the FDs while satisfying multiple objectives, such as consistency of the resulting data with respect to the resulting FDs, (approximate) minimality of changes of data and FDs, and leveraging the trade-off between trusting the data and trusting the FDs.
In the presence of uncertainty in the relative trust in data versus FDs, we show how to extend our cleaning algorithm to efficiently generate multiple possible repairs, each of which corresponds to a different level of relative trust.
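The following Python sketch illustrates the general idea of a space of possible repairs for duplicate elimination: a parameterized hierarchical clustering in which each distance threshold yields one possible grouping of the records. The clustering method, thresholds, and toy data are assumptions made for illustration and do not reproduce the dissertation's probabilistic model.

```python
# Illustrative sketch: each clustering threshold produces one "possible repair",
# i.e. one possible grouping of duplicate records.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

records = np.array([[1.0], [1.1], [5.0], [5.2], [9.0]])   # toy one-dimensional records
Z = linkage(pdist(records), method="single")               # parameterized hierarchical clustering

# Each threshold is one possible outcome of the uncertain cleaning process.
possible_repairs = {t: fcluster(Z, t, criterion="distance") for t in (0.15, 0.5, 4.0)}
for t, labels in possible_repairs.items():
    print(f"threshold={t}: cluster labels={labels}")
```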
24

Invariant Subspaces Of Positive Operators On Riesz Spaces And Observations On Cd0(k)-spaces

Caglar, Mert 01 August 2005 (has links) (PDF)
The present work consists of two main parts. In the first part, invariant subspaces of positive operators or operator families on locally convex solid Riesz spaces are examined. The concept of a weakly quasinilpotent operator on a locally convex solid Riesz space is introduced, and several results known for a single operator on Banach lattices are generalized to families of positive operators, or operators close to them, on these spaces. In the second part, the so-called generalized Alexandroff duplicates are studied and CD_{σ,γ}(K, E)-type spaces are investigated. It is then shown that the space CD_{σ,γ}(K, E) can be represented as the space of E-valued continuous functions on the generalized Alexandroff duplicate of K.
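Stated symbolically, the representation in the final sentence can be written as below; the symbol D(K) for the generalized Alexandroff duplicate of K is chosen here for illustration and may not match the thesis's notation.

```latex
% Hedged symbolic form of the representation result; D(K) is an assumed
% symbol for the generalized Alexandroff duplicate of K.
\[
  CD_{\sigma,\gamma}(K, E) \;\cong\; C\bigl(D(K), E\bigr)
\]
```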
25

Desenvolvimento de um aplicativo em ambiente Web para a gestão do conhecimento explícito entre projetos Seis Sigma

Pellini, Diego January 2008 (has links)
Knowledge plays a fundamental role in the growth and competitiveness of companies. However, merely generating knowledge does not give an organization greater competitive power; that knowledge must be managed. Creating and deploying processes that generate, store, manage, and disseminate knowledge is one of the challenges companies face. One source of information and knowledge is the projects carried out by cross-functional teams to diagnose opportunities and implement improvements. The Six Sigma methodology is an example of projects conducted in a disciplined way by teams aiming at quality improvement, and it is undoubtedly one such source of knowledge. However, the lack of a mechanism for managing Six Sigma projects can lead to the loss of knowledge acquired over the years, since projects are carried out by different teams and people may change company or role, leaving that knowledge lost or unused. This dissertation presents a methodology for managing information and turning it into knowledge in an Intranet environment, diffusing and sharing information in order to broaden the reach and accelerate the speed of knowledge transfer. An Intranet-oriented application, named AGPS (Aplicativo para Gerenciar informações entre Projetos Seis Sigma), is proposed to support this exchange of knowledge. The system was deployed at MWM International Motores and represents an innovation for the organization as the first system developed on knowledge-management principles, to be used as an internal reference among the Navistar group companies in South America. The application helps capture and structure the knowledge produced by Six Sigma projects, making it available in a shared database disseminated throughout the organization. The final results demonstrate the effectiveness of the system for Six Sigma projects, and it is recommended that the application be extended for use as a tool for managing the company's portfolio of improvement projects.
27

Ambiente independente de idioma para suporte a identificação de tuplas duplicadas por meio da similaridade fonética e numérica: otimização de algoritmo baseado em multithreading

Andrade, Tiago Luís de [UNESP] 05 August 2011 (has links) (PDF)
In order to ensure greater reliability and consistency of the data stored in databases, the data-cleaning stage is placed at the beginning of the Knowledge Discovery in Databases (KDD) process. This stage has significant relevance because it eliminates problems that strongly affect the reliability of the extracted knowledge, such as missing values, null values, duplicate tuples, and out-of-domain values; it is an important step aimed at correcting and adjusting the data for the subsequent stages. Within this perspective, techniques that address these problems are presented. The methodology of this work comprises a characterization of duplicate-tuple detection in databases, a presentation of the main distance-metric-based algorithms and of some tools intended for this activity, and the development of a language-independent algorithm for identifying duplicate records based on phonetic and numeric similarity, implemented with multithreading to improve its execution time. The tests carried out show that the proposed algorithm obtained better results in identifying duplicate records than existing phonetic algorithms, which ensures a better cleaning of the database.
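As a loose illustration of combining phonetic and numeric similarity across threads, the sketch below pairs a crude language-independent phonetic key with a tolerance-based numeric comparison and checks record pairs in a thread pool. The key function, tolerance, and records are illustrative assumptions, not the algorithm developed in the thesis.

```python
# Illustrative sketch only: simplified phonetic key + numeric comparison,
# evaluated over record pairs with a thread pool.
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import re

def phonetic_key(name: str) -> str:
    """Crude language-independent key: drop vowels and collapse repeated letters."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    key = re.sub(r"[aeiouwhy]", "", letters)
    return re.sub(r"(.)\1+", r"\1", key) or letters[:1]

def numeric_close(a: str, b: str, tol: float = 0.01) -> bool:
    """Treat numeric fields as equal within a tolerance; fall back to string equality."""
    try:
        return abs(float(a) - float(b)) <= tol
    except ValueError:
        return a == b

def is_duplicate(pair):
    (name1, num1), (name2, num2) = pair
    return phonetic_key(name1) == phonetic_key(name2) and numeric_close(num1, num2)

records = [("Meyer", "120.00"), ("Maier", "120.0"), ("Pereira", "75")]
pairs = list(combinations(records, 2))
with ThreadPoolExecutor(max_workers=4) as pool:        # parallel pair checking
    flags = list(pool.map(is_duplicate, pairs))
print([p for p, dup in zip(pairs, flags) if dup])       # Meyer/Maier are flagged
```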
28

Enriquecimento de dados: uma pré-etapa em relação à limpeza de dados

Carreira, Juliano Augusto. January 2012 (has links)
Advisor: Carlos Roberto Valêncio / Committee: José Márcio Machado / Committee: Marilde Terezinha Prado Santos / Abstract: The occurrence of duplicate tuples is a significant problem inherent to today's large databases. It consists of repeated records that, in most cases, are represented in different ways in the database yet refer to the same real-world entity, which makes identifying the duplicates a hard task. The techniques designed to treat this kind of problem are usually generic, meaning that they do not take into account the particular characteristics of each language, which to some extent limits both the number and the quality of the duplicate tuples identified. This work proposes the creation of a pre-step, called "enrichment", for the duplicate-tuple identification process. The process favors the language of the data and works through predefined, generically specified language rules for each desired language. In this way the input records, defined in any language, are enriched, and with the orthographic approximation that the enrichment provides it is possible to increase the number of duplicate tuples found and/or to improve the confidence level of the pairs of duplicate tuples identified by the process. / Master's
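A minimal Python sketch of what such an enrichment pre-step could look like: language-specific rewrite rules plus accent stripping bring spelling variants closer together before duplicate detection runs. The rule set and example values below are assumptions made for illustration, not the rules defined in the dissertation.

```python
# Illustrative enrichment pre-step: language-specific rules + accent stripping.
import unicodedata

RULES = {
    "pt": [("ph", "f"), ("ss", "s"), ("ç", "c"), ("th", "t"), ("y", "i")],
    "en": [("ph", "f"), ("ck", "k"), ("ou", "o")],
}

def enrich(value: str, lang: str) -> str:
    """Apply the language's rewrite rules and strip diacritics to approximate spellings."""
    text = value.lower()
    for old, new in RULES.get(lang, []):
        text = text.replace(old, new)
    # remove remaining diacritics so "José" and "Jose" compare equal
    text = unicodedata.normalize("NFKD", text)
    return "".join(c for c in text if not unicodedata.combining(c))

# Two spellings of the same name converge to one enriched form
print(enrich("Sophia Gonçalves", "pt"))    # sofia goncalves
print(enrich("Sofia Goncalvess", "pt"))    # sofia goncalves
```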
30

Sökmotoroptimering med microdata : Hur påverkar användandet av microdata det organiska sökresultatet?

Ottosson, Jacob January 2012 (has links)
This study deals with search engine optimization using microdata. Do websites that use it obtain a higher placement in the organic search result, or can it have less desirable consequences? The work originally consisted of creating a search-engine-optimized website for the Motala-based company Hårmakarna, and the use of microdata surfaced as a natural part of that work. It seems to be commonly accepted that techniques such as microdata are beneficial for search engine optimization, and the question took shape of whether such use could yield a better placement in the organic search result or not. It was particularly interesting because there did not seem to be any studies confirming this, so the work then became exclusively focused on that question. The study was performed by publishing two websites that were identical except that one of them used microdata. Immediately after publication, Google's search engine was checked frequently until both websites had been indexed. Searches were then made with selected search terms, with the aim of tracking the websites' placement over time. The result was unexpected, as the website without microdata turned out to receive an inferior placement, probably because of what Google calls "duplicate content".
