Global ETD Search

1	Visualization of Text Duplicates in Documents Wang, Chao, Pan, Han January 2009 (has links) In this thesis, a tool to visualize duplicate parts in a series of given documents is developed. Text duplicates are very common nowadays in all fields. Copying severely harms the rights of the original authors, though it facilitates the work of those who copy from them. Effective legal measures have been taken on copyright issues. An increasingly large number of people pay serious attention to what they write when they refer to other people's works. Although references are properly made by many who admire and respect others' achievements, plagiarism takes place all the time. Therefore, an intuitive way of visualizing duplicate parts is needed so that people can easily grasp the purpose and decide the legality of those duplicates. In computer science, software clones are a very typical phenomenon among different development groups or even within one group. Since a piece of software usually has its own hierarchy, that hierarchy is also of interest to group members when they run clone detection on their own or other software. For example, if a good overview of the hierarchies is provided in a tree representation, one can easily locate the clones of a particular node in other trees. Further interaction techniques can give access to the concrete code by double-clicking a highlighted node. To visualize duplicate parts in a clear and intuitive way, a visualization tool is developed for this thesis project. The finished tool should provide the following features. First, the tool can visualize similar or identical parts given a data set. Second, hierarchies of those files can be shown with a proper layout. Third, the user can manipulate the data items on the screen in order to gain better insight into the data set and to help with analysis tasks. Fourth, different levels of abstraction are provided so that the user can either get an overview of all the files or specifically check the duplicate parts in the documents of interest. / Visualization of Text Duplicates in Documents Duplicates PREFUSE Visualization Treemap Similarity Interaction
2	Visualization of Text Duplicates in Documents Wang, Chao, Pan, Han January 2009 (has links) In this thesis, a tool to visualize duplicate parts in a series of given documents is developed. Text duplicates are very common nowadays in all fields. Copying severely harms the rights of the original authors, though it facilitates the work of those who copy from them. Effective legal measures have been taken on copyright issues. An increasingly large number of people pay serious attention to what they write when they refer to other people's works. Although references are properly made by many who admire and respect others' achievements, plagiarism takes place all the time. Therefore, an intuitive way of visualizing duplicate parts is needed so that people can easily grasp the purpose and decide the legality of those duplicates. In computer science, software clones are a very typical phenomenon among different development groups or even within one group. Since a piece of software usually has its own hierarchy, that hierarchy is also of interest to group members when they run clone detection on their own or other software. For example, if a good overview of the hierarchies is provided in a tree representation, one can easily locate the clones of a particular node in other trees. Further interaction techniques can give access to the concrete code by double-clicking a highlighted node. To visualize duplicate parts in a clear and intuitive way, a visualization tool is developed for this thesis project. The finished tool should provide the following features. First, the tool can visualize similar or identical parts given a data set. Second, hierarchies of those files can be shown with a proper layout. Third, the user can manipulate the data items on the screen in order to gain better insight into the data set and to help with analysis tasks. Fourth, different levels of abstraction are provided so that the user can either get an overview of all the files or specifically check the duplicate parts in the documents of interest. / Visualization of Text Duplicates in Documents Duplicates PREFUSE Visualization Treemap Similarity Interaction
3	Differential expression of recent gene duplicates in developmental tissues of Arabidopsis thaliana Owens, Sarah Marie. January 2009 (has links) Title from first page of PDF document. Includes bibliographical references (p. 20-23).
4	Differential expression of recent gene duplicates in developmental tissues of Arabidopsis thaliana Owens, Sarah Marie 14 August 2009 (has links) No description available. Botany Genetics Molecular Biology Gene duplication recent gene duplicates differential expression gene expression unlinked duplicates expression divergence
5	A Web Scraper For Forums : Navigation and text extraction methods Palma, Michael, Zhou, Shidi January 2017 (has links) Web forums are a popular way of exchanging information and discussing various topics. These websites usually have a special structure, divided into boards, threads and posts. Although the structure might be consistent across forums, the layout of each forum is different. The way a web forum presents user posts is also very different from how a news website presents a single piece of information. All of this makes navigation and text extraction a hard task for web scrapers. The focus of this thesis is the development of a web scraper specialized in forums. Three different methods for text extraction are implemented and tested before choosing the most appropriate method for the task. The methods are Word Count, Text-Detection Framework and Text-to-Tag Ratio. The handling of link duplicates is also considered and solved by implementing a multi-layer bloom filter. The thesis is conducted applying a qualitative methodology. The results indicate that the Text-to-Tag Ratio has the best overall performance and gives the most desirable result in web forums. Thus, this was the method selected for the final version of the web scraper. / Data mining Web Scraper Java Web forums Text-extraction Link Duplicates Computer and Information Sciences
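Note on entry 5: the abstract names Text-to-Tag Ratio without describing it. The following is a minimal Python sketch of the general idea as it is commonly described (per line of HTML, the count of non-tag characters divided by the count of tags), not the thesis's own Java implementation. The function names, the regular expression and the threshold of 10 are assumptions chosen for illustration only.

```python
import re
from typing import List

# Assumption: a simple regex is good enough to spot tags in this sketch;
# a real scraper would more likely use an HTML parser.
TAG_RE = re.compile(r"<[^>]*>")


def text_to_tag_ratio(line: str) -> float:
    """Return non-tag character count divided by tag count for one line."""
    tag_count = len(TAG_RE.findall(line))
    text_length = len(TAG_RE.sub("", line).strip())
    if tag_count == 0:
        # No tags on the line: use the text length itself as the ratio.
        return float(text_length)
    return text_length / tag_count


def content_lines(html: str, threshold: float = 10.0) -> List[str]:
    """Keep the text of lines whose ratio reaches the (hypothetical) threshold."""
    kept = []
    for line in html.splitlines():
        if text_to_tag_ratio(line) >= threshold:
            kept.append(TAG_RE.sub("", line).strip())
    return kept


sample = (
    '<div><a href="/">Home</a><a href="/b">Boards</a></div>\n'
    "<p>This is a fairly long forum post with real content in it.</p>"
)
posts = content_lines(sample)
```

In this sample the navigation line carries many tags and little text, so it falls below the threshold, while the post line is kept. A fixed threshold is the simplest possible choice; the abstract does not say how the thesis separates content from markup.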
6	Transformation Of The Morphology Of The Old City Of Sulaimaniyah (northern Iraq) From The Perspective Of Ownership Patterns Amin, Hanaw Mohammed Taqi Mohammed 01 April 2010 (has links) (PDF) The main objective of the thesis is to study the forces that built up the morphology of the old city of Sulaimaniyah and the reasons for its existence. It focuses on the morphological elements of property (ownership elements of streets, blocks and parcels) and their persistence in spite of the impact of modernism on the transformation of these elements. Modernity affected Sulaimaniyah through the transformation of the traditional life pattern and traditional structure into modern functions. These modern functions include a new administrative system and new commercial functions. The city adapted itself to these new institutions. As a consequence, the power of the state displayed itself in the building of roads, after which new, larger and more regular parcels were created to accommodate the new functions. Furthermore, the study aims to establish the historical evolution of the city, starting from its foundation and continuing through the periods marked by the introduction of modernist concepts. The study is carried out as morphological research that relies on the physical elements as quantitative characteristics. It starts with an evolutionary plan analysis, a tool of morphological study, and covers the old part of the city at macro scale, mezzo scale and, finally, micro scale. The old fabric of Sulaimaniyah, an organic fabric, faced the impact of modernism gradually through the construction of streets. In spite of this, the morphology of the old fabric still survives, preserved by the power of the ownership pattern. In this thesis, typology is another research method, used to analyze the evolution of the building fabric of the city in both functional and formal configuration. 
In addition, a comparison of traditional and modern building types in the old city shows that there are similarities between these types, which suggest that these types are part of the old fabric of the city and they define the morphology of the old city. The study concluded that the morphology of the old city is the product of property in two dimensions (ownership parcels), and the building types on them.
7	Duplicate Gene Evolution and Expression After Polyploidization Chain, Frédéric J. J. 06 1900 (has links) Gene duplications can facilitate genetic innovation, reduce pleiotropy and catalyze reproductive incompatibilities and speciation. Therefore, the molecular and transcriptional fate of duplicate genes plays an important role in the evolutionary trajectory of entire genomes and transcriptomes. Using the polyploid African clawed frog Xenopus, I have investigated mechanisms that promote the retained expression of duplicate genes (paralogs) after whole genome duplication. The studies herein estimated molecular evolution and characterized expression divergence of thousands of duplicate genes and a singleton ortholog from a diploid outgroup. In this thesis, I have discussed the multiple mechanisms for the retention of duplicate genes in a polyploid genome and examined the potential effects that gene characteristics before duplication have on the odds of duplicate gene persistence. I have also explored the use of microarrays for comparative transcriptomics between duplicate genes, and between diverged genomes. The main objectives of my thesis were to better understand the genetic mechanisms that promote the retained expression of gene duplicates. My research utilized the duplicated genome from the allopolyploid clawed frog Xenopus. Genome duplication in clawed frogs offers a compelling opportunity to study factors that influence the genetic fates of gene duplicates because many paralogs in these frogs are of the same age, permitting one to control for the influence of time when evaluating the impact of duplication. My work has major impacts on several biological fronts including evolutionary genomics and comparative transcriptomics, and also on technical aspects of using microarrays. 
I have provided one of the most comprehensive studies of its kind, in terms of examining molecular and regulatory aspects of thousands of expressed duplicates of the same age, and exploring various alternative hypotheses to explain how these genes are retained. / Thesis / Doctor of Philosophy (PhD) gene duplication clawed frog Xenopus retained expression of gene duplicates evolutionary genomics comparative transcriptomics microarrays
8	Evaluation of Machine Learning techniques for Master Data Management Toçi, Fatime January 2023 (has links) In organisations, duplicate customer master data present a recurring problem. Duplicate records can result in errors, complications, and inefficiency, since they frequently stem from disparate systems or inadequate data integration. Because the problem is compounded by client information changing over time, prompt detection and correction are essential. In addition to improving data quality, eliminating duplicate information also improves business processes, boosts customer confidence, and supports sound decision-making. This master's thesis explores the application of machine learning to the field of Master Data Management. The main objective of the project is to assess how machine learning may improve the accuracy and consistency of master data records. The project aims to support the improvement of data quality within enterprises by addressing issues such as duplicate customer data. One research question is whether machine learning can be used to improve the accuracy of customer data; another is whether scientific models for customer analysis can be investigated when cleaning data using machine learning. The study's process has four steps: dimension identification, appropriate algorithm selection, appropriate parameter value selection, and output analysis. As a ground truth for the project, we concluded that 22,000 is the correct number of clusters for our clustering algorithms, representing the number of unique customers. Based on the number of clusters and the silhouette score metric, the best-performing algorithm turned out to be KMEANS with 22,000 clusters and a silhouette score of 0.596, followed by BIRCH with 22,000 clusters and a silhouette score of 0.591. Master Data Management Machine Learning data quality data duplicates Information Systems
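Note on entry 8: the abstract ranks the clustering algorithms by silhouette score. As a rough illustration of what that metric measures, here is a small pure-Python sketch of the standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The function name and the toy data are assumptions for illustration; the thesis presumably relied on a library implementation, and nothing here reproduces its data or its results.

```python
from math import dist
from typing import Dict, Hashable, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]],
                     labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[Hashable, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself is 0, so only the divisor changes)
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: lowest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for key, members in clusters.items()
            if key != label
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / len(points)


# Toy data (hypothetical): two tight, well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
```

Scores near 1 indicate compact, well-separated clusters, and negative scores indicate points that sit closer to another cluster than to their own, which is why the sensible labelling of the toy data scores far higher than the scrambled one. This brute-force version compares every pair of points, so it is only practical for small inputs.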
9	Zpracování unikátních molekulárních indexů bez mapování k referenčnímu genomu / Processing of Unique Molecular Identifiers without Mapping to a Reference Genome Barilíková, Lujza January 2020 (has links) The main goal of this thesis is to design a new algorithm for processing unique molecular identifiers without mapping to a reference genome. Interest in these random oligonucleotide sequences keeps growing because they make it easier to recognize PCR errors and data bias. As the use of next-generation sequencing technologies continues to grow, considerable effort goes into developing tools for analyzing the data produced. At present, tools for handling these errors are relatively time-consuming and complex because of computationally demanding alignment. The most important limitation of these tools lies in the fact that multi-mapped reads are permitted when duplicates are processed. These reads are usually ignored, which can reduce quantitative accuracy and lead to a misleading interpretation of the sequencing results. In an effort to solve this problem, this thesis presents a new approach that allows the absolute number of unique molecules to be estimated in a relatively fast and reliable way.
10	Выявление признаков постобработки изображений : магистерская диссертация / Photo tampering detection Antselevich, A. A., Анцелевич, А. А. January 2015 (has links) An algorithm that determines whether a given digital photo has been tampered with, and that generates a tampering map showing the processed parts of the image, was analyzed in detail and implemented. The software was also optimized and thoroughly tested, and the operating modes giving the best quality were identified. The program can run on an ordinary user PC. / MASTER'S THESIS IMAGE FORGERY PHOTO TAMPERING IMAGE SEGMENTATION GRAPH NORMALIZED CUT DUPLICATES PROCESSING
