  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

A study on plagiarism detection and plagiarism direction identification using natural language processing techniques

Chong, Man Yan Miranda January 2013 (has links)
Ever since we entered the digital communication era, the ease of information sharing through the internet has encouraged online literature searching. With this comes the potential risk of a rise in academic misconduct and intellectual property theft. As concerns over plagiarism grow, more attention has been directed towards automatic plagiarism detection. This is a computational approach which assists humans in judging whether pieces of texts are plagiarised. However, most existing plagiarism detection approaches are limited to superficial, brute-force string-matching techniques. If the text has undergone substantial semantic and syntactic changes, string-matching approaches do not perform well. In order to identify such changes, linguistic techniques which are able to perform a deeper analysis of the text are needed. To date, very limited research has been conducted on the topic of utilising linguistic techniques in plagiarism detection. This thesis provides novel perspectives on plagiarism detection and plagiarism direction identification tasks. The hypothesis is that original texts and rewritten texts exhibit significant but measurable differences, and that these differences can be captured through statistical and linguistic indicators. To investigate this hypothesis, four main research objectives are defined. First, a novel framework for plagiarism detection is proposed. It involves the use of Natural Language Processing techniques, rather than only relying on the traditional string-matching approaches. The objective is to investigate and evaluate the influence of text pre-processing, and statistical, shallow and deep linguistic techniques using a corpus-based approach. This is achieved by evaluating the techniques in two main experimental settings. Second, the role of machine learning in this novel framework is investigated. The objective is to determine whether the application of machine learning in the plagiarism detection task is helpful. This is achieved by comparing a threshold-setting approach against a supervised machine learning classifier. Third, the prospect of applying the proposed framework in a large-scale scenario is explored. The objective is to investigate the scalability of the proposed framework and algorithms. This is achieved by experimenting with a large-scale corpus in three stages. The first two stages are based on longer text lengths and the final stage is based on segments of texts. Finally, the plagiarism direction identification problem is explored as supervised machine learning classification and ranking tasks. Statistical and linguistic features are investigated individually or in various combinations. The objective is to introduce a new perspective on the traditional brute-force pair-wise comparison of texts. Instead of comparing original texts against rewritten texts, features are drawn based on traits of texts to build a pattern for original and rewritten texts. Thus, the classification or ranking task is to fit a piece of text into a pattern. The framework is tested by empirical experiments, and the results from initial experiments show that deep linguistic analysis contributes to solving the problems we address in this thesis. Further experiments show that combining shallow and deep techniques helps improve the classification of plagiarised texts by reducing the number of false negatives. In addition, the experiment on plagiarism direction detection shows that rewritten texts can be identified by statistical and linguistic traits. The conclusions of this study offer ideas for further research directions and potential applications to tackle the challenges that lie ahead in detecting text reuse.
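The abstract contrasts a threshold-setting approach with a supervised machine learning classifier. As a minimal, hypothetical sketch of what a threshold-setting baseline over shallow string-matching features could look like (the features, weights, and threshold here are illustrative, not those used in the thesis):

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def containment(a, b, n=3):
    """Fraction of word n-grams of the suspect text b that also occur in a."""
    def ngrams(t):
        w = t.lower().split()
        return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}
    gb = ngrams(b)
    return len(ngrams(a) & gb) / len(gb) if gb else 0.0

def flag_plagiarism(source, suspect, threshold=0.5):
    """Threshold-setting baseline: average two shallow similarity scores
    and flag the pair when the average clears the threshold."""
    score = (jaccard(source, suspect) + containment(source, suspect)) / 2
    return score >= threshold, score

src = "the quick brown fox jumps over the lazy dog"
copy = "the quick brown fox leaps over the lazy dog"
unrelated = "statistical evaluation of random number generators"
print(flag_plagiarism(src, copy))       # high overlap -> flagged
print(flag_plagiarism(src, unrelated))  # low overlap -> not flagged
```

In the thesis framework, such shallow scores would be supplemented by deeper linguistic features, which is exactly where string-matching alone fails on heavily rewritten text.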
2

Soil pattern recognition in a South Australian subcatchment / by Inakwo Ominyi Akots Odeh.

Odeh, Inakwu Ominyi Akots January 1990 (has links)
Copy of author's published article inserted. / Bibliography: leaves 191-202. / xix, 202 leaves : ill. (some col.) ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / A limitation of the geostatistical approach to spatial modelling of soil properties was redressed by adopting a continuum approach to soil classification. This involved the use of the fuzzy-c-means algorithm, to quantify pedons into intragrade and extragrade classes by minimization of the objective functional for the most "precise" classification. / Thesis (Ph.D.)--University of Adelaide, Dept. of Soil Science, 1991
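The fuzzy-c-means algorithm named in the abstract assigns each pedon a graded membership in every class, rather than a crisp label, by alternately minimising an objective functional. A compact, illustrative sketch on toy one-dimensional data (not the soil attributes, class counts, or fuzziness parameters of the thesis):

```python
def fuzzy_c_means(xs, c=2, m=2.0, iters=50):
    """Fuzzy c-means on 1-D data: returns class centres and memberships."""
    lo, hi = min(xs), max(xs)
    centres = [lo + i * (hi - lo) / (c - 1) for i in range(c)]
    u = [[0.0] * len(xs) for _ in range(c)]
    for _ in range(iters):
        # Membership update: closer centres receive higher (fuzzy) membership.
        for k, x in enumerate(xs):
            d = [abs(x - v) or 1e-12 for v in centres]
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
        # Centre update: membership-weighted mean, minimising the objective functional.
        for i in range(c):
            w = [u[i][k] ** m for k in range(len(xs))]
            centres[i] = sum(wk * x for wk, x in zip(w, xs)) / sum(w)
    return centres, u

# Two well-separated groups of "soil property" values.
data = [1.0, 1.2, 0.9, 1.1, 8.0, 8.3, 7.9, 8.1]
centres, u = fuzzy_c_means(data)
print(sorted(round(v, 2) for v in centres))  # one centre near 1.05, one near 8.08
```

The exponent `m` controls how fuzzy the classification is; memberships that are high for one class and low for all others correspond to intragrades, while pedons with diffuse memberships are candidates for the extragrade class the abstract mentions.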
3

Création et évaluation statistique d'une nouvelle famille de générateurs pseudo-aléatoires chaotiques / Creation and statistical evaluation of a new family of chaotic pseudo-random generators

Wang, Qianxue 27 March 2012 (has links)
In this thesis, a new way to generate pseudorandom numbers is presented. The proposition is to mix two existing generators with discrete chaotic iterations that satisfy Devaney's definition of chaos. A rigorous framework is introduced, in which topological properties of the resulting generator are given, and two practical designs are presented and evaluated. It is shown that the statistical quality of the input generators can be greatly improved in this way, thus fulfilling up-to-date standards. Comparisons between these two designs, and with a number of existing generators, are investigated in detail. Among other things, it is established that the second design outperforms the first, both in terms of performance and speed. In the first part of this manuscript, the iteration function embedded into the chaotic iterations is the vectorial Boolean negation. In the second part, we propose a method using graphs having strongly connected components as a criterion for selecting good iteration functions. We are thus able to modify the iteration function without degrading the good properties of the associated generator. Simulation results and a basic security analysis are then presented to evaluate the randomness of this new family of pseudorandom generators. Finally, an illustration in the field of information hiding is presented, and the robustness of the proposed data hiding (watermarking) algorithm against attacks is evaluated.
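As a rough illustration of the mixing scheme the abstract describes, one can post-process two ordinary generators with chaotic iterations whose iteration function is the vectorial Boolean negation: the first generator supplies the strategy (which cell of the Boolean state to update at each step), and the second seeds the state. The LCG parameters, state width, and output schedule below are illustrative stand-ins, not the designs evaluated in the thesis:

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """A minimal linear congruential generator (parameters are illustrative)."""
    while True:
        seed = (a * seed + c) % m
        yield seed

def chaotic_iterations_prng(n_bits=16, seed1=42, seed2=7, per_output=4):
    """Mix two generators via chaotic iterations: the strategy drawn from the
    first generator chooses which component of the Boolean state to negate
    (vectorial Boolean negation restricted to the chosen cell)."""
    g1, g2 = lcg(seed1), lcg(seed2)
    state = next(g2) % (1 << n_bits)       # initial Boolean state as a bit vector
    while True:
        for _ in range(per_output):
            strategy = next(g1) % n_bits   # cell selected by the strategy
            state ^= 1 << strategy         # negate that component
        yield state

gen = chaotic_iterations_prng()
sample = [next(gen) for _ in range(5)]
print(sample)
```

The point of the construction is that the output sequence inherits the chaotic behaviour of the iteration scheme while reusing the raw material of the two input generators, which is what allows their statistical quality to be improved.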
4

Computational biology approaches in drug repurposing and gene essentiality screening

Philips, Santosh 20 June 2016 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The rapid innovations in biotechnology have led to an exponential growth of data and electronically accessible scientific literature. Within this enormous body of scientific data, knowledge can be exploited and novel discoveries can be made. In my dissertation, I have focused on novel molecular mechanism and therapeutic discoveries from big data for complex diseases. It is very evident today that complex diseases are shaped by many factors, including genetics and environmental effects. The discovery of these factors is challenging and critical in personalized medicine. The increasing cost and time required to develop new drugs pose a further challenge to effectively treating complex diseases. In this dissertation, we demonstrate the use of existing data and literature as a potential resource for discovering novel therapies and for repositioning existing drugs. The key to identifying novel knowledge is integrating information from decades of research across the different scientific disciplines to uncover interactions that are not explicitly stated. This puts critical information at the fingertips of researchers and clinicians, who can take advantage of this newly acquired knowledge to make informed decisions. This dissertation utilizes computational biology methods to identify and integrate existing scientific data and literature resources in the discovery of novel molecular targets and drugs that can be repurposed. In chapter 1 of my dissertation, I extensively sifted through the scientific literature and identified a novel interaction between vitamin A and CYP19A1 that could lead to a potential increase in the production of estrogens. Further, in chapter 2, by exploring a microarray dataset from an estradiol gene sensitivity study, I was able to identify a potential novel anti-estrogenic indication for the commonly used urinary analgesic phenazopyridine. Both discoveries were experimentally validated in the laboratory. In chapter 3 of my dissertation, through the use of a manually curated corpus and machine learning algorithms, I identified and extracted genes that are essential for cell survival. These results underscore that novel knowledge with potential clinical applications can be discovered from existing data and literature by integrating information across various scientific disciplines.
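The dissertation's theme of uncovering interactions "not explicitly stated" in the literature is the classic literature-based-discovery setting. A toy sketch of Swanson-style ABC linking, where two entities are implicitly connected through a shared bridge term (the entity sets below are invented stand-ins echoing the abstract's examples, not the dissertation's corpus or algorithms):

```python
from itertools import combinations
from collections import defaultdict

# Toy "abstracts": each is the set of entities it mentions.
abstracts = [
    {"vitamin A", "CYP19A1"},
    {"CYP19A1", "estrogen"},
    {"phenazopyridine", "urinary analgesic"},
    {"estradiol", "estrogen"},
]

def implicit_links(docs):
    """ABC discovery: A and C are implicitly linked when both co-occur with
    some bridge term B, but never co-occur directly in any document."""
    cooc = defaultdict(set)
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            cooc[a].add(b)
            cooc[b].add(a)
    links = set()
    for a, c in combinations(sorted(cooc), 2):
        if c in cooc[a]:
            continue  # already explicitly linked
        bridges = cooc[a] & cooc[c]
        if bridges:
            links.add((a, c, frozenset(bridges)))
    return links

for a, c, via in sorted(implicit_links(abstracts), key=str):
    print(f"{a} -- {c} (via {sorted(via)})")
```

On this toy corpus the procedure surfaces a vitamin A–estrogen link bridged by CYP19A1, mirroring the kind of cross-document inference the abstract describes, although the dissertation's actual pipeline rests on curated corpora and machine learning rather than raw co-occurrence.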
