Although transposable elements (TEs) have been studied in detail in organisms such as humans, maize, and Drosophila, many other organisms remain poorly characterized. Despite numerous studies of the roles of TEs in evolution and functional genomics, few studies have yet examined how much these elements vary within populations. To address this gap, we identified TEs, including novel ones, in Leptidea sinapis using a newly produced, high-quality long-read genome assembly. Before manual curation, we characterized the TE landscape of this genome and found that 58.2% of the L. sinapis genome consists of TEs. A recent study reported a TE content of 40%, so our result indicates that the L. sinapis genome contains more TEs than previously reported. We then manually curated the consensus sequences of the 150 most abundant TE subfamilies. Of these, 145 could be curated: two were non-curatable because of poor consensus sequences, and the start and end positions of three could not be determined with certainty. One of the curated subfamilies was split into two distinct subfamilies, so we ended up with 146 TE subfamilies, which were used for the rest of the project. Repeating the landscape analysis with the curated library increased the estimated TE content to 62.4%. The classified TEs were distributed among the major groups as follows: LINE 22.6%, DNA 7.43%, SINE 4.76%, and LTR 3.10%. In the second step, we examined how the 146 curated subfamilies were distributed across 83 L. sinapis individuals from the Swedish population.
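As an illustration of how a genome-wide TE percentage such as 58.2% or 62.4% can be computed, the following minimal Python sketch merges overlapping TE annotation intervals so that nested or overlapping hits are not counted twice, then divides the covered bases by the assembly length. It assumes the annotations are available as BED-style (sequence, start, end) tuples with 0-based, half-open coordinates; the function names are hypothetical, and this is not the pipeline used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical annotation format: (sequence name, start, end), 0-based half-open as in BED.
Interval = Tuple[str, int, int]


def merged_coverage(intervals: Iterable[Interval]) -> int:
    """Return the number of bases covered by at least one interval."""
    by_seq: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for seq, start, end in intervals:
        if end > start:  # ignore empty or malformed intervals
            by_seq[seq].append((start, end))

    covered = 0
    for spans in by_seq.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent hit: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered


def te_genome_percent(intervals: Iterable[Interval], genome_size: int) -> float:
    """Percentage of the assembly covered by TE annotations (hypothetical helper)."""
    return 100.0 * merged_coverage(intervals) / genome_size
```

For example, `te_genome_percent([("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 10, 20)], 1000)` returns `16.0`, because the two overlapping chr1 hits cover 150 bases rather than 200.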
After establishing the final TE landscape of the reference genome, we used RelocaTE2 to count the non-reference insertions of the 146 curated subfamilies in each of 83 individuals collected from different regions of Sweden, including Uppland, Östergötland, Västmanland, Närke, Värmland, Dalarna, Hälsingland, Småland, Medelpad, and Västerbotten. The number of detected insertions varied among individuals and depended on sequencing coverage: individuals with a higher percentage of covered bases had more TE insertions. Across all insertions, class I transposable elements (retrotransposons) were more abundant than class II transposable elements (DNA transposons). We found no significant correlation between the number of insertions and the latitude at which individuals were collected, and this remained the case when we compared only individuals from different locations with very similar coverage. We therefore conclude that, for the 146 most abundant TE subfamilies in the reference genome, insertion numbers do not differ significantly among regions of Sweden. This study improves the characterization of TE content in L. sinapis and provides practical know-how on, and highlights possible technical biases in, studies of individual TE insertions in general.
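Which correlation test was applied to the latitude analysis is not specified here. As one possible approach, the Python sketch below computes a Spearman rank correlation between latitude and insertion count, optionally after restricting the data to individuals within a narrow coverage band, which mirrors the idea of comparing only individuals with very similar coverage. The record fields (`latitude`, `insertions`, `coverage`) and the band limits are assumptions for illustration, and the sketch returns only the coefficient; judging significance would still require a p-value, for example from a permutation test or `scipy.stats.spearmanr`.

```python
from typing import Dict, List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def coverage_matched(samples: List[Dict[str, float]], low: float, high: float) -> List[Dict[str, float]]:
    """Hypothetical filter: keep individuals whose coverage lies in [low, high]."""
    return [s for s in samples if low <= s["coverage"] <= high]


def latitude_rho(samples: List[Dict[str, float]],
                 low: Optional[float] = None,
                 high: Optional[float] = None) -> float:
    """Spearman rho between latitude and insertion count, optionally coverage-matched."""
    if low is not None and high is not None:
        samples = coverage_matched(samples, low, high)
    return spearman([s["latitude"] for s in samples],
                    [s["insertions"] for s in samples])
```

A coverage band is a crude form of matching; regressing insertion counts on coverage and correlating the residuals with latitude would be an alternative way to remove the coverage effect before testing for a latitudinal trend.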
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-478093 |
Date | January 2022 |
Creators | Öten, Ahmet Melih |
Publisher | Uppsala universitet, Institutionen för biologisk grundutbildning |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |