Global ETD Search

611	Topic discovery and document similarity via pre-trained word embeddings Chen, Simin January 2018 (has links) Throughout history, humans have generated an ever-growing volume of documents about a wide range of topics. We now rely on computer programs to automatically process these vast collections of documents in various applications. Many applications require a quantitative measure of document similarity. Traditional methods first learn a vector representation for each document using a large corpus, and then compute the distance between two document vectors as the document similarity. In contrast to this corpus-based approach, we propose a straightforward model that directly discovers the topics of a document by clustering its words, without the need for a corpus. We define a vector representation called normalized bag-of-topic-embeddings (nBTE) to encapsulate these discovered topics and compute the soft cosine similarity between two nBTE vectors as the document similarity. In addition, we propose a logistic word importance function that assigns words different importance weights based on their relative discriminating power. Our model is efficient in terms of average time complexity. The nBTE representation is also interpretable, as it allows for topic discovery of the document. On three labeled public data sets, our model achieved k-nearest neighbor classification accuracy comparable to that of five state-of-the-art baseline models. Furthermore, from these three data sets, we derived four multi-topic data sets where each label refers to a set of topics. Our model consistently outperforms the state-of-the-art baseline models by a large margin on these four challenging multi-topic data sets. Together, these results answer the research question of this thesis: Can we construct an interpretable document representation by clustering the words in a document, and effectively and efficiently estimate the document similarity? 
/ Under hela historien fortsätter människor att skapa en växande mängd dokument om ett brett spektrum av publikationer. Vi förlitar oss nu på dataprogram för att automatiskt bearbeta dessa stora samlingar av dokument i olika applikationer. Många applikationer kräver en kvantitativmått av dokumentets likhet. Traditionella metoder först lära en vektorrepresentation för varje dokument med hjälp av en stor corpus och beräkna sedan avståndet mellan two document vektorer som dokumentets likhet.Till skillnad från detta corpusbaserade tillvägagångssätt, föreslår vi en rak modell som direkt upptäcker ämnena i ett dokument genom att klustra sina ord , utan behov av en corpus. Vi definierar en vektorrepresentation som kallas normalized bag-of-topic-embeddings (nBTE) för att inkapsla de upptäckta ämnena och beräkna den mjuka cosinuslikheten mellan två nBTE-vektorer som dokumentets likhet. Dessutom föreslår vi en logistisk ordbetydelsefunktion som tilldelar ord olika viktvikter baserat på relativ diskriminerande kraft.Vår modell är effektiv när det gäller den genomsnittliga tidskomplexiteten. nBTE-representationen är också tolkbar som möjliggör ämnesidentifiering av dokumentet. På tremärkta offentliga dataset uppnådde vår modell jämförbar närmaste grannklassningsnoggrannhet med fem toppmoderna modeller. Vidare härledde vi från de tre dataseten fyra multi-ämnesdatasatser där varje etikett hänvisar till en uppsättning ämnen. Vår modell överensstämmer överens med de högteknologiska baslinjemodellerna med en stor marginal av fyra utmanande multi-ämnesdatasatser. Dessa arbetsstöd ger svar på forskningsproblemet av tisthesis:Kan vi konstruera en tolkbar dokumentrepresentation genom att klustra orden i ett dokument och effektivt och effektivt uppskatta dokumentets likhet? Computer and Information Sciences Data- och informationsvetenskap
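The abstract of record 611 names soft cosine similarity as the measure computed between two nBTE vectors. The following is a minimal sketch of the general soft cosine formula only, not of the thesis's nBTE construction. The function names and the similarity matrix S are assumptions made for illustration: S[i][j] is taken to be a similarity between component i and component j, for example a cosine similarity between two topic embeddings.

```python
from math import sqrt


def _bilinear(a, S, b):
    """Return the bilinear form sum_ij a[i] * S[i][j] * b[j]."""
    return sum(a[i] * S[i][j] * b[j]
               for i in range(len(a))
               for j in range(len(b)))


def soft_cosine(a, b, S):
    """Soft cosine similarity between vectors a and b.

    S is a square matrix of pairwise component similarities (an assumed
    input here). With S equal to the identity matrix this reduces to the
    ordinary cosine similarity.
    """
    numerator = _bilinear(a, S, b)
    denominator = sqrt(_bilinear(a, S, a)) * sqrt(_bilinear(b, S, b))
    # Convention chosen for this sketch: similarity with a zero vector is 0.
    return numerator / denominator if denominator > 0 else 0.0
```

With an identity S, two vectors that share no components score 0; once S records that the components themselves are related, the same two vectors receive a positive score, which is the point of the "soft" variant.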
612	Of like mind: How neural representations are shaped by similarities in social perception Broom, Timothy Walter 25 August 2022 (has links) No description available. Social Psychology Psychology Neurosciences social cognitive neuroscience person perception person knowledge representational similarity analysis neural synchrony
613	Interaction of Natural Convection and Real Gas Radiation Over a Vertical Flat Plate Hale, Nathan 17 August 2023 (has links) (PDF) This study explores natural convection heat transfer and fluid flow from a vertical plate in a radiating gas accounting for real gas spectral behavior. Finite volume techniques are used to solve the coupled nonlinear partial differential equations for mass, momentum, and energy conservation, while radiation transfer is modeled using the Discrete Ordinates finite volume finite angle method. Real gas spectral behavior is accounted for using the Rank Correlated Spectral Line Weighted-sum-of-gray-gases method. It is found that gas temperature and velocity are higher in the boundary layer, thickening the thermal and hydrodynamic boundary layers compared to the limiting case of pure convection. Gas species and concentration significantly impact boundary layer development, affecting radiative heating, temperature, velocity, and wall heat fluxes. Wall radiation transport dominates over convective transport. Increasing the wall temperature for the same wall-quiescent surroundings temperature difference increases local radiative heating, temperature, and velocity, and results in higher wall heat fluxes. As Rayleigh number increases, convection gains importance relative to radiation. Higher total gas pressures moderately increase radiative heating, temperature, and velocity, while reducing wall heat fluxes and convective transport. Increased wall emissivity raises radiative heating, temperature, and velocity, while raising wall heat flux and reducing convective flux. It is concluded that the neglect of participating gas radiation effects can result in significant errors in the predicted flow and thermal behavior, and the total transport. These insights advance understanding of radiation-convection interplay in radiating gas scenarios. 
natural convection radiation volumetric SLW vertical plate participating gas similarity hydrodynamic wall flux Engineering
614	Glycoproteomics methods to quantify alterations in envelope protein glycosylation associated with viral evolution Chang, Deborah 13 March 2022 (has links) Infectious diseases caused by viruses such as influenza A virus (IAV) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pose major threats to human health. Glycosylation, a post-translational modification critical for biological functions including receptor recognition and binding, cell adhesion, and protein folding, is a key mediator of the interaction between viruses and host cells. IAV and SARS-CoV-2 recognize and bind to glycans on host cells prior to uptake by the cells; by the same token, the glycoproteins hemagglutinin of IAV and the spike protein of SARS-CoV-2 are the targets of both host immune molecules and vaccines. The diversity of glycans, structures made up of oligosaccharide residues in complex, branched configurations, can in part be attributed to the push and pull of evolutionary pressures from infectious disease agents such as these viral pathogens. Evolving host glycans may gain the ability to evade recognition by viruses, and likewise, the evolution of viral glycans may result in viral evasion from immune responses. Thus, for a complete understanding of host-pathogen interactions, detailed characterization of glycoproteins that quantitatively measures changes in glycosylation is necessary. However, a number of factors make quantitative characterization of glycoproteins difficult. Firstly, glycans are highly heterogeneous, with dozens of possible glycans at a given glycosylation site and different occupancy levels at each site. Secondly, a particular glycoform may have very low abundance, making the signals difficult to detect. Thirdly, it is difficult to achieve deep, quantitative measurement of glycoprotein glycans using conventional liquid chromatography-mass spectrometry experiments. 
The usual mass spectrometry methods are not adequate because they are biased towards selecting higher-abundance precursors, which leaves many glycopeptide glycoforms undetected. This dissertation begins with an assessment of the current state of the art in glycoproteomics using mass spectrometry to give context to our primary research discussed in subsequent chapters. Chapter 2 describes the use of a modified Tanimoto similarity coefficient to quantify the glycosylation similarity between two variants of a strain of IAV, wild-type and mutant, both expressed in embryonated chicken eggs. Our results indicate that even subtle changes in the amino acid sequence of hemagglutinin can result in measurably distinct glycosylation. Chapter 3 expands the number of comparisons of IAV strains made in the previous chapter to include strains produced in a mammalian expression vector, Madin-Darby canine kidney cells. We show that the choice of expression system can change the population of glycoforms at some but not necessarily all glycosylation sites. In addition, we explore data-independent acquisition mass spectrometry to improve the sensitivity and selectivity of glycopeptide identification. In Chapter 4, this data-independent acquisition method is applied to the quantitative characterization of SARS-CoV-2 spike protein. The work presented here provides a significant contribution toward improving the confident detection and assignment of site-specific glycopeptides. Furthermore, understanding how to measure changes in glycosylation in related viral glycoprotein variants offers opportunities to include consideration of specific glycosylations in the design of vaccines to potentially improve efficacy against continually evolving viruses. Biochemistry Data-independent acquisition Glycoproteomics Glycosylation similarity Influenza Mass spectrometry SARS-CoV-2
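Record 614 mentions a modified Tanimoto similarity coefficient for comparing glycosylation between two variants. The abstract does not describe the modification, so the sketch below shows only the standard Tanimoto coefficient for real-valued vectors. The function name is an assumption, as is the idea that each vector holds glycoform abundances at one glycosylation site, with matching positions referring to the same glycan composition.

```python
def tanimoto(a, b):
    """Standard Tanimoto coefficient for two equal-length numeric vectors.

    T(a, b) = (a . b) / (|a|^2 + |b|^2 - a . b)

    This is the textbook form, not the modified coefficient used in the
    dissertation, whose details the abstract does not give.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denominator = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # The denominator is zero only when both vectors are all zeros;
    # this sketch returns 0.0 in that case by convention.
    return dot / denominator if denominator > 0 else 0.0
```

Identical abundance profiles score 1, profiles with no glycoform in common score 0, and partially overlapping profiles fall in between.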
615	A framework for facial age progression and regression using exemplar face templates Elmahmudi, Ali A.M., Ugail, Hassan 20 March 2022 (has links) Techniques for facial age progression and regression have many applications and a myriad of challenges. As such, automatic aged or de-aged face generation has become an important subject of study in recent times. Over the past decade or so, researchers have been working on developing face processing mechanisms to tackle the challenge of generating realistic aged faces for applications related to smart systems. In this paper, we propose a novel approach to address this problem. We use template faces based on the formulation of an average face of a given ethnicity and for a given age. Thus, given a face image, the target aged image for that face is generated by applying it to the relevant template face image. The resulting image is controlled by two parameters corresponding to the texture and the shape of the face. To validate our approach, we compute the similarity between aged images and the corresponding ground truth via face recognition. To do this, we have utilised a pre-trained convolutional neural network based on the VGG-face model for feature extraction, and we then use well-known classifiers to compare the features. We have utilised two datasets, namely the FEI and the Morph II, to test, verify and validate our approach. Our experimental results suggest that the proposed approach is accurate, efficient and flexible when it comes to facial age progression or regression. Age progression Regression Average face Face similarity Template-based face generation
616	Comparing Text Similarity Functions For Outlier Detection : In a Dataset with Small Collections of Titles Rabo, Vide, Winbladh, Erik January 2022 (has links) Detecting when a title is put in an incorrect data category can be of interest for commercial digital services, such as streaming platforms, since they group movies by genre. Another example of a beneficiary is price comparison services, which categorise offers by their respective product. In order to find data points that are significantly different from the majority (outliers), outlier detection can be applied. A title in the wrong category is an example of an outlier. Outlier detection algorithms may require a metric that quantifies the dissimilarity between two points. Text similarity functions can provide such a metric when comparing text data. The question therefore arises, "Which text similarity function is best suited for detecting incorrect titles in practical environments such as commercial digital services?" In this thesis, different text similarity functions are evaluated when set to detect outlying (incorrect) product titles, with both efficiency and effectiveness taken into consideration. Results show that the variance in performance between functions generally is small, with a few exceptions. The overall top performer is Sørensen-Dice, a function that divides the number of common words by the total number of words found in both strings. While the function is efficient in the sense that it identifies most outliers in a practical time-frame, it is not likely to find all of them and is therefore deemed not effective enough to be applied in practical use. Therefore it might be better applied as part of a larger system, or in combination with manual analysis. / Att identifiera när en titel placeras i en felaktig datakategori kan vara av intresse för kommersiella digitala tjänster, såsom plattformar för filmströmning, eftersom filmer delas upp i genrer. 
Också prisjämförelsetjänster, som kategoriserar erbjudanden efter produkt skulle dra nytta. Outlier detection kan appliceras för att finna datapunkter som skiljer sig signifikant från de övriga (outliers). En titel i en felaktig kategori är ett exempel på en sådan outlier. Outlier detection algoritmer kan kräva ett mått som kvantifierar hur olika två datapunkter är. Text similarity functions kvantifierar skillnaden mellan textsträngar och kan därför integreras i dessa algoritmer. Med detta uppkommer en följdfråga: "Vilken text similarity function är bäst lämpad för att hitta avvikande titlar i praktiska miljöer såsom kommersiella digitala tjänster?”. I detta examensarbete kommer därför olika text similarity functions att jämföras när de används för att finna felaktiga produkttitlar. Jämförelsen tar hänsyn till både tidseffektivitet och korrekthet. Resultat visar att variationen i prestation mellan funktioner generellt är liten, med ett fåtal undantag. Den totalt sett högst presterande funktionen är Sørensen-Dice, vilken dividerar antalet gemensamma ord med det totala antalet ord i båda texttitlarna. Funktionen är effektiv då den identiferar de flesta outliers inom en praktisk tidsram, men kommer sannolikt inte hitta alla. Istället för att användas som en fullständig lösning, skulle det därför vara fördelaktigt att kombinera den med manuell analys eller en mer övergripande lösning. Outlier Detection Text Similarity Functions Natural Language Processing N-Nearest Neighbour Computer and Information Sciences Data- och informationsvetenskap
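Record 616 describes Sørensen-Dice as dividing the number of common words by the total number of words in both strings. The sketch below uses the usual form of the coefficient, 2|A ∩ B| / (|A| + |B|), over sets of lowercased, whitespace-separated words. The tokenisation, the function name, and the example titles are assumptions made for illustration; the thesis may tokenise or normalise differently.

```python
def dice_similarity(title_a, title_b):
    """Sørensen-Dice coefficient over the word sets of two titles.

    Words are lowercased and split on whitespace (an assumption of this
    sketch). The result lies in [0, 1], where 1 means identical word sets.
    """
    words_a = set(title_a.lower().split())
    words_b = set(title_b.lower().split())
    if not words_a and not words_b:
        # Two empty titles are treated as identical by convention.
        return 1.0
    shared = len(words_a & words_b)
    return 2 * shared / (len(words_a) + len(words_b))


# A title that shares few words with the rest of its category scores low
# against its neighbours and becomes a candidate outlier.
score = dice_similarity("The Dark Knight", "The Dark Knight Rises")
```

A dissimilarity for an outlier detection algorithm can then be taken as 1 minus this score.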
617	Global Slope Change Synopses for Measurement Maps Lehner, Wolfgang, Rosenthal, Frank, Fischer, Ulrike, Volk, Peter B. 01 November 2022 (has links) Quality control using scalar quality measures is standard practice in manufacturing. However, there are also quality measures that are determined at a large number of positions on a product, since the spatial distribution is important. We denote such a mapping of local coordinates on the product to values of a measure as a measurement map. In this paper, we examine how measurement maps can be clustered according to a novel notion of similarity - mapscape similarity - that considers the overall course of the measure on the map. We present a class of synopses called global slope change that uses the profile of the measure along several lines from a reference point to different points on the borders to represent a measurement map. We conduct an evaluation of global slope change using a real-world data set from manufacturing and demonstrate its superiority over other synopses. info:eu-repo/classification/ddc/004 ddc:004
618	Computational Approaches for Time Series Analysis and Prediction. Data-Driven Methods for Pseudo-Periodical Sequences. Lan, Yang January 2009 (has links) Time series data mining is one branch of data mining. Time series analysis and prediction have always played an important role in human activities and natural sciences. A pseudo-periodical time series has a complex structure, with the fluctuations and frequencies of the time series changing over time. Pseudo-periodicity therefore brings new properties and challenges to time series analysis and prediction. This thesis proposes two original computational approaches for time series analysis and prediction: Moving Average of nth-order Difference (MANoD) and Series Features Extraction (SFE). Based on data-driven methods, the two approaches offer new insights into time series analysis and prediction and contribute new feature detection techniques. The proposed algorithms can reveal hidden patterns based on the characteristics of time series, and they can be applied to predicting forthcoming events. This thesis also presents the evaluation results of the proposed algorithms on various pseudo-periodical time series, and compares the prediction results with those of classical time series prediction methods. The results of the original approaches applied to real-world and synthetic time series are very good and show that the contributions open promising research directions. Time series Time series analysis and prediction nth-order difference Similarity Feature extraction Data mining
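Record 618 names a method called Moving Average of nth-order Difference (MANoD) without defining it. The sketch below is only a literal reading of that name: take the nth-order difference of the series, then smooth it with a simple moving average. It should not be read as the thesis's algorithm. The function names, the window parameter, and the choice of an unweighted average are all assumptions.

```python
def nth_order_difference(series, n):
    """Apply first differencing n times: x[t+1] - x[t], repeated."""
    values = list(series)
    for _ in range(n):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


def moving_average(series, window):
    """Unweighted moving average over each full window of the series."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def manod_sketch(series, n, window):
    """Moving average of the nth-order difference (literal reading only)."""
    return moving_average(nth_order_difference(series, n), window)
```

For the series 1, 4, 9, 16, 25 the first-order difference is 3, 5, 7, 9, and a window of 2 smooths it to 4, 6, 8. Each differencing step shortens the series by one value, and the moving average shortens it by a further window minus one.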
619	MICROBIAL GLYCOSIDE HYDROLASE MEDIATED MODIFICATION OF HOST CELL SURFACE GLYCANS Pasupathi, Aarthi January 2023 (has links) All cells and extracellular matrices of prokaryotes and eukaryotes are made up of glycans, the carbohydrate macromolecules that play a predominant role in cell-to-cell interaction, protection, stabilization, and barrier functions. Glycans are also central to human microbiome-host interactions where bacterial glycans are recognized by innate immune signaling pathways, and host mucins are a major nutrient source for various gut bacteria. Many microorganisms encode glycoside hydrolases (GHs) to utilize the available host cell surface glycans as a nutrient source and to modulate host protein function. The GHs are divided into families having conserved linkage specificity within each family and individual family members can be specific for dramatically divergent macromolecular substrates. In general, within a given GH family very few members have been biochemically characterized and the substrate specificity is poorly understood. GH genes are abundant in the human gut microbiome and culture-enriched metagenomics identified more than 10,000 distinct bacterial GH genes in an individual. The focus of this thesis is endo-β-N-acetylglucosaminidases (ENGases) encoded by GH18 and GH85 families. Bioinformatic analysis shows that the predicted proteins within each of these GH families fell into separate clusters in the Sequence Similarity Networks of each family. The hypothesis of this project is that human microbiome-encoded ENGases from the same GH family differ in their substrate specificities and within the SSN network of the same GH family, enzymes with similar substrate specificity may fall in the same cluster. In this work, I established conditions for overexpression of GH18 and GH85 proteins and investigated the activity of these enzymes on various substrates. 
/ Thesis / Master of Science (MSc) / All the cell surfaces of animals, plants, and microbes are coated with sugars, also known as glycans. These sugars on the cell surface act as a barrier and protect the cells from the external environment. Glycans on the cells of both microbes and humans are essential for basic interactions between them. Many bacteria produce enzymes such as glycoside hydrolases to obtain nutrients from dietary sugars and alter the sugars on host proteins. There are various families of these enzymes, and they act on specific sugars and cleavage sites. The substrate specificities of these enzymes from most bacteria found in the human microbiome have not been characterized in detail. My work focuses on developing standard enzyme assays for determining substrate specificities. This tool can be used to reshape glycans and understand their role in cell processes. Glycoside Hydrolase endo-β-N-acetylglucosaminidases N-glycans GH18 GH85 Sequence similarity network enzyme specificity
620	Dare to integrate differently : A process case study of integrating knowledge differences to achieve complementarity within M&A Lindström, Eddie, Saeng-Uthat, Nitsara January 2023 (has links) Purpose: This paper aims to shed light on the process involved in acquiring and integrating complementary knowledge. The process model is based on a theoretical review of literature on complementarity in M&As, firm relatedness and knowledge integration. This literature acts as a foundation for the proposed model, combined with process-oriented semi-structured interviews based on a single case in which we found that complementary knowledge was integrated under conditions that would be challenging in this regard. The literature review provides the theoretical foundation, and the process-oriented interviews show how the theory is implemented and understood in practice. The study should therefore be considered a contribution to guide further research into this phenomenon, which is currently underexplored, especially from a qualitative point of view. Method: The theoretical study utilizes a synthesizing approach in connecting the literature findings, and the empirical study adopts a qualitative lens by conducting three phases of study: pre-study, single case study and expert interviews. The primary data was collected through semi-structured interviews with M&A managers, and the sampling method is purposive. Findings: From a theoretical perspective, we found that low external relatedness acts as a source of complementarity and low internal relatedness creates inefficiency in exploiting those complementary differences within knowledge. Allowing autonomy to the acquired firm is best when external relatedness is low, so as to maintain the differences that contribute to complementarity. On the contrary, if internal relatedness is low, we find that high integration is recommended to ensure that internal relatedness is increased and efficiency issues are limited. 
When internal relatedness and external relatedness are both low, the required approach is a balance between autonomy and integration, described as symbiosis. From the empirical study, we conclude that the integration approach becomes an iterative process where the knowledge integration process plays an important role in learning to understand the acquired firm. In short, symbiosis requires close interaction and observation and an established process of integrating new knowledge to get familiar with the acquired firm. M&A integration Knowledge integration Knowledge transfer Relatedness Complementarity Similarity Business Administration Företagsekonomi
