251

Using Machine Learning to Categorize Documents in a Construction Project

Björkendal, Nicklas January 2019 (has links)
Automating document handling in the construction industry could save large amounts of time, effort, and money, and classifying a document is an important step in that automation. In the field of machine learning, much research has been done on perfecting algorithms and techniques, but many areas where those techniques could be applied have not yet been studied. In this study I examined how effectively the multinomial Naïve Bayes machine learning algorithm could classify 1427 documents from a construction project into 19 different categories. The experiment achieved an accuracy of 92.7%, and the paper discusses some of the ways that accuracy can be improved. However, data extraction proved to be a bottleneck, and only 66% of the original documents could be used for testing the classifier.
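A minimal sketch of this kind of pipeline, assuming scikit-learn and plain extracted text; the file contents, category names, and split here are illustrative placeholders, not the thesis's actual data or setup:

```python
# Hedged sketch: multinomial Naive Bayes text classification with bag-of-words features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# In practice, load the extracted text and category label of each project document.
docs = [
    "structural drawing, level 2, reinforcement details ...",
    "fire safety inspection report for building A ...",
    "structural drawing, level 3, column schedule ...",
    "inspection protocol, ventilation system ...",
]
labels = ["drawing", "inspection", "drawing", "inspection"]

X_train, X_test, y_train, y_test = train_test_split(docs, labels, test_size=0.5, random_state=0)

# TF-IDF features feeding a multinomial Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```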
252

Differential Abundance and Clustering Analysis with Empirical Bayes Shrinkage Estimation of Variance (DASEV) for Proteomics and Metabolomics Data

Huang, Zhengyan 01 January 2019 (has links)
Mass spectrometry (MS) is widely used for proteomic and metabolomic profiling of biological samples. Data obtained by MS are often zero-inflated; those zero values are called point mass values (PMVs) and can be further grouped into biological PMVs, caused by the absence of a component, and technical PMVs, caused by the detection limit. There is no simple way to separate the two types of PMVs. Mixture models were developed to separate the two types of zeros and to perform differential abundance analysis; however, the mixture model can be unstable when the number of non-zero values is small. In this dissertation, we propose a new differential abundance (DA) analysis method, DASEV, which applies empirical Bayes shrinkage estimation to the variance. We hypothesized that more robust variance estimation would enhance the accuracy of differential abundance analysis. Apart from its stability issue, the mixture model offers a promising strategy for separating the two types of PMVs, so we adapted the mixture distribution proposed in the original mixture model design and assumed that the variances of all components follow a common distribution. We estimate each component's variance by borrowing information from the other components through this assumed variance distribution, and then re-estimate the remaining parameters using the shrunken variances. This yields better and more stable estimates of the variances, mean abundances, and proportions of biological PMVs, especially when the proportion of zeros is large, and therefore clear improvements in DA analysis. We also extend the method to clustering analysis. To our knowledge, the clustering methods commonly used for MS omics data are limited to K-means and hierarchical clustering, both of which have limitations when applied to zero-inflated data. Model-based clustering methods are widely used for various data types, including zero-inflated data, so we propose the extension DASEV.C as a model-based clustering method and compare its clustering performance with K-means and hierarchical clustering. Under certain scenarios, the proposed method returns more accurate clusters than the standard methods. We also develop an R package, dasev, for the proposed methods presented in this dissertation. Its major functions, DASEV.DA and DASEV.C, implement the Bayes shrinkage estimation of variance and then conduct the differential abundance and cluster analyses, and they are designed to give researchers flexibility in specifying input options.
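The core idea of shrinking each feature's variance toward a value learned from all features can be illustrated with a limma-style moderated variance. This is a generic sketch of empirical Bayes variance shrinkage, not the DASEV estimator itself, and the prior degrees of freedom are fixed here rather than estimated:

```python
import numpy as np

def moderated_variance(x, d0=4.0):
    """Shrink per-feature sample variances toward the pooled average.

    x: (features x samples) array; d0 is the prior degrees of freedom
    (fixed here for illustration; empirical Bayes methods estimate it
    from the spread of the observed variances).
    """
    d = x.shape[1] - 1                 # residual degrees of freedom per feature
    s2 = x.var(axis=1, ddof=1)         # ordinary per-feature variances
    s0_sq = s2.mean()                  # crude prior scale: pooled variance
    # Posterior-mean style shrinkage: weighted combination of prior and data.
    return (d0 * s0_sq + d * s2) / (d0 + d)

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5))      # 1000 features, 5 samples: raw variances are noisy
print(moderated_variance(data)[:5])
```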
253

Bayesian and Empirical Bayes Approaches to Power Law Process and Microarray Analysis

Chen, Zhao 12 July 2004 (has links)
In this dissertation, we apply Bayesian and Empirical Bayes methods to reliability growth models based on the power law process. We also apply Bayes methods to the study of microarrays, in particular to the selection of differentially expressed genes. The power law process has been used extensively in reliability growth models. Chapter 1 reviews some basic concepts in reliability growth models. Chapter 2 presents classical inferences on the power law process and assesses the goodness of fit of a power law process for a reliability growth model. In Chapter 3 we develop Bayesian procedures for the power law process with failure-truncated data, using non-informative priors for the scale and location parameters. In addition to obtaining the posterior density of the parameters of the power law process, we discuss prediction inferences for the expected number of failures in some time interval and for the probability of future failure times. The prediction results for the software reliability model are illustrated, and we compare our results with those of Bar-Lev, S.K., et al. Posterior densities of several parametric functions are also given. Chapter 4 provides Empirical Bayes analyses for the power law process with natural conjugate priors and nonparametric priors; for the natural conjugate priors, a two-hyperparameter prior and a more general three-hyperparameter prior are used. In Chapter 5, we review some basic statistical procedures involved in microarray analysis and present and compare several transformation and normalization methods for probe-level data. The objective of Chapter 6 is to select differentially expressed genes from tens of thousands of genes. Both classical methods (fold change, t-test, Wilcoxon rank-sum test, SAM, and local Z-score) and Empirical Bayes methods (EBarrays and LIMMA) are applied, and the outputs of a typical classical method and a typical Empirical Bayes method are discussed in detail.
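For context, classical inference for the failure-truncated power law process has closed-form maximum likelihood estimates. A small sketch under the usual Crow/AMSAA parameterization, with intensity λ(t) = λβt^(β-1) and expected cumulative failures λt^β, is given below; it illustrates the classical baseline, not the Bayesian procedures developed in the dissertation, and the failure times are invented:

```python
import math

def power_law_mle(failure_times):
    """MLEs for the failure-truncated power law (Crow/AMSAA) process
    with intensity lambda(t) = lam * beta * t**(beta - 1)."""
    t = sorted(failure_times)
    n, t_n = len(t), t[-1]
    beta = n / sum(math.log(t_n / ti) for ti in t[:-1])
    lam = n / t_n ** beta
    return beta, lam

def expected_failures(lam, beta, horizon):
    """Expected cumulative number of failures by time `horizon`."""
    return lam * horizon ** beta

times = [5.2, 11.0, 23.5, 46.1, 80.3, 120.9]   # illustrative failure times
beta, lam = power_law_mle(times)
print(beta, lam, expected_failures(lam, beta, 200.0))
```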
254

Validity Generalization and Transportability: An Investigation of Distributional Assumptions of Random-Effects Meta-Analytic Methods

Kisamore, Jennifer L 09 June 2003 (has links)
Validity generalization work over the past 25 years has called into question the veracity of the assumption that validity is situationally specific. Recent theoretical and methodological work has suggested that validity coefficients may be transportable even if true validity is not a constant. Most transportability work is based on the assumption that the distribution of rho (ρi) is normal, yet no empirical evidence exists to support this assumption. The present study used a competing-model approach in which a new procedure for assessing transportability was compared with two more commonly used methods. Empirical Bayes estimation (Brannick, 2001; Brannick & Hall, 2003) was evaluated alongside both the Schmidt-Hunter multiplicative model (Hunter & Schmidt, 1990) and a corrected Hedges-Vevea (see Hall & Brannick, 2002; Hedges & Vevea, 1998) model. The purpose of the present study was two-fold. The first part of the study compared the accuracy of estimates of the mean, standard deviation, and the lower bound of 90 and 99 percent credibility intervals computed from the three different methods across 32 simulated conditions. The mean, variance, and shape of the distribution varied across the simulated conditions. The second part of the study involved comparing results of analyses of the three methods based on previously published validity coefficients, and was used to show whether the choice of method for determining whether transportability is warranted matters in practice. Results of the simulation analyses suggest that the Schmidt-Hunter method is superior to the other methods even when the distribution of true validity parameters violates the assumption of normality. Results of analyses conducted on real data show trends consistent with those evident in the analyses of the simulated data. Conclusions regarding transportability, however, did not change as a function of the method used for any of the real data sets. Limitations of the present study as well as recommendations for practice and future research are provided.
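As a rough illustration of the kind of computation being compared, a bare-bones Hunter-Schmidt random-effects estimate and its credibility interval can be sketched as follows. This handles sampling-error correction only, uses invented coefficients, and assumes the conventional normal distribution of true validities that the study itself questions:

```python
import math

def hunter_schmidt_bare_bones(rs, ns, z=1.645):
    """Bare-bones Hunter-Schmidt meta-analysis of validity coefficients.

    rs: observed validity coefficients; ns: their sample sizes.
    Returns the mean rho estimate, the SD of rho, and a 90% credibility
    interval assuming a normal distribution of true validities.
    """
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    n_bar = total_n / len(ns)
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)   # expected sampling-error variance
    var_rho = max(var_obs - var_err, 0.0)           # residual (true) variance
    sd_rho = math.sqrt(var_rho)
    return r_bar, sd_rho, (r_bar - z * sd_rho, r_bar + z * sd_rho)

print(hunter_schmidt_bare_bones([0.25, 0.31, 0.18, 0.40], [120, 85, 200, 60]))
```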
255

DEFT guessing: using inductive transfer to improve rule evaluation from limited data

Reid, Mark Darren, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
Algorithms that learn sets of rules describing a concept from its examples have been widely studied in machine learning and have been applied to problems in medicine, molecular biology, planning and linguistics. Many of these algorithms used a separate-and-conquer strategy, repeatedly searching for rules that explain different parts of the example set. When examples are scarce, however, it is difficult for these algorithms to evaluate the relative quality of two or more rules which fit the examples equally well. This dissertation proposes, implements and examines a general technique for modifying rule evaluation in order to improve learning performance in these situations. This approach, called Description-based Evaluation Function Transfer (DEFT), adjusts the way rules are evaluated on a target concept by taking into account the performance of similar rules on a related support task that is supplied by a domain expert. Central to this approach is a novel theory of task similarity that is defined in terms of syntactic properties of rules, called descriptions, which define what it means for rules to be similar. Each description is associated with a prior distribution over classification probabilities derived from the support examples and a rule's evaluation on a target task is combined with the relevant prior using Bayes' rule. Given some natural conditions regarding the similarity of the target and support task, it is shown that modifying rule evaluation in this way is guaranteed to improve estimates of the true classification probabilities. Algorithms to efficiently implement Deft are described, analysed and used to measure the effect these improvements have on the quality of induced theories. Empirical studies of this implementation were carried out on two artificial and two real-world domains. The results show that the inductive transfer of evaluation bias based on rule similarity is an effective and practical way to improve learning when training examples are limited.
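The key computation, combining a rule's target-task counts with a prior derived from similar rules on the support task via Bayes' rule, can be sketched as a Beta-Binomial update. This is one illustrative reading of the approach with invented counts and an assumed prior-weight parameter m; the dissertation's description-based similarity and prior construction are far more elaborate:

```python
def deft_style_estimate(target_pos, target_neg, support_pos, support_neg, m=10.0):
    """Estimate a rule's probability of covering a positive example by
    combining target-task counts with a Beta prior built from the
    coverage of syntactically similar rules on a support task.

    m controls how much weight the support-task prior carries
    (an assumption of this sketch, not a parameter from the thesis).
    """
    support_rate = support_pos / (support_pos + support_neg)
    alpha = m * support_rate + 1.0          # Beta prior pseudo-counts
    beta = m * (1.0 - support_rate) + 1.0
    # Posterior mean under a Beta-Binomial model (Bayes' rule).
    return (target_pos + alpha) / (target_pos + target_neg + alpha + beta)

# Two rules that fit 3 target examples equally well (3 positives, 0 negatives)
# are separated by how similar rules behaved on the support task.
print(deft_style_estimate(3, 0, support_pos=80, support_neg=20))
print(deft_style_estimate(3, 0, support_pos=30, support_neg=70))
```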
256

Bordautonome Ortung von Schienenfahrzeugen mit Wirbelstrom-Sensoren / On-Board Localization of Rail Vehicles Using Eddy-Current Sensors

Geistler, Alexander. January 1900 (has links)
Also published as doctoral dissertation, Universität Karlsruhe, 2006.
257

台灣地區新上市/上櫃公司資訊結構與股價行為之研究 / A Study on the Effect of Information Structure on Valuation of Initial Public Offerings

邵靄如 Unknown Date (has links)
This study first uses a differential-information model to describe cross-sectional and longitudinal comparisons and changes in IPO stock prices. For the cross-sectional comparison, the study applies Bayes' theorem: after a prospective issuer releases its historical information, investors first revise their prior beliefs to obtain posterior beliefs, and then use the limited posterior information to estimate the next period's return. Because the quantity and quality of historical information differ across firms, the beta coefficient of the predicted next-period return also differs markedly: firms with a better information structure carry lower estimation risk and a smaller beta, while firms with a poorer information structure carry higher estimation risk and a larger beta. Therefore, to attract investors to IPOs with poorer information structures, and given the higher required rate of return, such IPOs must be underpriced at offering to leave room for investor profit. Consequently, other things being equal, the initial aftermarket returns of IPOs with poorer information structures should exceed those of IPOs with better information structures. Second, in deriving the model of longitudinal differences in IPO price behavior, the study divides IPOs into those with better and those with poorer information structures. Under the assumption that the amount of information is positively related to time since issuance, it shows that as the time t since listing increases, the differential-information effect across securities diminishes and the beta of newly issued securities declines; it follows that, for all IPOs, aftermarket returns should be lower than the initial returns earned when estimation risk is relatively high. The empirical work proceeds on three levels. The first level examines the cross-sectional price behavior of IPOs, starting with initial price performance across issuing markets. Different issuing markets impose different requirements on the counseling period and financial soundness of prospective issuers; in general, listing requirements on the centralized exchange (TSE) are stricter than on the over-the-counter (OTC) market, so TSE IPOs should in theory have better information structures than OTC IPOs. The evidence shows that TSE IPOs, with their better information structures, earn lower initial returns than OTC IPOs, supporting the differential-information hypothesis across issuing markets. Next, drawing on prior literature and case interviews, when insider ownership, firm size, firm age, underwriter reputation, auditor reputation, and whether the firm switched issuing markets are used to distinguish better from poorer information structures within a single market, the differential-information effect is significant within the TSE market; within the OTC market, only IPOs listed during bear markets show a differential-information effect in initial returns. The second level tests whether longitudinal changes in IPO prices also exhibit a differential-information effect. Comparing issuing markets first, the results show that because TSE IPOs have better information structures, pre-listing information asymmetry between investors and issuers is less severe; after the honeymoon period, as prices revert toward intrinsic value, the one-year price correction for TSE IPOs is smaller than for OTC IPOs with relatively poorer information structures, supporting the longitudinal differential-information effect across issuing markets. Within the TSE market, IPOs classified as having better information structures also show far smaller downward aftermarket corrections than those with poorer structures. In the OTC market, although IPOs with better information structures show smaller price corrections than those with poorer structures, the difference is not statistically significant, so the longitudinal effect is not supported there. The third level examines the sources of mispricing signals. Within the TSE market, IPOs underwritten by less prestigious underwriters and those with lower pre-listing earnings per share are more prone to mispricing; within the OTC market, IPOs underwritten by less prestigious underwriters or belonging to traditional industries are more prone to mispricing. / The objective of this study is twofold. First, the paper develops a model to examine cross-sectionally and dynamically the effects of differential information on various initial public offerings (IPOs). Second, this paper examines the initial return and the after-market performance for IPOs, particularly the security valuation effects of structural differences in available information. There is a diversity of information among issuing firms at the time of their offering and particularly under certain trading system and certain market conditions. Through Bayesian model development, we support the effect of differential information among IPOs of structural differences. From empirical evidence, we find that during hot market conditions and under over-the-counter (OTC) trading system and for firms characterized by poor levels of available information, the market values of issuing firms are more likely to be overestimated in the immediate after-market. We also find positive overestimation of market values to be more likely for IPOs of smaller earnings per share (EPS) and those marketed by the less prestigious underwriters under Taiwan Security Exchange (TSE) trading system, and for IPOs other than hi-tech securities and those marketed by the less prestigious underwriters under OTC trading system.
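A toy illustration of the estimation-risk argument, using a normal-normal Bayesian update: the more historical signals a firm has disclosed, the lower the predictive variance investors face. The parameter values and the mapping to required returns are assumptions of this sketch, not the study's model:

```python
def posterior_predictive_var(prior_var, noise_var, n_signals):
    """Predictive variance of the next-period return after observing
    n_signals pieces of historical information (normal-normal update)."""
    post_var = 1.0 / (1.0 / prior_var + n_signals / noise_var)  # posterior variance of the mean return
    return post_var + noise_var                                  # plus irreducible return noise

# A firm with a richer information history leaves investors with lower
# estimation risk, hence (in the model's logic) a lower required return.
for n in (2, 10, 50):
    print(n, posterior_predictive_var(prior_var=0.04, noise_var=0.09, n_signals=n))
```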
258

Quantitative Approaches to Information Hiding

Braun, Christelle 17 May 2010 (has links) (PDF)
In this thesis, we considered different approaches for quantifying information hiding in communication protocols.
259

A priori structurés pour l'apprentissage supervisé en biologie computationnelle / Structured Priors for Supervised Learning in Computational Biology

Jacob, Laurent 25 November 2009 (has links) (PDF)
Supervised learning methods are used to build functions that accurately predict the behavior of new entities from observed data. They are therefore very useful in computational biology, where they make it possible to exploit the growing amount of available experimental data. In some cases, however, the amount of available data is insufficient relative to the complexity of the learning problem. Fortunately, this type of ill-posed problem is not new in statistics: a classical approach is to use regularization methods or, equivalently, to introduce a prior on the form the function should take. In this thesis, we propose new regularization functions based on biological knowledge of certain problems. In the context of vaccine or drug design, we show how exploiting the fact that similar targets bind similar ligands substantially improves predictions for targets with few or no known ligands. We also propose a function that accounts for the fact that only certain unknown groups of targets share their binding behavior. Finally, for predicting tumor metastasis from expression data, we build a regularization function that favors sparse estimators whose support is a union of potentially overlapping gene groups defined a priori, or a set of genes that tend to be connected on a graph defined a priori.
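A minimal sketch of the structured-sparsity idea for the simplest, non-overlapping case: the proximal (block soft-thresholding) step of a group-lasso penalty. The thesis's regularizers handle overlapping groups and graph structure, which require the latent group lasso formulation rather than this basic operator, and the weights and groups below are invented:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||w_g||_2
    for non-overlapping groups: shrink each group's block toward zero,
    zeroing out whole groups whose norm falls below lam."""
    out = w.copy()
    for idx in groups:
        block = w[idx]
        norm = np.linalg.norm(block)
        out[idx] = 0.0 if norm <= lam else (1.0 - lam / norm) * block
    return out

w = np.array([0.9, -0.8, 0.05, 0.02, 0.6, -0.7])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]   # e.g. predefined gene sets
print(group_soft_threshold(w, groups, lam=0.3))   # the weak middle group is zeroed out
```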
260

Filtering Social Tags for Songs based on Lyrics using Clustering Methods

Chawla, Rahul 21 July 2011 (has links)
In the field of music data mining, mood and topic information is considered high-level metadata. Extracting mood and topic information is difficult but regarded as very valuable. The immense growth of Web 2.0 made social tags a direct form of interaction with users, and their feedback through tags can help in the classification and retrieval of music. One of the major shortcomings of the approaches employed so far is the improper filtering of social tags. This thesis delves into information extraction from songs' tags and lyrics. The main focus is on removing erroneous and unwanted tags with the help of other features. Hierarchical clustering is applied to create clusters of tags based on the semantic information any given pair of tags shares. Lyrics features are utilized by employing the CLOPE clustering method to form lyrics clusters and the Naïve Bayes method to compute probability values that aid the classification process. The outputs from classification are finally used to estimate the accuracy of a tag's association with the song. The results obtained from the experiments point towards the success of the proposed method and can be utilized by other research projects in similar fields.
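A sketch of the tag-clustering step using agglomerative clustering over a pairwise semantic-distance matrix with SciPy. The tags and distance values here are invented placeholders standing in for whatever semantic similarity is computed between tag pairs:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

tags = ["mellow", "chill", "relaxing", "party", "dance", "love"]
# Symmetric pairwise semantic distances between tags (0 = identical meaning).
D = np.array([
    [0.0, 0.2, 0.3, 0.9, 0.8, 0.7],
    [0.2, 0.0, 0.25, 0.85, 0.8, 0.7],
    [0.3, 0.25, 0.0, 0.9, 0.85, 0.6],
    [0.9, 0.85, 0.9, 0.0, 0.2, 0.8],
    [0.8, 0.8, 0.85, 0.2, 0.0, 0.75],
    [0.7, 0.7, 0.6, 0.8, 0.75, 0.0],
])

Z = linkage(squareform(D), method="average")       # agglomerative (hierarchical) clustering
labels = fcluster(Z, t=0.5, criterion="distance")  # cut the dendrogram at distance 0.5
print(dict(zip(tags, labels)))
```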
