591

It’s a Match: Predicting Potential Buyers of Commercial Real Estate Using Machine Learning

Hellsing, Edvin, Klingberg, Joel January 2021
This thesis has explored the development and potential effects of an intelligent decision support system (IDSS) for predicting potential buyers of commercial real estate property. The overarching need for an IDSS of this type was identified to stem from information overload, which the IDSS aims to reduce: by shortening the time needed to process data, time can instead be allocated to making sense of the environment with colleagues. The system architecture explored consisted of clustering commercial real estate buyers into groups based on their characteristics, and training a prediction model on historical transaction data for the Swedish market from the cadastral and land registration authority (Lantmäteriet). The prediction model was trained to predict which of the cluster groups is most likely to buy a given property. For the clustering, three algorithms were evaluated: one density-based, one centroid-based, and one hierarchical. The best-performing clustering model was the centroid-based one (K-means). For the predictions, three supervised machine learning algorithms were evaluated: Naive Bayes, Random Forests, and Support Vector Machines. The model based on Random Forests performed best, with an accuracy of 99.9%.
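As a rough illustration of the two-stage architecture this abstract describes (cluster buyers, then predict the buying cluster from property data), here is a minimal sketch using scikit-learn with synthetic placeholder data; the feature shapes, cluster count, and hyperparameters are illustrative assumptions, not the thesis's actual setup.

```python
# Minimal sketch of the cluster-then-predict pipeline described above.
# All data here is synthetic; features and sizes are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
buyer_features = rng.normal(size=(1000, 5))     # buyer characteristics
property_features = rng.normal(size=(1000, 8))  # transaction/property data

# Step 1: group buyers into clusters (K-means performed best in the thesis).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
buyer_cluster = kmeans.fit_predict(buyer_features)

# Step 2: train a classifier to predict the buyer cluster from property data.
X_train, X_test, y_train, y_test = train_test_split(
    property_features, buyer_cluster, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```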
592

Language identification for typologically similar low-resource languages: A case study of Meänkieli, Kven and Finnish

Larsson, Jacob January 2024
This study examines methods of language identification for Meänkieli, Kven, and Finnish. The methods explored are two n-gram-based classifiers, Naive Bayes and TextCat, and one word-embedding-based classifier, fastText. The models were trained on approximately 100,000 sentences from the three languages, further divided into four separate datasets to examine how data availability impacts the final performance of the trained models. The study found that the best model for the examined dataset was the fastText classifier, but for languages with less available material a Naive Bayes classifier might be more appropriate.
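For a sense of what the n-gram baseline looks like in practice, a minimal character n-gram Naive Bayes identifier is sketched below; the training sentences are invented toy examples, not the study's corpus, and the ISO 639-3 codes (fit, fkv, fin) are used as labels for the three languages.

```python
# Minimal sketch of a character n-gram Naive Bayes language identifier,
# the baseline the study found competitive in low-data settings.
# The example sentences are illustrative toys, not the study's corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_sentences = [
    "mie piän siittä ette met puhuma meänkieltä",   # Meänkieli (toy)
    "mie olen kväänin kielen puolesta",             # Kven (toy)
    "minä puhun suomea joka päivä",                 # Finnish (toy)
]
train_labels = ["fit", "fkv", "fin"]  # ISO 639-3 codes

model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(),
)
model.fit(train_sentences, train_labels)
print(model.predict(["met puhuma meänkieltä"]))
```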
593

Analyzing DBpedia Linked Open Data (LOD) on Spark: Movie Box Office Prediction as an Example

劉文友, Liu, Wen Yu Unknown Date
In recent years, Linked Open Data (LOD) has been recognized as containing a large amount of potential value. How to collect and integrate multiple LOD sources and make them available to analysts for extraction and analysis has become a research challenge. LOD is represented in the Resource Description Framework (RDF) format, which can be queried with the SPARQL language, but large volumes of RDF data still lack a high-performance, scalable integrated storage and query system, and the analytics pipeline for big RDF data is far from mature. This study explores these issues through a movie box office prediction scenario, using the DBpedia LOD dataset linked to an external movie database (IMDb), with large-graph analytics performed on the Apache Spark platform. Prediction models are first built with two algorithms, Naïve Bayes and Bayesian networks, and the best Bayesian network structure is selected using the Bayesian Information Criterion (BIC). Multi-class ROC curves and AUC values are then computed to evaluate the accuracy of the prediction models.
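A minimal sketch of the data-collection step, querying DBpedia for movie attributes with SPARQL; the ontology properties used here (dbo:budget, dbo:gross) are common DBpedia terms assumed for illustration, and actual coverage varies by DBpedia snapshot.

```python
# Minimal sketch of pulling movie features from DBpedia for downstream
# prediction. The properties dbo:budget and dbo:gross are assumptions
# about the snapshot's coverage, not taken from the thesis.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?film ?budget ?gross WHERE {
      ?film a dbo:Film ;
            dbo:budget ?budget ;
            dbo:gross  ?gross .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["film"]["value"], row["budget"]["value"], row["gross"]["value"])
```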
594

New tools for studying the Ferguson-Dirichlet process and compatibility of a family of conditionals

郭錕霖, Kuo, Kun Lin Unknown Date
The univariate c-characteristic function has been shown to be important in cases that are hard to manage using the traditional characteristic function. In this thesis, we first give its inversion formulas. We then use them to obtain (1) the probability density functions (PDFs) of a linear combination of the components of a Dirichlet random vector; (2) the PDFs of random functionals of a Ferguson-Dirichlet process with some interesting parameter measures; and (3) a Lebesgue integral expression for any random functional of the Ferguson-Dirichlet process. New properties of the multivariate c-characteristic function with a spherical distribution are also given. With them, we show that the random mean of a Ferguson-Dirichlet process over a spherical surface in n dimensions has a spherical distribution on the n-dimensional ball, and we derive its exact PDF. Furthermore, we generalize this result to any ellipsoidal surface in n-space. We also study the issue of compatibility for specified conditional distributions, which is important in probability theory and Bayesian computation. Several necessary and sufficient conditions for compatibility are provided. We also address the uniqueness of the associated joint distribution when the given conditionals are compatible, and provide a method to obtain all possible joint distributions that have the given compatible conditionals. Algorithms for checking compatibility and uniqueness, and for constructing all associated densities, are also given. Through the related compatibility theorems, we provide a fully unified theory of the inverse Bayes formula (IBF) and construct a generalized IBF (GIBF) that is applicable in more general measurable spaces. Using the GIBF, we also provide a marginal density fitting algorithm, which avoids the convergence problems of iterative algorithms such as the Gibbs sampler.
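For readers unfamiliar with the inverse Bayes formula the abstract refers to, one common point-wise statement (in our notation, not necessarily the thesis's) is sketched below.

```latex
% A compact statement of the point-wise inverse Bayes formula (IBF) that
% this line of work generalizes; notation is ours, not the thesis's.
% If f(x|y) and g(y|x) are compatible conditionals of some joint density
% with full support, then for any fixed y_0 the X-marginal satisfies
\[
  \pi(x) \;\propto\; \frac{f(x \mid y_0)}{g(y_0 \mid x)},
  \qquad\text{equivalently}\qquad
  \pi(x) \;=\; \left[ \int \frac{g(y \mid x)}{f(x \mid y)}\, dy \right]^{-1},
\]
% and compatibility itself requires the ratio f(x|y)/g(y|x) to factor
% as u(x)v(y) for some functions u and v.
```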
595

Semi-supervised learning for pedestrian detection with covariance matrix features

黃靈威, Huang, Ling Wei Unknown Date
Pedestrian detection is an important yet challenging problem in object classification due to flexible body poses, loose clothing, and ever-changing illumination. In this thesis, we employ covariance matrix features and propose an online learning classifier that combines a naïve Bayes classifier with a cascade support vector machine (SVM) to improve the precision and recall of pedestrian detection in still images. Experimental results show that our online learning strategy can improve precision and recall by about 14% in some difficult situations. Furthermore, even under the same initial training conditions, our method outperforms HOG + AdaBoost on the USC Pedestrian Detection Test Set, the INRIA Person dataset, and the Penn-Fudan Database for Pedestrian Detection and Segmentation.
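A minimal sketch of a region covariance descriptor of the kind the abstract mentions; the per-pixel feature set chosen here (coordinates, intensity, gradient magnitudes) is a common choice in the covariance-descriptor literature and is assumed, not taken from the thesis.

```python
# Minimal sketch of a region covariance descriptor for pedestrian
# detection: the covariance of per-pixel features over an image patch.
import numpy as np

def covariance_descriptor(gray_region: np.ndarray) -> np.ndarray:
    """Return the d x d covariance of per-pixel features over a region."""
    h, w = gray_region.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(gray_region.astype(float))   # first derivatives
    features = np.stack([
        xs.ravel(), ys.ravel(),                       # pixel coordinates
        gray_region.ravel().astype(float),            # intensity
        np.abs(ix).ravel(), np.abs(iy).ravel(),       # gradient magnitudes
    ])
    return np.cov(features)                           # 5 x 5 descriptor

region = np.random.default_rng(0).random((64, 32))    # stand-in image patch
print(covariance_descriptor(region).shape)            # (5, 5)
```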
596

Textual data mining applications for industrial knowledge management solutions

Ur-Rahman, Nadeem January 2010
In recent years, knowledge has become an important resource for enhancing business, and many activities are required to manage knowledge resources well and help companies remain competitive in industrial environments. The data available in most industrial setups is complex in nature, and multiple data formats may be generated to track the progress of different projects, whether related to developing new products or to providing better services to customers. Knowledge discovery from databases requires considerable effort, and data mining techniques serve this purpose for structured data formats. If the data is semi-structured or unstructured, however, the combined efforts of data and text mining technologies may be needed to produce useful results. This thesis focuses on discovering knowledge from semi-structured and unstructured data by applying textual data mining techniques to automate the classification of textual information into two categories or classes, which can then be used to help manage the knowledge available in multiple data formats. Applications of data mining techniques for discovering valuable information and knowledge in the manufacturing and construction industries are explored in a literature review, and the application of text mining techniques to semi-structured and unstructured data is discussed in detail. A novel integration of data and text mining tools is proposed in the form of a framework in which knowledge discovery and its refinement are performed through clustering and the Apriori association rule mining algorithm. Finally, the hypothesis of achieving better classification accuracy is tested by applying the methodology to case study data in the form of Post Project Review (PPR) reports; the process of discovering useful knowledge, interpreting it, and utilising it to classify the textual data into two classes has been automated.
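As a small illustration of the clustering stage of the proposed framework, the sketch below groups invented post project review snippets into two classes with TF-IDF and K-means; the Apriori refinement step is omitted, and all text and parameters are placeholders.

```python
# Minimal sketch of the clustering stage of the proposed framework:
# grouping post project review (PPR) text snippets into two classes.
# The snippets are invented placeholders; the thesis pairs this with
# Apriori association-rule mining for refinement, omitted here.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

ppr_snippets = [
    "delivery delayed due to late supplier drawings",
    "client praised the installation team communication",
    "rework required after design change not communicated",
    "handover completed ahead of schedule with no defects",
]
X = TfidfVectorizer(stop_words="english").fit_transform(ppr_snippets)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for text, label in zip(ppr_snippets, labels):
    print(label, text)
```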
597

Evaluation of total voltage harmonic distortion at the industrial point of common coupling using a measurement-based KDD process

OLIVEIRA, Edson Farias de 27 March 2018
In recent decades, manufacturing industry has introduced increasingly fast and energy-efficient products for residential, commercial, and industrial use. Because these loads are non-linear, they have contributed significantly to rising levels of voltage harmonic distortion, driven by the currents they draw, as reflected in the power quality indicators of the Brazilian electricity distribution system. The steady increase in distortion levels, especially at the point of common coupling (PCC), is a growing concern for utilities and consumers because of the power quality losses it causes in supply and in consumer installations, and it has motivated many studies on the subject. To contribute to this topic, this thesis proposes a procedure based on the Knowledge Discovery in Databases (KDD) process to identify the loads responsible for voltage harmonic distortion at the PCC. The proposed methodology uses computational intelligence and data mining techniques to analyze data collected by power quality meters installed at the main loads and at the consumer's PCC, and thereby establishes the correlation between the harmonic currents of non-linear loads and the harmonic distortion at the PCC. The process consists of analyzing the loads and the layout of the site where the methodology is applied, selecting and installing the power quality meters, and applying the full KDD process, including collection, selection, cleaning, integration, transformation and reduction, mining, interpretation, and evaluation of the data. Decision Tree and Naïve Bayes data mining techniques were applied, and several algorithms were tested in search of the one with the most significant results for this type of analysis, as presented in the results. The results show that the KDD process is applicable to the analysis of total voltage harmonic distortion at the PCC; the thesis contributes a complete description of each step of the process, compared across different data-balancing ratios, training/test splits, and analysis scenarios in different shifts, with good performance that allows application to other types of consumers and energy distribution companies. For the chosen application and across the scenarios considered, the most impactful load was the seventh current harmonic of the air-conditioning units in the collected data set.
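A minimal sketch of the mining step, relating per-load harmonic currents to a voltage-THD limit violation at the PCC with a decision tree; the synthetic data, feature layout, and 5% threshold are illustrative assumptions.

```python
# Minimal sketch of the mining step: a decision tree relating per-load
# harmonic currents to whether voltage THD at the PCC exceeds a limit.
# Feature layout and the 5% threshold are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
# Stand-in measurements: 3rd/5th/7th harmonic currents of two loads (A).
X = rng.gamma(shape=2.0, scale=1.0, size=(n, 6))
thd_pct = 1.5 + 0.8 * X[:, 4] + rng.normal(0, 0.3, n)  # 7th harmonic dominates
y = (thd_pct > 5.0).astype(int)                        # 1 = THD limit exceeded

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))
print("feature importances:", tree.feature_importances_.round(2))
```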
598

Identifying exoplanets and unmasking false positives with NGTS

Günther, Maximilian Norbert January 2018
In my PhD, I advanced the scientific exploration of the Next Generation Transit Survey (NGTS), a ground-based wide-field survey operating at ESO’s Paranal Observatory in Chile since 2016. My original contribution to knowledge is the development of novel methods to 1) estimate NGTS’ yield of planets and false positives; 2) disentangle planets from false positives; and 3) accurately characterise planets. If an exoplanet passes (transits) in front of its host star, we can measure a periodic decrease in brightness. The study of transiting exoplanets gives insight into their size, formation, bulk composition and atmospheric properties. Transit surveys are limited by their ability to identify false positives, which can mimic planets and outnumber them by a hundredfold. First, I designed a novel yield simulator to optimise NGTS’ observing strategy and identification of false positives (published in Günther et al., 2017a). This showed that NGTS’ prime targets, Neptune- and Earth-sized signals, are frequently mimicked by blended eclipsing binaries, allowing me to quantify and prepare strategies for candidate vetting and follow-up. Second, I developed a centroiding algorithm for NGTS, achieving a precision of 0.25 milli-pixel in a CCD image (published in Günther et al., 2017b). With this, one can measure a shift of light during an eclipse, readily identifying unresolved blended objects. Third, I developed a joint Bayesian fitting framework for photometry, centroids, and radial velocity cross-correlation function profiles. This allows one to disentangle which object (target or blend) is causing the signal, and to characterise the system. My method has already unmasked numerous false positives. Most importantly, I confirmed that a signal which was almost erroneously rejected is in fact an exoplanet (published in Günther et al., 2018). The presented achievements minimise the contamination of NGTS candidates with blended false positives by 80%, and show a new approach for unmasking hidden exoplanets. This research enhanced the success of NGTS, and can provide guidance for future missions.
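A minimal sketch of the flux-weighted centroid measurement underlying the second contribution: when the flux dip originates in a faint blended neighbour, the stamp's centroid shifts during the eclipse. The synthetic stamp below is an illustrative stand-in, not NGTS data.

```python
# Minimal sketch of a flux-weighted centroid measurement of the kind used
# to flag blended eclipsing binaries: if the flux dip comes from a faint
# neighbour, the centroid shifts in anti-phase with the eclipse.
import numpy as np

def flux_centroid(image: np.ndarray) -> tuple[float, float]:
    """Flux-weighted centroid (x, y) of a small CCD stamp."""
    total = image.sum()
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    return (xs * image).sum() / total, (ys * image).sum() / total

rng = np.random.default_rng(1)
stamp = rng.normal(100, 1, (16, 16))      # sky background (synthetic)
stamp[8, 8] += 5000.0                     # target star
stamp[8, 11] += 500.0                     # faint blended neighbour
x0, y0 = flux_centroid(stamp)
stamp[8, 11] -= 250.0                     # neighbour eclipses: flux halves
x1, y1 = flux_centroid(stamp)
print(f"centroid shift during eclipse: dx={x1 - x0:.4f} px")
```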
599

Monte Carlo Simulation of Boundary Crossing Probabilities with Applications to Finance and Statistics

Gür, Sercan 04 1900 (PDF)
This dissertation is cumulative and encompasses three self-contained research articles. These essays share one common theme: the probability that a given stochastic process crosses a certain boundary function, namely the boundary crossing probability, and the related financial and statistical applications. In the first paper, we propose a new Monte Carlo method to price a type of barrier option called the Parisian option by simulating the first and last hitting times of the barrier. This work aims to fill the gap in the literature on pricing Parisian options with general curved boundaries, while providing accurate results compared to the other Monte Carlo techniques available in the literature. Some numerical examples are presented for illustration. The second paper proposes a Monte Carlo method for analyzing the sensitivity of boundary crossing probabilities of Brownian motion to small changes of the boundary. Only for a few boundaries can the sensitivities be computed in closed form. We propose an efficient Monte Carlo procedure for general boundaries and provide upper bounds for the bias and the simulation error. The third paper focuses on the inverse first-passage-time problem, which deals with finding the boundary given the distribution of hitting times. Instead of a known distribution, we are given a sample of first hitting times, and we propose and analyze estimators of the boundary. First, we consider the empirical estimator and prove that it is strongly consistent, deriving (an upper bound on) its asymptotic convergence rate. Second, we provide a Bayes estimator based on an approximate likelihood function. Monte Carlo experiments suggest that the empirical estimator is simple, computationally manageable and outperforms the alternative procedure considered in the paper.
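A minimal sketch of the basic Monte Carlo estimator this line of work builds on, for the probability that Brownian motion crosses a boundary before time T; note that grid discretization misses crossings between time steps, the kind of bias the dissertation's methods quantify and correct.

```python
# Minimal sketch of estimating a boundary crossing probability by Monte
# Carlo: P(exists t <= T with W_t >= b(t)) for Brownian motion W and a
# boundary function b. Discretization introduces a downward bias
# (crossings between grid points are missed).
import numpy as np

def crossing_probability(b, T=1.0, n_steps=2000, n_paths=20000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(dt, T, n_steps)
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)          # W_t on the time grid
    crossed = (paths >= b(t)).any(axis=1)
    return crossed.mean()

# Constant boundary b = 1: exact value is 2 * (1 - Phi(1)) ~ 0.3173.
print(crossing_probability(lambda t: np.ones_like(t)))
```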
600

Contribution of Multi-Agent Systems and Fuzzy Logic to Support Tutors in Online Learning Communities

Chaabi, Youness 11 July 2016
The growing importance of online training has put emphasis on the role of remote tutoring, and a whole area of research dedicated to environments for human learning (EHL) is emerging around it; this work focuses on the monitoring of learners. Instrumenting and observing learners' activities by exploiting interaction traces in the EHL, and deriving indicators from them, can help tutors monitor the activities of learners and support their collaborative learning process. In a learning situation, the teacher needs to observe the behaviour of learners in order to form an idea of their involvement, preferences, and learning styles, so that the proposed activities can be adapted; such monitoring also helps identify and support individuals in difficulty before they drop out. As part of the automatic analysis of collaborative learners' activities, we describe a multi-agent approach for supporting learning activities in a virtual learning environment. To assist teachers who monitor the learning process, viewed as a specific type of collaboration, the proposed system estimates a behavioural (sociological) profile for each student — animator, independent, and so on — based on automatic analysis of the students' asynchronous textual conversations, which are produced in large volumes. The determined profiles are proposed to the teacher and may assist the teacher during tutoring tasks, and they also allow learners to situate their work relative to other learners and their group relative to other groups. The system was tested with students of the "software quality" master's programme at Ibn Tofail University; the results show a close match between the profiles observed by human tutors and those determined automatically by our system.
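As a loose illustration of the fuzzy-logic ingredient named in the title, the sketch below assigns fuzzy memberships to behavioural profiles from a single activity indicator; the profile names echo the abstract ("animator", "independent"), while the membership functions and thresholds are invented.

```python
# Minimal sketch of a fuzzy-logic step for profiling learners from forum
# activity. Membership functions and thresholds are assumptions; a real
# system would combine several indicators derived from interaction traces.
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def profile_memberships(messages_per_week: float) -> dict[str, float]:
    return {
        "independent": triangular(messages_per_week, -1, 0, 5),
        "participant": triangular(messages_per_week, 2, 8, 14),
        "animator":    triangular(messages_per_week, 10, 20, 30),
    }

for rate in (1, 7, 18):
    print(rate, profile_memberships(rate))
```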
