Global ETD Search

461	The research on chinese text multi-label classification / Avancée en classification multi-labels de textes en langue chinoise / 中文文本多标签分类研究 Wei, Zhihua 07 May 2010 Text Classification (TC), an important field in information technology, has many valuable applications. Faced with a sea of information resources, the objects of TC have become more complex and diverse, and the pursuit of effective, practical TC technology is quite challenging. More and more researchers consider multi-label TC better suited to many applications. Based on an analysis of a large number of single-label and multi-label TC algorithms, this thesis examines the difficulties and problems of multi-label TC and of Chinese text representation. It targets three problems: the high dimensionality of the feature space, the sparse distribution of text representations, and the poor performance of multi-label classifiers. It proposes corresponding algorithms from several angles. To tackle the "curse of dimensionality" that arises when Chinese texts are represented by n-grams, a two-step feature selection algorithm is constructed. The method combines filtering out features that are rare within a class with selecting features that discriminate across classes. Moreover, the proper value of n, the feature-weighting strategy, and the correlation among features are examined through a variety of experiments, yielding several useful conclusions for research on n-gram representation of Chinese texts. To address a disadvantage of the Latent Dirichlet Allocation (LDA) model, namely the arbitrary adjustment of parameters in the smoothing process, a new smoothing strategy based on Tolerance Rough Sets (TRS) is proposed.
It first constructs tolerance classes over the global vocabulary and then assigns a value to each out-of-vocabulary (OOV) word in each class according to its tolerance class. To improve the performance of multi-label classifiers and reduce computational complexity, a new TC method based on the LDA model is applied to Chinese text representation. Topics are extracted statistically from the texts, and each text is then represented by its topic vector. The method shows competitive performance on both English and Chinese corpora. To enhance the performance of classifiers in multi-label TC, a compound classification framework is proposed. It partitions the text space by computing upper and lower approximations, decomposing a multi-label TC problem into several single-label TC problems and several multi-label TC problems with fewer labels than the original. That is, an unknown text is classified by a single-label classifier when it falls into the lower approximation of some class. Otherwise, it is classified by the corresponding multi-label classifier. An application system, TJ-MLWC (Tongji Multi-label Web Classifier), was also designed. It retrieves results directly from search engines and classifies them in real time using an improved Naïve Bayes classifier. This makes browsing more convenient: users can immediately locate texts of interest from the class information that TJ-MLWC provides. / The thesis focuses on text classification, a rapidly expanding field with many current and potential applications. Its main contributions concern two points. The first is the specific nature of encoding and automatically processing the Chinese language. Words can be made up of one, two, or three characters, there is no typographic separation between words, and the words of a sentence can appear in many possible orders. All of this leads to difficult ambiguity problems.
Encoding in n-grams (sequences of n = 1, 2, or 3 characters) is particularly well suited to Chinese. It is fast and requires neither a prior dictionary-based word recognition step nor word segmentation. The second is multi-label classification, that is, the case where each individual can be assigned to one or more classes. For texts, the classes sought correspond to themes (topics), and a single text can be attached to one or more themes. This multi-label approach is more general: the same patient may suffer from several pathologies, and the same company may be active in several industrial or service sectors. The thesis analyzes these problems and attempts to provide solutions, first for single-label classifiers and then for multi-label ones. The difficulties include defining the variables that characterize the texts and their large number. They also include handling sparse tables (many zeros in the matrix crossing texts and descriptors) and the relatively poor performance of the usual multi-class classifiers.
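The two-step n-gram feature selection described in this record can be sketched in Python as follows. Step one drops n-grams that are rare within a class, and step two keeps those that discriminate across classes. This is a minimal illustration under stated assumptions, not the thesis's implementation. The function names, the `min_class_df` and `top_k` parameters, and the use of the chi-square statistic as the cross-class score are all hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int) -> set:
    """Character n-grams of `text`; no dictionary or word segmentation is needed."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def two_step_select(docs, label_sets, n=2, min_class_df=2, top_k=1000):
    """Hypothetical two-step n-gram selection for a multi-label corpus.

    docs:        list of raw texts
    label_sets:  list of label sets, one per text
    Step 1 drops n-grams whose document frequency is below `min_class_df`
    in every class. Step 2 ranks the survivors by their largest chi-square
    score against any class and keeps the `top_k` best.
    """
    grams_per_doc = [char_ngrams(d, n) for d in docs]
    n_docs = len(docs)

    class_size = Counter()           # documents carrying each label
    class_df = defaultdict(Counter)  # class_df[c][g]: docs with label c containing g
    total_df = Counter()             # docs containing g in the whole corpus
    for grams, labels in zip(grams_per_doc, label_sets):
        total_df.update(grams)
        for c in labels:
            class_size[c] += 1
            class_df[c].update(grams)

    # Step 1: filter features that are rare within classes.
    survivors = {g for c in class_df for g, k in class_df[c].items() if k >= min_class_df}

    # Step 2: cross-class discrimination via the 2x2 chi-square statistic.
    def chi2(g):
        best = 0.0
        for c, size in class_size.items():
            a = class_df[c][g]         # label c, contains g
            b = total_df[g] - a        # other docs, contains g
            c_miss = size - a          # label c, lacks g
            d = n_docs - size - b      # other docs, lacks g
            denom = (a + b) * (c_miss + d) * (a + c_miss) * (b + d)
            if denom:
                best = max(best, n_docs * (a * d - b * c_miss) ** 2 / denom)
        return best

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(survivors, key=lambda g: (-chi2(g), g))[:top_k]
```

Take the toy corpus `["体育比赛", "体育新闻", "股票市场", "股票新闻"]` with labels sports, sports, finance, finance and `min_class_df=2`. Step 1 removes the shared bigram 新闻, because it occurs in only one document per class, and the call returns `["体育", "股票"]`.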
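/ Text classification is an important research field in information science with considerable practical value. The content handled by text classification is becoming increasingly complex and diverse, and classification targets are diversifying accordingly. Developing effective text classification techniques that meet real application needs has therefore become a challenging task, and research on multi-label classification has emerged in response. This thesis builds on an analysis of a large number of single-label and multi-label text classification algorithms. It addresses the high dimensionality and data sparsity of text representations, as well as the high complexity and low accuracy of multi-label classification, and applies rough set theory to these problems from different angles. The main algorithms proposed are as follows.

For the curse of dimensionality caused by using n-grams as features of Chinese texts, a two-step feature selection method is proposed. It combines removing rare features within classes with between-class feature selection. Extensive experiments on a large-scale Chinese corpus examine the choice of n, the choice of feature weights, and feature correlation when n-grams are used as features, and they lead to several useful conclusions.

For the low efficiency and high cost of classifying texts represented by high-dimensional features, a multi-label text classification algorithm based on the LDA model is proposed. It uses the topics extracted by LDA as text features to build efficient classifiers. Under the PT3 multi-label transformation method, this algorithm performs well on both Chinese and English datasets, comparably to the multi-label classification methods currently recognized as best.

For the arbitrariness of existing smoothing strategies for the LDA model, a smoothing strategy for the LDA language model based on tolerance rough sets is proposed. It first constructs tolerance classes of words over the global vocabulary. It then assigns smoothing values to the out-of-vocabulary words of each document class according to the frequencies of the words in their tolerance classes. Extensive experiments on Chinese and English corpora, both balanced and imbalanced, show that this smoothing method significantly improves the classification performance of the LDA model, especially on imbalanced corpora.

For the high complexity and low accuracy of multi-label classification, a compound multi-label text classification framework based on variable precision rough sets is proposed. The framework partitions the text feature space with the variable precision rough set method. It thereby decomposes the multi-label classification problem into several binary single-label classification problems and several multi-label classification problems with fewer labels. That is, when an unknown text falls into the lower approximation region of a class, a simple single-label text classifier can determine its class directly. When it falls into the boundary region, the multi-label classifier for the corresponding region is used. Experiments show that this framework considerably improves both classification accuracy and algorithmic efficiency.

The thesis also designs and implements a visualization system for web search results based on multi-label classification (MLWC). The system directly calls the results returned by search engines and classifies them in real time with an improved Naïve Bayes multi-label classification algorithm, so that users can quickly locate texts of interest among the search results. Chinese text classification Text representation Multi-label classification Rough Set Latent Dirichlet Allocation (LDA) Classification method Smoothing model Chinese text multi-label corpus Text classification N-grams Chinese text encoding Classifier design Tongji multi-label web page classification system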
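The compound framework routes each unknown text according to where it falls in a variable precision rough set partition. The sketch below assumes that texts have already been grouped into granules, that is, indiscernibility classes, for example by a discretized feature signature. The grouping step is not shown. The sketch also uses a hypothetical precision threshold `beta`. It follows the usual variable precision rough set definitions. A granule is in the beta-lower approximation of a label if at least a fraction beta of its texts carry that label. It is in the boundary region if that fraction lies strictly between 1 - beta and beta.

```python
from collections import Counter, defaultdict


def inclusion_degrees(granules, label_sets):
    """For each granule, the fraction of its training texts carrying each label.

    granules:   granule id per training text (assumed precomputed)
    label_sets: set of labels per training text
    """
    size = Counter(granules)
    hits = defaultdict(Counter)
    for g, labels in zip(granules, label_sets):
        hits[g].update(labels)
    return {g: {c: k / size[g] for c, k in hits[g].items()} for g in size}


def route(granule, degrees, beta=0.8):
    """Split labels into lower-region and boundary-region labels (0.5 < beta <= 1).

    Lower-region labels would be decided by simple per-class binary
    classifiers; boundary labels would go to a multi-label classifier
    restricted to that smaller label set.
    """
    if not 0.5 < beta <= 1:
        raise ValueError("beta must lie in (0.5, 1]")
    incl = degrees.get(granule, {})
    lower = sorted(c for c, d in incl.items() if d >= beta)
    boundary = sorted(c for c, d in incl.items() if 1 - beta < d < beta)
    return {"lower": lower, "boundary": boundary}
```

A text whose granule never occurred in training falls outside every approximation, so both lists are empty. A practical system would need a fallback, such as the full multi-label classifier.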
462	Mining of Textual Data from the Web for Speech Recognition Kubalík, Jakub January 2010 The primary goal of this project was to study language modeling for speech recognition and techniques for obtaining text data from the Web. The text introduces the basic techniques of speech recognition and describes in more detail language models based on statistical methods. In particular, the work deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes data mining models and techniques, especially information retrieval. Next, it presents the problems associated with obtaining data from the Web and, in contrast, introduces the Google search engine. Part of the project was the design and implementation of a system for obtaining text from the Web, and its detailed description receives due attention. However, the main goal of the work was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques therefore try to find the optimal way to use Web data to improve both sample language models and models deployed in real recognition systems.
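The abstract mentions criteria for evaluating language models and speech recognizers but does not name them. Two standard criteria are perplexity for language models and word error rate (WER) for recognizers. The sketch below assumes those two. It uses an add-one (Laplace) smoothed bigram model purely for illustration, not the models built in the thesis. The function names are hypothetical.

```python
import math
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Uses a word-level Levenshtein alignment; the reference must be non-empty.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def bigram_perplexity(train_sents, test_sents):
    """Perplexity of an add-one smoothed bigram model on the test sentences."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in train_sents:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens[1:])       # predictable tokens, including </s>
        unigrams.update(tokens[:-1])   # counts of each token used as history
        bigrams.update(zip(tokens, tokens[1:]))
    v = len(vocab)
    log_prob, n_tokens = 0.0, 0
    for s in test_sents:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + v))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)
```

Lower is better for both measures. In the spirit of the thesis, one would compare the perplexity or WER of a baseline model with that of a model augmented with Web text on the same held-out data.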
463	Three Essays on the Consequences of Transparency Witter, Tobias 01 September 2023 This dissertation comprises three essays that empirically investigate the consequences of transparency. The first essay investigates how transparency demanded by the government, acting as a customer of firms, affects firms' financial reporting. It provides evidence that government suppliers have higher-quality financial reporting than firms without government customers. The findings indicate that government procurement requirements, which are linked to internal control over financial reporting, can positively affect firms' external information environment. The second essay examines how managers react to a stricter transparency mandate in pension accounting when this mandate increases the expected volatility of balance sheet items. Managers of affected firms change pension plan decisions in ways that mitigate volatility. In addition, affected firms exhibit less volatile accruals but more volatile discretionary real actions, suggesting that managers reduce volatility in balance sheets. The findings imply that a transparency mandate in pension accounting may have (unintended) consequences for managerial decision-making if the mandate reveals more economic volatility on balance sheets. The third essay studies how researchers visualize their quantitative findings (including how data-transparently they do so) and how this affects the impact of academic work. It finds that articles in economics journals with a broader audience use more figures than tables compared with articles in field-specific economics journals. It also finds that articles that visualize results data-transparently with figures receive more citations. An online experiment manipulates how a fictitious study visualizes its results. It finds that participants rate the internal validity of the research higher and are more willing to cite it if it visualizes results data-transparently.
The findings imply that (data-transparent) visualization can enhance the impact of academic work. / The dissertation consists of three essays that examine the effects of transparency. The first essay analyzes how the corporate transparency that a government demands in public procurement affects firms' financial reporting. Government suppliers exhibit higher financial reporting quality than comparison firms. The second essay examines how managers react to stricter transparency requirements in pension accounting when these increase balance sheet volatility. Managers make balance sheet adjustments that reduce volatility, which points to intentional balance sheet smoothing. The third essay examines the relationship between the visualization of quantitative research results in economics and business journals and the impact of academic research. Economics journals use more figures than business journals, which appears to promote citations. Experimental evidence further shows that data-transparent visualizations can positively influence the impact of academic research, but that this effect is also strongly discipline-dependent.
transparency corporate transparency financial accounting scientometrics data visualization government procurement financial reporting buyer-supplier relationship government suppliers balance sheet volatility balance sheet smoothing real effects defined benefit pension plans IAS 19R research impact scientific visualizations statistical graphs graphical perception accounting public procurement internal control systems supplier relationships 657 Accounting QP 820 ddc:657 ddc:000
464	Evaluation of Target Tracking Using Multiple Sensors and Non-Causal Algorithms Vestin, Albin, Strandberg, Gustav January 2019 Today, the main research field in the automotive industry is active safety. To perceive the surrounding environment, tracking nearby traffic objects plays an important role. Tracking performance is often validated in staged traffic scenarios, where additional sensors mounted on the vehicles are used to obtain their true positions and velocities. The difficulty of evaluating tracking performance complicates its development. An alternative approach, studied in this thesis, is to record sequences and use non-causal algorithms, such as smoothing, instead of filtering to estimate the true target states. With this method, validation data for online, causal target tracking algorithms can be obtained for all traffic scenarios without the need for extra sensors. We investigate how non-causal algorithms affect target tracking performance using multiple sensors and dynamic models of different complexity, in order to evaluate real-time methods against estimates obtained from non-causal filtering. Two measurement units, a monocular camera and a LIDAR sensor, and two dynamic models are evaluated and compared using both causal and non-causal methods. The system is tested in two single-object scenarios where ground truth is available and in three multi-object scenarios without ground truth. Results from the two single-object scenarios show that tracking with only a monocular camera performs poorly, because the camera cannot measure the distance to objects. Here, a complementary LIDAR sensor improves tracking performance significantly. The dynamic models are shown to have a small impact on tracking performance, while the non-causal application gives a distinct improvement when tracking objects at large distances.
Since the sequence can be processed in reverse, the non-causal estimates can be propagated from the more certain states observed when the target is closer to the ego vehicle. For multiple object tracking, we find that correct associations between measurements and tracks are crucial for improving tracking performance with non-causal algorithms. evaluation target tracking multiple sensors non-causal smoother smoothing tracking vehicle tracking camera lidar estimate estimation prediction vehicle dynamics sensor fusion real-time tracking extended kalman filter filter validation validation position estimation velocity estimation dynamic model model complexity multi object tracking multiple object tracking single object tracking data association tracking fundamentals iterated kalman filter track management gnn global nearest neighbour mahalanobis mahalanobis distance performance evaluation differential gps dgps roi ego several sensors sensors rmse root mean square error invertible motion anti-causal motion anti-causal tracking constant velocity gnn imu tfs two filter smoother ekf rts radar inertial measurement unit nonlinear nonlinear systems mono camera monocular camera noise model tracking performance fixed interval smoothing m/n logic centralized fusion non-causal object tracker car tracking car dynamics automotive active safety object tracking automotive industry thesis master reverse dynamics reverse tracking reverse sequence sequence tracking data propagation ground truth estimating ground truth additional sensors mounted sensors true estimates environment comparison algorithm independent targets overlapping measurements occluded track switch improve lower uncertainty more certain state process noise covariance sampling image sprt adas cnn cv pdf track target ego tracker tentative track observation online tracking offline tracking online offline recorded sequences robust self driving self-driving car traffic trajectory true state scenario scenarios future accurate output advanced driver assistance systems non-linear complex noise pedestrian truck bus maneuvering vehicles processed measurement frame state correction probability density function tuning likelihood transition measurement motion model recursion gaussian approximation distribution linear jacobian multiplicative noise ratio ad hoc ad hoc state space approach backward auction euclidean distance statistical threshold gating association margin normalize covariance matrix fusion confirmed rejected tentative history absolute error modular ego motion parameters variables logg hardware specification fused causal factorization independent uncorrelated transform moving rotation translation oncoming overtaking Control Engineering
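The core idea of this record is to run a non-causal smoother over a recorded sequence instead of a causal filter. It can be illustrated with a Kalman filter followed by a Rauch-Tung-Striebel (RTS) backward pass; the keywords list both RTS and a two-filter smoother. The sketch below assumes a scalar random-walk state with hypothetical noise variances `q` and `r`. That is deliberately far simpler than the camera/LIDAR constant-velocity models evaluated in the thesis, and the function name is also hypothetical.

```python
def kalman_rts(measurements, q=0.01, r=1.0, x0=0.0, p0=100.0):
    """Causal Kalman filter followed by a non-causal RTS smoother.

    Assumed model (scalar random walk):
        x_k = x_{k-1} + w_k,  w_k ~ N(0, q)
        y_k = x_k + v_k,      v_k ~ N(0, r)
    with prior x_0 ~ N(x0, p0). Returns (filtered means, filtered variances,
    smoothed means, smoothed variances).
    """
    xp, pp, xf, pf = [], [], [], []  # predicted and filtered moments per step
    for k, y in enumerate(measurements):
        # Time update: the first step uses the prior directly.
        x_pred = x0 if k == 0 else xf[-1]
        p_pred = p0 if k == 0 else pf[-1] + q
        # Measurement update.
        gain = p_pred / (p_pred + r)
        xp.append(x_pred)
        pp.append(p_pred)
        xf.append(x_pred + gain * (y - x_pred))
        pf.append((1 - gain) * p_pred)

    # Backward pass: each smoothed estimate also uses all later measurements.
    xs, ps = xf[:], pf[:]
    for k in range(len(measurements) - 2, -1, -1):
        c = pf[k] / pp[k + 1]  # smoother gain for a unit transition
        xs[k] = xf[k] + c * (xs[k + 1] - xp[k + 1])
        ps[k] = pf[k] + c * c * (ps[k + 1] - pp[k + 1])
    return xf, pf, xs, ps
```

With four identical measurements of 1.0, the smoothed first estimate lies closer to 1.0 than the filtered one, because the backward pass lets later measurements correct earlier states. The last smoothed and filtered estimates coincide, and no smoothed variance exceeds its filtered counterpart.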
465	GIS-based Episode Reconstruction Using GPS Data for Activity Analysis and Route Choice Modeling / GIS-based Episode Reconstruction Using GPS Data Dalumpines, Ron 26 September 2014 Most transportation problems arise from individual travel decisions. In response, transportation researchers have been studying individual travel behavior, a growing trend that requires activity data at the individual level. Global positioning systems (GPS) and geographical information systems (GIS) have been used to capture and process individual activity data, from determining activity locations to mapping routes to these locations. Potential applications of GPS data seem limitless, but the tools and methods for making these data usable lag behind. In response to this need, this dissertation presents a GIS-based toolkit that automatically extracts activity episodes from GPS data and derives information related to these episodes from additional data (e.g., road network, land use). The major emphasis of this dissertation is the development of a toolkit for extracting information associated with the movements of individuals from GPS data. To be effective, the toolkit has been developed around three design principles: transferability, modularity, and scalability. Two substantive chapters focus on selected components of the toolkit (map-matching, mode detection), and another covers the entire toolkit. The final substantive chapter demonstrates the toolkit's potential by comparing route choice models of work and shop trips using inputs generated by the toolkit. Several tools and methods that capitalize on GPS data have been developed within different problem domains. This dissertation contributes to that repository of tools and methods by presenting a suite of tools that can extract all the information that can be derived from GPS data.
Unlike existing tools cited in the transportation literature, the toolkit has been designed to be complete, covering everything from preprocessing to the extraction of route attributes. It can work with GPS data alone or in combination with additional data. Moreover, this dissertation contributes to our understanding of route choice decisions for work and shop trips by examining the combined effects of route attributes and individual characteristics. / Dissertation / Doctor of Philosophy (PhD) GPS time use diary episode extraction multinomial logit travel behavior mode detection episode reconstruction GIS map-matching route choice path size logit potential path area scale estimation Python ArcGIS activity analysis trip reconstruction smartphone global positioning system geographic information system toolkit work trip shop trip potential activity location land use activity episode travel episode stop episode transferability scalability modularity scripting big data travel survey respondent burden preprocessing multipath error tracking purpose detection segmentation data filter data smoothing fuzzy logic neural network decision tree rule-based algorithm trajectory point road network network dataset gateway shortest path horizontal dilution of precision HDOP commonality factor mode transfer point Halifax Nova Scotia Space-Time Activity Research variance inflation factor shapefile comma-separated values traveling salesman problem data mining branch-and-bound algorithm pedestrian network alternative route observed route route efficiency route attributes distance time heading bearing duration acceleration latitude longitude coordinate overlay analysis intersect shopping module data logger walk likelihood ratio test classification table path generation kappa statistic degrees data collection transportation research automate framework ArcToolbox spatio-temporal navigation positioning trace path spatial data topology horizontal accuracy SPSS Stata spatial resolution temporal resolution
household survey trip reporting proximity analysis DMTI Desktop Mapping Technologies Inc. road intersection route overlap left turn right turn location analysis time geography spatial statistics buffer analysis cycling bus transit public transportation GEOIDE AGILE data need raw data urban canyon endpoint data cleaning elevation satellite outliers automatic processing nearest node classifier classification method short trip multi-point
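Episode extraction starts by separating stops from movement. A common rule-based approach marks a stop wherever consecutive GPS fixes stay within a radius for at least a minimum duration. It is assumed here because the abstract does not specify the toolkit's rules. The thresholds (50 m, 300 s) and all names below are hypothetical. The record's keywords mention Python and ArcGIS, but the toolkit's own code is not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Fix:
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance in metres on a spherical Earth (radius 6,371 km)."""
    radius = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def detect_stops(fixes, radius_m=50.0, min_duration_s=300.0):
    """Return (start_index, end_index) pairs of stop episodes.

    A stop is a maximal run of consecutive fixes that all stay within
    `radius_m` of the run's first fix and that spans at least
    `min_duration_s` seconds. Fixes between stops form trip episodes.
    """
    stops = []
    i, n = 0, len(fixes)
    while i < n:
        # Extend the run while the next fix stays near the anchor fix i.
        j = i
        while j + 1 < n and haversine_m(fixes[i], fixes[j + 1]) <= radius_m:
            j += 1
        if fixes[j].t - fixes[i].t >= min_duration_s:
            stops.append((i, j))
            i = j + 1  # resume after the stop
        else:
            i += 1     # too short: slide the anchor forward
    return stops
```

Anchoring on the first fix keeps the rule simple but makes it sensitive to an outlying first point. Cleaning steps such as the data filtering and smoothing listed in the keywords would normally run before it.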
