Global ETD Search

151	Building the Dresden Web Table Corpus: A Classification Approach Lehner, Wolfgang, Eberius, Julian, Braunschweig, Katrin, Hentsch, Markus, Thiele, Maik, Ahmadov, Ahmad 12 January 2023 (has links) In recent years, researchers have recognized relational tables on the Web as an important source of information. To assist this research we developed the Dresden Web Table Corpus (DWTC), a collection of about 125 million data tables extracted from the Common Crawl (CC), which contains 3.6 billion web pages and is 266 TB in size. As the vast majority of HTML tables are used for layout purposes and only a small share contains genuine tables with different surface forms, accurate table detection is essential for building a large-scale Web table corpus. Furthermore, correctly recognizing the table structure (e.g. horizontal listings, matrices) is important in order to understand the role of each table cell, distinguishing between label and data cells. In this paper, we present an extensive table layout classification that enables us to identify the main layout categories of Web tables with very high precision. To this end, we identify and develop a large set of table features, apply different feature selection techniques, and use several classification algorithms. We evaluate the effectiveness of the selected features and compare the performance of various state-of-the-art classification algorithms. Finally, the winning approach is employed to classify millions of tables, resulting in the Dresden Web Table Corpus (DWTC). info:eu-repo/classification/ddc/004 ddc:004
152	Towards a Hybrid Imputation Approach Using Web Tables Lehner, Wolfgang, Ahmadov, Ahmad, Thiele, Maik, Eberius, Julian, Wrembel, Robert 12 January 2023 (has links) Data completeness is one of the most important data quality dimensions and an essential premise in data analytics. With emerging Big Data trends such as the data lake concept, which provides a low-cost data preparation repository instead of moving curated data into a data warehouse, the problem of data completeness becomes even more pressing. While the process of filling in missing values is traditionally addressed by the data imputation community using statistical techniques, we complement these approaches by using external data sources from the data lake or even the Web to look up missing values. In this paper we propose a novel hybrid data imputation strategy that takes into account the characteristics of an incomplete dataset and, based on them, chooses the best imputation approach, i.e. either a statistical approach such as regression analysis, a Web-based lookup, or a combination of both. We formalize and implement both imputation approaches, including a Web table retrieval and matching system, and evaluate them extensively using a corpus with 125M Web tables. We show that applying statistical techniques in conjunction with external data sources leads to an imputation system that is robust, accurate, and has high coverage at the same time. info:eu-repo/classification/ddc/004 ddc:004
153	Development of a Machine Learning Algorithm to Identify Error Causes of Automated Failed Test Results Pallathadka Shivarama, Anupama 15 March 2024 (has links) The automotive industry is continuously innovating and adopting new technologies. At the same time, companies work towards maintaining the quality of their hardware products and meeting customer demands. Before a product is delivered to the customer, it is essential to test and approve it for safe use. The same principle applies to software. Adopting modern technologies can further improve the efficiency of software testing. The thesis aims to build a machine learning algorithm for use during software testing. In general, evaluating a generated test report after testing is time-consuming. The algorithm should reduce the time and manual effort spent on this evaluation. The machine learning algorithms analyze and learn from the data available in old test reports. Based on the learned patterns, they suggest possible root causes for failed test cases in the future. The thesis includes a literature survey that helped in understanding how machine learning concepts are applied to similar problems in different industries. The tasks involved in building the model are data loading, data pre-processing, selecting the best conditions for each algorithm, and comparing the performance of the algorithms. The thesis also suggests possible future work towards improving the performance of the models. The entire work is implemented in Jupyter notebook using the pandas and scikit-learn libraries. info:eu-repo/classification/ddc/004 ddc:004 Software testing Machine learning
154	Monocular Depth Estimation with Edge-Based Constraints using Active Learning Optimization Saleh, Shadi 04 April 2024 (has links) Depth sensing is pivotal in robotics; however, monocular depth estimation encounters significant challenges. Existing algorithms relying on large-scale labeled data and large Deep Convolutional Neural Networks (DCNNs) hinder real-world applications. We propose two lightweight architectures that achieve commendable accuracy rates of 91.2% and 90.1%, simultaneously reducing the Root Mean Square Error (RMSE) of depth to 4.815 and 5.036. Our lightweight depth model operates at 29-44 FPS on the Jetson Nano GPU, showcasing efficient performance with minimal power consumption. Moreover, we introduce a mask network designed to visualize and analyze the compact depth network, aiding in discerning informative samples for the active learning approach. This contributes to increased model accuracy and enhanced generalization capabilities. Furthermore, our methodology encompasses the introduction of an active learning framework strategically designed to enhance model performance and accuracy by efficiently utilizing limited labeled training data. This novel framework outperforms previous studies by achieving commendable results with only 18.3% utilization of the KITTI Odometry dataset. This performance reflects a skillful balance between computational efficiency and accuracy, tailored for low-cost devices while reducing data training requirements. Contents: 1. Introduction 2. Literature Review 3. AI Technologies for Edge Computing 4. Monocular Depth Estimation Methodology 5. Implementation 6. Result and Evaluation 7. Conclusion and Future Scope Appendix info:eu-repo/classification/ddc/000 ddc:000
155	Prediction of designer-recombinases for DNA editing with generative deep learning Schmitt, Lukas Theo, Paszkowski-Rogacz, Maciej, Jug, Florian, Buchholz, Frank 04 June 2024 (has links) Site-specific tyrosine-type recombinases are effective tools for genome engineering, with the first engineered variants having demonstrated therapeutic potential. So far, adaptation to new DNA target site selectivity of designer-recombinases has been achieved mostly through iterative cycles of directed molecular evolution. While effective, directed molecular evolution methods are laborious and time-consuming. Here we present RecGen (Recombinase Generator), an algorithm for the intelligent generation of designer-recombinases. We gather the sequence information of over one million Cre-like recombinase sequences evolved for 89 different target sites, with which we train Conditional Variational Autoencoders for recombinase generation. Experimental validation demonstrates that the algorithm can predict recombinase sequences with activity on novel target sites, indicating that RecGen is useful to accelerate the development of future designer-recombinases. info:eu-repo/classification/ddc/500 ddc:500
156	Segmentation and Tracking of Cells and Nuclei Using Deep Learning Hirsch, Peter Johannes 27 September 2023 (has links) Image analysis of large datasets of microscopy data, in particular segmentation and tracking, is an important aspect of many biological studies. Yet, the current state of research is not yet adequate for high-performance and reliable everyday use. Existing methods are often hard to use for inexperienced users, perform subpar on new datasets and require vast amounts of training data. I approached this problem from multiple angles: (i) I present clear guidelines on how to avoid artifacts when working with huge images. (ii) I present an extension for a range of existing methods for instance segmentation of nuclei. By using an auxiliary task, it enables state-of-the-art performance in a simple and straightforward way. In the process I show that weak labels are sufficient for efficient object detection on 3D nuclei data. (iii) I present a new method for instance segmentation that is applicable to a wide range of objects, from simple shapes to objects with overlaps and complex image-spanning tree structures. (iv) Building upon the above, I present a novel tracking method that operates on huge images but only requires weak and sparse labels, yet outperforms previous state-of-the-art methods. An automated parameter search ensures adaptability to new datasets. (v) For practitioners seeking to employ cell tracking, I provide a comprehensive guideline on how to make an informed decision about which methods best suit their project. machine learning segmentation tracking microscopy cells 004 Computer science ST 640 ST 300 ddc:004
157	Aggregate-based Training Phase for ML-based Cardinality Estimation Woltmann, Lucas, Hartmann, Claudio, Lehner, Wolfgang, Habich, Dirk 22 April 2024 (has links) Cardinality estimation is a fundamental task in database query processing and optimization. As shown in recent papers, machine learning (ML)-based approaches may deliver more accurate cardinality estimations than traditional approaches. However, a lot of training queries have to be executed during the model training phase to learn a data-dependent ML model making it very time-consuming. Many of those training or example queries use the same base data, have the same query structure, and only differ in their selective predicates. To speed up the model training phase, our core idea is to determine a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data. Based on this idea, we present a specific aggregate-based training phase for ML-based cardinality estimation approaches in this paper. As we are going to show with different workloads in our evaluation, we are able to achieve an average speedup of 90 with our aggregate-based training phase and thus outperform indexes. info:eu-repo/classification/ddc/004 ddc:004
158	Advancing Electron Ptychography for High-Resolution Imaging in Electron Microscopy Schloz, Marcel 13 May 2024 (has links) This thesis presents advancements in electron ptychography, enhancing its versatility as an electron phase-contrast microscopy technique. Rather than relying on high-resolution electron optics, ptychography reconstructs specimens based on their coherent diffraction signals using computational algorithms. This approach allows us to surpass the limitations of conventional optics-based electron microscopy, achieving an unprecedented sub-Angstrom resolution in the resulting images. The thesis first introduces the theoretical, experimental, and algorithmic principles of electron ptychography, placing them in the context of existing scanning-based electron microscopy techniques. It then develops an alternative ptychographic phase retrieval algorithm and analyzes its performance as well as the quality and spatial resolution of its reconstructions. Moreover, the thesis addresses the integration of machine learning methods into electron ptychography, proposing a specific approach to enhance reconstruction quality under suboptimal experimental conditions. Furthermore, it highlights the combination of ptychography with defocus series measurements, which offers improved depth resolution in ptychographic reconstructions and brings us closer to the ultimate goal of quantitative reconstructions of arbitrarily thick specimens at atomic resolution in three dimensions. The final part of the thesis introduces a paradigm shift in scanning requirements for ptychography and showcases applications of this novel approach under low-dose conditions. Ptychography Machine Learning Computational Physics Electron Microscopy 621 Applied physics ddc:621
159	Hand Gesture Recognition using mm-Wave RADAR Technology Zhao, Yanhua 24 July 2024 (has links) Human-computer interaction has become part of our daily lives. RADAR stands out as a very promising sensor, with its small size, low power consumption, and affordability. Compared to other sensors, such as cameras and LIDAR, RADAR can work in a variety of environments, and it is not affected by light. Most importantly, there is no risk of infringing on users' privacy. Among the many types of RADAR, FMCW RADAR is utilised for gesture recognition due to its ability to observe multiple targets and to measure range, velocity and angle, as well as its relatively simple hardware and signal processing. RADAR-based gesture recognition can be applied in a variety of domains. For example, where health and safety are a concern, RADAR-based gesture recognition systems can avoid physical contact and reduce the possibility of contamination. Similarly, in automotive applications, contactless control of certain functions, such as turning on the air conditioning, can improve the user experience and contribute to safer driving. There are still many challenges in implementing an artificial intelligence-based gesture recognition system using RADAR, such as interpreting data, collecting training data, optimising computational complexity and improving system robustness. This thesis focuses on addressing these challenges and covers key aspects of gesture recognition systems. From RADAR signal processing and machine learning models to data augmentation and multi-sensor systems, the challenges posed by real-world scenarios are tackled. This lays the foundation for a comprehensive deployment of gesture recognition systems in practical applications. FMCW RADAR Gesture recognition Synthetic GAN Machine learning 004 Computer science ddc:004
160	Time Dynamic Topic Models Jähnichen, Patrick 30 March 2016 (has links) (PDF) Information extraction from large corpora can be a useful tool for many applications in industry and academia. For instance, political communication science has just recently begun to use the opportunities that come with the availability of massive amounts of information available through the Internet and the computational tools that natural language processing can provide. We give a linguistically motivated interpretation of topic modeling, a state-of-the-art algorithm for extracting latent semantic sets of words from large text corpora, and extend this interpretation to cover issues and issue-cycles as theoretical constructs coming from political communication science. We build on a dynamic topic model, a model whose semantic sets of words are allowed to evolve over time governed by a Brownian motion stochastic process and apply a new form of analysis to its result. Generally this analysis is based on the notion of volatility as in the rate of change of stocks or derivatives known from econometrics. We claim that the rate of change of sets of semantically related words can be interpreted as issue-cycles, the word sets as describing the underlying issue. Generalizing over the existing work, we introduce dynamic topic models that are driven by general (Brownian motion is a special case of our model) Gaussian processes, a family of stochastic processes defined by the function that determines their covariance structure. We use the above assumption and apply a certain class of covariance functions to allow for an appropriate rate of change in word sets while preserving the semantic relatedness among words. Applying our findings to a large newspaper data set, the New York Times Annotated corpus (all articles between 1987 and 2007), we are able to identify sub-topics in time, time-localized topics, and find patterns in their behavior over time. 
However, we have to drop the assumption of semantic relatedness over all available time for any one topic. Time-localized topics are consistent in themselves but do not necessarily share semantic meaning with each other. They can, however, be interpreted as capturing the notion of issues, and their behavior as that of issue-cycles. Topic Models Machine Learning Bayesian Models Time Series Analysis Natural Language Processing ddc:500
