41. Benchmarking AutoML for regression tasks on small tabular data in materials design
Conrad, Felix; Mälzer, Mauritz; Schwarzenberger, Michael; Wiemer, Hajo; Ihlenfeldt, Steffen (05 March 2024)
Machine learning has become more important for materials engineering in the last decade, and automated machine learning (AutoML) is growing in popularity worldwide with the increasing demand for data analysis solutions. Yet it is not frequently used for small tabular data. Comparisons and benchmarks already exist to assess the qualities of AutoML tools in general, but none of them elaborates on the conditions under which materials engineers work with experimental data: small datasets with fewer than 1000 samples. This benchmark addresses these conditions and pays special attention to the overall competitiveness with manual data analysis. Four representative AutoML frameworks are used to evaluate twelve domain-specific datasets, providing orientation on the promise of AutoML in the field of materials engineering. Performance, robustness and usability are discussed in particular. The results lead to two main conclusions: first, AutoML is highly competitive with manual model optimization, even with little training time; second, the sampling of training and test data is of crucial importance for reliable results.
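The second conclusion is easy to demonstrate outside any AutoML framework: with only a few hundred samples, the reported test score depends heavily on which rows happen to land in the test set. A minimal sketch, using scikit-learn on synthetic data rather than any of the twelve benchmark datasets:

```python
# Repeated random train/test splits of one small tabular dataset: the spread
# of scores shows why the benchmark stresses the choice of data sampling.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=20.0, random_state=0)

scores = []
for seed in range(20):  # same data, 20 different random splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    scores.append(r2_score(y_te, model.predict(X_te)))

print(f"R2 over 20 splits: mean={np.mean(scores):.3f}, "
      f"min={np.min(scores):.3f}, max={np.max(scores):.3f}")
```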
42. Variational AutoEncoders and Differential Privacy : balancing data synthesis and privacy constraints / Variational AutoEncoders och Differential Privacy : balans mellan datasyntes och integritetsbegränsningar
Bremond, Baptiste (January 2024)
This thesis investigates the effectiveness of Tabular Variational AutoEncoders (TVAEs) in generating high-quality synthetic tabular data and assesses their compliance with differential privacy principles. The study shows that while TVAEs are better than VAEs at generating synthetic data that faithfully reproduces the distribution of real data, as measured by the Synthetic Data Vault (SDV) metrics, those metrics do not guarantee that the synthetic data is adequate for practical industrial applications. In particular, models trained on TVAE-generated data from the Creditcards dataset are ineffective. The author also explores various optimisation methods on TVAE, such as the Gumbel Max Trick, Dropout (DO) and Batch Normalization, while pointing out that techniques frequently used to improve two-dimensional TVAEs, such as Kullback–Leibler Warm-Up and β Disentanglement, are not directly transferable to the one-dimensional context. Differential privacy was, however, not implemented for TVAE due to time constraints and inconclusive results. The study nevertheless highlights the benefits of stabilising training with Differentially Private Stochastic Gradient Descent (DP-SGD), as with dropout, and the existence of an optimal equilibrium point between the constraints of differential privacy and the number of training epochs.
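Although differential privacy was not fully integrated into the TVAE here, the DP-SGD mechanism credited with stabilising training is straightforward to sketch. Below is a generic, minimal PyTorch version of one DP-SGD step (an illustration of the mechanism, not the thesis implementation): per-sample gradients are clipped to a norm bound C, summed, and Gaussian noise scaled by sigma * C is added before the averaged update.

```python
# Minimal DP-SGD sketch: clip each per-sample gradient, sum, add noise, step.
import torch

def dp_sgd_step(model, loss_fn, xb, yb, lr=0.05, clip_C=1.0, sigma=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    grad_sum = [torch.zeros_like(p) for p in params]
    for x, y in zip(xb, yb):  # microbatches of size 1 give per-sample gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = (clip_C / (norm + 1e-12)).clamp(max=1.0)  # per-sample clipping
        for g, p in zip(grad_sum, params):
            g += p.grad * scale
    with torch.no_grad():
        for g, p in zip(grad_sum, params):
            noise = torch.randn_like(g) * sigma * clip_C  # Gaussian mechanism
            p -= lr * (g + noise) / len(xb)
```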
43. Differential neural architecture search for tabular data : Efficient neural network design for tabular datasets
Medhage, Marcus (January 2024)
Artificial neural networks are some of the most powerful machine learning models and have gained interest in the telecommunications domain, as well as other fields and applications, due to their strong performance and flexibility. Creating these models typically requires manually choosing their architecture along with other hyperparameters that are crucial for their performance. Neural Architecture Search (NAS) seeks to automate architecture choice and has gained increasing interest in recent years. In this thesis, we propose a new NAS method based on differentiable architecture search (DARTS) to find architectures of fully connected feed-forward networks on tabular datasets. We train a gating mechanism on a validation dataset and compare four candidate gate functions as a tool to determine the number of hidden units per hidden layer in our neural networks for different tasks. Our findings show that the new method can reliably find architectures that are more compact than, and outperform, manually chosen architectures. Interestingly, we also found that extracting weights learned during the search process can produce models with significantly higher and more stable performance than identical architectures retrained from scratch. Our method matched the performance of another NAS method while requiring only half an hour of training compared to 280 hours. The trained models also demonstrated competitive performance when benchmarked against other state-of-the-art machine learning models. The primary benefit of our method stems from the extraction and fine-tuning of certain weights. Our results indicate that the improvements from extracted weights may relate to the lottery ticket hypothesis of neural networks, which invites further study for a fuller understanding.
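A sketch makes the gating idea concrete (an illustration of the mechanism, not Medhage's implementation; the sigmoid gate is one arbitrary choice of the kind of candidate gate function the thesis compares): each hidden unit's activation is scaled by a learnable gate, the gate parameters are trained on validation data, and units whose gates stay near zero are pruned from the final architecture.

```python
# A feed-forward layer whose hidden units can be switched off by learnable gates.
import torch
import torch.nn as nn

class GatedLayer(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.linear = nn.Linear(n_in, n_hidden)
        self.gate_logits = nn.Parameter(torch.zeros(n_hidden))  # architecture parameters

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)  # soft on/off factor per hidden unit
        return torch.relu(self.linear(x)) * gates

    def kept_units(self, threshold=0.5):
        # hidden units that would survive pruning at the given gate threshold
        return int((torch.sigmoid(self.gate_logits) >= threshold).sum())

layer = GatedLayer(n_in=16, n_hidden=128)
out = layer(torch.randn(32, 16))  # shape (32, 128); all gates start at 0.5
print(layer.kept_units(), "of 128 units would survive pruning")
```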
44. Aproksimativna diskretizacija tabelarno organizovanih podataka / Approximative Discretization of Table-Organized Data
Ognjenović, Višnja (27 September 2016)
This dissertation analyses the influence of data distributions on the results of discretization algorithms within the process of machine learning. Based on the chosen databases and on discretization algorithms from rough set theory and decision trees, the relation between data distributions and the cut points of a given discretization has been researched. Changes in the consistency of a discretized table, depending on the position of the reduced cut point on the histogram, have been monitored. Fixed cut points have been defined in terms of the segmentation of the multimodal distribution; on their basis the remaining cut points can be reduced. To determine the fixed points, an algorithm called FixedPoints has been constructed, which determines them in accordance with a rough segmentation of the multimodal distribution. An algorithm for approximate discretization, APPROX MD, has been constructed for cut reduction; it uses the cuts obtained by the maximum discernibility (MD-Heuristic) algorithm together with parameters related to the percentage of imprecise rules, the total classification percentage and the number of reduced cuts. The algorithm has been compared to the MD algorithm and to the MD algorithm with approximate solutions for α = 0.95.
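Deriving fixed cut points from the segmentation of a multimodal distribution can be sketched as follows (a hypothetical illustration of the principle, not the FixedPoints algorithm itself): smooth the attribute's histogram and place a fixed cut at each valley between two modes.

```python
# Place candidate fixed cuts at the valleys of a smoothed histogram.
import numpy as np

def fixed_cuts(values, bins=50, smooth=3):
    hist, edges = np.histogram(values, bins=bins)
    kernel = np.ones(smooth) / smooth
    h = np.convolve(hist, kernel, mode="same")  # moving-average smoothing
    cuts = []
    for i in range(1, len(h) - 1):
        if h[i] < h[i - 1] and h[i] <= h[i + 1]:  # local minimum = valley
            cuts.append((edges[i] + edges[i + 1]) / 2)
    return cuts

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(6, 1, 500)])
print(fixed_cuts(data))  # typically a cut near 3, between the two modes
                         # (exact output depends on the binning)
```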
45. Visualization of tabular data on mobile devices / Visualisering av tabulär data på mobila enheter
Caspár, Sophia (January 2018)
This thesis evaluates various ways of displaying tabular data on mobile devices using different responsive table solutions. It also presents a tool to help web developers and designers choose and implement a suitable table approach. The proposed solution is a web system called The Visualizing Wizard, which lets the user answer some questions about the intended table and then generates a recommended responsive table solution based on the answers. The system uses a rule-based approach via Prolog to match the answers against a set of rules and provide an appropriate result. In order to determine which table solutions are most appropriate for which types of data, a statistical analysis and user tests were performed. The statistical analysis identifies the most common table approaches and data types used on various websites. The result indicates that solutions such as "squish", "collapse by rows", "click" and "scroll" are most common, and that the most common table categories are product comparison, product offerings, sports and stock market/statistics. This information was used to design and run user tests that collected feedback and opinions. The data and statistics gathered from the user tests were mapped into sets of rules to answer the question of which responsive table solution is most appropriate for which type of data. This serves as the foundation for The Visualizing Wizard.
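The rule-matching step can be illustrated in a few lines of code (the thesis implements it in Prolog; this Python sketch only mirrors the idea, and the rules shown are invented for the example):

```python
# Match a user's answers against a rule base and return a table solution.
RULES = [
    ({"columns": "few", "compare_rows": True}, "collapse by rows"),
    ({"columns": "many", "numeric": True}, "scroll"),
    ({"columns": "many", "numeric": False}, "click"),
]

def recommend(answers, rules=RULES, default="squish"):
    for conditions, solution in rules:
        if all(answers.get(key) == value for key, value in conditions.items()):
            return solution
    return default

print(recommend({"columns": "many", "numeric": True}))  # -> scroll
```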
46. Intégration de techniques CSP pour la résolution du problème WCSP / Integration of CSP techniques to solve WCSP
Paris, Nicolas (06 November 2014)
This thesis is set in the context of constraint programming (CP). Specifically, we are interested in the Weighted Constraint Satisfaction Problem (WCSP). Many approaches have been proposed to handle this optimization problem. The most effective methods use sophisticated soft local consistencies such as full directional arc consistency (FDAC*) and existential directional arc consistency (EDAC*). Established through equivalence-preserving transformations (cost transfer operations), these consistencies generally both accelerate the resolution, by reducing the search space through the elimination of values, and compute lower bounds that are useful in practice. However, these methods reach their limits when the arity of the constraints increases significantly. Since the techniques of the Constraint Satisfaction Problem (CSP) framework have proved efficient, we believe that integrating CSP techniques can be very useful for solving WCSP instances.

In this thesis, we first propose a filtering algorithm that enforces a soft version of generalized arc consistency (GAC*) on soft table constraints of large arity. This approach combines the technique of simple tabular reduction (STR), from the CSP framework, with cost transfer. Our approach, proved polynomial, efficiently computes for each value the minimum cost of the explicit and implicit tuples of soft table constraints. These minimum costs are then used to transfer costs and establish GAC*. In a second step, we propose an alternative to the usual techniques for solving WCSP. The principle is to solve a WCSP instance by solving a sequence of classical CSP instances obtained from it. Starting from a CSP instance in which all the constraints of the original WCSP instance are maximally hardened, the subsequent CSP instances correspond to a progressive relaxation of constraints, determined by extracting minimal unsatisfiable cores (MUCs) from the unsatisfiable networks of the sequence. Our experimental results show that the first approach is competitive with the state of the art, while the second represents an alternative to the usual methods for solving WCSP instances.
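The cost-transfer operation at the heart of such soft consistencies can be sketched in a few lines (a simplified illustration with explicit tuples only, ignoring default costs and the large-arity issues the thesis addresses): for each value of a variable, find the minimum cost among the tuples that support it, move that amount to a unary cost, and subtract it from the tuples so the network stays equivalent.

```python
# Project minimum tuple costs of a binary soft table constraint onto unary costs.
def project_to_unary(domain_x, tuples):
    """tuples maps (value_x, value_y) -> cost; returns unary costs for x."""
    unary = {}
    for a in domain_x:
        # minimum cost over all tuples in which x takes value a
        alpha = min(cost for (vx, _), cost in tuples.items() if vx == a)
        unary[a] = alpha
        for (vx, vy), cost in list(tuples.items()):
            if vx == a:
                tuples[(vx, vy)] = cost - alpha  # subtract to preserve equivalence
    return unary

costs = {(0, 0): 3, (0, 1): 5, (1, 0): 0, (1, 1): 2}
print(project_to_unary([0, 1], costs))  # {0: 3, 1: 0}
print(costs)  # {(0, 0): 0, (0, 1): 2, (1, 0): 0, (1, 1): 2}
```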
47. Fast Registration of Tabular Document Images Using the Fourier-Mellin Transform
Hutchison, Luke Alexander Daysh (24 March 2004)
Image registration, the process of finding the transformation that best maps one image to another, is an important tool in document image processing. Having properly aligned microfilm images can help in manual and automated content extraction, zoning, and batch compression of images. An image registration algorithm is presented that quickly identifies the global affine transformation (rotation, scale, translation and/or shear) that maps one tabular document image to another, using the Fourier-Mellin Transform. Each component of the affine transform is recovered independently of the others, dramatically reducing the parameter space of the problem and improving upon standard Fourier-Mellin Image Registration (FMIR), which only directly separates translation from the other components. FMIR is also extended to handle shear, as well as different scale factors for each document axis. This registration method deals with all transform components in a uniform way, by working in the frequency domain. Registration is limited to foreground pixels (the document form and printed text) through the introduction of a novel, locally adaptive foreground-background segmentation algorithm based on the median filter. The background removal algorithm is also demonstrated as a useful tool to remove ambient signal noise during correlation. Common problems with FMIR are eliminated by background removal, meaning that apodization (tapering to zero at the edge of the image) is not needed for accurate recovery of the rotation parameter, allowing the entire image to be used for registration. An effective new optimization to the median filter is presented. Rotation and scale parameter detection is less susceptible to problems arising from the non-commutativity of rotation and "tiling" (periodicity) than in standard FMIR, because only the regions of the frequency domain directly corresponding to tabular features are used in registration. An original method is also presented for automatically obtaining blank document templates from a set of registered document images, by computing the "pointwise median" of a set of registered documents. Finally, registration is demonstrated as an effective tool for predictive image compression. The presented registration algorithm is reliable and robust, and handles a wider range of transformation types than most document image registration systems, which typically only perform deskewing.
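The frequency-domain principle is easiest to see for the translation component alone: phase correlation recovers a (circular) shift between two images as the peak of the inverse FFT of the normalized cross-power spectrum. A minimal numpy sketch (translation only; full FMIR additionally maps rotation and scale to shifts via a log-polar resampling of the spectrum, which is not shown here):

```python
# Recover a circular shift between two images by phase correlation.
import numpy as np

def phase_correlation(a, b):
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real  # normalized cross-power
    return np.unravel_index(np.argmax(corr), corr.shape)  # peak at (dy, dx)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, shift=(5, 12), axis=(0, 1))
print(phase_correlation(shifted, img))  # peak at the (5, 12) shift
```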
48. Administrativní objekt / Commercial building
Tejkl, Jakub (January 2013)
The master's thesis "Administrative Object" is processed as a building construction project. The building is situated on plots No. 651/1 and 652/1 in Brno. It is designed for administrative use; the first floor holds a gallery, a sweet shop, a newspaper shop and a garage. The house has no cellar and has three above-ground floors. The structural skeleton is designed in reinforced concrete, and the walls are built from POROTHERM bricks. The roof deck is designed as a reinforced concrete slab, and the building will be roofed with arched trusses carrying metal roofing. The second and third floors contain offices, meeting rooms, conference halls and social facilities. The work includes the architectural study of the building and heat-engineering assessments.
49. A User-Centric Tabular Multi-Column Sorting Interface For Intact Transposition Of Columnar Data
Miles, David B. L. (12 January 2006)
Many usability features designed into software applications are not procedurally intuitive for their users. A good example involves tabular sorting in a spreadsheet. Single-column sorting, activated with a mouse click on a column header or toolbar button, often produces rearranged listings that disrupt the cognitive organization beyond the sorted column. Multi-column sorting, performed through menu-driven processes, preserves the derived organization; however, locating the feature through menu-based systems can be confusing. A means to overcome this confusion is prioritized selection of database arrays issued to columnar displays for the purpose of intact transposition of data. This is a unique process designed as a user-centric tabular multi-column sorting interface. Designed within this experimental software application is a "trickle-down" logic perceived as a navigation rule. The design offers logic associated with decision choices used to pursue a software solution, in this instance a compiled result of separate and distinct columnar sorts. The design was initially implemented in a software application housing thousands of examination scores, and observations of the design concept's effectiveness in practice led to further investigation through this master's thesis. To validate the research design, participants were introduced to an example of traditional database sort/selection with practice examples. These users were also given sorting exercises to reinforce the discussed concepts, both experimental and traditional. Finally, a survey questionnaire allowed them to provide feedback about the different sorting methods as well as the experience of using these dissimilar methods. The hypothesis was not validated by the research survey. Four years of observing the design in a production environment, however, provided impetus to suggest further research on the design concept.
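The prioritized, column-by-column character of such a sort is easy to illustrate in code (a sketch of the concept, not the thesis software): because a stable sort preserves the order of ties, sorting by the least significant column first and the most significant column last is equivalent to one prioritized multi-column sort, and previously sorted groups stay intact.

```python
# Prioritized multi-column sort via successive stable sorts.
rows = [
    ("Baker", "Ann", 72),
    ("Adams", "Zoe", 91),
    ("Baker", "Ann", 85),
    ("Adams", "Abe", 64),
]

# Priority: last name, then first name, then score (descending).
rows.sort(key=lambda r: r[2], reverse=True)  # least significant key first
rows.sort(key=lambda r: r[1])
rows.sort(key=lambda r: r[0])                # most significant key last

for row in rows:
    print(row)
# ('Adams', 'Abe', 64)
# ('Adams', 'Zoe', 91)
# ('Baker', 'Ann', 85)
# ('Baker', 'Ann', 72)
```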
50. Synthesis of Tabular Financial Data using Generative Adversarial Networks / Syntes av tabulär finansiell data med generativa motstridande nätverk
Karlsson, Anton; Sjöberg, Torbjörn (January 2020)
Digitalization has led to vast amounts of available customer data and to possibilities for data-driven innovation. However, the data needs to be handled carefully to protect the privacy of the customers. Generative Adversarial Networks (GANs) are a promising recent development in generative modeling. They can be used to create synthetic data that facilitates analysis while ensuring that customer privacy is maintained. Prior research on GANs has shown impressive results on image data. In this thesis, we investigate the viability of using GANs within the financial industry. We examine two state-of-the-art GAN models for synthesizing tabular data, TGAN and CTGAN, along with a simpler GAN model that we call WGAN. A comprehensive evaluation framework is developed to facilitate comparison of the synthetic datasets. The results indicate that GANs are able to generate quality synthetic datasets that preserve the statistical properties of the underlying data and enable a viable and reproducible subsequent analysis. It was, however, found that all of the investigated models had problems reproducing numerical data.
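One common check in such an evaluation framework is "machine learning efficacy": a model trained on the synthetic data should score close to one trained on real data when both are evaluated on held-out real data. A sketch with scikit-learn (the synthetic stand-in below is just a noisy resample of the real training data; in the thesis setting it would be the TGAN/CTGAN/WGAN output):

```python
# Train-on-synthetic, test-on-real comparison against a train-on-real baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_real, X_test, y_real, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Stand-in for GAN output: a noisy bootstrap resample of the real training data.
rng = np.random.default_rng(1)
idx = rng.integers(0, len(X_real), size=len(X_real))
X_syn = X_real[idx] + rng.normal(0.0, 0.1, size=X_real[idx].shape)
y_syn = y_real[idx]

acc_real = RandomForestClassifier(random_state=0).fit(X_real, y_real).score(X_test, y_test)
acc_syn = RandomForestClassifier(random_state=0).fit(X_syn, y_syn).score(X_test, y_test)
print(f"trained on real: {acc_real:.3f}, trained on synthetic: {acc_syn:.3f}")
```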