About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Předzpracování dat / Data Preprocessing

Vašíček, Radek January 2008 (has links)
This thesis surveys problems in data preprocessing. The first part presents and describes characteristic tests for describing attributes, along with methods for working with data and attributes. The second part describes work with the RapidMiner program, covering its individual preprocessing functions and what they do. The third part compares results obtained with and without data preprocessing.
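The thesis runs its comparison in RapidMiner; as a rough Python equivalent, the sketch below contrasts a classifier trained on near-raw features with one trained after imputation and scaling. The dataset, models, and simulated missing values are illustrative assumptions, not the thesis's experiment.

```python
# Minimal sketch: effect of preprocessing on classifier accuracy.
# All choices (dataset, models, 5% simulated missing values) are
# illustrative; the thesis itself used RapidMiner.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan        # simulate missing values

baseline = make_pipeline(SimpleImputer(strategy="constant", fill_value=0.0),
                         LogisticRegression(max_iter=5000))
preprocessed = make_pipeline(SimpleImputer(strategy="median"),
                             StandardScaler(),
                             LogisticRegression(max_iter=5000))

print("near-raw:     %.3f" % cross_val_score(baseline, X, y, cv=5).mean())
print("preprocessed: %.3f" % cross_val_score(preprocessed, X, y, cv=5).mean())
```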
2

Systém předzpracování dat pro dobývání znalostí z databází / Data Preprocessing System for Knowledge Discovery in Databases

Kotinová, Hana January 2009 (has links)
Abstract: The aim of this diploma thesis was to create an application for data preprocessing. The application works with files in CSV format and is useful for preparing data when solving data mining tasks. It was created using the Java programming language. This text discusses the problems, solutions, and algorithms associated with data preprocessing, and reviews similar systems such as Mining Mart and SumatraTT. A complete user guide for the application is provided in the main part of this text.
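The application described is a Java program; a minimal Python sketch of the same kind of CSV preparation step (deduplication plus simple imputation) might look as follows. The CSV content and column names are hypothetical.

```python
# Sketch of basic CSV preprocessing: deduplicate, impute numeric gaps,
# flag categorical gaps. The data below is invented for illustration;
# a real run would start from pd.read_csv("somefile.csv").
import io
import pandas as pd

raw = io.StringIO("id,price,category\n1,10.5,a\n2,,b\n2,,b\n3,8.0,\n")
df = pd.read_csv(raw)
df = df.drop_duplicates()                    # remove repeated rows
num = df.select_dtypes("number").columns
df[num] = df[num].fillna(df[num].median())   # impute numeric gaps
df = df.fillna("unknown")                    # mark categorical gaps
print(df)
```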
3

Machine Learning in Logistics: Machine Learning Algorithms : Data Preprocessing and Machine Learning Algorithms

Andersson, Viktor January 2017 (has links)
Data Ductus is a Swedish IT consulting company whose customer base ranges from small startups to large established corporations. The company has grown steadily since the 80s and has offices in both Sweden and the US. With the help of machine learning, this project presents a possible solution to the errors caused by the human factor in the logistics business. A way of preprocessing data before applying it to a machine learning algorithm, as well as a couple of algorithms to use, will be presented.
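As a hedged illustration of "preprocessing data before applying it to a machine learning algorithm", the sketch below encodes categorical fields and scales numeric ones before fitting a classifier. The column names and the tiny dataset are invented; the thesis does not prescribe this exact pipeline.

```python
# Sketch: preprocess mixed logistics-style data, then fit a model.
# Field names (carrier, weight_kg, ...) and values are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "carrier": ["dhl", "ups", "dhl", "fedex"],
    "weight_kg": [1.2, 8.5, 0.4, 3.3],
    "distance_km": [120, 900, 45, 310],
    "delayed": [0, 1, 0, 1],                 # label: human-error delay
})
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["carrier"]),
    ("num", StandardScaler(), ["weight_kg", "distance_km"]),
])
model = make_pipeline(preprocess, RandomForestClassifier(random_state=0))
model.fit(df.drop(columns="delayed"), df["delayed"])
print(model.predict(df.drop(columns="delayed")))
```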
4

DATA ACQUISITION SYSTEM FOR AIRCRAFT QUALIFICATION

Eccles, Lee, O’Brien, Michael, Anderson, William October 1986 (has links)
International Telemetering Conference Proceedings / October 13-16, 1986 / Riviera Hotel, Las Vegas, Nevada / The Boeing Commercial Airplane Company presently uses an Airborne Data Analysis and Monitor System (ADAMS) to support extensive qualification testing on new and modified commercial aircraft. The ADAMS system consists of subsystems controlled by independent processors which preprocess serial PCM data, perform application-specific processing, provide graphic display of data, and manage mass storage resources. Setup and control information is passed between processors using the Ethernet protocol on a fiber optic network. Tagged data is passed between processors using a data bus with networking characteristics. During qualification tests, data are dynamically selected, analyses performed, and results recorded. Decisions to proceed or repeat tests are made in real time on the aircraft. Instrumentation in present aircraft includes up to 3700 sensors, with projections for 5750 sensors in the next generation. Concurrently, data throughput rates are increasing, and data preprocessing requirements are becoming more complex. Fairchild Weston Systems, Inc., under contract to Boeing, has developed an Acquisition Interface Assembly (AIA) which accepts multiple streams of PCM data, controls recording and playback on analog tape, performs high-speed data preprocessing, and distributes the data to the other ADAMS subsystems. The AIA processes one to three streams in any of the standard IRIG PCM formats using programmable bit, frame, and subframe synchronizers. Data from ARINC buses with embedded measurement labels, bus IDs, and time tags may also be processed by the AIA. Preprocessing is accomplished by two high-performance Distributed Processing Units (DPUs) operating in either pipeline or parallel configurations. The DPUs perform concatenation functions, number system conversions, engineering unit conversions, and data tagging for distribution to the ADAMS system. Time information, from either a time code generator or tape playback, may be merged with data at 0.1 msec resolution. Control and status functions are coordinated by an embedded processor and are accessible to other ADAMS processors via both the Ethernet interface and a local operator’s terminal. Because the AIA assembly is used in aircraft, the entire functional capability has been packaged in a 14-inch-high, rack-mountable chassis with EMI shielding. The unit has been designed for high-temperature, high-altitude, vibrating environments. The AIA will be a key element in aircraft qualification testing at Boeing well into the next generation of airframes, and the specification, design, development, and implementation of the AIA have been carried out with the significance of that fact in mind.
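Two of the DPU preprocessing steps named above, engineering-unit conversion and data tagging with 0.1 msec time resolution, reduce to simple arithmetic. The sketch below is a toy model: the calibration constants and measurement ID are invented, and the real DPUs are hardware, not Python.

```python
# Toy model of two AIA/DPU preprocessing steps: linear engineering-unit
# conversion of a raw PCM count, and tagging the sample with a
# measurement ID plus a time stamp quantized to 0.1 ms.
# Scale/offset values and the ID are invented for illustration.
def to_engineering_units(raw_count: int, scale: float, offset: float) -> float:
    """Convert a raw telemetry count to engineering units (e.g. deg C)."""
    return raw_count * scale + offset

def tag_sample(measurement_id: int, value: float, t_seconds: float) -> dict:
    """Attach a measurement ID and a 0.1 ms resolution time tag."""
    return {"id": measurement_id,
            "value": value,
            "time_0p1ms": round(t_seconds * 10_000)}

sample = tag_sample(0x2A1, to_engineering_units(2048, 0.05, -51.2), 12.34567)
print(sample)   # {'id': 673, 'value': 51.2..., 'time_0p1ms': 123457}
```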
5

Framework pro předzpracování dopravních dat pro zjištění semantických míst / Trajectory Data Preprocessing Framework for Discovering Semantic Locations

Ostroukh, Anna January 2018 (has links)
The aim of this thesis is to survey existing approaches to preprocessing trajectory data, with a focus on discovering semantic trajectories, and to design and develop a framework that integrates trajectory data from GPS sensors with semantics. The problem with analyzing raw trajectories is that it is not as informative as analyzing trajectories that carry meaningful context. A review of various approaches and algorithms is followed by the design and development of a framework that discovers semantic locations by applying a density-based clustering method to the stop points of trajectories. The design and implementation of the framework were evaluated on publicly available datasets containing raw GPS records.
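The core step, density-based clustering of stop points, can be sketched with scikit-learn's DBSCAN using a haversine metric; the coordinates and parameters below are illustrative, not the framework's actual values.

```python
# Sketch: discover semantic locations by density-based clustering of
# stop points. Coordinates (lat, lon in degrees) and eps are invented.
import numpy as np
from sklearn.cluster import DBSCAN

stops = np.array([[49.2265, 16.5960], [49.2266, 16.5962],
                  [49.2264, 16.5959], [50.0755, 14.4378]])
eps_metres = 100
earth_radius_m = 6_371_000
db = DBSCAN(eps=eps_metres / earth_radius_m,   # radians on unit sphere
            min_samples=2, metric="haversine")
labels = db.fit_predict(np.radians(stops))
print(labels)   # e.g. [0 0 0 -1]: one semantic place plus one noise point
```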
6

Characterization of Botanicals by Nuclear Magnetic Resonance and Mass Spectrometric Chemical Profiling

Wang, Xinyi 13 July 2018 (has links)
No description available.
7

A partition based approach to approximate tree mining : a memory hierarchy perspective

Agarwal, Khushbu 11 December 2007 (has links)
No description available.
8

Applying unprocessed company data to time series forecasting : An investigative pilot study

Rockström, August, Sevborn, Emelie January 2023 (has links)
Demand forecasting for sales is a widely researched topic that is essential for a business to prepare for market changes and increase profits. Existing research primarily focuses on data that is more suitable for machine learning applications than the data accessible to companies lacking prior machine learning experience. This thesis performs demand forecasting on a well-known sales dataset and on a dataset obtained directly from such a company, in the hope of gaining insights that can help similar companies better utilize machine learning in their business model. LightGBM, Linear Regression, and Random Forest models are used along with several regression error metrics and plots to compare the performance of the two datasets. Both datasets are preprocessed into the same structure based on equivalent features found in each set. The company dataset is determined to be unfit for machine learning forecasting even after preprocessing, and multiple possible reasons are established; the main contributors are a lack of observations per article and a lack of uniformity across the time series.
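A minimal sketch of the model comparison follows; scikit-learn's HistGradientBoostingRegressor stands in for LightGBM so the example needs no extra dependency, and the demand data is synthetic.

```python
# Sketch: compare three regressors on synthetic demand data using two
# common regression error metrics. HistGradientBoostingRegressor is a
# stand-in for LightGBM; features and targets are invented.
import numpy as np
from sklearn.ensemble import (HistGradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))              # e.g. price, month, lag features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (LinearRegression(),
              RandomForestRegressor(random_state=0),
              HistGradientBoostingRegressor(random_state=0)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mae = mean_absolute_error(y_te, pred)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{type(model).__name__}: MAE={mae:.3f} RMSE={rmse:.3f}")
```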
9

Building the Dresden Web Table Corpus: A Classification Approach

Lehner, Wolfgang, Eberius, Julian, Braunschweig, Katrin, Hentsch, Markus, Thiele, Maik, Ahmadov, Ahmad 12 January 2023 (has links)
In recent years, researchers have recognized relational tables on the Web as an important source of information. To assist this research we developed the Dresden Web Tables Corpus (DWTC), a collection of about 125 million data tables extracted from the Common Crawl (CC) which contains 3.6 billion web pages and is 266TB in size. As the vast majority of HTML tables are used for layout purposes and only a small share contains genuine tables with different surface forms, accurate table detection is essential for building a large-scale Web table corpus. Furthermore, correctly recognizing the table structure (e.g. horizontal listings, matrices) is important in order to understand the role of each table cell, distinguishing between label and data cells. In this paper, we present an extensive table layout classification that enables us to identify the main layout categories of Web tables with very high precision. We therefore identify and develop a plethora of table features, different feature selection techniques and several classification algorithms. We evaluate the effectiveness of the selected features and compare the performance of various state-of-the-art classification algorithms. Finally, the winning approach is employed to classify millions of tables resulting in the Dresden Web Table Corpus (DWTC).
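As a toy illustration of the classification step, the sketch below derives two simple layout features from a table and trains a classifier to separate genuine data tables from layout tables. The features, labels, and examples are invented; the paper's feature set is far richer.

```python
# Toy sketch: feature-based table layout classification.
# Two invented features (row count, share of numeric cells); the DWTC
# work evaluates many features, selection techniques, and classifiers.
from sklearn.ensemble import RandomForestClassifier

def features(table: list[list[str]]) -> list[float]:
    cells = [c for row in table for c in row]
    numeric = sum(c.replace(".", "", 1).isdigit() for c in cells)
    return [float(len(table)),                   # row count
            numeric / max(len(cells), 1)]        # numeric-cell ratio

train = [([["name", "pop"], ["Dresden", "556000"]], 1),   # genuine table
         ([["", "menu"], ["", ""]], 0)]                   # layout table
clf = RandomForestClassifier(random_state=0)
clf.fit([features(t) for t, _ in train], [lbl for _, lbl in train])
print(clf.predict([features([["city", "area"], ["Pirna", "53.1"]])]))
```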
10

Towards a Hybrid Imputation Approach Using Web Tables

Lehner, Wolfgang, Ahmadov, Ahmad, Thiele, Maik, Eberius, Julian, Wrembel, Robert 12 January 2023 (has links)
Data completeness is one of the most important data quality dimensions and an essential premise in data analytics. With new emerging Big Data trends such as the data lake concept, which provides a low-cost data preparation repository instead of moving curated data into a data warehouse, the problem of data completeness is further reinforced. While traditionally the process of filling in missing values is addressed by the data imputation community using statistical techniques, we complement these approaches by using external data sources from the data lake or even the Web to look up missing values. In this paper we propose a novel hybrid data imputation strategy that takes into account the characteristics of an incomplete dataset and, based on these, chooses the best imputation approach, i.e. either a statistical approach such as regression analysis, a Web-based lookup, or a combination of both. We formalize and implement both imputation approaches, including a Web table retrieval and matching system, and evaluate them extensively using a corpus of 125M Web tables. We show that applying statistical techniques in conjunction with external data sources leads to an imputation system that is robust, accurate, and has high coverage at the same time.
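The core dispatch of the hybrid strategy (statistical imputation when the data supports it, external lookup otherwise) can be sketched as below. The correlation threshold and the stubbed Web-table lookup are assumptions, not the paper's implementation.

```python
# Sketch of the hybrid dispatch: regression imputation when the target
# correlates with complete numeric columns, otherwise a (stubbed)
# Web-table lookup. Threshold and lookup function are assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

def lookup_web_table(key, attribute):
    """Stub for the paper's Web table retrieval-and-matching system."""
    return None   # a real system would query the 125M-table corpus

def impute_column(df, target, predictors, key_col, threshold=0.6):
    corr = df[predictors].corrwith(df[target]).abs().max()
    mask = df[target].isna()
    filled = df[target].copy()
    if corr >= threshold:                 # statistical route
        known = df[~mask]
        model = LinearRegression().fit(known[predictors], known[target])
        filled[mask] = model.predict(df.loc[mask, predictors])
    else:                                 # Web lookup route
        filled[mask] = df.loc[mask, key_col].map(
            lambda k: lookup_web_table(k, target))
    return filled

df = pd.DataFrame({"city": ["Dresden", "Pirna", "Meissen", "Riesa"],
                   "area_km2": [328.8, 53.1, 30.9, 58.7],
                   "population": [556000, 38300, None, None]})
print(impute_column(df, "population", ["area_km2"], "city"))
```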
