1 |
Předzpracování dat / Data Preprocessing
Vašíček, Radek (January 2008)
This thesis surveys the problems of data preprocessing. The first part gives an overview and description of characteristic tests for describing attributes, and of methods for working with data and attributes. The second part describes working with the program RapidMiner; it covers the individual preprocessing functions in this program and describes what they do. The third part compares the results obtained with and without data preprocessing methods.
|
2 |
Systém předzpracování dat pro dobývání znalostí z databází / Data Preprocessing System for Knowledge Discovery in Databases
Kotinová, Hana (January 2009)
The aim of this diploma thesis was to create an application for data preprocessing. The application works with files in CSV format and is useful for preparing data when solving data mining tasks. The application was written in the Java programming language. This text discusses problems, their solutions, and algorithms associated with data preprocessing, and discusses similar systems such as Mining Mart and SumatraTT. A complete user guide for the application is provided in the main part of this text.
|
3 |
Machine Learning in Logistics: Machine Learning Algorithms: Data Preprocessing and Machine Learning Algorithms
Andersson, Viktor (January 2017)
Data Ductus is a Swedish IT consulting company whose customer base ranges from small startups to large, established companies. The company has grown steadily since the 1980s and has established offices in both Sweden and the US. With the help of machine learning, this project presents a possible solution to the errors caused by the human factor in the logistics business. A way of preprocessing data before applying it to a machine learning algorithm is presented, along with a couple of algorithms to use.
|
4 |
DATA ACQUISITION SYSTEM FOR AIRCRAFT QUALIFICATION
Eccles, Lee; O’Brien, Michael; Anderson, William (October 1986)
International Telemetering Conference Proceedings / October 13-16, 1986 / Riviera Hotel, Las Vegas, Nevada / The Boeing Commercial Airplane Company presently uses an Airborne Data Analysis and Monitor System (ADAMS) to support extensive qualification testing on new and modified commercial aircraft. The ADAMS system consists of subsystems controlled by independent processors which preprocess serial PCM data, perform application-specific processing, provide graphic display of data, and manage mass storage resources. Setup and control information is passed between processors using the Ethernet protocol on a fiber optic network. Tagged data is passed between processors using a data bus with networking characteristics. During qualification tests, data are dynamically selected, analyses performed, and results recorded. Decisions to proceed or repeat tests are made in real time on the aircraft.
Instrumentation in present aircraft includes up to 3700 sensors, with projections for 5750 sensors in the next generation. Concurrently, data throughput rates are increasing, and data preprocessing requirements are becoming more complex. Fairchild Weston Systems, Inc., under contract to Boeing, has developed an Acquisition Interface Assembly (AIA) which accepts multiple streams of PCM data, controls recording and playback on analog tape, performs high speed data preprocessing, and distributes the data to the other ADAMS subsystems. The AIA processes one to three streams in any of the standard IRIG PCM formats using programmable bit, frame and subframe synchronizers. Data from ARINC buses with embedded measurement labels, bus IDs, and time tags may also be processed by the AIA. Preprocessing is accomplished by two high-performance Distributed Processing Units (DPUs) operating in either pipeline or parallel environments. The DPUs perform concatenation functions, number system conversions, engineering unit conversions, and data tagging for distribution to the ADAMS system. Time information, from either a time code generator or tape playback, may be merged with data with a 0.1 msec resolution. Control and status functions are coordinated by an embedded processor, and are accessible to other ADAMS processors via both the Ethernet interface and a local operator’s terminal.
Because the AIA is used in aircraft, the entire functional capability has been packaged in a 14-inch high, rack-mountable chassis with EMI shielding. The unit has been designed for high temperature, high altitude, vibrating environments. The AIA will be a key element in aircraft qualification testing at Boeing well into the next generation of airframes, and specification, design, development, and implementation of the AIA have been carried out with the significance of that fact in mind.
|
5 |
Framework pro předzpracování dopravních dat pro zjištění semantických míst / Trajectory Data Preprocessing Framework for Discovering Semantic Locations
Ostroukh, Anna (January 2018)
The aim of this thesis is to survey existing approaches to preprocessing traffic data, with a focus on discovering semantic trajectories, and to design and develop a framework that integrates traffic data from GPS sensors with semantics. The problem with analysing raw trajectories is that such analysis is not as comprehensive as the analysis of trajectories that carry meaningful context. After a study of various approaches and algorithms, the thesis proceeds to the design and development of a framework that discovers semantic places using a density-based clustering method applied to the stop points in trajectories. The design and implementation of the framework were evaluated on publicly available datasets containing raw GPS records.
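The clustering step mentioned above can be sketched as follows. This is a minimal illustration that assumes DBSCAN as the density-based method (the abstract does not name the algorithm); the stop-point coordinates, the radius `eps_m`, and the threshold `min_pts` are hypothetical values chosen for the example, not taken from the thesis.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Brute-force radius query; the point itself counts as a neighbour.
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels

# Hypothetical stop points: two dense groups and one isolated stop.
stops = [
    (49.19510, 16.60680), (49.19512, 16.60683), (49.19508, 16.60678),
    (49.20000, 16.62000), (49.20003, 16.62002), (49.19998, 16.61997),
    (49.25000, 16.70000),
]
labels = dbscan(stops, eps_m=50, min_pts=3)
```

Each resulting cluster is a candidate semantic place; the isolated stop is labelled as noise. A real framework would likely use a spatial index instead of the quadratic neighbour search shown here.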
|
6 |
Applying unprocessed company data to time series forecasting: An investigative pilot study
Rockström, August; Sevborn, Emelie (January 2023)
Demand forecasting for sales is a widely researched topic that is essential for a business to prepare for market changes and increase profits. Existing research primarily focuses on data that is more suitable for machine learning applications than the data accessible to companies lacking prior machine learning experience. This thesis performs demand forecasting on a known sales dataset and a dataset accessed directly from such a company, in the hope of gaining insights that can help similar companies better utilize machine learning in their business model. LightGBM, Linear Regression and Random Forest models are used along with several regression error metrics and plots to compare performance on the two datasets. Both datasets are preprocessed into the same structure based on equivalent features found in each set. The company dataset is determined to be unfit for machine learning forecasting even after preprocessing measures, and multiple possible reasons are established. The main contributors are a lack of observations per article and a lack of uniformity through the time series.
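To make the comparison by regression error metrics concrete, here is a small sketch. The abstract does not say which metrics were used, so MAE and RMSE are assumptions, and the demand figures are invented for illustration only.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more than MAE does."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Invented weekly demand and two hypothetical forecasts.
actual = [10, 12, 8, 15]
forecast_even = [11, 11, 9, 14]   # off by 1 everywhere
forecast_spiky = [10, 12, 8, 11]  # exact three times, off by 4 once

print(mae(actual, forecast_even), rmse(actual, forecast_even))
print(mae(actual, forecast_spiky), rmse(actual, forecast_spiky))
```

Both forecasts have the same MAE, but the second has twice the RMSE, which is why reporting more than one metric is useful when a series has few observations per article.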
|
7 |
Characterization of Botanicals by Nuclear Magnetic Resonance and Mass Spectrometric Chemical Profiling
Wang, Xinyi (13 July 2018)
No description available.
|
8 |
A partition based approach to approximate tree mining: a memory hierarchy perspective
Agarwal, Khushbu (11 December 2007)
No description available.
|
9 |
A Supervised-Reinforcement Learning Model for Automated Clash Resolution in the Construction Industry
Harode, Ashit (24 September 2024)
Clash Coordination is a crucial step in ensuring timely and cost-effective project delivery. While software tools like Navisworks and Solibri have improved the process of aggregating models and conducting clash tests and categorization, resolving clashes remains a slow and manual task. The reason for this slow process can be attributed to the meticulous nature of the process where design coordinators need to ensure that resolving one clash does not lead to new clashes.
With the advent of machine learning and its application in construction, more research is being conducted to automate construction tasks to increase productivity and reduce the cost of the project. One such task currently being researched is clash resolution. Researchers have explored the use of machine learning, specifically supervised learning, to automate clash resolution with successful outcomes. A search of the Web of Science database shows 7 publications that discuss the automation of clash resolution, automation of clash correction sequence, and automation of selection of relevant clashes. To further analyze the content of these publications, the author used VOSviewer to create a word map of the terms contained in their title, keywords, and abstract fields and to analyze word co-occurrence. The word co-occurrence analysis revealed that the publications have explored supervised learning as the machine learning category of choice for automating clash resolution. However, the same analysis also showed the lack of terms such as data scrubbing, feature selection, feature engineering, and domain knowledge. These terms are an essential part of developing a machine-learning model.
This analysis led the author to believe that even though research is being conducted to automate clash resolution, a systematic approach to developing a machine learning model to support the automation of clash resolution is missing. Also, though these studies show significant accuracy in automating clash resolution, they fail to justify the selection of their feature and label space. Another limitation of the current state of the art is that the effectiveness of supervised learning in automating tasks is limited by the availability of a large amount of labeled data, which is often unavailable.
To address these research gaps, the author's first contribution to the body of knowledge in this dissertation is a phased, systematic approach to developing an automation model for clash resolution. Since in machine learning the selection of an appropriate feature and label space is critical in developing an optimum and explainable solution, it is crucial to identify features that accurately represent a clash and also represent the factors industry experts consider when resolving the clash. Along with features, labels need to be selected to represent the clash resolution options available to the industry. To achieve this, in chapter 2 the author used a modified Delphi method to capture the domain knowledge that industry experts utilize to resolve clashes. Factors considered by industry experts to decide on how clashes are resolved, and options to resolve clashes, are extracted from the domain knowledge. As a result of this research, the author identified 23 factors that industry experts consider when resolving clashes and 5 options available to resolve the clash. The work concludes by identifying factors and options that can serve as features and labels for a machine-learning algorithm to automate clash resolution.
Once features and labels are identified the author in chapter 3 discusses the development of a prediction model to predict clash resolution options for a given clash. The discussion is focused on individual steps involved in the creation of machine learning models like data collection, data pre-processing, and machine learning algorithm development and selection. The author also addresses common challenges in the development of machine learning models like class imbalance and availability of limited data. The author utilizes a multi-label synthetic oversampling method (MLSOL) to generate different percentages of synthetic data to account for class imbalance and limited datasets. Using this dataset, the author then trained five different supervised learning algorithms and reported their accuracy. Based on this work the author concluded that increasing the dataset with 20% of synthetic data and using an artificial neural network to develop the machine learning model to automate clash resolution generated the best result with an average accuracy of around 80%.
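The idea of enlarging a small, imbalanced dataset with a fixed percentage of synthetic rows can be illustrated with a simplified sketch. This is not MLSOL: it is a plain single-label, interpolation-based oversampler (in the spirit of SMOTE) swapped in to keep the example short. The feature columns, label names, and values are hypothetical.

```python
import random

def oversample_minority(rows, labels, fraction=0.2, seed=0):
    """Create round(fraction * len(rows)) synthetic rows for the rarest class.

    Each synthetic row is a random point on the line segment between two
    existing rows of that class.
    """
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    # Interpolation needs two distinct parents, so skip single-row classes.
    eligible = [lab for lab, members in by_label.items() if len(members) >= 2]
    if not eligible:
        raise ValueError("need a class with at least two rows to interpolate")
    minority = min(eligible, key=lambda lab: len(by_label[lab]))
    synthetic = []
    for _ in range(round(fraction * len(rows))):
        a, b = rng.sample(by_label[minority], 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic, [minority] * len(synthetic)

# Hypothetical clash features: [clearance_mm, element_diameter_mm].
rows = [[50.0, 100.0 + i] for i in range(8)] + [[10.0, 200.0], [20.0, 220.0]]
labels = ["move_duct"] * 8 + ["move_pipe"] * 2

new_rows, new_labels = oversample_minority(rows, labels, fraction=0.2)
augmented_rows = rows + new_rows
augmented_labels = labels + new_labels
```

With 10 original rows and `fraction=0.2`, two synthetic `move_pipe` rows are appended. A multi-label method such as MLSOL additionally has to decide which labels each synthetic row inherits, which this sketch does not attempt.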
To address the limitation of using only supervised learning and as a second contribution to the body of knowledge, the author in chapter 4 proposes the use of reinforcement learning to train a Deep Q Network (DQN) agent capable of learning how to resolve clashes through interactions with a Building Information Model (BIM) environment containing clashes. The work discusses the implementation of a dynamic reward function to guide the agent in making decisions based on industry best practices. Additionally, it outlines the setup of the interaction between the agent and the environment to facilitate learning. Considering that reinforcement learning requires a significant amount of time to develop knowledge, the author also tested the effect of using a pre-trained supervised learning model to initialize the reinforcement learning policy function and guide knowledge exploration. This approach resulted in three variations of supervised-reinforcement learning. The supervised learning model used in this research demonstrated an accuracy of 31%. To demonstrate the utility of reinforcement learning in training an agent, the author plotted graphs showing the number of clashes resolved per episode and the cumulative reward received per episode. The clashes resolved by the agent in this research were limited to clashes between ducts and pipes. These graphs illustrated that with each successive episode, the agent became increasingly proficient at resolving clashes. Among the variations of supervised-reinforcement learning, the one that exhibited the most stable learning graph utilized the weights of the supervised learning model to initialize the policy function of reinforcement learning. This research confirmed that reinforcement learning can be employed to train an automated model instead of relying solely on supervised learning, especially in scenarios where limited or no clash resolution data is available. Moreover, pre-training reinforcement learning using a supervised learning model led to more consistent learning outcomes.
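The general idea of warm-starting reinforcement learning from a supervised model can be sketched on a toy problem. Everything below is an assumption made for illustration: a tabular Q-learning agent stands in for the DQN, a four-clash sequence with three resolution options stands in for the BIM environment, the reward is a fixed +1/-1 rather than the dynamic reward function described above, and the `prior` table plays the role of a weak supervised model's class probabilities.

```python
import random

def q_learning(target, n_actions, q_init=None, episodes=300, alpha=0.5,
               gamma=0.9, epsilon=0.3, max_steps=100, seed=0):
    """Tabular Q-learning on a toy clash sequence; returns the greedy policy.

    State s is the index of the clash being worked on. Choosing target[s]
    resolves it (+1, move on); anything else fails (-1, stay on the clash).
    """
    rng = random.Random(seed)
    n_states = len(target)
    if q_init is not None:
        q = [list(row) for row in q_init]  # warm start from a prior
    else:
        q = [[0.0] * n_actions for _ in range(n_states)]
    for _episode in range(episodes):
        s = 0
        for _step in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])  # exploit
            if a == target[s]:
                reward, s_next = 1.0, s + 1
            else:
                reward, s_next = -1.0, s
            future = 0.0 if s_next == n_states else max(q[s_next])
            q[s][a] += alpha * (reward + gamma * future - q[s][a])
            s = s_next
            if s == n_states:
                break
    return [max(range(n_actions), key=lambda i: row[i]) for row in q]

# Hypothetical correct option per clash, and a prior that is wrong on clash 2.
target = [2, 0, 1, 1]
prior = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.2, 0.6, 0.2]]

cold_policy = q_learning(target, n_actions=3)
warm_policy = q_learning(target, n_actions=3, q_init=prior)
```

In this toy setting both runs end with the correct greedy policy; the warm start only changes where exploration begins, and the agent still corrects the state where the prior was wrong. Whether a warm start also stabilises learning in a real DQN depends on the environment and reward design.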
The research presented in this dissertation focuses on the holistic development of a machine learning model to automate clash resolution. By identifying appropriate features and labels before training the model the author ensures that the automation model accurately captures industry best practices and is explainable. Furthermore, by utilizing a systematic approach towards the development of a machine learning model the author addresses common challenges in developing a machine learning model and how we can overcome them. Lastly, through the utilization of supervised reinforcement learning the author proposes an alternative learning algorithm that can learn how to resolve clashes with fewer labeled examples through Building Information Model (BIM) interaction and with a more steady learning rate than reinforcement learning alone.
Doctor of Philosophy
Clash Coordination is a crucial step in ensuring timely and cost-effective project delivery. While software tools like Navisworks and Solibri have improved the process of aggregating models and conducting clash tests and categorization, resolving clashes remains a slow and manual task. The reason for this slow process can be attributed to the meticulous nature of the process where design coordinators need to make sure that resolving one clash does not lead to the creation of new clashes.
Research has been conducted to improve the clash coordination process through automation using supervised learning, where a machine is taught to resolve clashes by understanding existing examples of clash resolutions. However, these studies do not provide enough evidence on how the clash examples are presented to the machine, and they skip the details of common challenges associated with machine learning and how to overcome them. Also, as these studies focus on training a machine using existing examples of clash resolution, a large number of examples is required to develop an effective machine-learning solution.
The author of this dissertation addresses these limitations and contributes to the body of knowledge. In Chapter 2 the author discusses the use of a modified Delphi method to capture the industry's knowledge on how to make decisions about clash resolution and what options to consider when resolving clashes. During this process the author also took measures to reduce bias, such as intercoder reliability checks, to make the results of the modified Delphi more accurate. As a result of the modified Delphi, the author identified 23 factors that industry experts consider when resolving clashes and 5 options available to resolve the clash.
These identified factors and options were later utilized by the author in chapter 3 as features and labels to represent clash resolution examples. Using these examples, the author then developed a supervised learning model able to predict the most likely solution for a given clash with 80% accuracy. While developing the supervised learning model the author discusses common challenges associated with machine learning like class imbalance, data scrubbing, and un-normalized data and their mitigative measures.
To address the limited availability of clash resolution examples, the author in chapter 4 proposes and develops a supervised-reinforcement learning model. This model learns how to resolve clashes by continuously interacting with a BIM model. To improve the learning rate, the model also utilizes the knowledge gained through the development of a supervised learning model.
This research shows that it is possible to train a machine to resolve clashes using reinforcement learning, and that adding knowledge from supervised learning to reinforcement learning results in a steadier learning rate for the machine. The research also shows that a more accurate supervised learning model can be developed from limited clash resolution examples using deep artificial neural networks, though this kind of approach increases the learning time and can lead to overfitting.
|
10 |
Sentimental Analysis of Cyberbullying Tweets with SVM Technique
Thanikonda, Hrushikesh; Koneti, Kavya Sree (January 2023)
Background: Cyberbullying involves the use of digital technologies to harass, humiliate, or threaten individuals or groups. This form of bullying can occur on various platforms such as social media, messaging apps, gaming platforms, and mobile phones. With the outbreak of COVID-19, there was a drastic increase in the use of social media. This upsurge was coupled with cyberbullying, making it a pressing issue that needs to be addressed. Sentiment analysis involves identifying and categorizing emotions and opinions expressed in text data using natural language processing and machine learning techniques. SVM is a machine learning algorithm that has been widely used for sentiment analysis due to its accuracy and efficiency.
Objectives: The main objective of this study is to use SVM for sentiment analysis of cyberbullying tweets and evaluate its performance. The study aimed to determine the feasibility of using SVM for sentiment analysis and to assess its accuracy in detecting cyberbullying.
Methods: The quantitative research method is used in this thesis, and data is analyzed using statistical analysis. The dataset is from Kaggle and includes data about cyberbullying tweets. The collected data is preprocessed and used to train and test an SVM model. The resulting model is evaluated on the test set using accuracy, precision, recall, and F1 score to determine the performance of the SVM model developed to detect cyberbullying.
Results: The results showed that SVM is a suitable technique for sentiment analysis of cyberbullying tweets. The model had an accuracy of 82.3% in detecting cyberbullying, with a precision of 0.82, recall of 0.82, and F1-score of 0.83.
Conclusions: The study demonstrates the feasibility of using SVM for sentiment analysis of cyberbullying tweets. The high accuracy of the SVM model suggests that it can be used to build automated systems for detecting cyberbullying. The findings highlight the importance of developing tools to detect and address cyberbullying in the online world. The use of sentiment analysis and SVM has the potential to make a significant contribution to the fight against cyberbullying.
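For readers unfamiliar with the evaluation measures named in the abstract, the sketch below shows how accuracy, precision, recall, and F1 are derived from a binary confusion matrix. The counts are invented for illustration and are not the study's data; how the study averaged its scores across classes is not stated, so this shows only the single-class binary case.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented counts: 40 bullying tweets caught, 10 false alarms,
# 20 bullying tweets missed, 30 harmless tweets correctly passed.
scores = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy matters for tasks like this one because a classifier can reach high accuracy on an imbalanced dataset while still missing most of the positive class.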
|