Global ETD Search

11	Using dynamic time warping for multi-sensor fusion Ko, Ming Hsiao January 2009 (has links) Fusion is a fundamental human process that occurs in some form at all levels of sense organs such as visual and sound information received from eyes and ears respectively, to the highest levels of decision making such as our brain fuses visual and sound information to make decisions. Multi-sensor data fusion is concerned with gaining information from multiple sensors by fusing across raw data, features or decisions. The traditional frameworks for multi-sensor data fusion only concern fusion at specific points in time. However, many real world situations change over time. When the multi-sensor system is used for situation awareness, it is useful not only to know the state or event of the situation at a point in time, but also more importantly, to understand the causalities of those states or events changing over time. / Hence, we proposed a multi-agent framework for temporal fusion, which emphasises the time dimension of the fusion process, that is, fusion of the multi-sensor data or events derived over a period of time. The proposed multi-agent framework has three major layers: hardware, agents, and users. There are three different fusion architectures: centralized, hierarchical, and distributed, for organising the group of agents. The temporal fusion process of the proposed framework is elaborated by using the information graph. Finally, the core of the proposed temporal fusion framework – Dynamic Time Warping (DTW) temporal fusion agent is described in detail. / Fusing multisensory data over a period of time is a challenging task, since the data to be fused consists of complex sequences that are multi–dimensional, multimodal, interacting, and time–varying in nature. Additionally, performing temporal fusion efficiently in real–time is another challenge due to the large amount of data to be fused. To address these issues, we proposed the DTW temporal fusion agent that includes four major modules: data pre-processing, DTW recogniser, class templates, and decision making. The DTW recogniser is extended in various ways to deal with the variability of multimodal sequences acquired from multiple heterogeneous sensors, the problems of unknown start and end points, multimodal sequences of the same class that hence has different lengths locally and/or globally, and the challenges of online temporal fusion. / We evaluate the performance of the proposed DTW temporal fusion agent on two real world datasets: 1) accelerometer data acquired from performing two hand gestures, and 2) a benchmark dataset acquired from carrying a mobile device and performing pre-defined user scenarios. Performance results of the DTW based system are compared with those of a Hidden Markov Model (HMM) based system. The experimental results from both datasets demonstrate that the proposed DTW temporal fusion agent outperforms HMM based systems, and has the capability to perform online temporal fusion efficiently and accurately in real–time.
12	Att hitta en nål i en höstack: Metoder och tekniker för att sålla och gradera stora mängder ostrukturerad textdata Pettersson, Emeli, Carlson, Albin January 2019 (has links) Big Data är i dagsläget ett populärt ämne som kan användas för en mängd olika syften. Bland annat kan det användas för att analysera data på webben i hopp om att identifiera brott mot mänskliga rättigheter. Genom att tillämpa tekniker inom områden som Artificiell Intelligens (AI), Information Retrieval (IR) samt data- visualisering, hoppas företaget Globalworks AB kunna identifiera röster vilka uttrycker sig om förtryck och kränkningar i social media. Artificiell intelligens och informationshämtning är dock breda områden och forskning som behandlar dem kan finnas långt tillbaka i tiden. Vi har därför valt att utföra en systematisk litteraturstudie i syfte att kartlägga existerande forskning inom dessa områden. Med en litterär sammanställning bistår vi med en ontologisk överblick i hur ett system som använder dessa tekniker är strukturerat, med vilka metoder och teknologier ett sådant system kan utvecklas, samt hur dessa kan kombineras. / Big Data is a popular topic these days which can be utilized for numerous purposes. It can, for instance, be used in order to analyse data made available online in hopes of identifying violations against human rights. By applying techniques within such areas as Artificial Intelligence (AI), Information Retrieval (IR), and Visual Analytics, the company Globalworks Ltd. aims to identify single voices in social media expressing grievances concerning such violations. Artificial Intelligence and Information Retrieval are broad topics however, and have been an active area of research for quite some time. We have therefore chosen to conduct a systematic literature review in hopes of mapping together existing research covering these areas. By presenting a literary compilation, we provide an ontological view of how an information system utilizing techniques within these areas could be structured, in addition to how such a system could deploy said techniques. Information Retrieval Artificial Intelligence Data pre-processing Data transformation Data scraping Machine learning Data visualisation Data storage Big data Insights generation Engineering and Technology Teknik och teknologier
13	A Framework for Fashion Data Gathering, Hierarchical-Annotation and Analysis for Social Media and Online Shop : TOOLKIT FOR DETAILED STYLE ANNOTATIONS FOR ENHANCED FASHION RECOMMENDATION Wara, Ummul January 2018 (has links) Due to the transformation of different recommendation system from contentbased to hybrid cross-domain-based, there is an urge to prepare a socialnetwork dataset which will provide sufficient data as well as detail-level annotation from a predefined hierarchical clothing category and attribute based vocabulary by considering user interactions. However, existing fashionbased datasets lack either in hierarchical-category based representation or user interactions of social network. The thesis intends to represent two datasets- one from photo-sharing platform Instagram which gathers fashionistas images with all possible user-interactions and another from online-shop Zalando with every cloths detail. We present a design of a customized crawler that enables the user to crawl data based on category or attributes. Moreover, an efficient and collaborative web-solution is designed and implemented to facilitate large-scale hierarchical category-based detaillevel annotation of Instagram data. By considering all user-interactions, the developed solution provides a detail-level annotation facility that reflects the user’s preference. The web-solution is evaluated by the team as well as the Amazon Turk Service. The annotated output from different users proofs the usability of the web-solution in terms of availability and clarity. In addition to data crawling and annotation web-solution development, this project analyzes the Instagram and Zalando data distribution in terms of cloth category, subcategory and pattern to provide meaningful insight over data. Researcher community will benefit by using these datasets if they intend to work on a rich annotated dataset that represents social network and resembles in-detail cloth information. / Med tanke på trenden inom forskning av rekommendationssystem, där allt fler rekommendationssystem blir hybrida och designade för flera domäner, så finns det ett behov att framställa en datamängd från sociala medier som innehåller detaljerad information om klädkategorier, klädattribut, samt användarinteraktioner. Nuvarande datasets med inriktning mot mode saknar antingen en hierarkisk kategoristruktur eller information om användarinteraktion från sociala nätverk. Detta projekt har syftet att ta fram två dataset, ett dataset som insamlats från fotodelningsplattformen Instagram, som innehåller foton, text och användarinteraktioner från fashionistas, samt ett dataset som insamlats från klädutbutdet som ges av onlinebutiken Zalando. Vi presenterar designen av en webbcrawler som är anpassad för att kunna hämta data från de nämnda domänerna och är optimiserad för mode och klädattribut. Vi presenterar även en effektiv webblösning som är designad och implementerad för att möjliggöra annotering av stora mängder data från Instagram med väldigt detaljerad information om kläder. Genom att vi inkluderar användarinteraktioner i applikationen så kan vår webblösning ge användaranpassad annotering av data. Webblösningen har utvärderats av utvecklarna samt genom AmazonTurk tjänsten. Den annoterade datan från olika användare demonstrerar användarvänligheten av webblösningen. Utöver insamling av data och utveckling av ett system för webb-baserad annotering av data så har datadistributionerna i två modedomäner, Instagram och Zalando, analyserats. Datadistributionerna analyserades utifrån klädkategorier och med syftet att ge datainsikter. Forskning inom detta område kan dra nytta av våra resultat och våra datasets. Specifikt så kan våra datasets användas i domäner som kräver information om detaljerad klädinformation och användarinteraktioner. Computer and Information Sciences Data- och informationsvetenskap
14	Comparision of Machine Learning Algorithms on Identifying Autism Spectrum Disorder Aravapalli, Naga Sai Gayathri, Palegar, Manoj Kumar January 2023 (has links) Background: Autism Spectrum Disorder (ASD) is a complex neurodevelopmen-tal disorder that affects social communication, behavior, and cognitive development.Patients with autism have a variety of difficulties, such as sensory impairments, at-tention issues, learning disabilities, mental health issues like anxiety and depression,as well as motor and learning issues. The World Health Organization (WHO) es-timates that one in 100 children have ASD. Although ASD cannot be completelytreated, early identification of its symptoms might lessen its impact. Early identifi-cation of ASD can significantly improve the outcome of interventions and therapies.So, it is important to identify the disorder early. Machine learning algorithms canhelp in predicting ASD. In this thesis, Support Vector Machine (SVM) and RandomForest (RF) are the algorithms used to predict ASD. Objectives: The main objective of this thesis is to build and train the models usingmachine learning(ML) algorithms with the default parameters and with the hyper-parameter tuning and find out the most accurate model based on the comparison oftwo experiments to predict whether a person is suffering from ASD or not. Methods: Experimentation is the method chosen to answer the research questions.Experimentation helped in finding out the most accurate model to predict ASD. Ex-perimentation is followed by data preparation with splitting of data and by applyingfeature selection to the dataset. After the experimentation followed by two exper-iments, the models were trained to find the performance metrics with the defaultparameters, and the models were trained to find the performance with the hyper-parameter tuning. Based on the comparison, the most accurate model was appliedto predict ASD. Results: In this thesis, we have chosen two algorithms SVM and RF algorithms totrain the models. Upon experimentation and training of the models using algorithmswith hyperparameter tuning. SVM obtained the highest accuracy score and f1 scoresfor test data are 96% and 97% compared to other model RF which helps in predictingASD. Conclusions: The models were trained using two ML algorithms SVM and RF andconducted two experiments, in experiment-1 the models were trained using defaultparameters and obtained accuracy, f1 scores for the test data, and in experiment-2the models were trained using hyper-parameter tuning and obtained the performancemetrics such as accuracy and f1 score for the test data. By comparing the perfor-mance metrics, we came to the conclusion that SVM is the most accurate algorithmfor predicting ASD. Autism Spectrum Disorder(ASD) Classification Data pre-processing Feature selection Machine learning algorithms Random Forest Classifier Support Vector Classifier. Computer Engineering Datorteknik Computer Sciences Datavetenskap (datalogi)
15	Tracking domain knowledge based on segmented textual sources Kalledat, Tobias 11 May 2009 (has links) Die hier vorliegende Forschungsarbeit hat zum Ziel, Erkenntnisse über den Einfluss der Vorverarbeitung auf die Ergebnisse der Wissensgenerierung zu gewinnen und konkrete Handlungsempfehlungen für die geeignete Vorverarbeitung von Textkorpora in Text Data Mining (TDM) Vorhaben zu geben. Der Fokus liegt dabei auf der Extraktion und der Verfolgung von Konzepten innerhalb bestimmter Wissensdomänen mit Hilfe eines methodischen Ansatzes, der auf der waagerechten und senkrechten Segmentierung von Korpora basiert. Ergebnis sind zeitlich segmentierte Teilkorpora, welche die Persistenzeigenschaft der enthaltenen Terme widerspiegeln. Innerhalb jedes zeitlich segmentierten Teilkorpus können jeweils Cluster von Termen gebildet werden, wobei eines diejenigen Terme enthält, die bezogen auf das Gesamtkorpus nicht persistent sind und das andere Cluster diejenigen, die in allen zeitlichen Segmenten vorkommen. Auf Grundlage einfacher Häufigkeitsmaße kann gezeigt werden, dass allein die statistische Qualität eines einzelnen Korpus es erlaubt, die Vorverarbeitungsqualität zu messen. Vergleichskorpora sind nicht notwendig. Die Zeitreihen der Häufigkeitsmaße zeigen signifikante negative Korrelationen zwischen dem Cluster von Termen, die permanent auftreten, und demjenigen das die Terme enthält, die nicht persistent in allen zeitlichen Segmenten des Korpus vorkommen. Dies trifft ausschließlich auf das optimal vorverarbeitete Korpus zu und findet sich nicht in den anderen Test Sets, deren Vorverarbeitungsqualität gering war. Werden die häufigsten Terme unter Verwendung domänenspezifischer Taxonomien zu Konzepten gruppiert, zeigt sich eine signifikante negative Korrelation zwischen der Anzahl unterschiedlicher Terme pro Zeitsegment und den einer Taxonomie zugeordneten Termen. Dies trifft wiederum nur für das Korpus mit hoher Vorverarbeitungsqualität zu. Eine semantische Analyse auf einem mit Hilfe einer Schwellenwert basierenden TDM Methode aufbereiteten Datenbestand ergab signifikant unterschiedliche Resultate an generiertem Wissen, abhängig von der Qualität der Datenvorverarbeitung. Mit den in dieser Forschungsarbeit vorgestellten Methoden und Maßzahlen ist sowohl die Qualität der verwendeten Quellkorpora, als auch die Qualität der angewandten Taxonomien messbar. Basierend auf diesen Erkenntnissen werden Indikatoren für die Messung und Bewertung von Korpora und Taxonomien entwickelt sowie Empfehlungen für eine dem Ziel des nachfolgenden Analyseprozesses adäquate Vorverarbeitung gegeben. / The research work available here has the goal of analysing the influence of pre-processing on the results of the generation of knowledge and of giving concrete recommendations for action for suitable pre-processing of text corpora in TDM. The research introduced here focuses on the extraction and tracking of concepts within certain knowledge domains using an approach of horizontally (timeline) and vertically (persistence of terms) segmenting of corpora. The result is a set of segmented corpora according to the timeline. Within each timeline segment clusters of concepts can be built according to their persistence quality in relation to each single time-based corpus segment and to the whole corpus. Based on a simple frequency measure it can be shown that only the statistical quality of a single corpus allows measuring the pre-processing quality. It is not necessary to use comparison corpora. The time series of the frequency measure have significant negative correlations between the two clusters of concepts that occur permanently and others that vary within an optimal pre-processed corpus. This was found to be the opposite in every other test set that was pre-processed with lower quality. The most frequent terms were grouped into concepts by the use of domain-specific taxonomies. A significant negative correlation was found between the time series of different terms per yearly corpus segments and the terms assigned to taxonomy for corpora with high quality level of pre-processing. A semantic analysis based on a simple TDM method with significant frequency threshold measures resulted in significant different knowledge extracted from corpora with different qualities of pre-processing. With measures introduced in this research it is possible to measure the quality of applied taxonomy. Rules for the measuring of corpus as well as taxonomy quality were derived from these results and advice suggested for the appropriate level of pre-processing. Datenvorverarbeitung Text Data Mining Korpuskennzahlen Korpuslinguistik Computerlinguistik Vorverarbeitungsqualität Wissensextraktion Text Data Mining Corpus Measures Corpus Linguistics Computational Linguistics Data Pre-processing Pre-processing Quality Knowledge Extraction 330 Wirtschaft 17 Wirtschaft QP 345 ddc:330
16	Task Load Modelling for LTE Baseband Signal Processing with Artificial Neural Network Approach Wang, Lu January 2014 (has links) This thesis gives a research on developing an automatic or guided-automatic tool to predict the hardware (HW) resource occupation, namely task load, with respect to the software (SW) application algorithm parameters in an LTE base station. For the signal processing in an LTE base station it is important to get knowledge of how many HW resources will be used when applying a SW algorithm on a specic platform. The information is valuable for one to know the system and platform better, which can facilitate a reasonable use of the available resources. The process of developing the tool is considered to be the process of building a mathematical model between HW task load and SW parameters, where the process is dened as function approximation. According to the universal approximation theorem, the problem can be solved by an intelligent method called articial neural networks (ANNs). The theorem indicates that any function can be approximated with a two-layered neural network as long as the activation function and number of hidden neurons are proper. The thesis documents a work ow on building the model with the ANN method, as well as some research on data subset selection with mathematical methods, such as Partial Correlation and Sequential Searching as a data pre-processing step for the ANN approach. In order to make the data selection method suitable for ANNs, a modication has been made on Sequential Searching method, which gives a better result. The results show that it is possible to develop such a guided-automatic tool for prediction purposes in LTE baseband signal processing under specic precision constraints. Compared to other approaches, this model tool with intelligent approach has a higher precision level and a better adaptivity, meaning that it can be used in any part of the platform even though the transmission channels are dierent. / Denna avhandling utvecklar ett automatiskt eller ett guidat automatiskt verktyg for att forutsaga behov av hardvaruresurser, ocksa kallat uppgiftsbelastning, med avseende pa programvarans algoritmparametrar i en LTE basstation. I signalbehandling i en LTE basstation, ar det viktigt att fa kunskap om hur mycket av hardvarans resurser som kommer att tas i bruk nar en programvara ska koras pa en viss plattform. Informationen ar vardefull for nagon att forsta systemet och plattformen battre, vilket kan mojliggora en rimlig anvandning av tillgangliga resurser. Processen att utveckla verktyget anses vara processen att bygga en matematisk modell mellan hardvarans belastning och programvaruparametrarna, dar processen denieras som approximation av en funktion. Enligt den universella approximationssatsen, kan problemet losas genom en intelligent metod som kallas articiella neuronnat (ANN). Satsen visar att en godtycklig funktion kan approximeras med ett tva-skiktS neuralt natverk sa lange aktiveringsfunktionen och antalet dolda neuroner ar korrekt. Avhandlingen dokumenterar ett arbets- ode for att bygga modellen med ANN-metoden, samt studerar matematiska metoder for val av delmangder av data, sasom Partiell korrelation och sekventiell sokning som dataforbehandlingssteg for ANN. For att gora valet av uppgifter som lampar sig for ANN har en andring gjorts i den sekventiella sokmetoden, som ger battre resultat. Resultaten visar att det ar mojligt att utveckla ett sadant guidat automatiskt verktyg for prediktionsandamal i LTE basbandssignalbehandling under specika precisions begransningar. Jamfort med andra metoder, har dessa modellverktyg med intelligent tillvagagangssatt en hogre precisionsniva och battre adaptivitet, vilket innebar att den kan anvandas i godtycklig del av plattformen aven om overforingskanalerna ar olika. Automatic tool Signal Processing Function Approximation Prediction ANNs Data Pre-processing Task Load Prediction Automatiskt verktyg signalbehandling Funktionsanpassning Prediktion Articiella Neuronnat Dataforbehandling Annan elektroteknik och elektronik
17	Machine Learning Based Prediction and Classification for Uplift Modeling / Maskininlärningsbaserad prediktion och klassificering för inkrementell responsanalys Börthas, Lovisa, Krange Sjölander, Jessica January 2020 (has links) The desire to model the true gain from targeting an individual in marketing purposes has lead to the common use of uplift modeling. Uplift modeling requires the existence of a treatment group as well as a control group and the objective hence becomes estimating the difference between the success probabilities in the two groups. Efficient methods for estimating the probabilities in uplift models are statistical machine learning methods. In this project the different uplift modeling approaches Subtraction of Two Models, Modeling Uplift Directly and the Class Variable Transformation are investigated. The statistical machine learning methods applied are Random Forests and Neural Networks along with the standard method Logistic Regression. The data is collected from a well established retail company and the purpose of the project is thus to investigate which uplift modeling approach and statistical machine learning method that yields in the best performance given the data used in this project. The variable selection step was shown to be a crucial component in the modeling processes as so was the amount of control data in each data set. For the uplift to be successful, the method of choice should be either the Modeling Uplift Directly using Random Forests, or the Class Variable Transformation using Logistic Regression. Neural network - based approaches are sensitive to uneven class distributions and is hence not able to obtain stable models given the data used in this project. Furthermore, the Subtraction of Two Models did not perform well due to the fact that each model tended to focus too much on modeling the class in both data sets separately instead of modeling the difference between the class probabilities. The conclusion is hence to use an approach that models the uplift directly, and also to use a great amount of control data in each data set. / Behovet av att kunna modellera den verkliga vinsten av riktad marknadsföring har lett till den idag vanligt förekommande metoden inkrementell responsanalys. För att kunna utföra denna typ av metod krävs förekomsten av en existerande testgrupp samt kontrollgrupp och målet är således att beräkna differensen mellan de positiva utfallen i de två grupperna. Sannolikheten för de positiva utfallen för de två grupperna kan effektivt estimeras med statistiska maskininlärningsmetoder. De inkrementella responsanalysmetoderna som undersöks i detta projekt är subtraktion av två modeller, att modellera den inkrementella responsen direkt samt en klassvariabeltransformation. De statistiska maskininlärningsmetoderna som tillämpas är random forests och neurala nätverk samt standardmetoden logistisk regression. Datan är samlad från ett väletablerat detaljhandelsföretag och målet är därmed att undersöka vilken inkrementell responsanalysmetod och maskininlärningsmetod som presterar bäst givet datan i detta projekt. De mest avgörande aspekterna för att få ett bra resultat visade sig vara variabelselektionen och mängden kontrolldata i varje dataset. För att få ett lyckat resultat bör valet av maskininlärningsmetod vara random forests vilken används för att modellera den inkrementella responsen direkt, eller logistisk regression tillsammans med en klassvariabeltransformation. Neurala nätverksmetoder är känsliga för ojämna klassfördelningar och klarar därmed inte av att erhålla stabila modeller med den givna datan. Vidare presterade subtraktion av två modeller dåligt på grund av att var modell tenderade att fokusera för mycket på att modellera klassen i båda dataseten separat, istället för att modellera differensen mellan dem. Slutsatsen är således att en metod som modellerar den inkrementella responsen direkt samt en relativt stor kontrollgrupp är att föredra för att få ett stabilt resultat. Uplift Modeling Data Pre-Processing Predictive Modeling Incremental Modeling Random Forests Logistic Regression Neural Networks Ensemble Methods Machine Learning Multi-Layer Perceptron Inkrementell responsanalys databehandling prediktiv modellering random forests logistisk regression neurala nätverk mulit-layer perceptron ensemble metoder maskininlärning Mathematics Matematik

Page generated in 0.0715 seconds