  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Weiterentwicklung virtueller Inbetriebnahme-Modelle: Zustandsüberwachung und Ausbringungsprädiktion auf Basis eines prozessaktuellen virtuellen Maschinenzwillings

Kramer-Pohlkötter, Fabian 17 December 2024 (has links)
Im heutigen Produktionsumfeld erhöhen steigender Wettbewerbsdruck sowie Kundenanforderungen die Notwendigkeit, Terminpläne genau einzuhalten, speziell in der Phase vor dem Start der Produktion (SOP). Aus diesem Grund werden automatisierte Produktionssysteme in verschiedensten Industrien mittels virtueller Inbetriebnahme umfassend getestet, um Fehler bereits vor der realen Inbetriebnahme zu erkennen und somit Verzögerungen während der realen Inbetriebnahme zu vermeiden. Die Inbetriebnahmezeit kann dadurch um bis zu 30 % verkürzt werden. Grundlage der virtuellen Inbetriebnahme ist das virtuelle Inbetriebnahme-Modell. Während der realen Inbetriebnahme werden Geometrien, Schaltpläne und speicherprogrammierbare Steuerungen (SPS) durch Optimierungen verändert. Diese Änderungen fließen aber meist nicht in die Simulation ein, sodass nach erfolgter Inbetriebnahme die Simulation nicht mehr der realen Anlage entspricht. Infolgedessen bleibt das mögliche Potential des virtuellen Modells in der Betriebsphase von Anlagen ungenutzt. Das Ziel der vorliegenden Dissertation ist es, eine Vorgehensweise zu erarbeiten, mit der das virtuelle Inbetriebnahme-Modell zu einem prozessaktuellen virtuellen Maschinenzwilling weiterentwickelt werden kann. Dieser soll während der Betriebsphase dazu dienen, zukünftige Maschinenzustände nachzubilden und somit z. B. Ausbringungsreduktionen vorherzusagen. Dazu wird eine Methode entwickelt, die es ermöglicht, Alterungsphänomene von Teilprozessen im Maschinenzwilling zu simulieren, deren Auswirkungen darzustellen und daraus Erkenntnisse für Produktion und Instandhaltung abzuleiten. Die Kombination von realen und simulierten Erkenntnissen erzeugt einen hybriden Vorhersageansatz als Werkzeug für die Instandhaltung. Anschließend wird die Vorgehensweise zur Weiterentwicklung sowie die Methode zur Simulation von Alterungseffekten an einem realen Anwendungsfall demonstriert und validiert. Dabei handelt es sich um einen Engpassprozess in der Produktion der neuesten Generation von Elektromotoren der BMW AG. Unvorhergesehene Ausbringungsverringerungen werden auf diese Weise minimiert und die Effektivität der Gesamtanlage bleibt erhalten. Die Produktion bleibt damit in der Lage, ihre Produkte zur richtigen Zeit, in der richtigen Qualität und Quantität und zum richtigen Preis an den Kunden auszuliefern. / In today’s production environment, increased competition and growing customer demands have made it ever more important to meet schedules, especially in the phase before the start of production (SOP). For this reason, automated production systems in various industries are thoroughly tested by using virtual commissioning in order to detect and eliminate errors at an early stage – before the real commissioning on site. This helps to avoid delays and can reduce commissioning times by up to 30 %. The basis of virtual commissioning is the virtual model. During real commissioning, geometries, electrical plans (E-plans) and programmable logic controllers (PLC) are optimized and modified. However, these changes are usually not carried over into the simulation, resulting in a model that no longer reflects the real plant. As a consequence, the virtual model is no longer used in the production phase. The aim of this dissertation is to create a general procedure that allows the further development of the virtual commissioning model into a process-actual virtual machine twin. This virtual machine twin will serve during production to emulate future machine states in order to predict, for example, output reductions. To this end, a methodology will be defined to simulate aging effects of critical processes and to determine the impact on the overall process. The results are used to derive recommendations for production and maintenance. The combination of real and simulated outcomes establishes a so-called 'Hybrid Predictive Maintenance System'. Ultimately, the general procedure for the further development and the methodology for simulating aging effects will be demonstrated and validated on a critical process in the electric motor production of BMW AG. As a result, unexpected downtimes are minimized, and the overall equipment effectiveness (OEE) is ensured. This enables the production to deliver products at the right time, in the right quality and quantity, and at the right price.
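As a rough illustration of the idea of predicting output reductions from a simulated aging effect, the following Python sketch degrades the cycle time of a hypothetical bottleneck process over time and reports when the predicted output per shift drops below a maintenance threshold. All names and parameters are invented for illustration; this is not the dissertation's model.

```python
# Minimal sketch (not the dissertation's actual model): a simulated bottleneck
# process whose cycle time drifts as a component ages, used to predict when
# per-shift output falls below a maintenance threshold. All parameters are
# illustrative assumptions.

def predict_output(base_cycle_s: float, wear_per_day: float, days: int,
                   shift_hours: float = 16.0) -> list[tuple[int, float]]:
    """Return (day, parts_per_shift) pairs under a linear wear model."""
    results = []
    for day in range(days):
        cycle_s = base_cycle_s * (1.0 + wear_per_day * day)  # aging effect
        parts = shift_hours * 3600.0 / cycle_s
        results.append((day, parts))
    return results

if __name__ == "__main__":
    target = 1500.0  # assumed minimum parts per shift
    for day, parts in predict_output(base_cycle_s=35.0, wear_per_day=0.002, days=60):
        if parts < target:
            print(f"Day {day}: predicted {parts:.0f} parts/shift -> schedule maintenance")
            break
```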
232

Tillförlitlighet hos Big Social Data : En fallstudie om upplevd problematik kopplat till beslutfattande i en organisationskontext

Rangnitt, Eric, Wiljander, Louise January 2020 (has links)
Den växande globala användningen av sociala medier skapar enorma mängder social data online, kallat för Big Social Data (BSD). Tidigare forskning lyfter problem med att BSD ofta har bristande tillförlitlighet som underlag vid beslutsfattande och att det är starkt kopplat till data- och informationskvalitet. Det finns dock en avsaknad av forskning som redogör för praktikers perspektiv på detta. Därför undersökte denna studie vad som upplevs problematiskt kring transformation av BSD till tillförlitlig information för beslutsfattande i en organisationskontext, samt hur detta skiljer sig i teori jämfört med praktik. En fallstudie gjordes av mjukvaruföretaget SAS Institute (SAS). Datainsamlingen genomfördes via intervjuer samt insamling av dokument och resultaten analyserades kvalitativt. Studien gjorde många intressanta fynd gällande upplevda problem kopplat till transformation av BSD, bl.a. hög risk för partisk data och låg analysmognad, samt flera skillnader mellan teori och praktik. Tidigare forskning gör inte heller skillnad mellan begreppen datakvalitet och informationskvalitet, vilket görs i praktiken. / The growing use of social media generates enormous amounts of online social data, called Big Social Data (BSD). Previous research highlights that BSD often lacks the reliability needed to support decision making, and that this reliability is strongly connected to data quality and information quality. However, there is a lack of research with a focus on practitioners’ perspectives on this matter. To address this gap, this study set out to investigate what is perceived as problematic when transforming BSD into reliable information for decision making in an organisational context, and also how this differs in theory compared with practice. A case study was conducted of the software company SAS Institute (SAS). Data collection was done through interviews and gathering of documents, and the results were analysed qualitatively. The study resulted in many interesting findings regarding perceived problems connected to the transformation of BSD, e.g. a high risk of biased data and low maturity regarding data analysis, as well as several differences between theory and practice. Furthermore, previous research makes no distinction between the terms data quality and information quality, but this distinction is made in practice.
233

透過Spark平台實現大數據分析與建模的比較:以微博為例 / Accomplish Big Data Analytic and Modeling Comparison on Spark: Weibo as an Example

潘宗哲, Pan, Zong Jhe Unknown Date (has links)
資料的快速增長與變化以及分析工具日新月異，增加資料分析的挑戰，本研究希望透過一個完整機器學習流程，提供學術或企業在導入大數據分析時的參考藍圖。我們以Spark作為大數據分析的計算框架，利用MLlib的Spark.ml與Spark.mllib兩個套件建構機器學習模型，解決傳統資料分析時可能會遇到的問題。在資料分析過程中會比較Spark不同分析模組的適用性情境，首先使用本地端叢集進行開發，最後提交至Amazon雲端叢集加快建模與分析的效能。大數據資料分析流程將以微博為實驗範例，並使用香港大學新聞與傳媒研究中心提供的2012年大陸微博資料集，我們採用RDD、Spark SQL與GraphX萃取微博使用者貼文資料的特徵值，並以隨機森林建構預測模型，來預測使用者是否具有官方認證的二元分類。 / The rapid growth and change of data, together with constantly evolving analytics tools, increase the challenge of adopting big data analytics. Through a complete machine learning pipeline, this study provides a reference blueprint for academia and industry to consult when introducing big data analytics services. We propose Apache Spark as the big data computing framework and build machine learning models with the two MLlib packages, Spark.ml and Spark.mllib, to address problems encountered in traditional data analysis. Within this pipeline we compare which Spark modules are suitable for which analysis scenarios; development is first done on a local cluster, and the jobs are then submitted to AWS EC2 clusters to accelerate modelling and analysis. The proposed blueprint is demonstrated on the 2012 Weibo dataset provided by the Journalism and Media Studies Centre of the University of Hong Kong. We use RDDs, Spark SQL and GraphX to extract features from a large volume of Weibo users’ posts, and a Random Forest model is constructed to predict the binary classification of whether a user is officially verified.
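As a hedged sketch of the kind of spark.ml pipeline the abstract describes (feature assembly followed by a Random Forest classifier predicting whether a Weibo user is officially verified), the following PySpark snippet uses an invented feature schema; the column names and toy rows are assumptions, not the thesis's actual data.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("weibo-verified-rf").getOrCreate()

# Assumed feature table produced upstream (e.g. with Spark SQL / GraphX);
# the schema and toy rows are purely illustrative.
df = spark.createDataFrame(
    [
        (120, 35000, 45.0, 0.80, 1.0),
        (85, 12000, 20.0, 0.55, 1.0),
        (3, 20, 1.5, 0.05, 0.0),
        (10, 150, 0.5, 0.10, 0.0),
    ],
    ["post_count", "follower_count", "avg_reposts", "graph_centrality", "label"],
)

assembler = VectorAssembler(
    inputCols=["post_count", "follower_count", "avg_reposts", "graph_centrality"],
    outputCol="features",
)
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
model = Pipeline(stages=[assembler, rf]).fit(df)

# With real data the evaluation would use a held-out test split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(df))
print(f"AUC on the toy data: {auc:.3f}")
```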
234

Security Analytics: Using Deep Learning to Detect Cyber Attacks

Lambert, Glenn M, II 01 January 2017 (has links)
Security attacks are becoming more prevalent as cyber attackers exploit system vulnerabilities for financial gain. The resulting loss of revenue and reputation can have deleterious effects on governments and businesses alike. Signature recognition and anomaly detection are the most common security detection techniques in use today. These techniques provide a strong defense. However, they fall short of detecting complicated or sophisticated attacks. Recent literature suggests using security analytics to differentiate between normal and malicious user activities. The goal of this research is to develop a repeatable process to detect cyber attacks that is fast, accurate, comprehensive, and scalable. A model was developed and evaluated using several production log files provided by the University of North Florida Information Technology Security department. This model uses security analytics to complement existing security controls to detect suspicious user activity occurring in real time by applying machine learning algorithms to multiple heterogeneous server-side log files. The process is linearly scalable and comprehensive; as such it can be applied to any enterprise environment. The process is composed of three steps. The first step is data collection and transformation, which involves identifying the source log files and selecting a feature set from those files. The resulting feature set is then transformed into a time series dataset using a sliding time window representation. Each instance of the dataset is labeled as green, yellow, or red using three different unsupervised learning methods, one of which is Partitioning Around Medoids (PAM). The final step uses Deep Learning to train and evaluate the model that will be used for detecting abnormal or suspicious activities. Experiments using datasets of varying sizes and time granularities resulted in very high accuracy and performance. The time required to train and test the model was surprisingly short, even for large datasets. This is the first research paper that develops a model to detect cyber attacks using security analytics; hence it builds a foundation on which future research in this subject area can expand.
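Two steps of the described process, sliding-window feature extraction from log events and unsupervised labeling of each window into one of three states, can be sketched as follows. KMeans stands in here for Partitioning Around Medoids (PAM), which is not part of core scikit-learn, and the log schema is an invented example.

```python
# Illustrative sketch only: aggregate raw log events into trailing time windows,
# then label each window by clustering into three states (green/yellow/red).
# Column names, window sizes, and the use of KMeans instead of PAM are assumptions.
import pandas as pd
from sklearn.cluster import KMeans

# Assumed raw server-side log events.
events = pd.DataFrame({
    "timestamp": pd.date_range("2017-01-01", periods=600, freq="s"),
    "failed_login": ([0] * 580) + ([1] * 20),   # burst of failures at the end
    "bytes_sent": [1200] * 600,
})

# Sliding-window feature extraction: trailing 60 s windows, sampled every 30 s.
windows = (events.set_index("timestamp")
                 .rolling("60s")
                 .sum()
                 .resample("30s")
                 .last()
                 .dropna())

# Unsupervised labeling of each window (0/1/2 ~ green/yellow/red).
windows["state"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(windows)
print(windows.tail())
```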
235

Reliable Information Exchange in IIoT : Investigation into the Role of Data and Data-Driven Modelling

Lavassani, Mehrzad January 2018 (has links)
The concept of Industrial Internet of Things (IIoT) is the tangible building block for the realisation of the fourth industrial revolution. It should improve productivity, efficiency and reliability of industrial automation systems, leading to revenue growth in industrial scenarios. IIoT needs to encompass various disciplines and technologies to constitute an operable and harmonious system. One essential requirement for a system to exhibit such behaviour is reliable exchange of information. In industrial automation, the information life-cycle starts at the field level, with data collected by sensors, and ends at the enterprise level, where that data is processed into knowledge for business decision making. In IIoT, the process of knowledge discovery is expected to start in the lower layers of the automation hierarchy, and to cover the data exchange between the connected smart objects to perform collaborative tasks. This thesis aims to assist the comprehension of the processes for information exchange in IIoT-enabled industrial automation: in particular, how reliable exchange of information can be performed by communication systems at field level given an underlying wireless sensor technology, and how data analytics can complement the processes of various levels of the automation hierarchy. Furthermore, this work explores how an IIoT monitoring system can be designed and developed. The communication reliability is addressed by proposing a redundancy-based medium access control protocol for mission-critical applications, and analysing its performance regarding real-time and deterministic delivery. The importance of the data and the benefits of data analytics for various levels of the automation hierarchy are examined by suggesting data-driven methods for visualisation, centralised system modelling and distributed data stream modelling. The design and development of an IIoT monitoring system are addressed by proposing a novel three-layer framework that incorporates wireless sensor, fog, and cloud technologies. Moreover, an IIoT testbed system is developed to realise the proposed framework. The outcome of this study suggests that redundancy-based mechanisms improve communication reliability. However, they can also introduce drawbacks, such as poor link utilisation and limited scalability, in the context of IIoT. Data-driven methods result in enhanced readability of visualisations and a reduced need for ground truth in system modelling. The results illustrate that distributed modelling can lower the negative effect of the redundancy-based mechanisms on link utilisation, by reducing the up-link traffic. Mathematical analysis reveals that introducing a fog layer in the IIoT framework removes the single point of failure and enhances scalability, while meeting the latency requirements of the monitoring application. Finally, the experiment results show that the IIoT testbed works adequately and can serve for the future development and deployment of IIoT applications. / SMART (Smarta system och tjänster för ett effektivt och innovativt samhälle)
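The reliability/utilisation trade-off of redundancy-based mechanisms mentioned above can be made concrete with a back-of-the-envelope calculation: under an assumed independent per-copy loss probability, extra copies raise the delivery probability but consume proportionally more of the shared medium. The numbers below are illustrative only and are not taken from the thesis.

```python
# Toy trade-off calculation: sending k redundant copies of each sample gives a
# delivery probability of 1 - p**k (assuming independent losses with probability p),
# while multiplying the up-link traffic and hence the share of the medium used.

def delivery_probability(loss_p: float, copies: int) -> float:
    return 1.0 - loss_p ** copies

def channel_share_used(copies: int, nodes: int, per_copy_airtime: float) -> float:
    """Fraction of the shared medium consumed per reporting interval."""
    return copies * nodes * per_copy_airtime

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        rel = delivery_probability(loss_p=0.1, copies=k)
        used = channel_share_used(copies=k, nodes=20, per_copy_airtime=0.01)
        print(f"copies={k}: delivery={rel:.4f}, medium utilisation={used:.0%}")
```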
236

Properties of binary oxides: a DFT study

Miroshnichenko, O. (Olga) 14 June 2019 (has links)
Abstract Titanium dioxide nanoparticles are used in an enormous number of applications. Their properties differ from those of bulk TiO₂ and are affected by adsorbates that are unavoidably present on the surface. In this thesis, the effect of OH and SO₄ groups (the adsorbates present on the surface during manufacturing) on the properties of anatase-structured TiO₂ nanoparticles is studied. It was found that the above-mentioned groups change both the geometric and electronic structure of nanoparticles, resulting in changes in the photoabsorption spectrum. Bader charges are calculated using the electron density from Density Functional Theory calculations. They can be used to determine the oxidation state of an atom. The relation between computed partial charges and oxidation states of binary oxides is demonstrated in this work by linear regression on data from an open materials database. The applicability of oxidation-state determination by Bader charges to mixed-valence compounds and surfaces is also considered. / Tiivistelmä Titaanidioksidinanopartikkeleita käytetään lukuisissa sovelluksissa. Niiden ominaisuudet poikkeavat kiinteän TiO₂:n ominaisuuksista, ja niihin vaikuttavat pinnalle väistämättä absorboituvat aineet. Tässä työssä on tutkittu OH- ja SO₄-ryhmien vaikutusta anataasirakenteisten TiO₂-nanopartikkelien ominaisuuksiin. Tällaisia ryhmiä esiintyy yleisesti nanopartikkelien pinnalla valmistusprosessien aikana. Työssä havaittiin, että nämä ryhmät muuttavat nanopartikkelien rakenteellisia ja sähköisiä ominaisuuksia, ja siten vaikuttavat myös fotoabsorptiospektriin. Baderin varaukset voidaan laskea käyttäen tiheysfunktionaaliteoriaan perustuvista laskuista saatavaa elektronitiheyttä. Niitä voidaan käyttää atomin hapetustilan laskemiseen. Tässä työssä on osoitettu, että binääristen oksidien tapauksessa laskettujen osittaisvarauksien ja hapetustilan välillä on yhteys. Tämä yhteys voitiin osoittaa käyttämällä lineaarista regressiota. Työssä tarkastellaan myös menetelmän soveltuvuutta hapetustilojen määrittämiseen sekavalenssiyhdisteille ja pinnoille. / Original papers Original publications are not included in the electronic version of the dissertation. Miroshnichenko O., Auvinen S., & Alatalo M. (2015). A DFT study of the effect of OH groups on the optical, electronic, and structural properties of TiO₂ nanoparticles. Phys. Chem. Chem. Phys., 17, 5321–5327. https://doi.org/10.1039/c4cp02789b Miroshnichenko O., Posysaev S., & Alatalo M. (2016). A DFT study of the effect of SO4 groups on the properties of TiO₂ nanoparticles. Phys. Chem. Chem. Phys., 18, 33068–33076. https://doi.org/10.1039/c6cp05681d http://jultika.oulu.fi/Record/nbnfi-fe201707037608 Posysaev S., Miroshnichenko O., Alatalo M., Le D., & Rahman T.S. (2019). Oxidation states of binary oxides from data analytics of the electronic structure. Comput. Mater. Sci., 161, 403–414. https://doi.org/10.1016/j.commatsci.2019.01.046
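The mapping from computed Bader partial charges to oxidation states can be illustrated with a toy linear regression; the charge values below are placeholders rather than data from the thesis or from the materials database it uses.

```python
# Toy illustration of the relation described above: a linear regression from
# Bader partial charges to formal oxidation states of the metal atom in binary
# oxides. The numbers are invented placeholders.
import numpy as np

bader_charge = np.array([0.9, 1.2, 1.7, 2.1, 2.6])   # e, illustrative values
oxidation_state = np.array([1, 2, 3, 4, 6])           # known formal states

slope, intercept = np.polyfit(bader_charge, oxidation_state, deg=1)
print(f"oxidation_state ~ {slope:.2f} * charge + {intercept:.2f}")

# Predict the oxidation state for a new compound from its Bader charge.
new_charge = 1.9
print(f"Bader charge {new_charge:.2f} e -> predicted oxidation state "
      f"{slope * new_charge + intercept:.1f}")
```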
237

Feedback-Driven Data Clustering

Hahmann, Martin 28 February 2014 (has links) (PDF)
The acquisition of data and its analysis has become a common yet critical task in many areas of modern economy and research. Unfortunately, the ever-increasing scale of datasets has long outgrown the capacities and abilities humans can muster to extract information from them and gain new knowledge. For this reason, research areas like data mining and knowledge discovery steadily gain importance. The algorithms they provide for the extraction of knowledge are mandatory prerequisites that enable people to analyze large amounts of information. Among the approaches offered by these areas, clustering is one of the most fundamental. By finding groups of similar objects inside the data, it aims to identify meaningful structures that constitute new knowledge. Clustering results are also often used as input for other analysis techniques like classification or forecasting. As clustering extracts new and unknown knowledge, it obviously has no access to any form of ground truth. For this reason, clustering results have a hypothetical character and must be interpreted with respect to the application domain. This makes clustering very challenging and leads to an extensive and diverse landscape of available algorithms. Most of these are expert tools that are tailored to a single narrowly defined application scenario. Over the years, this specialization has become a major trend that arose to counter the inherent uncertainty of clustering by including as much domain specifics as possible into algorithms. While customized methods often improve result quality, they become more and more complicated to handle and lose versatility. This creates a dilemma especially for amateur users whose numbers are increasing as clustering is applied in more and more domains. While an abundance of tools is offered, guidance is severely lacking and users are left alone with critical tasks like algorithm selection, parameter configuration and the interpretation and adjustment of results. This thesis aims to solve this dilemma by structuring and integrating the necessary steps of clustering into a guided and feedback-driven process. In doing so, users are provided with a default modus operandi for the application of clustering. Two main components constitute the core of said process: the algorithm management and the visual-interactive interface. Algorithm management handles all aspects of actual clustering creation and the involved methods. It employs a modular approach for algorithm description that allows users to understand, design, and compare clustering techniques with the help of building blocks. In addition, algorithm management offers facilities for the integration of multiple clusterings of the same dataset into an improved solution. New approaches based on ensemble clustering not only allow the utilization of different clustering techniques, but also ease their application by acting as an abstraction layer that unifies individual parameters. Finally, this component provides a multi-level interface that structures all available control options and provides the docking points for user interaction. The visual-interactive interface supports users during result interpretation and adjustment. For this, the defining characteristics of a clustering are communicated via a hybrid visualization. 
In contrast to traditional data-driven visualizations that tend to become overloaded and unusable with increasing volume/dimensionality of data, this novel approach communicates the abstract aspects of cluster composition and relations between clusters. This aspect orientation allows the use of easy-to-understand visual components and makes the visualization immune to scale-related effects of the underlying data. This visual communication is attuned to a compact and universally valid set of high-level feedback that allows the modification of clustering results. Instead of technical parameters that indirectly cause changes in the whole clustering by influencing its creation process, users can employ simple commands like merge or split to directly adjust clusters. The orchestrated cooperation of these two main components creates a modus operandi, in which clusterings are no longer created and discarded as a whole until a satisfying result is obtained. Instead, users apply the feedback-driven process to iteratively refine an initial solution. Performance and usability of the proposed approach were evaluated with a user study. Its results show that the feedback-driven process enabled amateur users to easily create satisfying clustering results even from different and suboptimal starting situations.
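The merge and split feedback commands mentioned above can be reduced to a small sketch: a clustering is a label array, merge relabels one cluster into another, and split re-clusters the members of one cluster. This is an illustrative simplification, not the thesis's implementation.

```python
# Minimal sketch of two high-level feedback operations on a clustering:
# 'merge' joins two clusters, 'split' divides one by re-clustering its members.
import numpy as np
from sklearn.cluster import KMeans

def merge(labels: np.ndarray, a: int, b: int) -> np.ndarray:
    out = labels.copy()
    out[out == b] = a            # members of cluster b join cluster a
    return out

def split(labels: np.ndarray, target: int, data: np.ndarray) -> np.ndarray:
    out = labels.copy()
    members = np.where(out == target)[0]
    sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data[members])
    out[members[sub == 1]] = labels.max() + 1   # one half becomes a new cluster
    return out

if __name__ == "__main__":
    data = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [9.0], [9.2]])
    labels = np.array([0, 0, 0, 0, 0, 1, 1])     # initial, unsatisfying clustering
    labels = split(labels, target=0, data=data)  # user feedback: "split cluster 0"
    print(labels)                                # e.g. [0 0 0 2 2 1 1]
    print(merge(labels, a=labels.max(), b=1))    # user feedback: "merge two clusters"
```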
238

Desafios e oportunidades para a Fundação Seade: sua transformação e adaptação ao complexo e dinâmico ambiente das estatísticas oficiais

Leonardo, Fabrizio Clares, Calais, Gilson de Oliveira Silva, Coppede Junior, Wagner 01 December 2017 (has links)
Elaborado ao longo de 2017, o presente estudo tem por objetivo a análise organizacional da Fundação Sistema Estadual de Análise de Dados (SEADE), entidade de direito público vinculada à Secretaria Estadual de Planejamento e Gestão, responsável pela produção e disseminação de análises e estatísticas socioeconômicas e demográficas do Estado de São Paulo. Frente ao atual contexto de mudanças do setor, devido aos impactos das novas tecnologias e, em especial, aos efeitos do Big Data Analytics, a gestão focada em processos e uma nova estrutura organizacional, elaborada com base em melhores práticas e em modelagem de processos de referência internacional, mostram-se fundamentais para assegurar os investimentos necessários para manter sua capacidade de produzir e disseminar informações estatísticas em alto nível e adequadas às necessidades e expectativas dos usuários. Para conciliar harmonicamente esses objetivos, recomenda-se um conjunto de ações, de caráter estratégico, que, além de objetivar melhor atendimento àquilo que lhe é prioritário, também implique em mais acessos por parte dos usuários, através de sua principal ferramenta de comunicação com o mercado. Tais achados e recomendações baseiam-se em uma revisão do modelo de negócio subscrito na configuração de seus processos operacionais e de apoio, no arranjo organizacional e na forma de sua comunicação com os usuários, além de responder às crescentes demandas por maior eficiência e transparência na gestão dos recursos públicos. / Prepared over the course of 2017, the present study provides an organizational analysis of the Fundação Sistema Estadual de Análise de Dados (SEADE), a public-law entity linked to the State Department of Planning and Management, responsible for producing and disseminating socioeconomic and demographic analyses and statistics for the State of São Paulo. Given the current context of change in the sector, driven by the impacts of new technologies and, in particular, by the effects of Big Data Analytics, process-focused management and a new organizational structure, based on best practices and internationally referenced process models, are essential to secure the investments needed to maintain its capacity to produce and disseminate high-quality statistical information suited to the needs and expectations of its users. To reconcile these objectives harmoniously, a set of strategic actions is recommended which, in addition to better serving the institution's priorities, should also lead to more user access through its main communication channel with the market.
These findings and recommendations are based on a review of the business model embodied in the configuration of its operational and support processes, its organizational arrangement and the way it communicates with users, and they also respond to the growing demands for greater efficiency and transparency in the management of public resources.
239

Interprétation sémantique d'images hyperspectrales basée sur la réduction adaptative de dimensionnalité / Semantic interpretation of hyperspectral images based on the adaptative reduction of dimensionality

Sellami, Akrem 11 December 2017 (has links)
L'imagerie hyperspectrale permet d'acquérir des informations spectrales riches d'une scène dans plusieurs centaines, voire milliers de bandes spectrales étroites et contiguës. Cependant, avec le nombre élevé de bandes spectrales, la forte corrélation inter-bandes spectrales et la redondance de l'information spectro-spatiale, l'interprétation de ces données hyperspectrales massives est l'un des défis majeurs pour la communauté scientifique de la télédétection. Dans ce contexte, le grand défi posé est la réduction du nombre de bandes spectrales inutiles, c'est-à-dire de réduire la redondance et la forte corrélation de bandes spectrales tout en préservant l'information pertinente. Par conséquent, des approches de projection visent à transformer les données hyperspectrales dans un sous-espace réduit en combinant toutes les bandes spectrales originales. En outre, des approches de sélection de bandes tentent à chercher un sous-ensemble de bandes spectrales pertinentes. Dans cette thèse, nous nous intéressons d'abord à la classification d'imagerie hyperspectrale en essayant d'intégrer l'information spectro-spatiale dans la réduction de dimensions pour améliorer la performance de la classification et s'affranchir de la perte de l'information spatiale dans les approches de projection. De ce fait, nous proposons un modèle hybride permettant de préserver l'information spectro-spatiale en exploitant les tenseurs dans l'approche de projection préservant la localité (TLPP) et d'utiliser l'approche de sélection non supervisée de bandes spectrales discriminantes à base de contraintes (CBS). Pour modéliser l'incertitude et l'imperfection entachant ces approches de réduction et les classifieurs, nous proposons une approche évidentielle basée sur la théorie de Dempster-Shafer (DST). Dans un second temps, nous essayons d'étendre le modèle hybride en exploitant des connaissances sémantiques extraites à travers les caractéristiques obtenues par l'approche proposée auparavant TLPP pour enrichir la sélection non supervisée CBS. En effet, l'approche proposée permet de sélectionner des bandes spectrales pertinentes qui sont à la fois informatives, discriminantes, distinctives et peu redondantes. En outre, cette approche sélectionne les bandes discriminantes et distinctives en utilisant la technique de CBS en injectant la sémantique extraite par les techniques d'extraction de connaissances afin de sélectionner d'une manière automatique et adaptative le sous-ensemble optimal de bandes spectrales pertinentes. La performance de notre approche est évaluée en utilisant plusieurs jeux des données hyperspectrales réelles. / Hyperspectral imagery allows to acquire a rich spectral information of a scene in several hundred or even thousands of narrow and contiguous spectral bands. However, with the high number of spectral bands, the strong inter-bands spectral correlation and the redundancy of spectro-spatial information, the interpretation of these massive hyperspectral data is one of the major challenges for the remote sensing scientific community. In this context, the major challenge is to reduce the number of unnecessary spectral bands, that is, to reduce the redundancy and high correlation of spectral bands while preserving the relevant information. Therefore, projection approaches aim to transform the hyperspectral data into a reduced subspace by combining all original spectral bands. In addition, band selection approaches attempt to find a subset of relevant spectral bands. 
In this thesis, we first focus on hyperspectral image classification, attempting to integrate the spectro-spatial information into dimension reduction in order to improve the classification performance and to overcome the loss of spatial information in projection approaches. Therefore, we propose a hybrid model that preserves the spectro-spatial information by exploiting the tensor model in the locality preserving projection approach (TLPP) and uses constraint-based band selection (CBS) as an unsupervised approach to select the discriminant spectral bands. To model the uncertainty and imperfection of these reduction approaches and classifiers, we propose an evidential approach based on the Dempster-Shafer Theory (DST). In the second step, we try to extend the hybrid model by exploiting the semantic knowledge extracted from the features obtained by the previously proposed TLPP approach to enrich the CBS technique. Indeed, the proposed approach makes it possible to select relevant spectral bands that are at once informative, discriminant, distinctive and only slightly redundant. In fact, this approach selects the discriminant and distinctive spectral bands using the CBS technique, injecting the rules obtained with knowledge extraction techniques in order to select the optimal subset of relevant spectral bands automatically and adaptively. The performance of our approach is evaluated using several real hyperspectral datasets.
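The evidential fusion step based on Dempster-Shafer theory can be illustrated with a minimal implementation of Dempster's rule of combination; the frame of discernment and the two mass functions below are invented for illustration and are not results from the thesis.

```python
# Hedged sketch of Dempster's rule of combination, the core operation of the
# Dempster-Shafer theory (DST) used here to fuse uncertain classifier outputs.
from itertools import product

def combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Dempster's rule: normalised conjunctive combination of two mass functions."""
    raw: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb          # mass assigned to contradictory hypotheses
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

# Frame of discernment: two hypothetical land-cover classes for a pixel.
VEG, WATER = frozenset({"veg"}), frozenset({"water"})
THETA = VEG | WATER                       # total ignorance

m_classifier1 = {VEG: 0.6, WATER: 0.1, THETA: 0.3}   # evidence from classifier 1
m_classifier2 = {VEG: 0.5, WATER: 0.2, THETA: 0.3}   # evidence from classifier 2

for subset, mass in combine(m_classifier1, m_classifier2).items():
    print(sorted(subset), round(mass, 3))
```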
240

Um framework de testes unitários para procedimentos de carga em ambientes de business intelligence

Santos, Igor Peterson Oliveira 30 August 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / Business Intelligence (BI) relies on Data Warehouse (DW), a historical data repository designed to support the decision making process. Despite the potential benefits of a DW, data quality issues prevent users from realizing the benefits of a BI environment and Data Analytics. Problems related to data quality can arise in any stage of the ETL (Extract, Transform and Load) process, especially in the loading phase. This thesis presents an approach to automate the selection and execution of previously identified test cases for loading procedures in BI and Data Analytics environments based on a DW. To verify and validate the approach, a unit test framework was developed. The overall goal is to achieve data quality improvement. The specific aim is to reduce test effort and, consequently, to promote testing activities in the DW process. The experimental evaluation was performed through two controlled experiments in industry. The first one was carried out to investigate the adequacy of the proposed method for DW procedure development. The second one was carried out to compare the proposed method against a generic framework for DW procedure development. Both results showed that our approach clearly reduces test effort and coding errors during the testing phase in decision support environments. / A qualidade de um produto de software está diretamente relacionada com os testes empregados durante o seu desenvolvimento. Embora os processos de testes para softwares aplicativos e sistemas transacionais já apresentem um alto grau de maturidade, estes devem ser investigados para os processos de testes em um ambiente de Business Intelligence (BI) e Data Analytics. As diferenças deste ambiente em relação aos demais tipos de sistemas fazem com que os processos e ferramentas de testes existentes precisem ser ajustados a uma nova realidade. Neste contexto, grande parte das aplicações de Business Intelligence (BI) efetivas depende de um Data Warehouse (DW), um repositório histórico de dados projetado para dar suporte a processos de tomada de decisão. São as cargas de dados para o DW que merecem atenção especial relativa aos testes, por englobar procedimentos críticos em relação à qualidade. Este trabalho propõe uma abordagem de testes, baseada em um framework de testes unitários, para procedimentos de carga em um ambiente de BI e Data Analytics. O framework proposto, com base em metadados sobre as rotinas de carga, realiza a execução automática de casos de testes, por meio da geração de estados iniciais e a análise dos estados finais, bem como seleciona os casos de testes a serem aplicados. O objetivo é melhorar a qualidade dos procedimentos de carga de dados e reduzir o tempo empregado no processo de testes. A avaliação experimental foi realizada através de dois experimentos controlados executados na indústria. O primeiro avaliou a utilização de casos de testes para as rotinas de carga, comparando a efetividade do framework com uma abordagem manual. O segundo experimento efetuou uma comparação com um framework genérico e similar do mercado. Os resultados indicaram que o framework pode contribuir para o aumento da produtividade e redução dos erros de codificação durante a fase de testes em ambientes de suporte à decisão.
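A minimal example of the kind of unit test such a framework automates, assuming a hypothetical staging-to-dimension load procedure and an in-memory SQLite database, might look as follows; the schema and the load procedure are illustrative stand-ins, not the framework's actual API.

```python
# Illustrative sketch: set up a known initial state, run the loading procedure,
# and assert the expected final state of the target table. The schema and the
# load_customers_dim procedure are hypothetical.
import sqlite3

def load_customers_dim(conn: sqlite3.Connection) -> None:
    """Toy ETL step: copy distinct, non-null customers from staging into the DW."""
    conn.execute("""
        INSERT INTO dim_customer (customer_id, name)
        SELECT DISTINCT id, UPPER(name) FROM stg_customer WHERE name IS NOT NULL
    """)

def test_load_customers_dim():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_customer (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")
    # Initial state: one duplicate and one null name that must be filtered out.
    conn.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                     [(1, "ana"), (1, "ana"), (2, None), (3, "bia")])

    load_customers_dim(conn)   # procedure under test

    rows = conn.execute(
        "SELECT customer_id, name FROM dim_customer ORDER BY customer_id").fetchall()
    assert rows == [(1, "ANA"), (3, "BIA")]

if __name__ == "__main__":
    test_load_customers_dim()
    print("load procedure behaves as expected")
```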
