211

Automated Reasoning Support for Invasive Interactive Parallelization

Moshir Moghaddam, Kianosh. January 2012.
To parallelize a sequential source code, a parallelization strategy must be defined that transforms the sequential source code into an equivalent parallel version. Since parallelizing compilers can sometimes transform sequential loops and other well-structured codes into parallel ones automatically, we are interested in a solution for semi-automatically parallelizing codes that compilers cannot handle automatically, mostly because of the weakness of classical data and control dependence analysis, in order to simplify the transformation process for programmers. Invasive Interactive Parallelization (IIP) hypothesizes that an intelligent system guiding the user through an interactive process can boost parallelization in this direction. The intelligent system's guidance relies on classical code analysis and pre-defined parallelizing transformation sequences. To support its main hypothesis, IIP encodes parallelizing transformation sequences as IIP parallelization strategies that dictate default ways to parallelize various code patterns, using facts obtained both from classical source-code analysis and directly from the user. In this project, we investigate how automated reasoning can support the IIP method in parallelizing a sequential code with acceptable performance, but faster than manual parallelization. We have looked at two special problem areas: divide-and-conquer algorithms and loops in source codes. Our focus is on parallelizing four sequential legacy C programs, namely quicksort, merge sort, the Jacobi method, and matrix multiplication and summation, for both OpenMP and MPI environments, by developing an interactive parallelization assistance tool that provides users with the assistance needed to parallelize a sequential source code.
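The thesis's case studies are C programs parallelized for OpenMP and MPI; as a language-neutral sketch of the divide-and-conquer pattern the tool targets (our illustration, not the thesis's code), the following Python fragment forks the two halves of a merge sort onto separate worker processes. In the thesis's C setting, an OpenMP task construct or an MPI scatter/gather would express the same fork/join structure.

```python
# Hypothetical illustration (not the thesis's code): a one-level fork/join
# parallelization of merge sort, the divide-and-conquer pattern the IIP tool
# targets. In the thesis's C setting an OpenMP task construct plays this role.
from concurrent.futures import ProcessPoolExecutor
import random

def merge(left, right):
    # Sequential merge of two sorted lists.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(xs):
    # Plain sequential divide and conquer.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

def parallel_merge_sort(xs, workers=2):
    # Fork: sort the two halves in separate processes. Join: merge the results.
    mid = len(xs) // 2
    with ProcessPoolExecutor(max_workers=workers) as pool:
        left, right = pool.map(merge_sort, [xs[:mid], xs[mid:]])
    return merge(left, right)

if __name__ == "__main__":
    data = random.sample(range(100_000), 1_000)
    assert parallel_merge_sort(data) == sorted(data)
```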
212

Handling Missing Values in Predictive Models: A Study of the Order of Data Acquisition

黃秋芸 (Huang, Chiu Yun). Unknown date.
Business intelligence is developing rapidly, and predictive models play a key role in many business intelligence tasks. However, when extracting hidden, previously unknown, and potentially useful information from large amounts of data, we often encounter data quality problems that make analysis difficult; missing values are an especially common difficulty in the data pre-processing phase. Handling missing values effectively when building predictive models is therefore an important issue. Much prior work addresses the treatment of missing values; in particular, research on Active Feature-Value Acquisition (AFA) studies the order in which missing feature values of the training data should be acquired. The idea of AFA is to select, from training data with missing values, the appropriate missing values to fill so that the predictive model reaches the desired accuracy most efficiently, reducing acquisition costs by identifying the instances for which complete information is most informative. Following the AFA line of research, we propose a new acquisition-ordering method, I Sampling, in which feature values are selected for acquisition based on the attribute at the top node of the current decision tree. We train and test the method on real data and compare it with approaches from the literature, examining whether the choice and ordering of value acquisition affect a predictive model's classification accuracy, and assessing each method's strengths, weaknesses, and suitability in different situations and data-missing patterns. Experimental results demonstrate that in some situations our approach induces accurate models using substantially fewer feature-value acquisitions than alternative policies. These findings offer guidance for future predictive-modelling work in management and academia: an acquisition method suited to the nature of the data can reduce the cost of obtaining values and improve the model's classification ability.
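A minimal sketch of the I Sampling rule as the abstract describes it: acquire first the missing values of the attribute tested at the top node of the current decision tree. The data, the mean-imputation step, and the helper name are our illustrative assumptions, not the author's code.

```python
# Hedged sketch of the I Sampling rule described above: rank missing cells so
# that values of the attribute tested at the root of the current decision tree
# are acquired first. Data, imputation, and helper names are our assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                   # 4 candidate features
y = (X[:, 2] + 0.2 * X[:, 0] > 0).astype(int)   # feature 2 carries the signal
mask = rng.random(X.shape) < 0.3                # True where a value is missing

def acquisition_order(X, y, mask):
    """Rank (row, column) pairs to acquire: root-split feature first."""
    # Mean-impute the missing cells so a tree can be fitted at all.
    col_means = np.nanmean(np.where(mask, np.nan, X), axis=0)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(np.where(mask, col_means, X), y)
    root_feature = tree.tree_.feature[0]        # attribute tested at the root
    rows, cols = np.nonzero(mask)
    # Stable sort: cells of the root feature first, others keep their order.
    order = np.argsort(cols != root_feature, kind="stable")
    return list(zip(rows[order], cols[order]))

print(acquisition_order(X, y, mask)[:5])        # first cells to acquire
```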
213

An analytical framework for monitoring and optimizing bank branch network efficiency / E.H. Smith

Smith, Eugene Herbie. January 2009.
Financial institutions make use of a variety of delivery channels for servicing their customers. The primary channel for acquiring new customers and increasing market share is the retail branch network. The 1990s saw the Internet explosion, and with it a threat to branches: the relatively low cost of virtual delivery channels made it inevitable that financial institutions would direct their focus towards such new and more cost-efficient technologies. By the beginning of the 21st century, with increasing limitations identified in alternative virtual delivery channels, the financial industry returned to a more balanced view, which may be seen as the revival of branch networks. The main purpose of this study is to provide a roadmap for financial institutions in managing their branch network. A three-step methodology drawing on data mining and management science techniques is used to explain relative branch efficiency. The methodology consists of clustering analysis (CA), data envelopment analysis (DEA) and decision tree induction (DTI). CA is applied to data internal to the financial institution to increase the discriminatory power of DEA. DEA is used to calculate the relative operating efficiencies of branches deemed homogeneous during CA. Finally, DTI is used to interpret the DEA results together with additional data describing the market environment the branch operates in, inquiring into the nature of the relative efficiency of each branch. / Thesis (M.Com. (Computer Science))--North-West University, Potchefstroom Campus, 2010.
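To make the three-step pipeline concrete, here is a hedged sketch on synthetic data: KMeans stands in for the clustering step, and an input-oriented CCR model (one common DEA formulation, solved as a linear program) stands in for the DEA step. The branch data and input/output choices are our assumptions; the thesis's actual variables and DEA model may differ.

```python
# Hedged sketch of the CA -> DEA -> DTI pipeline described above, under an
# assumed input-oriented CCR DEA model and synthetic branch data.
import numpy as np
from scipy.optimize import linprog
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
inputs = rng.uniform(1, 10, size=(30, 2))    # e.g. staff count, floor space
outputs = rng.uniform(1, 10, size=(30, 1))   # e.g. transactions processed

def ccr_efficiency(inputs, outputs, k):
    """Input-oriented CCR efficiency of branch k within its peer group."""
    n, m = inputs.shape
    s = outputs.shape[1]
    # Decision variables: [theta, lambda_1 .. lambda_n]; minimize theta.
    c = np.r_[1.0, np.zeros(n)]
    # sum_j lambda_j x_j <= theta x_k   and   sum_j lambda_j y_j >= y_k
    A_ub = np.vstack([
        np.c_[-inputs[k], inputs.T],         # m rows: input constraints
        np.c_[np.zeros(s), -outputs.T],      # s rows: output constraints
    ])
    b_ub = np.r_[np.zeros(m), -outputs[k]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.x[0]

# Step 1 (CA): cluster branches into homogeneous groups before DEA.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    np.hstack([inputs, outputs]))
# Step 2 (DEA): score each branch only against peers in its own cluster.
for g in range(3):
    idx = np.flatnonzero(groups == g)
    scores = [ccr_efficiency(inputs[idx], outputs[idx], i)
              for i in range(len(idx))]
    print(f"cluster {g}: mean efficiency {np.mean(scores):.2f}")
# Step 3 (DTI) would fit a decision tree on branch and market attributes to
# explain the efficiency scores; omitted here for brevity.
```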
215

Privacy preservation for training datasets in database: application to decision tree learning

Fong, Pui Kuen. 15 December 2008.
Privacy preservation is important for machine learning and data mining, but measures designed to protect private information often result in a trade-off: reduced utility of the training samples. This thesis introduces a privacy-preserving approach that can be applied to decision-tree learning without a concomitant loss of accuracy. It describes an approach to preserving the privacy of collected data samples in cases where information from the sample database has been partially lost. The approach converts the original sample datasets into a group of unreal datasets, from which an original sample cannot be reconstructed without the entire group. The approach does not perform well for sample datasets with low frequency, or when there is low variance in the distribution of all samples; however, this problem can be solved, at the cost of some extra storage, through a modified implementation introduced later in the thesis.
216

Decision Tree Classification Of Multi-temporal Images For Field-based Crop Mapping

Sencan, Secil. 01 August 2004.
M.Sc. thesis, Department of Geodetic and Geographic Information Technologies. Supervisor: Assist. Prof. Dr. Mustafa Türker. August 2004, 125 pages.
A decision tree (DT) classification approach was used to identify summer (August) crop types in an agricultural area near Karacabey (Bursa), Turkey from multi-temporal images. For the analysis, Landsat 7 ETM+ images acquired in May, July, and August 2000 were used. In addition to the original bands, NDVI, PCA, and Tasselled Cap Transformation bands were generated and included in the classification procedure. Initially, the images were classified on a per-pixel basis using the multi-temporal masking technique together with the DT approach. A field-based analysis was then applied to the classified outputs, and the class labels of the fields were entered directly into a Geographical Information System (GIS) database. The results were compared with the classified outputs of the three dates of imagery generated using a traditional maximum likelihood (ML) algorithm. The proposed approach provided significantly higher overall accuracies for the May and July images: the DT approach produced classification accuracies of 91.10% and 66.15%, while the ML classifier produced 84.38% and 63.55%, respectively. In August, the ML (70.82%) and DT (69.14%) approaches obtained nearly the same overall accuracy. It was also observed that the use of additional bands in the proposed technique improved the separability of the sugar beet, tomato, pea, pepper, and rice classes.
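As an illustration of the band-derivation step (not the thesis's code), the sketch below computes NDVI from red and near-infrared reflectances for three dates and appends it to the original bands before training a decision tree; the reflectances and class labels are synthetic stand-ins.

```python
# Illustrative sketch: NDVI = (NIR - red) / (NIR + red) derived per date and
# stacked with the original bands, as in the per-pixel DT stage described
# above. All values here are synthetic, not Landsat 7 ETM+ data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n_pixels = 1000
# Synthetic reflectances for three dates (May, July, August), two bands each.
red = rng.uniform(0.05, 0.4, size=(n_pixels, 3))
nir = rng.uniform(0.1, 0.6, size=(n_pixels, 3))
ndvi = (nir - red) / (nir + red)                # derived band per date
labels = (ndvi.mean(axis=1) > 0.2).astype(int)  # stand-in for crop classes

X = np.hstack([red, nir, ndvi])                 # original + derived bands
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print(f"overall accuracy: {clf.score(X_te, y_te):.2%}")
```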
217

Hourly Load Forecasting: A New Approach Through Decision Trees

Ana Paula Barbosa Sobral. 08 July 2003.
The importance of short-term load forecasting (up to one week ahead) has grown steadily in recent years. Load forecasts are the basis for forecasting energy prices, and privatisation and the introduction of competition in the Brazilian electricity sector have turned price forecasting into an extremely important task. As a consequence of these structural changes, the variability and non-stationarity of electrical loads have tended to increase because of the dynamics of energy prices, and more autonomous forecasting methods are needed for the approaching scenario. The tools available in the international market for load forecasting require a large amount of online information, especially weather data. Since this information is not yet readily available in Brazil, this thesis proposes a short-term load forecaster that takes into account the restrictions on acquiring temperature data. A short-term (one-day-ahead) forecaster of hourly loads is proposed that combines load data and weather data (temperature) by means of decision-tree models. Decision trees were chosen because, besides making their results easy to interpret, they have very rarely been used for load forecasting.
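A minimal sketch, on synthetic data, of the kind of forecaster described: a decision-tree regressor predicting hourly load from lagged loads, hour of day, and temperature. The lag choices and feature set are our assumptions, not the thesis's specification.

```python
# Hedged sketch of a one-day-ahead hourly load forecaster built from a
# decision tree, with assumed lag features and synthetic load/temperature data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
hours = np.arange(24 * 60)                       # 60 days of hourly data
temp = 20 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
load = 100 + 3 * temp + 20 * np.sin(2 * np.pi * hours / 24) \
       + rng.normal(0, 5, hours.size)

# Features: load 24 h ago, load 168 h ago, hour of day, and temperature.
lag = 168
X = np.column_stack([load[lag - 24:-24], load[:-lag],
                     hours[lag:] % 24, temp[lag:]])
y = load[lag:]
split = -24 * 7                                  # hold out the last week
model = DecisionTreeRegressor(max_depth=6, random_state=0)
model.fit(X[:split], y[:split])
print(f"test R^2: {model.score(X[split:], y[split:]):.3f}")
```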
218

Use of different decision tree algorithms to classify in vitro cellular activity for titanium surface treatments

Fernandes, Fabiano Rodrigues. January 2017.
Interest in the analysis and characterization of biomedical materials grows with the need to select the appropriate material for each use. Depending on the conditions to which a material will be subjected, characterization may involve evaluating mechanical, electrical, electronic, magnetic, optical, chemical, and thermal properties, as well as bioactivity and immunogenicity. The literature reports the use of decision trees, with the SimpleCart (CART) and J48 algorithms, to classify a dataset generated from the results of scientific articles, in a study conducted to identify surface characteristics that optimize cellular activity. Based on published articles, that study evaluated the effect of titanium surface treatments on in vitro cellular activity (MC3T3-E1 cells) and found that the SimpleCart algorithm gave better results than J48. In this context, the present work applies the CHAID (chi-squared automatic interaction detection) and Exhaustive CHAID algorithms to the same study and compares the results with those obtained using SimpleCart. Validation showed that Exhaustive CHAID outperformed CHAID, with an estimated accuracy of 75.9% against 58.6% and a standard error of 7.9% against 9.1%, respectively, while SimpleCart (CART), already tested in the literature, achieved an estimated accuracy of 34.5% with a standard error of 8.8%. Regarding execution times measured over 22,000 records, Exhaustive CHAID posted the best times, beating CHAID by 0.02 seconds and SimpleCart (CART) by 14.45 seconds.
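CHAID grows a tree by splitting on the predictor whose contingency table with the outcome is most significant under a chi-squared test. The sketch below shows that split test on invented data; the variable names and dataset are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the chi-squared split test at the heart of CHAID, the
# algorithm compared above. Data and predictor names are invented stand-ins
# for the surface-treatment dataset.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(4)
treatment = rng.choice(["acid_etch", "sandblast", "anodized"], size=300)
roughness = rng.choice(["low", "high"], size=300)
# Binary stand-in outcome: "optimized cellular activity" or not.
activity = (treatment == "anodized") & (rng.random(300) < 0.8)

def chi_square_p(predictor, outcome):
    """p-value of the predictor/outcome contingency table (CHAID's test)."""
    cats = np.unique(predictor)
    table = np.array([[np.sum((predictor == c) & (outcome == o))
                       for o in (False, True)] for c in cats])
    chi2, p, dof, expected = chi2_contingency(table)
    return p

# CHAID would split on the most significant predictor (smallest p-value).
for name, pred in [("treatment", treatment), ("roughness", roughness)]:
    print(f"{name}: p = {chi_square_p(pred, activity):.3g}")
```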
219

A cloud-based intelligent and energy efficient malware detection framework : a framework for cloud-based, energy efficient, and reliable malware detection in real-time based on training SVM, decision tree, and boosting using specified heuristics anomalies of portable executable files

Mirza, Qublai K. A. January 2017.
The continuing financial and other losses caused by cyber-attacks attest to the substantial growth of malware and its lethal proliferation techniques. Every successful malware attack highlights weaknesses in the defence mechanisms responsible for securing the targeted computer or network. Recent cyber-attacks reveal sophistication and intelligence in malware behaviour: the ability to conceal code and operate autonomously within a system. Conventional detection mechanisms not only lack adequate malware detection capabilities, they also consume large amounts of resources while scanning the system for malicious entities. Many recent reports have highlighted this issue, along with the challenges faced by alternative solutions and the studies conducted in the same area. There is an unprecedented need for a resilient and autonomous solution that takes a proactive approach against modern malware with stealthy behaviour. This thesis proposes a multi-aspect solution comprising an intelligent malware detection framework and an energy-efficient hosting model. The malware detection framework combines conventional and novel detection techniques: comprehensive feature heuristics of files, generated by a bespoke static feature-extraction tool, are used to train the machine learning algorithms Support Vector Machine, Decision Tree, and Boosting to differentiate between clean and malicious files. Feature heuristics and machine learning together form a two-factor detection mechanism. The thesis also presents a cloud-based, energy-efficient, and scalable hosting model that combines multiple Amazon Web Services infrastructure components to host the detection framework in a client-server architecture: the client is a lightweight service running on the host machine, and the server runs in the cloud. The framework and hosting model were evaluated individually and in combination through purpose-designed experiments using separate repositories of clean and malicious files, assessing malware detection capability and energy efficiency during operation. The proposed framework and hosting model showed significant improvement in malware detection while consuming very low CPU resources.
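As a hedged sketch of the two-factor idea, the fragment below trains the three named learners on hypothetical static file heuristics and combines them by soft voting. The feature names, the voting rule, and the data are our assumptions; the thesis's extractor and combination scheme are bespoke.

```python
# Hedged sketch: static file heuristics feeding an ensemble of SVM, decision
# tree, and boosting. Features are hypothetical PE-file heuristics (entropy,
# import count, section count, size), not the thesis's bespoke extractor.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(600, 4))                       # synthetic heuristics
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 600) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
detector = VotingClassifier([
    ("svm", SVC(probability=True, random_state=0)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("boost", GradientBoostingClassifier(random_state=0)),
], voting="soft")              # average the three predicted probabilities
detector.fit(X_tr, y_tr)
print(f"held-out accuracy: {detector.score(X_te, y_te):.2%}")
```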
220

Analysis of today's train division: an assignment from Trafikverket

Grek, Viktoria; Gabrielsson, Molinia. January 2018.
The data used in this paper come from Trafikverket's delivery monitoring system and describe planned train missions on the Swedish railways during week four of the years 2014 to 2017 (excluding planned train missions on Roslagsbanan and Saltsjöbanan). Trafikanalys, with help from Trafikverket, publishes official statistics for short-distance, middle-distance, and long-distance trains on the Trafikanalys website, but these three train classes have no scientific basis. The purpose of this study is therefore to analyse whether today's train classes work, which variables matter for the classification, and whether there is a better way to categorise trains when Trafikanalys publishes official statistics. The statistical methods used are decision trees, neural networks, and hierarchical clustering. The decision tree achieved 92.51 percent accuracy in classifying Train type; the most important variables were Train length, Planned train kilometres, and Planned km/h. A neural network was used to investigate whether another method could give a similar result, thereby strengthening reliability, and achieved 88 percent accuracy in classifying Train type. Together these results indicate that the large majority of train missions are assigned the correct Train type, which means the current classification works when Trafikanalys presents official statistics. For a new classification, three groups obtained by hierarchical clustering were analysed; these groups did not resemble the current division into short-distance, middle-distance, and long-distance trains. Because the new division mixed the various passenger trains, this result does not yield a better subdivision for Trafikanalys's official statistics.
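The sketch below reproduces the shape of both analyses on synthetic data: a decision tree classifying Train type from the three variables the study found most important, followed by hierarchical clustering into three groups. All distributions and numbers are invented for illustration.

```python
# Illustrative sketch (synthetic data, assumed distributions) of the two
# analyses above: a decision tree classifying Train type from Train length,
# Planned train kilometres and Planned km/h, then hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
n = 900
train_type = rng.integers(0, 3, size=n)              # 0 short, 1 middle, 2 long
length = rng.normal(60 + 40 * train_type, 15, n)     # metres
planned_km = rng.normal(50 + 200 * train_type, 40, n)
speed = rng.normal(70 + 25 * train_type, 10, n)      # planned km/h
X = np.column_stack([length, planned_km, speed])

X_tr, X_te, y_tr, y_te = train_test_split(X, train_type, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {tree.score(X_te, y_te):.2%}")     # cf. 92.51% in the study

# Hierarchical clustering into three groups, as in the study's second step.
Z = linkage((X - X.mean(0)) / X.std(0), method="ward")
clusters = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(clusters)[1:])                     # sizes of the 3 clusters
```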
