41 |
Fuzzy Set Theory Applied to Make Medical Prognoses for Cancer Patients. Zettervall, Hang. January 2014.
As is well known, classical set theory has a deep-rooted influence on traditional mathematics. Under two-valued logic, an element either belongs to a set or does not: in the former case its membership degree is one, in the latter it is zero. In other words, two-valued logic leaves no room for imprecision or fuzziness. With the rapid development of science and technology, more and more scientists came to realize the vital importance of multi-valued logic, and in 1965 Professor Lotfi A. Zadeh of the University of California, Berkeley introduced the concept of a fuzzy set. In the decades since, fuzzy set theory has become widely known and has been applied with success in many fields. This study applies classical and extended methods of fuzzy set theory to life-expectancy and treatment prognoses for cancer patients. The research is based on real-life problems encountered by physicians in clinical work. From the introductory elements of fuzzy set theory to the medical applications, the thesis presents a detailed analysis of fuzzy set theory and its extensions. Concretely, Mamdani fuzzy control systems and the Sugeno controller are applied to predict the survival length of gastric cancer patients. To spare already-examined gastric cancer patients unnecessary suffering from surgery, fuzzy c-means clustering analysis is adopted to weigh the possibility of operation against non-operation, and a point-set approximation approach is used to estimate the operation versus non-operation possibilities for an arbitrary gastric cancer patient. In addition, in the domain of multi-expert decision making, the probabilistic model, the 2-tuple linguistic representation model, and hesitant fuzzy linguistic term sets (HFLTS) are utilized to select the most consensual treatment scheme(s) for two separate prostate cancer patients. The results obtained have supplied the physicians with reliable and helpful information, so the research can be seen as a mathematical complement to the physicians' queries.
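As a concrete illustration of graded membership, the sketch below implements the standard fuzzy c-means update equations, the clustering technique the abstract applies to the operation/non-operation question. The data and the two-cluster reading are synthetic stand-ins, not the thesis's clinical material.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and the fuzzy
    membership matrix U (c x n), where U[k, i] is the degree to which
    sample i belongs to cluster k (each sample's memberships sum to 1)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                      # normalize memberships per sample
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-10)               # avoid division by zero
        # u_ki = 1 / sum_j (d_ki / d_ji)^(2/(m-1))
        U_new = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2 / (m - 1))).sum(axis=1)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

# Illustrative only: two synthetic patient features, two clusters
# (standing in for "operation" vs "non-operation" candidates).
X = np.vstack([np.random.default_rng(1).normal(0, 1, (20, 2)),
               np.random.default_rng(2).normal(4, 1, (20, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(U[:, 0])   # fuzzy membership degrees of the first sample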
|
42 |
Análise de algoritmos de agrupamento para base de dados textuais / Analysis of clustering algorithms for textual databases. Almeida, Luiz Gonzaga Paula de. 31 August 2008.
The increasing amount of digitally stored text makes it necessary to develop computational tools that give access to information and knowledge in an efficient and effective manner. This problem is extremely relevant in biomedical research, since most of the knowledge generated is recorded in scientific articles and access to them must be as easy and fast as possible.
The research field known as Text Mining deals with the problem of identifying new information and knowledge in text databases. One of its tasks is to find groups of correlated texts in a database, a problem known as text clustering. For clustering, text databases are commonly transformed into the Vector Space Model, in which each text is represented by a vector of the occurrence frequencies of the words and terms present in the database. The set of vectors forms a matrix, called the document-term matrix, which is usually sparse and of high dimension. To attenuate the problems caused by these features, a subset of terms is normally selected, giving rise to a new document-term matrix of reduced dimension, which is then used by the clustering algorithms.
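A minimal sketch of the pipeline just described, using scikit-learn on a toy corpus: build the document-term matrix, select a reduced term subset, and cluster. The document-frequency selection rule here is a generic stand-in; the abstract does not specify the thesis's two selection algorithms.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

docs = ["gene expression in tumor cells",        # toy corpus; the thesis
        "tumor suppressor gene pathways",        # uses five pre-classified
        "bank credit risk models",               # text databases
        "credit scoring risk analysis"]

# Document-term matrix (sparse; rows = documents, columns = terms).
vec = CountVectorizer()
dtm = vec.fit_transform(docs)

# Generic term selection: keep the k terms with highest document frequency
# (a stand-in for the thesis's selection algorithms).
k = 5
doc_freq = np.asarray((dtm > 0).sum(axis=0)).ravel()
top = np.argsort(doc_freq)[::-1][:k]
dtm_reduced = dtm[:, top]

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(dtm_reduced)
print([vec.get_feature_names_out()[i] for i in top], labels)
```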
This work presents two term-selection algorithms and an evaluation of three clustering algorithms (k-means, spectral, and graph partitioning) on five pre-classified text databases. The databases were pre-processed by previously described methods to produce the document-term matrices. The results indicate that the implemented term-selection algorithms improved the performance of the clustering algorithms, and that k-means and spectral clustering outperformed graph partitioning on text databases, with or without term selection.
|
43 |
Uma nova metáfora visual escalável para dados tabulares e sua aplicação na análise de agrupamentos / A scalable visual metaphor for tabular data and its application on clustering analysis. Mosquera, Evinton Antonio Cordoba. 19 September 2017.
The rapid evolution of computing resources has enabled large datasets to be stored and retrieved. However, exploring, understanding, and extracting useful information from them is still a challenge. Among the computational tools that address this problem, information visualization techniques support data analysis through graphic representations of the data set, leveraging human visual ability, while data mining provides automatic processes for the discovery and interpretation of patterns. Despite the recent popularity of information visualization methods, a recurring problem is low visual scalability when analyzing large data sets, resulting in loss of context and visual clutter. To represent large datasets while reducing the loss of relevant information, visual data aggregation has been employed: aggregation decreases the amount of data to be represented while preserving the distribution and trends of the original dataset. In data mining, information visualization has become an essential tool for interpreting computational models and their results, especially for unsupervised techniques such as clustering, because in those techniques the only way the user can interact with the mining process is through parameterization, which limits the insertion of domain knowledge into the analysis.

In this thesis, we propose and develop a new visual metaphor based on the TableLens that employs aggregation-based approaches to create more scalable representations of tabular data. As an application, we use the developed metaphor to analyze the results of clustering techniques. The resulting framework not only supports the analysis of large databases with reduced loss of context, but also provides insight into how data attributes contribute to cluster formation in terms of the cohesion and separation of the groups formed.
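The following sketch illustrates the kind of row aggregation such a metaphor relies on: many table rows collapse into a small number of averaged groups that can be drawn without clutter while preserving per-column trends. It illustrates the aggregation concept only, not the thesis's actual TableLens-based metaphor.

```python
import numpy as np

def aggregate_rows(table, n_bins=50, sort_col=0):
    """Aggregate a (rows x cols) numeric table into n_bins averaged
    row groups, preserving per-column trends, as a scalable
    TableLens-style view requires."""
    order = np.argsort(table[:, sort_col])           # sort rows by one column
    bins = np.array_split(table[order], n_bins)      # contiguous row groups
    means = np.vstack([b.mean(axis=0) for b in bins])
    sizes = np.array([len(b) for b in bins])
    return means, sizes                              # draw one bar per bin

# Toy example: 10,000 rows collapse to 50 drawable aggregates.
rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 4))
means, sizes = aggregate_rows(data, n_bins=50)
print(means.shape, sizes.sum())                      # (50, 4) 10000
```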
|
44 |
Applications of MALDI-TOF/MS combined with molecular imaging for breast cancer diagnosis. Chiang, Yi-Yan. 26 July 2011.
Breast cancer has become the most common cancer among women and the fourth leading cause of female cancer death. In this study, matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF/MS) was combined with multivariate statistics to investigate breast cancer tissues and cell lines.
Core needle biopsy and fine needle aspiration (FNA) are techniques widely applied in the diagnosis of breast cancer. In this study, we established an efficient protocol for analyzing breast tissue and FNA samples with MALDI-TOF/MS. With the help of statistical analysis software, we identified lipid-derived ion signals that can be used to distinguish breast cancer tumor tissue from non-tumor tissue. This strategy can differentiate normal from tumor tissue and has potential for application in clinical diagnosis.
The analysis of breast cancer tissue is challenging because of the complexity of the tissue sample. Direct tissue analysis by matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI-IMS) allows us to investigate molecular structures and their distribution while maintaining the integrity of the tissue and avoiding the signal losses of extraction steps. Combining MALDI-IMS with statistical software, tissues can be analyzed and classified by their molecular content, which helps distinguish tumor regions from non-tumor regions in breast cancer tissue. Our results show differences in the distribution and content of lipids between tumor and non-tumor tissue, which can supplement current pathological analysis of tumor margins.
In this study, MALDI-TOF/MS combined with multivariate statistics was used to rapidly differentiate breast cancer cell lines with different estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) status. A protocol for efficiently detecting peptides and proteins in breast cancer cells with MALDI-TOF/MS was established, and two multivariate methods, principal component analysis (PCA) and hierarchical clustering analysis, were used to process the MALDI mass spectra obtained from six breast cancer cell lines and one normal breast cell line. Based on differences in the peptide and protein profiles, breast cancer cell lines with the same ER and HER2 status grouped in nearby regions of the PCA score plot. The hierarchical cluster analysis likewise showed high agreement between the cell lines' protein profiles and their respective hormone-receptor types.
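A minimal sketch of the statistical workflow just described, run on synthetic stand-ins for the peak-intensity profiles: PCA for the score plot, and Ward-linkage hierarchical clustering of the same profiles.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

# Toy stand-in for peak-intensity profiles: rows = cell lines,
# columns = m/z peak intensities (the real data are MALDI spectra).
rng = np.random.default_rng(0)
profiles = np.vstack([rng.normal(0, 1, (4, 200)),    # e.g. ER-positive-like lines
                      rng.normal(2, 1, (3, 200))])   # e.g. ER-negative-like lines

# PCA score-plot coordinates: lines with similar profiles land nearby.
scores = PCA(n_components=2).fit_transform(profiles)

# Hierarchical clustering on the same profiles (Ward linkage).
Z = linkage(profiles, method="ward")
groups = fcluster(Z, t=2, criterion="maxclust")
print(np.round(scores, 2), groups)
```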
|
45 |
An analytical framework for monitoring and optimizing bank branch network efficiency / E.H. Smith. Smith, Eugene Herbie. January 2009.
Financial institutions use a variety of delivery channels to service their customers. The primary channel for acquiring new customers and increasing market share is the retail branch network. The 1990s saw the Internet explosion and, with it, a threat to branches: the relatively low cost of virtual delivery channels made it inevitable that financial institutions would direct their focus towards these new, more cost-efficient technologies. By the beginning of the 21st century, as the limitations of virtual delivery channels became apparent, the financial industry returned to a more balanced view that can be seen as a revival of branch networks. The main purpose of this study is to provide a roadmap for financial institutions in managing their branch networks. A three-step methodology combining data mining and management science techniques is used to explain relative branch efficiency: clustering analysis (CA), data envelopment analysis (DEA) and decision tree induction (DTI). CA is applied to data internal to the financial institution to increase the discriminatory power of DEA; DEA calculates the relative operating efficiencies of the branches deemed homogeneous during CA; and DTI interprets the DEA results, together with additional data describing the market environment each branch operates in, to inquire into the nature of the branch's relative efficiency. / Thesis (M.Com. (Computer Science))--North-West University, Potchefstroom Campus, 2010.
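As an illustration of the DEA step, the sketch below solves the input-oriented CCR model, one standard DEA formulation (the abstract does not specify which variant the study uses), with one linear program per branch: a branch is efficient when no convex combination of peers can produce at least its outputs from a uniformly smaller fraction of its inputs.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR DEA. X is (n_units x n_inputs), Y is
    (n_units x n_outputs). Returns one efficiency score in (0, 1]
    per unit (1.0 = on the efficient frontier)."""
    n = X.shape[0]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1 .. lambda_n]
        c = np.r_[1.0, np.zeros(n)]
        # Inputs:  sum_j lambda_j * x_ij <= theta * x_io
        A_in = np.hstack([-X[o][:, None], X.T])
        # Outputs: sum_j lambda_j * y_rj >= y_ro
        A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T])
        res = linprog(c,
                      A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[np.zeros(X.shape[1]), -Y[o]],
                      bounds=[(0, None)] * (n + 1))
        scores[o] = res.fun
    return scores

# Toy branches (hypothetical data): inputs = (staff, rent),
# output = (accounts opened).
X = np.array([[5.0, 10.0], [8.0, 15.0], [5.0, 12.0]])
Y = np.array([[100.0], [120.0], [90.0]])
print(np.round(dea_ccr_input(X, Y), 3))
```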
|
47 |
Crop decision planning under yield and price uncertainties. Kantanantha, Nantachai. 25 June 2007.
This research focuses on developing a crop decision planning model to help farmers make decisions for an upcoming crop year. The decisions consist of which crops to plant, the amount of land to allocate to each crop, when to grow, when to harvest, and when to sell. The objective is to maximize the overall profit subject to available resources under yield and price uncertainties.
To help achieve this objective, we develop yield and price forecasting models to estimate the probable outcomes of these uncertain factors. The outputs from both forecasting models are incorporated into the crop decision planning model, which enables farmers to investigate and analyze possible scenarios and eventually determine the appropriate decisions for each situation.
This dissertation has three major components: yield forecasting, price forecasting, and crop decision planning. For yield forecasting, we propose a crop-weather regression model in a semiparametric framework, using temperature and rainfall during the cropping season and a GDP macroeconomic indicator as predictors. We apply functional principal components analysis to reduce the dimensionality of the model and extract meaningful information from the predictors, and we compare the predictions with those of a series of other yield forecasting models. For price forecasting, we develop a futures-based model that predicts the cash price from the futures price and the commodity basis. We focus on forecasting the commodity basis rather than the cash price because futures price information is readily available and the commodity basis has low uncertainty. We adopt a model-based approach to estimate the density function of the commodity basis distribution, which is then used to estimate confidence intervals for the commodity basis and the cash price. Finally, for crop decision planning, we propose a stochastic linear programming model that provides the optimal policy, along with three heuristic models that generate feasible solutions at low computational cost. We investigate the robustness of the proposed models to the uncertainties and prior probabilities. A numerical study of the developed approaches is performed for a representative farmer who grows corn and soybean in Illinois.
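A toy sketch of the scenario-based land-allocation core of such a model: expected profit per acre is computed over yield/price scenarios and a linear program allocates the land. All numbers are hypothetical, and the thesis's full model also optimizes planting, harvesting, and selling times, which this sketch omits.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical scenario data: for each scenario s and crop c,
# yields[s, c] (bu/acre) and prices[s, c] ($/bu); prob[s] sums to 1.
yields = np.array([[160.0, 45.0],     # scenario: good weather
                   [120.0, 35.0]])    # scenario: poor weather
prices = np.array([[3.5, 9.0],
                   [4.2, 10.5]])
prob   = np.array([0.6, 0.4])
cost   = np.array([300.0, 200.0])     # $/acre production cost per crop
land   = 500.0                        # available acres

# Expected profit per acre of each crop: E_s[yield * price] - cost.
exp_profit = prob @ (yields * prices) - cost

# Maximize exp_profit . acres  s.t.  sum(acres) <= land, acres >= 0
# (linprog minimizes, so negate the objective).
res = linprog(c=-exp_profit,
              A_ub=np.ones((1, 2)), b_ub=[land],
              bounds=[(0, None)] * 2)
print(dict(corn_acres=res.x[0], soybean_acres=res.x[1],
           expected_profit=-res.fun))
```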
|
48 |
Dynamics underlying epileptic seizures: insights from a neural mass model. Fan, Xiaoya. 17 December 2018.
In this work, we propose an approach for exploring the potential pathophysiological mechanisms of ictogenesis at the neuronal-population level by combining clinical intracranial electroencephalographic (iEEG) recordings with a neural mass model. iEEG recordings from temporal lobe epilepsy (TLE) patients around seizure onset were investigated, and physiologically meaningful parameters (the average synaptic gains of the excitatory, slow inhibitory, and fast inhibitory populations: Ae, B, and G) were identified during the interictal-to-ictal transition. We analyzed the temporal evolution of four ratios: Ae/G, Ae/B, Ae/(B + G), and B/G. The excitation/inhibition ratio increased around seizure onset and decreased before seizure offset, suggesting the disturbance and restoration, respectively, of the balance between excitation and inhibition; moreover, slow inhibition may have an earlier effect on the breakdown of that balance. These results confirm the decrease in the excitation/inhibition ratio upon seizure termination in human temporal lobe epilepsy, as revealed by optogenetic approaches both in vivo in animal models and in vitro.

We further explored the distribution of the average synaptic gains in parameter space and their temporal evolution, i.e. the path through the model parameter space, in TLE patients. The synaptic gain values lay roughly on a plane before seizure onset, dispersed during the ictal period, and returned when the seizure terminated. Cluster analysis of the seizure paths demonstrated consistency in synaptic gain evolution across different seizures from individual patients. Furthermore, two patient groups were identified, each corresponding to a specific synaptic gain evolution in the parameter space during a seizure; this result was validated by a bootstrapping approach based on comparison with random paths. The differences in the paths reveal variations in EEG dynamics between patients despite an identical seizure onset pattern. Our approach may make it possible to classify epileptic patients into subgroups based on the different mechanisms revealed by subtle changes in synaptic gains, and thereby enable more robust decisions regarding treatment strategy.

The increase of the excitation/inhibition ratios Ae/G, Ae/B, and Ae/(B + G) around seizure onset makes them potential cues for seizure detection, so we explored the feasibility of a model-based seizure detection algorithm employing a simple thresholding method. We evaluated the algorithm against the manual scoring of a human expert on iEEG samples from patients suffering from different types of epilepsy. Results suggest that Ae/(B + G), the excitation/(slow + fast inhibition) ratio, gave the best performance and that the algorithm best suited TLE patients. Leave-one-out cross-validation showed 94.74% sensitivity for TLE patients, a median false positive rate of 0.16 per hour, and a median detection delay of -1.0 s. Of interest, the thresholds determined by leave-one-out cross-validation for TLE patients were quite constant, suggesting a general excitation/inhibition balance baseline in background iEEG among TLE patients. Such a model-based seizure detection approach is of clinical interest and could also achieve good performance for other types of epilepsy, provided that a more appropriate model, one that better describes the epileptic EEG waveforms of those epilepsies, is implemented.
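A minimal sketch of the thresholding idea: given time series of the identified gains, flag a seizure when Ae/(B + G) stays above a threshold for a few consecutive samples. The gain trajectories and the threshold value below are illustrative placeholders, not patient-derived values.

```python
import numpy as np

def detect_seizure(Ae, B, G, threshold, min_run=3):
    """Flag seizure onset when the excitation/(slow + fast inhibition)
    ratio Ae/(B+G) stays above `threshold` for `min_run` consecutive
    identified-parameter samples. In the thesis the threshold is set by
    leave-one-out cross-validation; here it is an arbitrary placeholder."""
    ratio = Ae / (B + G)
    above = ratio > threshold
    for t in range(len(above) - min_run + 1):
        if above[t:t + min_run].all():
            return t, ratio
    return None, ratio

# Synthetic gain trajectories: excitation Ae rises mid-record while the
# inhibitory gains B and G stay flat (illustrative, not patient data).
t = np.arange(200)
Ae = 3.25 + 1.5 * (t > 120)
B = np.full_like(t, 20.0, dtype=float)
G = np.full_like(t, 15.0, dtype=float)
onset, ratio = detect_seizure(Ae, B, G, threshold=0.12)
print(onset)   # first sample index of the detected run (121)
```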
Altogether, this thesis contributes to epilepsy research from two perspectives. Scientifically, it gives new insight into the mechanisms underlying the interictal-to-ictal transition and facilitates a better understanding of epileptic seizures. Clinically, it provides a tool for reviewing EEG data in a more efficient and objective manner and offers an opportunity for on-demand therapeutic devices. / Doctorate in Engineering Sciences and Technology
|
49 |
Risk-based modeling, simulation and optimization for the integration of renewable distributed generation into electric power networks / Modélisation, simulation et optimisation basée sur le risque pour l'intégration de génération distribuée renouvelable dans des réseaux de puissance électrique. Mena, Rodrigo. 30 June 2015.
Renewable distributed generation (DG) is expected to continue playing a fundamental role in the development and operation of sustainable, efficient and reliable electric power systems, by virtue of offering a practical alternative for diversifying and decentralizing overall power generation while benefiting from cleaner and safer energy sources. The integration of renewable DG into existing electric power networks poses socio-techno-economic challenges, which have attracted substantial research and advancement.

In this context, the focus of the present thesis is the design and development of a modeling, simulation and optimization framework for the integration of renewable DG into electric power networks. The specific problem considered is that of selecting the technology, size and location of renewable generation units under technical, operational and economic constraints. Within this problem, the key research questions are: (i) the representation and treatment of the uncertain physical variables (such as the availability of the diverse primary renewable energy sources, bulk-power supply, power demands and the occurrence of component failures) that dynamically determine the operation of the DG-integrated network, (ii) the propagation of these uncertainties onto the system's operational response and the control of the associated risk, and (iii) the intensive computational effort resulting from the complex combinatorial optimization problem of renewable DG integration.

For the evaluation of the system under a given renewable DG plan, a non-sequential Monte Carlo simulation and optimal power flow (MCS-OPF) computational model has been designed and implemented that emulates the operation of the DG-integrated network. Random realizations of operational scenarios are generated by sampling from the distributions of the uncertain variables, and for each scenario the system performance is evaluated in terms of economics and reliability of power supply, represented by the global cost (CG) and the energy not supplied (ENS), respectively. To measure and control the risk relative to system performance, two indicators are introduced: the conditional value-at-risk (CVaR) and the CVaR deviation (DCVaR).

For the optimal selection of the technology, size and location of the renewable DG units, two distinct multi-objective optimization (MOO) approaches have been implemented with heuristic optimization (HO) search engines. The first is based on the fast non-dominated sorting genetic algorithm (NSGA-II) and aims at the concurrent minimization of the expected values of CG and ENS (ECG and EENS, respectively), combined with their corresponding CVaR(CG) and CVaR(ENS) values; the second carries out a MOO differential evolution (DE) search to minimize simultaneously ECG and its associated deviation DCVaR(CG). Both optimization approaches embed the MCS-OPF computational model to evaluate the performance of each DG-integrated network proposed by the HO search engine. The challenge posed by the large computational effort required by the proposed simulation and optimization frameworks has been addressed by introducing an original technique that nests hierarchical clustering analysis (HCA) within a DE search engine.

Examples of application of the proposed frameworks have been worked out for an adaptation of the IEEE 13-bus distribution test feeder and a realistic setting of the IEEE 30-bus sub-transmission and distribution test system. The results show that these frameworks are effective in finding optimal DG-integrated network solutions while controlling risk from two distinct perspectives: directly through the use of CVaR, and indirectly by targeting uncertainty in the form of DCVaR. Moreover, CVaR acts as an enabler of trade-offs between optimal expected performance and risk, and DCVaR also integrates uncertainty into the analysis, providing a wider spectrum of information for well-supported and confident decision making.
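The two risk indicators can be computed directly from the sampled scenario outcomes of an MCS-OPF-style model. The sketch below uses one common convention (CVaR as the mean of the worst (1 - alpha) tail of a cost distribution, DCVaR as the CVaR of the centered samples) on a toy cost sample; the thesis's exact convention may differ.

```python
import numpy as np

def cvar(samples, alpha=0.95):
    """Conditional value-at-risk of a cost distribution: the mean of
    the worst (1 - alpha) fraction of sampled outcomes."""
    x = np.sort(np.asarray(samples, dtype=float))
    tail = x[int(np.ceil(alpha * len(x))):]
    return tail.mean()

def dcvar(samples, alpha=0.95):
    """CVaR deviation: CVaR of the centered samples, i.e. how far the
    tail sits above the expected value (one common definition)."""
    s = np.asarray(samples, dtype=float)
    return cvar(s - s.mean(), alpha)

# Toy stand-in for MCS-OPF outputs: sampled global cost CG per scenario.
rng = np.random.default_rng(0)
CG = rng.lognormal(mean=3.0, sigma=0.4, size=10_000)
print(round(CG.mean(), 2), round(cvar(CG), 2), round(dcvar(CG), 2))
```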
|