191

A SYSTEMATIC STUDY OF SPARSE DEEP LEARNING WITH DIFFERENT PENALTIES

Xinlin Tao (13143465) 25 April 2023 (has links)
Deep learning has been the driving force behind many successful data science achievements. However, the deep neural network (DNN) that forms the basis of deep learning is often over-parameterized, leading to training, prediction, and interpretation challenges. To address this issue, it is common practice to apply an appropriate penalty to each connection weight, limiting its magnitude. From a Bayesian perspective, this approach is equivalent to imposing a prior distribution on each connection weight. This project offers a systematic investigation into the selection of the penalty function or prior distribution. Specifically, under the general theoretical framework of posterior consistency, we prove that consistent sparse deep learning can be achieved with a variety of penalty functions or prior distributions. Examples include amenable regularization penalties (such as MCP and SCAD), spike-and-slab priors (such as the mixture Gaussian distribution and the mixture Laplace distribution), and polynomially decaying priors (such as the Student-t distribution). Our theory is supported by numerical results.
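For readers unfamiliar with the penalties named in this abstract, the standard closed forms are given below; these are the usual textbook definitions (with λ > 0, γ > 1 for MCP, a > 2 for SCAD), not notation quoted from the thesis itself.

```latex
% MCP (minimax concave penalty), with \gamma > 1:
p_\lambda^{\mathrm{MCP}}(t) =
  \begin{cases}
    \lambda |t| - \dfrac{t^2}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt]
    \dfrac{\gamma\lambda^2}{2},         & |t| > \gamma\lambda,
  \end{cases}
\qquad
% SCAD, with a > 2 (a = 3.7 is the conventional choice):
p_\lambda^{\mathrm{SCAD}}(t) =
  \begin{cases}
    \lambda |t|, & |t| \le \lambda,\\[4pt]
    \dfrac{2a\lambda|t| - t^2 - \lambda^2}{2(a-1)}, & \lambda < |t| \le a\lambda,\\[4pt]
    \dfrac{(a+1)\lambda^2}{2}, & |t| > a\lambda.
  \end{cases}

% A spike-and-slab mixture Gaussian prior on a weight w, with \sigma_0 \ll \sigma_1,
% shrinks negligible weights toward zero while leaving large weights nearly unpenalized:
w \sim (1-\pi)\, N(0, \sigma_0^2) + \pi\, N(0, \sigma_1^2).
```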
192

Domain Expertise–Agnostic Feature Selection for the Analysis of Breast Cancer Data

Pozzoli, Susanna January 2019 (has links)
At present, high-dimensional data sets are becoming more and more frequent, and owing to the curse of dimensionality, the problem of feature selection is widespread. Unfortunately, feature selection is largely based on ground truth and domain expertise. Where ground truth and/or domain expertise are unavailable, there is a growing need for unsupervised feature selection in multiple fields, such as marketing and proteomics. Unlike in the past, it is now possible for biologists to measure the amount of protein in a cancer cell. No wonder the data is high-dimensional: the human body is composed of thousands and thousands of proteins. Intuitively, only a handful of proteins cause the onset of the disease. It may be desirable to cluster the cancer sufferers, but at the same time we want to find the proteins that produce good partitions. We hereby propose a methodology designed to find the features that maximize clustering performance. After dividing the proteins into different groups, we clustered the patients and then evaluated the clustering performance. We developed two pipelines: while the first focuses on the data provided by the laboratory, the second takes advantage of both external data on protein complexes and the internal data. We set the threshold of clustering performance with the help of the biologists at Karolinska Institutet who contributed to the project. In the thesis we show how to make a good selection of features without domain expertise in the case of breast cancer data. The experiment illustrates how, with the aid of feature selection, we can reach a clustering performance up to eight times better than the baseline.
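As a rough illustration of the pipeline this abstract describes — cluster the patients on a candidate group of proteins, score the partition, and keep the groups that clear an agreed threshold — the following Python sketch uses k-means and the silhouette coefficient. The function names, the choice of metric, and the threshold value are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_feature_group(X, feature_idx, n_clusters=2, seed=0):
    """Cluster samples on a subset of features and score the resulting partition."""
    X_sub = X[:, feature_idx]
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_sub)
    return silhouette_score(X_sub, labels)

def select_feature_groups(X, feature_groups, threshold=0.5):
    """Keep only the feature groups whose induced clustering clears a quality threshold.

    feature_groups: dict mapping a group name to a list of column indices.
    threshold: illustrative cut-off; in the thesis it was set with domain biologists.
    """
    scores = {name: score_feature_group(X, idx) for name, idx in feature_groups.items()}
    return {name: s for name, s in scores.items() if s >= threshold}
```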
193

Applications of graph theory in the energy sector, demonstrated with feature selection in electricity price forecasting / Tillämpningar av grafteori inom energisektorn, demonstrerat med variabelselektering för prognostisering av elpriset

Vu, Duc Tam January 2020 (has links)
Graph theory is the mathematical study of objects and their pairwise relations, known as nodes and edges respectively. The birth of graph theory is often dated to 1736, when the Swiss mathematician Leonhard Euler tried to solve a routing problem involving the seven bridges of Königsberg in Prussia. In more recent times, graph theory has caught the attention of companies across industries owing to its power to model and analyse exceptionally large networks. This thesis investigates the use of graph theory in the energy sector for a utility company, in particular Fortum, whose activities include, but are not limited to, the production and distribution of electricity and heat. The output of the thesis is a broad overview of graph-theoretic concepts and their practical applications, together with a use case in which some of those concepts are put into deeper analysis. The use case chosen within the scope of this thesis is feature selection: a process for reducing the number of features, also known as input variables, typically applied before a regression model is built in order to avoid overfitting and increase model interpretability. Five graph-based feature selection methods with different points of view are studied. Experiments are conducted on realistic data sets with many features to verify the legitimacy of the methods. One of the data sets is owned by Fortum and is used for forecasting the electricity price, among other important quantities. The obtained results look promising according to several evaluation metrics and can be used by Fortum as a support tool for developing prediction models. In general, a utility company can likely take advantage of graph theory in many ways and add value to its business through enriched mathematical knowledge.
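The abstract does not name the five graph-based methods studied, but the general idea — treat features as nodes, connect strongly correlated pairs, and prune redundant neighbours — can be sketched as follows. All names and thresholds here are illustrative assumptions, not the thesis's methods.

```python
import numpy as np
import networkx as nx

def correlation_graph_select(X, y, corr_threshold=0.9):
    """One generic graph-based filter: connect features whose absolute pairwise
    correlation exceeds a threshold, then within each connected component keep
    only the feature most correlated with the target."""
    n_features = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

    G = nx.Graph()
    G.add_nodes_from(range(n_features))  # isolated features form singleton components
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if abs(corr[i, j]) >= corr_threshold:
                G.add_edge(i, j)

    # From each group of mutually redundant features, keep the most target-relevant one.
    selected = [max(comp, key=lambda j: relevance[j]) for comp in nx.connected_components(G)]
    return sorted(selected)
```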
194

Causal Inference for Scientific Discoveries and Fairness-Aware Machine Learning / 科学的発見と公平な機械学習を志向した因果推論

Chikahara, Yoichi 26 September 2022 (has links)
Kyoto University / Doctoral programme (new system) / Doctor of Informatics / Degree No. 甲第24257号 / 情博第801号 / 新制||情||135 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examining committee: Prof. Hisashi Kashima (chair), Prof. Akihiro Yamamoto, Prof. Hidetoshi Shimodaira / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
195

Machine Learning-based Feature Selection and Optimisation for Clinical Decision Support Systems. Optimal Data-driven Feature Selection Methods for Binary and Multi-class Classification Problems: Towards a Minimum Viable Solution for Predicting Early Diagnosis and Prognosis

Parisi, Luca January 2019 (has links)
This critical synopsis of prior work by Luca Parisi is submitted in support of a PhD by Published Work. The work focuses on deriving accurate, reliable and explainable clinical decision support systems, as minimum clinically viable solutions, leveraging Machine Learning (ML) and evolutionary algorithms for the first time to facilitate early diagnostic predictions of Parkinson's Disease and hypothermia in hospitals, as well as prognostic predictions of the optimal postoperative recovery area and of chronic hepatitis. Despite the various pathological aetiologies, the underlying capability of ML-based algorithms to serve as a minimum clinically viable solution for predicting early diagnosis and prognosis has been thoroughly demonstrated. Feature selection (FS) is a proven method for increasing the performance of ML-based classifiers in several applications. Although advances in ML, such as Deep Learning (DL), have removed the need for extrinsic FS by incorporating it into their architectures, e.g., convolutional filters in convolutional neural networks, DL algorithms often lack the explainability required for clinicians to understand and interpret them within the context of the diagnostic and prognostic tasks of interest. Their relatively complicated architectures, the hardware required to run them, and the limited explainability or interpretability of their decision-making process, even when used as assistive tools, have hindered their application in clinical settings. Luca Parisi's work fills this translational research gap by harnessing the explainability of traditional ML- and evolutionary algorithms-based FS methods to improve the performance of ML-based algorithms and devise minimum viable solutions for diagnostic and prognostic purposes. The work submitted here involves independent research, including collaborative studies with Marianne Lyne Manaog (MedIntellego®) and Narrendar RaviChandran (University of Auckland). In particular, reconciling his work as a Senior Artificial Intelligence Engineer with his volunteering commitment as the President and Research Committee Leader of a student-led association named the "University of Auckland Rehabilitative Technologies Association", Luca Parisi embarked on most of the research included in this synopsis to add value to society via accurate, reliable and explainable, hence clinically viable, applications of AI. The key findings of these studies are: (i) ML-based FS algorithms are sufficient for devising accurate, reliable and explainable ML-based classifiers to aid the prediction of early diagnosis of Parkinson's Disease and chronic hepatitis; (ii) evolutionary algorithms-based optimisation is a preferred method for improving the accuracy and reliability of decision support systems aimed at aiding early diagnosis of hypothermia; (iii) evolutionary algorithms-based optimisation methods make it possible to devise optimised ML-based classifiers for improving postoperative discharge; (iv) whilst ML-based algorithms coupled with ML-based FS methods are the minimum clinically viable solution for binary classification problems, ML-based classifiers leveraging evolutionary algorithms for FS yield more accurate and reliable predictions, as they reduce the search space and overlapping regions, tackling multi-class classification problems, which involve a higher number of degrees of freedom, more effectively.
Collectively, these findings suggest that, despite advances in ML, state-of-the-art ML algorithms, coupled with ML-based or evolutionary algorithms for FS, are sufficient to devise accurate, reliable and explainable decision support systems for both early diagnosis and prediction of prognosis of various pathologies.
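As a generic illustration of evolutionary-algorithm-based FS of the kind discussed, and not Parisi's specific method, the following sketch wraps a classifier in a simple genetic algorithm over binary feature masks, using cross-validated accuracy as the fitness function:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated accuracy of a classifier on the masked feature subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=5).mean()

def ga_feature_selection(X, y, pop_size=20, generations=30, mutation_rate=0.05):
    """Evolve boolean feature masks; assumes X has at least two columns."""
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5                 # random bit-mask population
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
        cuts = rng.integers(1, n, size=pop_size // 2)             # one-point crossover
        children = np.array([
            np.concatenate([parents[i % len(parents), :c],
                            parents[(i + 1) % len(parents), c:]])
            for i, c in enumerate(cuts)
        ])
        children ^= rng.random(children.shape) < mutation_rate   # bit-flip mutation
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[np.argmax(scores)]                          # best feature mask found
```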
196

Information Visualization of Participant Behavior in Market Surveillance

Kesuma, Badai January 2021 (has links)
Financial markets are undergoing exponential growth in data as high-frequency trading becomes widespread, so the need for effective market surveillance has become more prominent. Domain experts at exchanges, trading participants, and regulators must provide evidence in their market surveillance investigations, yet the increased number of participants and their transactions leads to a complicated task that needs to be analysed more resource-efficiently. One way of performing market surveillance is through an at-a-glance view that can handle this data systematically and in a timely manner. Dashboards are today the widely adopted tool for processing large amounts of data in the financial sector. This study seeks to enhance the user experience of a market surveillance system developed with information visualization of participants' statistical measures. The research was carried out in an industrial setting and followed the case study paradigm. The user research produced a list of expected tasks, which were translated into design requirements by reflecting on related research on effective dashboard design for interactive high-dimensional data exploration. The design requirements formed the design elements embedded into the low- and high-fidelity prototypes. During prototype development, the participants' statistical measures shown as the predefined dimensions on the dashboard were selected using feature selection methods that correlate them with the participants' number of alerts. User evaluation of the final high-fidelity prototype suggests that interactive high-dimensional data exploration using parallel coordinates plots could improve the market surveillance process. The gap in effectiveness and efficiency scores between first-time users and experts, together with the feedback from both groups, indicates a steep learning curve in visual exploration.
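A parallel coordinates view of the kind evaluated in this study can be approximated in a few lines with pandas; the participant measures below are invented placeholders, not dimensions taken from the thesis.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical per-participant statistical measures; in the study, the dashboard's
# dimensions were chosen by feature selection against each participant's alert count.
df = pd.DataFrame({
    "order_rate":   [0.8, 0.3, 0.9, 0.2],
    "cancel_ratio": [0.1, 0.6, 0.7, 0.2],
    "trade_volume": [0.5, 0.4, 0.9, 0.1],
    "alert_level":  ["high", "low", "high", "low"],  # class column used for colouring
})

parallel_coordinates(df, class_column="alert_level", colormap="coolwarm")
plt.title("Participant measures (illustrative data)")
plt.show()
```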
197

Optimal Feature Selection for Spatial Histogram Classifiers

Thapa, Mandira January 2017 (has links)
No description available.
198

Improved Feature-Selection for Classification Problems using Multiple Auto-Encoders

Guo, Xinyu 29 May 2018 (has links)
No description available.
199

Pattern Recognition in Large Dimensional and Structured Datasets

Kurra, Goutham 11 March 2002 (has links)
No description available.
200

Robust and Efficient Feature Selection for High-Dimensional Datasets

Mo, Dengyao 19 April 2011 (has links)
No description available.
