91 |
Statistical Design of Sequential Decision Making Algorithms
Chi-hua Wang (12469251) 27 April 2022
<p>Sequential decision-making is a fundamental class of problems that motivates algorithm design in online machine learning and reinforcement learning. The resulting online algorithms arguably underpin modern online service industries through data-driven, real-time, automated decision making. Applications span many industries, including dynamic pricing (marketing), recommendation (advertising), and dosage finding (clinical trials). In this dissertation, we contribute fundamental advances in the statistical design of sequential decision-making algorithms, advancing the theory and application of online learning and sequential decision making under uncertainty, including online sparse learning, finite-armed bandits, and high-dimensional online decision making. Our work sits at the intersection of decision-making algorithm design, online statistical machine learning, and operations research, contributing new algorithms, theory, and insights to diverse fields including optimization, statistics, and machine learning.</p>
<p><br></p>
<p>In Part I, we contribute a theoretical framework of continuous risk monitoring for regularized online statistical learning. Such a framework is desirable for modern online service industries that need to monitor the performance of deployed models on online machine learning tasks. In the first project (Chapter 1), we develop continuous risk monitoring for the online Lasso procedure and provide an always-valid algorithm for high-dimensional dynamic pricing problems. In the second project (Chapter 2), we develop continuous risk monitoring for online matrix regression and provide new algorithms for rank-constrained online matrix completion problems. These theoretical advances stem from the interplay between non-asymptotic martingale concentration theory and regularized online statistical machine learning.</p>
<p><br></p>
<p>In Part II, we contribute a bootstrap-based methodology for finite-armed bandit problems, termed Residual Bootstrap exploration. Such a method opens the possibility of designing model-agnostic bandit algorithms without problem-adaptive optimism engineering or instance-specific prior tuning. In the first project (Chapter 3), we develop residual bootstrap exploration for multi-armed bandit algorithms and show that it generalizes easily to bandit problems with complex or ambiguous reward structures. In the second project (Chapter 4), we develop a theoretical framework for residual bootstrap exploration in linear bandits with a fixed action set. These methodological advances stem from our development of non-asymptotic theory for the bootstrap procedure.</p>
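The idea of exploring by perturbing each arm's estimate with randomly reweighted residuals can be illustrated with a deliberately simplified sketch. This is not the dissertation's algorithm: the Gaussian multiplier weights, the function names (`perturbed_index`, `choose_arm`, `run`), and the Bernoulli reward simulation are all assumptions made for illustration only.

```python
import random
from statistics import fmean


def perturbed_index(rewards, rng):
    """Arm mean plus the average of randomly reweighted residuals.

    Assumption: standard-normal multiplier weights; other weight
    distributions are possible.
    """
    mean = fmean(rewards)
    residuals = [r - mean for r in rewards]
    weights = [rng.gauss(0.0, 1.0) for _ in residuals]
    noise = sum(w * e for w, e in zip(weights, residuals)) / len(rewards)
    return mean + noise


def choose_arm(history, rng):
    """Pull any unseen arm first, then the arm with the largest perturbed index."""
    for arm, rewards in enumerate(history):
        if not rewards:
            return arm
    indices = [perturbed_index(rewards, rng) for rewards in history]
    return max(range(len(history)), key=indices.__getitem__)


def run(true_means, horizon, seed=0):
    """Simulate a Bernoulli bandit (hypothetical setup) and return pull counts."""
    rng = random.Random(seed)
    history = [[] for _ in true_means]
    for _ in range(horizon):
        arm = choose_arm(history, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        history[arm].append(reward)
    return [len(rewards) for rewards in history]


pulls = run(true_means=[0.2, 0.8], horizon=200)
```

One limitation of this naive version: an arm whose observed rewards are all identical has zero residuals and therefore receives no perturbation, so it can be abandoned prematurely. A practical method needs some remedy for that case, which this sketch does not attempt.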
<p><br></p>
<p>In Part III, we contribute application-driven insights on the exploration-exploitation dilemma for high-dimensional online decision-making problems. Such insights help practitioners implement effective high-dimensional statistical methods to solve online decision-making problems. In the first project (Chapter 5), we develop a bandit sampling scheme for online batch high-dimensional decision making, a practical scenario in interactive marketing and sequential clinical trials. In the second project (Chapter 6), we develop a bandit sampling scheme for federated online high-dimensional decision-making that maintains data decentralization while supporting collaborative decisions. These insights stem from our new bandit sampling design, which addresses application-driven exploration-exploitation trade-offs effectively.</p>
|
92 |
Model-Based Prediction of an Effective Adhesion Parameter Guiding Multi-Type Cell Segregation
Roßbach, Philipp, Böhme, Hans-Joachim, Lange, Steffen, Voß-Böhme, Anja 24 February 2022
The process of cell sorting is essential for the development and maintenance of tissues. With the Differential Adhesion Hypothesis, Steinberg proposed that cell sorting is determined by quantitative differences in cell-type-specific intercellular adhesion strengths. An implementation of the Differential Adhesion Hypothesis is the Differential Migration Model by Voss-Böhme and Deutsch. There, an effective adhesion parameter that predicts the asymptotic sorting pattern was derived analytically for systems with two cell types. However, the existence and form of such a parameter for more than two cell types is unclear. Here, we analytically generalize the concept of an effective adhesion parameter to three and more cell types and demonstrate its existence numerically for three cell types, based on in silico time-series data produced by a cellular-automaton implementation of the Differential Migration Model. Additionally, we classify the segregation behavior using statistical learning methods and show that the estimated effective adhesion parameter for three cell types matches our analytical prediction. Finally, we demonstrate that the effective adhesion parameter can resolve a recent dispute about the impact of interfacial adhesion, cortical tension, and heterotypic repulsion on cell segregation.
|
93 |
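Generische Verkettung maschineller Ansätze der Bilderkennung durch Wissenstransfer in verteilten Systemen: Am Beispiel der Aufgabengebiete INS und ACTEv der Evaluationskampagne TRECVid [Generic chaining of machine-based image recognition approaches through knowledge transfer in distributed systems: the example of the INS and ACTEv tasks of the TRECVid evaluation campaign]
Roschke, Christian 08 November 2021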
Technological advances in multimedia sensing and in the associated methods for data acquisition, storage, and processing are producing immense data holdings in media libraries and knowledge management systems in the Big Data environment. The underlying state-of-the-art processing algorithms are often developed in a problem-oriented manner. Because of the enormous data volumes, reliable conclusions about their quality and applicability can be drawn only to a limited extent. Manual (intellectual) indexing of large corpora is likewise difficult, since nearly the entire data volume would have to be checked semi-manually to support valid statements, which requires specific expertise in the underlying data domain as well as a corresponding understanding of data handling and classification processes. In addition, there are special requirements for hardware and software, which usually scale poorly because they are mostly developed and executed on multicore computers without provision for the necessary distribution. Consequently, mechanisms that ensure the transferability of the methods to other application domains are lacking.
The focus of this work is the design and development of a distributed holistic infrastructure that enables the automated processing of multimedia data in terms of feature extraction, data fusion, and metadata search within a homogeneous yet distributed system. To this end, approaches from machine learning, distributed systems, data management, and virtualization are combined so that they can be applied to, evaluated on, and optimized for large data sets. In particular, current technologies and frameworks for pattern detection are analyzed and subjected to a performance evaluation so that a catalog of criteria can be derived. The criteria identified in this way form the basis for a requirements analysis and the conceptual design of the required infrastructure. This architecture forms the basis for Big Data experiments in context-specific use cases from scientific evaluation campaigns such as TRECVid. For this purpose, its generic applicability is investigated in the two task areas Instance Search and Activities in Extended Video.
List of Figures
List of Tables
1 Motivation
2 Methods and Strategies
3 System Architecture
4 Instance Search
5 Activities in Extended Video
6 Summary and Outlook
Appendix
Bibliography
|
94 |
PRODUCT-APPLICATION FIT, CONCEPTUALIZATION, AND DESIGN OF TECHNOLOGIES: PROSTHETIC HAND TO MULTI-CORE VAPOR CHAMBERS
Soumya Bandyopadhyay (13171827) 29 July 2022
<p>Moving from idea generation to the conceptualization and development of products and technologies is a non-linear, iterative process. The work in this thesis follows a process that begins with a review of existing technologies and products, examining their unique value propositions in the context of the specific applications for which they are designed. Next, the unmet needs of novel or emerging applications that require new products or technologies are identified. Once these user needs and product requirements are identified, the specific functions to be addressed by the product are specified. The subsequent design of products and technologies to meet these functions is enabled by engineering tools such as three-dimensional modelling, physics-based simulations, and the manufacture of a minimum viable prototype. In these steps, unbiased decisions have to be made using weighted decision matrices to satisfy the design requirements. Finally, the minimum viable prototype is tested to demonstrate the principal functionalities. The test results identify potential improvements for the next generations of the prototype, which in turn inform the final product design. This thesis adopted this methodology to initiate the design of two product prototypes: i) an image-recognition-integrated service (IRIS) robotic hand for children and ii) a cascaded multi-core vapor chamber (CMVC) for improving the performance of next-generation computing systems. Minimum viable product prototypes were manufactured to demonstrate the principal functionalities, followed by clear identification of potential future improvements. Tests of the prosthetic hand indicate that the image-recognition-based feedback can successfully drive the actuators to perform the intended grasping motions. Experimental testing of the multi-core vapor chamber demonstrates successful performance of the prototype, which offers a notable reduction in temperatures relative to the existing benchmark, a solid copper spreader.</p>
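The weighted decision matrix mentioned in this abstract can be sketched in a few lines. The criteria, weights, ratings, and design names below are hypothetical and are not taken from the thesis; the sketch only shows the general mechanics of scoring each candidate by a weight-normalized sum of its ratings.

```python
def weighted_scores(weights, ratings):
    """Return the weight-normalized score of each option.

    weights: criterion -> importance weight
    ratings: option -> {criterion -> rating}
    """
    total_weight = sum(weights.values())
    return {
        option: sum(weights[c] * per_criterion[c] for c in weights) / total_weight
        for option, per_criterion in ratings.items()
    }


# Hypothetical criteria and ratings (1 = poor, 5 = excellent).
weights = {"cost": 3, "weight": 2, "grip_strength": 5}
ratings = {
    "design_a": {"cost": 4, "weight": 3, "grip_strength": 2},
    "design_b": {"cost": 2, "weight": 4, "grip_strength": 5},
}

scores = weighted_scores(weights, ratings)
best = max(scores, key=scores.get)
```

Fixing the weights before the candidates are rated is what keeps the comparison unbiased: the ranking then follows from the stated priorities rather than from a preference for a particular design.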
|
95 |
Applications and challenges in mass spectrometry-based untargeted metabolomics
Jones, Christina Michele 27 May 2016
Metabolomics is the methodical scientific study of biochemical processes associated with the metabolome—which comprises the entire collection of metabolites in any biological entity. Metabolome changes occur as a result of modifications in the genome and proteome, and are, therefore, directly related to cellular phenotype. Thus, metabolomic analysis is capable of providing a snapshot of cellular physiology. Untargeted metabolomics is an impartial, all-inclusive approach for detecting as many metabolites as possible without a priori knowledge of their identity. Hence, it is a valuable exploratory tool capable of providing extensive chemical information for discovery and hypothesis-generation regarding biochemical processes. A history of metabolomics and advances in the field corresponding to improved analytical technologies are described in Chapter 1 of this dissertation. Additionally, Chapter 1 introduces the analytical workflows involved in untargeted metabolomics research to provide a foundation for Chapters 2 – 5.
Part I of this dissertation, which encompasses Chapters 2 – 3, describes the use of mass spectrometry (MS)-based untargeted metabolomic analysis to gain new insight into cancer detection. The biochemical processes underlying the origin and proliferation of many types of cancer are poorly understood, which has also led to a shortage of sensitive and specific biomarkers. Chapter 2 describes the development of an in vitro diagnostic multivariate index assay (IVDMIA) for prostate cancer (PCa) prediction based on ultra performance liquid chromatography-mass spectrometry (UPLC-MS) metabolic profiling of blood serum samples from 64 PCa patients and 50 healthy individuals. A panel of 40 metabolic spectral features discriminated between the two groups with 92.1% sensitivity, 94.3% specificity, and 93.0% accuracy. The IVDMIA outperformed the prevalent prostate-specific antigen blood test, highlighting that a combination of multiple discriminant features yields higher predictive power for PCa detection than the univariate analysis of a single marker. Chapter 3 describes two approaches that were taken to investigate metabolic patterns for early detection of ovarian cancer (OC). First, Dicer-Pten double knockout (DKO) mice, which phenocopy many of the features of metastatic high-grade serous carcinoma (HGSC) observed in women, were studied. Using UPLC-MS, serum samples from 14 early-stage tumor DKO mice and 11 controls were analyzed. Iterative multivariate classification selected 18 metabolites that, when considered as a panel, yielded 100% accuracy, sensitivity, and specificity for early-stage HGSC detection. In the second approach, serum metabolic phenotypes of an early-stage OC pilot patient cohort were characterized. Serum samples were collected from 24 early-stage OC patients and 40 healthy women and subsequently analyzed using UPLC-MS.
Multivariate statistical analysis employing support vector machine learning methods and recursive feature elimination selected a panel of metabolites that differentiated between age-matched samples with 100% cross-validated accuracy, sensitivity, and specificity. This small pilot study demonstrated that metabolic phenotypes may be useful for detecting early-stage OC and, thus, supports conducting larger, more comprehensive studies.
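For readers unfamiliar with the performance figures quoted for these classifiers, the sketch below shows how sensitivity, specificity, and accuracy are computed from confusion-matrix counts. The counts are hypothetical and are not the study's data; the function name `classification_metrics` is likewise an illustrative choice.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: diseased samples classified correctly / incorrectly
    tn/fp: healthy samples classified correctly / incorrectly
    """
    sensitivity = tp / (tp + fn)           # fraction of diseased samples detected
    specificity = tn / (tn + fp)           # fraction of healthy samples cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 10 diseased and 20 healthy samples.
sensitivity, specificity, accuracy = classification_metrics(tp=9, fn=1, tn=16, fp=4)
```

When such figures are reported from cross-validation, they are typically averaged over folds, so they need not correspond to whole-number counts for the full cohort.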
Many challenges exist in the field of untargeted metabolomics. Part II of this dissertation, which encompasses Chapters 4 – 5, focuses on two specific challenges. While metabolomic data may be used to generate hypotheses concerning biological processes, determining causal relationships within metabolic networks with only metabolomic data is impractical. Proteins play major roles in these networks; therefore, pairing metabolomic information with that acquired from proteomics gives a more comprehensive snapshot of perturbations to metabolic pathways. Chapter 4 describes the integration of MS- and NMR-based metabolomics with proteomics analyses to investigate the role of chemically mediated ecological interactions between Karenia brevis and two diatom competitors, Asterionellopsis glacialis and Thalassiosira pseudonana. This integrated systems biology approach showed that K. brevis allelopathy perturbed the metabolisms of these two competitors in distinct ways. A. glacialis had a more robust metabolic response to K. brevis allelopathy, which may be a result of its repeated exposure to K. brevis blooms in the Gulf of Mexico. In T. pseudonana, however, K. brevis allelopathy disrupted energy metabolism and obstructed cellular protection mechanisms, including altering cell membrane components, inhibiting osmoregulation, and increasing oxidative stress. This work represents the first instance of metabolites and proteins being measured simultaneously to understand the effects of allelopathy or, in fact, any form of competition.
Chromatography is traditionally coupled to MS for untargeted metabolomics studies. While coupling chromatography to MS greatly enhances metabolome analysis due to the orthogonality of the techniques, the lengthy analysis times pose challenges for large metabolomics studies. Consequently, there is still a need for higher-throughput MS approaches. A rapid metabolic fingerprinting method that utilizes a new transmission mode direct analysis in real time (TM-DART) ambient sampling technique is presented in Chapter 5. The optimization of TM-DART parameters directly affecting metabolite desorption and ionization, such as sample position and ionizing gas desorption temperature, was critical in achieving high sensitivity and detecting a broad mass range of metabolites. In terms of reproducibility, TM-DART compared favorably with traditional probe mode DART analysis, with coefficients of variation as low as 16%. TM-DART MS proved to be a powerful analytical technique for rapid metabolome analysis of human blood sera and was adapted for exhaled breath condensate (EBC) analysis. To determine the feasibility of utilizing TM-DART for metabolomics investigations, TM-DART was interfaced with traveling wave ion mobility spectrometry (TWIMS) time-of-flight (TOF) MS for the analysis of EBC samples from cystic fibrosis patients and healthy controls. TM-DART-TWIMS-TOF MS successfully detected cystic fibrosis in this small sample cohort, thereby demonstrating that it can be employed for probing metabolome changes.
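The coefficient of variation used above as a reproducibility measure is the sample standard deviation expressed as a percentage of the mean. A minimal sketch follows, using hypothetical replicate intensities rather than any measurement from the dissertation.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical signal intensities from three replicate measurements.
replicates = [90.0, 100.0, 110.0]
cv_percent = coefficient_of_variation(replicates)
```

A lower value indicates that repeated measurements of the same sample agree more closely.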
Finally, in Chapter 6, a perspective on the presented work is provided along with goals on which future studies may focus.
|